Learn about BlueXP classification
BlueXP classification (Cloud Data Sense) is a data governance service for BlueXP that scans your corporate on-premises and cloud data sources to map and classify data, and to identify private information. This can help reduce your security and compliance risk, decrease storage costs, and assist with your data migration projects.
Beginning with version 1.31, BlueXP classification is available as a core capability with BlueXP. There's no additional charge. No Classification license or subscription is required. If you've been using legacy version 1.30 or earlier, that version is available until your subscription expires. See a list of deprecated features.
Features
BlueXP classification uses artificial intelligence (AI), natural language processing (NLP), and machine learning (ML) to understand the content that it scans in order to extract entities and categorize the content accordingly. This allows BlueXP classification to provide the following areas of functionality.
BlueXP classification provides several tools that can help with your compliance efforts. You can use BlueXP classification to:
-
Identify Personally Identifiable Information (PII).
-
Identify a wide scope of sensitive personal information as required by GDPR, CCPA, PCI, and HIPAA privacy regulations.
-
Respond to Data Subject Access Requests (DSAR) based on name or email address.
BlueXP classification can identify data that is potentially at risk of being accessed for criminal purposes. You can use BlueXP classification to:
-
Identify all the files and directories (shares and folders) with open permissions that are exposed to your entire organization or to the public.
-
Identify sensitive data that resides outside of the initial, dedicated location.
-
Comply with data retention policies.
-
Use Policies to automatically detect new security issues so security staff can take action immediately.
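The idea of "open permissions" in the list above can be illustrated with POSIX mode bits. This is a minimal sketch under stated assumptions: the function name `is_open_to_everyone` is hypothetical, and it only inspects the "other" permission bits of a local file mode. It does not reflect how BlueXP classification evaluates NFS or CIFS share permissions.

```python
import stat

def is_open_to_everyone(mode: int) -> bool:
    """Return True if the 'other' class has any read, write, or execute bit set."""
    other_bits = stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH
    return bool(mode & other_bits)

# Example: a mode of 0o644 is readable by everyone, while 0o600 is owner-only.
# For a real file, pass pathlib.Path(some_path).stat().st_mode.
```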
BlueXP classification provides tools that can help with your storage total cost of ownership (TCO). You can use BlueXP classification to:
-
Increase storage efficiency by identifying duplicate or non-business-related data.
-
Save storage costs by identifying inactive data that you can tier to less expensive object storage. Learn more about tiering from Cloud Volumes ONTAP systems. Learn more about tiering from on-premises ONTAP systems.
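One common way to find duplicate files, as mentioned in the list above, is to group files by a content hash. The sketch below assumes a local directory tree and uses SHA-256; the function name `find_duplicates` is hypothetical, and the source does not describe how BlueXP classification detects duplicates internally.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: Path) -> list:
    """Group files under root by SHA-256 digest and return groups with more than one file."""
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for a sketch, not for very large files.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_digest[digest].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Each returned group lists paths with identical content, so all but one copy in a group is a candidate for cleanup.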
Supported working environments and data sources
BlueXP classification can scan and analyze structured and unstructured data from the following types of working environments and data sources:
Working environments
-
Amazon FSx for ONTAP
-
Azure NetApp Files
-
Cloud Volumes ONTAP (deployed in AWS, Azure, or GCP)
-
On-premises ONTAP clusters
-
StorageGRID
Data sources
-
NetApp file shares
-
Databases:
-
Amazon Relational Database Service (Amazon RDS)
-
MongoDB
-
MySQL
-
Oracle
-
PostgreSQL
-
SAP HANA
-
SQL Server (MSSQL)
BlueXP classification supports NFS versions 3.x, 4.0, and 4.1, and CIFS versions 1.x, 2.0, 2.1, and 3.0.
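The supported protocol versions can be captured in a small lookup for pre-flight checks. This is a minimal sketch: the name `is_supported` and the version-string format are assumptions for illustration and are not part of BlueXP classification.

```python
# Supported protocol versions as listed above.
# Entries such as "3.x" are treated as matching any minor release of that major version.
SUPPORTED = {
    "NFS": {"3.x", "4.0", "4.1"},
    "CIFS": {"1.x", "2.0", "2.1", "3.0"},
}

def is_supported(protocol: str, version: str) -> bool:
    """Return True if the protocol/version pair appears in the supported list."""
    versions = SUPPORTED.get(protocol.upper())
    if versions is None:
        return False
    major = version.split(".")[0]
    return version in versions or f"{major}.x" in versions
```

For example, `is_supported("NFS", "4.1")` is true, while `is_supported("NFS", "4.2")` is false because 4.2 is not in the list.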
Cost
BlueXP classification is free to use. No Classification license or paid subscription is required.
Infrastructure costs
-
Installing BlueXP classification in the cloud requires deploying a cloud instance, which results in charges from the cloud provider where it is deployed. See the type of instance that is deployed for each cloud provider. There is no cost if you install BlueXP classification on an on-premises system.
-
BlueXP classification requires that you have deployed a BlueXP Connector. In many cases you already have a Connector because of other storage and services you are using in BlueXP. The Connector instance results in charges from the cloud provider where it is deployed. See the type of instance that is deployed for each cloud provider. There is no cost if you install the Connector on an on-premises system.
Data transfer costs
Data transfer costs depend on your setup. If the BlueXP classification instance and the data source are in the same Availability Zone and region, there are no data transfer costs. If the data source, such as a Cloud Volumes ONTAP system, is in a different Availability Zone or region, your cloud provider charges you for data transfer.
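The rule above reduces to a simple comparison of locations. The following sketch assumes a hypothetical `Location` type with a region and an Availability Zone; the region and zone names in the example are placeholders, and actual charges depend on your cloud provider's pricing.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Location:
    region: str
    availability_zone: str

def incurs_transfer_cost(instance: Location, source: Location) -> bool:
    """Return True when the instance and data source differ in region or Availability Zone."""
    same_region = instance.region == source.region
    same_zone = instance.availability_zone == source.availability_zone
    return not (same_region and same_zone)

# Example with placeholder names:
classifier = Location("us-east-1", "us-east-1a")
volume = Location("us-east-1", "us-east-1b")
cross_zone = incurs_transfer_cost(classifier, volume)  # different zone, so charges apply
```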
The BlueXP classification instance
When you deploy BlueXP classification in the cloud, BlueXP deploys the instance in the same subnet as the Connector. Learn more about Connectors.
Note the following about the default instance:
-
In AWS, BlueXP classification runs on an m6i.4xlarge instance with a 500 GiB GP2 disk. The operating system image is Amazon Linux 2. When deployed in AWS, you can choose a smaller instance size if you are scanning a small amount of data.
-
In Azure, BlueXP classification runs on a Standard_D16s_v3 VM with a 500 GiB disk. The operating system image is Ubuntu 22.04.
-
In GCP, BlueXP classification runs on an n2-standard-16 VM with a 500 GiB Standard persistent disk. The operating system image is Ubuntu 22.04.
-
In regions where the default instance isn't available, BlueXP classification runs on an alternate instance. See the alternate instance types.
-
The instance is named CloudCompliance with a generated UUID appended to it. For example: CloudCompliance-16bb6564-38ad-4080-9a92-36f5fd2f71c7
-
Only one BlueXP classification instance is deployed per Connector.
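If you need to pick the BlueXP classification instance out of a list of instance names, the naming convention above can be matched with a regular expression. This sketch assumes the name is exactly `CloudCompliance-` followed by a lowercase, hyphenated UUID, as in the example; the function name `parse_instance_uuid` is hypothetical.

```python
import re
import uuid
from typing import Optional

# Assumed format: "CloudCompliance-" plus a lowercase 8-4-4-4-12 hex UUID.
NAME_PATTERN = re.compile(
    r"^CloudCompliance-"
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$"
)

def parse_instance_uuid(name: str) -> Optional[uuid.UUID]:
    """Return the UUID portion of an instance name, or None if the name does not match."""
    match = NAME_PATTERN.match(name)
    return uuid.UUID(match.group(1)) if match else None
```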
You can also deploy BlueXP classification on a Linux host on your premises or on a host in your preferred cloud provider. The software functions exactly the same way regardless of which installation method you choose. Upgrades of BlueXP classification software are automated as long as the instance has internet access.
The instance should remain running at all times because BlueXP classification continuously scans the data.
Deploy on different instance types
Review the following specifications for instance types:
System size | Specs | Limitations |
---|---|---|
Extra Large | 32 CPUs, 128 GB RAM, 1 TiB SSD | Can scan up to 500 million files. |
Large (default) | 16 CPUs, 64 GB RAM, 500 GiB SSD | Can scan up to 250 million files. |
When deploying BlueXP classification in Azure or GCP, email ng-contact-data-sense@netapp.com for assistance if you want to use a smaller instance type.
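The file-count limits in the specifications above can be turned into a simple sizing helper. This is a sketch for planning purposes only: the function name `pick_system_size` is hypothetical, and it considers only the two sizes and file limits listed in the table.

```python
from typing import Optional

# (system size, maximum number of files), smallest first, taken from the table above.
SIZES = [
    ("Large (default)", 250_000_000),
    ("Extra Large", 500_000_000),
]

def pick_system_size(file_count: int) -> Optional[str]:
    """Return the smallest listed size that can scan file_count files, or None if none can."""
    for name, max_files in SIZES:
        if file_count <= max_files:
            return name
    return None
```

A result of `None` means the file count exceeds the largest listed size, in which case the data would need to be split across more than one deployment.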
How BlueXP classification scanning works
At a high level, BlueXP classification scanning works like this:
-
You deploy an instance of BlueXP classification in BlueXP.
-
You enable high-level mapping (called Mapping-only scans) or deep-level scanning (called Map & Classify scans) on one or more data sources.
-
BlueXP classification scans the data using an AI learning process.
-
You use the provided dashboards and reporting tools to help in your compliance and governance efforts.
After you enable BlueXP classification and select the repositories that you want to scan (these are the volumes, database schemas, or other user data), it immediately starts scanning the data to identify personal and sensitive data. You should focus on scanning live production data in most cases instead of backups, mirrors, or DR sites. Then BlueXP classification maps your organizational data, categorizes each file, and identifies and extracts entities and predefined patterns in the data. The result of the scan is an index of personal information, sensitive personal information, data categories, and file types.
BlueXP classification connects to the data like any other client by mounting NFS and CIFS volumes. NFS volumes are automatically accessed as read-only, while you need to provide Active Directory credentials to scan CIFS volumes.
After the initial scan, BlueXP classification continuously scans your data in a round-robin fashion to detect incremental changes. This is why it's important to keep the instance running.
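The round-robin, incremental behavior described above can be sketched as follows. This is an illustration of the general technique, not BlueXP classification's implementation: the names `round_robin_scan` and `changed_files`, the use of modification times to detect changes, and the `list_files` callback are all assumptions. Deleted files are not handled in this sketch.

```python
from itertools import cycle, islice

def changed_files(snapshot: dict, index: dict) -> list:
    """Return paths that are new or whose modification time differs from the index."""
    return sorted(p for p, mtime in snapshot.items() if index.get(p) != mtime)

def round_robin_scan(repos, list_files, index, passes):
    """Visit repositories in a fixed rotation, rescanning only new or changed files.

    repos:      ordered list of repository names
    list_files: callable returning {path: modification_time} for a repository
    index:      {repo: {path: modification_time}} remembered between calls
    passes:     number of repository visits to perform
    """
    rescanned = []
    for repo in islice(cycle(repos), passes):
        snapshot = list_files(repo)
        repo_index = index.setdefault(repo, {})
        for path in changed_files(snapshot, repo_index):
            rescanned.append((repo, path))
            repo_index[path] = snapshot[path]  # remember what was scanned
    return rescanned
```

On the first rotation every file is new, so everything is scanned. On later rotations only files whose modification time changed are picked up, which is why the scanner must keep running to notice changes.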
You can enable and disable scans at the volume level or the database schema level.
BlueXP classification does not impose a limit on the amount of data it can scan. Each Connector supports scanning and displaying 500 TiB of data. To scan more than 500 TiB of data, install another Connector and then deploy another instance of BlueXP classification. The BlueXP UI displays data from a single Connector. For tips on viewing data from multiple Connectors, see Work with multiple Connectors.
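Because each Connector supports 500 TiB, the number of Connectors (and BlueXP classification instances, at one per Connector) follows from a ceiling division. A minimal sketch, with `connectors_needed` as a hypothetical helper name:

```python
import math

TIB_PER_CONNECTOR = 500  # per-Connector capacity stated above

def connectors_needed(total_tib: float) -> int:
    """Return how many Connectors are needed to scan total_tib of data (at least one)."""
    return max(1, math.ceil(total_tib / TIB_PER_CONNECTOR))
```

For example, 1,200 TiB of data works out to three Connectors, each with its own BlueXP classification instance.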
What's the difference between Mapping and Classification scans
You can conduct two types of scans in BlueXP classification:
-
Mapping-only scans provide only a high-level overview of your data and are performed on selected data sources. Mapping-only scans take less time than Map & Classify scans because they do not access files to see the data inside. You might want to run a Mapping-only scan first to identify areas of research and then perform a Map & Classify scan on those areas.
-
Map & Classify scans provide deep-level scanning of your data.
For details about the differences between Mapping and Classification scans, see What's the difference between Mapping and Classification scans?
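The distinction between the two scan types can be illustrated on a local file: one function collects metadata only, the other also opens the file to inspect its content. This is a conceptual sketch, not the product's logic; the function names and the keyword check are assumptions chosen to keep the example small.

```python
from pathlib import Path

def map_only(path: Path) -> dict:
    """Collect file metadata without opening the file (analogous to a Mapping-only scan)."""
    st = path.stat()
    return {
        "name": path.name,
        "type": path.suffix.lstrip("."),
        "size": st.st_size,
        "modified": st.st_mtime,
    }

def map_and_classify(path: Path, keyword: str = "confidential") -> dict:
    """Collect metadata and also read the content (analogous to a Map & Classify scan)."""
    record = map_only(path)
    text = path.read_text(errors="ignore")
    # Placeholder content check; a real classifier would do far more than a keyword match.
    record["contains_keyword"] = keyword in text.lower()
    return record
```

The metadata-only function never reads file content, which is why that style of scan is faster.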
Information that BlueXP classification categorizes
BlueXP classification collects, indexes, and assigns categories to the following data:
-
Standard metadata about files: the file type, its size, creation and modification dates, and so on.
-
Personal data: Personally identifiable information (PII) such as email addresses, identification numbers, or credit card numbers, which BlueXP classification identifies using specific words, strings, and patterns in the files. Learn more about personal data.
-
Sensitive personal data: Special types of sensitive personal information (SPII), such as health data, ethnic origin, or political opinions, as defined by General Data Protection Regulation (GDPR) and other privacy regulations. Learn more about sensitive personal data.
-
Categories: BlueXP classification takes the data that it scanned and divides it into different types of categories. Categories are topics based on AI analysis of the content and metadata of each file. Learn more about categories.
-
Types: BlueXP classification takes the data that it scanned and breaks it down by file type. Learn more about types.
-
Named entity recognition: BlueXP classification uses AI to extract people's natural names from documents. Learn about responding to Data Subject Access Requests.
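Pattern-based detection of personal data, as described in the list above, can be sketched with regular expressions plus a checksum. The example below looks for email addresses and for credit card numbers that pass the Luhn check. The patterns and the function names are illustrative assumptions; the source does not document the patterns that BlueXP classification uses.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")

def luhn_valid(number: str) -> bool:
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_personal_data(text: str) -> dict:
    """Return email addresses and Luhn-valid card numbers found in text."""
    emails = EMAIL_RE.findall(text)
    cards = [
        m.group().strip()
        for m in CARD_RE.finditer(text)
        if luhn_valid(m.group())
    ]
    return {"email": emails, "credit_card": cards}
```

The checksum step filters out digit sequences that merely look like card numbers, which reduces false positives compared with pattern matching alone.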
Networking overview
BlueXP classification deploys a single server, or cluster, wherever you choose — in the cloud or on premises. The servers connect via standard protocols to the data sources and index the findings in an Elasticsearch cluster, which is also deployed on the same servers. This enables support for multi-cloud, cross-cloud, private cloud, and on-premises environments.
BlueXP deploys the BlueXP classification instance with a security group that enables inbound HTTP connections from the Connector instance.
When you use BlueXP in SaaS mode, the connection to BlueXP is served over HTTPS, and the private data sent between your browser and the BlueXP classification instance are secured with end-to-end encryption using TLS 1.2, which means NetApp and third parties can't read it.
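If you write your own client scripts against HTTPS endpoints and want them to refuse anything older than TLS 1.2, Python's standard `ssl` module can enforce that on the client side. This is a generic sketch using the standard library only; the function name `make_client_context` is hypothetical, and it says nothing about how BlueXP itself is configured.

```python
import ssl

def make_client_context() -> ssl.SSLContext:
    """Create a client TLS context that verifies certificates and requires TLS 1.2 or later."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```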
Outbound rules are completely open. Internet access is needed to install and upgrade the BlueXP classification software and to send usage metrics.
If you have strict networking requirements, learn about the endpoints that BlueXP classification contacts.