Learn about BlueXP classification

01/07/2025 Contributors

BlueXP classification (Cloud Data Sense) is a data governance service for BlueXP that scans your corporate on-premises and cloud data sources to map and classify data, and to identify private information. This can help reduce your security and compliance risk, decrease storage costs, and assist with your data migration projects.

IMPORTANT

Starting in May 2024 with version 1.31, BlueXP classification is now available as a core capability within BlueXP at no additional charge. No Classification license or subscription is required. We have also focused BlueXP classification functionality on NetApp storage systems, so some unused, or underused, features have been deprecated.

See a list of deprecated features.

Users who have been using legacy versions 1.30 or earlier will continue to be able to use that version until their subscription expires.

Features

BlueXP classification uses artificial intelligence (AI), natural language processing (NLP), and machine learning (ML) to understand the content that it scans in order to extract entities and categorize the content accordingly. This allows BlueXP classification to provide the following areas of functionality.

Learn more about the use cases for BlueXP classification.

Maintain compliance

BlueXP classification provides several tools that can help with your compliance efforts. You can use BlueXP classification to:

Identify Personal Identifiable Information (PII).
Identify a wide scope of sensitive personal information as required by GDPR, CCPA, PCI, and HIPAA privacy regulations.
Respond to Data Subject Access Requests (DSAR) based on name or email address.

Strengthen security

BlueXP classification can identify data that is potentially at risk for being accessed for criminal purposes. You can use BlueXP classification to:

Identify all the files and directories (shares and folders) with open permissions that are exposed to your entire organization or to the public.
Identify sensitive data that resides outside of the initial, dedicated location.
Comply with data retention policies.
Use Policies to automatically detect new security issues so security staff can take action immediately.

Optimize storage usage

BlueXP classification provides tools that can help with your storage total cost of ownership (TCO). You can use BlueXP classification to:

Increase storage efficiency by identifying duplicate or non-business-related data.
Save storage costs by identifying inactive data that you can tier to less expensive object storage. Learn more about tiering from Cloud Volumes ONTAP systems. Learn more about tiering from on-premises ONTAP systems.

Supported working environments and data sources

BlueXP classification can scan and analyze structured and unstructured data from the following types of working environments and data sources:

Working environments

Amazon FSx for ONTAP
Azure NetApp Files
Cloud Volumes ONTAP (deployed in AWS, Azure, or GCP)
On-premises ONTAP clusters
StorageGRID

Data sources

NetApp file shares
Databases:
- Amazon Relational Database Service (Amazon RDS)
- MongoDB
- MySQL
- Oracle
- PostgreSQL
- SAP HANA
- SQL Server (MSSQL)

BlueXP classification supports NFS versions 3.x, 4.0, and 4.1, and CIFS versions 1.x, 2.0, 2.1, and 3.0.

Cost

BlueXP classification is now free to use. No Classification license or paid subscription is required.

Infrastructure costs

Installing BlueXP classification in the cloud requires deploying a cloud instance, which results in charges from the cloud provider where it is deployed. See the type of instance that is deployed for each cloud provider. There is no cost if you install BlueXP classification on an on-premises system.
BlueXP classification requires that you have deployed a BlueXP Connector. In many cases you already have a Connector because of other storage and services you are using in BlueXP. The Connector instance results in charges from the cloud provider where it is deployed. See the type of instance that is deployed for each cloud provider. There is no cost if you install the Connector on an on-premises system.

Data transfer costs

Data transfer costs depend on your setup. If the BlueXP classification instance and data source are in the same Availability Zone and region, then there are no data transfer costs. But if the data source, such as a Cloud Volumes ONTAP system, is in a different Availability Zone or region, then you'll be charged by your cloud provider for data transfer costs. See these links for more details:

The BlueXP classification instance

When you deploy BlueXP classification in the cloud, BlueXP deploys the instance in the same subnet as the Connector. Learn more about Connectors.

A diagram that shows a BlueXP instance and a BlueXP classification instance running in your cloud provider.

Note the following about the default instance:

In AWS, BlueXP classification runs on an m6i.4xlarge instance with a 500 GiB GP2 disk. The operating system image is Amazon Linux 2. When deployed in AWS, you can choose a smaller instance size if you are scanning a small amount of data.
In Azure, BlueXP classification runs on a Standard_D16s_v3 VM with a 500 GiB disk. The operating system image is Ubuntu 22.04.
In GCP, BlueXP classification runs on an n2-standard-16 VM with a 500 GiB Standard persistent disk. The operating system image is Ubuntu 22.04.
In regions where the default instance isn't available, BlueXP classification runs on an alternate instance. See the alternate instance types.
The instance is named CloudCompliance with a generated hash (UUID) concatenated to it. For example: CloudCompliance-16bb6564-38ad-4080-9a92-36f5fd2f71c7
Only one BlueXP classification instance is deployed per Connector.

You can also deploy BlueXP classification on a Linux host on your premises or on a host in your preferred cloud provider. The software functions exactly the same way regardless of which installation method you choose. Upgrades of BlueXP classification software is automated as long as the instance has internet access.

The instance should remain running at all times because BlueXP classification continuously scans the data.

Deploy on different instance types

You can deploy BlueXP classification on a system with fewer CPUs and less RAM.

System size	Specs	Limitations
Extra Large	32 CPUs, 128 GB RAM, 1 TiB SSD	Can scan up to 500 million files.
Large (default)	16 CPUs, 64 GB RAM, 500 GiB SSD	Can scan up to 250 million files.

System size

Specs

Limitations

Extra Large

32 CPUs, 128 GB RAM, 1 TiB SSD

Can scan up to 500 million files.

Large (default)

16 CPUs, 64 GB RAM, 500 GiB SSD

Can scan up to 250 million files.

When deploying BlueXP classification in Azure or GCP, email ng-contact-data-sense@netapp.com for assistance if you want to use a smaller instance type.

How BlueXP classification works

At a high-level, BlueXP classification works like this:

You deploy an instance of BlueXP classification in BlueXP.
You enable high-level mapping or deep-level scanning on one or more data sources.
BlueXP classification scans the data using an AI learning process.
You use the provided dashboards and reporting tools to help in your compliance and governance efforts.

How scans work

After you enable BlueXP classification and select the repositories that you want to scan (these are the volumes, database schemas, or other user data), it immediately starts scanning the data to identify personal and sensitive data. You should focus on scanning live production data in most cases instead of backups, mirrors, or DR sites. Then BlueXP classification maps your organizational data, categorizes each file, and identifies and extracts entities and predefined patterns in the data. The result of the scan is an index of personal information, sensitive personal information, data categories, and file types.

BlueXP classification connects to the data like any other client by mounting NFS and CIFS volumes. NFS volumes are automatically accessed as read-only, while you need to provide Active Directory credentials to scan CIFS volumes.

A diagram that shows a BlueXP instance and a BlueXP classification instance running in your cloud provider. The BlueXP classification instance connects to NFS and CIFS volumes and databases to scan them.

After the initial scan, BlueXP classification continuously scans your data in a round-robin fashion to detect incremental changes (this is why it's important to keep the instance running).

You can enable and disable scans at the volume level or at the database schema level.

What's the difference between Mapping and Classification scans

BlueXP classification enables you to run a general "mapping" scan on selected data sources. Mapping provides only a high-level overview of your data, whereas Classification provides deep-level scanning of your data. Mapping can be done on your data sources very quickly because it does not access files to see the data inside.

Many users like this functionality because they want to quickly scan their data to identify the data sources that require more research - and then they can enable classification scans only on those required data sources or volumes.

The table below shows some of the differences:

Feature	Classification	Mapping
Scan speed	Slow	Fast
Pricing	Free	Free
Capacity	Limited to 500 TB	Limited to 500 TB
List of file types and used capacity	Yes	Yes
Number of files and used capacity	Yes	Yes
Age and size of files	Yes	Yes
Ability to run a Data Mapping Report	Yes	Yes
Data Investigation page to view file details	Yes	No
Search for names within files	Yes	No
Create policies that provide custom search results	Yes	No
Ability to run other reports	Yes	No
Ability to see metadata from files*	No	Yes

Feature

Classification

Mapping

Scan speed

Slow

Fast

Pricing

Free

Capacity

Limited to 500 TB

List of file types and used capacity

Yes

Number of files and used capacity

Yes

Age and size of files

Yes

Ability to run a Data Mapping Report

Yes

Data Investigation page to view file details

Yes

Search for names within files

Yes

Create policies that provide custom search results

Yes

Ability to run other reports

Yes

Ability to see metadata from files*

Yes

*The following metadata is extracted from files during mapping scans:

Working environment
Working environment type
Storage repository
File type
Used capacity
Number of files
File size
File creation
File last access
File last modified
File discovered time
Permissions extraction

Governance dashboard differences:

Feature	Map & Classify	Map
Stale data	Yes	Yes
Non-business data	Yes	Yes
Duplicated files	Yes	Yes
Predefined policies	Yes	No
Custom policies	Yes	Yes
DDA report	Yes	Yes
Mapping report	Yes	Yes
Sensitivity level detection	Yes	No
Sensitive data with wide permissions	Yes	No
Open permissions	Yes	Yes
Age of data	Yes	Yes
Size of data	Yes	Yes
Categories	Yes	No
File types	Yes	Yes

Feature

Map & Classify

Map

Stale data

Yes

Non-business data

Yes

Duplicated files

Yes

Predefined policies

Yes

Custom policies

Yes

DDA report

Yes

Mapping report

Yes

Sensitivity level detection

Yes

Sensitive data with wide permissions

Yes

Open permissions

Yes

Age of data

Yes

Size of data

Yes

Feature	Map & Classify	Map
Personal information	Yes	No
Sensitive personal information	Yes	No
Privacy risk assessment report	Yes	No
HIPAA report	Yes	No
PCI DSS report	Yes	No

Feature	Map & Classify	Map
Policies	Yes	Yes
Working environment type	Yes	Yes
Working environment	Yes	Yes
Storage repository	Yes	Yes
File type	Yes	Yes
File size	Yes	Yes
Created time	Yes	Yes
Discovered time	Yes	Yes
Last modified	Yes	Yes
Last access	Yes	Yes
Open permissions	Yes	Yes
File directory path	Yes	Yes
Category	Yes	No
Sensitivity level	Yes	No
Number of identifiers	Yes	No
Personal data	Yes	No
Sensitive personal data	Yes	No
Data subject	Yes	No
Duplicates	Yes	Yes
Classification status	Yes	Status is always "Limited insights"
Scan analysis event	Yes	Yes
File hash	Yes	Yes
Number of users with access	Yes	Yes
User/group permissions	Yes	Yes
File owner	Yes	Yes
Directory type	Yes	Yes

How quickly does BlueXP classification scan data

The scan speed is affected by network latency, disk latency, network bandwidth, environment size, and file distribution sizes.

When performing Mapping scans, BlueXP classification can scan between 100-150 TiBs of data per day.
When performing Classification scans, BlueXP classification can scan between 15-40 TiBs of data per day.

Information that BlueXP classification categorizes

BlueXP classification collects, indexes, and assigns categories to your data (files). The data that BlueXP classification indexes includes the following:

Standard metadata about files: the file type, its size, creation and modification dates, and so on.
Personal data: Personally identifiable information (Pii) such as email addresses, identification numbers, or credit card numbers. Learn more about personal data.
Sensitive personal data: Special types of sensitive personal information (SPii), such as health data, ethnic origin, or political opinions, as defined by GDPR and other privacy regulations. Learn more about sensitive personal data.
Categories: BlueXP classification takes the data that it scanned and divides it into different types of categories. Categories are topics based on AI analysis of the content and metadata of each file. Learn more about categories.
Types: BlueXP classification takes the data that it scanned and breaks it down by file type. Learn more about types.
Name entity recognition: BlueXP classification uses AI to extract people's natural names from documents. Learn about responding to Data Subject Access Requests.

Networking overview

BlueXP deploys the BlueXP classification instance with a security group that enables inbound HTTP connections from the Connector instance.

When using BlueXP in SaaS mode, the connection to BlueXP is served over HTTPS, and the private data sent between your browser and the BlueXP classification instance are secured with end-to-end encryption using TLS 1.2, which means NetApp and third parties can't read it.

Outbound rules are completely open. Internet access is needed to install and upgrade the BlueXP classification software and to send usage metrics.

If you have strict networking requirements, learn about the endpoints that BlueXP classification contacts.

User roles in BlueXP classification

The role each user has been assigned provides different capabilities within BlueXP and within BlueXP classification. For details, refer to the following:

BlueXP IAM roles (when using BlueXP in standard mode)
BlueXP account roles (when using BlueXP in restricted mode or private mode)

Learn about BlueXP classification

Creating your file...

Features

Supported working environments and data sources

Cost

Infrastructure costs

Data transfer costs

The BlueXP classification instance

How BlueXP classification works

How scans work

What's the difference between Mapping and Classification scans

How quickly does BlueXP classification scan data

Information that BlueXP classification categorizes

Networking overview

User roles in BlueXP classification