Frequently asked questions about BlueXP classification
This FAQ can help if you're just looking for a quick answer to a question.
BlueXP classification service
The following questions provide a general understanding of BlueXP classification.
What is BlueXP classification?
BlueXP classification is a cloud offering that uses Artificial Intelligence (AI) driven technology to help you understand data context and identify sensitive data across your storage systems. The systems can be working environments that you've added to the BlueXP Canvas and many types of data sources that BlueXP classification can access over your networks. See the full list below.
BlueXP classification provides pre-defined parameters (such as sensitive information types and categories) to address new data compliance regulations for data privacy and sensitivity, such as GDPR, CCPA, HIPAA, and more.
How does BlueXP classification work?
BlueXP classification deploys another layer of Artificial Intelligence alongside your BlueXP system and storage systems. It then scans the data on volumes, buckets, databases, and other storage accounts and indexes the data insights that are found. BlueXP classification leverages both artificial intelligence and natural language processing, as opposed to alternative solutions that are commonly built around regular expressions and pattern matching.
BlueXP classification uses AI to understand data in context, which enables accurate discovery and classification. It is driven by AI because it is designed for modern data types and scale.
What about the architecture of BlueXP classification?
BlueXP classification deploys a single server, or cluster, wherever you choose — in the cloud or on premises. The servers connect via standard protocols to the data sources and index the findings in an Elasticsearch cluster, which is also deployed on the same servers. This allows support for multi-cloud, cross-cloud, private cloud, and on-premises environments.
Which cloud providers are supported?
BlueXP classification operates as part of BlueXP and supports AWS, Azure, and GCP. This provides your organization with unified privacy visibility across different cloud providers.
Does BlueXP classification have a REST API, and does it work with third-party tools?
No, BlueXP classification does not have a REST API.
Is BlueXP classification available through the marketplaces?
Yes, BlueXP and BlueXP classification are available from the AWS, Azure, and GCP marketplaces.
BlueXP classification scanning and analytics
The following questions relate to BlueXP classification scanning performance and the analytics available to users.
How often does BlueXP classification scan my data?
While the initial scan of your data might take some time, subsequent scans inspect only the incremental changes, which reduces scan times. BlueXP classification scans your data continuously in a round-robin fashion, six repositories at a time, so that changed data is classified quickly.
Note that BlueXP classification scans databases only once per day. Unlike other data sources, databases are not scanned continuously.
Data scans have a negligible impact on your storage systems and on your data. However, if you are concerned with even a very small impact, you can configure BlueXP classification to perform "slow" scans. See how to reduce the scan speed.
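The round-robin behavior described above can be pictured with a short sketch. This is only an illustration of round-robin batching in general, not the product's scheduler: the function name, the repository names, and the wrap-around behavior are assumptions, and only the batch size of six comes from the answer above.

```python
from itertools import islice
from typing import Iterator, List, Sequence

BATCH_SIZE = 6  # "six repositories at a time", per the answer above


def round_robin_batches(
    repos: Sequence[str], batch_size: int = BATCH_SIZE
) -> Iterator[List[str]]:
    """Yield batches of repositories indefinitely, wrapping around at the end."""
    if not repos:
        return  # nothing to scan, so the generator ends immediately
    index = 0
    size = min(batch_size, len(repos))
    while True:
        # Take the next `size` repositories, wrapping past the end of the list.
        yield [repos[(index + i) % len(repos)] for i in range(size)]
        index = (index + size) % len(repos)


# Eight hypothetical repositories: the second batch wraps back to the start.
repos = [f"repo-{n}" for n in range(1, 9)]
first_two = list(islice(round_robin_batches(repos), 2))
```

With eight repositories, the second batch picks up the two that were not in the first batch and then wraps around, so no repository waits more than one extra cycle.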
Can I search my data using BlueXP classification?
BlueXP classification offers extensive search capabilities that make it easy to search for a specific file or piece of data across all connected sources. BlueXP classification empowers users to search deeper than just what the metadata reflects. It is a language-agnostic service that can also read the files and analyze a multitude of sensitive data types, such as names and IDs. For example, users can search across both structured and unstructured data stores to find data that may have leaked from databases to user files, in violation of corporate policy. Searches can be saved for later, and policies can be created to search and take action on the results at a set frequency.
Once the files of interest are found, their characteristics can be listed, including tags, working environment account, bucket, file path, category (from classification), file size, last modified, permission status, duplicates, sensitivity level, personal data, sensitive data types within the file, owner, file type, created time, file hash, whether the file was assigned to someone for their attention, and more. Filters can be applied to screen out characteristics that are not pertinent. BlueXP classification also has RBAC controls that allow files to be moved or deleted if the right permissions are present. If they are not, the tasks can be assigned to someone in the organization who does have the right permissions.
Does BlueXP classification offer reports?
Yes. The information offered by BlueXP classification can be relevant to other stakeholders in your organization, so you can generate reports to share the insights. The following reports are available for BlueXP classification:
- Privacy Risk Assessment report: Provides privacy insights from your data and a privacy risk score.
- Data Subject Access Request report: Enables you to extract a report of all files that contain information regarding a data subject's specific name or personal identifier.
- PCI DSS report: Helps you identify the distribution of credit card information across your files.
- HIPAA report: Helps you identify the distribution of health information across your files.
- Data Mapping report: Provides information about the size and number of files in your working environments. This includes usage capacity, age of data, size of data, and file types.
- Data Discovery Assessment report: Provides a high-level analysis of the scanned environment to highlight the system's findings and to show areas of concern and potential remediation steps.
- Reports on a specific information type: Reports are available that include details about the identified files that contain personal data and sensitive personal data. You can also see files broken down by category and file type.
Does scan performance vary?
Scan performance can vary based on the network bandwidth and the average file size in your environment. It can also depend on the size characteristics of the host system (either in the cloud or on-premises). See The BlueXP classification instance and Deploying BlueXP classification for more information.
When you first add new data sources, you can also choose to perform only a "mapping" scan instead of a full "classification" scan. Mapping can be done on your data sources very quickly because it does not access files to see the data inside. See the difference between a mapping and classification scan.
BlueXP classification management and privacy
The following questions provide information on how to manage BlueXP classification and privacy settings.
How do I enable BlueXP classification?
First you need to deploy an instance of BlueXP classification in BlueXP, or on an on-premises system. Once the instance is running, you can enable the service on existing working environments, databases, and other data sources from the Configuration tab or by selecting a specific working environment.
Activating BlueXP classification on a data source results in an immediate initial scan. Scan results display shortly after.
How do I disable BlueXP classification?
You can disable BlueXP classification from scanning an individual working environment, database, or file share group from the BlueXP classification Configuration page.
To completely remove the BlueXP classification instance, you can manually remove the BlueXP classification instance from your cloud provider's portal or on-prem location.
Can I customize the service to my organization's needs?
BlueXP classification provides insights to your data. These insights can be extracted and used for your organization's needs.
Additionally, BlueXP classification provides several ways for you to add a custom list of "personal data" that BlueXP classification will identify in scans, giving you the full picture of where potentially sensitive data resides in all of your organization's files.
- You can add unique identifiers based on specific columns in databases you are scanning. This is called Data Fusion.
- You can add custom keywords from a text file.
- You can add custom patterns using a regular expression (regex).
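As an illustration of the last option, the sketch below shows what a custom pattern might look like and how it could be checked against sample text before use. The identifier format (`EMP-` followed by six digits) is hypothetical, and the example uses Python's `re` module only to test the expression locally. The regex dialect that BlueXP classification accepts is not described here, so treat the syntax as an assumption.

```python
import re
from typing import List

# Hypothetical identifier format: "EMP-" followed by exactly six digits.
# The word boundaries (\b) prevent matches inside longer tokens.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")


def find_employee_ids(text: str) -> List[str]:
    """Return every substring of `text` that matches the custom pattern."""
    return EMPLOYEE_ID.findall(text)


sample = "Contact EMP-004217 or EMP-120934; ignore EMP-12 and XEMP-0000001."
matches = find_employee_ids(sample)
```

Testing a pattern against both matching and non-matching samples, as above, helps avoid false positives once the pattern is applied to scans.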
Can I instruct the service to exclude scanning data in certain directories?
Yes. If you want BlueXP classification to exclude scanning data that resides in certain data source directories, you can provide that list to the classification engine. After you apply that change, BlueXP classification will exclude scanning data in the specified directories.
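The effect of a directory exclusion list can be sketched as a simple path filter. This is an assumption about the general idea, not the classification engine's implementation: the function name and the example paths are hypothetical, and the sketch assumes POSIX-style paths.

```python
from pathlib import PurePosixPath
from typing import Iterable


def is_excluded(path: str, excluded_dirs: Iterable[str]) -> bool:
    """Return True if `path` is an excluded directory or lies inside one."""
    candidate = PurePosixPath(path)
    for directory in excluded_dirs:
        excluded = PurePosixPath(directory)
        # Compare whole path components, so "/vol1/tmp" does not
        # accidentally exclude "/vol1/tmpfiles".
        if candidate == excluded or excluded in candidate.parents:
            return True
    return False


excluded_dirs = ["/vol1/tmp", "/vol1/archive/2019"]
```

Matching on path components rather than string prefixes is the design choice that matters here, because prefix matching would skip unrelated directories that happen to share a name prefix.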
Are snapshots that reside on ONTAP volumes scanned?
No. BlueXP classification does not scan snapshots because the content is identical to the content in the volume.
What happens if data tiering is enabled on your ONTAP volumes?
When BlueXP classification scans volumes that have cold data tiered to object storage, it scans all of the data—data that's on local disks and cold data tiered to object storage. This is also true for non-NetApp products that implement tiering.
The scan doesn't heat up the cold data—it stays cold and remains in object storage.
Types of source systems and data types
The following questions relate to the types of storage that can be scanned, and the types of data that are scanned.
What sources of data can be scanned with BlueXP classification?
BlueXP classification can scan data from working environments that you've added to the BlueXP Canvas and from many types of structured and unstructured data sources that BlueXP classification can access over your networks.
Are there any restrictions when deployed in a Government region?
BlueXP classification is supported when the Connector is deployed in a Government region (AWS GovCloud, Azure Gov, or Azure DoD), also known as "Restricted mode". When deployed in this manner, BlueXP classification has the following restrictions:
NOTE This information is relevant only for BlueXP classification legacy versions 1.30 and earlier.
- OneDrive accounts, SharePoint accounts, and Google Drive accounts can't be scanned.
- Microsoft Azure Information Protection (AIP) label functionality can't be integrated.
What data sources can I scan if I install BlueXP classification in a site without internet access?
BlueXP classification can only scan data from data sources that are local to the on-premises site. At this time, BlueXP classification can scan the following local data sources in "Private mode", also known as a "dark" site:
- On-premises ONTAP systems
- Database schemas
- Object Storage that uses the Simple Storage Service (S3) protocol
Which file types are supported?
BlueXP classification scans all files for category and metadata insights, and displays all file types in the file types section of the dashboard.
When BlueXP classification detects Personally Identifiable Information (PII), or when it performs a DSAR search, only the following file formats are supported:
.CSV, .DCM, .DICOM, .DOC, .DOCX, .JSON, .PDF, .PPTX, .RTF, .TXT, .XLS, .XLSX, Docs, Sheets, and Slides
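To estimate which files in a directory listing fall within this list, a small helper such as the following could be used. It is a sketch, not part of the product: the function name is hypothetical, and it covers only the extension-based formats above. Docs, Sheets, and Slides are not identified by a file extension, so they are left out.

```python
from pathlib import PurePath

# Extension-based formats taken from the list above (lowercased).
PII_SUPPORTED_EXTENSIONS = frozenset({
    ".csv", ".dcm", ".dicom", ".doc", ".docx", ".json",
    ".pdf", ".pptx", ".rtf", ".txt", ".xls", ".xlsx",
})


def supports_pii_detection(filename: str) -> bool:
    """Return True if the file's final extension is in the supported list."""
    return PurePath(filename).suffix.lower() in PII_SUPPORTED_EXTENSIONS
```

For example, `supports_pii_detection("Report.PDF")` returns `True` because the comparison is case-insensitive, while `supports_pii_detection("archive.zip")` returns `False`.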
What kinds of data and metadata does BlueXP classification capture?
BlueXP classification enables you to run a general "mapping" scan or a full "classification" scan on your data sources. Mapping provides only a high-level overview of your data, whereas Classification provides deep-level scanning of your data. Mapping can be done on your data sources very quickly because it does not access files to see the data inside.
- Data mapping scan: BlueXP classification scans the metadata only. This is useful for overall data management and governance, quick project scoping, very large estates, and prioritization. Data mapping is based on metadata and is considered a fast scan.
  After a fast scan, you can generate a Data Mapping Report. This report is an overview of the data stored in your corporate data sources to assist you with decisions about resource utilization, migration, backup, security, and compliance processes.
- Data classification (deep) scan: BlueXP classification scans using standard protocols and read-only permission throughout your environments. Select files are opened and scanned for sensitive business-related data, private information, and issues related to ransomware.
  After a full scan, there are many additional BlueXP classification features you can apply to your data, such as viewing and refining data in the Data Investigation page, searching for names within files, and copying, moving, and deleting source files.
BlueXP classification captures metadata such as: file name, permissions, creation time, last access, and last modification. This includes all of the metadata that appears in the Data Investigation Details page and in Data Investigation Reports.
BlueXP classification can identify many types of private data such as personal information (PII) and sensitive personal information (SPII). For details about private data, refer to Categories of private data that BlueXP classification scans.
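The reason a mapping scan is fast can be seen in a small sketch that collects file metadata without ever opening the file. This is a generic illustration built on Python's `os.stat`, not the product's scanner. The `FileMetadata` type and `map_file` function are hypothetical names, and creation time is omitted because its availability differs between operating systems.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class FileMetadata:
    name: str
    size_bytes: int
    permissions: str      # for example "-rw-r--r--"
    last_access: float    # seconds since the epoch
    last_modified: float  # seconds since the epoch


def map_file(path: str) -> FileMetadata:
    """Collect metadata for one file; the file contents are never read."""
    info = os.stat(path)
    return FileMetadata(
        name=os.path.basename(path),
        size_bytes=info.st_size,
        permissions=stat.filemode(info.st_mode),
        last_access=info.st_atime,
        last_modified=info.st_mtime,
    )
```

A classification scan, by contrast, has to open and parse file contents, which is why it takes longer than a metadata-only pass.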
Can I limit BlueXP classification information to specific users?
Yes, BlueXP classification is fully integrated with BlueXP. BlueXP users can only see information for the working environments they are eligible to view according to their permissions.
Additionally, if you want to allow certain users to just view BlueXP classification scan results without having the ability to manage BlueXP classification settings, you can assign those users the Classification viewer role (when using BlueXP in standard mode) or the Compliance Viewer role (when using BlueXP in restricted mode).
Can anyone access the private data sent between my browser and BlueXP classification?
No. The private data sent between your browser and the BlueXP classification instance are secured with end-to-end encryption using TLS 1.2, which means NetApp and non-NetApp parties can't read it. BlueXP classification won't share any data or results with NetApp unless you request and approve access.
The data that is scanned stays within your environment.
How is sensitive data handled?
NetApp does not have access to sensitive data and does not display it in the UI. Sensitive data is masked; for example, only the last four digits of a credit card number are displayed.
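The masking described above can be illustrated with a minimal sketch. How BlueXP classification implements masking is not documented here, so the function name, the mask character, and the choice to strip separators are all assumptions.

```python
def mask_card_number(card_number: str, visible: int = 4, mask_char: str = "*") -> str:
    """Mask all but the last `visible` digits of a card number.

    Non-digit characters such as spaces and dashes are dropped first.
    """
    digits = [ch for ch in card_number if ch.isdigit()]
    # Slice from an explicit start index so that visible=0 keeps nothing.
    keep = digits[max(len(digits) - visible, 0):]
    return mask_char * (len(digits) - len(keep)) + "".join(keep)


masked = mask_card_number("4111 1111 1111 1111")
```

Here `masked` is twelve mask characters followed by `1111`, so a reviewer can recognize the card without seeing the full number.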
Where is the data stored?
Scan results are stored in Elasticsearch within your BlueXP classification instance.
How is the data accessed?
BlueXP classification accesses data stored in Elasticsearch through API calls, which require authentication and are encrypted using AES-128. Accessing Elasticsearch directly requires root access.
Licenses and costs
The following question relates to licensing and costs to use BlueXP classification.
How much does BlueXP classification cost?
BlueXP classification is a BlueXP core capability and is not charged.
Connector deployment
The following questions relate to the BlueXP Connector.
What is the Connector?
The Connector is software running on a compute instance either within your cloud account, or on-premises, that enables BlueXP to securely manage cloud resources. You must deploy a Connector to use BlueXP classification.
Where does the Connector need to be installed?
- When scanning data in Cloud Volumes ONTAP in AWS or Amazon FSx for ONTAP, you use a Connector in AWS.
- When scanning data in Cloud Volumes ONTAP in Azure or in Azure NetApp Files, you use a Connector in Azure.
- When scanning data in Cloud Volumes ONTAP in GCP, you use a Connector in GCP.
- When scanning data in on-premises ONTAP systems, NetApp file shares, or databases, you can use a Connector in any of these cloud locations.
So if you have data in many of these locations, you may need to use multiple Connectors.
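The placement rules above can be written down as a lookup table to help plan how many Connectors a deployment needs. This is a planning sketch only: the dictionary keys paraphrase the list above, and the function name is hypothetical. It models only the cloud locations named in this answer.

```python
from typing import Dict, FrozenSet, Iterable, Set

ANY_CLOUD: FrozenSet[str] = frozenset({"AWS", "Azure", "GCP"})

# Where a Connector may run for each kind of data source, per the list above.
CONNECTOR_LOCATIONS: Dict[str, FrozenSet[str]] = {
    "Cloud Volumes ONTAP in AWS": frozenset({"AWS"}),
    "Amazon FSx for ONTAP": frozenset({"AWS"}),
    "Cloud Volumes ONTAP in Azure": frozenset({"Azure"}),
    "Azure NetApp Files": frozenset({"Azure"}),
    "Cloud Volumes ONTAP in GCP": frozenset({"GCP"}),
    "On-premises ONTAP": ANY_CLOUD,
    "NetApp file shares": ANY_CLOUD,
    "Databases": ANY_CLOUD,
}


def required_clouds(sources: Iterable[str]) -> Set[str]:
    """Return the clouds that must host a Connector for the given sources.

    Sources that accept a Connector in any cloud add no requirement of
    their own; they can reuse whichever Connector already exists.
    """
    required: Set[str] = set()
    for source in sources:
        allowed = CONNECTOR_LOCATIONS[source]
        if len(allowed) == 1:
            required |= allowed
    return required
```

For example, scanning Cloud Volumes ONTAP in AWS, Azure NetApp Files, and databases yields `{"AWS", "Azure"}`: two Connectors, with the databases served by either one.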
Does BlueXP classification require access to credentials?
BlueXP classification itself doesn't retrieve storage credentials. Instead, they are stored within the BlueXP Connector.
BlueXP classification uses data plane credentials (for example, CIFS credentials) to mount shares before scanning.
Can I deploy the Connector on my own host?
Yes. You can deploy the Connector on-premises on a Linux host in your network or on a host in the cloud. If you're planning to deploy BlueXP classification on-premises, you may want to install the Connector on-premises as well, but that's not required.
Does communication between the service and the Connector use HTTP?
Yes, BlueXP classification communicates with the BlueXP Connector using HTTP.
What about secure sites without internet access?
Yes, that's also supported. You can deploy the Connector on an on-premises Linux host that doesn't have internet access. This is also known as "Private mode". Then you can discover on-premises ONTAP clusters and other local data sources and scan the data using BlueXP classification.
BlueXP classification deployment
The following questions relate to the separate BlueXP classification instance.
What deployment models does BlueXP classification support?
BlueXP allows the user to scan and report on systems virtually anywhere, including on-premises, cloud, and hybrid environments. BlueXP classification is normally deployed using a SaaS model, in which the service is enabled via the BlueXP interface and requires no hardware or software installation. Even in this click-and-run deployment mode, data management can be done regardless of whether the data stores are on premises or in the public cloud.
What type of instance or VM is required for BlueXP classification?
When deployed in the cloud:
- In AWS, BlueXP classification runs on an m6i.4xlarge instance with a 500 GiB GP2 disk. You can select a smaller instance type during deployment.
- In Azure, BlueXP classification runs on a Standard_D16s_v3 VM with a 500 GiB disk.
- In GCP, BlueXP classification runs on an n2-standard-16 VM with a 500 GiB Standard persistent disk.
Can I deploy BlueXP classification on my own host?
Yes. You can install BlueXP classification software on a Linux host that has internet access in your network or in the cloud. Everything works the same and you continue to manage your scan configuration and results through BlueXP. See Deploying BlueXP classification on premises for system requirements and installation details.
What about secure sites without internet access?
Yes, that's also supported. For completely secure sites, you can deploy BlueXP classification in an on-premises site that doesn't have internet access.