Frequently asked questions about Cloud Data Sense

Contributors netapp-tonacki netapp-ivanad

This FAQ can help if you’re just looking for a quick answer to a question.

Cloud Data Sense service

The following questions provide a general understanding of Cloud Data Sense.

What is Cloud Data Sense?

Cloud Data Sense is a cloud offering that uses Artificial Intelligence (AI) driven technology to help you understand data context and identify sensitive data across your storage systems. The systems can be working environments that you’ve added to the BlueXP Canvas and many types of data sources that Data Sense can access over your networks. See the full list below.

Cloud Data Sense provides pre-defined parameters (such as sensitive information types and categories) to address new data compliance regulations for data privacy and sensitivity, such as GDPR, CCPA, HIPAA, and more.

How does Cloud Data Sense work?

Cloud Data Sense deploys another layer of artificial intelligence (AI) alongside your BlueXP system and storage systems. It then scans the data on volumes, buckets, databases, and other storage accounts and indexes the data insights that it finds. Data Sense uses both AI and natural language processing, as opposed to alternative solutions that are commonly built around regular expressions and pattern matching. This approach is designed for modern data types and scale, and it gives Data Sense the contextual understanding of data that it needs for accurate discovery and classification.

What are the common use cases for Cloud Data Sense?

  • Identify Personally Identifiable Information (PII).

  • Easily locate and report on specific data in response to data subjects, as required by GDPR, CCPA, HIPAA, and other data privacy regulations.

  • Comply with existing, new, and upcoming data compliance and privacy regulations.

  • Migrate data from legacy systems to the cloud.

  • Comply with data retention policies.

What about the architecture of Cloud Data Sense?

Cloud Data Sense deploys a single server, or cluster, wherever you choose — in the cloud or on premises. The servers connect via standard protocols to the data sources and index the findings in an Elasticsearch cluster, which is also deployed on the same servers. This allows support for multi-cloud, cross-cloud, private cloud, and on-premises environments.

Which cloud providers are supported?

Cloud Data Sense operates as part of BlueXP and supports AWS, Azure, and GCP. This provides your organization with unified privacy visibility across different cloud providers.

Does Cloud Data Sense have a REST API, and does it work with third-party tools?

BlueXP supports REST API capabilities for its services. If BlueXP isn’t the preferred point of management, services such as Cloud Data Sense can also be used via a REST API. Every user action has a REST API that can be integrated with third-party systems.

Is Cloud Data Sense available through the marketplaces?

Yes, BlueXP and Cloud Data Sense are available from the AWS, Azure, and GCP marketplaces.

Cloud Data Sense scanning and analytics

The following questions relate to Cloud Data Sense scanning performance and the analytics available to users.

How often does Cloud Data Sense scan my data?

Data changes frequently, so Cloud Data Sense scans your data continuously with no impact on your data. While the initial scan of your data might take longer, subsequent scans only scan the incremental changes, which reduces system scan times.

Data scans have a negligible impact on your storage systems and on your data. However, if you are concerned with even a very small impact, you can configure Data Sense to perform "slow" scans. See how to reduce the scan speed.
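
The idea of scanning only incremental changes can be sketched generically. This FAQ does not describe how Data Sense detects changes, so comparing last-modified timestamps is an assumption made for illustration, and the function name and sample paths are hypothetical.

```python
from typing import Dict, List


def files_to_rescan(previous: Dict[str, float], current: Dict[str, float]) -> List[str]:
    """Return paths that are new or whose modification time changed.

    Both arguments map a file path to its last-modified timestamp.
    Assumption: a changed timestamp is the only change signal used here.
    """
    changed = []
    for path, mtime in current.items():
        # A path missing from the previous snapshot is new; a differing
        # timestamp means the file was modified since the last scan.
        if previous.get(path) != mtime:
            changed.append(path)
    return sorted(changed)


# Hypothetical snapshots taken by two consecutive scans.
first_scan = {"/vol1/a.txt": 100.0, "/vol1/b.csv": 120.0}
second_scan = {"/vol1/a.txt": 100.0, "/vol1/b.csv": 150.0, "/vol1/c.pdf": 160.0}

print(files_to_rescan(first_scan, second_scan))
```

With an empty previous snapshot every file is returned, which mirrors why an initial scan takes longer than the scans that follow it.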

Can I search my data using Cloud Data Sense?

Cloud Data Sense offers extensive search capabilities that make it easy to search for a specific file or piece of data across all connected sources. Data Sense empowers users to search deeper than just what the metadata reflects. It is a language-agnostic service that can also read the files and analyze a multitude of sensitive data types, such as names and IDs. For example, users can search across both structured and unstructured data stores to find data that may have leaked from databases to user files, in violation of corporate policy. Searches can be saved for later, and policies can be created to search and take action on the results at a set frequency.

Once the files of interest are found, their characteristics can be listed, including tags, working environment account, bucket, file path, category (from classification), file size, last modified time, permission status, duplicates, sensitivity level, personal data, sensitive data types within the file, owner, file type, created time, file hash, whether the file was assigned to someone for their attention, and more. Filters can be applied to screen out characteristics that are not pertinent. Data Sense also has RBAC controls that allow files to be moved or deleted if the right permissions are present. If they are not, the tasks can be assigned to someone in the organization who does have the right permissions.

What kind of analytics does Cloud Data Sense provide?

Data sources can be represented visually, and relationships defined and depicted graphically. For example, admins can see all stale, duplicate, and non-business-related data across data sources throughout the enterprise (on-premises systems, databases, file shares, S3 stores, OneDrive, etc.). They can then copy, move, delete, and manage data to optimize storage costs and reduce risk. Users can reduce risk by seeing what sensitive data might be exposed, and they can create jobs to manage permissions for strong data protection. Data Sense also classifies all the different types of data, so admins can investigate data by type and see what actions have been taken on the data, and when.

Does Cloud Data Sense offer reports?

Yes. The information offered by Cloud Data Sense can be relevant to other stakeholders in your organization, so we enable you to generate reports to share the insights. The following reports are available for Data Sense:

Privacy Risk Assessment report

Provides privacy insights from your data and a privacy risk score. Learn more.

Data Subject Access Request report

Enables you to extract a report of all files that contain information regarding a data subject’s specific name or personal identifier. Learn more.

PCI DSS report

Helps you identify the distribution of credit card information across your files. Learn more.

HIPAA report

Helps you identify the distribution of health information across your files. Learn more.

Data Mapping report

Provides information about the size and number of files in your working environments. This includes usage capacity, age of data, size of data, and file types. Learn more.

Reports on a specific information type

Reports are available that include details about the identified files that contain personal data and sensitive personal data. You can also see files broken down by category and file type. Learn more.

Does scan performance vary?

Scan performance can vary based on the network bandwidth and the average file size in your environment. It can also depend on the size characteristics of the host system (either in the cloud or on-premises). See The Cloud Data Sense instance and Deploying Cloud Data Sense for more information.

When initially adding new data sources, you can also choose to perform only a "mapping" scan instead of a full "classification" scan. Mapping can be done on your data sources very quickly because it does not access files to see the data inside. See the difference between a mapping and classification scan.

Cloud Data Sense management and privacy

The following questions provide information on how to manage Cloud Data Sense and privacy settings.

How do I enable Cloud Data Sense?

First you need to deploy an instance of Cloud Data Sense in BlueXP, or on an on-premises system. Once the instance is running, you can enable the service on existing working environments, databases, and other data sources from the Data Sense tab or by selecting a specific working environment.

Note: Activating Cloud Data Sense on a data source results in an immediate initial scan. Scan results display shortly after.

How do I disable Cloud Data Sense?

You can disable Cloud Data Sense from scanning an individual working environment, database, file share group, OneDrive account, or SharePoint account from the Data Sense Configuration page.

Note: To completely remove Cloud Data Sense, manually delete the Data Sense instance from your cloud provider’s portal or on-premises location.

Can I customize the service to my organization’s needs?

Cloud Data Sense provides out-of-the-box insights to your data. These insights can be extracted and used for your organization’s needs.

Additionally, Data Sense provides many ways for you to add a custom list of "personal data" that Data Sense will identify in scans, giving you the full picture about where potentially sensitive data resides in all your organization’s files.

  • You can add unique identifiers based on specific columns in databases you are scanning — we call this Data Fusion.

  • You can add custom keywords from a text file.

  • You can add custom patterns using a regular expression (regex).
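
Before adding a custom pattern, it can help to try the regular expression against sample text. The sketch below does that with Python's standard `re` module. The identifier format ("EMP-" plus six digits) is hypothetical, and the regex dialect that Data Sense accepts is not stated in this FAQ, so treat compatibility with Python syntax as an assumption.

```python
import re

# Hypothetical identifier format: "EMP-" followed by exactly six digits.
# The word boundaries (\b) keep longer digit runs from matching partially.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

sample = "Contact EMP-004217 or EMP-123456 about ticket EMP-12."

# Lists every substring of the sample that fits the pattern.
print(EMPLOYEE_ID.findall(sample))
```

Checking a pattern against both matching and non-matching samples first reduces the chance of false positives once it is used in scans.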

Can I limit Cloud Data Sense information to specific users?

Yes, Cloud Data Sense is fully integrated with BlueXP. BlueXP users can only see information for the working environments they are eligible to view according to their workspace privileges.

Additionally, if you want to allow certain users to just view Data Sense scan results without having the ability to manage Data Sense settings, you can assign those users the Cloud Compliance Viewer role.

Can anyone access the private data sent between my browser and Data Sense?

No. The private data sent between your browser and the Data Sense instance is secured with end-to-end encryption, which means NetApp and third parties can’t read it. Data Sense won’t share any data or results with NetApp unless you request and approve access.

What happens if data tiering is enabled on your ONTAP volumes?

You might want to enable Cloud Data Sense on ONTAP systems that tier cold data to object storage. If data tiering is enabled, Data Sense scans all of the data: data that’s on disks and cold data tiered to object storage.

The compliance scan doesn’t heat up the cold data. It stays cold and tiered to object storage.

Can Cloud Data Sense send notifications to my organization?

Yes. In conjunction with the Policies feature, you can send email alerts to BlueXP users (daily, weekly, or monthly) when a Policy returns results so you can get notifications to protect your data. Learn more about Policies.

You can also download status reports from the Governance page and Investigation page that you can share internally in your organization.

Can Cloud Data Sense work with the AIP labels I have embedded in my files?

Yes. You can manage AIP labels in the files that Cloud Data Sense is scanning if you have subscribed to Azure Information Protection (AIP). You can view the labels that are already assigned to files, add labels to files, and change existing labels.

Types of source systems and data types

The following questions relate to the types of storage that can be scanned, and the types of data that are scanned.

What sources of data can be scanned with Data Sense?

Cloud Data Sense can scan data from working environments that you’ve added to the BlueXP Canvas and from many types of structured and unstructured data sources that Data Sense can access over your networks.

Working environments:

  • Cloud Volumes ONTAP (deployed in AWS, Azure, or GCP)

  • On-premises ONTAP clusters

  • Azure NetApp Files

  • Amazon FSx for ONTAP

  • Amazon S3

Data sources:

  • Non-NetApp file shares

  • Object storage (that uses S3 protocol)

  • Databases (Amazon RDS, MongoDB, MySQL, Oracle, PostgreSQL, SAP HANA, SQL Server)

  • OneDrive accounts

  • SharePoint Online and On-Premises accounts

  • Google Drive accounts

Data Sense supports NFS versions 3.x, 4.0, and 4.1, and CIFS versions 1.x, 2.0, 2.1, and 3.0.
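
As a quick pre-check, the version list above can be encoded in a small lookup. This is only a sketch: the function name is hypothetical, and reading an entry such as "3.x" as "any minor version of that major release" is an assumption.

```python
SUPPORTED_VERSIONS = {
    "nfs": ("3.x", "4.0", "4.1"),
    "cifs": ("1.x", "2.0", "2.1", "3.0"),
}


def is_supported(protocol: str, version: str) -> bool:
    """Check a share's protocol version against the list in this FAQ.

    Assumption: an entry ending in ".x" matches the bare major version
    and any minor version of it; all other entries must match exactly.
    """
    for entry in SUPPORTED_VERSIONS.get(protocol.lower(), ()):
        major, _, minor = entry.partition(".")
        if minor == "x":
            if version == major or version.startswith(major + "."):
                return True
        elif version == entry:
            return True
    return False


print(is_supported("NFS", "4.1"))
print(is_supported("nfs", "4.2"))
```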

Are there any restrictions when deployed in a Government region?

Cloud Data Sense is supported when the Connector is deployed in a Government region (AWS GovCloud, Azure Gov, or Azure DoD). When deployed in this manner, Data Sense has the following restrictions:

  • OneDrive accounts, SharePoint accounts, and Google Drive accounts can’t be scanned.

  • Microsoft Azure Information Protection (AIP) label functionality can’t be integrated.

What data sources can I scan if I install Data Sense in a site without internet access?

Data Sense can only scan data from data sources that are local to the on-premises site. At this time, Data Sense can scan the following local data sources in a "dark" site:

  • On-premises ONTAP systems

  • Database schemas

  • SharePoint On-Premises accounts (SharePoint Server)

  • Non-NetApp NFS or CIFS file shares

  • Object Storage that uses the Simple Storage Service (S3) protocol

Which file types are supported?

Cloud Data Sense scans all files for category and metadata insights, and displays all file types in the file types section of the dashboard.

When Data Sense detects Personally Identifiable Information (PII), or when it performs a DSAR search, only the following file formats are supported:

.CSV, .DCM, .DICOM, .DOC, .DOCX, .JSON, .PDF, .PPTX, .RTF, .TXT, .XLS, .XLSX, Docs, Sheets, and Slides
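
To estimate which files in a listing would be eligible for PII detection, the extension list above can be turned into a simple filter. The helper name is hypothetical, and the check is purely extension-based, so it cannot recognize Google Docs, Sheets, or Slides, which have no file extension.

```python
from pathlib import PurePosixPath

# Extensions listed in this FAQ for PII detection and DSAR searches.
# Google Docs, Sheets, and Slides are not covered by this check.
PII_EXTENSIONS = {
    ".csv", ".dcm", ".dicom", ".doc", ".docx", ".json",
    ".pdf", ".pptx", ".rtf", ".txt", ".xls", ".xlsx",
}


def supports_pii_scan(filename: str) -> bool:
    """Return True if the file extension is in the listed formats."""
    return PurePosixPath(filename).suffix.lower() in PII_EXTENSIONS


for name in ("report.PDF", "/vol1/data.xlsx", "archive.tar.gz", "notes"):
    print(name, supports_pii_scan(name))
```

The comparison is case-insensitive because the list above is written in upper case while file names commonly use lower case.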

What kinds of data and metadata does Cloud Data Sense capture?

Cloud Data Sense enables you to run a general "mapping" scan or a full "classification" scan on your data sources. Mapping provides only a high-level overview of your data, whereas Classification provides deep-level scanning of your data. Mapping can be done on your data sources very quickly because it does not access files to see the data inside.

  • Data mapping scan.

    Data Sense scans the metadata only. This is useful for overall data management and governance, quick project scoping, very large estates, and prioritization. Data mapping is based on metadata and is considered a fast scan.

    After a fast scan, you can generate a Data Mapping Report. This report is an overview of the data stored in your corporate data sources to assist you with decisions about resource utilization, migration, backup, security, and compliance processes.

  • Data classification (deep) scan.

    Data Sense scans using standard protocols and read-only permission throughout your environments. Select files are opened and scanned for sensitive business-related data, private information, and issues related to ransomware.

    After a full scan there are many additional Data Sense features you can apply to your data, such as view and refine data in the Data Investigation page, search for names within files, copy, move, and delete source files, and more.
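
The difference between the two scan types can be illustrated with a generic sketch: a mapping-style pass reads only file system metadata, while a classification-style pass also opens the file. This is not how Data Sense is implemented. The function names are hypothetical, and the email regular expression is only a stand-in for the AI-based classification described earlier.

```python
import os
import re
import tempfile


def mapping_scan(path: str) -> dict:
    """Collect metadata only; the file is never opened."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "type": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "modified": info.st_mtime,
    }


# Stand-in for a real classifier: a simple pattern for email addresses.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def classification_scan(path: str) -> dict:
    """Gather the same metadata, then open the file read-only and inspect it."""
    result = mapping_scan(path)
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        result["emails_found"] = len(EMAIL.findall(handle.read()))
    return result


# Demonstration on a temporary file that is removed afterwards.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "contacts.txt")
    with open(sample, "w", encoding="utf-8") as handle:
        handle.write("alice@example.com, bob@example.org")
    print(mapping_scan(sample))
    print(classification_scan(sample))
```

Because the mapping pass never reads file contents, its cost depends on the number of files rather than their size, which is consistent with mapping being described as the fast scan.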

Licenses and costs

The following questions relate to licensing and costs to use Cloud Data Sense.

How much does Cloud Data Sense cost?

The cost to use Cloud Data Sense depends on the amount of data that you’re scanning. The first 1 TB of data that Data Sense scans in a BlueXP workspace is free. After reaching that limit, you’ll need one of the following to continue scanning data over 1 TB:

  • A subscription to the BlueXP Marketplace listing from your cloud provider, or

  • A Bring-your-own-license (BYOL) from NetApp

See pricing for details.
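
The free allowance described above amounts to simple arithmetic, sketched here. The function name is hypothetical, and the sketch does not compute a price, because pricing is not given in this FAQ.

```python
FREE_TB_PER_WORKSPACE = 1.0  # the first 1 TB scanned in a workspace is free


def capacity_needing_license(scanned_tb: float) -> float:
    """Return how many TB exceed the free allowance (never negative)."""
    return max(0.0, scanned_tb - FREE_TB_PER_WORKSPACE)


for scanned in (0.4, 1.0, 3.5):
    print(scanned, "TB scanned ->", capacity_needing_license(scanned), "TB to license")
```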

What happens if I have reached the BYOL capacity limit?

If you reach a BYOL capacity limit, Data Sense continues to run, but access to the Dashboards is blocked so that you can’t view information about any of your scanned data. Only the Configuration page is available in case you want to reduce the number of volumes being scanned to potentially bring your capacity usage under the license limit. You must renew your BYOL license to regain full access to Data Sense.

Connector deployment

The following questions relate to the BlueXP Connector.

What is the Connector?

The Connector is software running on a compute instance either within your cloud account, or on-premises, that enables BlueXP to securely manage cloud resources. You must deploy a Connector to use Cloud Data Sense.

Where does the Connector need to be installed?

  • When scanning data in Cloud Volumes ONTAP in AWS, Amazon FSx for ONTAP, or in AWS S3 buckets, you use a Connector in AWS.

  • When scanning data in Cloud Volumes ONTAP in Azure or in Azure NetApp Files, you use a Connector in Azure.

  • When scanning data in Cloud Volumes ONTAP in GCP, you use a Connector in GCP.

  • When scanning data in on-premises ONTAP systems, non-NetApp file shares, generic S3 Object storage, databases, OneDrive folders, SharePoint accounts, and Google Drive accounts, you can use a Connector in any of these cloud locations.

So if you have data in many of these locations, you may need to use multiple Connectors.
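
The placement rules above can be captured in a small planning helper that reports in which clouds a Connector is mandatory for a given mix of sources. The short source keys and the function name are illustrative labels only, not product identifiers.

```python
from typing import Iterable, List

ANY_CLOUD = frozenset({"AWS", "Azure", "GCP"})

# Allowed Connector locations per source type, taken from the list above.
CONNECTOR_LOCATIONS = {
    "cvo-aws": frozenset({"AWS"}),
    "fsx-ontap": frozenset({"AWS"}),
    "aws-s3": frozenset({"AWS"}),
    "cvo-azure": frozenset({"Azure"}),
    "azure-netapp-files": frozenset({"Azure"}),
    "cvo-gcp": frozenset({"GCP"}),
    "onprem-ontap": ANY_CLOUD,
    "file-shares": ANY_CLOUD,
    "generic-s3": ANY_CLOUD,
    "databases": ANY_CLOUD,
    "onedrive": ANY_CLOUD,
    "sharepoint": ANY_CLOUD,
    "google-drive": ANY_CLOUD,
}


def required_clouds(sources: Iterable[str]) -> List[str]:
    """Return the clouds in which a Connector is mandatory.

    Sources that accept a Connector in any cloud add no requirement of
    their own; they can reuse whichever Connector already exists.
    """
    required = set()
    for source in sources:
        allowed = CONNECTOR_LOCATIONS[source]
        if len(allowed) == 1:
            required |= allowed
    return sorted(required)


print(required_clouds(["cvo-aws", "azure-netapp-files", "databases"]))
```

An empty result means that every listed source is flexible, so a single Connector in any one of the locations is enough.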

Can I deploy the Connector on my own host?

Yes. You can deploy the Connector on-premises on a Linux host in your network or in the cloud. If you’re planning to deploy Data Sense on-premises, then you may want to install the Connector on-premises as well; but it’s not required.

What about secure sites without internet access?

Yes, that’s also supported. You can deploy the Connector on an on-premises Linux host that doesn’t have internet access. Then you can discover on-premises ONTAP clusters and other local data sources and scan the data using Data Sense.

Data Sense deployment

The following questions relate to the separate Data Sense instance.

What deployment models does Cloud Data Sense support?

BlueXP allows the user to scan and report on systems virtually anywhere, including on-premises, cloud, and hybrid environments. Cloud Data Sense is normally deployed using a SaaS model, in which the service is enabled via the BlueXP interface and requires no hardware or software installation. Even in this click-and-run deployment mode, data management can be done regardless of whether the data stores are on premises or in the public cloud.

What type of instance or VM is required for Cloud Data Sense?

  • In AWS, Cloud Data Sense runs on an m5.4xlarge instance with a 500 GB GP2 disk.

  • In Azure, Cloud Data Sense runs on a Standard_D16s_v3 VM with a 512 GB disk.

  • In GCP, Cloud Data Sense runs on an n2-standard-16 VM with a 512 GB Standard persistent disk.

Note that you can deploy Data Sense on a system with fewer CPUs and less RAM, but there are limitations when using these systems. See Using a smaller instance type for details.

Can I deploy Data Sense on my own host?

Yes. You can install Data Sense software on a Linux host that has internet access in your network or in the cloud. Everything works the same and you continue to manage your scan configuration and results through BlueXP. See Deploying Cloud Data Sense on premises for system requirements and installation details.

What about secure sites without internet access?

Yes, that’s also supported. For completely secure sites, you can deploy Data Sense in an on-premises site that doesn’t have internet access.