Scan data sources with NetApp Data Classification
NetApp Data Classification scans the data in the repositories (the volumes, database schemas, or other user data) that you select to identify personal and sensitive data. Data Classification then maps your organizational data, categorizes each file, and identifies predefined patterns in the data. The result of the scan is an index of personal information, sensitive personal information, data categories, and file types.
After the initial scan, Data Classification continuously scans your data in a round-robin fashion to detect incremental changes. This is why it's important to keep the instance running.
You can enable and disable scans at the volume level or at the database schema level.
What's the difference between Mapping and Classification scans
You can conduct two types of scans in Data Classification:
- Mapping-only scans provide only a high-level overview of your data and are performed on selected data sources. Mapping-only scans take less time than Map & Classify scans because they do not access files to see the data inside. You might want to run a Mapping-only scan initially to identify areas of research and then perform a Map & Classify scan on those areas.
- Map & Classify scans provide deep-level scanning of your data.
The table below shows some of the differences:
Feature | Map & Classify scans | Mapping-only scans |
---|---|---|
Scan speed | Slow | Fast |
Pricing | Free | Free |
Capacity | Limited to 500 TiB* | Limited to 500 TiB* |
List of file types and used capacity | Yes | Yes |
Number of files and used capacity | Yes | Yes |
Age and size of files | Yes | Yes |
Ability to run a Data Mapping Report | Yes | Yes |
Data Investigation page to view file details | Yes | No |
Search for names within files | Yes | No |
Create saved queries that provide custom search results | Yes | No |
Ability to run other reports | Yes | No |
Ability to see metadata from files** | No | Yes |
* Data Classification does not impose a limit on the amount of data it can scan. Each Console agent supports scanning and displaying 500 TiB of data. To scan more than 500 TiB of data, install another Console agent, and then deploy another Data Classification instance.
The Console UI displays data from a single Console agent. For tips on viewing data from multiple Console agents, see Work with multiple Console agents.
** The following metadata is extracted from files during mapping scans:
- System
- System type
- Storage repository
- File type
- Used capacity
- Number of files
- File size
- File creation
- File last access
- File last modified
- File discovered time
- Permissions extraction
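The 500 TiB per-agent capacity described in the first footnote lends itself to a quick planning calculation. The following is a minimal sketch, assuming each Console agent is paired with one Data Classification instance as that footnote describes; the function name `console_agents_needed` is hypothetical and is not part of any NetApp tool or API.

```python
import math

# Per-agent scanning and display capacity stated in this section (TiB).
TIB_PER_CONSOLE_AGENT = 500


def console_agents_needed(total_tib):
    """Return how many Console agents (each with its own Data
    Classification instance) are needed to cover total_tib of data.

    Hypothetical planning helper; not a NetApp API.
    """
    if total_tib < 0:
        raise ValueError("total_tib must be non-negative")
    # At least one agent is always required, even for small data sets.
    return max(1, math.ceil(total_tib / TIB_PER_CONSOLE_AGENT))


print(console_agents_needed(1200))  # 3
```

Because the Console UI displays data from one agent at a time, a result greater than 1 also means the scan results are split across multiple views.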
Governance dashboard differences:
Feature | Map & Classify | Map |
---|---|---|
Stale data | Yes | Yes |
Non-business data | Yes | Yes |
Duplicated files | Yes | Yes |
Predefined saved queries | Yes | No |
Default saved queries | Yes | Yes |
DDA report | Yes | Yes |
Mapping report | Yes | Yes |
Sensitivity level detection | Yes | No |
Sensitive data with wide permissions | Yes | No |
Open permissions | Yes | Yes |
Age of data | Yes | Yes |
Size of data | Yes | Yes |
Categories | Yes | No |
File types | Yes | Yes |
Compliance dashboard differences:
Feature | Map & Classify | Map |
---|---|---|
Personal information | Yes | No |
Sensitive personal information | Yes | No |
Privacy risk assessment report | Yes | No |
HIPAA report | Yes | No |
PCI DSS report | Yes | No |
Investigation filters differences:
Feature | Map & Classify | Map |
---|---|---|
Saved queries | Yes | Yes |
System type | Yes | Yes |
System | Yes | Yes |
Storage repository | Yes | Yes |
File type | Yes | Yes |
File size | Yes | Yes |
Created time | Yes | Yes |
Discovered time | Yes | Yes |
Last modified | Yes | Yes |
Last access | Yes | Yes |
Open permissions | Yes | Yes |
File directory path | Yes | Yes |
Category | Yes | No |
Sensitivity level | Yes | No |
Number of identifiers | Yes | No |
Personal data | Yes | No |
Sensitive personal data | Yes | No |
Data subject | Yes | No |
Duplicates | Yes | Yes |
Classification status | Yes | Status is always "Limited insights" |
Scan analysis event | Yes | Yes |
File hash | Yes | Yes |
Number of users with access | Yes | Yes |
User/group permissions | Yes | Yes |
File owner | Yes | Yes |
Directory type | Yes | Yes |
How quickly does Data Classification scan data
The scan speed is affected by network latency, disk latency, network bandwidth, environment size, and the distribution of file sizes.
- When performing Mapping-only scans, Data Classification can scan between 100 and 150 TiB of data per day.
- When performing Map & Classify scans, Data Classification can scan between 15 and 40 TiB of data per day.
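The daily throughput ranges above can be turned into a rough estimate of how long an initial scan might take. The sketch below is illustrative arithmetic only: the function name `estimate_scan_days` and the scan-type keys are hypothetical, and actual durations depend on the latency, bandwidth, and file-size factors listed earlier.

```python
# Daily throughput ranges (TiB per day) quoted in this section.
THROUGHPUT_TIB_PER_DAY = {
    "mapping_only": (100, 150),
    "map_and_classify": (15, 40),
}


def estimate_scan_days(data_tib, scan_type):
    """Return (best_case_days, worst_case_days) for scanning data_tib.

    Hypothetical planning helper; not a NetApp API.
    """
    if data_tib < 0:
        raise ValueError("data_tib must be non-negative")
    slowest, fastest = THROUGHPUT_TIB_PER_DAY[scan_type]
    # Best case uses the fastest rate, worst case the slowest rate.
    return data_tib / fastest, data_tib / slowest


best, worst = estimate_scan_days(300, "mapping_only")
print(f"Mapping-only: {best:.1f} to {worst:.1f} days")    # 2.0 to 3.0 days
best, worst = estimate_scan_days(300, "map_and_classify")
print(f"Map & Classify: {best:.1f} to {worst:.1f} days")  # 7.5 to 20.0 days
```

For 300 TiB, the estimate is 2 to 3 days for a Mapping-only scan versus 7.5 to 20 days for a Map & Classify scan, which illustrates why running a Mapping-only scan first can be a practical way to narrow the scope.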