NetApp Data Classification

Scan data sources with NetApp Data Classification

Contributors netapp-ahibbard

NetApp Data Classification scans the data in the repositories (the volumes, database schemas, or other user data) that you select to identify personal and sensitive data. Data Classification then maps your organizational data, categorizes each file, and identifies predefined patterns in the data. The result of the scan is an index of personal information, sensitive personal information, data categories, and file types.

After the initial scan, Data Classification continuously scans your data in a round-robin fashion to detect incremental changes. This is why it's important to keep the instance running.

You can enable and disable scans at the volume level or at the database schema level.

What's the difference between Mapping and Classification scans

You can conduct two types of scans in Data Classification:

  • Mapping-only scans provide only a high-level overview of your data and are performed on selected data sources. Mapping-only scans take less time than Map & Classify scans because they do not access files to see the data inside. You might want to run a Mapping-only scan first to identify areas that need further research, and then perform a Map & Classify scan on those areas.

  • Map & Classify scans provide deep-level scanning of your data.

The table below shows some of the differences:

| Feature | Map & classify scans | Mapping-only scans |
|---|---|---|
| Scan speed | Slow | Fast |
| Pricing | Free | Free |
| Capacity | Limited to 500 TiB* | Limited to 500 TiB* |
| List of file types and used capacity | Yes | Yes |
| Number of files and used capacity | Yes | Yes |
| Age and size of files | Yes | Yes |
| Ability to run a Data Mapping Report | Yes | Yes |
| Data Investigation page to view file details | Yes | No |
| Search for names within files | Yes | No |
| Create saved queries that provide custom search results | Yes | No |
| Ability to run other reports | Yes | No |
| Ability to see metadata from files** | No | Yes |

* Data Classification does not impose a limit on the amount of data it can scan. Each Console agent supports scanning and displaying 500 TiB of data. To scan more than 500 TiB of data, install another Console agent, then deploy another Data Classification instance.
The Console UI displays data from a single Console agent. For tips on viewing data from multiple Console agents, see Work with multiple Console agents.
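
As a rough illustration of the capacity footnote, the following sketch estimates how many Console agents (each paired with its own Data Classification instance) a given amount of data would call for. The function name `agents_needed` and the constant name are hypothetical and not part of any NetApp tool; only the 500 TiB per-agent figure comes from this page.

```python
import math

# Per-agent scanning and display capacity stated on this page (TiB).
AGENT_CAPACITY_TIB = 500


def agents_needed(total_tib: float) -> int:
    """Estimate the number of Console agents for `total_tib` of data.

    Hypothetical helper for planning only: it divides the total by the
    500 TiB per-agent figure and rounds up, with a minimum of one agent.
    """
    if total_tib < 0:
        raise ValueError("total_tib must be non-negative")
    return max(1, math.ceil(total_tib / AGENT_CAPACITY_TIB))


if __name__ == "__main__":
    for size in (120, 500, 1200):
        print(f"{size} TiB -> {agents_needed(size)} Console agent(s)")
```

Under these assumptions, 1,200 TiB of data would call for three Console agents, each with its own Data Classification instance.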

** The following metadata is extracted from files during mapping scans:

  • System

  • System type

  • Storage repository

  • File type

  • Used capacity

  • Number of files

  • File size

  • File creation

  • File last access

  • File last modified

  • File discovered time

  • Permissions extraction

Governance dashboard differences:

| Feature | Map & Classify | Map |
|---|---|---|
| Stale data | Yes | Yes |
| Non-business data | Yes | Yes |
| Duplicated files | Yes | Yes |
| Predefined saved queries | Yes | No |
| Default saved queries | Yes | Yes |
| DDA report | Yes | Yes |
| Mapping report | Yes | Yes |
| Sensitivity level detection | Yes | No |
| Sensitive data with wide permissions | Yes | No |
| Open permissions | Yes | Yes |
| Age of data | Yes | Yes |
| Size of data | Yes | Yes |
| Categories | Yes | No |
| File types | Yes | Yes |

Compliance dashboard differences:

| Feature | Map & Classify | Map |
|---|---|---|
| Personal information | Yes | No |
| Sensitive personal information | Yes | No |
| Privacy risk assessment report | Yes | No |
| HIPAA report | Yes | No |
| PCI DSS report | Yes | No |

Investigation filters differences:

| Feature | Map & Classify | Map |
|---|---|---|
| Saved queries | Yes | Yes |
| System type | Yes | Yes |
| System | Yes | Yes |
| Storage repository | Yes | Yes |
| File type | Yes | Yes |
| File size | Yes | Yes |
| Created time | Yes | Yes |
| Discovered time | Yes | Yes |
| Last modified | Yes | Yes |
| Last access | Yes | Yes |
| Open permissions | Yes | Yes |
| File directory path | Yes | Yes |
| Category | Yes | No |
| Sensitivity level | Yes | No |
| Number of identifiers | Yes | No |
| Personal data | Yes | No |
| Sensitive personal data | Yes | No |
| Data subject | Yes | No |
| Duplicates | Yes | Yes |
| Classification status | Yes | Status is always "Limited insights" |
| Scan analysis event | Yes | Yes |
| File hash | Yes | Yes |
| Number of users with access | Yes | Yes |
| User/group permissions | Yes | Yes |
| File owner | Yes | Yes |
| Directory type | Yes | Yes |

How quickly does Data Classification scan data

The scan speed is affected by network latency, disk latency, network bandwidth, environment size, and the distribution of file sizes.

  • When performing Mapping-only scans, Data Classification can scan between 100 and 150 TiB of data per day.

  • When performing Map & Classify scans, Data Classification can scan between 15 and 40 TiB of data per day.
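
To make the daily throughput ranges above concrete, the following sketch turns them into a best-case and worst-case duration for an initial scan. It is a back-of-the-envelope estimate only: the function name `estimate_scan_days` and the dictionary keys are hypothetical, and actual speed depends on the latency, bandwidth, and file-size factors listed above.

```python
# Daily throughput ranges quoted on this page, in TiB per day (low, high).
RATES_TIB_PER_DAY = {
    "mapping_only": (100, 150),
    "map_and_classify": (15, 40),
}


def estimate_scan_days(total_tib: float, scan_type: str) -> tuple[float, float]:
    """Return (best_case_days, worst_case_days) for an initial scan.

    Hypothetical planning helper: best case uses the high end of the
    quoted range, worst case uses the low end.
    """
    if total_tib < 0:
        raise ValueError("total_tib must be non-negative")
    low, high = RATES_TIB_PER_DAY[scan_type]
    return (total_tib / high, total_tib / low)


if __name__ == "__main__":
    for scan_type in RATES_TIB_PER_DAY:
        best, worst = estimate_scan_days(300, scan_type)
        print(f"300 TiB, {scan_type}: {best:.1f} to {worst:.1f} days")
```

For example, under these figures a 300 TiB environment would take roughly 2 to 3 days for a Mapping-only scan and roughly 7.5 to 20 days for a Map & Classify scan.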