Scan data sources overview with BlueXP classification
BlueXP classification scanns the data in the repositories (the volumes, database schemas, or other user data) that you select to identify personal and sensitive data. BlueXP classification then maps your organizational data, categorizes each file, and identifies predefined patterns in the data. The result of the scan is an index of personal information, sensitive personal information, data categories, and file types.
After the initial scan, BlueXP classification continuously scans your data in a round-robin fashion to detect incremental changes. This is why it's important to keep the instance running.
You can enable and disable scans at the volume level or at the database schema level.
What's the difference between Mapping and Classification scans
You can conduct two types of scans in BlueXP classification:
-
Mapping-only scans provide only a high-level overview of your data and are performed on selected data sources. Mapping-only scans take less time than map and classify scans because the do not access files to see the data inside. You might want to do this initially to identify areas of research and then perform a Map & Classify scan on those areas.
-
Map & Classify scans provide deep-level scanning of your data.
The table below shows some of the differences:
Feature | Map & classify scans | Mapping-only scans |
---|---|---|
Scan speed |
Slow |
Fast |
Pricing |
Free |
Free |
Capacity |
Limited to 500 TiB* |
Limited to 500 TiB* |
List of file types and used capacity |
Yes |
Yes |
Number of files and used capacity |
Yes |
Yes |
Age and size of files |
Yes |
Yes |
Ability to run a Data Mapping Report |
Yes |
Yes |
Data Investigation page to view file details |
Yes |
No |
Search for names within files |
Yes |
No |
Create saved searches that provide custom search results |
Yes |
No |
Ability to run other reports |
Yes |
No |
Ability to see metadata from files* |
No |
Yes |
* include::_include/connector-limit.adoc[]
*The following metadata is extracted from files during mapping scans:
-
Working environment
-
Working environment type
-
Storage repository
-
File type
-
Used capacity
-
Number of files
-
File size
-
File creation
-
File last access
-
File last modified
-
File discovered time
-
Permissions extraction
Governance dashboard differences:
Feature | Map & Classify | Map |
---|---|---|
Stale data |
Yes |
Yes |
Non-business data |
Yes |
Yes |
Duplicated files |
Yes |
Yes |
Predefined saved searches |
Yes |
No |
Default saved searches |
Yes |
Yes |
DDA report |
Yes |
Yes |
Mapping report |
Yes |
Yes |
Sensitivity level detection |
Yes |
No |
Sensitive data with wide permissions |
Yes |
No |
Open permissions |
Yes |
Yes |
Age of data |
Yes |
Yes |
Size of data |
Yes |
Yes |
Categories |
Yes |
No |
File types |
Yes |
Yes |
Compliance dashboard differences:
Feature | Map & Classify | Map |
---|---|---|
Personal information |
Yes |
No |
Sensitive personal information |
Yes |
No |
Privacy risk assessment report |
Yes |
No |
HIPAA report |
Yes |
No |
PCI DSS report |
Yes |
No |
Investigation filters differences:
Feature | Map & Classify | Map |
---|---|---|
Saved searches |
Yes |
Yes |
Working environment type |
Yes |
Yes |
Working environment |
Yes |
Yes |
Storage repository |
Yes |
Yes |
File type |
Yes |
Yes |
File size |
Yes |
Yes |
Created time |
Yes |
Yes |
Discovered time |
Yes |
Yes |
Last modified |
Yes |
Yes |
Last access |
Yes |
Yes |
Open permissions |
Yes |
Yes |
File directory path |
Yes |
Yes |
Category |
Yes |
No |
Sensitivity level |
Yes |
No |
Number of identifiers |
Yes |
No |
Personal data |
Yes |
No |
Sensitive personal data |
Yes |
No |
Data subject |
Yes |
No |
Duplicates |
Yes |
Yes |
Classification status |
Yes |
Status is always "Limited insights" |
Scan analysis event |
Yes |
Yes |
File hash |
Yes |
Yes |
Number of users with access |
Yes |
Yes |
User/group permissions |
Yes |
Yes |
File owner |
Yes |
Yes |
Directory type |
Yes |
Yes |
How quickly does BlueXP classification scan data
The scan speed is affected by network latency, disk latency, network bandwidth, environment size, and file distribution sizes.
-
When performing Mapping-only scans, BlueXP classification can scan between 100-150 TiBs of data per day.
-
When performing Map & classify scans, BlueXP classification can scan between 15-40 TiBs of data per day.