BlueXP classification

Categories of private data

Contributors netapp-tonacki amgrissino

There are many types of private data that BlueXP classification can identify in your volumes, Amazon S3 buckets, databases, OneDrive folders, SharePoint accounts, and Google Drive accounts. See the categories below.

Tip: If you need BlueXP classification to identify other private data types, such as additional national ID numbers or healthcare identifiers, email ng-contact-data-sense@netapp.com with your request.

Types of personal data

The personal data found in files can be general personal data or national identifiers. The third column in the table below identifies whether BlueXP classification uses proximity validation to validate its findings for the identifier.
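This page does not describe how proximity validation works internally. As a rough illustration only, the sketch below assumes it means accepting a pattern match when a related keyword appears within a fixed number of characters of the match. The pattern, the keyword list, the `window` size, and the function name `find_with_proximity` are all hypothetical and are not taken from BlueXP classification.

```python
import re

# Hypothetical pattern and keywords, chosen only for illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_KEYWORDS = ("ssn", "social security")


def find_with_proximity(text, pattern, keywords, window=20):
    """Return pattern matches that have a keyword within `window` characters."""
    lowered = text.lower()
    hits = []
    for match in pattern.finditer(text):
        # Look at the text surrounding the match, clipped to the string bounds.
        start = max(0, match.start() - window)
        end = min(len(text), match.end() + window)
        context = lowered[start:end]
        if any(keyword in context for keyword in keywords):
            hits.append(match.group())
    return hits


sample = "Employee SSN: 123-45-6789. Order reference 987-65-4321 shipped on Monday."
validated = find_with_proximity(sample, SSN_PATTERN, SSN_KEYWORDS)
```

In this sample, both numbers match the pattern, but only the first has a keyword nearby, so only the first is kept. A check of this kind trades some recall for fewer false positives on numbers that merely share a format with an identifier.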

The languages in which these items can be recognized are identified in the table.

You can add to the list of personal data that BlueXP classification looks for in your files. If you are scanning a database server, the Data Fusion feature lets you choose additional identifiers for BlueXP classification to look for in its scans by selecting columns in a database table. You can also add custom keywords from a text file, or custom patterns using a regular expression. See Adding personal data identifiers to your BlueXP classification scans for details.
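Before adding a custom pattern, it can help to try the regular expression against text that should and should not match. The sketch below uses Python's `re` module and a hypothetical employee ID format (`EMP-` followed by six digits). The regular expression syntax that BlueXP classification accepts may differ from Python's, so treat this as a way to reason about a pattern, not as a statement of the product's behavior.

```python
import re

# Hypothetical custom identifier: "EMP-" followed by exactly six digits.
EMPLOYEE_ID = re.compile(r"\bEMP-\d{6}\b")

# Sample text mapped to whether the pattern is expected to match.
samples = {
    "Badge EMP-004217 issued": True,
    "Ticket EMP-42 closed": False,
    "TEMP-123456 is a sensor reading": False,
}

# Actual match result for each sample, for comparison with the expectations.
results = {text: bool(EMPLOYEE_ID.search(text)) for text in samples}
```

The word boundaries (`\b`) keep the pattern from matching inside longer tokens such as `TEMP-123456`, which is a common source of false positives in loosely written patterns.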

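Type | Identifier | Proximity validation? | English | German | Spanish | French | Japanese
General | Credit card number | No
General | Data Subjects | No
General | Email Address | No
General | IBAN Number (International Bank Account Number) | No
General | IP Address | No
General | Password | Yes
National Identifiers | Australian TFN (Tax File Number) | Yes
National Identifiers | Australian Driver's License | Yes
National Identifiers | Australian Medicare Number | Yes
National Identifiers | Australian Passport Number | Yes
National Identifiers | Austrian SSN | Yes
National Identifiers | Belgian ID (Numero National) | Yes
National Identifiers | Botswana Identity Card (Omang) Number | Yes
National Identifiers | Botswana Passport Number | Yes
National Identifiers | Brazilian ID (CPF) | Yes
National Identifiers | British Passport | Yes
National Identifiers | Bulgarian ID (UCN) | Yes
National Identifiers | Croatian ID (OIB) | Yes
National Identifiers | Cyprus Tax Identification Number (TIC) | Yes
National Identifiers | Czech/Slovak ID | Yes
National Identifiers | Danish ID (CPR) | Yes
National Identifiers | Dutch ID (BSN) | Yes
National Identifiers | Estonian ID | Yes
National Identifiers | Finnish ID (HETU) | Yes
National Identifiers | French Driver's License | Yes
National Identifiers | French ID | Yes
National Identifiers | French INSEE | Yes
National Identifiers | French Social Security Number | Yes
National Identifiers | French Tax Identification Number (SPI) | Yes
National Identifiers | German ID (Personalausweisnummer) | Yes
National Identifiers | German Internal ID for Bank Transfers | Yes
National Identifiers | German Social Security Number (Sozialversicherungsnummer) | Yes
National Identifiers | German Tax Identification Number (Steuerliche Identifikationsnummer) | Yes
National Identifiers | Greek ID | Yes
National Identifiers | Hungarian Tax Identification Number | Yes
National Identifiers | Irish ID (PPS) | Yes
National Identifiers | Israeli ID | Yes
National Identifiers | Italian Tax Identification Number | Yes
National Identifiers | Japanese Personal Identification Number (both Personal and Corporate) | Yes
National Identifiers | Latvian ID | Yes
National Identifiers | Lithuanian ID | Yes
National Identifiers | Luxembourg ID | Yes
National Identifiers | Maltese ID | Yes
National Identifiers | National Health Service (NHS) Number | Yes
National Identifiers | New Zealand Bank Account | Yes
National Identifiers | New Zealand Driver's License | Yes
National Identifiers | New Zealand IRD Number (Tax ID) | Yes
National Identifiers | New Zealand NHI (National Health Index) Number | Yes
National Identifiers | New Zealand Passport Number | Yes
National Identifiers | Polish ID (PESEL) | Yes
National Identifiers | Portuguese Tax Identification Number (NIF) | Yes
National Identifiers | Romanian ID (CNP) | Yes
National Identifiers | Singapore National Registration Identity Card (NRIC) | Yes
National Identifiers | Slovenian ID (EMSO) | Yes
National Identifiers | South African ID | Yes
National Identifiers | Spanish Tax Identification Number | Yes
National Identifiers | Swedish ID | Yes
National Identifiers | Texas Driver's License | Yes
National Identifiers | U.K. ID (NINO) | Yes
National Identifiers | USA California Driver's License | Yes
National Identifiers | USA Indiana Driver's License | Yes
National Identifiers | USA New York Driver's License | Yes
National Identifiers | USA Social Security Number (SSN) | Yes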

Types of sensitive personal data

BlueXP classification can find the following types of sensitive personal data in files.

The items in this category can be recognized only in English at this time.

Criminal Procedures Reference

Data concerning a natural person's criminal convictions and offenses.

Ethnicity Reference

Data concerning a natural person's racial or ethnic origin.

Health Reference

Data concerning a natural person's health.

ICD-9-CM Medical Codes

Codes used in the medical and health industry.

ICD-10-CM Medical Codes

Codes used in the medical and health industry.

Philosophical Beliefs Reference

Data concerning a natural person's philosophical beliefs.

Political Opinions Reference

Data concerning a natural person's political opinions.

Religious Beliefs Reference

Data concerning a natural person's religious beliefs.

Sex Life or Orientation Reference

Data concerning a natural person's sex life or sexual orientation.

Types of categories

BlueXP classification categorizes your data as follows.

Most of these categories can be recognized in English, German, and Spanish.

Category | Type | English | German | Spanish
Finance | Balance Sheets
Finance | Purchase Orders
Finance | Invoices
Finance | Quarterly Reports
HR | Background Checks
HR | Compensation Plans
HR | Employee Contracts
HR | Employee Reviews
HR | Health
HR | Resumes
Legal | NDAs
Legal | Vendor-Customer contracts
Marketing | Campaigns
Marketing | Conferences
Operations | Audit Reports
Sales | Sales Orders
Services | RFI
Services | RFP
Services | SOW
Services | Training
Support | Complaints and Tickets

The following metadata is also categorized, and it is identified in the same supported languages:

  • Application Data

  • Archive Files

  • Audio

  • Business Application Data

  • CAD Files

  • Code

  • Corrupted

  • Database and index files

  • BlueXP classification Breadcrumbs

  • Design Files

  • Email Application Data

  • Encrypted (files with a high entropy score)

  • Executables

  • Financial Application Data

  • Health Application Data

  • Images

  • Logs

  • Miscellaneous Documents

  • Miscellaneous Presentations

  • Miscellaneous Spreadsheets

  • Miscellaneous "Unknown"

  • Password Protected files

  • Structured Data

  • Videos

  • Zero-Byte Files
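The "Encrypted" entry in the list above refers to files with a high entropy score. This page does not say how that score is computed or what threshold is used. As a general illustration of the idea, the sketch below computes the Shannon entropy of a byte string in bits per byte. The function name and the sample inputs are hypothetical.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over the byte values that occur in the data.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


repetitive = b"a" * 64                 # a single repeated byte value
uniform = bytes(range(256))            # every byte value exactly once
random_like = os.urandom(4096)         # stand-in for encrypted content

low = shannon_entropy(repetitive)
high = shannon_entropy(uniform)
sampled = shannon_entropy(random_like)
```

Repetitive content scores near 0, while encrypted or compressed content tends to score close to the 8 bits per byte maximum. This is also why compressed archives can look similar to encrypted files under an entropy measure alone.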

Types of files

BlueXP classification scans all files for category and metadata insights and displays all file types in the file types section of the dashboard.

However, when BlueXP classification detects Personally Identifiable Information (PII), or when it performs a DSAR search, only the following file formats are supported:

.CSV, .DCM, .DICOM, .DOC, .DOCX, .JSON, .PDF, .PPTX, .RTF, .TXT, .XLS, .XLSX, Docs, Sheets, and Slides
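If you want to estimate in advance which files in a listing fall within the formats above, a simple extension check is one way to do it. The sketch below is an illustrative helper, not part of BlueXP classification. The name `supports_pii_scan` is hypothetical, and the check covers only the extension-based formats, because the Google formats (Docs, Sheets, and Slides) have no file extension.

```python
from pathlib import PurePath

# Extensions copied from the list above. Google Docs, Sheets, and Slides
# have no file extension and are not covered by this check.
PII_SCAN_EXTENSIONS = {
    ".csv", ".dcm", ".dicom", ".doc", ".docx", ".json",
    ".pdf", ".pptx", ".rtf", ".txt", ".xls", ".xlsx",
}


def supports_pii_scan(filename: str) -> bool:
    """Return True if the file's extension is in the supported list."""
    return PurePath(filename).suffix.lower() in PII_SCAN_EXTENSIONS


listing = ["Report.PDF", "archive.zip", "patients.dcm", "notes"]
eligible = [name for name in listing if supports_pii_scan(name)]
```

The comparison is case-insensitive, so `Report.PDF` and `report.pdf` are treated the same way.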

Accuracy of information found

NetApp can't guarantee 100% accuracy of the personal data and sensitive personal data that BlueXP classification identifies. You should always validate the information by reviewing the data.

Based on our testing, the table below shows the accuracy of the information that BlueXP classification finds. We break it down by precision and recall:

Precision

The probability that what BlueXP classification finds has been identified correctly. For example, a precision rate of 90% for personal data means that 9 out of 10 files identified as containing personal information actually contain personal information. The remaining 1 out of 10 files would be a false positive.

Recall

The probability that BlueXP classification finds what it should. For example, a recall rate of 70% for personal data means that BlueXP classification can identify 7 out of 10 files that actually contain personal information in your organization. BlueXP classification would miss the other 30% of those files, and they won't appear in the dashboard.
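The two definitions above can be expressed as simple ratios over counts of files. The sketch below uses the standard definitions of precision and recall and reproduces the 9-out-of-10 and 7-out-of-10 examples. The function and variable names are illustrative, and the counts are the hypothetical ones from the examples, not measured results.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged files that really contain personal information."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of files containing personal information that were flagged."""
    relevant = true_positives + false_negatives
    return true_positives / relevant if relevant else 0.0


# Precision example: 10 files flagged, 9 correct, 1 false positive.
example_precision = precision(true_positives=9, false_positives=1)

# Recall example: 10 files contain personal information, 7 found, 3 missed.
example_recall = recall(true_positives=7, false_negatives=3)
```

Note that the two measures use different denominators: precision is measured against everything that was flagged, while recall is measured against everything that should have been flagged.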

We are constantly improving the accuracy of our results. Those improvements will be automatically available in future BlueXP classification releases.

Type | Precision | Recall
Personal data - General | 90%-95% | 60%-80%
Personal data - Country identifiers | 30%-60% | 40%-60%
Sensitive personal data | 80%-95% | 20%-30%
Categories | 90%-97% | 60%-80%