Gaining visibility and control of private data

Contributors netapp-bcammett

Gain control of your private data by viewing details about the personal data and sensitive personal data in your organization. You can also gain visibility by reviewing the categories and file types that Cloud Compliance found in your data.

Personal data

Cloud Compliance automatically identifies specific words, strings, and patterns (regex) inside the data. Examples include personally identifiable information (PII), credit card numbers, social security numbers, and bank account numbers. See the full list in Types of personal data.

For some types of personal data, Cloud Compliance uses proximity validation to confirm its findings. It does so by looking for one or more predefined keywords near the personal data that it found. For example, Cloud Compliance identifies a number as a U.S. social security number (SSN) if it sees a proximity word next to it, such as SSN or social security. The table in Types of personal data shows when Cloud Compliance uses proximity validation.
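The idea behind proximity validation can be sketched in a few lines of Python. This is only an illustration, not how Cloud Compliance is implemented: the SSN pattern, the keyword list, and the 40-character window are all assumptions chosen for the example.

```python
import re

# Hypothetical pattern, keywords, and window size, chosen only for illustration.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
SSN_KEYWORDS = ("ssn", "social security")
WINDOW = 40  # characters inspected on each side of a match (assumed value)


def find_validated_ssns(text: str) -> list[str]:
    """Return SSN-like strings that have a proximity keyword within WINDOW characters."""
    validated = []
    for match in SSN_PATTERN.finditer(text):
        start = max(0, match.start() - WINDOW)
        end = min(len(text), match.end() + WINDOW)
        context = text[start:end].lower()
        # Keep the match only if a predefined keyword appears nearby.
        if any(keyword in context for keyword in SSN_KEYWORDS):
            validated.append(match.group())
    return validated


print(find_validated_ssns("Employee SSN: 123-45-6789"))            # ['123-45-6789']
print(find_validated_ssns("Order reference 123-45-6789 shipped"))  # []
```

The second string matches the pattern but has no keyword nearby, so it is not reported. This is the kind of false positive that proximity validation is meant to reduce.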

Viewing files that contain personal data

Steps
  1. At the top of Cloud Manager, click Compliance.

  2. Download the details for one of the top 2 personal data types directly from the main screen, or click View All and then download the list for any of the personal data types that were found.

    A screenshot of the personal files dialog box where you can click the download button next to a personal data type. The result is a CSV file with details about the files.

Types of personal data

The personal data found in files can be general personal data or national identifiers. The third column identifies whether Cloud Compliance uses proximity validation to validate its findings for the identifier.

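Type | Identifier | Proximity validation?
General | Email address | No
General | Credit card number | No
General | IBAN number (International Bank Account Number) | No
General | IP address | Yes
National Identifiers | Belgian ID (Numero National) | Yes
National Identifiers | Bulgarian ID (Unified Civil Number) | Yes
National Identifiers | Cyprus Tax Identification Number (TIC) | Yes
National Identifiers | Danish Tax Identification Number (CPR) | Yes
National Identifiers | Estonian ID (Isikukood) | Yes
National Identifiers | Finnish ID (henkilötunnus) | Yes
National Identifiers | French Tax Identification Number (SPI) | Yes
National Identifiers | German Tax Identification Number (Steuerliche Identifikationsnummer) | Yes
National Identifiers | Hungarian Tax Identification Number (Adóazonosító jel) | Yes
National Identifiers | Irish ID (PPS) | Yes
National Identifiers | Israeli ID | Yes
National Identifiers | Italian ID (Codice Fiscale) | Yes
National Identifiers | Latvian Tax Identification Number | Yes
National Identifiers | Lithuanian ID (Asmens kodas) | Yes
National Identifiers | Luxembourg ID | Yes
National Identifiers | Malta ID | Yes
National Identifiers | Netherlands ID (BSN) | Yes
National Identifiers | Polish Tax Identification Number | Yes
National Identifiers | Portuguese Tax Identification Number (NIF) | Yes
National Identifiers | Romanian Tax Identification Number | Yes
National Identifiers | Slovakian Tax Identification Number | Yes
National Identifiers | Slovenian Tax Identification Number | Yes
National Identifiers | South African ID | Yes
National Identifiers | Spanish Tax Identification Number | Yes
National Identifiers | Swedish Tax Identification Number | Yes
National Identifiers | U.K. National Insurance Number (NINO) | Yes
National Identifiers | USA Social Security Number (SSN) | Yes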

Sensitive personal data

Cloud Compliance automatically identifies special types of sensitive personal information, as defined by privacy regulations such as Articles 9 and 10 of the GDPR. Examples include information about a person’s health, ethnic origin, or sexual orientation. See the full list in Types of sensitive personal data.

Cloud Compliance uses artificial intelligence (AI), natural language processing (NLP), machine learning (ML), and cognitive computing (CC) to understand the meaning of the content that it scans, so that it can extract entities and categorize the content accordingly.

For example, one sensitive GDPR data category is ethnic origin. Because of its NLP abilities, Cloud Compliance can distinguish between a sentence that reads "George is Mexican" (indicating sensitive data as specified in Article 9 of the GDPR) and "George is eating Mexican food."

Only English is supported when scanning for sensitive personal data. Support for more languages will be added later.

Viewing files that contain sensitive personal data

Steps
  1. At the top of Cloud Manager, click Compliance.

  2. Download the details for one of the top 2 sensitive personal data types directly from the main screen, or click View All and then download the list for any of the sensitive personal data types that were found.

    A screenshot of the sensitive personal files dialog box where you can click the download button next to a personal data type. The result is a CSV file with details about the files.

Types of sensitive personal data

The sensitive personal data that Cloud Compliance can find in files includes the following:

Criminal Procedures Reference

Data concerning a natural person’s criminal convictions and offenses.

Ethnicity Reference

Data concerning a natural person’s racial or ethnic origin.

Health Reference

Data concerning a natural person’s health.

Philosophical Beliefs Reference

Data concerning a natural person’s philosophical beliefs.

Religious Beliefs Reference

Data concerning a natural person’s religious beliefs.

Sex Life or Orientation Reference

Data concerning a natural person’s sex life or sexual orientation.

Categories

Cloud Compliance takes the data that it scanned and divides it into categories. Categories are topics based on AI analysis of the content and metadata of each file. See the list of categories.

Categories can help you understand what’s happening with your data by showing you the type of information that you have. For example, a category like resumes or employee contracts can include sensitive data. When you download the CSV report, you might find that employee contracts are stored in an unsecured location. You can then correct that issue.

Only English is supported for categories. Support for more languages will be added later.
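As a rough sketch of the employee contracts example above, a downloaded report could be checked with a short Python script. The sample rows, the notion of an "approved" location, and the exact header spellings (File name, Location, File path, Category) are assumptions made for illustration. Check the headers in your own CSV file before reusing this.

```python
import csv
import io

# Made-up sample standing in for a downloaded report. The header names mirror
# the report fields, but their exact spelling in a real file is an assumption.
SAMPLE_REPORT = """File name,Location,File path,Category
contract_a.pdf,hr-secure,/contracts/contract_a.pdf,Employee Contracts
contract_b.pdf,public-share,/tmp/contract_b.pdf,Employee Contracts
invoice_1.pdf,public-share,/finance/invoice_1.pdf,Invoices
"""

APPROVED_LOCATIONS = {"hr-secure"}  # hypothetical set of approved locations


def misplaced_files(report, category, approved):
    """Return paths of files in `category` that are stored outside `approved` locations."""
    reader = csv.DictReader(report)
    return [
        row["File path"]
        for row in reader
        if row["Category"] == category and row["Location"] not in approved
    ]


print(misplaced_files(io.StringIO(SAMPLE_REPORT), "Employee Contracts", APPROVED_LOCATIONS))
# ['/tmp/contract_b.pdf']
```

To run this against a real report, pass an open file object (for example `open("report.csv", newline="")`) instead of the `io.StringIO` sample.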

Viewing files by categories

Steps
  1. At the top of Cloud Manager, click Compliance.

  2. Download the details for one of the top 4 categories directly from the main screen, or click View All and then download the list for any of the categories.

    A screenshot of the categories dialog box where you can click the download button next to a category. The result is a CSV file with details about the files in that category.

Types of categories

Cloud Compliance categorizes your data as follows:

Finance
  • Balance Sheets

  • Purchase Orders

  • Invoices

  • Quarterly Reports

HR
  • Background Check

  • Compensation Plans

  • Employee Contracts

  • Employee Review

  • Health

  • Resumes

Legal
  • NDA

  • Vendor-Customer contracts

Marketing
  • Campaigns

  • Conferences

Operations
  • Audit Reports

Sales
  • Sales Orders

Services
  • RFI

  • RFP

  • Training

Support
  • Complaints and Tickets

Other
  • Archive Files

  • Audio

  • CAD Files

  • Code

  • Executables

  • Images

File types

Cloud Compliance takes the data that it scanned and breaks it down by file type. Cloud Compliance can display all file types found in the scans.

Reviewing your file types can help you control your sensitive data because you might find that certain file types are not stored correctly. For example, you might be storing CAD files that include very sensitive information about your organization. If they are unsecured, you can take control of the sensitive data by restricting permissions or moving the files to another location.

Viewing file types

Steps
  1. At the top of Cloud Manager, click Compliance.

  2. Download the details for one of the top 4 file types directly from the main screen, or click View All and then download the list for any of the file types.

    A screenshot of the file types dialog box where you can click the download button next to a file type. The result is a CSV file with details about the files.

Accuracy of information found

NetApp can’t guarantee 100% accuracy of the personal data and sensitive personal data that Cloud Compliance identifies. You should always validate the information by reviewing the data.

Based on our testing, the table below shows the accuracy of the information that Cloud Compliance finds. We break it down by precision and recall:

Precision

The probability that what Cloud Compliance finds has been identified correctly. For example, a precision rate of 90% for personal data means that 9 out of 10 files identified as containing personal information actually contain personal information. The remaining 1 out of 10 files would be a false positive.

Recall

The probability that Cloud Compliance finds what it should. For example, a recall rate of 70% for personal data means that Cloud Compliance can identify 7 out of 10 files that actually contain personal information in your organization. Cloud Compliance would miss the other 30% of those files, and they would not appear in the dashboard.
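The two definitions above correspond to the standard precision and recall formulas. The following Python snippet reproduces the two worked examples. The function and parameter names are chosen for illustration only.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Share of flagged files that really contain the data."""
    return true_positives / (true_positives + false_positives)


def recall(true_positives: int, false_negatives: int) -> float:
    """Share of files containing the data that were actually flagged."""
    return true_positives / (true_positives + false_negatives)


# Precision example: 10 files flagged, 9 of them correctly.
print(precision(true_positives=9, false_positives=1))  # 0.9

# Recall example: 10 files contain personal data, 7 of them are found.
print(recall(true_positives=7, false_negatives=3))  # 0.7
```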

Cloud Compliance is in a Controlled Availability release and we are constantly improving the accuracy of our results. Those improvements will be automatically available in future Cloud Compliance releases.

Type | Precision | Recall
Personal data - General | 90%-95% | 60%-80%
Personal data - Country identifiers | 30%-60% | 40%-60%
Sensitive personal data | 80%-95% | 20%-30%
Categories | 90%-97% | 60%-80%

What’s included in each file list report (CSV file)

The dashboard enables you to download file lists (in CSV format) that include details about the identified files. If there are more than 10,000 results, only the top 10,000 appear in the list (support for more will be added later).

Each file list includes the following information:

  • File name

  • Location type

  • Location

  • File path

  • File type

  • Category

  • Personal information

  • Sensitive personal information

  • Deletion detection date

    A deletion detection date identifies the date that the file was deleted or moved. This enables you to identify when sensitive files have been moved. Deleted files aren’t part of the file number count that appears in the dashboard. The files only appear in the CSV reports.
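Because deleted or moved files appear only in the CSV reports, it can help to separate them from the files that are still counted in the dashboard. The sketch below assumes that the deletion detection date column is empty for files that have not been deleted or moved. The header spellings, the date format, and the sample rows are likewise assumptions for illustration.

```python
import csv
import io

# Made-up sample report. Header names and the date format are assumptions.
SAMPLE_REPORT = """File name,File path,Deletion detection date
plan.docx,/hr/plan.docx,
old_resume.pdf,/hr/old_resume.pdf,2020-03-14
notes.txt,/ops/notes.txt,
"""


def split_by_deletion(report):
    """Return (active, deleted) lists of file names from a file list report."""
    active, deleted = [], []
    for row in csv.DictReader(report):
        # A non-empty date is treated as "deleted or moved" (assumed convention).
        if (row["Deletion detection date"] or "").strip():
            deleted.append(row["File name"])
        else:
            active.append(row["File name"])
    return active, deleted


active, deleted = split_by_deletion(io.StringIO(SAMPLE_REPORT))
print(active)   # ['plan.docx', 'notes.txt']
print(deleted)  # ['old_resume.pdf']
```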