Gaining visibility and control of private data
Gain control of your private data by viewing details about the personal data and sensitive personal data in your organization. You can also gain visibility by reviewing the categories and file types that Cloud Compliance found in your data.
By default, the Cloud Compliance dashboard displays compliance data for all working environments and databases.
If you want to see data for only some of the working environments, select those working environments.
Personal data
Cloud Compliance automatically identifies specific words, strings, and patterns (Regex) inside the data. For example, Personal Identification Information (PII), credit card numbers, social security numbers, bank account numbers, and more. See the full list.
For some types of personal data, Cloud Compliance uses proximity validation to validate its findings. The validation occurs by looking for one or more predefined keywords in proximity to the personal data that was found. For example, Cloud Compliance identifies a U.S. social security number (SSN) as a SSN if it sees a proximity word next to it—for example, SSN or social security. The list below shows when Cloud Compliance uses proximity validation.
Viewing files that contain personal data
-
At the top of Cloud Manager, click Cloud Compliance and click the Dashboard tab.
-
To investigate the details for all personal data, click the icon next to the personal data percentage.
-
To investigate the details for a specific type of personal data, click View All and then click the Investigate Results icon for a specific type of personal data.
-
Investigate the data by searching, sorting, expanding details for a specific file, clicking Investigate Results to see masked information, or by downloading the file list.
-
You can also filter the contents of the investigation page to display only the results you want to see. The top-level tabs allow you to view data from files (unstructured data) or from databases (structured data).
Then you have filters for working environment, storage repository, category, private data, file type, last modified date, and whether the S3 object's permissions are open to public access.
Types of personal data
The personal data found in files can be general personal data or national identifiers. The third column identifies whether Cloud Compliance uses proximity validation to validate its findings for the identifier.
Type | Identifier | Proximity validation? |
---|---|---|
General |
Email address |
No |
Credit card number |
No |
|
IBAN number (International Bank Account Number) |
No |
|
National Identifiers |
Belgian ID (Numero National) |
Yes |
Brazilian ID (CPF) |
Yes |
|
Bulgarian ID (UCN) |
Yes |
|
California Driver's License |
Yes |
|
Croatian ID (OIB) |
Yes |
|
Cyprus Tax Identification Number (TIC) |
Yes |
|
Czech/Slovak ID |
Yes |
|
Danish ID (CPR) |
Yes |
|
Dutch ID (BSN) |
Yes |
|
Estonian ID |
Yes |
|
Finnish ID (HETU) |
Yes |
|
French Tax Identification Number (SPI) |
Yes |
|
German Tax Identification Number (Steuerliche Identifikationsnummer) |
Yes |
|
Greek ID |
Yes |
|
Hungarian Tax Identification Number |
Yes |
|
Irish ID (PPS) |
Yes |
|
Israeli ID |
Yes |
|
Italian Tax Identification Number |
Yes |
|
Latvian ID |
Yes |
|
Lithuanian ID |
Yes |
|
Luxembourg ID |
Yes |
|
Maltese ID |
Yes |
|
Polish ID (PESEL) |
Yes |
|
Portuguese Tax Identification Number (NIF) |
Yes |
|
Romanian ID (CNP) |
Yes |
|
Slovenian ID (EMSO) |
Yes |
|
South African ID |
Yes |
|
Spanish Tax Identification Number |
Yes |
|
Swedish ID |
Yes |
|
U.K. ID (NINO) |
Yes |
|
USA Social Security Number (SSN) |
Yes |
Sensitive personal data
Cloud Compliance automatically identifies special types of sensitive personal information, as defined by privacy regulations such as articles 9 and 10 of the GDPR. For example, information regarding a person's health, ethnic origin, or sexual orientation. See the full list.
Cloud Compliance uses artificial intelligence (AI), natural language processing (NLP), machine learning (ML), and cognitive computing (CC) to understand the meaning of the content that it scans in order to extract entities and categorize it accordingly.
For example, one sensitive GDPR data category is ethnic origin. Because of its NLP abilities, Cloud Compliance can distinguish the difference between a sentence that reads "George is Mexican" (indicating sensitive data as specified in article 9 of the GDPR), versus "George is eating Mexican food."
Only English is supported when scanning for sensitive personal data. Support for more languages will be added later. |
Viewing files that contain sensitive personal data
-
At the top of Cloud Manager, click Cloud Compliance.
-
To investigate the details for all sensitive personal data, click the icon next to the sensitive personal data percentage.
-
To investigate the details for a specific type of sensitive personal data, click View All and then click the Investigate Results icon for a specific type of sensitive personal data.
-
Investigate the data by searching, sorting, expanding details for a specific file, clicking Investigate Results to see masked information, or by downloading the file list.
Types of sensitive personal data
The sensitive personal data that Cloud Compliance can find in files includes the following:
- Criminal Procedures Reference
-
Data concerning a natural person’s criminal convictions and offenses.
- Ethnicity Reference
-
Data concerning a natural person’s racial or ethnic origin.
- Health Reference
-
Data concerning a natural person’s health.
- ICD-9-CM Medical Codes
-
Codes used in the medical and health industry.
- ICD-10-CM Medical Codes
-
Codes used in the medical and health industry.
- Philosophical Beliefs Reference
-
Data concerning a natural person’s philosophical beliefs.
- Religious Beliefs Reference
-
Data concerning a natural person’s religious beliefs.
- Sex Life or Orientation Reference
-
Data concerning a natural person’s sex life or sexual orientation.
Categories
Cloud Compliance takes the data that it scanned and divides it into different types of categories. Categories are topics based on AI analysis of the content and metadata of each file. See the list of categories.
Categories can help you understand what's happening with your data by showing you the types of information that you have. For example, a category like resumes or employee contracts can include sensitive data. When you investigate the results, you might find that employee contracts are stored in an insecure location. You can then correct that issue.
Only English is supported for categories. Support for more languages will be added later. |
Viewing files by categories
-
At the top of Cloud Manager, click Cloud Compliance.
-
Click the Investigate Results icon for one of the top 4 categories directly from the main screen, or click View All and then click the icon for any of the categories.
-
Investigate the data by searching, sorting, expanding details for a specific file, clicking Investigate Results to see masked information, or by downloading the file list.
Types of categories
Cloud Compliance categorizes your data as follows:
- Finance
-
-
Balance Sheets
-
Purchase Orders
-
Invoices
-
Quarterly Reports
-
- HR
-
-
Background Checks
-
Compensation Plans
-
Employee Contracts
-
Employee Reviews
-
Health
-
Resumes
-
- Legal
-
-
NDAs
-
Vendor-Customer contracts
-
- Marketing
-
-
Campaigns
-
Conferences
-
- Operations
-
-
Audit Reports
-
- Sales
-
-
Sales Orders
-
- Services
-
-
RFI
-
RFP
-
SOW
-
Training
-
- Support
-
-
Complaints and Tickets
-
- Metadata categories
-
-
Application Data
-
Archive Files
-
Audio
-
Business Application Data
-
CAD Files
-
Code
-
Database and index files
-
Design Files
-
Email Application Data
-
Executables
-
Financial Application Data
-
Health Application Data
-
Images
-
Logs
-
Miscellaneous Documents
-
Miscellaneous Presentations
-
Miscellaneous Spreadsheets
-
Videos
-
File types
Cloud Compliance takes the data that it scanned and breaks it down by file type. Reviewing your file types can help you control your sensitive data because you might find that certain file types are not stored correctly. See the list of file types.
For example, you might be storing CAD files that include very sensitive information about your organization. If they are unsecured, you can take control of the sensitive data by restricting permissions or moving the files to another location.
Viewing file types
-
At the top of Cloud Manager, click Cloud Compliance.
-
Click the Investigate Results icon for one of the top 4 file types directly from the main screen, or click View All and then click the icon for any of the file types.
-
Investigate the data by searching, sorting, expanding details for a specific file, clicking Investigate Results to see masked information, or by downloading the file list.
Types of files
Cloud Compliance scans all files for category and metadata insights and displays all file types in the file types section of the dashboard.
But when Cloud Compliance detects Personal Identifiable Information (PII), or when it performs a DSAR search, only the following file formats are supported:
.PDF, .DOCX, .DOC, .PPTX, .XLS, .XLSX, .CSV, .TXT, .RTF, and .JSON.
Viewing data from specific working environments
You can filter the contents of the Cloud Compliance dashboard to see compliance data for all working environments and databases, or for just specific working environments.
When you filter the dashboard, Cloud Compliance scopes the compliance data and reports to just those working environments that you selected.
-
Click the filter drop-down, select the working environments that you'd like to view data for, and click View.
Accuracy of information found
NetApp can't guarantee 100% accuracy of the personal data and sensitive personal data that Cloud Compliance identifies. You should always validate the information by reviewing the data.
Based on our testing, the table below shows the accuracy of the information that Cloud Compliance finds. We break it down by precision and recall:
- Precision
-
The probability that what Cloud Compliance finds has been identified correctly. For example, a precision rate of 90% for personal data means that 9 out of 10 files identified as containing personal information, actually contain personal information. 1 out of 10 files would be a false positive.
- Recall
-
The probability for Cloud Compliance to find what it should. For example, a recall rate of 70% for personal data means that Cloud Compliance can identify 7 out of 10 files that actually contain personal information in your organization. Cloud Compliance would miss 30% of the data and it won’t appear in the dashboard.
Cloud Compliance is in a Controlled Availability release and we are constantly improving the accuracy of our results. Those improvements will be automatically available in future Cloud Compliance releases.
Type | Precision | Recall |
---|---|---|
Personal data - General |
90%-95% |
60%-80% |
Personal data - Country identifiers |
30%-60% |
40%-60% |
Sensitive personal data |
80%-95% |
20%-30% |
Categories |
90%-97% |
60%-80% |
What’s included in each file list report (CSV file)
From each Investigation page you can download file lists (in CSV format) that include details about the identified files. If there are more than 10,000 results, only the top 10,000 appear in the list.
Each file list includes the following information:
-
File name
-
Location type
-
Working environment
-
Storage repository
-
Protocol
-
File path
-
File type
-
Category
-
Personal information
-
Sensitive personal information
-
Deletion detection date
A deletion detection date identifies the date that the file was deleted or moved. This enables you to identify when sensitive files have been moved. Deleted files aren't part of the file number count that appears in the dashboard or on the Investigation page. The files only appear in the CSV reports.