Adding personal data identifiers using Data Fusion

Contributors netapp-tonacki

A feature called Data Fusion lets you scan your organization's data to identify whether unique identifiers from your databases appear in files or in other databases. In effect, you build your own list of "personal data" for Cloud Data Sense scans to identify. This gives you a fuller picture of where potentially sensitive data resides across your files.

Note: The capabilities described in this section are available only if you have chosen to perform a full classification scan on your data sources. Data sources that have had a mapping-only scan do not show file-level details.

Creating custom personal data identifiers from your databases

You can choose the additional identifiers that Cloud Data Sense will look for in its scans by selecting a specific column, or columns, in a database table. For example, the diagram below shows how Data Fusion is used to scan your volumes, buckets, and databases for occurrences of all the Customer IDs from your Oracle database.

A diagram showing how content from your databases can be used as a source to identify files that contain the same data.

As you can see, two unique Customer IDs have been found in two volumes and in one S3 bucket. Any matches in database tables will also be identified.
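
The idea behind this matching can be illustrated with a minimal sketch. It is not how Cloud Data Sense is implemented; it only shows the concept of using one database column as a list of identifiers and looking for those values in file contents. The in-memory SQLite database (standing in for the Oracle database), the sample identifiers, the file names, and the helpers `load_identifiers` and `find_matches` are all assumptions made for the illustration.

```python
import re
import sqlite3

# Hypothetical tokenizer: runs of letters, digits, underscores, and hyphens.
TOKEN = re.compile(r"[\w-]+")

def load_identifiers(conn, table, column):
    """Return the distinct, non-empty values of one column as strings."""
    # Names are interpolated only because this is a local sketch with
    # trusted input; SQL parameters cannot bind table or column names.
    rows = conn.execute(f'SELECT DISTINCT "{column}" FROM "{table}"')
    return {str(value) for (value,) in rows if value not in (None, "")}

def find_matches(identifiers, documents):
    """Map each document name to the sorted identifiers found in its text."""
    matches = {}
    for name, text in documents.items():
        found = identifiers & set(TOKEN.findall(text))
        if found:
            matches[name] = sorted(found)
    return matches

# Illustrative data only: a stand-in table and three made-up files.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Customers ("Customer ID" TEXT)')
conn.executemany(
    "INSERT INTO Customers VALUES (?)",
    [("CUST-1001",), ("CUST-2002",), ("CUST-3003",)],
)
ids = load_identifiers(conn, "Customers", "Customer ID")

documents = {
    "volume1/invoice.txt": "Invoice for CUST-1001, due in 30 days.",
    "volume2/notes.txt": "Follow up with CUST-2002 and CUST-1001.",
    "bucket/readme.txt": "No customer references here.",
}
matches = find_matches(ids, documents)
print(matches)
```

In this sketch, two of the three identifiers are found, in two of the three files, which mirrors the situation in the diagram: the column values act as a custom dictionary of personal data.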

Steps

You must have added at least one database server to Cloud Data Sense before you can add data fusion sources.

  1. In the Configuration page, click Manage Data Fusion for the database where the source data resides.

    A screenshot of selecting the Manage Data Fusion button to add a source column.

  2. Click Add Data Fusion source on the next page.

  3. In the Add Data Fusion Source page:

    1. Select the Database Schema from the drop-down menu.

    2. Enter the Table name in that schema.

    3. Enter the Column, or Columns, that contain the unique identifiers you want to use.

      When adding multiple columns, enter each column name, or table view name, on a separate line.

      A screenshot of identifying the schema

  4. Click Add Data Fusion Source.

    The Data Fusion inventory page displays the database source columns that you have configured for Cloud Data Sense to scan.

    A screenshot of all the data source references you have configured with Data Fusion.
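
Before you enter columns in the steps above, it can help to confirm that each candidate column really holds unique identifiers. The following is a minimal sketch of such a check, run against your own database outside of Cloud Data Sense. It uses an in-memory SQLite database as a stand-in, and the table contents and the helpers `parse_column_lines` and `column_profile` are hypothetical names introduced only for this example.

```python
import sqlite3

def parse_column_lines(raw):
    """Split a one-name-per-line entry into column names, skipping blank lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]

def column_profile(conn, table, column):
    """Return (total_rows, non_null_values, distinct_values) for one column."""
    # Names are interpolated because SQL parameters cannot bind identifiers;
    # use this only with trusted table and column names.
    query = (
        f'SELECT COUNT(*), COUNT("{column}"), COUNT(DISTINCT "{column}") '
        f'FROM "{table}"'
    )
    return conn.execute(query).fetchone()

# Illustrative data only: one duplicate and one missing Customer ID.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Customers ("Customer ID" TEXT, "Email" TEXT)')
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        ("CUST-1001", "a@example.com"),
        ("CUST-2002", "b@example.com"),
        ("CUST-2002", "c@example.com"),
        (None, "d@example.com"),
    ],
)

columns = parse_column_lines("Customer ID\n\n  Email  \n")
for column in columns:
    total, non_null, distinct = column_profile(conn, "Customers", column)
    print(column, total, non_null, distinct)
```

A column whose distinct count is well below its row count, or that contains many empty values, is probably a weak choice as an identifier source.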

Results

After the next scan, the results include this new information in the Dashboard under the "Personal" results section and in the Investigation page under the "Personal Data" filter. Each source column you added appears in the filter list as "Table.Column", for example, Customers.Customer ID.
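
As a small illustration of the "Table.Column" naming described above, the following sketch builds the labels you would expect to see for a set of configured source columns. The helper `filter_labels` and the `Email` column are hypothetical; only the Customers.Customer ID example comes from this page.

```python
def filter_labels(sources):
    """Build 'Table.Column' labels from (table, column) pairs, keeping their order."""
    return [f"{table}.{column}" for table, column in sources]

# Illustrative source columns; "Email" is an assumed second column.
labels = filter_labels([("Customers", "Customer ID"), ("Customers", "Email")])
print(labels)  # ['Customers.Customer ID', 'Customers.Email']
```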

Deleting a Data Fusion source

If you no longer want to scan your files using a particular Data Fusion source, select the source row from the Data Fusion inventory page and click Delete Data Fusion Source.

A screenshot showing how to remove a Data Fusion source.