Add personal data identifiers to your BlueXP classification scans

Contributors netapp-tonacki amgrissino

BlueXP classification provides several ways to add a custom list of "personal data" that it will identify in future scans, giving you a full picture of where potentially sensitive data resides across your organization's files.

  • You can add unique identifiers based on specific columns in databases you are scanning.

  • You can add custom keywords from a text file — these words are identified within your data.

  • You can add a personal pattern using a regular expression (regex) — the regex is added to the existing predefined patterns.

  • You can add custom categories to identify where specific categories of information are found in your data.

All of these mechanisms to add custom scanning criteria are supported in all languages.

Note The capabilities described in this section are available only if you have chosen to perform a full classification scan on your data sources. Data sources that have had a mapping-only scan do not show file-level details.

Add custom personal data identifiers from your databases

A feature we call Data Fusion allows you to scan your organization's data to identify whether unique identifiers from your databases are found in any of your other data sources. You can choose the additional identifiers that BlueXP classification will look for in its scans by selecting a specific column, or columns, in a database table. For example, the diagram below shows how Data Fusion is used to scan your volumes, buckets, and databases for occurrences of all your Customer IDs from your Oracle database.

A diagram showing how content from your databases can be used as a source to identify files that contain the same data.

As you can see, two unique Customer IDs have been found in two volumes and in one S3 bucket. Any matches in database tables will also be identified.

Note that since you're scanning your own databases, whatever language your data is stored in will be used to identify data in future BlueXP classification scans.
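Conceptually, Data Fusion treats the values in the selected column as a set of identifiers and reports where those values occur in other data sources. The Python sketch below illustrates that idea only; it is not how BlueXP classification is implemented. The function name `find_identifier_matches`, the sample Customer IDs, and the in-memory "files" are all hypothetical.

```python
import re

def find_identifier_matches(identifiers, documents):
    """Return {document name: sorted identifiers found in it}.

    identifiers: values taken from a database column (hypothetical sample).
    documents: mapping of document name -> text content.
    """
    matches = {}
    for name, text in documents.items():
        # Tokenize on non-alphanumeric characters so only whole values match
        tokens = set(re.findall(r"[A-Za-z0-9]+", text))
        found = sorted(tokens & set(identifiers))
        if found:
            matches[name] = found
    return matches

# Hypothetical identifiers, standing in for a Customers.CustomerID column
customer_ids = {"C100234", "C100987"}
documents = {
    "volume1/orders.txt": "Order for customer C100234 shipped.",
    "volume2/notes.txt": "Follow up with C100987 and C100234.",
    "bucket/readme.txt": "No identifiers here.",
}
result = find_identifier_matches(customer_ids, documents)
```

In this sketch, `result` lists only the two documents that contain at least one Customer ID; the third document is omitted because nothing matched.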

Steps

You must have added at least one database server to BlueXP classification before you can add data fusion sources.

  1. In the Configuration page, click Manage Data Fusion in the database where the source data resides.

    A screenshot of selecting the Manage Data Fusion button to add a source column.

  2. Click Add Data Fusion source on the next page.

  3. In the Add Data Fusion Source page:

    1. Select the Database Schema from the drop-down menu.

    2. Enter the Table name in that schema.

    3. Enter the Column, or Columns, that contain the unique identifiers you want to use.

      When adding multiple columns, enter each column name, or table view name, on a separate line.

      A screenshot of identifying the schema

  4. Click Add Data Fusion Source.

    A screenshot of all the data source references you have configured with Data Fusion.

Results

After the next scan, the results will include this new information in the Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter. The name you used for the classifier appears in the filter list, for example Customers.CustomerID.

A screenshot showing an example of Data Fusion results in the Investigation Results pane.

Delete a Data Fusion source

If at some point you decide not to scan your files using a certain Data Fusion source, you can select the source row from the Data Fusion inventory page and click Delete Data Fusion Source.

A screenshot showing how to remove a data fusion source.

Add custom keywords from a list of words

You can add custom keywords to BlueXP classification so that it will identify where that information is found in your data. You add the keywords just by entering each word you want BlueXP classification to recognize. The keywords are added to the existing predefined keywords that BlueXP classification already uses, and the results will be visible under the personal patterns section.

For example, you may want to see where internal Product Names are mentioned in all of your files to make sure these names are not accessible in locations that are not secure.

After updating the custom keywords, BlueXP classification will restart scanning all data sources. After the scan has completed, the new results will appear in the BlueXP classification Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter.
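To picture what a custom keyword classifier does, the following Python sketch checks a piece of text for a list of keywords without regard to case. It is an illustration under assumptions, not the product's implementation: the function `find_keywords`, the whole-word matching rule, and the sample product names are all hypothetical.

```python
import re

def find_keywords(keywords, text):
    """Return the keywords that appear in text, ignoring case."""
    found = []
    for keyword in keywords:
        # re.escape treats the keyword literally; \b limits matches to whole words
        # (whole-word matching is an assumption made for this sketch)
        pattern = r"\b" + re.escape(keyword) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(keyword)
    return found

# Hypothetical internal product names, one per line as in the wizard
keywords = ["Barn Owl", "Snowy Owl", "Screech Owl"]
text = "Launch plan for the SNOWY OWL release, replacing barn owl."
hits = find_keywords(keywords, text)
```

Here `hits` contains "Barn Owl" and "Snowy Owl" even though the text uses different capitalization.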

Steps
  1. From the Classification settings tab, click Add New Classifier to launch the Add Custom Classifier wizard.

    A screenshot showing how to launch the Add Custom Classifier wizard.

  2. In the Select type page, enter the name of the classifier, provide a brief description, select Personal identifier, and then click Next.

    The name you enter will appear in the BlueXP classification UI as the heading for scanned files that match the classifier requirements, and as the name of the filter in the Investigation page.

    You can also check the box to "Mask detected results in the system" so the full result won't appear in the UI. For example, you may want to do this to hide full credit card numbers or similar personal data (the mask would appear in the UI like this: "**** **** **** 3434").

    A screenshot showing how to name the classifier and select the type of classifier.

  3. In the Select Data Analysis Tool page, select Custom keywords as the method you want to use to define the classifier, and then click Next.

    A screenshot showing the selection of Custom keywords as the tool that BlueXP classification will use to build the pattern.

  4. In the Create Logic page, enter the keywords you want to recognize - each word on a separate line - and click Validate.

    The screenshot below shows internal Product Names (different types of owls). The BlueXP classification search for these items is not case sensitive.

    A screenshot of entering the keywords for your custom classifier.

  5. Click Done and BlueXP classification starts to rescan your data.
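The masking option described in step 2 can be pictured with a short Python sketch. This only mimics the visible effect (all but the last four digits hidden); the function `mask_value` and its parameters are hypothetical, and the product's actual masking rules may differ.

```python
def mask_value(value, visible=4, mask_char="*"):
    """Mask every digit except the last `visible` ones, keeping separators."""
    digits_total = sum(ch.isdigit() for ch in value)
    digits_to_mask = max(digits_total - visible, 0)
    masked = []
    for ch in value:
        if ch.isdigit() and digits_to_mask > 0:
            # Replace leading digits until only `visible` digits remain
            masked.append(mask_char)
            digits_to_mask -= 1
        else:
            masked.append(ch)
    return "".join(masked)

masked = mask_value("1234 5678 9012 3434")  # "**** **** **** 3434"
```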

Results

After the scan is complete, the results will include this new information in the Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter.

A screenshot showing an example of custom keyword results in the Investigation Results pane.

As you can see, the name of the classifier is used as the name in the Personal Results panel. In this manner you can activate many different groups of keywords and see the results for each group.

Add custom personal data identifiers using a regex

You can add a personal pattern to identify specific information in your data using a custom regular expression (regex). This allows you to create a new custom regex to identify new personal information elements that don't yet exist in the system. The regex is added to the existing predefined patterns that BlueXP classification already uses, and the results will be visible under the personal patterns section.

For example, you may want to see where your internal Product IDs are mentioned in all of your files. If the Product ID has a clear structure, for example, it is a 12-digit number that starts with 201, you can use the custom regex feature to search for it in your files. The regular expression for this example is \b201\d{9}\b.

After adding the regex, BlueXP classification will restart scanning all data sources. After the scan has completed, the new results will appear in the BlueXP classification Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter.

See https://regex101.com/ if you need assistance in building the regular expression you need. Choose Python for the Flavor to see the types of results BlueXP classification will match from the regular expression.
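You can also try the example expression locally before entering it in the wizard. The following Python sketch uses the standard `re` module with no pattern flags; the sample text and the variable names are made up for illustration.

```python
import re

# The Product ID pattern from the example above: 12 digits starting with 201
PRODUCT_ID = re.compile(r"\b201\d{9}\b")

sample = (
    "Valid: 201123456789. "
    "Too short: 20112345678. "
    "Wrong prefix: 301123456789. "
    "Too long: 2011234567890."
)
matches = PRODUCT_ID.findall(sample)
```

Only the first value matches. The `\b` word boundaries reject the 13-digit number, which would otherwise match on its first 12 digits.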

Note Pattern flags are not currently supported when creating a regex. This means you should not use "/" in the expression.

Steps
  1. From the Classification settings tab, click Add New Classifier to launch the Add Custom Classifier wizard.

    A screenshot showing how to launch the Add Custom Classifier wizard.

  2. In the Select type page, enter the name of the classifier, provide a brief description, select Personal identifier, and then click Next.

    The name you enter will appear in the BlueXP classification UI as the heading for scanned files that match the classifier requirements, and as the name of the filter in the Investigation page. You can also check the box to "Mask detected results in the system" so the full result won't appear in the UI. For example, you may want to do this to hide full credit card numbers or similar personal data.

    A screenshot showing how to name the classifier and select the type of classifier.

  3. In the Select Data Analysis Tool page, select Custom regular expression as the method you want to use to define the classifier, and then click Next.

    A screenshot showing the selection of Custom regular expression as the tool that BlueXP classification will use to build the pattern.

  4. In the Create Logic page, enter the regular expression and any proximity words, and click Done.

    1. You can enter any legal regular expression. Click the Validate button to have BlueXP classification verify that the regular expression is valid and that it is not too broad (an expression that is too broad would return too many results).

    2. Optionally, you can enter some proximity words to help refine the accuracy of the results. These are words that will typically be found within 300 characters of the pattern you are searching for (either before or after the found pattern). Enter each word, or phrase, on a separate line.

      A screenshot of entering the regex and proximity words for your custom classifier.
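One plausible way to think about proximity words is as a filter on regex matches: a match is kept only when one of the words appears within 300 characters before or after it. The Python sketch below follows that reading. It is an assumption about the behavior, not a description of the product's logic, and the function `matches_with_proximity` and the sample text are hypothetical.

```python
import re

def matches_with_proximity(pattern, proximity_words, text, window=300):
    """Return regex matches that have a proximity word within `window` characters."""
    regex = re.compile(pattern)
    words = [w.lower() for w in proximity_words]
    kept = []
    for m in regex.finditer(text):
        # Context: up to `window` characters before and after the match
        start = max(m.start() - window, 0)
        end = min(m.end() + window, len(text))
        context = (text[start:m.start()] + " " + text[m.end():end]).lower()
        # Simple substring check; a real implementation may tokenize differently
        if any(w in context for w in words):
            kept.append(m.group())
    return kept

text = (
    "Product ID 201123456789 was recalled. "
    + "x" * 400
    + " Reference number 201987654321 appears here."
)
kept = matches_with_proximity(r"\b201\d{9}\b", ["product id", "sku"], text)
```

In this sample, the first number is kept because "Product ID" appears just before it. The second number is dropped because no proximity word occurs within 300 characters of it.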

Results

The classifier is added and BlueXP classification starts to rescan all your data sources. You are returned to the Custom Classifiers page where you can view the number of files that have matched your new classifier. Results from scanning all of your data sources will take some time depending on the number of files that need to be scanned.

A screenshot showing the results of a new regex classifier being added to the system with scanning in progress.

Add custom categories

BlueXP classification takes the data that it scans and divides it into different types of categories. Categories are topics based on artificial intelligence analysis of the content and metadata of each file. See the list of predefined categories.

Categories can help you understand what's happening with your data by showing you the types of information that you have. For example, a category like resumes or employee contracts may include sensitive data. When you investigate the results, you might find that employee contracts are stored in an insecure location. You can then correct that issue.

You can add custom categories to BlueXP classification so you can identify where categories of information that are unique to your data estate are found in your data. You add each category by creating "training" files that contain the categories of data that you want to identify, and then having BlueXP classification scan those files to "learn" through AI so that it can identify that data in your data sources. The categories are added to the existing predefined categories that BlueXP classification already identifies, and the results are visible under the Categories section.

For example, you may want to see where compressed installation files in .gz format are located in your files so that you can remove them, if necessary.

After updating the custom categories, BlueXP classification will restart scanning all data sources. After the scan has completed, the new results will appear in the BlueXP classification Compliance Dashboard under the "Categories" section, and in the Investigation page in the "Category" filter. See how to view files by categories.

What you'll need

You'll need to create a minimum of 25 training files that contain samples of the categories of data that you want BlueXP classification to recognize. The following file types are supported:

.CSV, .DOC, .DOCX, .GZ, .JSON, .PDF, .PPTX, .RTF, .TXT, .XLS, .XLSX, Docs, Sheets, and Slides

The files must be a minimum of 100 bytes, and they must be located in a folder that is accessible by BlueXP classification.
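Before you start the wizard, you might want to confirm that a folder of training files meets these requirements. The following Python sketch is one way to do that with the standard library. The function names are hypothetical, and the check covers only the extension-based file types in the list above (Docs, Sheets, and Slides are not handled).

```python
from pathlib import Path

# Extension-based file types from the supported list above
SUPPORTED_EXTENSIONS = {
    ".csv", ".doc", ".docx", ".gz", ".json", ".pdf",
    ".pptx", ".rtf", ".txt", ".xls", ".xlsx",
}
MIN_FILES = 25        # minimum number of training files
MIN_SIZE_BYTES = 100  # minimum size of each training file

def check_training_folder(folder):
    """Return (usable, rejected) for a folder of candidate training files."""
    usable, rejected = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subfolders
        if path.suffix.lower() not in SUPPORTED_EXTENSIONS:
            rejected.append((path.name, "unsupported file type"))
        elif path.stat().st_size < MIN_SIZE_BYTES:
            rejected.append((path.name, "smaller than 100 bytes"))
        else:
            usable.append(path.name)
    return usable, rejected

def has_enough_files(usable):
    """True when the folder holds at least the minimum number of usable files."""
    return len(usable) >= MIN_FILES
```

Call `check_training_folder` with the path to your training folder, review the rejected entries, and then pass the usable list to `has_enough_files`.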

Steps
  1. From the Classification settings tab, click Add New Classifier to launch the Add Custom Classifier wizard.

    A screenshot showing how to launch the Add Custom Classifier wizard.

  2. In the Select type page, enter the name of the classifier, provide a brief description, select Category, and then click Next.

    The name you enter will appear in the BlueXP classification UI as the heading for scanned files that match the category of data you are defining, and as the name of the filter in the Investigation page.

    A screenshot showing how to name the classifier and select the type of classifier.

  3. In the Create Logic page, make sure you have the learning files prepared, and then click Select files.

    A screenshot of the Create Logic page where you add the files that contain data that you want BlueXP classification to learn from.

  4. Enter the IP address of the volume, and the path where the training files are located, and click Add.

    A screenshot showing how to enter the location of the training files.

  5. Verify that the training files were recognized by BlueXP classification. Click the x to remove any training files that do not meet the requirements. Then click Done.

    A screenshot showing the files that BlueXP classification will use as training files that define the new category.

Results

The new category is created as defined by the training files and added to BlueXP classification. Then BlueXP classification starts to rescan all your data sources to identify files that fit into this new category. You are returned to the Custom Classifiers page where you can view the number of files that have matched your new category. Results from scanning all of your data sources will take some time depending on the number of files that need to be scanned.

View results from your custom classifiers

You can view the results from any of your custom classifiers in the Compliance Dashboard and in the Investigation page. For example, this screenshot shows the matched information in the Compliance Dashboard under the "Personal Results" section.

A screenshot showing an example of custom regex results in the Investigation Results pane.

Click the circle with an arrow button to see the detailed results in the Investigation page.

Additionally, all of your custom classifier results appear in the Custom Classifiers tab, and the top 6 custom classifier results are displayed in the Compliance Dashboard, as shown below.

A screenshot showing the top 3 custom classifiers based on returned results.

Manage custom classifiers

You can change any of the custom classifiers that you have created by using the Edit Classifier button.

Tip You can't edit Data Fusion classifiers at this time.

If you later decide that you don't need BlueXP classification to identify the custom patterns that you added, you can use the Delete Classifier button to remove each item.

A screenshot of the Custom Classifiers page with the buttons to edit and delete a classifier.