Add personal data identifiers to your BlueXP classification scans
BlueXP classification provides several ways for you to add a custom list of "personal data" that BlueXP classification will identify in future scans, giving you the full picture of where potentially sensitive data resides in all of your organization's files.
NOTE This information is relevant only for BlueXP classification legacy versions 1.30 and earlier.
-
You can add unique identifiers based on specific columns in databases you are scanning.
-
You can add custom keywords from a text file; BlueXP classification identifies these words within your data.
-
You can add a personal pattern using a regular expression (regex); the regex is added to the existing predefined patterns.
-
You can add custom categories to identify where specific categories of information are found in your data.
All of these mechanisms for adding custom scanning criteria are supported in all languages.
The capabilities described in this section are available only if you have chosen to perform a full classification scan on your data sources. Data sources that have had a mapping-only scan do not show file-level details.
Add custom personal data identifiers from your databases
A feature we call Data Fusion allows you to scan your organization's data to identify whether unique identifiers from your databases are found in any of your other data sources. You choose the additional identifiers that BlueXP classification looks for in its scans by selecting a specific column, or columns, in a database table. For example, the diagram below shows how Data Fusion is used to scan your volumes, buckets, and databases for occurrences of all your Customer IDs from your Oracle database.
As you can see, two unique Customer IDs have been found in two volumes and in one S3 bucket. Any matches in database tables will also be identified.
Note that because you're scanning your own databases, whatever language your data is stored in is the language used to identify that data in future BlueXP classification scans.
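Conceptually, Data Fusion turns the values of a database column into a list of identifiers and then looks for those values in your other data. The Python sketch below illustrates that idea only; it is not how BlueXP classification is implemented. It assumes a hypothetical in-memory SQLite table named Customers with a CustomerID column (standing in for the Oracle table in the diagram) and a hypothetical dictionary of documents standing in for files in volumes and buckets, and it treats an identifier as found only when it appears as a whole token.

```python
import re
import sqlite3

def load_identifiers(conn, table, columns):
    """Collect the distinct, non-empty values from the selected columns."""
    ids = set()
    for column in columns:
        # Names are interpolated only because this is a local sketch;
        # never build SQL this way from untrusted input.
        query = f'SELECT DISTINCT "{column}" FROM "{table}"'
        for (value,) in conn.execute(query):
            if value is not None and str(value).strip():
                ids.add(str(value).strip())
    return ids

def find_matches(documents, identifiers):
    """Return {document name: sorted identifiers found as whole tokens}."""
    if not identifiers:
        return {}
    # Try longer identifiers first so overlapping values prefer the longer one.
    alternatives = sorted(identifiers, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    results = {}
    for name, text in documents.items():
        found = set(pattern.findall(text))
        if found:
            results[name] = sorted(found)
    return results

# Hypothetical source table with the unique identifiers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID TEXT, Name TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [("CUST-1001", "Alice"), ("CUST-1002", "Bob"), ("CUST-1003", "Carol")],
)
ids = load_identifiers(conn, "Customers", ["CustomerID"])

# Hypothetical files in other data sources.
docs = {
    "volume1/report.txt": "Refund issued to CUST-1001 on Monday.",
    "volume2/notes.txt": "Escalation for CUST-1002 and CUST-1001.",
    "bucket/readme.txt": "No customer references here.",
}
result = find_matches(docs, ids)
print(result)
```

Passing several column names to load_identifiers mirrors entering multiple columns in the Add Data Fusion Source page: all of their values are pooled into one set of identifiers.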
You must have added at least one database server to BlueXP classification before you can add data fusion sources.
-
In the Configuration page, click Manage Data Fusion in the database where the source data resides.
-
Click Add Data Fusion source on the next page.
-
In the Add Data Fusion Source page:
-
Select the Database Schema from the drop-down menu.
-
Enter the Table name in that schema.
-
Enter the Column, or Columns, that contain the unique identifiers you want to use.
When adding multiple columns, enter each column name, or table view name, on a separate line.
-
Click Add Data Fusion Source.
After the next scan, the results will include this new information in the Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter. The name you used for the classifier appears in the filter list, for example Customers.CustomerID.
Delete a Data Fusion source
If at some point you decide not to scan your files using a certain Data Fusion source, you can select the source row from the Data Fusion inventory page and click Delete Data Fusion Source.
Add custom keywords from a list of words
You can add custom keywords to BlueXP classification so that it will identify where that information is found in your data. You add the keywords just by entering each word you want BlueXP classification to recognize. The keywords are added to the existing predefined keywords that BlueXP classification already uses, and the results will be visible under the personal patterns section.
For example, you may want to see where internal Product Names are mentioned in all of your files to make sure these names are not accessible in locations that are not secure.
After updating the custom keywords, BlueXP classification will restart scanning all data sources. After the scan has completed, the new results will appear in the BlueXP classification Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter.
-
From the Classification settings tab, click Add New Classifier to launch the Add Custom Classifier wizard.
-
In the Select type page, enter the name of the classifier, provide a brief description, select Personal identifier, and then click Next.
The name you enter will appear in the BlueXP classification UI as the heading for scanned files that match the classifier requirements, and as the name of the filter in the Investigation page.
You can also check the box to "Mask detected results in the system" so the full result won't appear in the UI. For example, you may want to do this to hide full credit card numbers or similar personal data (the mask would appear in the UI like this: "**** **** **** 3434").
-
In the Select Data Analysis Tool page, select Custom keywords as the method you want to use to define the classifier, and then click Next.
-
In the Create Logic page, enter the keywords you want to recognize, each word on a separate line, and click Validate.
The screenshot below shows internal Product Names (different types of owls). The BlueXP classification search for these items is not case sensitive.
-
Click Done and BlueXP classification starts to rescan your data.
After the scan is complete, the results will include this new information in the Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter.
As you can see, the name of the classifier is used as the name in the Personal Results panel. In this manner you can activate many different groups of keywords and see the results for each group.
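To make the behavior concrete, the Python sketch below treats each keyword classifier as a group of newline-separated words and counts how many documents each group matches, ignoring case as described above. It is an illustration only, not BlueXP classification's implementation; the classifier names, keywords, and documents are hypothetical, and matching whole words only (rather than any substring) is an assumption of the sketch.

```python
import re

def build_keyword_classifier(keywords):
    """Compile newline-separated keywords into one case-insensitive pattern."""
    words = [w.strip() for w in keywords.splitlines() if w.strip()]
    if not words:
        raise ValueError("enter at least one keyword")
    # Longest keywords first so a multi-word entry wins over a shorter overlap.
    words.sort(key=len, reverse=True)
    # Whole-word matching is an assumption of this sketch.
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b",
                      re.IGNORECASE)

def count_hits(classifiers, documents):
    """Return {classifier name: number of documents with at least one keyword}."""
    return {
        name: sum(1 for text in documents.values() if pattern.search(text))
        for name, pattern in classifiers.items()
    }

# Hypothetical keyword groups, each one a separate custom classifier.
classifiers = {
    "Product Names": build_keyword_classifier("Barn Owl\nSnowy\nTawny\n"),
    "Project Codes": build_keyword_classifier("Falcon\nKestrel"),
}
documents = {
    "a.txt": "The BARN OWL launch is delayed.",
    "b.txt": "snowy roadmap draft",
    "c.txt": "Kestrel budget review",
    "d.txt": "Nothing relevant here.",
}
hits = count_hits(classifiers, documents)
print(hits)
```

Keeping one compiled pattern per classifier mirrors how each classifier name gets its own entry in the Personal Results panel.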
Add custom personal data identifiers using a regex
You can add a personal pattern to identify specific information in your data using a custom regular expression (regex). This allows you to create a new custom regex to identify new personal information elements that don't yet exist in the system. The regex is added to the existing predefined patterns that BlueXP classification already uses, and the results will be visible under the personal patterns section.
For example, you may want to see where your internal Product IDs are mentioned in all of your files. If the Product ID has a clear structure, for example, it is a 12-digit number that starts with 201, you can use the custom regex feature to search for it in your files. The regular expression for this example is \b201\d{9}\b.
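Because the Python flavor is the one that reflects what BlueXP classification matches, a quick way to check the example pattern is Python's re module. The sample strings below are hypothetical; they show that \b201\d{9}\b accepts exactly 12 digits starting with 201 and rejects IDs that are too short, too long, or have a different prefix.

```python
import re

PRODUCT_ID = re.compile(r"\b201\d{9}\b")

samples = [
    "Order line: product 201123456789 shipped",  # 12 digits, starts with 201
    "Legacy ID 2011234567 is too short",          # only 10 digits
    "Serial 2011234567890 is too long",           # 13 digits
    "Code 301123456789 has the wrong prefix",     # starts with 301
    "SKU-201987654321",                           # "-" counts as a word boundary
]
matches = [PRODUCT_ID.findall(text) for text in samples]
for text, found in zip(samples, matches):
    print(found, "<-", text)
```

Note that \b only treats non-word characters as boundaries, so an ID glued to letters, digits, or an underscore (for example "ID_201123456789") is not matched.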
After adding the regex, BlueXP classification will restart scanning all data sources. After the scan has completed, the new results will appear in the BlueXP classification Compliance Dashboard under the "Personal Results" section, and in the Investigation page in the "Personal Data" filter.
If you need help building the regular expression, refer to Regular expressions 101. Choose Python for the Flavor to see the types of results BlueXP classification will match from the regular expression. The Python Regex Tester page is also useful because it displays a graphical representation of your patterns.
Currently we do not allow the use of pattern flags when creating a regex, which means you should not use "/".
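This restriction is easy to trip over if you are used to regex literals in languages such as JavaScript or Perl, where a pattern is written as /pattern/flags. The Python sketch below (sample text and the "PN-" pattern are hypothetical) shows what happens in a Python-flavor regex: the slashes and a trailing flag letter become ordinary characters that must literally appear in the data. One flag-free way to get case-insensitive matching for a short token is a character class such as [Pp].

```python
import re

text = "Product 201123456789 is in stock"

# Enter only the pattern itself.
plain = re.compile(r"\b201\d{9}\b")

# Literal-style delimiters plus a flag letter. In a Python-flavor regex the
# slashes and the "i" are ordinary characters, so this pattern now requires
# "/" before the ID and "/i" after it.
delimited = re.compile(r"/\b201\d{9}\b/i")

# Without flags, spell out case variants with character classes instead.
either_case = re.compile(r"\b[Pp][Nn]-\d{4}\b")

print(bool(plain.search(text)))                 # True
print(bool(delimited.search(text)))             # False
print(bool(either_case.search("ref pn-0042")))  # True
```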
-
From the Classification settings tab, click Add New Classifier to launch the Add Custom Classifier wizard.
-
In the Select type page, enter the name of the classifier, provide a brief description, select Personal identifier, and then click Next.
The name you enter will appear in the BlueXP classification UI as the heading for scanned files that match the classifier requirements, and as the name of the filter in the Investigation page. You can also check the box to "Mask detected results in the system" so the full result won't appear in the UI. For example, you may want to do this to hide full credit card numbers or similar personal data.
-
In the Select Data Analysis Tool page, select Custom regular expression as the method you want to use to define the classifier, and then click Next.
-
In the Create Logic page, enter the regular expression and any proximity words, and click Done.
-
You can enter any legal regular expression. Click the Validate button to have BlueXP classification verify that the regular expression is valid and that it is not too broad, meaning that it would return too many results.
-
Optionally, you can enter some proximity words to help refine the accuracy of the results. These are words that will typically be found within 300 characters of the pattern you are searching for (either before or after the found pattern). Enter each word, or phrase, on a separate line.
-
The classifier is added and BlueXP classification starts to rescan all your data sources. You are returned to the Custom Classifiers page where you can view the number of files that have matched your new classifier. Scanning all of your data sources can take some time, depending on the number of files that need to be scanned.
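The proximity-word option described in the steps above can be pictured as a window check around each regex match. The Python sketch below is an illustrative approximation, not BlueXP classification's actual scoring: it reports, for every match, whether any proximity word or phrase occurs within 300 characters before or after it. Case-insensitive substring comparison of the proximity words is an assumption of the sketch, and the function name and sample text are hypothetical.

```python
import re

WINDOW = 300  # characters checked before and after each match

def find_with_proximity(pattern, proximity_words, text, window=WINDOW):
    """Return (matched text, proximity word nearby?) for each regex match."""
    # Case-insensitive comparison is an assumption of this sketch.
    words = [w.strip().lower() for w in proximity_words if w.strip()]
    results = []
    for m in re.finditer(pattern, text):
        # Look only at the text within `window` characters of the match.
        before = text[max(0, m.start() - window):m.start()].lower()
        after = text[m.end():m.end() + window].lower()
        nearby = any(w in before or w in after for w in words)
        results.append((m.group(), nearby))
    return results

text = ("Product ID: 201123456789 ships today."
        + " " * 400
        + "Random number 201555555555 in a log file.")
results = find_with_proximity(r"\b201\d{9}\b", ["Product ID", "SKU"], text)
print(results)
```

Both numbers match the pattern, but only the first has a proximity word nearby, which is the kind of distinction proximity words are meant to help with.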
Add custom categories
BlueXP classification takes the data that it scans and divides it into different types of categories. Categories are topics based on artificial intelligence analysis of the content and metadata of each file. See the list of predefined categories.
Categories can help you understand what's happening with your data by showing you the types of information that you have. For example, a category like resumes or employee contracts may include sensitive data. When you investigate the results, you might find that employee contracts are stored in an insecure location. You can then correct that issue.
You can add custom categories to BlueXP classification so you can identify where categories of information that are unique for your data estate are found in your data. You add each category by creating "training" files that contain the categories of data that you want to identify, and then have BlueXP classification scan those files to "learn" through AI so that it can identify that data in your data sources. The categories are added to the existing predefined categories that BlueXP classification already identifies, and the results are visible under the Categories section.
For example, you may want to see where compressed installation files in .gz format are located in your files so that you can remove them, if necessary.
After updating the custom categories, BlueXP classification will restart scanning all data sources. After the scan has completed, the new results will appear in the BlueXP classification Compliance Dashboard under the "Categories" section, and in the Investigation page in the "Category" filter. See how to view files by categories.
You'll need to create a minimum of 25 training files that contain samples of the categories of data that you want BlueXP classification to recognize. The following file types are supported:
.CSV, .DOC, .DOCX, .GZ, .JSON, .PDF, .PPTX, .RTF, .TXT, .XLS, .XLSX, Docs, Sheets, and Slides
The files must be a minimum of 100 bytes, and they must be located in a folder that is accessible by BlueXP classification.
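Before you start the wizard, it can help to check a training folder against these requirements locally. The Python sketch below is a hypothetical pre-check, not part of BlueXP classification: it accepts files with the supported extensions that are at least 100 bytes and reports whether at least 25 usable files remain. Google Docs, Sheets, and Slides have no local file extension, so the sketch does not check them, and the sample folder it builds is made up.

```python
import tempfile
from pathlib import Path

# Extension-based types from the supported list above.
SUPPORTED = {".csv", ".doc", ".docx", ".gz", ".json", ".pdf", ".pptx",
             ".rtf", ".txt", ".xls", ".xlsx"}
MIN_FILES = 25
MIN_BYTES = 100

def check_training_folder(folder):
    """Split files into usable and rejected, and report if there are enough."""
    usable, rejected = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        if path.suffix.lower() not in SUPPORTED:
            rejected.append((path.name, "unsupported type"))
        elif path.stat().st_size < MIN_BYTES:
            rejected.append((path.name, "smaller than 100 bytes"))
        else:
            usable.append(path.name)
    return usable, rejected, len(usable) >= MIN_FILES

# Hypothetical folder: 25 valid samples, one tiny file, one unsupported file.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(25):
        Path(tmp, f"sample{i:02d}.txt").write_text("x" * 120)
    Path(tmp, "tiny.txt").write_text("too small")
    Path(tmp, "image.png").write_bytes(b"\x89PNG" + b"\0" * 200)
    usable, rejected, enough = check_training_folder(tmp)

print(len(usable), rejected, enough)
```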
-
From the Classification settings tab, click Add New Classifier to launch the Add Custom Classifier wizard.
-
In the Select type page, enter the name of the classifier, provide a brief description, select Category, and then click Next.
The name you enter will appear in the BlueXP classification UI as the heading for scanned files that match the category of data you are defining, and as the name of the filter in the Investigation page.
-
In the Create Logic page, make sure you have the training files prepared, and then click Select files.
-
Enter the IP address of the volume, and the path where the training files are located, and click Add.
-
Verify that the training files were recognized by BlueXP classification. Click the x to remove any training files that do not meet the requirements. Then click Done.
The new category is created as defined by the training files and added to BlueXP classification. Then BlueXP classification starts to rescan all your data sources to identify files that fit into this new category. You are returned to the Custom Classifiers page where you can view the number of files that have matched your new category. Scanning all of your data sources can take some time, depending on the number of files that need to be scanned.
View results from your custom classifiers
You can view the results from any of your custom classifiers in the Compliance Dashboard and in the Investigation page. For example, this screenshot shows the matched information in the Compliance Dashboard under the "Personal Results" section.
Click the button to see the detailed results in the Investigation page.
Additionally, all of your custom classifier results appear in the Custom Classifiers tab, and the top 6 custom classifier results are displayed in the Compliance Dashboard, as shown below.
Manage custom classifiers
You can change any of the custom classifiers that you have created by using the Edit Classifier button.
You can't edit Data Fusion classifiers at this time.
If you later decide that you don't need BlueXP classification to identify the custom patterns that you added, you can use the Delete Classifier button to remove each item.