BlueXP classification

Investigate the data stored in your organization

Contributors netapp-tonacki amgrissino

You can investigate the data from your organization by viewing details in the Data Investigation page. You can navigate to this page from many areas of the BlueXP classification UI, including the Governance and Compliance dashboards.

Note The capabilities described in this section are available only if you have chosen to perform a full classification scan on your data sources. Data sources that have had a mapping-only scan do not show file-level details.

Filter data in the Data Investigation page

You can filter the contents of the investigation page to display only the results you want to see. After you've refined the data, you can use the button bar at the top of the page to perform a variety of actions, including copying files, moving files, adding a tag or AIP label to the files, and more.

If you want to download the contents of the page as a report after you've refined it, click the download button. See the Data Investigation Report section for details about the report.

A screenshot of the filters available when refining the results in the investigation page.

  • The top-level tabs enable you to view data from files (unstructured data), directories (folders and file shares), or from databases (structured data).

  • The controls at the top of each column enable you to sort the results in numerical or alphabetical order.

  • The left-pane filters enable you to refine the results by selecting the attributes described in the next sections.

Filter data by sensitivity and content

Use the following filters to view how much sensitive information is contained in your data.

Filter Details

Category

Select the types of categories.

Sensitivity Level

Select the sensitivity level: Personal, Sensitive personal, or Non sensitive.

Number of identifiers

Select the range of detected sensitive identifiers per file. Includes personal data and sensitive personal data. When filtering in Directories, BlueXP classification totals the matches from all files in each folder (and sub-folders).

NOTE: The December 2023 (version 1.26.6) release temporarily removed the option to count personally identifiable information (PII) by Directories.

Personal Data

Select the types of personal data.

Sensitive Personal Data

Select the types of sensitive personal data.

Data Subject

Enter a data subject's full name or known identifier. Learn more about data subjects here.
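
The directory roll-up described for the "Number of identifiers" filter can be pictured with a short Python sketch. It is illustrative only: the function name `directory_totals`, the sample paths, and the per-file counts are hypothetical, and the sketch does not reflect how BlueXP classification computes totals internally.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Dict


def directory_totals(file_counts: Dict[str, int]) -> Dict[str, int]:
    """Sum per-file identifier counts into every ancestor folder.

    file_counts maps a file path to the number of sensitive identifiers
    detected in that file (hypothetical input for this sketch).
    """
    totals: Counter = Counter()
    for file_path, count in file_counts.items():
        # .parents yields the containing folder, then each folder above it,
        # so a file's matches are counted in its folder and all parent folders.
        for folder in PurePosixPath(file_path).parents:
            totals[str(folder)] += count
    return dict(totals)


if __name__ == "__main__":
    sample = {
        "/vol/hr/a.txt": 3,
        "/vol/hr/old/b.txt": 2,
        "/vol/eng/c.txt": 1,
    }
    for folder, total in sorted(directory_totals(sample).items()):
        print(folder, total)
```

In this sketch a folder's total includes matches from files in its sub-folders, which mirrors the behavior described for the Directories tab.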

Filter data by user owner and user permissions

Use the following filters to view file owners and permissions to access your data.

Filter Details

Open Permissions

Select the type of permissions within the data and within folders/shares.

User / Group Permissions

Select one or multiple user names and/or group names, or enter a partial name.

File Owner

Enter the file owner name.

Number of users with access

Select one or multiple category ranges to show which files and folders are open to a certain number of users.

Filter data by time

Use the following filters to view data based on time criteria.

Filter Details

Created Time

Select a time range when the file was created. You can also specify a custom time range to further refine the search results.

Discovered Time

Select a time range when BlueXP classification discovered the file. You can also specify a custom time range to further refine the search results.

Last Modified

Select a time range when the file was last modified. You can also specify a custom time range to further refine the search results.

Last Accessed

Select a time range when the file, or directory (CIFS or NFS only), was last accessed. You can also specify a custom time range to further refine the search results. For the types of files that BlueXP classification scans, this is the last time BlueXP classification scanned the file.

Note that BlueXP classification does not extract the "last accessed time" from the following data sources: SharePoint Online, SharePoint On-premises (SharePoint Server), OneDrive, Google Drive, and Amazon S3.

Filter data by metadata

Use the following filters to view data based on location, size, and directory or file type.

Filter Details

File Path

Enter up to 20 partial or full paths that you want to include or exclude from the query. If you enter both include paths and exclude paths, BlueXP classification first finds all files in the included paths, then removes the files in the excluded paths, and then displays the results. Note that using "*" in this filter has no effect. Also note that you can't exclude specific folders from the scan: all the directories and files under a configured share are scanned.

Directory Type

Select the directory type: either "Share" or "Folder".

File Type

Select the types of files.

File Size

Select the file size range.

File Hash

Enter the file's hash to find a specific file, even if the name is different.
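
The include-then-exclude order described for the File Path filter can be sketched in Python as follows. This is a minimal illustration, not the product's implementation: the function name `filter_paths` is hypothetical, "partial path" is assumed to mean a case-sensitive substring match, and an empty include list is assumed to mean "all paths".

```python
from typing import Iterable, List


def filter_paths(paths: Iterable[str], include: List[str], exclude: List[str]) -> List[str]:
    """Apply include entries first, then remove paths matching exclude entries."""
    # Step 1: keep every path that contains at least one include entry.
    # Assumption: with no include entries, every path is a candidate.
    included = [
        path for path in paths
        if not include or any(entry in path for entry in include)
    ]
    # Step 2: drop any remaining path that contains an exclude entry.
    return [
        path for path in included
        if not any(entry in path for entry in exclude)
    ]


if __name__ == "__main__":
    sample_paths = [
        "/vol1/hr/salaries.xlsx",
        "/vol1/hr/archive/old.xlsx",
        "/vol1/eng/design.docx",
    ]
    print(filter_paths(sample_paths, include=["/vol1/hr"], exclude=["/archive"]))
```

Because exclusion runs after inclusion, a path that matches both an include entry and an exclude entry is left out of the results.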

Filter data by storage type

Use the following filters to view data by storage type.

Filter Details

Working Environment Type

Select the type of working environment. OneDrive, SharePoint, and Google Drive are categorized under "Apps".

Working Environment name

Select specific working environments.

Storage Repository

Select the storage repository, for example, a volume or a schema.

Filter data by tags, labels, assigned users, and policies

Use the following filters to view data by policies, AIP labels, tags, or assigned users.

Filter Details

Policies

Select a policy or policies. Go here to view the list of existing policies and to create your own custom policies.

Label

Select AIP labels that are assigned to your files.

Tags

Select the tag or tags that are assigned to your files.

Assigned To

Select the name of the person to whom the file is assigned.

Filter data by analysis status

Use the following filter to view data by the BlueXP classification scan status.

Filter Details

Analysis Status

Select an option to show the list of files that are Pending First Scan, Completed being scanned, Pending Rescan, or that have Failed to be scanned.

Scan Analysis Event

Select whether you want to view files that were not classified because BlueXP classification couldn't revert last accessed time, or files that were classified even though BlueXP classification couldn't revert last accessed time.

See details about the "last accessed time" timestamp for more information about the items that appear in the Investigation page when filtering using the Scan Analysis Event.

Filter data by duplicates

Use the following filter to view files that are duplicated in your storage.

Filter Details

Duplicates

Select whether the file is duplicated in the repositories.

View file metadata

In the Data Investigation results pane, you can click the down-caret for any single file to view the file metadata.

A screenshot showing the metadata details for a file in the Data Investigation page.

In addition to showing you the working environment and volume where the file resides, the metadata shows much more information, including the file permissions, file owner, whether there are duplicates of this file, and assigned AIP label (if you have integrated AIP in BlueXP classification). This information is useful if you're planning to create Policies because you can see all the information that you can use to filter your data.

Note that not all information is available for all data sources, only what is appropriate for each data source. For example, volume name, permissions, and AIP labels are not relevant for database files.

When viewing the details for a single file there are a few actions you can take on the file:

View permissions for files and directories

To view a list of all users or groups who have access to a file or to a directory, and the types of permissions they have, click View all Permissions. This button is available only for data in CIFS shares, SharePoint Online, SharePoint On-premises, and OneDrive.

Note that if you see SIDs (Security IDentifiers) instead of user and group names, you should integrate your Active Directory into BlueXP classification. See how to do this.

A screenshot showing detailed file permissions.

You can click the down-caret for any group to see the list of users who are part of the group.

Additionally, you can click the name of a user or a group. The Investigation page is then displayed with that name populated in the "User / Group Permissions" filter, so you can see all the files and directories that the user or group has access to.

Check for duplicate files in your storage systems

You can check whether duplicate files are being stored in your storage systems. This is useful if you want to identify areas where you can save storage space. It can also help you make sure that files with specific permissions or sensitive information are not unnecessarily duplicated in your storage systems.

All of your files (not including databases) that are 1 MB or larger, and that contain personal or sensitive personal information, are compared to see if there are duplicates. You can use the Investigation page filters "File Size" along with "Duplicates" to see which files of a certain size range are duplicated in your environment.

BlueXP classification uses hashing technology to determine duplicate files. If a file has the same hash code as another file, the two files are treated as exact duplicates, even if the file names are different.

You can download the list of duplicate files and send it to your storage admin so they can decide which files, if any, can be deleted. Or you can delete the file yourself if you are confident that a specific version of the file is not needed.
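
The idea of hash-based duplicate detection can be sketched in Python as follows. This is a generic illustration under stated assumptions, not BlueXP classification's implementation: the source does not name the hash algorithm, so SHA-256 is used here as a stand-in, 1 MB is assumed to mean 1024 × 1024 bytes, and the check for personal or sensitive personal information is omitted. The names `file_hash` and `find_duplicates` are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, Iterable, List

# Assumption: "1 MB or larger" is interpreted as 1024 * 1024 bytes.
MIN_SIZE_BYTES = 1024 * 1024


def file_hash(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(paths: Iterable[Path], min_size: int = MIN_SIZE_BYTES) -> Dict[str, List[Path]]:
    """Group files by content hash and keep only hashes shared by 2+ files."""
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in paths:
        # Skip anything that is not a regular file or is below the size threshold.
        if path.is_file() and path.stat().st_size >= min_size:
            groups[file_hash(path)].append(path)
    return {digest: files for digest, files in groups.items() if len(files) > 1}
```

Because the grouping key is the content hash rather than the file name, two files with different names but identical contents land in the same group.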

View all duplicated files

If you want a list of all files that are duplicated in the working environments and data sources you are scanning, you can use the filter called Duplicates > Has duplicates in the Data Investigation page.

All duplicated files are displayed in the Results page.

View if a specific file is duplicated

To see whether a single file has duplicates, click the down-caret for that file in the Data Investigation results pane to view the file metadata. If there are duplicates of the file, this information appears next to the Duplicates field.

To view the list of duplicate files and where they are located, click View Details. In the next page click View Duplicates to view the files in the Investigation page.

A screenshot showing how to view where duplicated files are located.

Tip You can enter the "file hash" value provided on this page directly in the Investigation page to search for a specific duplicate file at any time, or you can use it in a Policy.

Data Investigation Report

The Data Investigation Report is a download of the filtered contents of the Data Investigation page.

The report is available in two different formats:

  • As a .CSV file that you can save to the local machine.

    This report can include a maximum of 10,000 rows of data.

  • As a .JSON file that you export to an NFS Share.

    If there are more than 250,000 rows of data, additional .JSON files are created.

    When exporting to a file share, make sure BlueXP classification has the correct permissions for export access.

There can be up to three report files downloaded if BlueXP classification is scanning files (unstructured data), directories (folders and file shares), and databases (structured data).
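
The row limits above can be turned into a quick planning helper. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that each exported .JSON file holds up to 250,000 rows, which the text implies but does not state outright.

```python
import math

CSV_MAX_ROWS = 10_000          # maximum rows in a .CSV report (from the text above)
JSON_ROWS_PER_FILE = 250_000   # assumed number of rows per .JSON file


def fits_in_csv(row_count: int) -> bool:
    """Return True if all filtered rows fit in a single .CSV report."""
    return row_count <= CSV_MAX_ROWS


def json_file_count(row_count: int) -> int:
    """Estimate how many .JSON files an export of row_count rows produces."""
    return math.ceil(row_count / JSON_ROWS_PER_FILE)


if __name__ == "__main__":
    rows = 600_000
    print(fits_in_csv(rows), json_file_count(rows))
```

A filtered result set that does not fit in the .CSV limit is a candidate for the .JSON export to an NFS share.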

Generate the Data Investigation Report

Steps
  1. From the Data Investigation page, click the download button at the top right of the page.

  2. Select whether you want to download a .CSV report or .JSON report of the data, and click Download Report.

    When selecting a .JSON report, enter the name of the NFS share where the report will be downloaded in the format <host_name>:/<share_path>.

    A screenshot of the Download Investigation Report page with multiple options.

Result

A dialog displays a message that the reports are being downloaded.

You can view the progress of JSON report generation in the Actions Status pane.
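
Before entering the NFS share for a .JSON report, you may want to check that the value follows the <host_name>:/<share_path> format. The Python sketch below is a hypothetical helper, not part of BlueXP classification. It assumes that the host name contains no whitespace, colons, or slashes (so IPv6 literals are not handled) and that the share path is non-empty.

```python
import re
from typing import Tuple

# Assumed shape: host (no whitespace, ":" or "/"), then ":", then "/" plus a non-empty path.
_NFS_TARGET = re.compile(r"^(?P<host>[^\s:/]+):(?P<path>/\S+)$")


def parse_nfs_target(target: str) -> Tuple[str, str]:
    """Split '<host_name>:/<share_path>' into (host, path) or raise ValueError."""
    match = _NFS_TARGET.match(target.strip())
    if match is None:
        raise ValueError(f"Expected <host_name>:/<share_path>, got: {target!r}")
    return match.group("host"), match.group("path")


if __name__ == "__main__":
    print(parse_nfs_target("nfs01.example.com:/exports/reports"))
```

The example host name and share path are placeholders; substitute the values for your own NFS server.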

What's included in each Data Investigation Report

The Unstructured Files Data Report includes the following information about your files:

  • File name

  • Location type

  • Working environment name

  • Storage repository (for example, a volume, bucket, or share)

  • Repository type

  • File path

  • File type

  • File size (in MB)

  • Created time

  • Last modified

  • Last accessed

  • File owner

  • Category

  • Personal information

  • Sensitive personal information

  • Open permissions

  • Scan Analysis Error

  • Deletion detection date

    A deletion detection date identifies the date that the file was deleted or moved. This enables you to identify when sensitive files have been moved. Deleted files aren't part of the file count that appears in the dashboard or on the Investigation page. These files appear only in the CSV reports.
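
Because deleted or moved files appear only in the CSV report, a short script can pull them out of a downloaded file. The Python sketch below is an assumption-laden example: the column headers "File path" and "Deletion detection date" are guessed from the field names listed above and may not match the actual headers in your report, and the function name `deleted_or_moved` is hypothetical.

```python
import csv
import io
from typing import Iterable, List, Tuple


def deleted_or_moved(
    report_lines: Iterable[str],
    path_column: str = "File path",                 # assumed header name
    date_column: str = "Deletion detection date",   # assumed header name
) -> List[Tuple[str, str]]:
    """Return (path, date) for every row that has a deletion detection date."""
    reader = csv.DictReader(report_lines)
    results = []
    for row in reader:
        date_value = (row.get(date_column) or "").strip()
        if date_value:  # an empty value means no deletion was detected
            results.append((row.get(path_column, ""), date_value))
    return results


if __name__ == "__main__":
    # Inline sample data for illustration; a real report would be opened with
    # open("report.csv", newline="") and passed in the same way.
    sample = io.StringIO(
        "File name,File path,Deletion detection date\n"
        "a.txt,/vol1/a.txt,\n"
        "b.txt,/vol1/b.txt,2024-01-15\n"
    )
    print(deleted_or_moved(sample))
```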

The Unstructured Directories Data Report includes the following information about your folders and file shares:

  • Working environment type

  • Working environment name

  • Directory name

  • Storage repository (for example, a folder or file share)

  • Directory owner

  • Created time

  • Discovered time

  • Last modified

  • Last accessed

  • Open permissions

  • Directory type

The Structured Data Report includes the following information about your database tables:

  • DB Table name

  • Location type

  • Working environment name

  • Storage repository (for example, a schema)

  • Column count

  • Row count

  • Personal information

  • Sensitive personal information