View data collections in AI Data Engine

04/29/2026

After data engineers or data scientists create and publish data collections from workspaces, you need visibility into their status, size, and impact on the AI Data Engine (AIDE) cluster.

The following instructions assume a NetApp DCN-based AIDE deployment.

If you're a storage administrator, data engineer, or data scientist, you can view data collections across ONTAP System Manager and AIDE Console.

Before you begin

To view data collections, you need storage administrator privileges in ONTAP System Manager, or data engineer or data scientist privileges in AIDE Console (https://<cluster_management_ip>/console).
At least one workspace exists with successfully extracted metadata.
Data engineers or data scientists have created and published at least one data collection from AIDE Console.
The AIDE premium services license is installed and inferencing features are enabled, so that vectorization and retrieval endpoints are active.

View cluster-wide data collections

For storage administrators, ONTAP System Manager provides a read-only, cluster-wide view of data collections and their footprint. Administrators cannot create or modify data collections from this view.

Steps

In System Manager, navigate to Data Engine > Data collections.
Review the inventory summary at the top of the page:
- Total number of data collections by status
- Total space consumed by the vector database across all collections
- Vector space as a percentage of overall cluster capacity
Select an individual data collection and review:
- Collection name and description
- UUID
- Associated workspace
- Status
- Collection size
- Creator
- Last refresh time
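
The three figures in the inventory summary are simple aggregates over the per-collection details. As a minimal sketch of that arithmetic, assuming you have copied the status and size of each collection into your own records (the `Collection` class, its field names, and the sample values are hypothetical and do not represent an AIDE or ONTAP schema):

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Collection:
    """Hypothetical record; field names are illustrative, not an AIDE schema."""
    name: str
    status: str        # for example Draft, Publishing, Ready, or Failed
    vector_bytes: int  # space the collection consumes in the vector database


def summarize(collections, cluster_capacity_bytes):
    """Return counts by status, total vector space, and its share of capacity."""
    if cluster_capacity_bytes <= 0:
        raise ValueError("cluster capacity must be positive")
    by_status = Counter(c.status for c in collections)
    total = sum(c.vector_bytes for c in collections)
    percent = 100.0 * total / cluster_capacity_bytes
    return dict(by_status), total, percent


GIB = 1024**3
# Made-up sample values for illustration only.
sample = [
    Collection("contracts", "Ready", 40 * GIB),
    Collection("tickets", "Publishing", 10 * GIB),
    Collection("drafts", "Failed", 0),
]
counts, total_bytes, pct = summarize(sample, cluster_capacity_bytes=1000 * GIB)
print(counts, total_bytes, f"{pct:.1f}%")
```

Whether System Manager computes its percentage in exactly this way is an assumption. The sketch only shows how the summary relates to the per-collection values.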

Result

You now have a high-level view of all data collections in the cluster and their storage impact. Use this view to identify collections that are large, stale, or stuck in a non-ready state.
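
If you track many collections, the same triage can be applied to a list of records that you maintain yourself. The following is a minimal sketch, not an AIDE feature: the `CollectionRow` class, the thresholds, and the sample data are all hypothetical, and "Ready" is assumed to be the only healthy state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CollectionRow:
    """Hypothetical record mirroring the fields shown for a collection."""
    name: str
    status: str
    size_bytes: int
    last_refresh: datetime


def triage(rows, now, large_bytes, stale_after):
    """Map each collection name to the reasons it deserves a closer look."""
    flagged = {}
    for row in rows:
        reasons = []
        if row.size_bytes >= large_bytes:
            reasons.append("large")
        if now - row.last_refresh > stale_after:
            reasons.append("stale")
        if row.status != "Ready":  # assumption: Ready is the only healthy state
            reasons.append("not ready")
        if reasons:
            flagged[row.name] = reasons
    return flagged


GIB = 1024**3
now = datetime(2026, 4, 29, tzinfo=timezone.utc)
# Made-up sample values for illustration only.
rows = [
    CollectionRow("contracts", "Ready", 600 * GIB, now - timedelta(days=2)),
    CollectionRow("archive", "Ready", 5 * GIB, now - timedelta(days=90)),
    CollectionRow("tickets", "Publishing", 10 * GIB, now - timedelta(hours=1)),
    CollectionRow("wiki", "Ready", 1 * GIB, now - timedelta(days=1)),
]
flagged = triage(rows, now, large_bytes=500 * GIB, stale_after=timedelta(days=30))
print(flagged)
```

The size and age thresholds are arbitrary. Choose values that match your cluster capacity and how often your source data changes.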

Monitor collection-related jobs and events

You can also see whether an individual data collection is actively being updated and whether any failures are blocking RAG usage.

As a storage administrator, you can monitor jobs that build and update collections from the cluster-wide Activity page and from the workspace details.

Steps

In System Manager, navigate to Data Engine > Activity.
On the Events tab:
1. Filter by type (for example, workspace or data collection) or by severity.
2. Expand any event related to data collections (for example, "Data collection publish failed") to see more details.
On the Jobs tab:
1. Filter to focus on data collection indexing and publishing jobs.
2. For each job, open the peek view to see:
  - Progress percentage.
  - Start and end times.
  - Any reported error messages or warnings.
Optionally, navigate back to the affected workspace (Data Engine > Workspaces) and open its Activity tab to see events and jobs scoped only to that workspace.

Result

You can track the lifecycle of data collections, identify stalled or failed jobs, and gather contextual information to pass to data engineers, data scientists, or support.

When a data collection remains in the Publishing state for an extended period, check for a corresponding long-running job on the Activity page before assuming a failure.
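
The decision described above can be written out as a short check. This is a minimal sketch under stated assumptions: the `Job` class and its fields are hypothetical stand-ins for what the Jobs tab peek view shows (progress, end time, errors), not an AIDE API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    """Hypothetical job record based on the fields visible in the peek view."""
    collection: str
    kind: str                    # for example "publish" or "index"
    progress: int                # percent complete
    ended: bool                  # True once the job has an end time
    error: Optional[str] = None  # reported error message, if any


def diagnose(collection_name, state, jobs):
    """Suggest a next step for a collection based on its state and jobs."""
    if state != "Publishing":
        return f"state is {state}; no publishing check needed"
    related = [j for j in jobs if j.collection == collection_name]
    running = [j for j in related if not j.ended]
    if running:
        job = running[0]
        return f"{job.kind} job still running at {job.progress}%; keep waiting"
    failed = [j for j in related if j.error]
    if failed:
        return f"job ended with error: {failed[0].error}"
    return "no matching job found; escalate with event details"


# Made-up sample values for illustration only.
jobs = [
    Job("tickets", "publish", 62, ended=False),
    Job("contracts", "publish", 100, ended=True, error="embedding step failed"),
]
print(diagnose("tickets", "Publishing", jobs))
print(diagnose("contracts", "Publishing", jobs))
print(diagnose("wiki", "Publishing", jobs))
```

A running job takes priority over an earlier failed one in this sketch, because a retry in progress usually means that the collection is still being updated.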

View data collections from AIDE Console

Data engineers and data scientists typically monitor data collections directly from AIDE Console, where they are created and published.

Steps

Log in to AIDE Console as a data engineer or data scientist.
Navigate to Data Collections and select the desired data collection.
For each collection:
1. Check the state (Draft, Publishing, Ready, or Failed).
2. Select the data collection name to review definition details (filters, included file types, classifier options, embedding settings).
3. Inspect timestamps for last publish or update.
If needed, open job details or logs (where available) to understand failures or incomplete runs.

Result

Data engineers and data scientists can iterate on collection definitions and publish them again while monitoring status and health, without involving storage administrators.

What's next?

Create data collections for RAG in AIDE Console
