Learn how AI Data Engine data engineers and data scientists work with AIDE components

04/29/2026

As a data engineer or data scientist, you use AI Data Engine (AIDE) Console to explore workspaces you have been granted access to, create and manage data collections, perform semantic searches, and integrate retrieval endpoints into AI/ML workflows.

Data engineers focus on transforming raw data into AI-ready datasets by building collections, configuring embedding pipelines, and controlling which users can access published collections. Data scientists focus on leveraging curated datasets for analysis, model training, and GenAI applications, without managing access control or infrastructure.
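To make the idea of semantic search over embeddings concrete, the following is a minimal sketch that ranks items by cosine similarity. It does not call any AIDE or ONTAP API. The function names, file names, and three-dimensional vectors are hypothetical and chosen only for illustration; real embeddings are produced by an embedding model and have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction, so treat it as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, index, top_k=2):
    """Return the top_k (item_id, score) pairs, best match first."""
    scored = [
        (item_id, cosine_similarity(query_vec, vec))
        for item_id, vec in index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: item name -> embedding vector (illustration only).
toy_index = {
    "contract.pdf": [0.9, 0.1, 0.0],
    "invoice.csv": [0.1, 0.9, 0.0],
    "design.md": [0.0, 0.2, 0.9],
}

results = semantic_search([1.0, 0.0, 0.0], toy_index, top_k=2)
for item_id, score in results:
    print(f"{item_id}: {score:.3f}")
```

A vector database performs the same kind of ranking at scale, typically with approximate nearest-neighbor indexes rather than the exhaustive comparison used in this sketch.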

Data user component access

Component	Access level	Data engineer workflow	Data scientist workflow
AIDE Console	Manage (create, edit, delete)	AIDE Console is your primary interface for day-to-day tasks, including data discovery, collection management, pipeline configuration, and publishing RAG or retrieval endpoints, for the workspaces you are authorized to access.	AIDE Console is your primary interface for data exploration, refining and versioning collections within workspaces you can access, and connecting curated datasets and retrieval endpoints to analysis, modeling, and GenAI workflows.
ONTAP REST API	Manage (create, edit, delete)	You use the REST API to automate collection lifecycle operations, trigger and monitor embedding pipelines, and programmatically integrate data workflows with external tools.	You use the REST API to programmatically access data collections, run vector search queries, and integrate retrieval endpoints into AI/ML applications and agentic frameworks.
Workspaces	View/use (read-only)	You explore your assigned workspaces to identify and understand available data sources before building collections.	You search your assigned workspaces to locate files and objects relevant to specific research or modeling tasks.
Data collections	Manage (create, edit, delete)	You build data collections by selecting and filtering source data using tags, classification, and other attributes, and you manage the full collection lifecycle from creation and versioning through publishing as RAG endpoints for AI use. You also manage which data scientists and other users can access each collection.	You create, select, annotate, version, and refine data collections within the workspaces you have been given access to. You use these collections as the basis for semantic search and GenAI workflows.
Metadata catalog	Query/use (consume for workflows)	You use the metadata catalog to evaluate and select data sources for ingestion, running queries to locate relevant files and confirm they meet the requirements of the collections you are building within your assigned workspaces.	You search and filter metadata across the workspaces you can access to locate files and objects needed for analysis or model training, relying on the catalog structure that has been built and maintained by data engineers.
Vector database	Manage embeddings/search (data engineer); use/search (data scientist)	You trigger embedding pipelines, monitor vectorization status, configure chunking and embedding parameters, and expose retrieval endpoints backed by vector search. Applications and agents then query these endpoints via the API for semantic search and RAG workflows.	You run semantic search queries against embeddings generated by data engineer-managed pipelines and integrate retrieval results into GenAI or RAG workflows for context-aware model responses. You do not configure chunking, embeddings, or pipeline parameters.
Classifiers	Use (consume classified data)	You use classification results to annotate and tag source data during collection preparation, ensuring that content entering your pipelines is properly labeled for downstream AI workflows.	You consume pre-classified data to ensure that only compliant and relevant content is used in your analysis and modeling.
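The access levels in the table above can be summarized as a small lookup structure. The following is a sketch for illustration only: it transcribes the table's access levels into a Python dictionary and is not how AIDE stores or enforces permissions. The names `Role`, `ACCESS`, and `can_manage` are hypothetical.

```python
from enum import Enum


class Role(Enum):
    DATA_ENGINEER = "data engineer"
    DATA_SCIENTIST = "data scientist"


# Access levels transcribed from the table (illustrative labels, not an AIDE API).
# A plain string applies to both roles; a dict gives a per-role level.
ACCESS = {
    "AIDE Console": "manage",
    "ONTAP REST API": "manage",
    "Workspaces": "view/use (read-only)",
    "Data collections": "manage",
    "Metadata catalog": "query/use",
    "Vector database": {
        Role.DATA_ENGINEER: "manage embeddings/search",
        Role.DATA_SCIENTIST: "use/search",
    },
    "Classifiers": "use",
}


def access_level(role, component):
    """Return the access level string for a role on a component."""
    level = ACCESS[component]
    if isinstance(level, dict):
        return level[role]
    return level


def can_manage(role, component):
    """True when the table grants the role a 'manage' level on the component."""
    return access_level(role, component).startswith("manage")


for component in ACCESS:
    print(
        f"{component}: "
        f"engineer={access_level(Role.DATA_ENGINEER, component)}, "
        f"scientist={access_level(Role.DATA_SCIENTIST, component)}"
    )
```

In this sketch, the vector database is the only component where the two roles differ: `can_manage(Role.DATA_ENGINEER, "Vector database")` is true, while the same check for a data scientist is false.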
