Data-to-RAG quick start for AI Data Engine
Go from a newly deployed AI Data Engine (AIDE) system to a working retrieval-augmented generation (RAG) endpoint using this workflow. Understand how storage administrators, data engineers, and data scientists collaborate using ONTAP System Manager and AIDE Console.
- You've installed data compute nodes (DCNs) and added them to the ONTAP cluster.
- You've installed and licensed AI Data Engine software for vectorization and guardrails.
- You've configured OpenID Connect (OIDC) and mapped the admin, data engineer, and data scientist roles.
Define data scope and governance
As a storage administrator or security administrator, you want to prepare the environment in AIDE Console and ONTAP System Manager:
- Create one or more workspaces from local and remote data sources.
- Configure classifiers and guardrail policies in AIDE Console.
- Assign data engineer and data scientist access to the workspaces.
Explore workspace metadata
As a data engineer or data scientist, you want to explore the workspace metadata using AIDE Console:
- Explore workspace metadata to understand available content.
- Define one or more logical subsets of data that should feed RAG (for example, support articles, product manuals, or anonymized clinical notes).
Create and publish a data collection
As a data engineer or data scientist, you want to turn the chosen subset into a RAG-ready collection:
- Create a data collection from the workspace using selected filters.
- Publish the data collection and monitor indexing until it reaches the Ready state.
- Copy the retrieval endpoint URI for the chosen collection and provide it to data scientists or application developers.
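If you script the "monitor indexing until it reaches the Ready state" step instead of watching it in AIDE Console, the wait can be sketched as a generic polling loop. This is a minimal sketch under stated assumptions, not an AIDE API: `wait_until_ready` and `get_state` are hypothetical names, and the `Indexing` and `Failed` state strings are placeholders. Only `Ready` comes from this workflow. How you actually read the collection state depends on your environment and is deliberately left as a callable you supply.

```python
import time
from typing import Callable, Sequence


def wait_until_ready(
    get_state: Callable[[], str],
    ready_state: str = "Ready",
    failed_states: Sequence[str] = ("Failed",),  # assumed name, not from the docs
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call get_state() repeatedly until it returns ready_state.

    Returns the number of polls made. Raises RuntimeError if a failed
    state is seen and TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout_seconds
    polls = 0
    while True:
        state = get_state()
        polls += 1
        if state == ready_state:
            return polls
        if state in failed_states:
            raise RuntimeError(f"indexing ended in state {state!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


if __name__ == "__main__":
    # Stand-in status source; replace with however you read the real state.
    fake_states = iter(["Indexing", "Indexing", "Ready"])
    polls = wait_until_ready(lambda: next(fake_states), poll_seconds=0.0)
    print(f"collection reported Ready after {polls} polls")
```

Passing the status reader in as a callable keeps the loop independent of any particular client, so the same helper works whether the state comes from a script, a CLI wrapper, or a test double. Copy the retrieval endpoint URI only after the loop returns.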