FAQ for NetApp AI Data Engine
This FAQ covers common questions about NetApp AI Data Engine (AIDE), including its architecture, deployment, user types, technical features, integration, and licensing.
AIDE basics
NetApp AI Data Engine (AIDE) is a storage-integrated AI data service that spans the entire AI lifecycle, from discovering and preparing raw data to providing retrieval endpoints that power generative AI (GenAI), Retrieval-Augmented Generation (RAG), agentic AI, and AI factories. AIDE automates data synchronization and change detection, providing a unified, up-to-date view of selected data for discovery and curation.
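Change detection is what makes incremental synchronization possible. Only new or changed files need to be reprocessed. The sketch below shows the general idea by comparing two metadata snapshots keyed by file path. It is a minimal illustration built on assumptions. The `FileMeta` record and `detect_changes` function are hypothetical, and the code does not describe how AIDE or ONTAP snapshots actually detect changes.

```python
from dataclasses import dataclass


# Hypothetical per-file metadata record; not AIDE's actual catalog schema.
@dataclass(frozen=True)
class FileMeta:
    size: int
    mtime: float


def detect_changes(previous: dict[str, FileMeta],
                   current: dict[str, FileMeta]) -> dict[str, list[str]]:
    """Compare two metadata snapshots keyed by path and classify each difference."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    # A path present in both snapshots counts as modified if any field differs.
    modified = sorted(
        path for path in current.keys() & previous.keys()
        if current[path] != previous[path]
    )
    return {"added": added, "modified": modified, "deleted": deleted}


before = {"/vol1/a.pdf": FileMeta(100, 1.0), "/vol1/b.txt": FileMeta(20, 1.0)}
after = {"/vol1/a.pdf": FileMeta(120, 2.0), "/vol1/c.docx": FileMeta(50, 2.0)}
print(detect_changes(before, after))
# {'added': ['/vol1/c.docx'], 'modified': ['/vol1/a.pdf'], 'deleted': ['/vol1/b.txt']}
```

In this model, a sync pass handles only the `added` and `modified` paths and drops `deleted` ones. Unchanged files are never touched.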
AIDE integrates directly with NetApp ONTAP storage systems to create a global, structured view of the entire NetApp data estate with automated change detection and synchronization. AIDE provides real-time vectorization with compression and deduplication, policy-driven guardrails, and integration with AI tools.
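Deduplication during vectorization means identical content is embedded only once. The minimal sketch below keys chunks by a SHA-256 content hash. It is only an assumption about how such a step could work. The `embed` function is a hypothetical toy stand-in for a real embedding model, compression is not shown, and nothing here reflects AIDE's internal implementation.

```python
import hashlib


def embed(text: str) -> list[float]:
    # Hypothetical stand-in for a real embedding model: a toy 3-dimensional vector.
    return [float(len(text)), float(text.count(" ")), float(sum(map(ord, text)) % 97)]


def vectorize_deduplicated(chunks: list[str]) -> tuple[dict[str, list[float]], list[str]]:
    """Embed each distinct chunk once; return the vector store and each chunk's key."""
    store: dict[str, list[float]] = {}
    keys: list[str] = []
    for chunk in chunks:
        key = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
        if key not in store:  # only previously unseen content pays the embedding cost
            store[key] = embed(chunk)
        keys.append(key)  # duplicates reference the existing vector by key
    return store, keys


chunks = ["quarterly revenue grew", "safety procedures", "quarterly revenue grew"]
store, keys = vectorize_deduplicated(chunks)
print(len(chunks), "chunks ->", len(store), "stored vectors")  # 3 chunks -> 2 stored vectors
```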
Users and roles
Primary users of AIDE include:
- ONTAP storage administrators: Manage infrastructure, AI-specific storage needs, security, and compliance.
- Data engineers: Manage data movement, preparation, and integration across environments.
- Data scientists: Prepare and transform the relevant data for AI consumption.
Requirements and Deployment
AIDE requires AFX systems for deployment (including AFX controller, disk shelf, and network switch hardware), but it can use data from clusters running ONTAP 9 through SnapMirror and cluster peering. At least four AFX controller nodes are required for AIDE deployments to ensure high availability and performance.
AIDE runs on NetApp data compute nodes (DCNs). A DCN is a NetApp-provided data compute hardware node, and DCNs are the only deployment mechanism for AIDE. Exactly three DCNs are required. The DCNs run a NetApp-provided software stack that hosts the AIDE software, which includes the Metadata Engine, Data Sync, Data Curator, and Data Guardrails.
AIDE requires AFX for deployment. AIDE uses Trident to consume AFX volumes as persistent volumes for its internal storage. The AFX cluster that provides storage for AIDE can be peered with an ONTAP 9 system or cluster. The AFX cluster uses cluster peering and SnapMirror to sync data from the remote ONTAP cluster to the AFX system.
Management and Interfaces
The AIDE Console is a separate management interface that runs on DCNs. You use the AIDE Console to manage AIDE services, such as Data Guardrails and Data Curator. You can also use ONTAP System Manager to monitor the AIDE cluster.
Features and Capabilities
There are four main features of AIDE:
- Metadata Engine: Automatically generates a structured, up-to-date, interactive view of your data.
  - Works with data stored on ONTAP.
  - Enables data practitioners to collaborate with storage admins to find and understand data.
  - Provides APIs that query the cataloged metadata, reducing NFS traffic load on storage systems.
  - Extracts and catalogs metadata continuously, using ONTAP capabilities such as snapshots. This capability is built specifically for AIDE.
- Data Sync: Maintains data recency automatically as source data changes, without manual intervention.
  - Administrators define the data refresh interval in hours or days.
  - Provides incremental data mobility and synchronization, eliminating redundant copies of AI data.
- Data Guardrails: Automatically identifies and protects sensitive data throughout the AI lifecycle. Data Guardrails is accessible through the AIDE Console.
  - Continuously scans, classifies, and categorizes data.
  - Identifies sensitive data (such as PII) and risks.
  - Facilitates the creation of policies for automatic handling of sensitive data in line with company and regulatory standards.
  - Provides automatic redaction of sensitive information for data protection.
  - Restricts access to sensitive files as necessary.
- Data Curator: Allows data scientists to search across storage for relevant data.
  - Creates curated data collections from data on AFX volumes.
  - Generates vector embeddings at the storage layer to reduce data bloat and increase performance.
  - Provides a retrieval endpoint for AI applications with vector semantic search and re-ranking.
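The sensitive-data handling listed above has three parts: identifying PII, redacting it automatically, and restricting access to sensitive files. A minimal sketch of that flow follows. It uses pattern-based detection for simplicity. The two regular expressions, the `redact` function, and the `needs_restriction` policy check are all hypothetical illustrations. They are not AIDE's classifiers or policy engine, and real PII detection covers far more categories.

```python
import re

# Illustrative patterns only (assumptions); not AIDE's detection rules.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder and count hits per label."""
    counts: dict[str, int] = {}
    for label, pattern in PII_PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


def needs_restriction(counts: dict[str, int]) -> bool:
    # Hypothetical policy: restrict access to any file where PII was found.
    return any(counts.values())


doc = "Contact jane.doe@example.com, SSN 123-45-6789."
clean, found = redact(doc)
print(clean)  # Contact [EMAIL], SSN [US_SSN].
print(found)  # {'EMAIL': 1, 'US_SSN': 1}
print(needs_restriction(found))  # True
```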
Integration and Interoperability
AIDE can connect to multiple ONTAP clusters using SnapMirror and cluster peering, enabling centralized metadata visibility.
AIDE stores metadata on the connected AFX cluster using a persistent volume provided by AFX. The DCNs use local storage for internal operations.
The Metadata Engine catalogs file system metadata and provides APIs to query this cataloged metadata.
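Querying a metadata catalog lets a discovery question be answered without walking directories over NFS. The sketch below illustrates the idea with an in-memory SQLite table. The schema, sample paths, and `find_files` helper are hypothetical. AIDE's actual metadata model and query API are not shown here.

```python
import sqlite3

# Hypothetical catalog schema; not AIDE's actual metadata model or API.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE files (path TEXT PRIMARY KEY, volume TEXT, ext TEXT, size INTEGER, mtime INTEGER)"
)
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?)",
    [
        ("/vol_hr/policies.pdf", "vol_hr", "pdf", 2048, 1700000000),
        ("/vol_eng/design.docx", "vol_eng", "docx", 4096, 1710000000),
        ("/vol_eng/notes.txt", "vol_eng", "txt", 512, 1690000000),
    ],
)


def find_files(conn: sqlite3.Connection, ext: str, modified_after: int) -> list[str]:
    """Answer a discovery question from the catalog alone, without touching the file system."""
    rows = conn.execute(
        "SELECT path FROM files WHERE ext = ? AND mtime > ? ORDER BY path",
        (ext, modified_after),
    )
    return [path for (path,) in rows]


print(find_files(conn, "docx", 1705000000))  # ['/vol_eng/design.docx']
```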
AIDE supports ONTAP volumes (local or remote) as data sources. Remote ONTAP clusters must run ONTAP 9 and be connected via cluster peering and SnapMirror.
ONTAP S3 buckets and StorageGRID objects are not supported as data sources in AIDE 9.18.1.
AIDE supports a wide range of file types, including PDF, DOCX, PPTX, and TXT documents, as well as image files through OCR.
AIDE supports English-language data only.
AIDE provides a RAG API endpoint accessible through direct API calls or through a Model Context Protocol (MCP) server. This supports integration with agentic AI frameworks and tools.
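A retrieval endpoint that combines vector semantic search with re-ranking typically works in two stages. First it retrieves the top-k candidates by vector similarity, then it reorders them with a second scorer. The sketch below illustrates that pattern. It does not show AIDE's RAG API request or response format, and the index contents and `retrieve` function are hypothetical. The re-ranker is a simple lexical-overlap stand-in, not the model AIDE uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], query_text: str, index: list[dict],
             k: int = 3, top_n: int = 2) -> list[str]:
    """Stage 1: vector search for k candidates. Stage 2: re-rank them and keep top_n."""
    candidates = sorted(index, key=lambda doc: cosine(query_vec, doc["vec"]), reverse=True)[:k]
    query_terms = set(query_text.lower().split())

    def rerank_score(doc: dict) -> float:
        # Stand-in re-ranker: fraction of query terms that appear in the chunk text.
        overlap = query_terms & set(doc["text"].lower().split())
        return len(overlap) / max(len(query_terms), 1)

    return [doc["id"] for doc in sorted(candidates, key=rerank_score, reverse=True)[:top_n]]


index = [
    {"id": "a", "text": "vpn setup guide", "vec": [0.9, 0.1]},
    {"id": "b", "text": "expense policy", "vec": [0.1, 0.9]},
    {"id": "c", "text": "remote vpn access setup", "vec": [0.8, 0.3]},
]
print(retrieve([1.0, 0.2], "vpn access", index, k=2, top_n=2))  # ['c', 'a']
```

In this example, vector search ranks chunk `a` slightly above `c`. The re-ranking stage promotes `c` because it matches more of the query. That second pass is what re-ranking adds on top of pure similarity search.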
Deployment and Licensing
AIDE is deployed on-premises on AFX infrastructure with DCNs. It integrates directly with NetApp ONTAP AFX installations.
AIDE requires a software license to run Data Guardrails and Data Curator.
If you require only the Metadata Engine, the ONTAP One license, which is included with all AFX systems, provides entitlement for Metadata Engine-only capabilities.