FAQ for NetApp AI Data Engine

Contributors netapp-dbagwell

This FAQ covers common questions about NetApp AI Data Engine (AIDE), including its architecture, deployment, user types, technical features, integration, and licensing.

AIDE basics

What is NetApp AI Data Engine (AIDE)?

NetApp AI Data Engine (AIDE) is a storage-integrated AI data service that spans the entire AI lifecycle, from discovering and preparing raw data to providing retrieval endpoints that power generative AI (GenAI), retrieval-augmented generation (RAG), agentic AI, and AI factories. AIDE automates sync and change detection, providing a unified, up-to-date view of selected data for data discovery and curation.

How does AIDE work?

AIDE integrates directly with NetApp ONTAP storage systems to create a global, structured view of the entire NetApp data estate with automated change detection and synchronization. AIDE provides real-time vectorization with compression and deduplication, policy-driven guardrails, and integration with AI tools.

Users and roles

Who uses the AI Data Engine?

Primary users of AIDE include:

  1. ONTAP storage administrators: Manage infrastructure, AI-specific storage needs, security, and compliance.

  2. Data engineers: Manage data movement, preparation, and integration across environments.

  3. Data scientists: Prepare and transform the relevant data for AI consumption.

Requirements and deployment

What hardware is required?

AIDE requires AFX systems for deployment (including an AFX controller, disk shelf, and network switch). It can also use data from clusters running ONTAP 9 that are connected through SnapMirror and cluster peering. At least four AFX controller nodes are required for AIDE deployments to ensure high availability and performance.

AIDE runs on NetApp data compute nodes (DCNs). Three DCNs are required. The DCNs host the AIDE software, which includes the Metadata Engine, Data Sync, Data Curator, and Data Guardrails.
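The node counts in this FAQ can be captured in a small pre-deployment check. The following is a minimal sketch only: the `PlannedDeployment` class and `check_deployment` function are hypothetical names for illustration and are not part of any NetApp tool. The thresholds come from this FAQ (at least four AFX controller nodes, exactly three DCNs).

```python
from dataclasses import dataclass

# Thresholds taken from this FAQ.
MIN_AFX_CONTROLLER_NODES = 4  # "at least four AFX controller nodes"
REQUIRED_DCNS = 3             # "exactly three DCNs"


@dataclass
class PlannedDeployment:
    """Hypothetical description of a planned AIDE deployment."""
    afx_controller_nodes: int
    dcns: int


def check_deployment(plan: PlannedDeployment) -> list[str]:
    """Return a list of problems; an empty list means the counts are acceptable."""
    problems = []
    if plan.afx_controller_nodes < MIN_AFX_CONTROLLER_NODES:
        problems.append(
            f"need at least {MIN_AFX_CONTROLLER_NODES} AFX controller nodes, "
            f"got {plan.afx_controller_nodes}"
        )
    if plan.dcns != REQUIRED_DCNS:
        problems.append(f"need exactly {REQUIRED_DCNS} DCNs, got {plan.dcns}")
    return problems


if __name__ == "__main__":
    for problem in check_deployment(PlannedDeployment(afx_controller_nodes=2, dcns=4)):
        print(problem)
```

The check returns every problem instead of stopping at the first one, so a planner sees all count mismatches at once.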

Can I use my own DCN?

No. The DCN is a NetApp-provided data compute hardware node and is the only deployment mechanism for the AI Data Engine.

What is the minimum number of DCNs required?

Exactly three DCNs are required.

What OS runs on the DCNs?

The DCNs run a NetApp-provided software stack with AIDE.

Can AIDE be deployed without AFX?

No. AIDE requires AFX for deployment. AIDE uses Trident to consume the AFX volumes for internal storage (persistent volumes). The AFX cluster providing storage for AIDE can be peered with an ONTAP 9 system or cluster. Cluster peering and SnapMirror then sync data from the remote ONTAP cluster to the AFX system.

Management and interfaces

Is the AIDE Console part of NetApp Console or a separate interface?

The AIDE Console is a separate management interface that runs on DCNs. You use the AIDE Console to manage AIDE services, such as Data Guardrails and Data Curator. You can also use ONTAP System Manager to monitor the AIDE cluster.

Features and capabilities

What are the key features of AIDE?

There are four main features of AIDE:

Metadata Engine
  • Automatically generates a structured, up-to-date, interactive view of your data.

  • Works with data stored on ONTAP.

  • Enables data practitioners to collaborate with storage admins to find and understand data.

  • Provides APIs that query the cataloged metadata, reducing NFS traffic load on storage systems.

  • Extracts and catalogs metadata continuously, using a capability built specifically for AIDE that leverages ONTAP features such as snapshots.

Data Sync
  • Maintains data recency automatically as source data changes without manual intervention.

  • Admins define the data refresh interval in days or hours.

  • Provides incremental data mobility and sync, eliminating redundant copies of AI data.

Data Guardrails
  • Automatically identifies and protects sensitive data throughout the AI lifecycle. It is accessible through the AIDE Console.

  • Continuously scans, classifies, and categorizes data.

  • Identifies sensitive data (such as PII) and risks.

  • Facilitates the creation of policies for automatic handling of sensitive data in line with company and regulatory standards.

  • Provides automatic sensitive information redaction for data protection.

  • Restricts access to sensitive files as necessary.
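To make the idea of automatic redaction concrete, the following is a minimal sketch of pattern-based redaction. It does not represent how Data Guardrails is implemented; the two regular expressions (email addresses and US Social Security number formats), the placeholder strings, and the `redact` function are all assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; a real classifier covers far more categories.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text: str) -> str:
    """Replace each matched sensitive value with a category placeholder."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    text = SSN.sub("[REDACTED_SSN]", text)
    return text


if __name__ == "__main__":
    print(redact("Contact jane.doe@example.com, SSN 123-45-6789."))
```

Replacing values with a category placeholder, instead of deleting them, keeps the surrounding sentence readable for downstream AI consumers while removing the sensitive value itself.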

Data Curator
  • Allows data scientists to search across storage for relevant data.

  • Creates curated data collections from data that exists on AFX volumes.

  • Generates vector embeddings at the storage layer to reduce data bloat and increase performance.

  • Provides a retrieval endpoint for AI applications with vector semantic search and re-ranking.
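The two-stage pattern of vector semantic search followed by re-ranking can be sketched in a few lines. This is a toy illustration, not the Data Curator implementation: the embeddings are hand-written two-dimensional vectors, the re-ranker is a simple term-overlap count standing in for a real re-ranking model, and the function names `retrieve` and `rerank` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_vec, index, top_k):
    """Stage 1: return the top_k entries of index by vector similarity.

    index is a list of (doc_id, text, vector) tuples.
    """
    ranked = sorted(index, key=lambda doc: cosine(query_vec, doc[2]), reverse=True)
    return ranked[:top_k]


def rerank(query_text, candidates):
    """Stage 2: reorder candidates by shared terms (stand-in for a re-ranking model)."""
    query_terms = set(query_text.lower().split())

    def overlap(doc):
        return len(query_terms & set(doc[1].lower().split()))

    return sorted(candidates, key=overlap, reverse=True)


if __name__ == "__main__":
    index = [
        ("a", "quarterly revenue report", [0.9, 0.1]),
        ("b", "revenue forecast for storage products", [0.8, 0.2]),
        ("c", "employee onboarding guide", [0.1, 0.9]),
    ]
    candidates = retrieve([1.0, 0.0], index, top_k=2)
    for doc_id, text, _ in rerank("storage revenue forecast", candidates):
        print(doc_id, text)
```

The first stage is cheap and narrows the corpus to a short candidate list. The second stage can then afford a more expensive relevance score because it runs only on those candidates.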

Integration and interoperability

Does AIDE support federated metadata across multiple ONTAP clusters?

AIDE can connect to multiple ONTAP clusters using SnapMirror and cluster peering, enabling centralized metadata visibility.

Where is the metadata stored?

AIDE stores metadata on the connected AFX cluster using a persistent volume provided by AFX. The DCNs use local storage for internal operations.

Does the AIDE Metadata Engine classify data?

No. The Metadata Engine catalogs filesystem metadata and provides APIs to query this cataloged metadata.

What data sources are supported?

AIDE supports ONTAP volumes (local or remote) as data sources. Remote ONTAP clusters must run ONTAP 9 and be connected via cluster peering and SnapMirror.

ONTAP S3 buckets and StorageGRID objects are not supported as data sources in AIDE 9.18.1.
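The data source rules above can be expressed as a simple eligibility check. This is a sketch under stated assumptions: the `DataSource` class, its field names, and the `kind` strings are hypothetical and do not correspond to any AIDE or ONTAP API. Only the rules themselves come from this FAQ.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of a candidate data source."""
    kind: str                        # e.g. "ontap_volume", "ontap_s3_bucket", "storagegrid_object"
    remote: bool = False             # True if the volume lives on a remote ONTAP cluster
    ontap_major_version: int = 9
    cluster_peered: bool = False
    snapmirror_configured: bool = False


def is_supported(source: DataSource) -> bool:
    """Apply the FAQ rules: ONTAP volumes only; remote clusters need ONTAP 9,
    cluster peering, and SnapMirror."""
    if source.kind != "ontap_volume":
        return False  # S3 buckets and StorageGRID objects are not supported
    if not source.remote:
        return True
    return (
        source.ontap_major_version == 9
        and source.cluster_peered
        and source.snapmirror_configured
    )


if __name__ == "__main__":
    print(is_supported(DataSource(kind="ontap_volume")))
    print(is_supported(DataSource(kind="ontap_s3_bucket")))
```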

What types of files can AIDE process for classification, vectorization, and semantic search?

AIDE supports a wide range of file types including PDF, DOCX, PPTX, TXT, and image files with OCR capabilities.
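As a rough illustration, a script can pre-filter a file listing by extension before curation. This sketch makes two assumptions: the document extensions mirror the types named above, and the image extensions (`.png`, `.jpg`, `.jpeg`) are guesses, because this FAQ does not list specific image formats. The FAQ describes the list as "including" these types, so the sets below are not exhaustive, and `is_candidate` is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Document types named in this FAQ.
DOCUMENT_EXTENSIONS = {".pdf", ".docx", ".pptx", ".txt"}
# Assumption: the FAQ says "image files" without naming formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def is_candidate(path: str) -> bool:
    """Return True if the file extension is in one of the illustrative sets."""
    suffix = PurePosixPath(path).suffix.lower()
    return suffix in DOCUMENT_EXTENSIONS | IMAGE_EXTENSIONS


if __name__ == "__main__":
    for path in ["/vol1/reports/Q3.PDF", "/vol1/scans/invoice.jpg", "/vol1/bin/app.exe"]:
        print(path, is_candidate(path))
```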

Does AIDE support classification of non-English data?

AIDE supports English-language data only.

What integrations does AIDE support?

AIDE provides a RAG API endpoint accessible through direct API calls or through a Model Context Protocol (MCP) server. This supports integration with agentic AI frameworks and tools.

Deployment and licensing

What are the deployment options?

AIDE is deployed on-premises on AFX infrastructure with DCNs. It integrates directly with NetApp ONTAP AFX installations.

How is AIDE licensed?

AIDE requires a software license to run Data Guardrails and Data Curator.

If you require only the Metadata Engine, the ONTAP One license, which is included with all AFX systems, provides entitlement for Metadata Engine-only capabilities.