FAQ for NetApp AI Data Engine

Contributors netapp-dbagwell

This FAQ covers common questions about NetApp AI Data Engine (AIDE), including its architecture, deployment, user types, technical features, integration, and licensing.

AIDE basics

What is NetApp AI Data Engine?

NetApp AI Data Engine (AIDE) is a storage-integrated AI data service that spans the entire AI lifecycle from discovering and preparing raw data to providing retrieval endpoints to power generative AI (GenAI), Retrieval-Augmented Generation (RAG), agentic AI, and AI factories. AIDE automates sync and change detection, providing a unified, up-to-date view of selected data for data discovery and curation.

How does AIDE work?

AIDE integrates directly with NetApp ONTAP storage systems to create a global, structured view of the entire NetApp data estate with automated change detection and synchronization. AIDE provides real-time vectorization with compression and deduplication, policy-driven guardrails, and integration with AI tools.

Users and roles

Who uses AI Data Engine?

Primary users of AIDE include:

  1. ONTAP storage administrators: Manage infrastructure, AI-specific storage needs, security, and compliance.

  2. Data engineers: Manage data movement, preparation, and integration across environments.

  3. Data scientists: Prepare and transform the relevant data for AI consumption.

Requirements and deployment

What deployment options are available for AIDE?

AIDE offers two deployment options:

  • NetApp data compute nodes (DCN) deployment: AIDE runs on NetApp-provided data compute nodes with integrated GPU resources, delivering full AIDE capabilities including metadata, vectorization, and RAG endpoints.

  • AIDE software on third-party servers: AIDE software runs on customer-provided RHEL 9.7 servers using supported third-party hardware. A Metadata Engine basic deployment provides metadata cataloging and discovery capabilities but does not include GPU-dependent features.

What hardware is required for NetApp DCN deployments?

NetApp DCN deployments require AFX systems (including an AFX controller, disk shelf, and network switch) and three NetApp data compute nodes. At least four AFX controller nodes are required to ensure high availability and performance.

What hardware is required for AIDE software with Metadata Engine basic functionality deployments on third-party servers?

AIDE software with Metadata Engine basic functionality deployments on third-party servers require:

  • Three customer-procured servers from supported vendors

  • RHEL 9.7 LTS installed on all servers

  • AFX storage system running ONTAP 9.18.1 or later for persistent storage
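
The requirements above can be expressed as a small pre-deployment check. This is a minimal sketch and not part of AIDE: `ServerInfo`, `parse_version`, and `check_third_party_prereqs` are hypothetical names, and the sketch assumes the server count must be exactly three and that the values are gathered by hand or by your own inventory tooling. It does not verify that a server comes from a supported vendor.

```python
from dataclasses import dataclass

# Values taken from the requirements list above.
REQUIRED_SERVER_COUNT = 3      # assumption: exactly three servers
REQUIRED_RHEL = (9, 7)         # RHEL 9.7
MIN_ONTAP = (9, 18, 1)         # ONTAP 9.18.1 or later on the AFX system


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '9.18.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


@dataclass
class ServerInfo:
    """Hypothetical inventory record; not an AIDE data structure."""
    hostname: str
    rhel_version: str


def check_third_party_prereqs(servers: list[ServerInfo], ontap_version: str) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if len(servers) != REQUIRED_SERVER_COUNT:
        problems.append(
            f"expected {REQUIRED_SERVER_COUNT} servers, found {len(servers)}"
        )
    for server in servers:
        # Compare only major.minor so that '9.7' and '9.7.0' both pass.
        if parse_version(server.rhel_version)[:2] != REQUIRED_RHEL:
            problems.append(f"{server.hostname}: RHEL {server.rhel_version} is not 9.7")
    # Tuple comparison is numeric, so 9.9.1 correctly sorts below 9.18.1.
    if parse_version(ontap_version) < MIN_ONTAP:
        problems.append(f"ONTAP {ontap_version} is older than 9.18.1")
    return problems
```

Calling `check_third_party_prereqs` with three RHEL 9.7 servers and an ONTAP version of `"9.18.1"` returns an empty list. Versions are compared as integer tuples rather than strings because a string comparison would rank "9.9" above "9.18".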

Can I use my own servers for full AIDE with GPU features?

No. The AIDE 1.0.0 release supports only Metadata Engine basic functionality on customer-procured third-party servers. Full AIDE capabilities with GPU features require NetApp DCN hardware.

What is the minimum number of NetApp DCNs required?

Exactly three NetApp DCNs are required.

What OS runs on AIDE nodes?

The operating system depends on your deployment type:

  • NetApp DCN: NetApp-provided and managed software stack

  • AIDE software with Metadata Engine basic functionality on third-party servers: Red Hat Enterprise Linux (RHEL) 9.7 LTS, installed and managed by the customer

Can AIDE be deployed without AFX?

No. AIDE requires AFX for deployment. AIDE uses Trident to consume AFX volumes as persistent volumes for its internal storage. The AFX cluster that provides storage for AIDE can be peered with an ONTAP 9 system or cluster. AIDE then uses cluster peering and SnapMirror to sync data from the remote ONTAP cluster to the AFX system.

Management and interfaces

Is AIDE Console part of NetApp Console or a separate interface?

AIDE Console is a separate management interface that runs on NetApp DCNs. You use AIDE Console to manage AIDE services, such as Data Guardrails and Data Curator. You can also use ONTAP System Manager to monitor the AIDE cluster.

Features and capabilities

What are the key features of AIDE?

AIDE provides four main features, with availability depending on your deployment type:

Metadata Engine (available in all deployments)
  • Automatically generates a structured, up-to-date, interactive view of your data.

  • Works with data stored on ONTAP.

  • Enables data practitioners to collaborate with storage admins to find and understand data.

  • Provides APIs that query the cataloged metadata, reducing NFS traffic load on storage systems.

  • Metadata extraction and cataloging is built specifically for AIDE, runs continuously, and leverages ONTAP capabilities such as snapshots.

Data Sync (available in all deployments)
  • Maintains data recency automatically as source data changes without manual intervention.

  • Admins define the data refresh interval in days or hours.

  • Provides incremental data mobility and sync to eliminate redundant copies of AI data.
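
To illustrate a refresh interval defined in days or hours, the following minimal sketch converts such a setting into a duration and checks whether a refresh is due. It is illustrative only: `refresh_interval` and `is_refresh_due` are hypothetical helpers, not AIDE functions, and the sketch makes no claim about how AIDE schedules sync internally.

```python
from datetime import datetime, timedelta


def refresh_interval(value: int, unit: str) -> timedelta:
    """Convert an interval given in 'hours' or 'days' into a timedelta."""
    if value <= 0:
        raise ValueError("interval must be a positive number")
    if unit == "hours":
        return timedelta(hours=value)
    if unit == "days":
        return timedelta(days=value)
    raise ValueError(f"unsupported unit: {unit!r} (expected 'hours' or 'days')")


def is_refresh_due(last_sync: datetime, now: datetime, interval: timedelta) -> bool:
    """Return True when at least one full interval has passed since the last sync."""
    return now - last_sync >= interval
```

For example, with an interval of `refresh_interval(12, "hours")`, a source last synced at 00:00 is due for refresh at 12:00 on the same day but not at 11:00.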

Data Guardrails (NetApp DCN only with required license)
  • Automatically identifies and protects sensitive data throughout the AI lifecycle. It's accessible through AIDE Console.

  • Continuously scans, classifies, and categorizes data.

  • Identifies sensitive data (such as PII) and risks.

  • Facilitates the creation of policies for automatic handling of sensitive data in line with company and regulatory standards.

  • Full policy enforcement (automatic redaction and access restriction) requires vectorization capabilities available in NetApp DCN deployments only.

  • AIDE software with Metadata Engine basic functionality on third-party servers supports classifier-based metadata tagging but not guardrail enforcement.

Data Curator (NetApp DCN only with required license)
  • Allows data scientists to search across storage for relevant data.

  • Creates curated data collections with data existing on AFX volumes.

  • Generates vector embeddings at the storage layer to reduce data bloat and increase performance.

  • Provides a retrieval endpoint for AI applications with vector semantic search and re-ranking.

Note: AIDE software with Metadata Engine basic functionality on third-party servers includes Metadata Engine and Data Sync capabilities. Data Guardrails and Data Curator require GPU resources available in NetApp DCN deployments.

What features are available with AIDE software on third-party servers compared to NetApp DCN?

AIDE software on third-party servers provides metadata-focused capabilities.

Available with AIDE software with Metadata Engine basic functionality on third-party servers:

  • Workspace creation and management

  • Automated metadata extraction and cataloging

  • Metadata search and filtering via REST APIs

  • Data Sync for automated data currency

  • Metadata export functionality

Not available with AIDE software with Metadata Engine basic functionality on third-party servers:

  • GPU-dependent services (vectorization, OCR, enrichment)

  • Data collections and vector embeddings

  • RAG endpoints for semantic search

  • Guardrail policy enforcement at retrieval time
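
The split between the two deployment types can be summarized as a lookup table. This is a sketch for planning purposes only: the `Deployment` enum and the `available_features` helper are hypothetical names introduced here, and the table encodes only the four main features as this FAQ describes them.

```python
from enum import Enum


class Deployment(Enum):
    """The two deployment options described in this FAQ."""
    NETAPP_DCN = "NetApp DCN"
    THIRD_PARTY_BASIC = "Metadata Engine basic functionality on third-party servers"


# Feature name -> deployment types where the feature is available.
# Data Guardrails and Data Curator also require the appropriate license on NetApp DCN.
FEATURE_AVAILABILITY = {
    "Metadata Engine": {Deployment.NETAPP_DCN, Deployment.THIRD_PARTY_BASIC},
    "Data Sync": {Deployment.NETAPP_DCN, Deployment.THIRD_PARTY_BASIC},
    "Data Guardrails": {Deployment.NETAPP_DCN},
    "Data Curator": {Deployment.NETAPP_DCN},
}


def available_features(deployment: Deployment) -> list[str]:
    """Return the alphabetically sorted feature names available for a deployment type."""
    return sorted(
        name
        for name, deployments in FEATURE_AVAILABILITY.items()
        if deployment in deployments
    )
```

Calling `available_features(Deployment.THIRD_PARTY_BASIC)` yields only Metadata Engine and Data Sync, which matches the lists above.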

Integration and interoperability

Does AIDE support federated metadata across multiple ONTAP clusters?

AIDE can connect to multiple ONTAP clusters using SnapMirror and cluster peering, enabling centralized metadata visibility.

Where is the metadata stored?

AIDE stores metadata on the connected AFX cluster using a persistent volume provided by AFX. The data compute nodes use local storage for internal operations.

Does AIDE Metadata Engine classify data?

No. AIDE Metadata Engine catalogs filesystem metadata and provides APIs to query this cataloged metadata.

What data sources are supported?

AIDE supports ONTAP volumes (local or remote) as data sources. Remote ONTAP clusters must run ONTAP 9 and be connected via cluster peering and SnapMirror.

ONTAP S3 buckets and StorageGRID objects are not supported as data sources in AIDE 9.18.1.

What types of files can AIDE process for classification, vectorization, and semantic search?

AIDE supports a wide range of file types including PDF, DOCX, PPTX, TXT, and image files with OCR capabilities.

Does AIDE support classification of non-English data?

AIDE supports English-language data only.

What integrations does AIDE support?

AIDE provides a RAG API endpoint accessible through direct API calls or through a Model Context Protocol (MCP) server. This supports integration with agentic AI frameworks and tools.

Licensing

How is AIDE licensed?

AIDE licensing depends on your deployment type and required features:

NetApp DCN deployments:

  • Data Guardrails and Data Curator require the AIDE premium services license

  • Metadata Engine and Data Sync capabilities are included with the ONTAP One license (included with all AFX systems)

AIDE software with Metadata Engine basic functionality on third-party servers:

  • ONTAP One license provides entitlement for Metadata Engine and Data Sync capabilities

  • Data Guardrails and Data Curator are not available for Metadata Engine basic functionality deployments on third-party servers