Hardware and Software Requirements

Contributors netapp-dorianh kevin-hoke Download PDF of this page

All procedures outlined in this document were validated on the NetApp ONTAP AI converged infrastructure solution described in NVA-1121. This verified architecture pairs a NetApp AFF A800 all-flash storage system with the NVIDIA DGX-1 Deep Learning System using Cisco Nexus networking. For this validation exercise, two bare-metal NVIDIA DGX-1 systems, each featuring eight NVIDIA V100 GPUs, were used as Kubernetes worker nodes. A NetApp AFF A800 all-flash storage system provided a single persistent storage namespace across nodes, and two Cisco Nexus 3232C switches were used to provide network connectivity. Three virtual machines (VMs) that ran on a separate physical server outside of the ONTAP AI pod were used as Kubernetes master nodes.

Note, however, that the NetApp AI Control Plane solution that is outlined in this document is not dependent on this specific hardware. The solution is compatible with any NetApp physical storage appliance, software-defined instance, or cloud service, that supports the NFS protocol. Examples include a NetApp AFF storage system, Azure NetApp Files, NetApp Cloud Volumes Service, a NetApp ONTAP Select software-defined storage instance, or a NetApp Cloud Volumes ONTAP instance. Additionally, the solution can be implemented on any Kubernetes cluster as long as the Kubernetes version used is supported by Kubeflow and NetApp Trident. For a list of Kubernetes versions that are supported by Kubeflow, see the see the official Kubeflow documentation. For a list of Kubernetes versions that are supported by Trident, see the Trident documentation. See the following tables for the validation environment infrastructure and software version details.

Infrastructure Component Quantity Details Operating System

Deployment jump host

1

VM

Ubuntu 18.04.3 LTS

Kubernetes master nodes

3

VM

Ubuntu 18.04.3 LTS

Kubernetes worker nodes

2

NVIDIA DGX-1 (bare-metal)

NVIDIA DGX OS 4.0.5
(based on Ubuntu 18.04.2 LTS)

Storage

1 HA Pair

NetApp AFF A800

NetApp ONTAP 9.5 P1

Network connectivity

2

Cisco Nexus 3232C

Cisco NX-OS 7.0(3)I6(1)

Software Component Version

Apache Airflow

1.10.12

Apache Airflow Helm Chart

7.10.1

Cisco NX-OS

7.0(3)I6(1)

Docker

18.09.7-ce

Kubeflow

0.7.0

Kubernetes

1.15.3

NetApp ONTAP

9.6 P1

NetApp Trident

20.01

NVIDIA DeepOps

Pulled from GitHub repository () on January 28, 2030

NVIDIA DGX OS

4.0.5 (based on Ubuntu 18.04.2 LTS)

Ubuntu

18.04.4 LTS

Support

NetApp does not offer enterprise support for Apache Airflow, Docker, Kubeflow, Kubernetes, or NVIDIA DeepOps. If you are interested in a fully supported solution with capabilities similar to the NetApp AI Control Plane solution, contact NetApp about fully supported AI/ML solutions that NetApp offers jointly with partners.