English

Architecture

Contributors netapp-dorianh Download PDF of this page

Solution Technology

This solution is designed with a NetApp HCI system that contains the following components:

  • Two H615c compute nodes with NVIDIA T4 GPUs

  • Two H410c compute nodes

  • Two H410s storage nodes

  • Two Mellanox SN2010 10GbE/25GbE switches

Architectural Diagram

The following diagram illustrates the solution architecture for the NetApp HCI AI inferencing solution.

Error: Missing Graphic Image

The following diagram illustrates the virtual and physical elements of this solution.

Error: Missing Graphic Image

A VMware infrastructure is used to host the management services required by this inferencing solution. These services do not need to be deployed on a dedicated infrastructure; they can coexist with any existing workloads. The NetApp Deployment Engine (NDE) uses the H410c and H410s nodes to deploy the VMware infrastructure.

After NDE has completed the configuration, the following components are deployed as VMs in the virtual infrastructure:

  • Deployment Jump VM. Used to automate the deployment of NVIDIA DeepOps. See NVIDIA DeepOps and storage management using NetApp Trident.

  • ONTAP Select. An instance of ONTAP Select is deployed to provide NFS file services and persistent storage to the AI workload running on Kubernetes.

  • Kubernetes Masters. During deployment, three VMs are installed and configured with a supported Linux distribution and configured as Kubernetes master nodes. After the management services have been set up, two H615c compute nodes with NVIDIA T4 GPUs are installed with a supported Linux distribution. These two nodes function as the Kubernetes worker nodes and provide the infrastructure for the inferencing platform.

Hardware Requirements

The following table lists the hardware components that are required to implement the solution. The hardware components that are used in any particular implementation of the solution might vary based on customer requirements.

Layer Product Family Quantity Details

Compute

H615c

2

3 NVIDIA Tesla T4 GPUs per node

H410c

2

Compute nodes for management infrastructure

Storage

H410s

2

Storage for OS and workload

Network

Mellanox SN2010

2

10G/25G switches

Software Requirements

The following table lists the software components that are required to implement the solution. The software components that are used in any particular implementation of the solution might vary based on customer requirements.

Layer Software Version

Storage

NetApp Element software

12.0.0.333

ONTAP Select

9.7

NetApp Trident

20.07

NetApp HCI engine

NDE

1.8

Hypervisor

Hypervisor

VMware vSphere ESXi 6.7U1

Hypervisor Management System

VMware vCenter Server 6.7U1

Inferencing Platform

NVIDIA DeepOps

20.08

NVIDIA GPU Operator

1.1.7

Ansible

2.9.5

Kubernetes

1.17.9

Docker

Docker CE 18.09.7

CUDA Version

10.2

GPU Device Plugin

0.6.0

Helm

3.1.2

NVIDIA Tesla Driver

440.64.00

NVIDIA Triton Inference Server

2.1.0 – NGC Container v20.07

K8 Master VMs

Linux

Any supported distribution across NetApp IMT, NVIDIA DeepOps, and GPUOperator

Ubuntu 18.04.4 LTS was used in this solution
Kernel version: 4.15

Host OS/ K8 Worker Nodes

Linux

Any supported distribution across NetApp IMT, NVIDIA DeepOps, and GPUOperator

Ubuntu 18.04.4 LTS was used in this solution
Kernel version: 4.15