NetApp Solutions

Achieving High Cluster Utilization with Over-Quota GPU Allocation

Contributors kevin-hoke

In this section and in the sections Basic Resource Allocation Fairness and Over-Quota Fairness, we devised advanced testing scenarios to demonstrate the Run:AI orchestration capabilities for complex workload management, automatic preemptive scheduling, and over-quota GPU provisioning. The goal was to achieve high cluster-resource usage and to optimize the productivity of enterprise-level data science teams in an ONTAP AI environment.

For these three sections, set the following projects and quotas:

Project    Quota (GPUs)
team-a     4
team-b     2
team-c     2
team-d     8

In addition, we use the following containers for these three sections:

  • Jupyter Notebook: jupyter/base-notebook

  • Run:AI quickstart: gcr.io/run-ai-demo/quickstart

We set the following goals for this test scenario:

  • Show the simplicity of resource provisioning and how resources are abstracted from users

  • Show how users can easily provision fractions of a GPU as well as whole numbers of GPUs

  • Show how the system eliminates compute bottlenecks by allowing teams or users to go over their resource quota if there are free GPUs in the cluster

  • Show how data pipeline bottlenecks are eliminated by using the NetApp solution when running compute-intensive jobs, such as the NetApp container

  • Show how multiple types of containers run on the system:

    • Jupyter Notebook

    • Run:AI container

  • Show high utilization when the cluster is full
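The over-quota goal listed above can be illustrated with a small admission check. This is a minimal sketch, not Run:AI's scheduling logic: the function name `classify_request`, the `in_use` state, and the 16-GPU cluster size (taken from the results reported later in this section) are assumptions made for illustration, and preemption and fairness are deliberately left out.

```python
# Illustrative sketch only; this is NOT Run:AI's actual scheduler.
QUOTAS = {"team-a": 4, "team-b": 2, "team-c": 2, "team-d": 8}  # project quotas from this section
CLUSTER_GPUS = 16  # assumed cluster size, matching the 16/16 figure in the results


def classify_request(team, requested, in_use):
    """Classify a GPU request as 'in-quota', 'over-quota', or 'pending'.

    `requested` may be fractional (for example 0.5 for an interactive job).
    `in_use` maps each team to the number of GPUs it currently holds.
    """
    free = CLUSTER_GPUS - sum(in_use.values())
    if requested > free:
        return "pending"  # no idle capacity; a real scheduler might preempt instead
    if in_use.get(team, 0) + requested <= QUOTAS[team]:
        return "in-quota"
    return "over-quota"  # granted only because idle GPUs exist in the cluster


# Hypothetical cluster state: team-d holds only half of its quota.
in_use = {"team-a": 4, "team-b": 2, "team-c": 2, "team-d": 4}
print(classify_request("team-b", 1, in_use))    # over-quota
print(classify_request("team-d", 0.5, in_use))  # in-quota
print(classify_request("team-a", 8, in_use))    # pending
```

The point of the sketch is that a quota acts as a guarantee rather than a hard cap: a team at its quota can still receive GPUs as long as some are idle.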

For details on the actual command sequence executed during the testing, see Testing Details for Section 4.8.

After all 13 workloads are submitted, you can see a list of container names and allocated GPUs, as shown in the following figure. There are seven training jobs and six interactive jobs, simulating four data science teams, each with its own models running or in development. For the interactive jobs, individual developers use Jupyter Notebooks to write or debug their code, so provisioning GPU fractions is appropriate: it meets their needs without consuming too many cluster resources.

[Figure: list of container names and GPUs allocated for the 13 submitted workloads]
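To see why fractional provisioning keeps interactive jobs from consuming many GPUs, the following minimal sketch packs fractional requests onto whole GPUs. The first-fit placement strategy, the helper name `pack_first_fit`, and the 0.5 fraction are all hypothetical; the source does not state which fractions were used in the test or how Run:AI places them.

```python
# Illustrative sketch only; the placement strategy here is an assumption.
def pack_first_fit(fractions):
    """Return how many whole GPUs are needed to host the fractional requests.

    Each request is placed on the first GPU that still has enough room.
    """
    free = []  # remaining free capacity on each GPU already in use
    for request in fractions:
        for i, capacity in enumerate(free):
            if request <= capacity + 1e-9:  # small tolerance for float rounding
                free[i] = capacity - request
                break
        else:
            free.append(1.0 - request)  # no GPU had room, so start a new one
    return len(free)


# Six interactive jobs at a hypothetical 0.5 GPU each share three GPUs.
print(pack_first_fit([0.5] * 6))  # 3
```

With whole-GPU allocation, the same six notebooks would occupy six GPUs, most of which would sit idle while developers edit code.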

The results of this testing scenario show the following:

  • The cluster is full: 16 of 16 GPUs are in use.

  • High cluster utilization.

  • More experiments than GPUs due to fractional allocation.

  • team-d is not using its full quota; therefore, team-b and team-c can use the idle GPUs for their experiments, leading to faster time to innovation.
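The relationship between quotas, over-quota use, and utilization can be checked with a short calculation. This is a sketch under stated assumptions: the `allocated` values are hypothetical numbers chosen only to add up to a full cluster (they are not the measured allocation from the figure), the helper name `quota_report` is invented for illustration, and the cluster size is assumed to equal the sum of the quotas (4 + 2 + 2 + 8 = 16).

```python
# Illustrative sketch only; the allocation below is hypothetical, not measured data.
QUOTAS = {"team-a": 4, "team-b": 2, "team-c": 2, "team-d": 8}
allocated = {"team-a": 4, "team-b": 3.5, "team-c": 4.5, "team-d": 4}


def quota_report(quotas, allocated):
    """Return cluster utilization and each team's GPUs above (+) or below (-) quota."""
    total = sum(quotas.values())  # assumes cluster size equals the sum of quotas
    utilization = sum(allocated.values()) / total
    delta = {team: allocated[team] - quotas[team] for team in quotas}
    return utilization, delta


utilization, delta = quota_report(QUOTAS, allocated)
print(f"utilization: {utilization:.0%}")
for team, d in delta.items():
    print(f"{team}: {d:+g} GPUs relative to quota")
```

In this hypothetical state, the GPUs that team-d leaves idle are exactly the ones that team-b and team-c hold over quota, which is how the cluster stays fully used.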