Testing Details for Section 4.8

Contributors netapp-dorianh kevin-hoke Download PDF of this page

This section contains the testing details for the section Achieving High Cluster Utilization with Over-Quota GPU Allocation.

Submit jobs in the following order:

Project Image # GPUs Total Comment

team-a

Jupyter

1

1/4

team-a

NetApp

1

2/4

team-a

Run:AI

2

4/4

Using all their quota

team-b

Run:AI

0.6

0.6/2

Fractional GPU

team-b

Run:AI

0.4

1/2

Fractional GPU

team-b

NetApp

1

2/2

team-b

NetApp

2

4/2

Two over quota

team-c

Run:AI

0.5

0.5/2

Fractional GPU

team-c

Run:AI

0.3

0.8/2

Fractional GPU

team-c

Run:AI

0.2

1/2

Fractional GPU

team-c

NetApp

2

3/2

One over quota

team-c

NetApp

1

4/2

Two over quota

team-d

NetApp

4

4/8

Using half of their quota

Command structure:

$ runai submit <job-name> -p <project-name> -g <#GPUs> -i <image-name>

Actual command sequence used in testing:

$ runai submit a-1-1-jupyter -i jupyter/base-notebook -g 1 \
  --interactive --service-type=ingress --port 8888 \
  --args="--NotebookApp.base_url=team-a-test-ingress" --command=start-notebook.sh -p team-a
$ runai submit a-1-g -i gcr.io/run-ai-demo/quickstart -g 1 -p team-a
$ runai submit a-2-gg -i gcr.io/run-ai-demo/quickstart -g 2 -p team-a
$ runai submit b-1-g06 -i gcr.io/run-ai-demo/quickstart -g 0.6 --interactive -p team-b
$ runai submit b-2-g04 -i gcr.io/run-ai-demo/quickstart -g 0.4 --interactive -p team-b
$ runai submit b-3-g -i gcr.io/run-ai-demo/quickstart -g 1 -p team-b
$ runai submit b-4-gg -i gcr.io/run-ai-demo/quickstart -g 2 -p team-b
$ runai submit c-1-g05 -i gcr.io/run-ai-demo/quickstart -g 0.5 --interactive -p team-c
$ runai submit c-2-g03 -i gcr.io/run-ai-demo/quickstart -g 0.3 --interactive -p team-c
$ runai submit c-3-g02 -i gcr.io/run-ai-demo/quickstart -g 0.2 --interactive -p team-c
$ runai submit c-4-gg -i gcr.io/run-ai-demo/quickstart -g 2 -p team-c
$ runai submit c-5-g -i gcr.io/run-ai-demo/quickstart -g 1 -p team-c
$ runai submit d-1-gggg -i gcr.io/run-ai-demo/quickstart -g 4 -p team-d

At this point, you should have the following states:

Project GPUs Allocated Workloads Queued

team-a

4/4 (soft quota/actual allocation)

None

team-b

4/2

None

team-c

4/2

None

team-d

4/8

None

See the section Achieving High Cluster Utilization with Over-uota GPU Allocation for discussions on the proceeding testing scenario.