Test plan

This document follows the MLPerf Inference v0.7 code, the MLPerf Inference v1.1 code, and the associated rules. We ran MLPerf benchmarks designed for inference at the edge, as defined in the following table.

| Area | Task | Model | Dataset | QSL size | Quality | Multistream latency constraint |
|---|---|---|---|---|---|---|
| Vision | Image classification | Resnet50v1.5 | ImageNet (224x224) | 1024 | 99% of FP32 | 50ms |
| Vision | Object detection (large) | SSD-ResNet34 | COCO (1200x1200) | 64 | 99% of FP32 | 66ms |
| Vision | Object detection (small) | SSD-MobileNetsv1 | COCO (300x300) | 256 | 99% of FP32 | 50ms |
| Vision | Medical image segmentation | 3D UNET | BraTS 2019 (224x224x160) | 16 | 99% and 99.9% of FP32 | n/a |
| Speech | Speech-to-text | RNNT | Librispeech dev-clean | 2513 | 99% of FP32 | n/a |
| Language | Language processing | BERT | SQuAD v1.1 | 10833 | 99% of FP32 | n/a |
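
As an illustration of how the Quality column can be read, the following minimal Python sketch stores the table rows as records and converts a quality fraction into an absolute accuracy target. The names `Benchmark`, `BENCHMARKS`, and `quality_targets` are hypothetical, and the FP32 accuracy of 80.0 is a placeholder, not a measured reference value.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Benchmark:
    """One row of the edge benchmark table (hypothetical helper type)."""
    task: str
    model: str
    qsl_size: int
    quality_fractions: Tuple[float, ...]      # fractions of FP32 accuracy
    multistream_latency_ms: Optional[float]   # None where the table says n/a


# Rows transcribed from the table above.
BENCHMARKS = [
    Benchmark("Image classification", "Resnet50v1.5", 1024, (0.99,), 50),
    Benchmark("Object detection (large)", "SSD-ResNet34", 64, (0.99,), 66),
    Benchmark("Object detection (small)", "SSD-MobileNetsv1", 256, (0.99,), 50),
    Benchmark("Medical image segmentation", "3D UNET", 16, (0.99, 0.999), None),
    Benchmark("Speech-to-text", "RNNT", 2513, (0.99,), None),
    Benchmark("Language processing", "BERT", 10833, (0.99,), None),
]


def quality_targets(benchmark, fp32_accuracy):
    """Return the minimum accuracy for each quality level of a benchmark."""
    return [round(f * fp32_accuracy, 4) for f in benchmark.quality_fractions]


if __name__ == "__main__":
    placeholder_fp32_accuracy = 80.0  # assumption for illustration only
    for b in BENCHMARKS:
        print(b.task, quality_targets(b, placeholder_fp32_accuracy))
```

With the placeholder accuracy, a benchmark listed at "99% and 99.9% of FP32" yields two targets, one per quality level.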

The following table presents Edge benchmark scenarios.

| Area | Task | Scenarios |
|---|---|---|
| Vision | Image classification | Single stream, offline, multistream |
| Vision | Object detection (large) | Single stream, offline, multistream |
| Vision | Object detection (small) | Single stream, offline, multistream |
| Vision | Medical image segmentation | Single stream, offline |
| Speech | Speech-to-text | Single stream, offline |
| Language | Language processing | Single stream, offline |
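
The scenario table above implies a fixed set of task and scenario combinations. The sketch below enumerates that set; the names `SCENARIOS` and `run_matrix` are hypothetical and only illustrate one way to list the runs.

```python
# Scenarios per task, transcribed from the table above.
SCENARIOS = {
    "Image classification": ("single stream", "offline", "multistream"),
    "Object detection (large)": ("single stream", "offline", "multistream"),
    "Object detection (small)": ("single stream", "offline", "multistream"),
    "Medical image segmentation": ("single stream", "offline"),
    "Speech-to-text": ("single stream", "offline"),
    "Language processing": ("single stream", "offline"),
}


def run_matrix(scenarios=SCENARIOS):
    """Return every (task, scenario) pair listed in the scenario table."""
    return [(task, name) for task, names in scenarios.items() for name in names]


if __name__ == "__main__":
    for task, scenario in run_matrix():
        print(f"{task}: {scenario}")
```

Only the three vision tasks that have a multistream latency constraint appear with the multistream scenario, so the matrix contains 15 combinations in total.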

We ran these benchmarks on the networked storage architecture developed in this validation and compared the results to those previously submitted to MLPerf from local runs on the edge servers. The comparison shows how much the shared storage affects inference performance.
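
One simple way to express such a comparison is the relative difference between a networked-storage result and the corresponding local result. The sketch below assumes this metric; the function name `storage_impact_percent` and the sample numbers are hypothetical and are not results from this validation.

```python
def storage_impact_percent(local_result, networked_result):
    """Relative change of the networked-storage result versus the local run.

    For throughput metrics (for example, offline samples per second) a
    negative value means the networked run was slower. For latency metrics
    (for example, single-stream latency) a positive value means the
    networked run was slower.
    """
    if local_result <= 0:
        raise ValueError("local_result must be positive")
    return (networked_result - local_result) / local_result * 100.0


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    local_throughput = 100.0
    networked_throughput = 95.0
    change = storage_impact_percent(local_throughput, networked_throughput)
    print(f"Throughput change with shared storage: {change:.1f}%")
```

Because the sign convention depends on whether the metric is throughput or latency, it helps to report the metric type alongside each percentage.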