简体中文版经机器翻译而成,仅供参考。如与英语版出现任何冲突,应以英语版为准。

测试操作步骤

本节介绍完成验证所需的任务。

前提条件

要执行本节所述的任务、您必须能够访问安装并配置了以下工具的Linux或macOS主机:

  • Kubectl (配置为访问现有Kubernetes集群)

    • 可参见安装和配置说明 "此处"

  • 适用于Kubernetes的NetApp DataOps工具包

场景1—JupyterLab中的按需推理

  1. 为AI/ML推理工作负载创建Kubernetes命名空间。

    $ kubectl create namespace inference
    namespace/inference created
  2. 使用NetApp DataOps工具包配置永久性卷、以存储要执行推理的数据。

    $ netapp_dataops_k8s_cli.py create volume --namespace=inference --pvc-name=inference-data --size=50Gi
    Creating PersistentVolumeClaim (PVC) 'inference-data' in namespace 'inference'.
    PersistentVolumeClaim (PVC) 'inference-data' created. Waiting for Kubernetes to bind volume to PVC.
    Volume successfully created and bound to PersistentVolumeClaim (PVC) 'inference-data' in namespace 'inference'.
  3. 使用NetApp DataOps工具包创建新的JupyterLab工作空间。使用`-mount- PVC`选项挂载上一步中创建的永久性卷。根据需要使用`- nvidia-GPU`选项将NVIDIA GPU分配给工作空间。

    在以下示例中、永久性卷`推理-data`会挂载到JupyterLab工作空间容器中、该容器位于` home/jovyon/data`。使用Project Jupyter官方容器映像时、`/home/jovyan`将作为JupyterLab Web界面中的顶级目录提供。

    $ netapp_dataops_k8s_cli.py create jupyterlab --namespace=inference --workspace-name=live-inference --size=50Gi --nvidia-gpu=2 --mount-pvc=inference-data:/home/jovyan/data
    Set workspace password (this password will be required in order to access the workspace):
    Re-enter password:
    Creating persistent volume for workspace...
    Creating PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-live-inference' in namespace 'inference'.
    PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-live-inference' created. Waiting for Kubernetes to bind volume to PVC.
    Volume successfully created and bound to PersistentVolumeClaim (PVC) 'ntap-dsutil-jupyterlab-live-inference' in namespace 'inference'.
    Creating Service 'ntap-dsutil-jupyterlab-live-inference' in namespace 'inference'.
    Service successfully created.
    Attaching Additional PVC: 'inference-data' at mount_path: '/home/jovyan/data'.
    Creating Deployment 'ntap-dsutil-jupyterlab-live-inference' in namespace 'inference'.
    Deployment 'ntap-dsutil-jupyterlab-live-inference' created.
    Waiting for Deployment 'ntap-dsutil-jupyterlab-live-inference' to reach Ready state.
    Deployment successfully created.
    Workspace successfully created.
    To access workspace, navigate to http://192.168.0.152:32721
  4. 使用`create jupyterlab`命令输出中指定的URL访问JupyterLab工作空间。数据目录表示已挂载到工作空间的永久性卷。

    错误:缺少图形映像

  5. 打开`data`目录并上传要执行推理的文件。将文件上传到数据目录时、这些文件会自动存储在挂载到工作空间的永久性卷上。要上传文件、请单击上传文件图标、如下图所示。

    错误:缺少图形映像

  6. 返回顶级目录并创建新的笔记本。

    错误:缺少图形映像

  7. 向笔记本电脑添加推理代码。以下示例显示了图像检测用例的推理代码。

    错误:缺少图形映像

    错误:缺少图形映像

  8. 将Protopia混淆添加到推理代码中。Protopia直接与客户合作、提供特定于使用情形的文档、不在本技术报告的范围内。以下示例显示了添加了Protopia混淆的图像检测用例的推理代码。

    错误:缺少图形映像

    错误:缺少图形映像

场景2—Kubernetes上的批处理推理

  1. 为AI/ML推理工作负载创建Kubernetes命名空间。

    $ kubectl create namespace inference
    namespace/inference created
  2. 使用NetApp DataOps工具包配置永久性卷、以存储要执行推理的数据。

    $ netapp_dataops_k8s_cli.py create volume --namespace=inference --pvc-name=inference-data --size=50Gi
    Creating PersistentVolumeClaim (PVC) 'inference-data' in namespace 'inference'.
    PersistentVolumeClaim (PVC) 'inference-data' created. Waiting for Kubernetes to bind volume to PVC.
    Volume successfully created and bound to PersistentVolumeClaim (PVC) 'inference-data' in namespace 'inference'.
  3. 使用要执行推理的数据填充新的永久性卷。

    可以通过多种方法将数据加载到PVC上。如果您的数据当前存储在与S3兼容的对象存储平台中、例如NetApp StorageGRID 或Amazon S3、则可以使用 "NetApp DataOps工具包S3 Data Mover功能"。另一种简单的方法是创建JupyterLab工作空间、然后通过JupyterLab Web界面上传文件、如""一节中的步骤3到5所述[Scenario 1 – On-demand inferencing in JupyterLab]。 "

  4. 为批处理推理任务创建Kubernetes作业。以下示例显示了一个图像检测用例的批处理推理作业。此作业会对一组映像中的每个映像执行推理、并将推理准确性指标写入到stdout。

    $ vi inference-job-raw.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: netapp-inference-raw
      namespace: inference
    spec:
      backoffLimit: 5
      template:
        spec:
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: inference-data
          - name: dshm
            emptyDir:
              medium: Memory
          containers:
          - name: inference
            image: netapp-protopia-inference:latest
            imagePullPolicy: IfNotPresent
            command: ["python3", "run-accuracy-measurement.py", "--dataset", "/data/netapp-face-detection/FDDB"]
            resources:
              limits:
                nvidia.com/gpu: 2
            volumeMounts:
            - mountPath: /data
              name: data
            - mountPath: /dev/shm
              name: dshm
          restartPolicy: Never
    $ kubectl create -f inference-job-raw.yaml
    job.batch/netapp-inference-raw created
  5. 确认推理作业已成功完成。

    $ kubectl -n inference logs netapp-inference-raw-255sp
    100%|██████████| 89/89 [00:52<00:00,  1.68it/s]
    Reading Predictions : 100%|██████████| 10/10 [00:01<00:00,  6.23it/s]
    Predicting ... : 100%|██████████| 10/10 [00:16<00:00,  1.64s/it]
    ==================== Results ====================
    FDDB-fold-1 Val AP: 0.9491256561145955
    FDDB-fold-2 Val AP: 0.9205024466101926
    FDDB-fold-3 Val AP: 0.9253013871078468
    FDDB-fold-4 Val AP: 0.9399781485863011
    FDDB-fold-5 Val AP: 0.9504280149478732
    FDDB-fold-6 Val AP: 0.9416473519339292
    FDDB-fold-7 Val AP: 0.9241631566241117
    FDDB-fold-8 Val AP: 0.9072663297546659
    FDDB-fold-9 Val AP: 0.9339648715035469
    FDDB-fold-10 Val AP: 0.9447707905560152
    FDDB Dataset Average AP: 0.9337148153739079
    =================================================
    mAP: 0.9337148153739079
  6. 将Protopia混淆添加到推理作业。您可以从Protopia中找到直接添加Protopia混淆的使用案例专用说明、该说明不在本技术报告的讨论范围内。以下示例显示了一个人脸检测用例的批处理推理作业、该用例使用0.8的字母值添加了质子模糊。此作业会在对一组图像中的每个图像执行推理之前应用程序对象模糊、然后将推理准确性指标写入stdout。

    对于alpha值0.05%、0.1、0.2、0.4、0.6、 0.8、0.9和0.95。您可以在中查看结果 ""推理准确性比较。""

    $ vi inference-job-protopia-0.8.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: netapp-inference-protopia-0.8
      namespace: inference
    spec:
      backoffLimit: 5
      template:
        spec:
          volumes:
          - name: data
            persistentVolumeClaim:
              claimName: inference-data
          - name: dshm
            emptyDir:
              medium: Memory
          containers:
          - name: inference
            image: netapp-protopia-inference:latest
            imagePullPolicy: IfNotPresent
            env:
            - name: ALPHA
              value: "0.8"
            command: ["python3", "run-accuracy-measurement.py", "--dataset", "/data/netapp-face-detection/FDDB", "--alpha", "$(ALPHA)", "--noisy"]
            resources:
              limits:
                nvidia.com/gpu: 2
            volumeMounts:
            - mountPath: /data
              name: data
            - mountPath: /dev/shm
              name: dshm
          restartPolicy: Never
    $ kubectl create -f inference-job-protopia-0.8.yaml
    job.batch/netapp-inference-protopia-0.8 created
  7. 确认推理作业已成功完成。

    $ kubectl -n inference logs netapp-inference-protopia-0.8-b4dkz
    100%|██████████| 89/89 [01:05<00:00,  1.37it/s]
    Reading Predictions : 100%|██████████| 10/10 [00:02<00:00,  3.67it/s]
    Predicting ... : 100%|██████████| 10/10 [00:22<00:00,  2.24s/it]
    ==================== Results ====================
    FDDB-fold-1 Val AP: 0.8953066115834589
    FDDB-fold-2 Val AP: 0.8819580264029936
    FDDB-fold-3 Val AP: 0.8781107458462862
    FDDB-fold-4 Val AP: 0.9085731346308461
    FDDB-fold-5 Val AP: 0.9166445508275378
    FDDB-fold-6 Val AP: 0.9101178994188819
    FDDB-fold-7 Val AP: 0.8383443678423771
    FDDB-fold-8 Val AP: 0.8476311547659464
    FDDB-fold-9 Val AP: 0.8739624502111121
    FDDB-fold-10 Val AP: 0.8905468076424851
    FDDB Dataset Average AP: 0.8841195749171925
    =================================================
    mAP: 0.8841195749171925

场景3—NVIDIA Triton推理服务器

  1. 为AI/ML推理工作负载创建Kubernetes命名空间。

    $ kubectl create namespace inference
    namespace/inference created
  2. 使用NetApp DataOps工具包配置永久性卷、以用作NVIDIA Triton推理服务器的型号存储库。

    $ netapp_dataops_k8s_cli.py create volume --namespace=inference --pvc-name=triton-model-repo --size=100Gi
    Creating PersistentVolumeClaim (PVC) 'triton-model-repo' in namespace 'inference'.
    PersistentVolumeClaim (PVC) 'triton-model-repo' created. Waiting for Kubernetes to bind volume to PVC.
    Volume successfully created and bound to PersistentVolumeClaim (PVC) 'triton-model-repo' in namespace 'inference'.
  3. 将您的型号存储在中的新永久性卷上 "格式。" 这可由NVIDIA Triton推理服务器识别。

    可以通过多种方法将数据加载到PVC上。一种简单的方法是创建JupyterLab工作空间、然后通过JupyterLab Web界面上传文件、如中的步骤3到5所述[Scenario 1 – On-demand inferencing in JupyterLab]。"

  4. 使用NetApp DataOps工具包部署新的NVIDIA Triton推理服务器实例。

    $ netapp_dataops_k8s_cli.py create triton-server --namespace=inference --server-name=netapp-inference --model-repo-pvc-name=triton-model-repo
    Creating Service 'ntap-dsutil-triton-netapp-inference' in namespace 'inference'.
    Service successfully created.
    Creating Deployment 'ntap-dsutil-triton-netapp-inference' in namespace 'inference'.
    Deployment 'ntap-dsutil-triton-netapp-inference' created.
    Waiting for Deployment 'ntap-dsutil-triton-netapp-inference' to reach Ready state.
    Deployment successfully created.
    Server successfully created.
    Server endpoints:
    http: 192.168.0.152: 31208
    grpc: 192.168.0.152: 32736
    metrics: 192.168.0.152: 30009/metrics
  5. 使用Triton客户端SDK执行推理任务。以下Python代码摘录使用Triton Python客户端SDK为人脸检测用例执行推理任务。此示例调用Triton API并传递图像以进行推理。然后、Triton推理服务器接收请求、调用模型、并在API结果中返回推理输出。

    # get current frame
    frame = input_image
    # preprocess input
    preprocessed_input = preprocess_input(frame)
    preprocessed_input = torch.Tensor(preprocessed_input).to(device)
    # run forward pass
    clean_activation = clean_model_head(preprocessed_input)  # runs the first few layers
    ######################################################################################
    #          pass clean image to Triton Inference Server API for inferencing           #
    ######################################################################################
    triton_client = httpclient.InferenceServerClient(url="192.168.0.152:31208", verbose=False)
    model_name = "face_detection_base"
    inputs = []
    outputs = []
    inputs.append(httpclient.InferInput("INPUT__0", [1, 128, 32, 32], "FP32"))
    inputs[0].set_data_from_numpy(clean_activation.detach().cpu().numpy(), binary_data=False)
    outputs.append(httpclient.InferRequestedOutput("OUTPUT__0", binary_data=False))
    outputs.append(httpclient.InferRequestedOutput("OUTPUT__1", binary_data=False))
    results = triton_client.infer(
        model_name,
        inputs,
        outputs=outputs,
        #query_params=query_params,
        headers=None,
        request_compression_algorithm=None,
        response_compression_algorithm=None)
    #print(results.get_response())
    statistics = triton_client.get_inference_statistics(model_name=model_name, headers=None)
    print(statistics)
    if len(statistics["model_stats"]) != 1:
        print("FAILED: Inference Statistics")
        sys.exit(1)
    
    loc_numpy = results.as_numpy("OUTPUT__0")
    pred_numpy = results.as_numpy("OUTPUT__1")
    ######################################################################################
    # postprocess output
    clean_pred = (loc_numpy, pred_numpy)
    clean_outputs = postprocess_outputs(
        clean_pred, [[input_image_width, input_image_height]], priors, THRESHOLD
    )
    # draw rectangles
    clean_frame = copy.deepcopy(frame)  # needs to be deep copy
    for (x1, y1, x2, y2, s) in clean_outputs[0]:
        x1, y1 = int(x1), int(y1)
        x2, y2 = int(x2), int(y2)
        cv2.rectangle(clean_frame, (x1, y1), (x2, y2), (0, 0, 255), 4)
  6. 将Protopia混淆添加到推理代码中。您可以从Protopia中找到直接添加Protopia混淆的使用案例专用说明;但是、此过程不在本技术报告的讨论范围内。以下示例显示了与上一步5中显示的相同的Python代码、但添加了Protopia obfuscation。

    请注意、在将图像传递到Triton API之前、系统会对该映像应用程序模糊。因此、非混淆映像永远不会离开本地计算机。仅通过网络传递模糊映像。此工作流适用于以下情形:在受信任区域内收集数据、但随后需要传递到该受信任区域以外以进行推理。如果没有Protopia混淆、则在敏感数据不离开受信任区域的情况下、无法实施此类工作流。

    # get current frame
    frame = input_image
    # preprocess input
    preprocessed_input = preprocess_input(frame)
    preprocessed_input = torch.Tensor(preprocessed_input).to(device)
    # run forward pass
    not_noisy_activation = noisy_model_head(preprocessed_input)  # runs the first few layers
    ##################################################################
    #          obfuscate image locally prior to inferencing          #
    #          SINGLE ADITIONAL LINE FOR PRIVATE INFERENCE           #
    ##################################################################
    noisy_activation = noisy_model_noise(not_noisy_activation)
    ##################################################################
    ###########################################################################################
    #          pass obfuscated image to Triton Inference Server API for inferencing           #
    ###########################################################################################
    triton_client = httpclient.InferenceServerClient(url="192.168.0.152:31208", verbose=False)
    model_name = "face_detection_noisy"
    inputs = []
    outputs = []
    inputs.append(httpclient.InferInput("INPUT__0", [1, 128, 32, 32], "FP32"))
    inputs[0].set_data_from_numpy(noisy_activation.detach().cpu().numpy(), binary_data=False)
    outputs.append(httpclient.InferRequestedOutput("OUTPUT__0", binary_data=False))
    outputs.append(httpclient.InferRequestedOutput("OUTPUT__1", binary_data=False))
    results = triton_client.infer(
        model_name,
        inputs,
        outputs=outputs,
        #query_params=query_params,
        headers=None,
        request_compression_algorithm=None,
        response_compression_algorithm=None)
    #print(results.get_response())
    statistics = triton_client.get_inference_statistics(model_name=model_name, headers=None)
    print(statistics)
    if len(statistics["model_stats"]) != 1:
        print("FAILED: Inference Statistics")
        sys.exit(1)
    
    loc_numpy = results.as_numpy("OUTPUT__0")
    pred_numpy = results.as_numpy("OUTPUT__1")
    ###########################################################################################
    
    # postprocess output
    noisy_pred = (loc_numpy, pred_numpy)
    noisy_outputs = postprocess_outputs(
        noisy_pred, [[input_image_width, input_image_height]], priors, THRESHOLD * 0.5
    )
    # get reconstruction of the noisy activation
    noisy_reconstruction = decoder_function(noisy_activation)
    noisy_reconstruction = noisy_reconstruction.detach().cpu().numpy()[0]
    noisy_reconstruction = unpreprocess_output(
        noisy_reconstruction, (input_image_width, input_image_height), True
    ).astype(np.uint8)
    # draw rectangles
    for (x1, y1, x2, y2, s) in noisy_outputs[0]:
        x1, y1 = int(x1), int(y1)
        x2, y2 = int(x2), int(y2)
        cv2.rectangle(noisy_reconstruction, (x1, y1), (x2, y2), (0, 0, 255), 4)