Kubeflow部署
本節說明在Kubernetes叢集中部署Kubeflow時、必須完成的工作。
先決條件
在您執行本節所述的部署練習之前、我們假設您已經執行下列工作:
-
您已經有一個正常運作的Kubernetes叢集、而且您正在執行Kubernetes支援的Kubernetes版本。如需支援版本的清單、請參閱 "官方Kubeflow文件"。
-
您已在Kubernetes叢集中安裝及設定NetApp Trident、如所述 "Trident部署與組態"。
設定預設Kubernetes StorageClass
在部署Kubeflow之前、您必須在Kubernetes叢集中指定預設StorageClass。Kubeflow部署程序會嘗試使用預設StorageClass來配置新的持續磁碟區。如果沒有將StorageClass指定為預設StorageClass、則部署將會失敗。若要在叢集內指定預設StorageClass、請從部署跨接主機執行下列工作。如果您已在叢集內指定預設StorageClass、則可以跳過此步驟。
-
將現有的其中一個StorageClass指定為預設StorageClass。以下命令範例顯示名為「ONTAP-AI - FlexVols - Retain」的StorageClass為預設StorageClass。
「ontap-non-flexgroup」Trident後端類型的最小PVc尺寸相當大。根據預設、Kubeflow會嘗試配置大小只有幾GB的PVCS。因此、您不應將使用「ONTAP-NAAS-Flexgroup」後端類型的StorageClass指定為Kubefflow部署的預設StorageClass。 |
$ kubectl get sc NAME PROVISIONER AGE ontap-ai-flexgroups-retain csi.trident.netapp.io 25h ontap-ai-flexgroups-retain-iface1 csi.trident.netapp.io 25h ontap-ai-flexgroups-retain-iface2 csi.trident.netapp.io 25h ontap-ai-flexvols-retain csi.trident.netapp.io 3s $ kubectl patch storageclass ontap-ai-flexvols-retain -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}' storageclass.storage.k8s.io/ontap-ai-flexvols-retain patched $ kubectl get sc NAME PROVISIONER AGE ontap-ai-flexgroups-retain csi.trident.netapp.io 25h ontap-ai-flexgroups-retain-iface1 csi.trident.netapp.io 25h ontap-ai-flexgroups-retain-iface2 csi.trident.netapp.io 25h ontap-ai-flexvols-retain (default) csi.trident.netapp.io 54s
使用NVIDIA DeepOps部署Kubeflow
NetApp建議使用NVIDIA DeepOps提供的Kubeflow部署工具。若要使用DeepOps部署工具在Kubernetes叢集中部署Kubeflow、請從部署跳接主機執行下列工作。
或者、您也可以依照手動部署Kubeflow "安裝說明" 在正式的Kubeflow文件中 |
-
遵循、在叢集中部署Kubeflow "Kubeflow部署指示" 在NVIDIA DeepOps GitHub網站上。
-
記下DeepOps Kubeflow部署工具所輸出的Kubeflow儀表板URL。
$ ./scripts/k8s/deploy_kubeflow.sh -x … INFO[0007] Applied the configuration Successfully! filename="cmd/apply.go:72" Kubeflow app installed to: /home/ai/kubeflow It may take several minutes for all services to start. Run 'kubectl get pods -n kubeflow' to verify To remove (excluding CRDs, istio, auth, and cert-manager), run: ./scripts/k8s_deploy_kubeflow.sh -d To perform a full uninstall : ./scripts/k8s_deploy_kubeflow.sh -D Kubeflow Dashboard (HTTP NodePort): http://10.61.188.111:31380
-
確認Kubeflow命名空間內部署的所有Pod均顯示「執行中」的「狀態」、並確認命名空間內未部署任何元件處於錯誤狀態。所有Pod可能需要幾分鐘的時間才能啟動。
$ kubectl get all -n kubeflow NAME READY STATUS RESTARTS AGE pod/admission-webhook-bootstrap-stateful-set-0 1/1 Running 0 95s pod/admission-webhook-deployment-6b89c84c98-vrtbh 1/1 Running 0 91s pod/application-controller-stateful-set-0 1/1 Running 0 98s pod/argo-ui-5dcf5d8b4f-m2wn4 1/1 Running 0 97s pod/centraldashboard-cf4874ddc-7hcr8 1/1 Running 0 97s pod/jupyter-web-app-deployment-685b455447-gjhh7 1/1 Running 0 96s pod/katib-controller-88c97d85c-kgq66 1/1 Running 1 95s pod/katib-db-8598468fd8-5jw2c 1/1 Running 0 95s pod/katib-manager-574c8c67f9-wtrf5 1/1 Running 1 95s pod/katib-manager-rest-778857c989-fjbzn 1/1 Running 0 95s pod/katib-suggestion-bayesianoptimization-65df4d7455-qthmw 1/1 Running 0 94s pod/katib-suggestion-grid-56bf69f597-98vwn 1/1 Running 0 94s pod/katib-suggestion-hyperband-7777b76cb9-9v6dq 1/1 Running 0 93s pod/katib-suggestion-nasrl-77f6f9458c-2qzxq 1/1 Running 0 93s pod/katib-suggestion-random-77b88b5c79-l64j9 1/1 Running 0 93s pod/katib-ui-7587c5b967-nd629 1/1 Running 0 95s pod/metacontroller-0 1/1 Running 0 96s pod/metadata-db-5dd459cc-swzkm 1/1 Running 0 94s pod/metadata-deployment-6cf77db994-69fk7 1/1 Running 3 93s pod/metadata-deployment-6cf77db994-mpbjt 1/1 Running 3 93s pod/metadata-deployment-6cf77db994-xg7tz 1/1 Running 3 94s pod/metadata-ui-78f5b59b56-qb6kr 1/1 Running 0 94s pod/minio-758b769d67-llvdr 1/1 Running 0 91s pod/ml-pipeline-5875b9db95-g8t2k 1/1 Running 0 91s pod/ml-pipeline-persistenceagent-9b69ddd46-bt9r9 1/1 Running 0 90s pod/ml-pipeline-scheduledworkflow-7b8d756c76-7x56s 1/1 Running 0 90s pod/ml-pipeline-ui-79ffd9c76-fcwpd 1/1 Running 0 90s pod/ml-pipeline-viewer-controller-deployment-5fdc87f58-b2t9r 1/1 Running 0 90s pod/mysql-657f87857d-l5k9z 1/1 Running 0 91s pod/notebook-controller-deployment-56b4f59bbf-8bvnr 1/1 Running 0 92s pod/profiles-deployment-6bc745947-mrdkh 2/2 Running 0 90s pod/pytorch-operator-77c97f4879-hmlrv 1/1 Running 0 92s pod/seldon-operator-controller-manager-0 1/1 Running 1 91s pod/spartakus-volunteer-5fdfddb779-l7qkm 1/1 Running 0 92s pod/tensorboard-6544748d94-nh8b2 1/1 Running 0 92s pod/tf-job-dashboard-56f79c59dd-6w59t 1/1 Running 0 92s pod/tf-job-operator-79cbfd6dbc-rb58c 1/1 Running 0 91s pod/workflow-controller-db644d554-cwrnb 1/1 Running 0 97s NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE service/admission-webhook-service ClusterIP 10.233.51.169 <none> 443/TCP 97s service/application-controller-service ClusterIP 10.233.4.54 <none> 443/TCP 98s service/argo-ui NodePort 10.233.47.191 <none> 80:31799/TCP 97s service/centraldashboard ClusterIP 10.233.8.36 <none> 80/TCP 97s service/jupyter-web-app-service ClusterIP 10.233.1.42 <none> 80/TCP 97s service/katib-controller ClusterIP 10.233.25.226 <none> 443/TCP 96s service/katib-db ClusterIP 10.233.33.151 <none> 3306/TCP 97s service/katib-manager ClusterIP 10.233.46.239 <none> 6789/TCP 96s service/katib-manager-rest ClusterIP 10.233.55.32 <none> 80/TCP 96s service/katib-suggestion-bayesianoptimization ClusterIP 10.233.49.191 <none> 6789/TCP 95s service/katib-suggestion-grid ClusterIP 10.233.9.105 <none> 6789/TCP 95s service/katib-suggestion-hyperband ClusterIP 10.233.22.2 <none> 6789/TCP 95s service/katib-suggestion-nasrl ClusterIP 10.233.63.73 <none> 6789/TCP 95s service/katib-suggestion-random ClusterIP 10.233.57.210 <none> 6789/TCP 95s service/katib-ui ClusterIP 10.233.6.116 <none> 80/TCP 96s service/metadata-db ClusterIP 10.233.31.2 <none> 3306/TCP 96s service/metadata-service ClusterIP 10.233.27.104 <none> 8080/TCP 96s service/metadata-ui ClusterIP 10.233.57.177 <none> 80/TCP 96s service/minio-service ClusterIP 10.233.44.90 <none> 9000/TCP 94s service/ml-pipeline ClusterIP 10.233.41.201 <none> 8888/TCP,8887/TCP 94s service/ml-pipeline-tensorboard-ui ClusterIP 10.233.36.207 <none> 80/TCP 93s service/ml-pipeline-ui ClusterIP 10.233.61.150 <none> 80/TCP 93s service/mysql ClusterIP 10.233.55.117 <none> 3306/TCP 94s service/notebook-controller-service ClusterIP 10.233.10.166 <none> 443/TCP 95s service/profiles-kfam ClusterIP 10.233.33.79 <none> 8081/TCP 92s service/pytorch-operator ClusterIP 10.233.37.112 <none> 8443/TCP 95s service/seldon-operator-controller-manager-service ClusterIP 10.233.30.178 <none> 443/TCP 92s service/tensorboard ClusterIP 10.233.58.151 <none> 9000/TCP 94s service/tf-job-dashboard ClusterIP 10.233.4.17 <none> 80/TCP 94s service/tf-job-operator ClusterIP 10.233.60.32 <none> 8443/TCP 94s service/webhook-server-service ClusterIP 10.233.32.167 <none> 443/TCP 87s NAME READY UP-TO-DATE AVAILABLE AGE deployment.apps/admission-webhook-deployment 1/1 1 1 97s deployment.apps/argo-ui 1/1 1 1 97s deployment.apps/centraldashboard 1/1 1 1 97s deployment.apps/jupyter-web-app-deployment 1/1 1 1 97s deployment.apps/katib-controller 1/1 1 1 96s deployment.apps/katib-db 1/1 1 1 97s deployment.apps/katib-manager 1/1 1 1 96s deployment.apps/katib-manager-rest 1/1 1 1 96s deployment.apps/katib-suggestion-bayesianoptimization 1/1 1 1 95s deployment.apps/katib-suggestion-grid 1/1 1 1 95s deployment.apps/katib-suggestion-hyperband 1/1 1 1 95s deployment.apps/katib-suggestion-nasrl 1/1 1 1 95s deployment.apps/katib-suggestion-random 1/1 1 1 95s deployment.apps/katib-ui 1/1 1 1 96s deployment.apps/metadata-db 1/1 1 1 96s deployment.apps/metadata-deployment 3/3 3 3 96s deployment.apps/metadata-ui 1/1 1 1 96s deployment.apps/minio 1/1 1 1 94s deployment.apps/ml-pipeline 1/1 1 1 94s deployment.apps/ml-pipeline-persistenceagent 1/1 1 1 93s deployment.apps/ml-pipeline-scheduledworkflow 1/1 1 1 93s deployment.apps/ml-pipeline-ui 1/1 1 1 93s deployment.apps/ml-pipeline-viewer-controller-deployment 1/1 1 1 93s deployment.apps/mysql 1/1 1 1 94s deployment.apps/notebook-controller-deployment 1/1 1 1 95s deployment.apps/profiles-deployment 1/1 1 1 92s deployment.apps/pytorch-operator 1/1 1 1 95s deployment.apps/spartakus-volunteer 1/1 1 1 94s deployment.apps/tensorboard 1/1 1 1 94s deployment.apps/tf-job-dashboard 1/1 1 1 94s deployment.apps/tf-job-operator 1/1 1 1 94s deployment.apps/workflow-controller 1/1 1 1 97s NAME DESIRED CURRENT READY AGE replicaset.apps/admission-webhook-deployment-6b89c84c98 1 1 1 97s replicaset.apps/argo-ui-5dcf5d8b4f 1 1 1 97s replicaset.apps/centraldashboard-cf4874ddc 1 1 1 97s replicaset.apps/jupyter-web-app-deployment-685b455447 1 1 1 97s replicaset.apps/katib-controller-88c97d85c 1 1 1 96s replicaset.apps/katib-db-8598468fd8 1 1 1 97s replicaset.apps/katib-manager-574c8c67f9 1 1 1 96s replicaset.apps/katib-manager-rest-778857c989 1 1 1 96s replicaset.apps/katib-suggestion-bayesianoptimization-65df4d7455 1 1 1 95s replicaset.apps/katib-suggestion-grid-56bf69f597 1 1 1 95s replicaset.apps/katib-suggestion-hyperband-7777b76cb9 1 1 1 95s replicaset.apps/katib-suggestion-nasrl-77f6f9458c 1 1 1 95s replicaset.apps/katib-suggestion-random-77b88b5c79 1 1 1 95s replicaset.apps/katib-ui-7587c5b967 1 1 1 96s replicaset.apps/metadata-db-5dd459cc 1 1 1 96s replicaset.apps/metadata-deployment-6cf77db994 3 3 3 96s replicaset.apps/metadata-ui-78f5b59b56 1 1 1 96s replicaset.apps/minio-758b769d67 1 1 1 93s replicaset.apps/ml-pipeline-5875b9db95 1 1 1 93s replicaset.apps/ml-pipeline-persistenceagent-9b69ddd46 1 1 1 92s replicaset.apps/ml-pipeline-scheduledworkflow-7b8d756c76 1 1 1 91s replicaset.apps/ml-pipeline-ui-79ffd9c76 1 1 1 91s replicaset.apps/ml-pipeline-viewer-controller-deployment-5fdc87f58 1 1 1 91s replicaset.apps/mysql-657f87857d 1 1 1 92s replicaset.apps/notebook-controller-deployment-56b4f59bbf 1 1 1 94s replicaset.apps/profiles-deployment-6bc745947 1 1 1 91s replicaset.apps/pytorch-operator-77c97f4879 1 1 1 94s replicaset.apps/spartakus-volunteer-5fdfddb779 1 1 1 94s replicaset.apps/tensorboard-6544748d94 1 1 1 93s replicaset.apps/tf-job-dashboard-56f79c59dd 1 1 1 93s replicaset.apps/tf-job-operator-79cbfd6dbc 1 1 1 93s replicaset.apps/workflow-controller-db644d554 1 1 1 97s NAME READY AGE statefulset.apps/admission-webhook-bootstrap-stateful-set 1/1 97s statefulset.apps/application-controller-stateful-set 1/1 98s statefulset.apps/metacontroller 1/1 98s statefulset.apps/seldon-operator-controller-manager 1/1 92s $ kubectl get pvc -n kubeflow NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE katib-mysql Bound pvc-b07f293e-d028-11e9-9b9d-00505681a82d 10Gi RWO ontap-ai-flexvols-retain 27m metadata-mysql Bound pvc-b0f3f032-d028-11e9-9b9d-00505681a82d 10Gi RWO ontap-ai-flexvols-retain 27m minio-pv-claim Bound pvc-b22727ee-d028-11e9-9b9d-00505681a82d 20Gi RWO ontap-ai-flexvols-retain 27m mysql-pv-claim Bound pvc-b2429afd-d028-11e9-9b9d-00505681a82d 20Gi RWO ontap-ai-flexvols-retain 27m
-
在網頁瀏覽器中、瀏覽至您在步驟2中記下的URL、即可存取Kubeflow中央儀表板。
預設使用者名稱為「admin@kubeflow.org」、預設密碼為「12341234」。若要建立其他使用者、請遵循中的指示 "官方Kubeflow文件"。