本繁體中文版使用機器翻譯，譯文僅供參考，若與英文版本牴觸，應以英文版本為準。

疑難排解

03/07/2024 貢獻者

PDF

請使用此處提供的指標來疑難排解安裝及使用Astra Trident時可能遇到的問題。

一般疑難排解

如果Trident Pod無法正常顯示（例如、當Trident Pod卡在中時） ContainerCreating 階段、只需不到兩個就緒容器）、就能執行 kubectl -n trident describe deployment trident 和 kubectl -n trident describe pod trident--** 可提供更多洞見。取得Kibelet記錄（例如、透過 journalctl -xeu kubelet）也很有幫助。
如果Trident記錄中沒有足夠的資訊、您可以透過傳遞來嘗試啟用Trident的偵錯模式 -d 根據您的安裝選項標示安裝參數。

然後確認是否使用設定偵錯 ./tridentctl logs -n trident 和搜尋 level=debug msg 在記錄中。
與營運者一起安裝
kubectl patch torc trident -n <namespace> --type=merge -p '{"spec":{"debug":true}}'

這會重新啟動所有Trident Pod、可能需要數秒鐘的時間。您可以查看的輸出中的「年齡」欄來檢查 kubectl get pod -n trident。

使用Astra Trident 20.07和20.10 tprov 取代 torc。
與Helm一起安裝
helm upgrade <name> trident-operator-21.07.1-custom.tgz --set tridentDebug=true`
安裝試用版
./tridentctl uninstall -n trident ./tridentctl install -d -n trident
您也可以取得每個後端的除錯記錄、方法是加入 debugTraceFlags 在後端定義中。例如、包括 debugTraceFlags: {“api”:true, “method”:true,} 取得Trident記錄中的API呼叫和方法反向。現有的後端可以有 debugTraceFlags 設定為 tridentctl backend update。
使用RedHat CoreOS時、請務必確認 iscsid 在工作節點上啟用、預設為啟動。您可以使用OpenShift機器組態或修改點火模板來完成此作業。
使用Trident時可能會遇到的常見問題 "Azure NetApp Files" 當租戶和用戶端機密來自權限不足的應用程式登錄時。如需Trident需求的完整清單、請參閱 "Azure NetApp Files" 組態：
如果在將PV安裝至容器時發生問題、請確定 rpcbind 已安裝並執行。使用主機作業系統所需的套件管理程式、並檢查是否有 rpcbind 正在執行。您可以檢查的狀態 rpcbind 執行 systemctl status rpcbind 或同等產品。
如果Trident後端報告其位於 failed 狀態儘管以前曾經工作過、但可能是因為變更與後端相關的SVM/admin認證資料所致。使用更新後端資訊 tridentctl update backend 或是退回Trident Pod即可解決此問題。
如果在容器執行時間安裝Trident with Docker時遇到權限問題、請嘗試使用安裝Trident --in cluster=false 旗標。這不會使用安裝程式pod、也會避免因為而發生權限問題 trident-installer 使用者：
使用 uninstall parameter <Uninstalling Trident> 用於在執行失敗後進行清理。根據預設、指令碼不會移除Trident所建立的客戶需求日、即使在執行中的部署中、也能安全地解除安裝及再次安裝。
如果您想要降級至舊版的 Trident 、請先執行 tridentctl uninstall 移除Trident的命令。下載所需的 "Trident版本" 並使用安裝 tridentctl install 命令。
成功安裝之後、如果有一個PVc卡在中 Pending 階段、執行中 kubectl describe pvc 可提供有關Trident為何無法為此PVc配置PV的其他資訊。

使用運算子的 Trident 部署不成功

如果您使用運算子部署Trident、則狀態為 TridentOrchestrator 變更來源 Installing 至 Installed。如果您觀察到 Failed 狀態、而且操作員無法自行恢復、您應該執行下列命令來檢查操作員的記錄：

tridentctl logs -l trident-operator

追蹤Trident運算子容器的記錄可以指出問題所在。例如、其中一個問題可能是無法從無線環境中的上游登錄擷取所需的容器映像。

為了瞭解為什麼 Trident 的安裝失敗、您可以
請看一下 TridentOrchestrator 狀態。

kubectl describe torc trident-2
Name:         trident-2
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  trident.netapp.io/v1
Kind:         TridentOrchestrator
...
Status:
  Current Installation Params:
    IPv6:
    Autosupport Hostname:
    Autosupport Image:
    Autosupport Proxy:
    Autosupport Serial Number:
    Debug:
    Image Pull Secrets:         <nil>
    Image Registry:
    k8sTimeout:
    Kubelet Dir:
    Log Format:
    Silence Autosupport:
    Trident Image:
  Message:                      Trident is bound to another CR 'trident'
  Namespace:                    trident-2
  Status:                       Error
  Version:
Events:
  Type     Reason  Age                From                        Message
  ----     ------  ----               ----                        -------
  Warning  Error   16s (x2 over 16s)  trident-operator.netapp.io  Trident is bound to another CR 'trident'

此錯誤表示已存在 TridentOrchestrator
用於安裝Trident。因為每個 Kubernetes 叢集都只能使用
有一個 Trident 執行個體、運算子可確保在任何指定的情況下都能如此
只有一個作用中的時間 TridentOrchestrator 就是這樣
建立。

此外、觀察Trident Pod的狀態、通常會指出是否有不正確的情況。

kubectl get pods -n trident

NAME                                READY   STATUS             RESTARTS   AGE
trident-csi-4p5kq                   1/2     ImagePullBackOff   0          5m18s
trident-csi-6f45bfd8b6-vfrkw        4/5     ImagePullBackOff   0          5m19s
trident-csi-9q5xc                   1/2     ImagePullBackOff   0          5m18s
trident-csi-9v95z                   1/2     ImagePullBackOff   0          5m18s
trident-operator-766f7b8658-ldzsv   1/1     Running            0          8m17s

您可以清楚看到 Pod 無法完全初始化
因為未擷取一或多個容器映像。

若要解決此問題、您應該編輯 TridentOrchestrator CR.
或者、您也可以刪除 TridentOrchestrator，然後建立新的
其中一個定義經過修改且準確。

使用不成功的 Trident 部署 `tridentctl`

為了協助您找出錯誤所在、您可以使用再次執行安裝程式 -d 引數、可開啟偵錯模式、協助您瞭解問題所在：

./tridentctl install -n trident -d

解決此問題之後、您可以依照下列方式清理安裝、然後執行 tridentctl install 再次命令：

./tridentctl uninstall -n trident
INFO Deleted Trident deployment.
INFO Deleted cluster role binding.
INFO Deleted cluster role.
INFO Deleted service account.
INFO Removed Trident user from security context constraint.
INFO Trident uninstallation succeeded.

完全移除 Astra Trident 和 CRD

您可以完全移除 Astra Trident 和所有建立的客戶需求日、以及相關的自訂資源。

此動作無法復原。除非您想要全新安裝 Astra Trident 、否則請勿這麼做。若要在不移除客戶需求日的情況下解除安裝 Astra Trident 、請參閱 "解除安裝Astra Trident"。

Trident運算子

若要解除安裝 Astra Trident 、並使用 Trident 運算子完全移除 CRD ：

kubectl patch torc <trident-orchestrator-name> --type=merge -p '{"spec":{"wipeout":["crds"],"uninstall":true}}'

掌舵

若要解除安裝 Astra Trident 、並使用 Helm 完全移除 CRD ：

kubectl patch torc trident --type=merge -p '{"spec":{"wipeout":["crds"],"uninstall":true}}'

<code>tridentctl</code>

若要在使用解除安裝 Astra Trident 之後完全移除 CRD tridentctl

tridentctl obliviate crd

在 Kubernetes 1.26 上使用 rwx 原始區塊命名空間時、 NVMe 節點非分段失敗

如果您執行的是 Kubernetes 1.26 、則當使用含 rwx 原始區塊命名空間的 NVMe / TCP 時、節點解除暫存可能會失敗。下列案例提供故障的因應措施。或者、您也可以將 Kubernetes 升級至 1.27 。

已刪除命名空間和 Pod

請考慮將 Astra Trident 託管命名空間（ NVMe 持續磁碟區）附加至 Pod 的案例。如果您直接從 ONTAP 後端刪除命名空間、則在嘗試刪除 Pod 之後、取消暫存程序會卡住。此案例不會影響 Kubernetes 叢集或其他功能。

因應措施

從個別節點上卸載持續磁碟區（對應於該命名空間）、然後將其刪除。

封鎖 dataLIFs

 If you block (or bring down) all the dataLIFs of the NVMe Astra Trident backend, the unstaging process gets stuck when you attempt to delete the pod. In this scenario, you cannot run any NVMe CLI commands on the Kubernetes node.
.因應措施
開啟 dataLIFS 以還原完整功能。

刪除命名空間對應

 If you remove the `hostNQN` of the worker node from the corresponding subsystem, the unstaging process gets stuck when you attempt to delete the pod. In this scenario, you cannot run any NVMe CLI commands on the Kubernetes node.
.因應措施
新增 `hostNQN` 返回子系統。