Manage the cluster
You can use kubectl commands with Astra Data Store Preview to manage the cluster.
What you'll need:
- A system with kubectl and the kubectl-astrads plugin installed. See "Install Astra Data Store preview".
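If you want to confirm that the plugin is discoverable before you begin, a quick sanity check (a sketch; it assumes the plugin binary is installed on your PATH as kubectl-astrads) is:
# Lists every kubectl plugin found on PATH; kubectl-astrads should appear.
~% kubectl plugin list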
Add a node
The node to be added must belong to the Kubernetes cluster and be configured similarly to the other nodes in the cluster.
- If the new node's dataIP has not yet been added to the ADSCluster CR (a jsonpath check for the current addresses is sketched after this procedure):
  - Edit the astradscluster CR and add the extra dataIP to the ADS Data Networks Addresses field:
    ~% kubectl edit astradscluster <cluster-name> -n astrads-system
    Response:
    adsDataNetworks:
      - addresses: dataIP1,dataIP2,dataIP3,dataIP4,<newdataIP>
  - Save the CR.
  - Add the node to the Astra Data Store preview cluster:
    ~% kubectl astrads nodes add --cluster <cluster-name>
- Otherwise, simply add the node:
  ~% kubectl astrads nodes add --cluster <cluster-name>
- Verify that the node has been added:
  ~% kubectl astrads nodes list
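As referenced in the first step above, here is a minimal way to see which data-network addresses are already present in the CR (a sketch; it relies on the spec.adsDataNetworks field shown in the sample CR later on this page):
# Prints the comma-separated address lists from spec.adsDataNetworks;
# check whether the new node's dataIP is already included.
~% kubectl get astradscluster <cluster-name> -n astrads-system -o jsonpath='{.spec.adsDataNetworks[*].addresses}'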
Place a node in maintenance mode
Place a node in maintenance mode when you need to perform host maintenance or a package upgrade.
Note: The node must already be part of the Astra Data Store preview cluster.
You cannot add nodes to the cluster while a node is in maintenance mode. In this example, we will place node nhcitjj1525 in maintenance mode.
- Display the node details:
~% kubectl get nodes
Response:
NAME             STATUS   ROLES                  AGE     VERSION
nhcitjj1525      Ready    <none>                 3d18h   v1.20.0
nhcitjj1526      Ready    <none>                 3d18h   v1.20.0
nhcitjj1527      Ready    <none>                 3d18h   v1.20.0
nhcitjj1528      Ready    <none>                 3d18h   v1.20.0
scs000039783-1   Ready    control-plane,master   3d18h   v1.20.0
- Make sure that the node is not already in maintenance mode:
~% kubectl astrads maintenance list
Response (no nodes are in maintenance mode):
NAME   NODE NAME   IN MAINTENANCE   MAINTENANCE STATE   MAINTENANCE VARIANT
- Enable maintenance mode:
~% kubectl astrads maintenance create <cr-name> --node-name="<node-name>" --variant=Node
Example:
~% kubectl astrads maintenance create maint1 --node-name="nhcitjj1525" --variant=Node
Maintenance mode astrads-system/maint1 created
- List the nodes:
~% kubectl astrads nodes list
Response:
NODE NAME     NODE STATUS   CLUSTER NAME
nhcitjj1525   Added         ftap-astra-012
...
- Check the status of maintenance mode:
~% kubectl astrads maintenance list
Response:
NAME    NODE NAME     IN MAINTENANCE   MAINTENANCE STATE     MAINTENANCE VARIANT
node4   nhcitjj1525   true             ReadyForMaintenance   Node
The IN MAINTENANCE field starts as false and changes to true, and the MAINTENANCE STATE changes from PreparingForMaintenance to ReadyForMaintenance (to follow the transition, see the watch sketch after this procedure).
- After you complete node maintenance, disable maintenance mode:
~% kubectl astrads maintenance update maint1 --node-name="nhcitjj1525" --variant=None
- Make sure the node is no longer in maintenance mode:
~% kubectl astrads maintenance list
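As mentioned above, a convenient way to follow the IN MAINTENANCE and MAINTENANCE STATE columns while a node enters or leaves maintenance mode is to poll the same list command (a sketch; watch is a standard shell utility, not part of the plugin):
# Re-runs the maintenance list every 2 seconds until you interrupt it.
~% watch kubectl astrads maintenance list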
Replace a node
Use kubectl commands with Astra Data Store Preview to replace a failed node in the cluster.
- List all nodes:
~% kubectl astrads nodes list
Response:
NODE NAME            NODE STATUS   CLUSTER NAME
sti-rx2540-534d..    Added         cluster-multinodes-21209
sti-rx2540-535d...   Added         cluster-multinodes-21209
...
- List the clusters:
~% kubectl astrads clusters list
Response:
CLUSTER NAME               CLUSTER STATUS   NODE COUNT
cluster-multinodes-21209   created          4
- Verify that Node HA is marked as false for the failed node:
~% kubectl describe astradscluster -n astrads-system
Response:
Name:                 cluster-multinodes-21209
Namespace:            astrads-system
Labels:               <none>
Annotations:          kubectl.kubernetes.io/last-applied-configuration:
                        {"apiVersion":"astrads.netapp.io/v1alpha1","kind":"AstraDSCluster","metadata":{"annotations":{},"name":"cluster-multinodes-21209","namespa...
API Version:          astrads.netapp.io/v1alpha1
Kind:                 AstraDSCluster
State:                Disabled
Variant:              None
Node HA:              false
Node ID:              4
Node Is Reachable:    false
Node Management IP:   172.21.192.192
Node Name:            sti-rx2540-532d.ctl.gdl.englab.netapp.com
Node Role:            Storage
Node UUID:            6f6b88f3-8411-56e5-b1f0-a8e8d0c946db
Node Version:         12.75.0.6167444
Status:               Added
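The describe output is long; if you only want the per-node fields that matter for this check, one option (a sketch; the field names are taken from the output above) is to filter it:
# Shows only the node name, HA flag, and reachability lines from the describe output.
~% kubectl describe astradscluster -n astrads-system | grep -E "Node Name|Node HA|Node Is Reachable"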
- Modify the astradscluster CR to remove the failed node by reducing the value of adsNodeCount to 3 (a kubectl patch alternative is sketched after the manifest below):
cat manifests/astradscluster.yaml
Response:
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSCluster
metadata:
  name: cluster-multinodes-21209
  namespace: astrads-system
spec:
  # ADS Node Configuration per node settings
  adsNodeConfig:
    # Specify CPU limit for ADS components
    # Supported value: 9
    cpu: 9
    # Specify Memory Limit in GiB for ADS Components.
    # Your kubernetes worker nodes need to have at least this much RAM free
    # for ADS to function correctly
    # Supported value: 34
    memory: 34
    # [Optional] Specify raw storage consumption limit. The operator will only select drives for a node up to this limit
    capacity: 600
    # [Optional] Set a cache device if you do not want auto detection e.g. /dev/sdb
    # cacheDevice: ""
    # Set this regex filter to select drives for ADS cluster
    # drivesFilter: ".*"
  # [Optional] Specify node selector labels to select the nodes for creating ADS cluster
  # adsNodeSelector:
  #   matchLabels:
  #     customLabelKey: customLabelValue
  # Specify the number of nodes that should be used for creating ADS cluster
  adsNodeCount: 3
  # Specify the IP address of a floating management IP routable from any worker node in the cluster
  mvip: "172..."
  # Comma separated list of floating IP addresses routable from any host where you intend to mount a NetApp Volume
  # at least one per node must be specified
  # addresses: 10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5
  # netmask: 255.255.255.0
  adsDataNetworks:
    - addresses: "172..."
      netmask: 255.255.252.0
  # [Optional] Provide a k8s label key that defines which protection domain a node belongs to
  # adsProtectionDomainKey: ""
  # [Optional] Provide a monitoring config to be used to setup/configure a monitoring agent.
  monitoringConfig:
    namespace: "netapp-monitoring"
    repo: "docker.repo.eng.netapp.com/global/astra"
  autoSupportConfig:
    # AutoUpload defines the flag to enable or disable AutoSupport upload in the cluster (true/false)
    autoUpload: true
    # Enabled defines the flag to enable or disable automatic AutoSupport collection.
    # When set to false, periodic and event driven AutoSupport collection would be disabled.
    # It is still possible to trigger an AutoSupport manually while AutoSupport is disabled
    # enabled: true
    # CoredumpUpload defines the flag to enable or disable the upload of coredumps for this ADS Cluster
    # coredumpUpload: false
    # HistoryRetentionCount defines the number of local (not uploaded) AutoSupport Custom Resources to retain in the cluster before deletion
    historyRetentionCount: 25
    # DestinationURL defines the endpoint to transfer the AutoSupport bundle collection
    destinationURL: "https://testbed.netapp.com/put/AsupPut"
    # ProxyURL defines the URL of the proxy with port to be used for AutoSupport bundle transfer
    # proxyURL:
    # Periodic defines the config for periodic/scheduled AutoSupport objects
    periodic:
      # Schedule defines the Kubernetes Cronjob schedule
      - schedule: "0 0 * * *"
        # PeriodicConfig defines the fields needed to create the Periodic AutoSupports
        periodicconfig:
          - component:
              name: storage
            event: dailyMonitoring
            userMessage: Daily Monitoring Storage AutoSupport bundle
            nodes: all
          - component:
              name: controlplane
            event: daily
            userMessage: Daily Control Plane AutoSupport bundle
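If you prefer not to edit and re-apply the whole manifest, a minimal alternative sketch (not the documented workflow; it assumes the live CR name and namespace shown above) is to patch the node count directly. The same approach applies when you increment the count back to 4 in the replacement step further down:
# Merge-patches spec.adsNodeCount on the live AstraDSCluster CR.
~% kubectl patch astradscluster cluster-multinodes-21209 -n astrads-system --type merge -p '{"spec":{"adsNodeCount":3}}'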
- Verify that the node has been removed from the cluster:
~% kubectl get nodes --show-labels
Response:
NAME                  STATUS   ROLES                  AGE   VERSION   LABELS
sti-astramaster-237   Ready    control-plane,master   24h   v1.20.0
sti-rx2540-532d       Ready    <none>                 24h   v1.20.0
sti-rx2540-533d       Ready    <none>                 24h
~% kubectl astrads nodes list
Response:
NODE NAME         NODE STATUS   CLUSTER NAME
sti-rx2540-534d   Added         cluster-multinodes-21209
sti-rx2540-535d   Added         cluster-multinodes-21209
sti-rx2540-536d   Added         cluster-multinodes-21209
~% kubectl get nodes --show-labels
Response:
NAME                  STATUS   ROLES                  AGE   VERSION   LABELS
sti-astramaster-237   Ready    control-plane,master   24h   v1.20.0   beta.kubernetes.io/arch=amd64,
sti-rx2540-532d       Ready    <none>                 24h   v1.20.0   astrads.netapp.io/node-removal
~% kubectl describe astradscluster -n astrads-system
Response:
Name:       cluster-multinodes-21209
Namespace:  astrads-system
Labels:     <none>
Kind:       AstraDSCluster
Metadata:
...
- Add a node to the cluster as a replacement by modifying the cluster CR. The node count is incremented back to 4. Verify that the new node is picked up for the addition.
~% vi manifests/astradscluster.yaml
~% cat manifests/astradscluster.yaml
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSCluster
metadata:
  name: cluster-multinodes-21209
  namespace: astrads-system
~% kubectl apply -f manifests/astradscluster.yaml
Response:
astradscluster.astrads.netapp.io/cluster-multinodes-21209 configured
~% kubectl get pods -n astrads-system
Response:
NAME                                  READY   STATUS    RESTARTS   AGE
astrads-cluster-controller...         1/1     Running   1          24h
astrads-deployment-support...         3/3     Running   0          24h
astrads-ds-cluster-multinodes-21209   1/1     Running
~% kubectl astrads nodes list
Response:
NODE NAME            NODE STATUS   CLUSTER NAME
sti-rx2540-534d...   Added         cluster-multinodes-21209
sti-rx2540-535d...   Added         cluster-multinodes-21209
~% kubectl astrads clusters list
Response:
CLUSTER NAME               CLUSTER STATUS   NODE COUNT
cluster-multinodes-21209   created          4
~% kubectl astrads drives list
Response:
DRIVE NAME     DRIVE ID      DRIVE STATUS   NODE NAME       CLUSTER NAME
scsi-36000..   c3e197f2...   Active         sti-rx2540...   cluster-multinodes-21209
Replace a drive
When a drive in the cluster fails, you must replace it as soon as possible to preserve data integrity. When a drive fails, you will see information about the failed drive in the cluster CR node statuses, the cluster health information, and the metrics endpoint.
$ kubectl get adscl -A -o yaml
Response:
...
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSCluster
...
  nodeStatuses:
    - driveStatuses:
        - driveID: 31205e51-f592-59e3-b6ec-185fd25888fa
          driveName: scsi-36000c290ace209465271ed6b8589b494
          drivesStatus: Failed
        - driveID: 3b515b09-3e95-5d25-a583-bee531ff3f31
          driveName: scsi-36000c290ef2632627cb167a03b431a5f
          drivesStatus: Active
        - driveID: 0807fa06-35ce-5a46-9c25-f1669def8c8e
          driveName: scsi-36000c292c8fc037c9f7e97a49e3e2708
          drivesStatus: Active
...
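To spot failed drives without scrolling through the whole CR, one option (a sketch; the field names are taken from the nodeStatuses output above) is:
# Prints each failed drive's driveID and driveName along with the Failed status line.
$ kubectl get adscl -A -o yaml | grep -B 2 "drivesStatus: Failed"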
A failed drive CR is automatically created in the cluster, with a name that corresponds to the UUID of the failed drive.
$ kubectl get adsfd -A -o yaml
Response:
...
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSFailedDrive
metadata:
  name: c290a-5000-4652c-9b494
  namespace: astrads-system
spec:
  executeReplace: false
  replaceWith: ""
status:
  cluster: arda-6e4b4af
  failedDriveInfo:
    failureReason: AdminFailed
    inUse: false
    name: scsi-36000c290ace209465271ed6b8589b494
    path: /dev/disk/by-id/scsi-36000c290ace209465271ed6b8589b494
    present: true
    serial: 6000c290ace209465271ed6b8589b494
    node: sti-rx2540-300b.ctl.gdl.englab.netapp.com
  state: ReadyToReplace
~% kubectl astrads faileddrive list --cluster arda-6e4b4af
Response:
NAME       NODE                             CLUSTER       STATE            AGE
6000c290   sti-rx2540-300b.lab.netapp.com   ard-6e4b4af   ReadyToReplace   13m
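The AstraDSFailedDrive spec shown above also exposes executeReplace and replaceWith fields. As a purely hypothetical sketch (an assumption based on those CR fields, not a documented procedure; the supported path is the kubectl astrads faileddrive commands that follow), the replacement could presumably also be driven by patching the CR directly:
# Hypothetical: set the replacement serial and ask the operator to execute the swap.
$ kubectl patch adsfd c290a-5000-4652c-9b494 -n astrads-system --type merge -p '{"spec":{"replaceWith":"45000c","executeReplace":true}}'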
- Use the kubectl astrads faileddrive show-replacements command to list possible replacement drives. The command filters for drives that meet the replacement constraints (not in use in the cluster, not mounted, no partitions, and equal to or larger in size than the failed drive). To list all drives without filtering for possible replacements, add `--all` to the show-replacements command.
~% kubectl astrads faileddrive show-replacements --cluster ard-6e4b4af --name 6000c290
Response:
NAME   IDPATH              SERIAL   PARTITIONCOUNT   MOUNTED   SIZE
sdh    /scsi-36000c29417   45000c   0                false     100GB
- Use the replace command to replace the drive with the serial number you pass in. The command either completes the replacement or fails once the `--wait` time has elapsed.
~% kubectl astrads faileddrive replace --cluster arda-6e4b4af --name 6000c290 --replaceWith 45000c --wait
Drive replacement completed successfully
If kubectl astrads faileddrive replace is run with an unsuitable `--replaceWith` serial number, an error similar to the following is displayed:
~% kubectl astrads replacedrive replace --cluster astrads-cluster-f51b10a --name 6000c2927 --replaceWith BAD_SERIAL_NUMBER
Drive 6000c2927 replacement started
Failed drive 6000c2927 has been set to use BAD_SERIAL_NUMBER as a replacement
...
Drive replacement didn't complete within 25 seconds
Current status: {FailedDriveInfo:{InUse:false Present:true Name:scsi-36000c2 FiretapUUID:444a5468 Serial:6000c Path:/scsi-36000c FailureReason:AdminFailed Node:sti-b200-0214a.lab.netapp.com} Cluster:astrads-cluster-f51b10a State:ReadyToReplace Conditions:[{Message: "Replacement drive serial specified doesn't exist", Reason: "DriveSelectionFailed", Status: False, Type: "Done"}]}
- To re-run the drive replacement, use `--force` with the previous command:
~% kubectl astrads replacedrive replace --cluster astrads-cluster-f51b10a --name 6000c2927 --replaceWith VALID_SERIAL_NUMBER --force
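After the replacement succeeds, you can re-check the state with commands already used on this page (a sketch; the cluster name is the one from the example above). The failed-drive entry should no longer report ReadyToReplace, and the replacement drive should appear in the drive inventory:
~% kubectl astrads faileddrive list --cluster astrads-cluster-f51b10a
~% kubectl astrads drives list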