简体中文版经机器翻译而成,仅供参考。如与英语版出现任何冲突,应以英语版为准。

管理集群

您可以使用带有 Astra Data Store 预览版的 kubectl 命令来管理集群。

您需要什么? #8217 ;将需要什么

添加节点

要添加的节点应属于 Kubernetes 集群,并且其配置应与集群中的其他节点类似。

步骤
  1. 如果新节点的 dataIP 尚未加入 ADSCluster CR ,请执行以下操作:

    1. 编辑 astradscluster CR 并在 ADS Data Networks Addresses 字段中添加额外的 dataIP :

      ~% kubectl edit astradscluster <cluster-name> -n astrads-system

      响应:

    adsDataNetworks:
        -addresses:  dataIP1,dataIP2,dataIP3,dataIP4,*newdataIP*
    1. 保存 CR 。

    2. 将节点添加到 Astra Data Store 预览集群:

      ~% kubectl astrads nodes add --cluster <cluster-name>
  2. 否则,只需添加节点:

    ~% kubectl astrads nodes add --cluster <cluster-name>
  3. 验证是否已添加此节点:

    ~% kubectl astrads nodes list

将节点置于维护模式

需要执行主机维护或软件包升级时,应将节点置于维护模式。

注 此节点必须已属于 Astra Data Store 预览集群。

当节点处于维护模式时,您无法向集群添加节点。在此示例中,我们会将节点 nhcitj1525 置于维护模式。

步骤
  1. 显示节点详细信息:

    ~% kubectl get nodes

    响应:

     NAME             STATUS   ROLES                  AGE     VERSION
     nhcitjj1525      Ready    <none>                 3d18h   v1.20.0
     nhcitjj1526      Ready    <none>                 3d18h   v1.20.0
     nhcitjj1527      Ready    <none>                 3d18h   v1.20.0
     nhcitjj1528      Ready    <none>                 3d18h   v1.20.0
     scs000039783-1   Ready    control-plane,master   3d18h   v1.20.0
  2. 确保节点尚未处于维护模式:

    ~% kubectl astrads maintenance list

    响应(没有节点处于维护模式):

    NAME    NODE NAME  IN MAINTENANCE  MAINTENANCE STATE       MAINTENANCE VARIANT
  3. 启用维护模式。

    ~% kubectl astrads maintenance create <cr-name> --node-name=<<node-name>> --variant=Node

    示例:

    ~% kubectl astrads maintenance create maint1 --node-name="nhcitjj1525" --variant=Node
    Maintenance mode astrads-system/maint1 created
  4. 列出节点。

    ~% kubectl astrads nodes list

    响应:

    NODE NAME       NODE STATUS     CLUSTER NAME
    nhcitjj1525     Added           ftap-astra-012
    ...
  5. 检查维护模式的状态:

    ~% kubectl astrads maintenance list

    响应:

    NAME    NODE NAME       IN MAINTENANCE  MAINTENANCE STATE       MAINTENANCE VARIANT
    node4   nhcitjj1525     true            ReadyForMaintenance     Node

    维护` 模式下的 将以 `false 开头,并更改为 trueM状态PreparingForMaintenance 更改为 ReadyforMaintenance

  6. 完成节点维护后,禁用维护模式:

    ~% kubectl astrads maintenance update maint1 --node-name="nhcitjj1525" --variant=None
  7. 确保节点不再处于维护模式:

    ~% kubectl astrads maintenance list

更换节点

使用 kubectl 命令和 Astra Data Store 预览版替换集群中的故障节点。

步骤
  1. 列出所有节点:

    ~% kubectl astrads nodes list

    响应:

    NODE NAME           NODE STATUS    CLUSTER NAME
    sti-rx2540-534d..   Added       cluster-multinodes-21209
    sti-rx2540-535d...  Added       cluster-multinodes-21209
    ...
  2. 描述集群:

    ~% kubectl astrads clusters list

    响应:

    CLUSTER NAME               CLUSTER STATUS  NODE COUNT
    cluster-multinodes-21209   created         4
  3. 验证故障节点上的 Node HA 是否标记为 false

    ~% kubectl describe astradscluster -n astrads-system

    响应:

    Name:         cluster-multinodes-21209
    Namespace:    astrads-system
    Labels:       <none>
    Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                    {"apiVersion":"astrads.netapp.io/v1alpha1","kind":"AstraDSCluster","metadata":{"annotations":{},"name":"cluster-multinodes-21209","namespa...
    API Version:  astrads.netapp.io/v1alpha1
    Kind:         AstraDSCluster
    
    State:               Disabled
    Variant:             None
    Node HA:             false
    Node ID:             4
    Node Is Reachable:   false
    Node Management IP:  172.21.192.192
    Node Name:           sti-rx2540-532d.ctl.gdl.englab.netapp.com
    Node Role:           Storage
    Node UUID:           6f6b88f3-8411-56e5-b1f0-a8e8d0c946db
    Node Version:        12.75.0.6167444
    Status:              Added
  4. 通过将 `AddsNode Count ' 的值减至 3 ,修改 astradscluster CR 以删除故障节点:

    cat manifests/astradscluster.yaml

    响应:

    apiVersion: astrads.netapp.io/v1alpha1
    kind: AstraDSCluster
    metadata:
      name: cluster-multinodes-21209
      namespace: astrads-system
    spec:
      # ADS Node Configuration per node settings
      adsNodeConfig:
        # Specify CPU limit for ADS components
        # Supported value: 9
        cpu: 9
        # Specify Memory Limit in GiB for ADS Components.
        # Your kubernetes worker nodes need to have at least this much RAM free
        # for ADS to function correctly
        # Supported value: 34
        memory: 34
        # [Optional] Specify raw storage consumption limit. The operator will only select drives for a node up to this limit
        capacity: 600
        # [Optional] Set a cache device if you do not want auto detection e.g. /dev/sdb
        # cacheDevice: ""
        # Set this regex filter to select drives for ADS cluster
        # drivesFilter: ".*"
    
      # [Optional] Specify node selector labels to select the nodes for creating ADS cluster
      # adsNodeSelector:
      #   matchLabels:
      #     customLabelKey: customLabelValue
    
      # Specify the number of nodes that should be used for creating ADS cluster
      adsNodeCount: 3
    
      # Specify the IP address of a floating management IP routable from any worker node in the cluster
      mvip: "172..."
    
      # Comma separated list of floating IP addresses routable from any host where you intend to mount a NetApp Volume
      # at least one per node must be specified
      # addresses: 10.0.0.1,10.0.0.2,10.0.0.3,10.0.0.4,10.0.0.5
      # netmask: 255.255.255.0
      adsDataNetworks:
        - addresses: "172..."
          netmask: 255.255.252.0
    
    
      # [Optional] Provide a k8s label key that defines which protection domain a node belongs to
      # adsProtectionDomainKey: ""
    
      # [Optional] Provide a monitoring config to be used to setup/configure a monitoring agent.
      monitoringConfig:
       namespace: "netapp-monitoring"
       repo: "docker.repo.eng.netapp.com/global/astra"
    
      autoSupportConfig:
        # AutoUpload defines the flag to enable or disable AutoSupport upload in the cluster (true/false)
        autoUpload: true
        # Enabled defines the flag to enable or disable automatic AutoSupport collection.
        # When set to false, periodic and event driven AutoSupport collection would be disabled.
        # It is still possible to trigger an AutoSupport manually while AutoSupport is disabled
        # enabled: true
        # CoredumpUpload defines the flag to enable or disable the upload of coredumps for this ADS Cluster
        # coredumpUpload: false
        # HistoryRetentionCount defines the number of local (not uploaded) AutoSupport Custom Resources to retain in the cluster before deletion
        historyRetentionCount: 25
        # DestinationURL defines the endpoint to transfer the AutoSupport bundle collection
        destinationURL: "https://testbed.netapp.com/put/AsupPut"
        # ProxyURL defines the URL of the proxy with port to be used for AutoSupport bundle transfer
        # proxyURL:
        # Periodic defines the config for periodic/scheduled AutoSupport objects
        periodic:
          # Schedule defines the Kubernetes Cronjob schedule
          - schedule: "0 0 * * *"
            # PeriodicConfig defines the fields needed to create the Periodic AutoSupports
            periodicconfig:
            - component:
                name: storage
                event: dailyMonitoring
              userMessage: Daily Monitoring Storage AutoSupport bundle
              nodes: all
            - component:
                name: controlplane
                event: daily
              userMessage: Daily Control Plane AutoSupport bundle
  5. 验证是否已从集群中删除此节点:

    ~% kubectl get nodes --show-labels

    响应:

    NAME                  STATUS   ROLES               AGE   VERSION   LABELS
    sti-astramaster-237   Ready control-plane,master   24h   v1.20.0
    sti-rx2540-532d       Ready  <none>                24h   v1.20.0
    sti-rx2540-533d       Ready  <none>                24h
    ~% kubectl astrads nodes list

    响应:

    NODE NAME         NODE STATUS     CLUSTER NAME
    sti-rx2540-534d   Added           cluster-multinodes-21209
    sti-rx2540-535d   Added           cluster-multinodes-21209
    sti-rx2540-536d   Added           cluster-multinodes-21209
    ~% kubectl get nodes --show-labels

    响应:

    NAME                STATUS   ROLES                  AGE   VERSION   LABELS
    sti-astramaster-237 Ready    control-plane,master   24h   v1.20.0   beta.kubernetes.io/arch=amd64,
    sti-rx2540-532d     Ready    <none>                 24h   v1.20.0   astrads.netapp.io/node-removal
    ~% kubectl describe astradscluster -n astrads-system

    响应:

    Name:         cluster-multinodes-21209
    Namespace:    astrads-system
    Labels:       <none>
    Kind:         AstraDSCluster
    Metadata:
    ...
  6. 通过修改集群 CR 将节点添加到集群以进行更换。节点数将递增至 4 。验证是否已选取新节点进行添加。

    rvi manifests/astradscluster.yaml
    cat manifests/astradscluster.yaml
    apiVersion: astrads.netapp.io/v1alpha1
    kind: AstraDSCluster
    metadata:
      name: cluster-multinodes-21209
      namespace: astrads-system
    ~% kubectl apply -f manifests/astradscluster.yaml

    响应:

    astradscluster.astrads.netapp.io/cluster-multinodes-21209 configured
    ~% kubectl get pods -n astrads-system

    响应:

    NAME                                READY   STATUS    RESTARTS   AGE
    astrads-cluster-controller...       1/1     Running   1          24h
    astrads-deployment-support...       3/3     Running   0          24h
    astrads-ds-cluster-multinodes-21209 1/1     Running
    ~% kubectl astrads nodes list

    响应:

    NODE NAME                NODE STATUS     CLUSTER NAME
    sti-rx2540-534d...       Added           cluster-multinodes-21209
    sti-rx2540-535d...       Added           cluster-multinodes-21209
    ~% kubectl astrads clusters list

    响应:

    CLUSTER NAME                    CLUSTER STATUS  NODE COUNT
    cluster-multinodes-21209        created         4
    ~% kubectl astrads drives list

    响应:

    DRIVE NAME    DRIVE ID    DRIVE STATUS   NODE NAME     CLUSTER NAME
    scsi-36000..  c3e197f2... Active         sti-rx2540... cluster-multinodes-21209

更换驱动器

当集群中的驱动器发生故障时,必须尽快更换驱动器以确保数据完整性。驱动器发生故障时,您将在集群 CR 节点状态,集群运行状况信息和指标端点中看到故障驱动器信息。

在 nodeStatuss.driveStatuses 中显示故障驱动器的集群示例
$ kubectl get adscl -A -o yaml

响应:

...
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSCluster
...
nodeStatuses:
  - driveStatuses:
    - driveID: 31205e51-f592-59e3-b6ec-185fd25888fa
      driveName: scsi-36000c290ace209465271ed6b8589b494
      drivesStatus: Failed
    - driveID: 3b515b09-3e95-5d25-a583-bee531ff3f31
      driveName: scsi-36000c290ef2632627cb167a03b431a5f
      drivesStatus: Active
    - driveID: 0807fa06-35ce-5a46-9c25-f1669def8c8e
      driveName: scsi-36000c292c8fc037c9f7e97a49e3e2708
      drivesStatus: Active
...

故障驱动器 CR 会在集群中自动创建,其名称与故障驱动器的 UUID 相对应。

$ kubectl get adsfd -A -o yaml

响应:

...
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSFailedDrive
metadata:
    name: c290a-5000-4652c-9b494
    namespace: astrads-system
spec:
  executeReplace: false
  replaceWith: ""
 status:
   cluster: arda-6e4b4af
   failedDriveInfo:
     failureReason: AdminFailed
     inUse: false
     name: scsi-36000c290ace209465271ed6b8589b494
     path: /dev/disk/by-id/scsi-36000c290ace209465271ed6b8589b494
     present: true
     serial: 6000c290ace209465271ed6b8589b494
     node: sti-rx2540-300b.ctl.gdl.englab.netapp.com
   state: ReadyToReplace
~% kubectl astrads faileddrive list --cluster arda-6e4b4af

响应:

NAME       NODE                             CLUSTER        STATE                AGE
6000c290   sti-rx2540-300b.lab.netapp.com   ard-6e4b4af    ReadyToReplace       13m
步骤
  1. 使用 kubectl astrad show-replacements 命令列出可能的替代驱动器,该命令可筛选符合更换限制(未在集群中使用,未挂载,无分区以及等于或大于故障驱动器)的驱动器。

    要在不筛选可能的替代驱动器的情况下列出所有驱动器,请在 sHow-replacements 命令中添加 ` -all` 。

    ~%  kubectl astrads faileddrive show-replacements --cluster ard-6e4b4af --name 6000c290

    响应:

    NAME  IDPATH             SERIAL  PARTITIONCOUNT   MOUNTED   SIZE
    sdh   /scsi-36000c29417  45000c  0                false     100GB
  2. 使用 replace 命令将驱动器替换为已传递的序列号。如果 ` -wait` 时间已过,则命令将完成替换或失败。

    ~% kubectl astrads faileddrive replace --cluster arda-6e4b4af --name 6000c290 --replaceWith 45000c --wait
    Drive replacement completed successfully
    注 如果使用不适当的 ` -replaceWith` 序列号执行 kubectl astrad faileddrive replace ,则会显示类似以下内容的错误:
    ~% kubectl astrads replacedrive replace --cluster astrads-cluster-f51b10a --name 6000c2927 --replaceWith BAD_SERIAL_NUMBER
    Drive 6000c2927 replacement started
    Failed drive 6000c2927 has been set to use BAD_SERIAL_NUMBER as a replacement
    ...
    Drive replacement didn't complete within 25 seconds
    Current status: {FailedDriveInfo:{InUse:false Present:true Name:scsi-36000c2 FiretapUUID:444a5468 Serial:6000c Path:/scsi-36000c FailureReason:AdminFailed Node:sti-b200-0214a.lab.netapp.com} Cluster:astrads-cluster-f51b10a State:ReadyToReplace Conditions:[{Message: "Replacement drive serial specified doesn't exist", Reason: "DriveSelectionFailed", Status: False, Type:' Done"]}
  3. 要重新运行驱动器更换,请使用 ` -force` 和上一个命令:

    ~%  kubectl astrads replacedrive replace --cluster astrads-cluster-f51b10a --name 6000c2927 --replaceWith VALID_SERIAL_NUMBER --force