Manage the Astra Data Store cluster

You can manage the cluster by using kubectl commands with Astra Data Store.

Add a node

What you’ll need

The node that you are adding must be part of the Kubernetes cluster and have a configuration similar to the other nodes in the cluster.

Note To add a node using Astra Control Center, see Add nodes to a storage backend cluster.
Steps
  1. If the new node’s dataIP is not already part of the Astra Data Store cluster CR, do the following:

    1. Edit the cluster CR and add the additional dataIP in the adsDataNetworks addresses field (a sketch of this field appears after these steps). Replace the information in capital letters with the appropriate values for your environment:

      kubectl edit astradscluster CLUSTER_NAME -n astrads-system
    2. Save the CR.

    3. Add the node to the Astra Data Store cluster. Replace the information in capital letters with the appropriate values for your environment:

      kubectl astrads nodes add --cluster CLUSTER_NAME
  2. Otherwise, add the node directly. Replace the information in capital letters with the appropriate values for your environment:

    kubectl astrads nodes add --cluster CLUSTER_NAME
  3. Verify that the node has been added:

    kubectl astrads nodes list
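
If you are not sure whether the new node’s dataIP is already listed, you can inspect the cluster CR before editing it in step 1. The following is a minimal sketch; the exact layout of the adsDataNetworks section can vary by release, and the addresses shown are hypothetical, so confirm the field against your own CR.

kubectl get astradscluster CLUSTER_NAME -n astrads-system -o yaml

In the CR, the adsDataNetworks addresses field typically resembles the following; append the new node’s dataIP to the existing comma-separated list:

spec:
  adsDataNetworks:
  - addresses: "10.0.0.11,10.0.0.12,10.0.0.13"
    netmask: 255.255.255.0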

Remove a node

Use kubectl commands with Astra Data Store to remove a node from a cluster.

Steps
  1. List all the nodes:

    kubectl astrads nodes list

    Response:

    NODE NAME           NODE STATUS    CLUSTER NAME
    sti-rx2540-534d...  Added          cluster-multinodes-21209
    sti-rx2540-535d...  Added          cluster-multinodes-21209
    ...
  2. Mark the node for removal. Replace the information in capital letters with the appropriate values for your environment:

    kubectl astrads nodes remove NODE_NAME

    Response:

    Removal label set on node sti-rx2540-534d.lab.org
    Successfully updated ADS cluster cluster-multinodes-21209 desired node count from 4 to 3

    After you mark the node for removal, the node status should change from active to present.

  3. Verify that the removed node shows the present status (a label-selector shortcut is shown after this procedure):

    kubectl get nodes --show-labels

    Response:

    NAME                                            STATUS   ROLES                  AGE     VERSION   LABELS
    sti-astramaster-050.lab.org   Ready    control-plane,master   3h39m   v1.20.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=sti-astramaster-050.lab.org,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=
    sti-rx2540-556a.lab.org       Ready    worker                 3h38m   v1.20.0   astrads.netapp.io/cluster=astrads-cluster-890c32c,astrads.netapp.io/storage-cluster-status=active,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=sti-rx2540-556a.lab.org,kubernetes.io/os=linux,node-role.kubernetes.io/worker=true
    sti-rx2540-556b.lab.org       Ready    worker                 3h38m   v1.20.0   astrads.netapp.io/cluster=astrads-cluster-890c32c,astrads.netapp.io/storage-cluster-status=active,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=sti-rx2540-556b.lab.org,kubernetes.io/os=linux,node-role.kubernetes.io/worker=true
    sti-rx2540-534d.lab.org       Ready    worker                 3h38m   v1.20.0   astrads.netapp.io/storage-cluster-status=present,astrads.netapp.io/storage-node-removal=,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=sti-rx2540-557a.lab.org,kubernetes.io/os=linux,node-role.kubernetes.io/worker=true
    sti-rx2540-557b.lab.org       Ready    worker                 3h38m   v1.20.0   astrads.netapp.io/cluster=astrads-cluster-890c32c,astrads.netapp.io/storage-cluster-status=active,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=sti-rx2540-557b.lab.org,kubernetes.io/os=linux,node-role.kubernetes.io/worker=true
  4. Uninstall Astra Data Store from the node. Replace the information in capital letters with the appropriate values for your environment:

    kubectl astrads nodes uninstall NODE_NAME
  5. Verify the node is removed from the cluster:

    kubectl astrads nodes list
Result

The node is removed from Astra Data Store.
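
As an alternative to scanning the full label output in step 3, you can filter on the storage-cluster-status label that appears in that output. This is a minimal sketch based on the label key shown in the sample response; verify the label values used in your cluster.

kubectl get nodes -l astrads.netapp.io/storage-cluster-status=present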

Place a node in maintenance mode

When you need to perform host maintenance or package upgrades, you should place the node in maintenance mode.

Note The node must already be part of the Astra Data Store cluster.

When a node is in maintenance mode, you cannot add a node to the cluster. In this example, we will place node nhcitjj1525 into maintenance mode.

Steps
  1. Display the node details:

    kubectl get nodes

    Response:

     NAME             STATUS   ROLES                  AGE     VERSION
     nhcitjj1525      Ready    <none>                 3d18h   v1.23.5
     nhcitjj1526      Ready    <none>                 3d18h   v1.23.5
     nhcitjj1527      Ready    <none>                 3d18h   v1.23.5
     nhcitjj1528      Ready    <none>                 3d18h   v1.23.5
     scs000039783-1   Ready    control-plane,master   3d18h   v1.23.5
  2. Ensure that the node is not already in maintenance mode:

    kubectl astrads maintenance list

    Response (there are no nodes already in maintenance mode):

    NAME    NODE NAME  IN MAINTENANCE  MAINTENANCE STATE       MAINTENANCE VARIANT
  3. Enable maintenance mode. Replace the information in capital letters with the appropriate values for your environment:

    kubectl astrads maintenance create CR_NAME --node-name=NODE_NAME --variant=Node

    For example:

    kubectl astrads maintenance create maint1 --node-name="nhcitjj1525" --variant=Node

    Response:

    Maintenance mode astrads-system/maint1 created
  4. List the nodes:

    kubectl astrads nodes list

    Response:

    NODE NAME       NODE STATUS     CLUSTER NAME
    nhcitjj1525     Added           ftap-astra-012
    ...
  5. Check the status of the maintenance mode (a polling sketch appears after this procedure):

    kubectl astrads maintenance list

    Response:

    NAME    NODE NAME       IN MAINTENANCE  MAINTENANCE STATE       MAINTENANCE VARIANT
    maint1  nhcitjj1525     true            ReadyForMaintenance     Node

    The In Maintenance field starts as false and changes to true.
    The Maintenance State changes from PreparingForMaintenance to ReadyForMaintenance.

  6. After the node maintenance is complete, disable maintenance mode:

    kubectl astrads maintenance update maint1 --node-name="nhcitjj1525" --variant=None
  7. Ensure that the node is no longer in maintenance mode:

    kubectl astrads maintenance list
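
If you prefer to wait for the node to become ready before starting host work, you can poll the state from step 5 in a shell loop. This is a minimal sketch that only re-runs the documented command; the node name is the one used in this example, and the interval is arbitrary.

# Poll until the node reports ReadyForMaintenance
until kubectl astrads maintenance list | grep nhcitjj1525 | grep -q ReadyForMaintenance; do
  sleep 10
done
echo "Node nhcitjj1525 is ready for maintenance"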

Add drives to a node

Use kubectl commands with Astra Data Store to add physical or virtual drives to a node in an Astra Data Store cluster.

What you’ll need
  • One or more drives that meet the following criteria:

    • Already installed in the node (physical drives) or added to the node VM (virtual drives)

    • No partitions on the drive (a host-side check is sketched before the steps below)

    • Drive is not currently in use by the cluster

    • Drive raw capacity does not exceed the licensed raw capacity of the cluster (for example, with a license granting 2TB of storage per CPU core, a 10-node cluster with one licensed core per node would have a maximum of 20TB of raw drive capacity)

    • Drive is at least the size of other active drives in the node

Note Astra Data Store supports no more than 16 drives per node. If you try to add a 17th drive, the drive add request is denied.
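
Before adding a drive, you can confirm from the host that it has no partitions and is not mounted. This is a minimal sketch using standard Linux tooling; /dev/sdg is a hypothetical device name, so substitute the candidate drive on your node.

# Show the candidate drive with any partitions and mount points
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdg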
Steps
  1. Describe the cluster:

    kubectl astrads clusters list

    Response:

    CLUSTER NAME                    CLUSTER STATUS  NODE COUNT
    cluster-multinodes-21209        created         4
  2. Note the cluster name.

  3. Show the drives that are available to add to all nodes in the cluster. Replace CLUSTER_NAME with the name of your cluster:

    kubectl astrads adddrive show-available --cluster=CLUSTER_NAME

    Response:

    Current cluster drive add status
    Licensed cluster capacity: 72.0 TiB
    Cluster capacity used: 2.3 TiB
    Maximum node size without stranding: 800.0 GiB
    
    Node: node1.name
    Current node size: 600.0 GiB
    Maximum licensed node size: 18.0 TiB
    Total size that can be added to this node without stranding: 200.0 GiB
    Add drive minimum/recommended size: 100.0 GiB. Larger disks will be constrained to this size
    NAME IDPATH SERIAL PARTITIONCOUNT SIZE ALREADYINCLUSTER
    sdg /dev/disk/by-id/scsi-3c290e16d52479a9af5eac c290e16d52479a9af5eac 0 100 GiB false
    sdh /dev/disk/by-id/scsi-3c2935798df68355dee0be c2935798df68355dee0be 0 100 GiB false
    
    Node: node2.name
    Current node size: 600.0 GiB
    Maximum licensed node size: 18.0 TiB
    Total size that can be added to this node without stranding: 200.0 GiB
    Add drive minimum/recommended size: 100.0 GiB. Larger disks will be constrained to this size
    No suitable drives to add exist.
    
    Node: node3.name
    Current node size: 600.0 GiB
    Maximum licensed node size: 18.0 TiB
    Total size that can be added to this node without stranding: 200.0 GiB
    Add drive minimum/recommended size: 100.0 GiB. Larger disks will be constrained to this size
    NAME IDPATH SERIAL PARTITIONCOUNT SIZE ALREADYINCLUSTER
    sdg /dev/disk/by-id/scsi-3c29ee82992ed7a36fc942 c29ee82992ed7a36fc942 0 100 GiB false
    sdh /dev/disk/by-id/scsi-3c29312aa362469fb3da9c c29312aa362469fb3da9c 0 100 GiB false
    
    Node: node4.name
    Current node size: 600.0 GiB
    Maximum licensed node size: 18.0 TiB
    Total size that can be added to this node without stranding: 200.0 GiB
    Add drive minimum/recommended size: 100.0 GiB. Larger disks will be constrained to this size
    No suitable drives to add exist.
  4. Do one of the following:

    • If all available drives have the same name, you can add them to the respective nodes simultaneously (see the example after this list). Replace the information in capital letters with the appropriate values for your environment:

      kubectl astrads adddrive create --cluster=CLUSTER_NAME --name REQUEST_NAME --drivesbyname all=DRIVE_NAME
    • If the drives are named differently, you can add them to the respective nodes one at a time (you’ll need to repeat this step for each drive you need to add). Replace the information in capital letters with the appropriate values for your environment:

      kubectl astrads adddrive create --cluster=CLUSTER_NAME --name REQUEST_NAME --drivesbyname NODE_NAME=DRIVE_NAME
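
For example, using the cluster name from step 1 and the sdg drives shown as available in step 3, the simultaneous form might look like the following (the request name add-sdg is hypothetical):

kubectl astrads adddrive create --cluster=cluster-multinodes-21209 --name add-sdg --drivesbyname all=sdg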
Result

Astra Data Store creates a request to add the drive or drives, and a message appears with the result of the request.

Replace a drive

When a drive fails in a cluster, replace it as soon as possible to preserve data integrity.
Information about the failed drive appears in the cluster CR node status, in the cluster health condition information, and at the metrics endpoint. You can use the following example commands to see failed drive information.

Example of cluster showing failed drive in nodeStatuses.driveStatuses
kubectl get adscl -A -o yaml

Response:

...
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSCluster
...
nodeStatuses:
  - driveStatuses:
    - driveID: 31205e51-f592-59e3-b6ec-185fd25888fa
      driveName: scsi-36000c290ace209465271ed6b8589b494
      drivesStatus: Failed
    - driveID: 3b515b09-3e95-5d25-a583-bee531ff3f31
      driveName: scsi-36000c290ef2632627cb167a03b431a5f
      drivesStatus: Active
    - driveID: 0807fa06-35ce-5a46-9c25-f1669def8c8e
      driveName: scsi-36000c292c8fc037c9f7e97a49e3e2708
      drivesStatus: Active
...
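
To surface only the failed entries from that output, you can filter on the drivesStatus field shown above. A minimal sketch:

# Print each failed drive along with the two preceding lines (driveID and driveName)
kubectl get adscl -A -o yaml | grep -B 2 'drivesStatus: Failed'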
Example of new AstraDSFailedDrive CR

The failed drive CR is created automatically in the cluster with a name corresponding to the UUID of the failed drive.

kubectl get adsfd -A -o yaml

Response:

...
apiVersion: astrads.netapp.io/v1alpha1
kind: AstraDSFailedDrive
metadata:
  name: c290a-5000-4652c-9b494
  namespace: astrads-system
spec:
  executeReplace: false
  replaceWith: ""
status:
  cluster: arda-6e4b4af
  failedDriveInfo:
    failureReason: AdminFailed
    inUse: false
    name: scsi-36000c290ace209465271ed6b8589b494
    path: /dev/disk/by-id/scsi-36000c290ace209465271ed6b8589b494
    present: true
    serial: 6000c290ace209465271ed6b8589b494
    node: sti-rx2540-300b.lab.org
  state: ReadyToReplace
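
The spec fields executeReplace and replaceWith shown above suggest that the replacement workflow is driven by updating this CR. If you want to experiment with setting those fields directly, a patch similar to the following might work; this is only a sketch inferred from the field names, not a documented interface, and the supported path is the kubectl astrads faileddrive replace command in the steps below.

# Hypothetical: set the replacement drive serial and request execution directly on the CR
kubectl patch adsfd c290a-5000-4652c-9b494 -n astrads-system --type merge \
  -p '{"spec":{"replaceWith":"45000c","executeReplace":true}}'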
Example of listing failed drives for a cluster

kubectl astrads faileddrive list --cluster arda-6e4b4af

Response:

NAME       NODE                             CLUSTER        STATE                AGE
6000c290   sti-rx2540-300b.lab.netapp.com   ard-6e4b4af    ReadyToReplace       13m
Steps
  1. List possible replacement drives with the kubectl astrads faileddrive show-replacements command, which filters out drives that do not meet the replacement restrictions (the drive must be unused by the cluster, not mounted, have no partitions, and be at least as large as the failed drive).

    To list all drives without filtering, add --all to the show-replacements command.

    kubectl astrads faileddrive show-replacements --cluster ard-6e4b4af --name 6000c290

    Response:

    NAME  IDPATH             SERIAL  PARTITIONCOUNT   MOUNTED   SIZE
    sdh   /scsi-36000c29417  45000c  0                false     100GB
  2. Use the replace command to replace the failed drive with the drive that has the specified serial number. The command waits for the replacement to complete and fails if the --wait time elapses.

    kubectl astrads faileddrive replace --cluster arda-6e4b4af --name 6000c290 --replaceWith 45000c --wait

    Response:

    Drive replacement completed successfully
    Note If kubectl astrads faileddrive replace is executed with an invalid --replaceWith serial number, an error appears similar to this:
    kubectl astrads faileddrive replace --cluster astrads-cluster-f51b10a --name 6000c2927 --replaceWith BAD_SERIAL_NUMBER
    Drive 6000c2927 replacement started
    Failed drive 6000c2927 has been set to use BAD_SERIAL_NUMBER as a replacement
    ...
    Drive replacement didn't complete within 25 seconds
    Current status: {FailedDriveInfo:{InUse:false Present:true Name:scsi-36000c2 FiretapUUID:444a5468 Serial:6000c Path:/scsi-36000c FailureReason:AdminFailed Node:sti-b200-0214a.lab.netapp.com} Cluster:astrads-cluster-f51b10a State:ReadyToReplace Conditions:[{Message:"Replacement drive serial specified doesn't exist" Reason:"DriveSelectionFailed" Status:False Type:Done}]}
  3. To re-run the drive replacement, use --force with the previous command:

    kubectl astrads faileddrive replace --cluster astrads-cluster-f51b10a --name 6000c2927 --replaceWith VALID_SERIAL_NUMBER --force
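
After the replacement completes, you can confirm that no drives remain in the ReadyToReplace state by listing the failed drives again:

kubectl astrads faileddrive list --cluster astrads-cluster-f51b10a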