Halt or reboot ONTAP nodes without initiating takeover in two-node clusters

Contributors: netapp-lisa, netapp-aaron-holt, netapp-ahibbard, netapp-aherbin, netapp-barbe

You can halt or reboot a node in a two-node cluster without initiating takeover when you perform certain hardware maintenance on a node or a shelf and want to limit downtime by keeping the partner node up, or when issues prevent a manual takeover and you want to keep the partner node’s aggregates up and serving data. Additionally, if technical support is assisting you with troubleshooting, they might ask you to perform this procedure as part of those efforts.

About this task
  • Before you inhibit takeover (using the -inhibit-takeover true parameter), you must disable cluster HA.

    Caution
    In a two-node cluster, cluster HA ensures that the failure of one node does not disable the cluster. However, if you do not disable cluster HA before using the -inhibit-takeover true parameter, both nodes stop serving data. If you attempt to halt or reboot a node before disabling cluster HA, ONTAP issues a warning and instructs you to disable cluster HA.

  • You must migrate LIFs (logical interfaces) to the partner node that you want to remain online.

  • If there are aggregates on the node you are halting or rebooting that you want to keep online, relocate them to the partner node before halting or rebooting.

Steps
  1. Verify both nodes are healthy:

    cluster show

    For both nodes, true appears in the Health column.

    cluster::> cluster show
    Node         Health  Eligibility
    ------------ ------- ------------
    node1        true     true
    node2        true     true

    Learn more about cluster show in the ONTAP command reference.

  2. Migrate all LIFs from the node that you will halt or reboot to the partner node:

    network interface migrate-all -node <node_name>

    Learn more about network interface migrate-all in the ONTAP command reference.
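
    For example, if node1 is the node that you will halt or reboot (the node name here is illustrative, matching the examples elsewhere in this procedure):

    cluster::> network interface migrate-all -node node1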

  3. If there are aggregates on the node that you will halt or reboot that you want to keep online when the node is down, relocate them to the partner node.

    1. Show the aggregates on the node you will halt or reboot:

      storage aggregate show -node <node_name>

      For example, node1 is the node that will be halted or rebooted:

      cluster::> storage aggregate show -node node1
      Aggregate  Size  Available  Used%  State  #Vols   Nodes   RAID  Status
      ---------  ----  ---------  -----  -----  -----   -----   ----  ------
      aggr0_node_1_0
                 744.9GB   32.68GB   96% online       2 node1    raid_dp,
                                                                      normal
      aggr1       2.91TB    2.62TB   10% online       8 node1    raid_dp,
                                                                      normal
      aggr2
                  4.36TB    3.74TB   14% online      12 node1    raid_dp,
                                                                      normal
      test2_aggr  2.18TB    2.18TB    0% online       7 node1    raid_dp,
                                                                      normal
      4 entries were displayed.
    2. Move the aggregates to the partner node:

      storage aggregate relocation start -node <source_node_name> -destination <destination_node_name> -aggregate-list <aggregate_name>

      For example, aggregates aggr1, aggr2, and test2_aggr are being moved from node1 to node2:

      storage aggregate relocation start -node node1 -destination node2 -aggregate-list aggr1,aggr2,test2_aggr

  4. Disable cluster HA:

    cluster ha modify -configured false

    When cluster HA is disabled, epsilon is automatically assigned to node 1. The command output confirms that HA is disabled: Notice: HA is disabled.

    Note This operation does not disable storage failover.
  5. If the node to be halted or rebooted is node 1, move epsilon to node 2:

    1. Set the privilege level to advanced:

      set -privilege advanced
    2. Verify that node 2 is healthy and eligible for epsilon:

      cluster show
    3. Remove epsilon from node 1:

      cluster modify -node <node1_name> -epsilon false
    4. Assign epsilon to node 2:

      cluster modify -node <node2_name> -epsilon true
    5. Verify that epsilon is on node 2:

      cluster show
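
    For example, if node1 is the node to be halted or rebooted, the substeps above might look like this (node names follow the earlier examples in this procedure):

      set -privilege advanced
      cluster show
      cluster modify -node node1 -epsilon false
      cluster modify -node node2 -epsilon true
      cluster show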
  6. Halt or reboot and inhibit takeover of the target node:

    Halt the node without initiating takeover:

    system node halt -node <node_name> -inhibit-takeover true

    Reboot the node without initiating takeover:

    system node reboot -node <node_name> -inhibit-takeover true
    Note The command output includes a warning asking whether you want to proceed; enter y to continue.
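
    For example, to halt node1 without initiating takeover (continuing the earlier examples; substitute your own node name):

    cluster::> system node halt -node node1 -inhibit-takeover true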
  7. Verify that the node that is still online is in a healthy state (while the partner is down):

    cluster show

    For the online node, true appears in the Health column.

    Note In the command output, you will see a warning that cluster HA is not configured. You can ignore the warning.
  8. Perform the actions that required you to halt or reboot the node.

  9. Boot the offlined node from the LOADER prompt:

    boot_ontap
  10. Verify both nodes are healthy:

    cluster show

    For both nodes, true appears in the Health column.

    Note In the command output, you will see a warning that cluster HA is not configured. You can ignore the warning at this time.
  11. Reenable cluster HA:

    cluster ha modify -configured true
  12. If earlier in this procedure you relocated aggregates to the partner node, move them back to their home node:

    storage aggregate relocation start -node <source_node_name> -destination <destination_node_name> -aggregate-list <aggregate_name>

    For example, aggregates aggr1, aggr2, and test2_aggr are being moved from node2 back to node1:

    storage aggregate relocation start -node node2 -destination node1 -aggregate-list aggr1,aggr2,test2_aggr

  13. Revert LIFs to their home ports:

    1. View LIFs that are not at home:

      network interface show -is-home false

      Learn more about network interface show in the ONTAP command reference.

    2. If there are non-home LIFs other than the ones migrated from the down node, verify that it is safe to move them before reverting.

    3. If it is safe to do so, revert all LIFs to their home ports:

      network interface revert *

      Learn more about network interface revert in the ONTAP command reference.
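
    For example, the check-and-revert sequence might look like this, assuming all of the displayed non-home LIFs are safe to move:

      network interface show -is-home false
      network interface revert *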