Shut down the controllers - AFF A400


Shut down or take over the impaired controller using the appropriate procedure for your configuration.

Option 1: Shut down the controllers when replacing a chassis

You must shut down the controller or controllers in the chassis prior to moving them to the new chassis.

About this task
  • If you have a cluster with more than two controllers, it must be in quorum. If the cluster is not in quorum or a healthy controller shows false for eligibility and health, you must correct the issue before shutting down the impaired controller; see the Administration overview with the CLI.

  • If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport message: system node autosupport invoke -node * -type all -message MAINT=number_of_hours_downh

    The following AutoSupport message suppresses automatic case creation for two hours: cluster1:*> system node autosupport invoke -node * -type all -message MAINT=2h

  1. If your system has two controller modules, disable the HA pair, as shown in the example after the following table.

    If your system is running clustered ONTAP with…​ Then…​

    Two controllers in the cluster

    cluster ha modify -configured false
    storage failover modify -node node0 -enabled false

    More than two controllers in the cluster

    storage failover modify -node node0 -enabled false
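
    For example, to disable the HA pair on a two-controller cluster (the cluster and node names are placeholders; node0 is the impaired controller):

    cluster1::> cluster ha modify -configured false
    cluster1::> storage failover modify -node node0 -enabled false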

  2. Halt the controller, pressing y when you are prompted to confirm the halt: system node halt -node node_name

    The confirmation message looks like the following:

    Warning: This operation will cause controller "node-name" to be marked as unhealthy. Unhealthy nodes do not participate in quorum voting. If the controller goes out of service and one more controller goes out of service there will be a data serving failure for the entire cluster. This will cause a client disruption. Use "cluster show" to verify cluster state. If possible bring other nodes online to improve the resiliency of this cluster.
    Do you want to continue? {y|n}:

    Note: You must perform a clean system shutdown before replacing the chassis to avoid losing unwritten data in the nonvolatile memory (NVMEM/NVRAM). Depending on your system, if the NVMEM/NVRAM LED is flashing, there is content in the NVMEM/NVRAM that has not been saved to disk. You need to reboot the controller and start from the beginning of this procedure. If repeated attempts to cleanly shut down the controller fail, be aware that you might lose any data that was not saved to disk.
  3. Where applicable, halt the second controller to avoid a possible quorum error message in an HA pair configuration: system node halt -node second_node_name -ignore-quorum-warnings true -skip-lif-migration-before-shutdown true

    Answer y when prompted.
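
    For example, if the second controller is named node1 (a placeholder name):

    cluster1::> system node halt -node node1 -ignore-quorum-warnings true -skip-lif-migration-before-shutdown true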

Option 2: Shut down a controller in a two-node MetroCluster configuration

To shut down the impaired controller, you must determine the status of the controller and, if necessary, switch over the controller so that the healthy controller continues to serve data from the impaired controller's storage.

About this task
  • If you are using NetApp Storage Encryption, you must have reset the MSID using the instructions in the "Return a FIPS drive or SED to unprotected mode" section of NetApp Encryption overview with the CLI.

  • You must leave the power supplies turned on at the end of this procedure to provide power to the healthy controller.

  1. Check the MetroCluster status to determine whether the impaired controller has automatically switched over to the healthy controller: metrocluster show
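
    The output resembles the following sketch; the cluster names and field values shown here are illustrative and vary by configuration and ONTAP release. A Mode other than normal on the surviving cluster typically indicates that a switchover has already occurred:

    cluster_A::> metrocluster show
    Cluster                   Entry Name          State
    ------------------------- ------------------- -----------
     Local: cluster_A         Configuration state configured
                              Mode                normal
    Remote: cluster_B         Configuration state configured
                              Mode                normal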

  2. Depending on whether an automatic switchover has occurred, proceed according to the following table:

    If the impaired controller…​ Then…​

    Has automatically switched over

    Proceed to the next step.

    Has not automatically switched over

    Perform a planned switchover operation from the healthy controller (an example follows this table): metrocluster switchover

    Has not automatically switched over, you attempted switchover with the metrocluster switchover command, and the switchover was vetoed

    Review the veto messages and, if possible, resolve the issue and try again. If you are unable to resolve the issue, contact technical support.
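
    The following sketch shows a planned switchover initiated from the healthy controller; the cluster names are placeholders, and the exact warning text varies by ONTAP release:

    controller_A_1::> metrocluster switchover

    Warning: negotiated switchover is about to start. It will stop all the data
             Vservers on cluster "cluster_B" and automatically re-start them on
             cluster "cluster_A". It will finally gracefully shutdown cluster
             "cluster_B".
    Do you want to continue? {y|n}: y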

  3. Resynchronize the data aggregates by running the metrocluster heal -phase aggregates command from the surviving cluster.

    controller_A_1::> metrocluster heal -phase aggregates
    [Job 130] Job succeeded: Heal Aggregates is successful.

    If the healing is vetoed, you have the option of reissuing the metrocluster heal command with the -override-vetoes parameter. If you use this optional parameter, the system overrides any soft vetoes that prevent the healing operation.
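
    For example, reusing the cluster name from the example above:

    controller_A_1::> metrocluster heal -phase aggregates -override-vetoes true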

  4. Verify that the operation has been completed by using the metrocluster operation show command.

    controller_A_1::> metrocluster operation show
      Operation: heal-aggregates
          State: successful
     Start Time: 7/25/2016 18:45:55
       End Time: 7/25/2016 18:45:56
         Errors: -
  5. Check the state of the aggregates by using the storage aggregate show command.

    controller_A_1::> storage aggregate show
    Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
    --------- -------- --------- ----- ------- ------ ---------------- ------------
    aggr_b2    227.1GB   227.1GB    0% online       0 mcc1-a2          raid_dp, mirrored, normal...
  6. Heal the root aggregates by using the metrocluster heal -phase root-aggregates command.

    mcc1A::> metrocluster heal -phase root-aggregates
    [Job 137] Job succeeded: Heal Root Aggregates is successful

    If the healing is vetoed, you have the option of reissuing the metrocluster heal command with the -override-vetoes parameter. If you use this optional parameter, the system overrides any soft vetoes that prevent the healing operation.
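
    For example, to override soft vetoes for the root-aggregates phase (same parameter as described above):

    mcc1A::> metrocluster heal -phase root-aggregates -override-vetoes true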

  7. Verify that the heal operation is complete by using the metrocluster operation show command on the destination cluster:

    mcc1A::> metrocluster operation show
      Operation: heal-root-aggregates
          State: successful
     Start Time: 7/29/2016 20:54:41
       End Time: 7/29/2016 20:54:42
         Errors: -
  8. On the impaired controller module, disconnect the power supplies.