ONTAP MetroCluster

Choosing the correct recovery procedure

Contributors netapp-folivia netapp-aoife netapp-thomi netapp-martyh netapp-ahibbard b-ahibbard NetAppZacharyWambold

After a failure in a MetroCluster configuration, you must select the correct recovery procedure. Use the following table and examples to select the appropriate recovery procedure.

The information in this table assumes that the installation or transition is complete, meaning that the metrocluster configure command ran successfully.

Scope of failures at disaster site

Procedure

  • No hardware failure

  • No controller module failure

  • Other hardware has failed

  • Single controller module failure or failure of FRU components within the controller module

  • Drives have not failed

If a failure is limited to a single controller module, you must use the controller module FRU replacement procedure for the platform model. In a four-node or eight-node MetroCluster configuration, such a failure is isolated to the local HA pair.

Note: The controller module FRU replacement procedure can be used in a two-node MetroCluster configuration if there are no drive or other hardware failures.

  • Single controller module failure or failure of FRU components within the controller module

  • Drives have failed

  • Single controller module failure or failure of FRU components within the controller module

  • Drives have not failed

  • Additional hardware outside the controller module has failed

You should skip all steps for drive assignment.

  • Multiple controller module failure (with or without additional failures) within a DR group
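The table can be read as a decision on three inputs: how many controller modules failed within the DR group, whether drives failed, and whether other hardware outside the controller module failed. The following is a minimal Python sketch of that decision, assuming the failure scope is already known. DisasterSiteFailure and table_row are hypothetical names. The function reports which row applies rather than naming a procedure. It also assumes that a drive failure with no controller failure falls under "other hardware has failed".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterSiteFailure:
    """Hypothetical summary of the failure scope at the disaster site."""
    failed_controllers: int              # failed controller modules within one DR group
    drives_failed: bool = False
    other_hardware_failed: bool = False  # hardware outside the controller module


def table_row(f: DisasterSiteFailure) -> str:
    """Return a label for the row of the recovery table that matches f."""
    if f.failed_controllers < 0:
        raise ValueError("failed_controllers must be non-negative")
    if f.failed_controllers == 0:
        # Assumption: failed drives count as "other hardware" in this row.
        if f.drives_failed or f.other_hardware_failed:
            return "no controller module failure; other hardware has failed"
        return "no hardware failure"
    if f.failed_controllers > 1:
        return "multiple controller module failure within a DR group"
    # Exactly one controller module (or FRU components inside it) has failed.
    if f.drives_failed:
        return "single controller module failure; drives have failed"
    if f.other_hardware_failed:
        # The table says to skip all steps for drive assignment in this case.
        return "single controller module failure; additional hardware failed (skip drive assignment)"
    # Failure limited to the controller module: platform FRU replacement procedure.
    return "single controller module failure only (controller module FRU replacement)"


# Usage: a single controller failure with an additional hardware failure.
print(table_row(DisasterSiteFailure(failed_controllers=1, other_hardware_failed=True)))
```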

Controller module failure scenarios during MetroCluster installation

How you respond to a controller module failure during the MetroCluster configuration procedure depends on whether the metrocluster configure command completed successfully.

  • If the metrocluster configure command has not yet been run, or if it failed, you must restart the MetroCluster software configuration procedure from the beginning with a replacement controller module.

    Note: You must perform the steps in "Restoring system defaults on a controller module" on each controller module, including the replacement controller module, to ensure that the previous configuration is removed.
  • If the metrocluster configure command successfully completed and then the controller module failed, use the previous table to determine the correct recovery procedure.
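The branch above can be sketched as a small Python helper that returns the next steps. The function name installation_failure_steps is hypothetical. The step ordering is an assumption: the previous configuration is cleared on every controller module before the configuration procedure is restarted.

```python
from typing import List


def installation_failure_steps(configure_completed: bool) -> List[str]:
    """Hypothetical helper: next steps after a controller module fails during installation.

    Pass configure_completed=False if metrocluster configure was never run or failed.
    """
    if configure_completed:
        # Installation is complete, so the recovery procedure table applies.
        return ["Choose a recovery procedure from the recovery table"]
    return [
        "Obtain a replacement controller module",
        "Restore system defaults on each controller module, including the replacement",
        "Restart the MetroCluster software configuration procedure from the beginning",
    ]


for step in installation_failure_steps(configure_completed=False):
    print(step)
```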

Controller module failure scenarios during MetroCluster FC-to-IP transition

You can use the recovery procedure if a site failure occurs during transition, but only if the configuration is a stable mixed configuration, with both the FC DR group and the IP DR group fully configured. The output of the metrocluster node show command should list both DR groups with all eight nodes.

Important: If the failure occurred during transition while nodes were being added or removed, you must contact technical support.
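The stability check can be sketched in Python. The node list below is a hypothetical (dr_group_id, dr_group_type, node_name) record that you would transcribe from metrocluster node show. This sketch does not parse the command's real output format. It also assumes that each DR group in the mixed configuration has four nodes. is_stable_mixed_configuration and transition_recovery_action are illustrative names only.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical record: (dr_group_id, dr_group_type, node_name), e.g. (1, "FC", "node1").
NodeRecord = Tuple[int, str, str]


def is_stable_mixed_configuration(nodes: Iterable[NodeRecord]) -> bool:
    """True if there are two DR groups, one FC and one IP, of four nodes each (eight total)."""
    records: List[NodeRecord] = list(nodes)
    nodes_per_group = Counter(group for group, _, _ in records)
    group_types = {group: kind for group, kind, _ in records}
    return (
        len(nodes_per_group) == 2
        and all(count == 4 for count in nodes_per_group.values())
        and sorted(group_types.values()) == ["FC", "IP"]
    )


def transition_recovery_action(nodes: Iterable[NodeRecord], nodes_being_added_or_removed: bool) -> str:
    if nodes_being_added_or_removed:
        # Failure while nodes are being added or removed: escalate.
        return "contact technical support"
    if is_stable_mixed_configuration(nodes):
        return "use the recovery procedure"
    # The recovery procedure applies only to a stable mixed configuration.
    return "recovery procedure cannot be used"


fc_group = [(1, "FC", f"node{i}") for i in range(1, 5)]
ip_group = [(2, "IP", f"node{i}") for i in range(5, 9)]
print(transition_recovery_action(fc_group + ip_group, nodes_being_added_or_removed=False))
```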

Controller module failure scenarios in eight-node MetroCluster configurations

Failure scenarios:

Single controller module failures in a single DR group

In this case, the failure is limited to an HA pair.

Two controller module failures in a single DR group

In this case, the failure requires a switchover. You can use the multi-controller module failure recovery procedure.

This scenario also applies to four-node MetroCluster configurations.

[Figure: eight-node MetroCluster DR groups with a multi-controller failure]

Single controller module failures in separate DR groups

In this case, the failure is limited to separate HA pairs.

[Figure: eight-node MetroCluster DR groups with two single controller module failures]

Three controller module failures spread across the DR groups

In this case, the failure requires a switchover. You can use the multi-controller module failure recovery procedure for DR Group One.

You can use the platform-specific controller module FRU replacement procedure for DR Group Two.

[Figure: eight-node MetroCluster DR groups with three controller module failures]
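The scenarios above reduce to a per-DR-group rule. One failed controller module keeps the failure within an HA pair. Two or more failed controller modules in the same DR group require a switchover. The following minimal Python sketch applies that rule. It assumes that only controller modules failed (no drive or other hardware failures) and that each DR group has four controller modules. plan_by_dr_group and the procedure labels are hypothetical.

```python
from typing import Dict

FRU_REPLACEMENT = "platform-specific controller module FRU replacement procedure"
MULTI_CONTROLLER = "multi-controller module failure recovery procedure (requires switchover)"


def plan_by_dr_group(failed_per_group: Dict[str, int]) -> Dict[str, str]:
    """Map each DR group that has failed controller modules to a recovery procedure."""
    plan: Dict[str, str] = {}
    for group, failed in failed_per_group.items():
        if not 0 <= failed <= 4:
            raise ValueError(f"{group}: expected 0-4 failed controllers, got {failed}")
        if failed == 1:
            # The failure is limited to one HA pair in this DR group.
            plan[group] = FRU_REPLACEMENT
        elif failed >= 2:
            # Two or more failures in one DR group require a switchover.
            plan[group] = MULTI_CONTROLLER
        # failed == 0: nothing to recover in this DR group.
    return plan


# Three failures spread across the DR groups, as in the last scenario above.
for group, procedure in plan_by_dr_group({"DR Group One": 2, "DR Group Two": 1}).items():
    print(f"{group}: {procedure}")
```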

Controller module failure scenarios in two-node MetroCluster configurations

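The procedure you use depends on the extent of the failure. If the failure is limited to the controller module and there are no drive or other hardware failures, you can use the controller module FRU replacement procedure for the platform model.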

[Figure: two-node MetroCluster configuration with a single controller module failure]