Choosing the correct recovery procedure

09/24/2025

After a failure in a MetroCluster configuration, you must select the correct recovery procedure. Use the following table and examples to select the appropriate recovery procedure.

The information in this table assumes that the installation or transition is complete, meaning that the metrocluster configure command ran successfully.

Scope of failures at disaster site	Procedure
No hardware failure (for example, a power failure)	Recovering from a non-controller failure
No controller module failure; other hardware has failed	Recovering from a non-controller failure
Single controller module failure or failure of FRU components within the controller module; drives have not failed	If a failure is limited to a single controller module, you must use the controller module FRU replacement procedure for the platform model (see ONTAP Hardware Systems Documentation). In a four-node or eight-node MetroCluster configuration, such a failure is isolated to the local HA pair. Note: The controller module FRU replacement procedure can be used in a two-node MetroCluster configuration if there are no drive or other hardware failures.
Single controller module failure or failure of FRU components within the controller module; drives have failed	Recovering from a multi-controller or storage failure
Single controller module failure or failure of FRU components within the controller module; drives have not failed; additional hardware outside the controller module has failed	Recovering from a multi-controller or storage failure. You should skip all steps for drive assignment.
Multiple controller module failure (with or without additional failures) within a DR group	Recovering from a multi-controller or storage failure
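
The table can also be read as a small decision function. The following Python sketch is illustrative only: FailureScope and choose_procedure are hypothetical names, not part of ONTAP, and the returned strings are simply the procedure titles from the table. The table has no separate row for drive failures without a controller module failure; the sketch treats that case as "other hardware has failed", which is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple

# Procedure titles as they appear in the table.
NON_CONTROLLER = "Recovering from a non-controller failure"
FRU_REPLACEMENT = "Controller module FRU replacement procedure for the platform model"
MULTI_OR_STORAGE = "Recovering from a multi-controller or storage failure"


@dataclass
class FailureScope:
    """Hypothetical summary of what failed at the disaster site in one DR group."""
    failed_controllers: int = 0          # controller modules (or FRUs inside them) that failed
    drives_failed: bool = False
    other_hardware_failed: bool = False  # hardware outside the controller module


def choose_procedure(scope: FailureScope) -> Tuple[str, bool]:
    """Return (procedure title, skip_drive_assignment) for a failure scope."""
    if scope.failed_controllers == 0:
        # Rows 1 and 2: no controller module failure.
        # Assumption: drive-only failures are handled as "other hardware".
        return NON_CONTROLLER, False
    if scope.failed_controllers == 1:
        if scope.drives_failed:
            return MULTI_OR_STORAGE, False
        if scope.other_hardware_failed:
            # Drives are intact, so the drive assignment steps are skipped.
            return MULTI_OR_STORAGE, True
        return FRU_REPLACEMENT, False
    # Multiple controller module failures within a DR group.
    return MULTI_OR_STORAGE, False
```

For example, choose_procedure(FailureScope(failed_controllers=1, other_hardware_failed=True)) selects the multi-controller or storage failure procedure and flags that the drive assignment steps should be skipped.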


Controller module failure scenarios during MetroCluster installation

How you respond to a controller module failure during the MetroCluster configuration procedure depends on whether the metrocluster configure command completed successfully.

If the metrocluster configure command was not yet run, or failed, you must restart the MetroCluster software configuration procedure from the beginning with a replacement controller module.

Be sure to perform the steps in "Restoring system defaults on a controller module" on each controller (including the replacement controller) to verify that the previous configuration is removed.

If the metrocluster configure command successfully completed and then the controller module failed, use the previous table to determine the correct recovery procedure.
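
The branch described above can be sketched as a short function. This is a hedged illustration, not an ONTAP tool: installation_failure_steps is a hypothetical name, and the caller is assumed to already know whether the metrocluster configure command completed successfully.

```python
from typing import List


def installation_failure_steps(configure_completed: bool) -> List[str]:
    """Return the actions this section describes for a controller module failure.

    configure_completed: True only if metrocluster configure finished
    successfully before the controller module failed.
    """
    if configure_completed:
        return ["Use the recovery procedure table to select a procedure"]
    # The command was not yet run, or it failed.
    return [
        "Replace the failed controller module",
        "Perform 'Restoring system defaults on a controller module' on each "
        "controller, including the replacement",
        "Restart the MetroCluster software configuration procedure from the beginning",
    ]
```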

Controller module failure scenarios during MetroCluster FC-to-IP transition

The recovery procedure can be used if a site failure occurs during transition, but only if the configuration is a stable mixed configuration, with the FC DR group and IP DR group both fully configured. The output of the metrocluster node show command should show both DR groups with all eight nodes.

If the failure occurs during transition while nodes are being added or removed, you must contact technical support.
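
As a rough illustration of the check described above, the following Python sketch tests whether a node listing shows two DR groups and eight nodes. It does not parse real command output: node_dr_groups is a hypothetical mapping of node name to DR group ID that you would fill in by hand from the metrocluster node show output, and the function name is an assumption. The sketch cannot confirm that both DR groups are fully configured; it only checks the counts.

```python
from typing import Mapping


def can_use_recovery_procedure(
    node_dr_groups: Mapping[str, int],
    nodes_being_added_or_removed: bool,
) -> bool:
    """Return True if the counts match a stable mixed configuration.

    node_dr_groups: hypothetical mapping of node name -> DR group ID,
    copied by hand from the metrocluster node show output.
    """
    if nodes_being_added_or_removed:
        # Per this section, contact technical support in this case.
        return False
    dr_groups = set(node_dr_groups.values())
    # Both DR groups (FC and IP) must be present, with all eight nodes listed.
    return len(node_dr_groups) == 8 and len(dr_groups) == 2
```

A False result means the recovery procedure should not be used as-is; if nodes were being added or removed when the failure occurred, contact technical support.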

Controller module failure scenarios in eight-node MetroCluster configurations

Failure scenarios:

Single controller module failures in a single DR group
Two controller module failures in a single DR group
Single controller module failures in separate DR groups
Three controller module failures spread across the DR groups

Single controller module failures in a single DR group

In this case the failure is limited to an HA pair.

If no storage requires replacement, you can use the controller module FRU replacement procedure for the platform model.

ONTAP Hardware Systems Documentation

If storage requires replacement, you can use the multi-controller module recovery procedure.

Recovering from a multi-controller or storage failure

This scenario also applies to four-node MetroCluster configurations.

Two controller module failures in a single DR group

In this case the failure requires a switchover. You can use the multi-controller module failure recovery procedure.

Recovering from a multi-controller or storage failure

This scenario also applies to four-node MetroCluster configurations.

Figure: Eight-node MetroCluster DR groups with a multi-controller failure

Single controller module failures in separate DR groups

In this case the failure is limited to separate HA pairs.

If no storage requires replacement, you can use the controller module FRU replacement procedure for the platform model.

The FRU replacement procedure is performed twice, once for each failed controller module.

ONTAP Hardware Systems Documentation

If storage requires replacement, you can use the multi-controller module recovery procedure.

Recovering from a multi-controller or storage failure

Figure: Eight-node MetroCluster DR groups with two single-controller failures

Three controller module failures spread across the DR groups

In this case the failure requires a switchover. You can use the multi-controller module failure recovery procedure for DR Group One.

Recovering from a multi-controller or storage failure

You can use the platform-specific controller module FRU replacement procedure for DR Group Two.

ONTAP Hardware Systems Documentation

Figure: Eight-node MetroCluster DR groups with three controller failures
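
The eight-node scenarios above apply the same rule to each DR group independently. The Python sketch below makes that explicit. It is a minimal illustration under stated assumptions: the function names and the input mapping are hypothetical, all failures are assumed to be at the disaster site, and the storage condition is applied to every single-failure DR group even though the three-failure scenario does not mention it.

```python
from typing import Dict, Mapping, Tuple

FRU_REPLACEMENT = "Controller module FRU replacement procedure for the platform model"
MULTI_OR_STORAGE = "Recovering from a multi-controller or storage failure"

# Hypothetical input: DR group name -> (failed controller modules at the
# disaster site, whether storage requires replacement).
Failures = Mapping[str, Tuple[int, bool]]


def plan_per_dr_group(failures: Failures) -> Dict[str, str]:
    """Pick a procedure for each DR group that has at least one failure."""
    plan: Dict[str, str] = {}
    for group, (failed_controllers, storage_needs_replacement) in failures.items():
        if failed_controllers == 0:
            continue  # nothing to recover in this DR group
        if failed_controllers == 1 and not storage_needs_replacement:
            # Failure is limited to the HA pair.
            plan[group] = FRU_REPLACEMENT
        else:
            plan[group] = MULTI_OR_STORAGE
    return plan


def requires_switchover(failures: Failures) -> bool:
    """Two controller module failures in one DR group require a switchover."""
    return any(count >= 2 for count, _ in failures.values())
```

For the three-failure scenario, an input such as {"DR Group One": (2, False), "DR Group Two": (1, False)} maps DR Group One to the multi-controller or storage failure procedure and DR Group Two to the FRU replacement procedure, and requires_switchover returns True.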

Controller module failure scenarios in two-node MetroCluster configurations

The procedure you use depends on the extent of the failure.

If no storage requires replacement, you can use the controller module FRU replacement procedure for the platform model.

ONTAP Hardware Systems Documentation

If storage requires replacement, you can use the multi-controller module recovery procedure.

Recovering from a multi-controller or storage failure

Figure: Two-node MetroCluster DR groups with a single controller failure
