MetroCluster failure and recovery scenarios

12/22/2023

You should be aware of how the MetroCluster configuration responds to different failure events.

For additional information about recovering from node failures, see the section "Choosing the correct recovery procedure" in "Recover from a disaster".

Event	Impact	Recovery
Single node failure	A failover is triggered.	The configuration recovers through a local takeover. RAID is not impacted. Review system messages and replace failed FRUs as necessary. See ONTAP Hardware Systems Documentation.
Two nodes fail at one site	Two nodes will fail only if automated switchover is enabled in the MetroCluster Tiebreaker software.	Perform a manual unplanned switchover (USO) if automated switchover is not enabled in the MetroCluster Tiebreaker software. See ONTAP Hardware Systems Documentation.
MetroCluster IP interface: failure of one port	The system is degraded. Additional port failure impacts HA mirroring.	The second port is used. Health Monitor generates an alert if the physical link to the port is broken. Review system messages and replace failed FRUs as necessary. See ONTAP Hardware Systems Documentation.
MetroCluster IP interface: failure of both ports	HA capability is impacted. RAID SyncMirror of the node stops syncing.	Immediate manual recovery is required because there is no HA takeover. Review system messages and replace failed FRUs as necessary. See ONTAP Hardware Systems Documentation.
Failure of one MetroCluster IP switch	No impact. Redundancy is provided through the second network.	Replace the failed switch as necessary. See "Replacing an IP switch".
Failure of two MetroCluster IP switches that are in the same network	No impact. Redundancy is provided through the second network.	Replace the failed switches as necessary. See "Replacing an IP switch".
Failure of two MetroCluster IP switches that are at one site	RAID SyncMirror of the node stops syncing. HA capability is impacted and the cluster goes out of quorum.	Replace the failed switches as necessary. See "Replacing an IP switch".
Failure of two MetroCluster IP switches that are at different sites and not on the same network (diagonal failure)	RAID SyncMirror of the node stops syncing. Cluster and HA capability are not impacted.	Replace the failed switches as necessary. See "Replacing an IP switch".
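
The switch-failure rows in the table above follow a simple pattern based on which site and which network each failed switch belongs to. The following Python sketch is only an illustration of that pattern, not a NetApp tool: it assumes a four-switch layout (two sites, two networks), and the `(site, network)` labels and the function name `classify_switch_failure` are hypothetical.

```python
from typing import Set, Tuple

# A switch is identified by (site, network). Both labels are hypothetical
# and assume one switch per site per network (four switches in total).
Switch = Tuple[str, str]


def classify_switch_failure(failed: Set[Switch]) -> str:
    """Return the impact the table lists for a set of failed IP switches."""
    if not failed:
        return "No failure."
    if len(failed) == 1:
        return "No impact. Redundancy is provided through the second network."
    if len(failed) == 2:
        (site_a, net_a), (site_b, net_b) = sorted(failed)
        if net_a == net_b:
            # Same network, so the switches are at different sites.
            return "No impact. Redundancy is provided through the second network."
        if site_a == site_b:
            # Both networks are down at one site.
            return ("RAID SyncMirror of the node stops syncing. HA capability "
                    "is impacted and the cluster goes out of quorum.")
        # Different sites and different networks: diagonal failure.
        return ("RAID SyncMirror of the node stops syncing. Cluster and HA "
                "capability are not impacted.")
    # The table does not describe failures of more than two switches.
    raise ValueError("Scenario not covered by the table.")


if __name__ == "__main__":
    print(classify_switch_failure({("site_A", "net_1"), ("site_B", "net_2")}))
```

In every case the table lists the same recovery: replace the failed switch or switches as necessary.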
