MetroCluster failure and recovery scenarios


You should be aware of how the MetroCluster configuration responds to different failure events.

Note: For additional information about recovering from node failures, see the section "Choosing the correct recovery procedure" in the MetroCluster Management and Disaster Recovery Guide.
Each failure event below is followed by its impact and the recovery action.

Single node failure
Impact: A failover is triggered.
Recovery: The configuration recovers through a local takeover. RAID is not impacted. Review system messages and replace failed FRUs as necessary.

Two nodes fail at one site
Impact: Switchover to the surviving site happens automatically only if automated switchover is enabled in the MetroCluster Tiebreaker software.
Recovery: If automated switchover is not enabled in the MetroCluster Tiebreaker software, perform a manual unplanned switchover (USO).

MetroCluster IP interface: failure of one port
Impact: The system is degraded. Failure of an additional port impacts HA mirroring.
Recovery: The second port is used. Health Monitor generates an alert if the physical link to the port is broken. Review system messages and replace failed FRUs as necessary.

MetroCluster IP interface: failure of both ports
Impact: HA capability is impacted. RAID SyncMirror of the node stops syncing.
Recovery: Immediate manual recovery is required because there is no HA takeover. Review system messages and replace failed FRUs as necessary.

Failure of one MetroCluster IP switch
Impact: No impact. Redundancy is provided through the second network.
Recovery: Replace the failed switch as necessary.

Failure of two MetroCluster IP switches that are in the same network
Impact: No impact. Redundancy is provided through the second network.
Recovery: Replace the failed switches as necessary.

Failure of two MetroCluster IP switches that are at one site
Impact: RAID SyncMirror of the node stops syncing. HA capability is impacted and the cluster goes out of quorum.
Recovery: Replace the failed switches as necessary.

Failure of two MetroCluster IP switches that are at different sites and not on the same network (diagonal failure)
Impact: RAID SyncMirror of the node stops syncing. Cluster and HA capability are not impacted.
Recovery: Replace the failed switches as necessary.
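The four switch scenarios follow from the switch topology. As a rough sketch, assuming the four-switch MetroCluster IP layout implied by the table (each of the two sites has one switch on each of the two networks), a two-switch failure reduces to comparing site and network. The `Switch` class and `classify_switch_failure` function are hypothetical illustrations, not part of ONTAP or any NetApp tool, and the returned strings simply paraphrase the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Switch:
    """Hypothetical model of one MetroCluster IP switch."""
    site: str     # assumed site label, "A" or "B"
    network: int  # assumed network number, 1 or 2; same number = same network


def classify_switch_failure(failed: set[Switch]) -> str:
    """Return the table's impact for one or two failed switches.

    Assumes four switches in total: one per network at each of two sites.
    """
    if len(failed) == 1:
        return "no impact: redundancy through the second network"
    if len(failed) == 2:
        a, b = failed
        if a.network == b.network:
            # Same network, so the two switches must be at different sites.
            return "no impact: redundancy through the second network"
        if a.site == b.site:
            # Both networks are down at one site.
            return "SyncMirror stops syncing; HA impacted; cluster out of quorum"
        # Different sites and different networks.
        return "diagonal: SyncMirror stops syncing; cluster and HA not impacted"
    raise ValueError("only one or two failed switches are covered by the table")


a1, a2 = Switch("A", 1), Switch("A", 2)
b1, b2 = Switch("B", 1), Switch("B", 2)
print(classify_switch_failure({a1, b2}))
# diagonal: SyncMirror stops syncing; cluster and HA not impacted
```

Modeling each switch by (site, network) makes the three two-switch rows mutually exclusive: any two distinct switches share a network, share a site, or share neither.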