MetroCluster failure and recovery scenarios
You should be aware of how the MetroCluster configuration responds to different failure events.
For additional information about recovery from node failures, see the section "Choosing the correct recovery procedure" in "Recover from a disaster".
| Event | Impact | Recovery |
|---|---|---|
| Single node failure | A failover is triggered. | The configuration recovers through a local takeover. RAID is not impacted. Review system messages and replace failed FRUs as necessary. |
| Two nodes fail at one site | Two nodes will fail only if automated switchover is enabled in the MetroCluster Tiebreaker software. | Manual unplanned switchover (USO) if automated switchover in MetroCluster Tiebreaker software is not enabled. |
| MetroCluster IP interface: failure of one port | The system is degraded. Additional port failure impacts HA mirroring. | The second port is used. Health Monitor generates an alert if the physical link to the port is broken. Review system messages and replace failed FRUs as necessary. |
| MetroCluster IP interface: failure of both ports | HA capability is impacted. RAID SyncMirror of the node stops syncing. | Immediate manual recovery is required as there is no HA takeover. Review system messages and replace failed FRUs as necessary. |
| Failure of one MetroCluster IP switch | No impact. Redundancy is provided through the second network. | Replace the failed switch as necessary. |
| Failure of two MetroCluster IP switches that are in the same network | No impact. Redundancy is provided through the second network. | Replace the failed switch as necessary. |
| Failure of two MetroCluster IP switches that are at one site | RAID SyncMirror of the node stops syncing. HA capability is impacted and the cluster goes out of quorum. | Replace the failed switch as necessary. |
| Failure of two MetroCluster IP switches that are at different sites and not on the same network (diagonal failure) | RAID SyncMirror of the node stops syncing. | RAID SyncMirror of the node stops syncing. Cluster and HA capability are not impacted. Replace the failed switch as necessary. |
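
The switch-failure rows above differ only in which site and which network the failed switches belong to. The following Python sketch illustrates that classification. It is not part of any NetApp tool: the `Switch` type, the site labels, the network numbers, and the function name are all hypothetical, and the sketch assumes a four-switch configuration with one switch per site per network.

```python
from typing import Iterable, NamedTuple


class Switch(NamedTuple):
    """Hypothetical identifier for a MetroCluster IP switch."""
    site: str     # assumed labels, for example "A" or "B"
    network: int  # assumed labels, for example 1 or 2


# Impact text copied from the table rows for the switch-failure events.
IMPACT = {
    "one switch": "No impact. Redundancy is provided through the second network.",
    "two switches, same network": "No impact. Redundancy is provided through the second network.",
    "two switches, one site": "RAID SyncMirror of the node stops syncing. "
                              "HA capability is impacted and the cluster goes out of quorum.",
    "diagonal": "RAID SyncMirror of the node stops syncing.",
}


def classify_switch_failure(failed: Iterable[Switch]) -> str:
    """Return which table row a set of failed switches corresponds to."""
    failed = set(failed)  # duplicates collapse to a single switch
    if not failed:
        return "no failure"
    if len(failed) == 1:
        return "one switch"
    if len(failed) == 2:
        a, b = failed
        if a.network == b.network:
            # Two distinct switches on one network must be at different sites.
            return "two switches, same network"
        if a.site == b.site:
            return "two switches, one site"
        # Different site and different network.
        return "diagonal"
    return "not covered by the table"
```

For example, `classify_switch_failure([Switch("A", 1), Switch("B", 2)])` returns `"diagonal"`, and `IMPACT["diagonal"]` gives the impact text for that row. Failures of three or more switches are outside the table and are reported as such rather than guessed.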