MetroCluster failure and recovery scenarios
You should be aware of how the MetroCluster configuration responds to different failure events.
For additional information about recovery from node failures, see the section "Choosing the correct recovery procedure" in Recover from a disaster.
Event | Impact | Recovery |
---|---|---|
Single node failure | A failover is triggered. | The configuration recovers through a local takeover. RAID is not impacted. Review system messages and replace failed FRUs as necessary. |
Two nodes fail at one site | The two nodes switch over automatically only if automated switchover is enabled in the MetroCluster Tiebreaker software. | Perform a manual unplanned switchover (USO) if automated switchover is not enabled in the MetroCluster Tiebreaker software. |
MetroCluster IP interface: failure of one port | The system is degraded. An additional port failure impacts HA mirroring. | The second port is used. Health Monitor generates an alert if the physical link to the port is broken. Review system messages and replace failed FRUs as necessary. |
MetroCluster IP interface: failure of both ports | HA capability is impacted. RAID SyncMirror of the node stops syncing. | Immediate manual recovery is required because there is no HA takeover. Review system messages and replace failed FRUs as necessary. |
Failure of one MetroCluster IP switch | No impact. Redundancy is provided through the second network. | Replace the failed switch as necessary. |
Failure of two MetroCluster IP switches that are in the same network | No impact. Redundancy is provided through the second network. | Replace the failed switches as necessary. |
Failure of two MetroCluster IP switches that are at one site | RAID SyncMirror of the node stops syncing. HA capability is impacted and the cluster goes out of quorum. | Replace the failed switches as necessary. |
Failure of two MetroCluster IP switches that are at different sites and not on the same network (diagonal failure) | RAID SyncMirror of the node stops syncing. Cluster and HA capability are not impacted. | Replace the failed switches as necessary. |
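The scenarios in the table can also be held as a small lookup structure, for example in a monitoring or runbook script. The following is a minimal sketch only. The event keys, the `Scenario` class, the `urgent` flag, and the `triage` function are hypothetical names introduced for illustration and are not part of any MetroCluster or ONTAP interface. The impact and recovery strings are condensed from the table, and the entry for two nodes failing at one site assumes that the switchover is automatic when it is enabled in MetroCluster Tiebreaker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    impact: str
    recovery: str
    # Assumption: True only where the table calls for immediate manual recovery.
    urgent: bool = False


# Hypothetical event keys; the text is condensed from the table above.
SCENARIOS = {
    "single_node": Scenario(
        "A failover is triggered.",
        "Local takeover; replace failed FRUs as necessary.",
    ),
    "two_nodes_one_site": Scenario(
        "Switchover is automatic only if enabled in MetroCluster Tiebreaker.",
        "Manual unplanned switchover (USO) if automated switchover is not enabled.",
    ),
    "ip_interface_one_port": Scenario(
        "The system is degraded.",
        "The second port is used; replace failed FRUs as necessary.",
    ),
    "ip_interface_both_ports": Scenario(
        "HA capability is impacted; RAID SyncMirror stops syncing.",
        "Immediate manual recovery; there is no HA takeover.",
        urgent=True,
    ),
    "one_switch": Scenario(
        "No impact; the second network provides redundancy.",
        "Replace the failed switch as necessary.",
    ),
    "two_switches_same_network": Scenario(
        "No impact; the second network provides redundancy.",
        "Replace the failed switches as necessary.",
    ),
    "two_switches_one_site": Scenario(
        "RAID SyncMirror stops syncing; HA is impacted; cluster out of quorum.",
        "Replace the failed switches as necessary.",
    ),
    "two_switches_diagonal": Scenario(
        "RAID SyncMirror stops syncing; cluster and HA are not impacted.",
        "Replace the failed switches as necessary.",
    ),
}


def triage(event: str) -> str:
    """Return a one-line summary for an event key, or an UNKNOWN notice."""
    scenario = SCENARIOS.get(event)
    if scenario is None:
        return f"UNKNOWN: no table entry for {event!r}"
    prefix = "URGENT" if scenario.urgent else "INFO"
    return f"{prefix}: {scenario.impact} Recovery: {scenario.recovery}"
```

Calling `triage("ip_interface_both_ports")` returns a summary prefixed with `URGENT`, because that is the only row where the table requires immediate manual recovery. A key that is not in the table returns an `UNKNOWN` notice rather than raising an error.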