Performing aggregate healing and restoring mirrors (MetroCluster IP configurations)
After replacing hardware and assigning disks, you can perform the MetroCluster healing operations on systems running ONTAP 9.5 or earlier. In all versions of ONTAP, you must then confirm that the aggregates are mirrored and, if necessary, restart mirroring.
Beginning with ONTAP 9.6, the healing operations are performed automatically when the disaster site nodes boot up. The healing commands are not required.
These steps are performed on the surviving cluster.
- If you are using ONTAP 9.6 or later, you must verify that automatic healing completed successfully:
  - Confirm that the heal-aggr-auto and heal-root-aggr-auto operations completed:

    metrocluster operation history show

    The following output shows that the operations completed successfully on cluster_A.

        cluster_B::*> metrocluster operation history show
        Operation                     State          Start Time       End Time
        ----------------------------- -------------- ---------------- ----------------
        heal-root-aggr-auto           successful     2/25/2019        2/25/2019
                                                     06:45:58         06:46:02
        heal-aggr-auto                successful     2/25/2019        2/25/2019
                                                     06:45:48         06:45:52
        .
        .
        .
  - Confirm that the disaster site is ready for switchback:

    metrocluster node show

    The following output shows that healing has completed on the cluster_A nodes.

        cluster_B::*> metrocluster node show
        DR                               Configuration  DR
        Group Cluster Node               State          Mirroring Mode
        ----- ------- ------------------ -------------- --------- --------------------
        1     cluster_A
                      node_A_1           configured     enabled   heal roots completed
                      node_A_2           configured     enabled   heal roots completed
              cluster_B
                      node_B_1           configured     enabled   waiting for switchback recovery
                      node_B_2           configured     enabled   waiting for switchback recovery
        4 entries were displayed.
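If you verify automatic healing on many configurations, the check can be scripted. The following is a minimal sketch, not part of the official procedure: it assumes you have already captured the text output of `metrocluster operation history show` (for example over SSH) and simply scans it for both auto-heal operations.

```python
# Hedged sketch: confirm both automatic healing operations succeeded by
# scanning captured "metrocluster operation history show" output.
# Capturing the output (e.g. via SSH to the surviving cluster) is assumed
# to happen elsewhere; this only parses the text.

REQUIRED_OPS = ("heal-root-aggr-auto", "heal-aggr-auto")

def auto_heal_succeeded(history_output: str) -> bool:
    """Return True only if every required operation reports 'successful'."""
    seen = set()
    for line in history_output.splitlines():
        fields = line.split()
        # Data rows start with the operation name; the State column
        # contains "successful" when the operation completed.
        if fields and fields[0] in REQUIRED_OPS and "successful" in fields:
            seen.add(fields[0])
    return seen == set(REQUIRED_OPS)
```

The parsing is deliberately loose (whitespace-split fields) so it tolerates the column wrapping that the ONTAP CLI applies to wide tables.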
- If you are using ONTAP 9.5 or earlier, you must perform aggregate healing:
  - Verify the state of the nodes:

    metrocluster node show

    The following output shows that switchover has completed, so healing can be performed.

        cluster_B::> metrocluster node show
        DR                               Configuration  DR
        Group Cluster Node               State          Mirroring Mode
        ----- ------- ------------------ -------------- --------- --------------------
        1     cluster_B
                      node_B_1           configured     enabled   switchover completed
                      node_B_2           configured     enabled   switchover completed
              cluster_A
                      node_A_1           configured     enabled   waiting for switchback recovery
                      node_A_2           configured     enabled   waiting for switchback recovery
        4 entries were displayed.

        cluster_B::>
  - Perform the aggregates healing phase:

    metrocluster heal -phase aggregates

    The following output shows a typical aggregates healing operation.

        cluster_B::*> metrocluster heal -phase aggregates
        [Job 647] Job succeeded: Heal Aggregates is successful.

        cluster_B::*> metrocluster operation show
          Operation: heal-aggregates
              State: successful
         Start Time: 10/26/2017 12:01:15
           End Time: 10/26/2017 12:01:17
             Errors: -

        cluster_B::*>
  - Verify that aggregate healing has completed and the disaster site is ready for switchback:

    metrocluster node show

    The following output shows that the "heal aggregates" phase has completed on cluster_A.

        cluster_B::> metrocluster node show
        DR                               Configuration  DR
        Group Cluster Node               State          Mirroring Mode
        ----- ------- ------------------ -------------- --------- --------------------
        1     cluster_A
                      node_A_1           configured     enabled   heal aggregates completed
                      node_A_2           configured     enabled   heal aggregates completed
              cluster_B
                      node_B_1           configured     enabled   waiting for switchback recovery
                      node_B_2           configured     enabled   waiting for switchback recovery
        4 entries were displayed.

        cluster_B::>
- If disks have been replaced, you must mirror the local and switched-over aggregates:
  - Display the aggregates:

    storage aggregate show

        cluster_B::> storage aggregate show

        cluster_B Aggregates:
        Aggregate          Size Available Used% State   #Vols Nodes    RAID Status
        ---------------- ------ --------- ----- ------- ----- -------- -----------------
        node_B_1_aggr0   1.49TB   74.12GB   95% online      1 node_B_1 raid4, normal
        node_B_2_aggr0   1.49TB   74.12GB   95% online      1 node_B_2 raid4, normal
        node_B_1_aggr1   3.14TB    3.04TB    3% online     15 node_B_1 raid_dp, normal
        node_B_1_aggr2   3.14TB    3.06TB    3% online     14 node_B_1 raid_tec, normal
        node_B_2_aggr1   3.14TB    2.99TB    5% online     37 node_B_2 raid_dp, normal
        node_B_2_aggr2   3.14TB    3.02TB    4% online     35 node_B_2 raid_tec, normal

        cluster_A Switched Over Aggregates:
        Aggregate          Size Available Used% State   #Vols Nodes    RAID Status
        ---------------- ------ --------- ----- ------- ----- -------- -----------------
        node_A_1_aggr1   2.36TB    2.12TB   10% online     91 node_B_1 raid_dp, normal
        node_A_1_aggr2   3.14TB    2.90TB    8% online     90 node_B_1 raid_tec, normal
        node_A_2_aggr1   2.36TB    2.10TB   11% online     91 node_B_2 raid_dp, normal
        node_A_2_aggr2   3.14TB    2.89TB    8% online     90 node_B_2 raid_tec, normal
        12 entries were displayed.

        cluster_B::>
  - Mirror the aggregate:

    storage aggregate mirror -aggregate aggregate-name

    The following output shows a typical mirroring operation.

        cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr1

        Info: Disks would be added to aggregate "node_B_1_aggr1" on node "node_B_1" in
              the following manner:

              Second Plex

                RAID Group rg0, 6 disks (block checksum, raid_dp)
                  Position   Disk                      Type       Size
                  ---------- ------------------------- ---------- ---------------
                  dparity    5.20.6                    SSD        -
                  parity     5.20.14                   SSD        -
                  data       5.21.1                    SSD        894.0GB
                  data       5.21.3                    SSD        894.0GB
                  data       5.22.3                    SSD        894.0GB
                  data       5.21.13                   SSD        894.0GB

              Aggregate capacity available for volume use would be 2.99TB.

        Do you want to continue? {y|n}: y
  - Repeat the previous step for each of the aggregates from the surviving site.
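If many aggregates need mirroring, generating the commands up front reduces copy-paste errors during the repeat step. This is an illustrative sketch only; the aggregate names in the example call are placeholders, and you should substitute the aggregates from your own `storage aggregate show` listing.

```python
# Hedged sketch: build one "storage aggregate mirror" command per aggregate
# that needs mirroring. The names passed in below are hypothetical examples,
# not a definitive list for any configuration.

def mirror_commands(aggregates):
    """Build the CLI command for each aggregate that needs mirroring."""
    return [f"storage aggregate mirror -aggregate {name}" for name in aggregates]

# Example: print the commands for two aggregates from the surviving site.
for cmd in mirror_commands(["node_B_1_aggr1", "node_B_1_aggr2"]):
    print(cmd)
```

Each printed command still prompts for confirmation when run interactively, so the disk layout shown in the `Info:` block can be reviewed before answering `y`.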
  - Wait for the aggregates to resynchronize; you can check the status with the storage aggregate show command.

    The following output shows that a number of aggregates are resynchronizing.

        cluster_B::> storage aggregate show

        cluster_B Aggregates:
        Aggregate          Size Available Used% State   #Vols Nodes    RAID Status
        ---------------- ------ --------- ----- ------- ----- -------- --------------------
        node_B_1_aggr0   1.49TB   74.12GB   95% online      1 node_B_1 raid4, mirrored, normal
        node_B_2_aggr0   1.49TB   74.12GB   95% online      1 node_B_2 raid4, mirrored, normal
        node_B_1_aggr1   2.86TB    2.76TB    4% online     15 node_B_1 raid_dp, resyncing
        node_B_1_aggr2   2.89TB    2.81TB    3% online     14 node_B_1 raid_tec, resyncing
        node_B_2_aggr1   2.73TB    2.58TB    6% online     37 node_B_2 raid_dp, resyncing
        node_B_2_aggr2   2.83TB    2.71TB    4% online     35 node_B_2 raid_tec, resyncing

        cluster_A Switched Over Aggregates:
        Aggregate          Size Available Used% State   #Vols Nodes    RAID Status
        ---------------- ------ --------- ----- ------- ----- -------- --------------------
        node_A_1_aggr1   1.86TB    1.62TB   13% online     91 node_B_1 raid_dp, resyncing
        node_A_1_aggr2   2.58TB    2.33TB   10% online     90 node_B_1 raid_tec, resyncing
        node_A_2_aggr1   1.79TB    1.53TB   14% online     91 node_B_2 raid_dp, resyncing
        node_A_2_aggr2   2.64TB    2.39TB    9% online     90 node_B_2 raid_tec, resyncing
        12 entries were displayed.
  - Confirm that all aggregates are online and have resynchronized:

    storage aggregate plex show

    The following output shows that all aggregates have resynchronized.

        cluster_A::> storage aggregate plex show
          ()
                              Is      Is         Resyncing
        Aggregate      Plex   Online  Resyncing    Percent Status
        -------------- ------ ------- ---------- --------- ---------------
        node_B_1_aggr0 plex0  true    false              - normal,active
        node_B_1_aggr0 plex8  true    false              - normal,active
        node_B_2_aggr0 plex0  true    false              - normal,active
        node_B_2_aggr0 plex8  true    false              - normal,active
        node_B_1_aggr1 plex0  true    false              - normal,active
        node_B_1_aggr1 plex9  true    false              - normal,active
        node_B_1_aggr2 plex0  true    false              - normal,active
        node_B_1_aggr2 plex5  true    false              - normal,active
        node_B_2_aggr1 plex0  true    false              - normal,active
        node_B_2_aggr1 plex9  true    false              - normal,active
        node_B_2_aggr2 plex0  true    false              - normal,active
        node_B_2_aggr2 plex5  true    false              - normal,active
        node_A_1_aggr1 plex4  true    false              - normal,active
        node_A_1_aggr1 plex8  true    false              - normal,active
        node_A_1_aggr2 plex1  true    false              - normal,active
        node_A_1_aggr2 plex5  true    false              - normal,active
        node_A_2_aggr1 plex4  true    false              - normal,active
        node_A_2_aggr1 plex8  true    false              - normal,active
        node_A_2_aggr2 plex1  true    false              - normal,active
        node_A_2_aggr2 plex5  true    false              - normal,active
        20 entries were displayed.
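With 20 or more plex rows, eyeballing the table for a stray "resyncing" entry is error-prone. The following is a minimal sketch, assuming you have captured the text output of `storage aggregate plex show` and that the columns appear in the order shown above (Aggregate, Plex, Is Online, Is Resyncing, ...); it reports any plex that is offline or still resynchronizing.

```python
# Hedged sketch: scan captured "storage aggregate plex show" output and
# collect any plex that is not yet online and fully resynchronized.
# Assumes the column order from the example output above.

def plexes_not_ready(plex_output: str):
    """Return (aggregate, plex) pairs that are offline or resyncing."""
    pending = []
    for line in plex_output.splitlines():
        fields = line.split()
        # Data rows have the aggregate name followed by a plex name
        # (e.g. "plex0"); header and separator rows do not match.
        if len(fields) >= 4 and fields[1].startswith("plex"):
            aggregate, plex, online, resyncing = fields[:4]
            if online != "true" or resyncing != "false":
                pending.append((aggregate, plex))
    return pending
```

An empty return value corresponds to the "all aggregates have resynchronized" condition this step is checking; any entries returned identify the plexes still worth watching.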
- On systems running ONTAP 9.5 or earlier, perform the root-aggregates healing phase:

  metrocluster heal -phase root-aggregates

      cluster_B::> metrocluster heal -phase root-aggregates
      [Job 651] Job is queued: MetroCluster Heal Root Aggregates Job.
      Oct 26 13:05:00 [Job 651] Job succeeded: Heal Root Aggregates is successful.
- Verify that the "heal roots" phase has completed and the disaster site is ready for switchback:

  metrocluster node show

  The following output shows that the "heal roots" phase has completed on cluster_A.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_A
                    node_A_1           configured     enabled   heal roots completed
                    node_A_2           configured     enabled   heal roots completed
            cluster_B
                    node_B_1           configured     enabled   waiting for switchback recovery
                    node_B_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.

      cluster_B::>
Proceed to verify the licenses on the replaced nodes.