Performing aggregate healing and restoring mirrors (MetroCluster IP configurations)

After replacing hardware and assigning disks, on systems running ONTAP 9.5 or earlier you must perform the MetroCluster healing operations. On all versions of ONTAP, you must then confirm that aggregates are mirrored and, if necessary, restart mirroring.

About this task

Starting with ONTAP 9.6, the healing operations are performed automatically when the disaster site nodes boot up. The healing commands are not required.

These steps are performed on the surviving cluster.

Steps
  1. If you are using ONTAP 9.6 or later, you must verify that automatic healing completed successfully:

    1. Confirm that the heal-aggr-auto and heal-root-aggr-auto operations completed:

      metrocluster operation history show

      The following output shows that both automatic healing operations completed successfully.

      cluster_B::*> metrocluster operation history show
      Operation                     State          Start Time       End Time
      ----------------------------- -------------- ---------------- ----------------
      heal-root-aggr-auto           successful     2/25/2019 06:45:58
                                                                    2/25/2019 06:46:02
      heal-aggr-auto                successful     2/25/2019 06:45:48
                                                                    2/25/2019 06:45:52
      .
      .
      .
    2. Confirm that the disaster site is ready for switchback:

      metrocluster node show

      The following output shows that cluster_A (the disaster site) is ready for switchback.

      cluster_B::*> metrocluster node show
      DR                          Configuration  DR
      Group Cluster Node          State          Mirroring Mode
      ----- ------- ------------- -------------- --------- --------------------
      1     cluster_A
                    node_A_1      configured     enabled   heal roots completed
                    node_A_2      configured     enabled   heal roots completed
            cluster_B
                    node_B_1      configured     enabled   waiting for switchback recovery
                    node_B_2      configured     enabled   waiting for switchback recovery
      4 entries were displayed.
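The two verification checks above lend themselves to scripting. The following POSIX shell sketch defines a helper (the function name `heal_auto_done` is ours, not an ONTAP command) that reads the text of `metrocluster operation history show` on stdin, for example captured over SSH from the surviving cluster, and reports whether both automatic heal operations finished:

```shell
# Sketch: confirm that both automatic heal operations completed (ONTAP 9.6+).
# 'heal_auto_done' is our own helper name, not an ONTAP command; pipe it the
# output of 'metrocluster operation history show'.
heal_auto_done() {
  out=$(cat)
  for op in heal-aggr-auto heal-root-aggr-auto; do
    # Each completed operation appears at the start of a line followed by
    # the word "successful".
    if ! printf '%s\n' "$out" | grep -Eq "^ *$op +successful"; then
      echo "pending: $op"
      return 1
    fi
  done
  echo "automatic healing complete"
}
```

If either operation is missing or not successful, the helper names the pending operation instead, which you can then investigate in the operation history.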
  2. If you are using ONTAP 9.5 or earlier, you must perform aggregate healing:

    1. Verify the state of the nodes:

      metrocluster node show

      The following output shows that switchover has completed, so healing can be performed.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_B
                    node_B_1           configured     enabled   switchover completed
                    node_B_2           configured     enabled   switchover completed
            cluster_A
                    node_A_1           configured     enabled   waiting for switchback recovery
                    node_A_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.
      
      cluster_B::>
    2. Perform the aggregates healing phase:

      metrocluster heal -phase aggregates

      The following output shows a typical aggregates healing operation.

      cluster_B::*> metrocluster heal -phase aggregates
      [Job 647] Job succeeded: Heal Aggregates is successful.
      
      cluster_B::*> metrocluster operation show
        Operation: heal-aggregates
            State: successful
       Start Time: 10/26/2017 12:01:15
         End Time: 10/26/2017 12:01:17
           Errors: -
      
      cluster_B::*>
    3. Verify that aggregate healing has completed and the disaster site is ready for switchback:

      metrocluster node show

      The following output shows that the "heal aggregates" phase has completed on cluster_A.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_A
                    node_A_1           configured     enabled   heal aggregates completed
                    node_A_2           configured     enabled   heal aggregates completed
            cluster_B
                    node_B_1           configured     enabled   waiting for switchback recovery
                    node_B_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.
      
      cluster_B::>
  3. If disks have been replaced, you must mirror the local and switched-over aggregates:

    1. Display the aggregates:

      storage aggregate show

      cluster_B::> storage aggregate show
      cluster_B Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_B_1_aggr0 1.49TB  74.12GB   95% online     1 node_B_1         raid4,
                                                                         normal
      node_B_2_aggr0 1.49TB  74.12GB   95% online     1 node_B_2         raid4,
                                                                         normal
      node_B_1_aggr1 3.14TB  3.04TB    3% online     15 node_B_1         raid_dp,
                                                                         normal
      node_B_1_aggr2 3.14TB  3.06TB    3% online     14 node_B_1         raid_tec,
                                                                         normal
      node_B_2_aggr1 3.14TB  2.99TB    5% online     37 node_B_2         raid_dp,
                                                                         normal
      node_B_2_aggr2 3.14TB  3.02TB    4% online     35 node_B_2         raid_tec,
                                                                         normal
      
      cluster_A Switched Over Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_A_1_aggr1 2.36TB  2.12TB   10% online     91 node_B_1         raid_dp,
                                                                         normal
      node_A_1_aggr2 3.14TB  2.90TB    8% online     90 node_B_1         raid_tec,
                                                                         normal
      node_A_2_aggr1 2.36TB  2.10TB   11% online     91 node_B_2         raid_dp,
                                                                         normal
      node_A_2_aggr2 3.14TB  2.89TB    8% online     90 node_B_2         raid_tec,
                                                                         normal
      12 entries were displayed.
      
      cluster_B::>
    2. Mirror the aggregate:

      storage aggregate mirror -aggregate aggregate-name

      The following output shows a typical mirroring operation.

      cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr1
      
      Info: Disks would be added to aggregate "node_B_1_aggr1" on node "node_B_1" in
            the following manner:
      
            Second Plex
      
              RAID Group rg0, 6 disks (block checksum, raid_dp)
                Position   Disk                      Type                  Size
                ---------- ------------------------- ---------- ---------------
                dparity    5.20.6                    SSD                      -
                parity     5.20.14                   SSD                      -
                data       5.21.1                    SSD                894.0GB
                data       5.21.3                    SSD                894.0GB
                data       5.22.3                    SSD                894.0GB
                data       5.21.13                   SSD                894.0GB
      
            Aggregate capacity available for volume use would be 2.99TB.
      
      Do you want to continue? {y|n}: y
    3. Repeat the previous step for each of the aggregates from the surviving site.
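      When several aggregates need mirroring, the repetition can be scripted. The sketch below is a dry run only: it echoes each ONTAP command rather than executing it, and the aggregate names are assumptions taken from the example output above.

```shell
# Sketch: mirror every aggregate at the surviving site in one pass.
# Dry run only -- the loop echoes each ONTAP command instead of running it;
# the aggregate names are assumptions from the example output above.
AGGREGATES="node_B_1_aggr1 node_B_1_aggr2 node_B_2_aggr1 node_B_2_aggr2"

for aggr in $AGGREGATES; do
  echo "storage aggregate mirror -aggregate $aggr"
done
```

      In practice you would run each `storage aggregate mirror` command in the ONTAP CLI and confirm the disk layout at each `Do you want to continue?` prompt.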

    4. Wait for the aggregates to resynchronize; you can check the status with the storage aggregate show command.

      The following output shows that a number of aggregates are resynchronizing.

      cluster_B::> storage aggregate show
      
      cluster_B Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_B_1_aggr0 1.49TB  74.12GB   95% online     1 node_B_1         raid4,
                                                                         mirrored,
                                                                         normal
      node_B_2_aggr0 1.49TB  74.12GB   95% online     1 node_B_2         raid4,
                                                                         mirrored,
                                                                         normal
      node_B_1_aggr1 2.86TB  2.76TB    4% online     15 node_B_1         raid_dp,
                                                                         resyncing
      node_B_1_aggr2 2.89TB  2.81TB    3% online     14 node_B_1         raid_tec,
                                                                         resyncing
      node_B_2_aggr1 2.73TB  2.58TB    6% online     37 node_B_2         raid_dp,
                                                                         resyncing
      node_B_2_aggr2 2.83TB  2.71TB    4% online     35 node_B_2         raid_tec,
                                                                         resyncing
      
      cluster_A Switched Over Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_A_1_aggr1 1.86TB  1.62TB   13% online     91 node_B_1         raid_dp,
                                                                         resyncing
      node_A_1_aggr2 2.58TB  2.33TB   10% online     90 node_B_1         raid_tec,
                                                                         resyncing
      node_A_2_aggr1 1.79TB  1.53TB   14% online     91 node_B_2         raid_dp,
                                                                         resyncing
      node_A_2_aggr2 2.64TB  2.39TB    9% online     90 node_B_2         raid_tec,
                                                                         resyncing
      12 entries were displayed.
    5. Confirm that all aggregates are online and have resynchronized:

      storage aggregate plex show

      The following output shows that all aggregates have resynchronized.

      cluster_B::> storage aggregate plex show
                          Is      Is         Resyncing
      Aggregate Plex      Online  Resyncing    Percent Status
      --------- --------- ------- ---------- --------- ---------------
      node_B_1_aggr0 plex0 true    false              - normal,active
      node_B_1_aggr0 plex8 true    false              - normal,active
      node_B_2_aggr0 plex0 true    false              - normal,active
      node_B_2_aggr0 plex8 true    false              - normal,active
      node_B_1_aggr1 plex0 true    false              - normal,active
      node_B_1_aggr1 plex9 true    false              - normal,active
      node_B_1_aggr2 plex0 true    false              - normal,active
      node_B_1_aggr2 plex5 true    false              - normal,active
      node_B_2_aggr1 plex0 true    false              - normal,active
      node_B_2_aggr1 plex9 true    false              - normal,active
      node_B_2_aggr2 plex0 true    false              - normal,active
      node_B_2_aggr2 plex5 true    false              - normal,active
      node_A_1_aggr1 plex4 true    false              - normal,active
      node_A_1_aggr1 plex8 true    false              - normal,active
      node_A_1_aggr2 plex1 true    false              - normal,active
      node_A_1_aggr2 plex5 true    false              - normal,active
      node_A_2_aggr1 plex4 true    false              - normal,active
      node_A_2_aggr1 plex8 true    false              - normal,active
      node_A_2_aggr2 plex1 true    false              - normal,active
      node_A_2_aggr2 plex5 true    false              - normal,active
      20 entries were displayed.
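      If you prefer to screen the plex output programmatically, the shell sketch below defines a helper (the name `resyncing_plexes` is ours, not an ONTAP command) that reads `storage aggregate plex show` output on stdin and prints any plex still resynchronizing; empty output means resynchronization is complete.

```shell
# Sketch: list plexes still resynchronizing. 'resyncing_plexes' is our own
# helper, not an ONTAP command; pipe it the output of
# 'storage aggregate plex show'. Column 4 is the 'Is Resyncing' field.
resyncing_plexes() {
  awk '$4 == "true" { print $1 "/" $2 }'
}
```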
  4. On systems running ONTAP 9.5 or earlier, perform the root-aggregates healing phase:

    metrocluster heal -phase root-aggregates

    cluster_B::> metrocluster heal -phase root-aggregates
    [Job 651] Job is queued: MetroCluster Heal Root Aggregates Job.
    Oct 26 13:05:00 [Job 651] Job succeeded: Heal Root Aggregates is successful.
  5. Verify that the "heal roots" phase has completed and the disaster site is ready for switchback:

    metrocluster node show

    The following output shows that the "heal roots" phase has completed on cluster_A.

    cluster_B::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster_A
                  node_A_1           configured     enabled   heal roots completed
                  node_A_2           configured     enabled   heal roots completed
          cluster_B
                  node_B_1           configured     enabled   waiting for switchback recovery
                  node_B_2           configured     enabled   waiting for switchback recovery
    4 entries were displayed.
    
    cluster_B::>
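    This final readiness check can also be automated. The sketch below (the helper name `ready_for_switchback` is ours, and the expectation of two nodes per site is an assumption for this four-node example) inspects `metrocluster node show` output piped to it:

```shell
# Sketch: check switchback readiness from 'metrocluster node show' output
# on stdin. 'ready_for_switchback' is our own helper; the expected count of
# two nodes per site is an assumption for this four-node example.
ready_for_switchback() {
  out=$(cat)
  healed=$(printf '%s\n' "$out" | grep -c 'heal roots completed')
  waiting=$(printf '%s\n' "$out" | grep -c 'waiting for switchback recovery')
  if [ "$healed" -eq 2 ] && [ "$waiting" -eq 2 ]; then
    echo "ready for switchback"
  else
    echo "not ready (healed=$healed, waiting=$waiting)"
  fi
}
```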

Proceed to verify the licenses on the replaced nodes.