Performing aggregate healing and restoring mirrors (MetroCluster IP configurations)

After replacing hardware and assigning disks, you can perform the MetroCluster healing operations on systems running ONTAP 9.5 or earlier. In all versions of ONTAP, you must then confirm that the aggregates are mirrored and, if necessary, restart mirroring.

About this task

Beginning with ONTAP 9.6, the healing operations are performed automatically when the disaster site nodes boot up. The healing commands are not required.

These steps are performed on the surviving cluster.
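
If you need to confirm which cluster a session is connected to before you begin, the prompt in the examples (cluster_B::>) identifies the local cluster; as a minimal check, assuming the standard cluster identity show command, you can also display the local cluster name directly:

  cluster_B::> cluster identity show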

Steps
  1. If you are using ONTAP 9.6 or later, you must verify that automatic healing completed successfully:

    1. Confirm that the heal-aggr-auto and heal-root-aggr-auto operations completed:

      metrocluster operation history show

      The following output shows that the operations have completed successfully on cluster_A.

      cluster_B::*> metrocluster operation history show
      Operation                     State          Start Time       End Time
      ----------------------------- -------------- ---------------- ----------------
      heal-root-aggr-auto           successful      2/25/2019 06:45:58
                                                                    2/25/2019 06:46:02
      heal-aggr-auto                successful     2/25/2019 06:45:48
                                                                    2/25/2019 06:45:52
      .
      .
      .
    2. Confirm that the disaster site is ready for switchback:

      metrocluster node show

      The following output shows that healing has completed on cluster_A and the configuration is ready for switchback.

      cluster_B::*> metrocluster node show
      DR                          Configuration  DR
      Group Cluster Node          State          Mirroring Mode
      ----- ------- ------------- -------------- --------- --------------------
      1     cluster_A
                    node_A_1      configured     enabled   heal roots completed
                    node_A_2      configured     enabled   heal roots completed
            cluster_B
                    node_B_1      configured     enabled   waiting for switchback recovery
                    node_B_2      configured     enabled   waiting for switchback recovery
      4 entries were displayed.
  2. If you are using ONTAP 9.5 or earlier, you must perform aggregate healing:

    1. Verify the state of the nodes:

      metrocluster node show

      The following output shows that switchover has completed, so healing can be performed.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_B
                    node_B_1           configured     enabled   switchover completed
                    node_B_2           configured     enabled   switchover completed
            cluster_A
                    node_A_1           configured     enabled   waiting for switchback recovery
                    node_A_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.
      
      cluster_B::>
    2. Perform the aggregates healing phase:

      metrocluster heal -phase aggregates

      The following output shows a typical aggregates healing operation.

      cluster_B::*> metrocluster heal -phase aggregates
      [Job 647] Job succeeded: Heal Aggregates is successful.
      
      cluster_B::*> metrocluster operation show
        Operation: heal-aggregates
            State: successful
       Start Time: 10/26/2017 12:01:15
         End Time: 10/26/2017 12:01:17
           Errors: -
      
      cluster_B::*>
    3. Verify that aggregate healing has completed and the disaster site is ready for switchback:

      metrocluster node show

      The following output shows that the "heal aggregates" phase has completed on cluster_A.

      cluster_B::> metrocluster node show
      DR                               Configuration  DR
      Group Cluster Node               State          Mirroring Mode
      ----- ------- ------------------ -------------- --------- --------------------
      1     cluster_A
                    node_A_1           configured     enabled   heal aggregates completed
                    node_A_2           configured     enabled   heal aggregates completed
            cluster_B
                    node_B_1           configured     enabled   waiting for switchback recovery
                    node_B_2           configured     enabled   waiting for switchback recovery
      4 entries were displayed.
      
      cluster_B::>
  3. If disks have been replaced, you must mirror the local and switched-over aggregates:

    1. Display the aggregates:

      storage aggregate show

      cluster_B::> storage aggregate show
      cluster_B Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_B_1_aggr0 1.49TB  74.12GB   95% online     1 node_B_1         raid4,
                                                                         normal
      node_B_2_aggr0 1.49TB  74.12GB   95% online     1 node_B_2         raid4,
                                                                         normal
      node_B_1_aggr1 3.14TB  3.04TB    3% online     15 node_B_1         raid_dp,
                                                                         normal
      node_B_1_aggr2 3.14TB  3.06TB    3% online     14 node_B_1         raid_tec,
                                                                         normal
      node_B_2_aggr1 3.14TB  2.99TB    5% online     37 node_B_2         raid_dp,
                                                                         normal
      node_B_2_aggr2 3.14TB  3.02TB    4% online     35 node_B_2         raid_tec,
                                                                         normal
      
      cluster_A Switched Over Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_A_1_aggr1 2.36TB  2.12TB   10% online     91 node_B_1         raid_dp,
                                                                         normal
      node_A_1_aggr2 3.14TB  2.90TB    8% online     90 node_B_1         raid_tec,
                                                                         normal
      node_A_2_aggr1 2.36TB  2.10TB   11% online     91 node_B_2         raid_dp,
                                                                         normal
      node_A_2_aggr2 3.14TB  2.89TB    8% online     90 node_B_2         raid_tec,
                                                                         normal
      12 entries were displayed.
      
      cluster_B::>
    2. Mirror the aggregate:

      storage aggregate mirror -aggregate aggregate-name

      The following output shows a typical mirroring operation.

      cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr1
      
      Info: Disks would be added to aggregate "node_B_1_aggr1" on node "node_B_1" in
            the following manner:
      
            Second Plex
      
              RAID Group rg0, 6 disks (block checksum, raid_dp)
                Position   Disk                      Type                  Size
                ---------- ------------------------- ---------- ---------------
                dparity    5.20.6                    SSD                      -
                parity     5.20.14                   SSD                      -
                data       5.21.1                    SSD                894.0GB
                data       5.21.3                    SSD                894.0GB
                data       5.22.3                    SSD                894.0GB
                data       5.21.13                   SSD                894.0GB
      
            Aggregate capacity available for volume use would be 2.99TB.
      
      Do you want to continue? {y|n}: y
    3. Repeat the previous step for each of the aggregates from the surviving site.
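
      For example, using the aggregate names from the storage aggregate show listing above, each remaining unmirrored aggregate is mirrored in turn (the switched-over node_A aggregates are mirrored with the same command); a minimal sketch:

      cluster_B::> storage aggregate mirror -aggregate node_B_1_aggr2
      cluster_B::> storage aggregate mirror -aggregate node_B_2_aggr1
      cluster_B::> storage aggregate mirror -aggregate node_B_2_aggr2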

    4. Wait for the aggregates to resynchronize; you can check the status with the storage aggregate show command.

      The following output shows that a number of aggregates are resynchronizing.

      cluster_B::> storage aggregate show
      
      cluster_B Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_B_1_aggr0 1.49TB  74.12GB   95% online     1 node_B_1         raid4,
                                                                         mirrored,
                                                                         normal
      node_B_2_aggr0 1.49TB  74.12GB   95% online     1 node_B_2         raid4,
                                                                         mirrored,
                                                                         normal
      node_B_1_aggr1 2.86TB  2.76TB    4% online     15 node_B_1         raid_dp,
                                                                         resyncing
      node_B_1_aggr2 2.89TB  2.81TB    3% online     14 node_B_1         raid_tec,
                                                                         resyncing
      node_B_2_aggr1 2.73TB  2.58TB    6% online     37 node_B_2         raid_dp,
                                                                         resyncing
      node_B_2_aggr2 2.83TB  2.71TB    4% online     35 node_B_2         raid_tec,
                                                                         resyncing
      
      cluster_A Switched Over Aggregates:
      Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
      --------- -------- --------- ----- ------- ------ ---------------- ------------
      node_A_1_aggr1 1.86TB  1.62TB   13% online     91 node_B_1         raid_dp,
                                                                         resyncing
      node_A_1_aggr2 2.58TB  2.33TB   10% online     90 node_B_1         raid_tec,
                                                                         resyncing
      node_A_2_aggr1 1.79TB  1.53TB   14% online     91 node_B_2         raid_dp,
                                                                         resyncing
      node_A_2_aggr2 2.64TB  2.39TB    9% online     90 node_B_2         raid_tec,
                                                                         resyncing
      12 entries were displayed.
    5. Confirm that all aggregates are online and have resynchronized:

      storage aggregate plex show

      The following output shows that all aggregates have resynchronized.

      cluster_A::> storage aggregate plex show
                          Is      Is         Resyncing
      Aggregate Plex      Online  Resyncing    Percent Status
      --------- --------- ------- ---------- --------- ---------------
      node_B_1_aggr0 plex0 true    false              - normal,active
      node_B_1_aggr0 plex8 true    false              - normal,active
      node_B_2_aggr0 plex0 true    false              - normal,active
      node_B_2_aggr0 plex8 true    false              - normal,active
      node_B_1_aggr1 plex0 true    false              - normal,active
      node_B_1_aggr1 plex9 true    false              - normal,active
      node_B_1_aggr2 plex0 true    false              - normal,active
      node_B_1_aggr2 plex5 true    false              - normal,active
      node_B_2_aggr1 plex0 true    false              - normal,active
      node_B_2_aggr1 plex9 true    false              - normal,active
      node_B_2_aggr2 plex0 true    false              - normal,active
      node_B_2_aggr2 plex5 true    false              - normal,active
      node_A_1_aggr1 plex4 true    false              - normal,active
      node_A_1_aggr1 plex8 true    false              - normal,active
      node_A_1_aggr2 plex1 true    false              - normal,active
      node_A_1_aggr2 plex5 true    false              - normal,active
      node_A_2_aggr1 plex4 true    false              - normal,active
      node_A_2_aggr1 plex8 true    false              - normal,active
      node_A_2_aggr2 plex1 true    false              - normal,active
      node_A_2_aggr2 plex5 true    false              - normal,active
      20 entries were displayed.
  4. If you are using ONTAP 9.5 or earlier, perform the root-aggregates healing phase:

    metrocluster heal -phase root-aggregates

    cluster_B::> metrocluster heal -phase root-aggregates
    [Job 651] Job is queued: MetroCluster Heal Root Aggregates Job.Oct 26 13:05:00
    [Job 651] Job succeeded: Heal Root Aggregates is successful.
  5. Verify that the "heal roots" phase has completed and the disaster site is ready for switchback:

    metrocluster node show

    The following output shows that the "heal roots" phase has completed on cluster_A.

    cluster_B::> metrocluster node show
    DR                               Configuration  DR
    Group Cluster Node               State          Mirroring Mode
    ----- ------- ------------------ -------------- --------- --------------------
    1     cluster_A
                  node_A_1           configured     enabled   heal roots completed
                  node_A_2           configured     enabled   heal roots completed
          cluster_B
                  node_B_1           configured     enabled   waiting for switchback recovery
                  node_B_2           configured     enabled   waiting for switchback recovery
    4 entries were displayed.
    
    cluster_B::>

Proceed to verify the licenses on the replaced nodes.
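
As a quick pointer for that next task, license entitlements can be reviewed with the system license show command (a sketch only; follow the license verification procedure for the complete steps):

  system license show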