简体中文版经机器翻译而成，仅供参考。如与英语版出现任何冲突，应以英语版为准。

测试 MetroCluster 配置

04/13/2022 贡献者

PDF

您可以测试故障情形，以确认 MetroCluster 配置是否正常运行。

验证协商的切换

您可以测试协商（计划内）切换操作，以确认数据不会中断。

此测试通过将集群切换到第二个数据中心来验证数据可用性不受影响（ Microsoft 服务器消息块（ SMB ）和 Solaris 光纤通道协议除外）。

此测试需要大约 30 分钟。

此操作步骤具有以下预期结果：

MetroCluster switchover 命令将显示警告提示符。

如果对提示符回答 ` * 是 *` ，则发出命令的站点将切换配对站点。

对于 MetroCluster IP 配置：

对于 ONTAP 9.4 及更早版本：
- 在协商切换后，镜像聚合将降级。
对于 ONTAP 9.5 及更高版本：
- 如果可以访问远程存储，则镜像聚合将保持正常状态。
- 如果无法访问远程存储，则在协商切换后，镜像聚合将降级。
对于 ONTAP 9.8 及更高版本：
- 如果无法访问远程存储，则位于灾难站点的未镜像聚合将不可用。这可能会导致控制器中断。

步骤

确认所有节点均处于已配置状态和正常模式：

MetroCluster node show

cluster_A::>  metrocluster node show

Cluster                        Configuration State    Mode
------------------------------ ---------------------- ------------------------
 Local: cluster_A               configured             normal
Remote: cluster_B               configured             normal

开始切换操作：

MetroCluster switchover

cluster_A::> metrocluster switchover
Warning: negotiated switchover is about to start. It will stop all the data Vservers on cluster "cluster_B" and
automatically re-start them on cluster "cluster_A". It will finally gracefully shutdown cluster "cluster_B".

确认本地集群处于已配置状态和切换模式：

MetroCluster node show

cluster_A::>  metrocluster node show

Cluster                        Configuration State    Mode
------------------------------ ---------------------- ------------------------
Local: cluster_A                configured             switchover
Remote: cluster_B               not-reachable          -
              configured             normal

确认切换操作已成功：

MetroCluster 操作显示

cluster_A::>  metrocluster operation show

cluster_A::> metrocluster operation show
  Operation: switchover
      State: successful
 Start Time: 2/6/2016 13:28:50
   End Time: 2/6/2016 13:29:41
     Errors: -

使用 vserver show 和 network interface show 命令验证灾难恢复 SVM 和 LIF 是否已联机。

验证修复和手动切回

您可以通过在协商切换后将集群切回原始数据中心来测试修复和手动切回操作，以验证数据可用性不受影响（ SMB 和 Solaris FC 配置除外）。

此测试需要大约 30 分钟。

此操作步骤的预期结果是，应将服务切换回其主节点。

步骤

验证修复是否已完成：

MetroCluster node show

以下示例显示了命令的成功完成：

cluster_A::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1         configured     enabled   heal roots completed
      cluster_B
              node_B_2         unreachable    -         switched over
42 entries were displayed.metrocluster operation show

m所有聚合均已 ：

s存储聚合显示

以下示例显示所有聚合的 RAID 状态均为已镜像：

cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate Size     Available Used% State   #Vols  Nodes       RAID Status
--------- -------- --------- ----- ------- ------ ----------- ------------
data_cluster
            4.19TB    4.13TB    2% online       8 node_A_1    raid_dp,
                                                              mirrored,
                                                              normal
root_cluster
           715.5GB   212.7GB   70% online       1 node_A_1    raid4,
                                                              mirrored,
                                                              normal
cluster_B Switched Over Aggregates:
Aggregate Size     Available Used% State   #Vols  Nodes       RAID Status
--------- -------- --------- ----- ------- ------ ----------- ------------
data_cluster_B
            4.19TB    4.11TB    2% online       5 node_A_1    raid_dp,
                                                              mirrored,
                                                              normal
root_cluster_B    -         -     - unknown      - node_A_1   -

从灾难站点启动节点。

检查切回恢复的状态：

MetroCluster node show

cluster_A::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
             node_A_1            configured     enabled   heal roots completed
      cluster_B
             node_B_2            configured     enabled   waiting for switchback
                                                          recovery
2 entries were displayed.

执行切回：

MetroCluster 切回

cluster_A::> metrocluster switchback
[Job 938] Job succeeded: Switchback is successful.Verify switchback

确认节点的状态：

MetroCluster node show

cluster_A::> metrocluster node show
DR                               Configuration  DR
Group Cluster Node               State          Mirroring Mode
----- ------- ------------------ -------------- --------- --------------------
1     cluster_A
              node_A_1         configured     enabled   normal
      cluster_B
              node_B_2         configured     enabled   normal

2 entries were displayed.

确认状态：

MetroCluster 操作显示

输出应显示成功状态。

cluster_A::> metrocluster operation show
  Operation: switchback
      State: successful
 Start Time: 2/6/2016 13:54:25
   End Time: 2/6/2016 13:56:15
     Errors: -

丢失一个 FC-SAS 网桥

您可以测试单个 FC-SAS 网桥的故障，以确保不存在单点故障。

此测试需要大约 15 分钟。

此操作步骤具有以下预期结果：

关闭网桥时，应生成错误。
不应发生故障转移或服务丢失。
只能通过一条路径从控制器模块连接到网桥后面的驱动器。

从 ONTAP 9.8 开始， storage bridge 命令将替换为 ssystem bridge 。以下步骤显示了 storage bridge 命令，但如果您运行的是 ONTAP 9.8 或更高版本，则首选使用 ssystem bridge 命令。

步骤

关闭网桥的电源。

确认网桥监控指示出现错误：

storage bridge show

cluster_A::> storage bridge show

                                                            Is        Monitor
Bridge     Symbolic Name Vendor  Model     Bridge WWN       Monitored Status
---------- ------------- ------- --------- ---------------- --------- -------
ATTO_10.65.57.145
	     bridge_A_1    Atto    FibreBridge 6500N
                                           200000108662d46c true      error

确认网桥后面的驱动器可通过一条路径使用：

s存储磁盘错误显示

cluster_A::> storage disk error show
Disk             Error Type        Error Text
---------------- ----------------- --------------------------------------------
1.0.0            onedomain         1.0.0 (5000cca057729118): All paths to this array LUN are connected to the same fault domain. This is a single point of failure.
1.0.1            onedomain         1.0.1 (5000cca057727364): All paths to this array LUN are connected to the same fault domain. This is a single point of failure.
1.0.2            onedomain         1.0.2 (5000cca05772e9d4): All paths to this array LUN are connected to the same fault domain. This is a single point of failure.
...
1.0.23           onedomain         1.0.23 (5000cca05772e9d4): All paths to this array LUN are connected to the same fault domain. This is a single point of failure.

在电源线中断后验证操作

您可以测试 MetroCluster 配置对 PDU 故障的响应。

最佳做法是，将组件中的每个电源设备（ PSU ）连接到单独的电源。如果两个 PSU 都连接到同一个配电单元（ PDU ），并且发生电气中断，则站点可能会关闭，并且整个磁盘架可能不可用。测试一条电源线故障，以确认没有布线不匹配，从而发生原因可能导致服务中断。

此测试需要大约 15 分钟。

此测试需要关闭所有左侧 PDU 的电源，然后关闭包含 MetroCluster 组件的所有机架上的所有右侧 PDU 的电源。

此操作步骤具有以下预期结果：

当 PDU 断开连接时，应生成错误。
不应发生故障转移或服务丢失。

步骤

关闭包含 MetroCluster 组件的机架左侧 PDU 的电源。

使用 ssystem environment sensors show -state fault 和 storage shelf show -errors 命令在控制台上监控结果。

cluster_A::> system environment sensors show -state fault

Node Sensor 			State Value/Units Crit-Low Warn-Low Warn-Hi Crit-Hi
---- --------------------- ------ ----------- -------- -------- ------- -------
node_A_1
		PSU1 			fault
							PSU_OFF
		PSU1 Pwr In OK 	fault
							FAULT
node_A_2
		PSU1 			fault
							PSU_OFF
		PSU1 Pwr In OK 	fault
							FAULT
4 entries were displayed.

cluster_A::> storage shelf show -errors
    Shelf Name: 1.1
     Shelf UID: 50:0a:09:80:03:6c:44:d5
 Serial Number: SHFHU1443000059

Error Type          Description
------------------  ---------------------------
Power               Critical condition is detected in storage shelf power supply unit "1". The unit might fail.Reconnect PSU1

重新打开左侧 PDU 的电源。
确保 ONTAP 清除错误情况。
对右侧 PDU 重复上述步骤。

在丢失一个存储架后验证操作

您可以测试单个存储架的故障，以验证是否没有单点故障。

此操作步骤具有以下预期结果：

监控软件应报告错误消息。
不应发生故障转移或服务丢失。
硬件故障恢复后，镜像重新同步将自动启动。

步骤

检查存储故障转移状态：

s存储故障转移显示

cluster_A::> storage failover show

Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
node_A_1       node_A_2       true     Connected to node_A_2
node_A_2       node_A_1       true     Connected to node_A_1
2 entries were displayed.

检查聚合状态：

s存储聚合显示

cluster_A::> storage aggregate show

cluster Aggregates:
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1data01_mirrored
            4.15TB    3.40TB   18% online       3 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_1root
           707.7GB   34.29GB   95% online       1 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_2_data01_mirrored
            4.15TB    4.12TB    1% online       2 node_A_2       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_2_data02_unmirrored
            2.18TB    2.18TB    0% online       1 node_A_2       raid_dp,
                                                                   normal
node_A_2_root
           707.7GB   34.27GB   95% online       1 node_A_2       raid_dp,
                                                                   mirrored,
                                                                   normal

验证所有数据 SVM 和数据卷是否均已联机并提供数据：

vserver show -type data

network interface show -fields is-home false

volume show ！ vol0 ，！ mdv*

cluster_A::> vserver show -type data

cluster_A::> vserver show -type data
                               Admin      Operational Root
Vserver     Type    Subtype    State      State       Volume     Aggregate
----------- ------- ---------- ---------- ----------- ---------- ----------
SVM1        data    sync-source           running     SVM1_root  node_A_1_data01_mirrored
SVM2        data    sync-source	          running     SVM2_root  node_A_2_data01_mirrored

cluster_A::> network interface show -fields is-home false
There are no entries matching your query.

cluster_A::> volume show !vol0,!MDV*
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1
          SVM1_root
                       node_A_1data01_mirrored
                                    online     RW         10GB     9.50GB    5%
SVM1
          SVM1_data_vol
                       node_A_1data01_mirrored
                                    online     RW         10GB     9.49GB    5%
SVM2
          SVM2_root
                       node_A_2_data01_mirrored
                                    online     RW         10GB     9.49GB    5%
SVM2
          SVM2_data_vol
                       node_A_2_data02_unmirrored
                                    online     RW          1GB    972.6MB    5%

确定池 1 中用于节点 node_A_2 的磁盘架以关闭电源以模拟突然发生的硬件故障：

storage aggregate show -r -node node-name ！ * root

您选择的磁盘架必须包含镜像数据聚合中的驱动器。

在以下示例中，选择磁盘架 ID 31 失败。

cluster_A::> storage aggregate show -r -node node_A_2 !*root
Owner Node: node_A_2
 Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirrored) (block checksums)
  Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
   RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  2.30.3                       0   BSAS    7200  827.7GB  828.0GB (normal)
     parity   2.30.4                       0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.6                       0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.8                       0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.5                       0   BSAS    7200  827.7GB  828.0GB (normal)

  Plex: /node_A_2_data01_mirrored/plex4 (online, normal, active, pool1)
   RAID Group /node_A_2_data01_mirrored/plex4/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  1.31.7                       1   BSAS    7200  827.7GB  828.0GB (normal)
     parity   1.31.6                       1   BSAS    7200  827.7GB  828.0GB (normal)
     data     1.31.3                       1   BSAS    7200  827.7GB  828.0GB (normal)
     data     1.31.4                       1   BSAS    7200  827.7GB  828.0GB (normal)
     data     1.31.5                       1   BSAS    7200  827.7GB  828.0GB (normal)

 Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
  Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
   RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  2.30.12                      0   BSAS    7200  827.7GB  828.0GB (normal)
     parity   2.30.22                      0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.21                      0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.20                      0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.14                      0   BSAS    7200  827.7GB  828.0GB (normal)
15 entries were displayed.

物理关闭选定磁盘架的电源。

再次检查聚合状态：

s存储聚合

storage aggregate show -r -node node_A_2 ！ * root

驱动器位于已关闭电源架上的聚合应具有 degraded RAID 状态，而受影响丛上的驱动器应具有 "`Failed` " 状态，如以下示例所示：

cluster_A::> storage aggregate show
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1data01_mirrored
            4.15TB    3.40TB   18% online       3 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_1root
           707.7GB   34.29GB   95% online       1 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_2_data01_mirrored
            4.15TB    4.12TB    1% online       2 node_A_2       raid_dp,
                                                                   mirror
                                                                   degraded
node_A_2_data02_unmirrored
            2.18TB    2.18TB    0% online       1 node_A_2       raid_dp,
                                                                   normal
node_A_2_root
           707.7GB   34.27GB   95% online       1 node_A_2       raid_dp,
                                                                   mirror
                                                                   degraded
cluster_A::> storage aggregate show -r -node node_A_2 !*root
Owner Node: node_A_2
 Aggregate: node_A_2_data01_mirrored (online, raid_dp, mirror degraded) (block checksums)
  Plex: /node_A_2_data01_mirrored/plex0 (online, normal, active, pool0)
   RAID Group /node_A_2_data01_mirrored/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  2.30.3                       0   BSAS    7200  827.7GB  828.0GB (normal)
     parity   2.30.4                       0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.6                       0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.8                       0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.5                       0   BSAS    7200  827.7GB  828.0GB (normal)

  Plex: /node_A_2_data01_mirrored/plex4 (offline, failed, inactive, pool1)
   RAID Group /node_A_2_data01_mirrored/plex4/rg0 (partial, none checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  FAILED                       -   -          -  827.7GB        - (failed)
     parity   FAILED                       -   -          -  827.7GB        - (failed)
     data     FAILED                       -   -          -  827.7GB        - (failed)
     data     FAILED                       -   -          -  827.7GB        - (failed)
     data     FAILED                       -   -          -  827.7GB        - (failed)

 Aggregate: node_A_2_data02_unmirrored (online, raid_dp) (block checksums)
  Plex: /node_A_2_data02_unmirrored/plex0 (online, normal, active, pool0)
   RAID Group /node_A_2_data02_unmirrored/plex0/rg0 (normal, block checksums)
                                                              Usable Physical
     Position Disk                        Pool Type     RPM     Size     Size Status
     -------- --------------------------- ---- ----- ------ -------- -------- ----------
     dparity  2.30.12                      0   BSAS    7200  827.7GB  828.0GB (normal)
     parity   2.30.22                      0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.21                      0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.20                      0   BSAS    7200  827.7GB  828.0GB (normal)
     data     2.30.14                      0   BSAS    7200  827.7GB  828.0GB (normal)
15 entries were displayed.

验证是否正在提供数据，以及所有卷是否仍处于联机状态：

vserver show -type data

network interface show -fields is-home false

volume show ！ vol0 ，！ mdv*

cluster_A::> vserver show -type data

cluster_A::> vserver show -type data
                               Admin      Operational Root
Vserver     Type    Subtype    State      State       Volume     Aggregate
----------- ------- ---------- ---------- ----------- ---------- ----------
SVM1        data    sync-source           running     SVM1_root  node_A_1_data01_mirrored
SVM2        data    sync-source	          running     SVM2_root  node_A_1_data01_mirrored

cluster_A::> network interface show -fields is-home false
There are no entries matching your query.

cluster_A::> volume show !vol0,!MDV*
Vserver   Volume       Aggregate    State      Type       Size  Available Used%
--------- ------------ ------------ ---------- ---- ---------- ---------- -----
SVM1
          SVM1_root
                       node_A_1data01_mirrored
                                    online     RW         10GB     9.50GB    5%
SVM1
          SVM1_data_vol
                       node_A_1data01_mirrored
                                    online     RW         10GB     9.49GB    5%
SVM2
          SVM2_root
                       node_A_1data01_mirrored
                                    online     RW         10GB     9.49GB    5%
SVM2
          SVM2_data_vol
                       node_A_2_data02_unmirrored
                                    online     RW          1GB    972.6MB    5%

物理启动磁盘架。

重新同步将自动启动。

验证重新同步是否已启动：

s存储聚合显示

受影响的聚合应具有 " re同步 " RAID 状态，如以下示例所示：

cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1_data01_mirrored
            4.15TB    3.40TB   18% online       3 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_1_root
           707.7GB   34.29GB   95% online       1 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_2_data01_mirrored
            4.15TB    4.12TB    1% online       2 node_A_2       raid_dp,
                                                                   resyncing
node_A_2_data02_unmirrored
            2.18TB    2.18TB    0% online       1 node_A_2       raid_dp,
                                                                   normal
node_A_2_root
           707.7GB   34.27GB   95% online       1 node_A_2       raid_dp,
                                                                   resyncing

监控聚合以确认重新同步已完成：

s存储聚合显示

受影响的聚合应具有 "`normal` " RAID 状态，如以下示例所示：

cluster_A::> storage aggregate show
cluster Aggregates:
Aggregate     Size Available Used% State   #Vols  Nodes            RAID Status
--------- -------- --------- ----- ------- ------ ---------------- ------------
node_A_1data01_mirrored
            4.15TB    3.40TB   18% online       3 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_1root
           707.7GB   34.29GB   95% online       1 node_A_1       raid_dp,
                                                                   mirrored,
                                                                   normal
node_A_2_data01_mirrored
            4.15TB    4.12TB    1% online       2 node_A_2       raid_dp,
                                                                   normal
node_A_2_data02_unmirrored
            2.18TB    2.18TB    0% online       1 node_A_2       raid_dp,
                                                                   normal
node_A_2_root
           707.7GB   34.27GB   95% online       1 node_A_2       raid_dp,
                                                                   resyncing

测试 MetroCluster 配置

Creating your file...

验证协商的切换

验证修复和手动切回

丢失一个 FC-SAS 网桥

在电源线中断后验证操作

在丢失一个存储架后验证操作