System verifications to perform before you revert an ONTAP cluster

12/17/2024 Contributors

PDFs

Before you revert an ONTAP cluster, you should verify your cluster health, storage health, and system time. You should also verify that no jobs are running on the cluster.

Verify cluster health

Before you revert an ONTAP cluster, you should verify that the nodes are healthy and eligible to participate in the cluster, and that the cluster is in quorum.

Steps

Verify that the nodes in the cluster are online and are eligible to participate in the cluster:
```
cluster show
```
Cli
Copy
In this example, all nodes are healthy and eligible to participate in the cluster.
```
cluster1::> cluster show
Node                  Health  Eligibility
--------------------- ------- ------------
node0                 true    true
node1                 true    true
```
If any node is unhealthy or ineligible, check EMS logs for errors and take corrective action.
Set the privilege level to advanced:
```
set -privilege advanced
```
Cli
Copy
Enter y to continue.

Verify the configuration details for each RDB process.

The relational database epoch and database epochs should match for each node.

The per-ring quorum master should be the same for all nodes.

Note that each ring might have a different quorum master.

To display this RDB process… Enter this command…

To display this RDB process…	Enter this command…
Management application	`cluster ring show -unitname mgmt` Cli Copy
Volume location database	`cluster ring show -unitname vldb` Cli Copy
Virtual-Interface manager	`cluster ring show -unitname vifmgr` Cli Copy
SAN management daemon	`cluster ring show -unitname bcomd` Cli Copy

Management application

cluster ring show -unitname mgmt

Volume location database

cluster ring show -unitname vldb

Virtual-Interface manager

cluster ring show -unitname vifmgr

SAN management daemon

cluster ring show -unitname bcomd

This example shows the volume location database process:

cluster1::*> cluster ring show -unitname vldb
Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
--------- -------- -------- -------- -------- --------- ---------
node0     vldb     154      154      14847    node0     master
node1     vldb     154      154      14847    node0     secondary
node2     vldb     154      154      14847    node0     secondary
node3     vldb     154      154      14847    node0     secondary
4 entries were displayed.

Return to the admin privilege level:
```
set -privilege admin
```
Cli
Copy

If you are operating in a SAN environment, verify that each node is in a SAN quorum:

event log show  -severity informational -message-name scsiblade.*

The most recent scsiblade event message for each node should indicate that the scsi-blade is in quorum.

cluster1::*> event log show  -severity informational -message-name scsiblade.*
Time             Node       Severity       Event
---------------  ---------- -------------- ---------------------------
MM/DD/YYYY TIME  node0      INFORMATIONAL  scsiblade.in.quorum: The scsi-blade ...
MM/DD/YYYY TIME  node1      INFORMATIONAL  scsiblade.in.quorum: The scsi-blade ...

Related information

System administration

Verify storage health

Before you revert an ONTAP cluster, you should verify the status of your disks, aggregates, and volumes.

Steps

Verify disk status:

To check for… Do this…

To check for…	Do this…
Broken disks	Display any broken disks: `storage disk show -state broken` Cli Copy Remove or replace any broken disks.
Disks undergoing maintenance or reconstruction	Display any disks in maintenance, pending, or reconstructing states: `storage disk show -state maintenance\|pending\|reconstructing` Cli Copy Wait for the maintenance or reconstruction operation to finish before proceeding.

Broken disks

Display any broken disks:
```
storage disk show -state broken
```
Cli
Copy
Remove or replace any broken disks.

Disks undergoing maintenance or reconstruction

Display any disks in maintenance, pending, or reconstructing states:
```
storage disk show -state maintenance|pending|reconstructing
```
Cli
Copy
Wait for the maintenance or reconstruction operation to finish before proceeding.

Verify that all aggregates are online by displaying the state of physical and logical storage, including storage aggregates:
```
storage aggregate show -state !online
```
Cli
Copy
This command displays the aggregates that are not online. All aggregates must be online before and after performing a major upgrade or reversion.
```
cluster1::> storage aggregate show -state !online
There are no entries matching your query.
```
Verify that all volumes are online by displaying any volumes that are not online:
```
volume show -state !online
```
Cli
Copy
All volumes must be online before and after performing a major upgrade or reversion.
```
cluster1::> volume show -state !online
There are no entries matching your query.
```
Verify that there are no inconsistent volumes:
```
volume show -is-inconsistent true
```
Cli
Copy
See the Knowledge Base article Volume Showing WAFL Inconsistent on how to address the inconsistent volumes.

Related information

Disk and aggregate management

Verify the system time

Before you revert an ONTAP cluster, you should verify that NTP is configured, and that the time is synchronized across the cluster.

Steps

Verify that the cluster is associated with an NTP server:
```
cluster time-service ntp server show
```
Cli
Copy

Verify that each node has the same date and time:

cluster date show

cluster1::> cluster date show
Node      Date                Timezone
--------- ------------------- -------------------------
node0     4/6/2013 20:54:38   GMT
node1     4/6/2013 20:54:38   GMT
node2     4/6/2013 20:54:38   GMT
node3     4/6/2013 20:54:38   GMT
4 entries were displayed.

Verify that no jobs are running

Before you revert an ONTAP cluster, you should verify the status of cluster jobs. If any aggregate, volume, NDMP (dump or restore), or snapshot jobs (such as create, delete, move, modify, replicate, and mount jobs) are running or queued, you should allow the jobs to finish successfully or stop the queued entries.

Steps

Review the list of any running or queued aggregate, volume, or snapshot jobs:

job show

In this example, there are two jobs queued:

cluster1::> job show
                            Owning
Job ID Name                 Vserver    Node           State
------ -------------------- ---------- -------------- ----------
8629   Vol Reaper           cluster1   -              Queued
       Description: Vol Reaper Job
8630   Certificate Expiry Check
                            cluster1   -              Queued
       Description: Certificate Expiry Check

Delete any running or queued aggregate, volume, or snapshot jobs:
```
job delete -id <job_id>
```
Cli
Copy

Verify that no aggregate, volume, or snapshot jobs are running or queued:

job show

In this example, all running and queued jobs have been deleted:

cluster1::> job show
                            Owning
Job ID Name                 Vserver    Node           State
------ -------------------- ---------- -------------- ----------
9944   SnapMirrorDaemon_7_2147484678
                            cluster1   node1          Dormant
       Description: Snapmirror Daemon for 7_2147484678
18377  SnapMirror Service Job
                            cluster1   node0          Dormant
       Description: SnapMirror Service Job
2 entries were displayed

System verifications to perform before you revert an ONTAP cluster

Creating your file...

Verify cluster health

Verify storage health

Verify the system time

Verify that no jobs are running