Run Element storage health checks prior to upgrading storage
You must run health checks prior to upgrading Element storage to ensure all storage nodes in your cluster are ready for the next Element storage upgrade.
-
Management services: You have updated to the latest management services bundle (2.10.27 or later).
You must upgrade to the latest management services bundle before upgrading your Element software. -
Management node: You are running management node 11.3 or later.
-
Element software: Your cluster version is running NetApp Element software 11.3 or later.
-
End User License Agreement (EULA): Beginning with management services 2.20.69, you must accept and save the EULA before using the NetApp Hybrid Cloud Control UI or API to run Element storage health checks:
-
Open the IP address of the management node in a web browser:
https://<ManagementNodeIP>
-
Log in to NetApp Hybrid Cloud Control by providing the storage cluster administrator credentials.
-
Select Upgrade near the top right of the interface.
-
The EULA pops up. Scroll down, select I accept for current and all future updates, and select Save.
-
You can run health checks using the NetApp Hybrid Cloud Control UI or the NetApp Hybrid Cloud Control API:
You can also find out more about storage health checks that are run by the service:
Use NetApp Hybrid Cloud Control to run Element storage health checks prior to upgrading storage
Using NetApp Hybrid Cloud Control (HCC), you can verify that a storage cluster is ready to be upgraded.
-
Open the IP address of the management node in a web browser:
https://<ManagementNodeIP>
-
Log in to NetApp Hybrid Cloud Control by providing the storage cluster administrator credentials.
-
Select Upgrade near the top right of the interface.
-
On the Upgrades page, select the Storage tab.
-
Select the health check for the cluster you want to check for upgrade readiness.
-
On the Storage Health Check page, select Run Health Check.
-
If there are issues, do the following:
-
Go to the specific KB article listed for each issue or perform the specified remedy.
-
If a KB is specified, complete the process described in the relevant KB article.
-
After you have resolved cluster issues, select Re-Run Health Check.
-
After the health check completes without errors, the storage cluster is ready to upgrade. See storage node upgrade instructions to proceed.
Use API to run Element storage health checks prior to upgrading storage
You can use REST API to verify that a storage cluster is ready to be upgraded. The health check verifies that there are no obstacles to upgrading, such as pending nodes, disk space issues, and cluster faults.
-
Locate the storage cluster ID:
-
Open the management node REST API UI on the management node:
https://<ManagementNodeIP>/mnode
-
Select Authorize and complete the following:
-
Enter the cluster user name and password.
-
Enter the client ID as
mnode-client
if the value is not already populated. -
Select Authorize to begin a session.
-
Close the authorization window.
-
-
From the REST API UI, select
GET /assets
. -
Select Try it out.
-
Select Execute.
-
From the response, copy the
"id"
from the"storage"
section of the cluster you intend to check for upgrade readiness.Do not use the "parent"
value in this section because this is the management node’s ID, not the storage cluster’s ID."config": {}, "credentialid": "12bbb2b2-f1be-123b-1234-12c3d4bc123e", "host_name": "SF_DEMO", "id": "12cc3a45-e6e7-8d91-a2bb-0bdb3456b789", "ip": "10.123.12.12", "parent": "d123ec42-456e-8912-ad3e-4bd56f4a789a", "sshcredentialid": null, "ssl_certificate": null
-
-
Run health checks on the storage cluster:
-
Open the storage REST API UI on the management node:
https://<ManagementNodeIP>/storage/1/
-
Select Authorize and complete the following:
-
Enter the cluster user name and password.
-
Enter the client ID as
mnode-client
if the value is not already populated. -
Select Authorize to begin a session.
-
Close the authorization window.
-
-
Select POST /health-checks.
-
Select Try it out.
-
In the parameter field, enter the storage cluster ID obtained in Step 1.
{ "config": {}, "storageId": "123a45b6-1a2b-12a3-1234-1a2b34c567d8" }
-
Select Execute to run a health check on the specified storage cluster.
The response should indicate state as
initializing
:{ "_links": { "collection": "https://10.117.149.231/storage/1/health-checks", "log": "https://10.117.149.231/storage/1/health-checks/358f073f-896e-4751-ab7b-ccbb5f61f9fc/log", "self": "https://10.117.149.231/storage/1/health-checks/358f073f-896e-4751-ab7b-ccbb5f61f9fc" }, "config": {}, "dateCompleted": null, "dateCreated": "2020-02-21T22:11:15.476937+00:00", "healthCheckId": "358f073f-896e-4751-ab7b-ccbb5f61f9fc", "state": "initializing", "status": null, "storageId": "c6d124b2-396a-4417-8a47-df10d647f4ab", "taskId": "73f4df64-bda5-42c1-9074-b4e7843dbb77" }
-
Copy the
healthCheckID
that is part of response.
-
-
Verify the results of the health checks:
-
Select GET /health-checks/{healthCheckId}.
-
Select Try it out.
-
Enter the health check ID in the parameter field.
-
Select Execute.
-
Scroll to the bottom of the response body.
If all health checks are successful, the return is similar to the following example:
"message": "All checks completed successfully.", "percent": 100, "timestamp": "2020-03-06T00:03:16.321621Z"
-
-
If the
message
return indicates that there were problems regarding cluster health, do the following:-
Select GET /health-checks/{healthCheckId}/log
-
Select Try it out.
-
Enter the health check ID in the parameter field.
-
Select Execute.
-
Review any specific errors and obtain their associated KB article links.
-
Go to the specific KB article listed for each issue or perform the specified remedy.
-
If a KB is specified, complete the process described in the relevant KB article.
-
After you have resolved cluster issues, run GET /health-checks/{healthCheckId}/log again.
-
Storage health checks made by the service
Storage health checks make the following checks per cluster.
Check Name | Node/Cluster | Description |
---|---|---|
check_async_results |
Cluster |
Verifies that the number of asynchronous results in the database is below a threshold number. |
check_cluster_faults |
Cluster |
Verifies that there are no upgrade blocking cluster faults (as defined in Element source). |
check_upload_speed |
Node |
Measures the upload speed between the storage node and the management node. |
connection_speed_check |
Node |
Verifies that nodes have connectivity to the management node serving upgrade packages and estimates connection speed. |
check_cores |
Node |
Checks for kernel crash dump and core files on the node. The check fails for any crashes in a recent time period (threshold 7 days). |
check_root_disk_space |
Node |
Verifies the root file system has sufficient free space to perform an upgrade. |
check_var_log_disk_space |
Node |
Verifies that |
check_pending_nodes |
Cluster |
Verifies that there are no pending nodes on the cluster. |