Run compute node health checks prior to upgrading compute firmware
You must run health checks prior to upgrading compute firmware to ensure all compute nodes in your cluster are ready to be upgraded. Compute node health checks can only be run against compute clusters of one or more managed NetApp HCI compute nodes.
- Management services: You have updated to the latest management services bundle (2.11 or later).
- Management node: You are running management node 11.3 or later.
- Element software: Your storage cluster is running NetApp Element software 11.3 or later.
- End User License Agreement (EULA): Beginning with management services 2.20.69, you must accept and save the EULA before using the NetApp Hybrid Cloud Control UI or API to run compute node health checks:
  1. Open the IP address of the management node in a web browser:
     https://<ManagementNodeIP>
  2. Log in to NetApp Hybrid Cloud Control by providing the storage cluster administrator credentials.
  3. Select Upgrade near the top right of the interface.
  4. The EULA pops up. Scroll down, select I accept for current and all future updates, and select Save.
You can run health checks using the NetApp Hybrid Cloud Control (HCC) UI or the HCC API; both options are described below. You can also find out more about the compute node health checks that are run by the service.
Use NetApp Hybrid Cloud Control to run compute node health checks prior to upgrading firmware
Using NetApp Hybrid Cloud Control (HCC), you can verify that a compute node is ready for a firmware upgrade.
Note: If you have multiple two-node storage cluster configurations, each within their own vCenter, Witness Node health checks might not report accurately. Therefore, when you are ready to upgrade ESXi hosts, you must only shut down the Witness Node on the ESXi host that is being upgraded. You must ensure that you always have one Witness Node running in your NetApp HCI installation by powering off the Witness Nodes in an alternating fashion.
1. Open the IP address of the management node in a web browser:
   https://<ManagementNodeIP>/hcc
2. Log in to NetApp Hybrid Cloud Control by providing the storage cluster administrator credentials.
3. Select Upgrade near the top right of the interface.
4. On the Upgrades page, select the Compute firmware tab.
5. Select the health check for the cluster you want to check for upgrade readiness.
6. On the Compute Health Check page, select Run Health Check.
7. If there are issues, the page provides a report. Do the following:
   a. Go to the specific KB article listed for each issue or perform the specified remedy.
   b. If a KB is specified, complete the process described in the relevant KB article.
   c. After you have resolved cluster issues, select Re-Run Health Check.
8. After the health check completes without errors, the compute nodes in the cluster are ready to upgrade. See Update compute node firmware to proceed.
Use API to run compute node health checks prior to upgrading firmware
You can use the REST API to verify that compute nodes in a cluster are ready to be upgraded. The health check verifies that there are no obstacles to upgrading, such as ESXi host issues or other vSphere issues. You need to run compute node health checks for each compute cluster in your environment. The steps below use the REST API UIs on the management node; a scripted sketch of each call follows the corresponding step.
1. Locate the controller ID and cluster ID:
   a. Open the inventory service REST API UI on the management node:
      https://<ManagementNodeIP>/inventory/1/
   b. Select Authorize and complete the following:
      - Enter the cluster user name and password.
      - Enter the client ID as mnode-client if the value is not already populated.
      - Select Authorize to begin a session.
   c. From the REST API UI, select GET /installations.
   d. Select Try it out.
   e. Select Execute.
   f. From the code 200 response body, copy the "id" for the installation you plan to use for health checks.
   g. From the REST API UI, select GET /installations/{id}.
   h. Select Try it out.
   i. Enter the installation ID.
   j. Select Execute.
   k. From the code 200 response body, copy the IDs for each of the following:
      - The cluster ID ("clusterId")
      - A controller ID ("controllerId")

        {
          "_links": {
            "collection": "https://10.117.187.199/inventory/1/installations",
            "self": "https://10.117.187.199/inventory/1/installations/xx94f6f0-12a6-412f-8b5e-4cf2z58329x0"
          },
          "compute": {
            "errors": [],
            "inventory": {
              "clusters": [
                {
                  "clusterId": "domain-1",
                  "controllerId": "abc12c3a-aa87-4e33-9f94-xx588c2cdcf6",
                  "datacenterName": "NetApp-HCI-Datacenter-01",
                  "installationId": "xx94f6f0-12a6-412f-8b5e-4cf2z58329x0",
                  "installationName": "test-nde-mnode",
                  "inventoryType": "managed",
                  "name": "NetApp-HCI-Cluster-01",
                  "summary": {
                    "nodeCount": 2,
                    "virtualMachineCount": 2
                  }
                }
              ],
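The same lookup can be scripted. The following is a minimal Python sketch rather than an official tool; it assumes the requests package, a bearer token already obtained through the Authorize step, and placeholder values for the management node address and installation ID. The fields it reads (clusterId, controllerId) match the example response above.

```python
# Minimal sketch (assumptions noted above): list installations, then read the
# cluster and controller IDs from the inventory service.
import requests

MNODE = "https://<ManagementNodeIP>"                # placeholder
TOKEN = "<bearer token from the Authorize step>"    # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# GET /installations: copy the "id" of the installation you plan to use.
installations = requests.get(f"{MNODE}/inventory/1/installations",
                             headers=HEADERS, verify=False)
print(installations.json())

INSTALLATION_ID = "<id copied from the response above>"   # placeholder

# GET /installations/{id}: each managed compute cluster lists its clusterId
# and controllerId, as in the example response shown above.
detail = requests.get(f"{MNODE}/inventory/1/installations/{INSTALLATION_ID}",
                      headers=HEADERS, verify=False)
for cluster in detail.json()["compute"]["inventory"]["clusters"]:
    print(cluster["name"], cluster["clusterId"], cluster["controllerId"])
```

verify=False is used here only because the management node often presents a self-signed certificate; adjust certificate handling for your environment.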
2. Run health checks on the compute nodes in the cluster:
   a. Open the compute service REST API UI on the management node:
      https://<ManagementNodeIP>/vcenter/1/
   b. Select Authorize and complete the following:
      - Enter the cluster user name and password.
      - Enter the client ID as mnode-client if the value is not already populated.
      - Select Authorize to begin a session.
   c. Select POST /compute/{CONTROLLER_ID}/health-checks.
   d. Select Try it out.
   e. Enter the "controllerId" you copied from the previous step in the Controller_ID parameter field.
   f. In the payload, enter the "clusterId" that you copied from the previous step as the "cluster" value and remove the "nodes" parameter:

        {
          "cluster": "domain-1"
        }

   g. Select Execute to run a health check on the cluster.
      The code 200 response gives a "resourceLink" URL with the task ID appended that is needed to confirm the health check results:

        {
          "resourceLink": "https://10.117.150.84/vcenter/1/compute/tasks/[This is the task ID for health check task results]",
          "serviceName": "vcenter-v2-svc",
          "taskId": "ab12c345-06f7-42d7-b87c-7x64x56x321x",
          "taskName": "VCenter service health checks"
        }

   h. Copy the task ID portion of the "resourceLink" URL to verify the task result.
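If you prefer to script this step, the following is a minimal Python sketch of the same POST call. It assumes the requests package, a bearer token from the Authorize step, and the controllerId and clusterId values gathered in step 1; the IDs shown are the sample values from this page.

```python
# Minimal sketch: start a health check for one compute cluster.
import requests

MNODE = "https://<ManagementNodeIP>"                # placeholder
TOKEN = "<bearer token from the Authorize step>"    # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

CONTROLLER_ID = "abc12c3a-aa87-4e33-9f94-xx588c2cdcf6"  # "controllerId" from step 1
CLUSTER_ID = "domain-1"                                 # "clusterId" from step 1

# POST /compute/{CONTROLLER_ID}/health-checks with the cluster in the payload;
# the "nodes" parameter is omitted so the whole cluster is checked.
response = requests.post(
    f"{MNODE}/vcenter/1/compute/{CONTROLLER_ID}/health-checks",
    headers=HEADERS,
    json={"cluster": CLUSTER_ID},
    verify=False,  # the management node often uses a self-signed certificate
)
body = response.json()
print(body["resourceLink"])  # the task ID is appended to this URL
print(body["taskId"])        # keep this ID to verify the results in step 3
```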
3. Verify the result of the health checks:
   a. Return to the compute service REST API UI on the management node:
      https://<ManagementNodeIP>/vcenter/1/
   b. Select GET /compute/tasks/{task_id}.
   c. Select Try it out.
   d. Enter the task ID portion of the "resourceLink" URL from the POST /compute/{CONTROLLER_ID}/health-checks code 200 response in the task_id parameter field.
   e. Select Execute.
   f. If the status returned indicates that there were problems regarding compute node health, do the following:
      - Go to the specific KB article (KbLink) listed for each issue or perform the specified remedy.
      - If a KB is specified, complete the process described in the relevant KB article.
      - After you have resolved cluster issues, run POST /compute/{CONTROLLER_ID}/health-checks again (see step 2).
   g. If health checks complete without issues, the response code 200 indicates a successful result.
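The verification step can also be scripted. The following is a minimal Python sketch, assuming the requests package, a bearer token from the Authorize step, and the task ID returned by the POST call in step 2. The exact fields inside the task result can vary by release, so inspect the printed response rather than relying on a fixed schema.

```python
# Minimal sketch: fetch the health-check task result and print it for review.
import requests

MNODE = "https://<ManagementNodeIP>"                # placeholder
TOKEN = "<bearer token from the Authorize step>"    # placeholder
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

TASK_ID = "ab12c345-06f7-42d7-b87c-7x64x56x321x"    # from the "resourceLink"/"taskId" in step 2

# GET /compute/tasks/{task_id}: re-run this call until the task has finished.
task = requests.get(f"{MNODE}/vcenter/1/compute/tasks/{TASK_ID}",
                    headers=HEADERS, verify=False)
result = task.json()

# Review the status of each check; failed checks list a KB link ("KbLink") or a
# remedy, as described above. Re-run the POST from step 2 after resolving issues.
print(result)
```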
Compute node health checks made by the service
Compute health checks, whether performed by HCC or API methods, make the following checks per node. Depending on your environment, some of these checks might be skipped. You should re-run health checks after resolving any detected issues.
| Check description | Node/cluster | Action needed to resolve | Knowledgebase article with procedure |
|---|---|---|---|
| Is DRS enabled and fully automated? | Cluster | Turn on DRS and make sure it is fully automated. | See this KB. NOTE: If you have standard licensing, put the ESXi host into maintenance mode and ignore this health check failure warning. |
| Is DPM disabled in vSphere? | Cluster | Turn off Distributed Power Management. | |
| Is HA admission control disabled in vSphere? | Cluster | Turn off HA admission control. | |
| Is FT enabled for a VM on a host in the cluster? | Node | Suspend Fault Tolerance on any affected virtual machines. | |
| Are there critical alarms in vCenter for the cluster? | Cluster | Launch vSphere and resolve and/or acknowledge any alerts before proceeding. | No KB needed to resolve issue. |
| Are there generic/global informational alerts in vCenter? | Cluster | Launch vSphere and resolve and/or acknowledge any alerts before proceeding. | No KB needed to resolve issue. |
| Are management services up to date? | HCI system | You must update management services before you perform an upgrade or run pre-upgrade health checks. | No KB needed to resolve issue. See this article for more information. |
| Are there errors on the current ESXi node in vSphere? | Node | Launch vSphere and resolve and/or acknowledge any alerts before proceeding. | No KB needed to resolve issue. |
| Is virtual media mounted to a VM on a host in the cluster? | Node | Unmount all virtual media disks (CD/DVD/floppy) from the VMs. | No KB needed to resolve issue. |
| Is the BMC version the minimum required version with Redfish support? | Node | Manually update your BMC firmware. | No KB needed to resolve issue. |
| Is the ESXi host up and running? | Node | Start your ESXi host. | No KB needed to resolve issue. |
| Do any virtual machines reside on local ESXi storage? | Node/VM | Remove or migrate local storage attached to virtual machines. | No KB needed to resolve issue. |
| Is the BMC up and running? | Node | Power on your BMC and ensure it is connected to a network this management node can reach. | No KB needed to resolve issue. |
| Are partner ESXi hosts available? | Node | Make one or more ESXi hosts in the cluster available (not in maintenance mode) to migrate virtual machines. | No KB needed to resolve issue. |
| Are you able to connect with the BMC via the IPMI protocol? | Node | Enable the IPMI protocol on the Baseboard Management Controller (BMC). | No KB needed to resolve issue. |
| Is the ESXi host mapped to the hardware host (BMC) correctly? | Node | The ESXi host is not mapped to the Baseboard Management Controller (BMC) correctly. Correct the mapping between the ESXi host and the hardware host. | No KB needed to resolve issue. See this article for more information. |
| What is the status of the Witness Nodes in the cluster? None of the Witness Nodes identified are up and running. | Node | A Witness Node is not running on an alternate ESXi host. Power on the Witness Node on an alternate ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times. | |
| What is the status of the Witness Nodes in the cluster? The Witness Node is up and running on this ESXi host, and the alternate Witness Node is not up and running. | Node | A Witness Node is not running on an alternate ESXi host. Power on the Witness Node on an alternate ESXi host. When you are ready to upgrade this ESXi host, shut down the Witness Node running on this ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times. | |
| What is the status of the Witness Nodes in the cluster? The Witness Node is up and running on this ESXi host, and the alternate Witness Node is up but is running on the same ESXi host. | Node | Both Witness Nodes are running on this ESXi host. Relocate one Witness Node to an alternate ESXi host. When you are ready to upgrade this ESXi host, shut down the Witness Node remaining on this ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times. | |
| What is the status of the Witness Nodes in the cluster? The Witness Node is up and running on this ESXi host, and the alternate Witness Node is up and running on another ESXi host. | Node | A Witness Node is running locally on this ESXi host. When you are ready to upgrade this ESXi host, shut down the Witness Node only on this ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times. | |