Run compute node health checks prior to upgrading compute firmware

You must run health checks prior to upgrading compute firmware to ensure all compute nodes in your cluster are ready to be upgraded. Compute node health checks can only be run against compute clusters of one or more managed NetApp HCI compute nodes.

What you'll need
  • Management services: You have updated to the latest management services bundle (2.11 or later). A sketch for checking the installed bundle version follows this list.

  • Management node: You are running management node 11.3 or later.

  • Element software: Your storage cluster is running NetApp Element software 11.3 or later.

  • End User License Agreement (EULA): Beginning with management services 2.20.69, you must accept and save the EULA before using the NetApp Hybrid Cloud Control UI or API to run compute node health checks:

    1. Open the IP address of the management node in a web browser:

      https://<ManagementNodeIP>
    2. Log in to NetApp Hybrid Cloud Control by providing the storage cluster administrator credentials.

    3. Select Upgrade near the top right of the interface.

    4. The EULA pops up. Scroll down, select I accept for current and all future updates, and select Save.
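
If you want to confirm the installed management services bundle version (the first prerequisite above) from a script rather than from the UI, the following minimal sketch is one way to do it. It assumes the management node exposes the mnode service's GET /about endpoint at /mnode/about and reports the bundle version in an mnode_bundle_version field; verify both against the REST API UI at https://<ManagementNodeIP>/mnode/ for your release, and add a bearer token header if your release requires authorization for this endpoint.

  # Minimal sketch: query the management node for the installed management
  # services bundle version. The endpoint path and field name are assumptions;
  # confirm them in the REST API UI (https://<ManagementNodeIP>/mnode/).
  import requests
  import urllib3

  urllib3.disable_warnings()  # management nodes commonly use self-signed certificates

  MNODE_IP = "ManagementNodeIP"  # placeholder

  about = requests.get(f"https://{MNODE_IP}/mnode/about", verify=False).json()
  print("Management services bundle version:", about.get("mnode_bundle_version"))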

Health check options

You can run health checks using the NetApp Hybrid Cloud Control (HCC) UI or the HCC API; both methods are described in the sections that follow.

You can also find out more about the compute node health checks that are run by the service in Compute node health checks made by the service.

Use NetApp Hybrid Cloud Control to run compute node health checks prior to upgrading firmware

Using NetApp Hybrid Cloud Control (HCC), you can verify that a compute node is ready for a firmware upgrade.

Note If you have multiple two-node storage cluster configurations, each within its own vCenter, Witness Node health checks might not report accurately. Therefore, when you are ready to upgrade ESXi hosts, shut down only the Witness Node on the ESXi host that is being upgraded. You must ensure that you always have one Witness Node running in your NetApp HCI installation by powering off the Witness Nodes in an alternating fashion.
Steps
  1. Open the IP address of the management node in a web browser:

    https://<ManagementNodeIP>/hcc
  2. Log in to NetApp Hybrid Cloud Control by providing the storage cluster administrator credentials.

  3. Select Upgrade near the top right of the interface.

  4. On the Upgrades page, select the Compute firmware tab.

  5. Select the health check icon for the cluster you want to check for upgrade readiness.

  6. On the Compute Health Check page, select Run Health Check.

  7. If there are issues, the page provides a report. Do the following:

    1. Go to the specific KB article listed for each issue or perform the specified remedy.

    2. If a KB is specified, complete the process described in the relevant KB article.

    3. After you have resolved cluster issues, select Re-Run Health Check.

After the health check completes without errors, the compute nodes in the cluster are ready to upgrade. See Update compute node firmware to proceed.

Use API to run compute node health checks prior to upgrading firmware

You can use the REST API to verify that the compute nodes in a cluster are ready to be upgraded. The health check verifies that there are no obstacles to upgrading, such as ESXi host issues or other vSphere issues. You need to run compute node health checks for each compute cluster in your environment. A scripted sketch of the complete workflow follows the steps below.

Steps
  1. Locate the controller ID and cluster ID:

    1. Open the inventory service REST API UI on the management node:

      https://<ManagementNodeIP>/inventory/1/
    2. Select Authorize and complete the following:

      1. Enter the cluster user name and password.

      2. Enter the client ID as mnode-client if the value is not already populated.

      3. Select Authorize to begin a session.

    3. From the REST API UI, select GET /installations.

    4. Select Try it out.

    5. Select Execute.

    6. From the code 200 response body, copy the "id" for the installation you plan to use for health checks.

    7. From the REST API UI, select GET /installations/{id}.

    8. Select Try it out.

    9. Enter the installation ID.

    10. Select Execute.

    11. From the code 200 response body, copy the IDs for each of the following:

      1. The cluster ID ("clusterID")

      2. A controller ID ("controllerId")

        {
          "_links": {
            "collection": "https://10.117.187.199/inventory/1/installations",
            "self": "https://10.117.187.199/inventory/1/installations/xx94f6f0-12a6-412f-8b5e-4cf2z58329x0"
          },
          "compute": {
            "errors": [],
            "inventory": {
              "clusters": [
                {
                  "clusterId": "domain-1",
                  "controllerId": "abc12c3a-aa87-4e33-9f94-xx588c2cdcf6",
                  "datacenterName": "NetApp-HCI-Datacenter-01",
                  "installationId": "xx94f6f0-12a6-412f-8b5e-4cf2z58329x0",
                  "installationName": "test-nde-mnode",
                  "inventoryType": "managed",
                  "name": "NetApp-HCI-Cluster-01",
                  "summary": {
                    "nodeCount": 2,
                    "virtualMachineCount": 2
                  }
                }
              ],
  2. Run health checks on the compute nodes in the cluster:

    1. Open the compute service REST API UI on the management node:

      https://<ManagementNodeIP>/vcenter/1/
    2. Select Authorize and complete the following:

      1. Enter the cluster user name and password.

      2. Enter the client ID as mnode-client if the value is not already populated.

      3. Select Authorize to begin a session.

    3. Select POST /compute/{CONTROLLER_ID}/health-checks.

    4. Select Try it out.

    5. Enter the "controllerId" you copied from the previous step in the Controller_ID parameter field.

    6. In the payload, enter the "clusterId" that you copied from the previous step as the "cluster" value and remove the "nodes" parameter.

      {
        "cluster": "domain-1"
      }
    7. Select Execute to run a health check on the cluster.

      The code 200 response gives a "resourceLink" URL with the task ID appended that is needed to confirm the health check results.

      {
        "resourceLink": "https://10.117.150.84/vcenter/1/compute/tasks/[This is the task ID for health check task results]",
        "serviceName": "vcenter-v2-svc",
        "taskId": "ab12c345-06f7-42d7-b87c-7x64x56x321x",
        "taskName": "VCenter service health checks"
      }
    8. Copy the task ID portion of the "resourceLink" URL to verify the task result.

  3. Verify the result of the health checks:

    1. Return to the compute service REST API UI on the management node:

      https://<ManagementNodeIP>/vcenter/1/
    2. Select GET /compute/tasks/{task_id}.

    3. Select Try it out.

    4. Enter the task ID portion of the "resourceLink" URL from the POST /compute/{CONTROLLER_ID}/health-checks code 200 response in the task_id parameter field.

    5. Select Execute.

    6. If the returned status indicates problems with compute node health, do the following:

      1. Go to the specific KB article (KbLink) listed for each issue or perform the specified remedy.

      2. If a KB is specified, complete the process described in the relevant KB article.

      3. After you have resolved cluster issues, run POST /compute/{CONTROLLER_ID}/health-checks again (see step 2).

When the health checks complete without issues, the code 200 response indicates a successful result and the compute nodes in the cluster are ready to be upgraded.
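
The following Python sketch strings the preceding calls together. It is a minimal example under stated assumptions rather than a supported tool: the management node address and credentials are placeholders; the token request (/auth/connect/token with the mnode-client client ID) is assumed to match the Authorize flow used in the REST API UI; and the handling of the GET /installations response body is an assumption based on the "id" value copied in step 1, so adapt the field access to the response bodies you actually see.

  # Sketch of the health check workflow: authorize, find the controller and
  # cluster IDs, start the health checks, and fetch the task result.
  import json
  import requests
  import urllib3

  urllib3.disable_warnings()     # management nodes commonly use self-signed certificates

  MNODE_IP = "ManagementNodeIP"  # placeholder
  USERNAME = "admin"             # storage cluster administrator credentials (placeholders)
  PASSWORD = "password"
  BASE = f"https://{MNODE_IP}"

  # Get a bearer token with the same credentials and client ID used in the
  # REST API UI Authorize dialog (the token endpoint path is an assumption).
  token = requests.post(
      f"{BASE}/auth/connect/token",
      data={"client_id": "mnode-client", "grant_type": "password",
            "username": USERNAME, "password": PASSWORD},
      verify=False,
  ).json()["access_token"]
  headers = {"Authorization": f"Bearer {token}"}

  # Step 1: locate the installation, then copy the controller and cluster IDs.
  installations = requests.get(f"{BASE}/inventory/1/installations",
                               headers=headers, verify=False).json()
  install_id = installations["installations"][0]["id"]      # wrapping key assumed; see step 1.6
  detail = requests.get(f"{BASE}/inventory/1/installations/{install_id}",
                        headers=headers, verify=False).json()
  cluster = detail["compute"]["inventory"]["clusters"][0]   # pick the compute cluster to check
  controller_id, cluster_id = cluster["controllerId"], cluster["clusterId"]

  # Step 2: run the health checks for the compute cluster.
  task = requests.post(f"{BASE}/vcenter/1/compute/{controller_id}/health-checks",
                       headers=headers, json={"cluster": cluster_id},
                       verify=False).json()
  task_id = task["taskId"]

  # Step 3: fetch the task result; repeat this call after a short wait if the
  # task has not finished, and review any reported issues and their KB links.
  result = requests.get(f"{BASE}/vcenter/1/compute/tasks/{task_id}",
                        headers=headers, verify=False).json()
  print(json.dumps(result, indent=2))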

Compute node health checks made by the service

Compute health checks, whether run through the HCC UI or the API, make the following checks per node. Depending on your environment, some of these checks might be skipped. Re-run the health checks after you resolve any detected issues. A sketch showing how to inspect some of the related vSphere settings directly follows the table.

Each check below lists whether it applies to a node or the cluster, the action needed to resolve the issue, and the knowledge base (KB) article with the procedure, if one is available.

Is DRS enabled and fully automated?
  Applies to: Cluster
  Resolution: Turn on DRS and make sure it is fully automated.
  KB: See this KB. NOTE: If you have standard licensing, put the ESXi host into maintenance mode and ignore this health check failure warning.

Is DPM disabled in vSphere?
  Applies to: Cluster
  Resolution: Turn off Distributed Power Management.
  KB: See this KB.

Is HA admission control disabled in vSphere?
  Applies to: Cluster
  Resolution: Turn off HA admission control.
  KB: See this KB.

Is FT enabled for a VM on a host in the cluster?
  Applies to: Node
  Resolution: Suspend Fault Tolerance on any affected virtual machines.
  KB: See this KB.

Are there critical alarms in vCenter for the cluster?
  Applies to: Cluster
  Resolution: Launch vSphere and resolve and/or acknowledge any alerts before proceeding.
  KB: No KB needed to resolve issue.

Are there generic/global informational alerts in vCenter?
  Applies to: Cluster
  Resolution: Launch vSphere and resolve and/or acknowledge any alerts before proceeding.
  KB: No KB needed to resolve issue.

Are management services up to date?
  Applies to: HCI system
  Resolution: You must update management services before you perform an upgrade or run pre-upgrade health checks.
  KB: No KB needed to resolve issue. See this article for more information.

Are there errors on the current ESXi node in vSphere?
  Applies to: Node
  Resolution: Launch vSphere and resolve and/or acknowledge any alerts before proceeding.
  KB: No KB needed to resolve issue.

Is virtual media mounted to a VM on a host in the cluster?
  Applies to: Node
  Resolution: Unmount all virtual media disks (CD/DVD/floppy) from the VMs.
  KB: No KB needed to resolve issue.

Is the BMC version the minimum required version that has Redfish support?
  Applies to: Node
  Resolution: Manually update your BMC firmware.
  KB: No KB needed to resolve issue.

Is the ESXi host up and running?
  Applies to: Node
  Resolution: Start your ESXi host.
  KB: No KB needed to resolve issue.

Do any virtual machines reside on local ESXi storage?
  Applies to: Node/VM
  Resolution: Remove or migrate local storage attached to virtual machines.
  KB: No KB needed to resolve issue.

Is the BMC up and running?
  Applies to: Node
  Resolution: Power on your BMC and ensure it is connected to a network that the management node can reach.
  KB: No KB needed to resolve issue.

Are partner ESXi hosts available?
  Applies to: Node
  Resolution: Make one or more ESXi hosts in the cluster available (not in maintenance mode) to migrate virtual machines.
  KB: No KB needed to resolve issue.

Are you able to connect to the BMC via the IPMI protocol?
  Applies to: Node
  Resolution: Enable the IPMI protocol on the Baseboard Management Controller (BMC).
  KB: No KB needed to resolve issue.

Is the ESXi host mapped to the hardware host (BMC) correctly?
  Applies to: Node
  Resolution: The ESXi host is not mapped to the Baseboard Management Controller (BMC) correctly. Correct the mapping between the ESXi host and the hardware host.
  KB: No KB needed to resolve issue. See this article for more information.

What is the status of the Witness Nodes in the cluster? None of the identified Witness Nodes are up and running.
  Applies to: Node
  Resolution: A Witness Node is not running on an alternate ESXi host. Power on the Witness Node on an alternate ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times.
  KB: See this KB.

What is the status of the Witness Nodes in the cluster? The Witness Node is up and running on this ESXi host and the alternate Witness Node is not up and running.
  Applies to: Node
  Resolution: A Witness Node is not running on an alternate ESXi host. Power on the Witness Node on an alternate ESXi host. When you are ready to upgrade this ESXi host, shut down the Witness Node running on this ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times.
  KB: See this KB.

What is the status of the Witness Nodes in the cluster? The Witness Node is up and running on this ESXi host and the alternate Witness Node is up but is running on the same ESXi host.
  Applies to: Node
  Resolution: Both Witness Nodes are running on this ESXi host. Relocate one Witness Node to an alternate ESXi host. When you are ready to upgrade this ESXi host, shut down the Witness Node remaining on this ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times.
  KB: See this KB.

What is the status of the Witness Nodes in the cluster? The Witness Node is up and running on this ESXi host and the alternate Witness Node is up and running on another ESXi host.
  Applies to: Node
  Resolution: A Witness Node is running locally on this ESXi host. When you are ready to upgrade this ESXi host, shut down only the Witness Node on this ESXi host and re-run the health check. One Witness Node must be running in the HCI installation at all times.
  KB: See this KB.
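
Several of the cluster-level checks in the preceding table correspond to vSphere cluster settings that you can inspect directly before running the health checks. The following sketch uses pyVmomi, which is not part of the NetApp management node tooling, with placeholder vCenter credentials to report the DRS, DPM, and HA admission control settings that the first three checks look at; it illustrates the settings involved rather than reproducing the health check service itself.

  # Sketch: report the vSphere cluster settings behind the DRS, DPM, and HA
  # admission control checks in the table above (pyVmomi, placeholder credentials).
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  ctx = ssl.create_default_context()
  ctx.check_hostname = False
  ctx.verify_mode = ssl.CERT_NONE

  si = SmartConnect(host="vcenter.example.com",        # placeholders
                    user="administrator@vsphere.local",
                    pwd="password", sslContext=ctx)
  try:
      content = si.RetrieveContent()
      view = content.viewManager.CreateContainerView(
          content.rootFolder, [vim.ClusterComputeResource], True)
      for cluster in view.view:
          drs = cluster.configuration.drsConfig
          das = cluster.configuration.dasConfig
          dpm = cluster.configurationEx.dpmConfigInfo
          print(cluster.name)
          print("  DRS enabled:", drs.enabled,
                "| automation level:", drs.defaultVmBehavior)                     # check expects fully automated
          print("  DPM enabled:", dpm.enabled if dpm else "not configured")       # check expects DPM off
          print("  HA admission control enabled:", das.admissionControlEnabled)   # check expects this off
      view.Destroy()
  finally:
      Disconnect(si)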

Find more information