
Replace H610C and H615C nodes

Contributors: netapp-amitha

You should replace the chassis to repair compute node failures related to the CPU or the motherboard, or if the node does not power on. If you have a faulty DIMM in an H610C compute node that runs NetApp HCI Bootstrap OS version 1.6 or later, you can replace the DIMM instead of replacing the chassis. For H615C nodes, you also need not replace the chassis if a DIMM fails; you can replace only the failed DIMM.

For H610C and H615C, the terms "node" and "chassis" are used interchangeably, because the node and chassis are not separate components.
What you’ll need
  • You have verified that the node has failed.

  • You have a replacement chassis.
    To order a replacement, you should contact NetApp Support.

  • You have an electrostatic discharge (ESD) wristband, or you have taken other antistatic protection.

  • You have labeled each cable that is connected to the chassis.

About this task

Alarms in the VMware vSphere Web Client alert you when a host fails. You must match the serial number of the failed host from the VMware vSphere Web Client with the serial number on the sticker at the back of the node.

Steps overview

Here is a high-level overview of the steps in this procedure:
Prepare to replace the node
Replace the node
Install the GPU drivers
Add the node to the cluster

Prepare to replace the node

Before you replace the node, you should migrate the virtual machines (VMs) hosted on the node to an available host, and remove the node from the cluster. You should get details about the node, such as serial number and networking information.

In the case of component failures where the node is still online and functioning, for example, a dual inline memory module (DIMM) failure, you should remove the drives from the cluster before you remove the failed node.
Steps
  1. In the VMware vSphere Web Client, perform the steps to migrate the VMs to another available host.

    See the VMware documentation for the migration steps.
  2. Select the failed node, and select Monitor > Hardware Status > Sensors.

  3. Make a note of the serial number of the failed node. The following screenshot is only an example:

    [Screenshot: the serial number of the failed node in the VMware vSphere Web Client]

    You need the serial number to identify the chassis by matching the number that you noted with the serial number on the sticker at the back of the node.

  4. Right-click the failed node and select Connection > Disconnect.

  5. Select Yes to confirm the action.

  6. Right-click the failed node and select Remove from Inventory.

  7. Click Yes to confirm the action.
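
If the failed node is still reachable over SSH, you can also read its serial number from the ESXi shell by running esxcli hardware platform get and checking the Serial Number field. The following sketch parses that output; the sample text and serial number are illustrative placeholders, not real output:

```python
# Hedged sketch: extract the "Serial Number" field from the output of
# `esxcli hardware platform get` (run on the node over SSH) so you can
# match it against the sticker on the back of the chassis.
def parse_serial(esxcli_output: str) -> str:
    for line in esxcli_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Serial Number":
            return value.strip()
    raise ValueError("Serial Number not found in esxcli output")

# Illustrative sample output (placeholder values, not from a real node)
sample = """\
Platform Information
   UUID: 0x12 0x34 0x56 0x78
   Product Name: NetApp HCI H610C
   Serial Number: 221111111111
"""
print(parse_serial(sample))
```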

Replace the node

After you remove the failed node from the cluster, you can remove the failed chassis, and install the replacement chassis.

Ensure that you have antistatic protection before you perform the steps here.
Steps
  1. Unpack the new chassis, and set it on a level surface.
    Keep the packaging material for when you return the failed chassis to NetApp.

  2. Label each cable that is inserted at the back of the chassis that you are going to remove.
    After you install the new chassis, you must insert the cables back into the original ports.

  3. Disconnect all the cables from the back of the chassis.

  4. Remove the chassis by unscrewing the thumbscrews on the mounting ears.
    You must package and return the failed chassis to NetApp.

  5. Only for H615C. Remove the DIMMs from the chassis. You will insert these DIMMs in the replacement chassis.

  6. Remove the two power supply units on either side of the chassis.
    You can use these power supply units in the new chassis.

  7. Optional: Remove the rails and install the new rails that were shipped with your replacement chassis.
    If you choose to reuse the existing rails, you can skip this step.

  8. Slide the replacement chassis onto the rails.

    Ensure that you do not use excessive force when sliding the chassis onto the rails.
  9. Place the chassis that you removed on a level surface.
    You should package it and return it to NetApp.

  10. Replace the power supply units.

  11. Only for H615C. Take the DIMMs that you removed from the failed chassis, and insert them in the replacement chassis.

  12. Reconnect the cables to the ports from which you originally disconnected them.
    The labels that you added to the cables when you disconnected them will help guide you.

    If the airflow vents at the rear of the chassis are blocked by cables or labels, it can lead to premature component failures due to overheating.
    Do not force the cables into the ports; you might damage the cables, ports, or both.
  13. Power on the chassis.

Install the GPU drivers

Compute nodes with NVIDIA graphics processing units (GPUs), like the H610C node, need the NVIDIA software drivers installed in VMware ESXi so that they can take advantage of the increased processing power.

Steps
  1. Open a browser and browse to the NVIDIA licensing portal at the following URL:
    https://nvid.nvidia.com/dashboard/

  2. Download one of the following driver packages to your computer, depending on your environment:

    vSphere 6.0: NVIDIA-GRID-vSphere-6.0-390.94-390.96-392.05.zip
    vSphere 6.5: NVIDIA-GRID-vSphere-6.5-410.92-410.91-412.16.zip
    vSphere 6.7: NVIDIA-GRID-vSphere-6.7-410.92-410.91-412.16.zip

  3. Extract the driver package on your computer.
    The resulting .VIB file is the uncompressed driver file.

  4. Copy the .VIB driver file from your computer to ESXi running on the compute node. The example commands for each version assume that the driver is located in the $HOME/NVIDIA/ESX6.x/ directory on the management host. The scp utility is included in most Linux distributions and is available as a downloadable utility for Windows:

    ESXi 6.0: scp $HOME/NVIDIA/ESX6.0/NVIDIA**.vib root@<ESXi_IP_ADDR>:/.
    ESXi 6.5: scp $HOME/NVIDIA/ESX6.5/NVIDIA**.vib root@<ESXi_IP_ADDR>:/.
    ESXi 6.7: scp $HOME/NVIDIA/ESX6.7/NVIDIA**.vib root@<ESXi_IP_ADDR>:/.

  5. Use the following steps to log in as root to the ESXi host and install the NVIDIA vGPU manager in ESXi.

    1. Run the following command to log in to the ESXi host as the root user:
      ssh root@<ESXi_IP_ADDRESS>

    2. Run the following command to verify that no NVIDIA GPU drivers are currently installed:
      nvidia-smi
      This command should return the message nvidia-smi: not found.

    3. Run the following commands to enable maintenance mode on the host and install the NVIDIA vGPU Manager from the VIB file:
      esxcli system maintenanceMode set --enable true
      esxcli software vib install -v /NVIDIA**.vib
      You should see the message Operation finished successfully.

    4. Run the following command and verify that all eight GPUs are listed in the command output:
      nvidia-smi

    5. Run the following command to verify that the NVIDIA vGPU package was installed and loaded correctly:
      vmkload_mod -l | grep nvidia
      The command should return output similar to the following: nvidia 816 13808

    6. Run the following commands to exit maintenance mode and reboot the host:
      esxcli system maintenanceMode set --enable false
      reboot -f

  6. Repeat steps 4 and 5 for any other newly deployed compute nodes with NVIDIA GPUs.

  7. Perform the following tasks using the instructions in the NVIDIA documentation site:

    1. Install the NVIDIA license server.

    2. Configure the virtual machine guests for NVIDIA vGPU software.

    3. If you are using vGPU-enabled desktops in a virtual desktop infrastructure (VDI) context, configure VMware Horizon View for NVIDIA vGPU software.
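
The copy-and-install flow in steps 4 and 5 can be sketched as a single dry-run script. The host address and VIB file name below are placeholders for your environment; the script only prints the commands it would run unless you pass dry_run=False against a real ESXi host:

```python
# Dry-run sketch of the driver copy and install flow above.
# "<ESXi_IP_ADDR>" and "NVIDIA-GRID.vib" are placeholders (assumptions),
# not real values from a deployment.
import subprocess

def run(cmd, dry_run=True):
    """Print the command in dry-run mode; otherwise execute it."""
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.call(cmd)

HOST = "<ESXi_IP_ADDR>"
steps = [
    ["scp", "NVIDIA-GRID.vib", f"root@{HOST}:/"],
    ["ssh", f"root@{HOST}", "esxcli system maintenanceMode set --enable true"],
    ["ssh", f"root@{HOST}", "esxcli software vib install -v /NVIDIA-GRID.vib"],
    ["ssh", f"root@{HOST}", "nvidia-smi"],
    ["ssh", f"root@{HOST}", "vmkload_mod -l | grep nvidia"],
    ["ssh", f"root@{HOST}", "esxcli system maintenanceMode set --enable false"],
    ["ssh", f"root@{HOST}", "reboot -f"],
]
for cmd in steps:
    run(cmd)
```

Keeping the commands in an ordered list mirrors the manual procedure and makes it easy to review the exact sequence before running it for real.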

Add the node to the cluster

You should configure NetApp HCI to use the new compute node.

What you’ll need
  • The vSphere instance NetApp HCI is using has vSphere Enterprise Plus licensing if you are adding the node to a deployment with Virtual Distributed Switches.

  • None of the vCenter or vSphere instances in use with NetApp HCI have expired licenses.

  • You have free and unused IPv4 addresses on the same network segment as existing nodes (the new node must be installed on the same network as existing nodes of its type).

  • You have the vCenter administrator account credentials ready.

Steps
  1. Open a web browser and browse to the IP address of the management node. For example:
    https://<ManagementNodeIP>

  2. Log in to NetApp Hybrid Cloud Control by providing the NetApp HCI storage cluster administrator credentials.

  3. In the Expand Installation pane, select Expand.
    The browser opens the NetApp Deployment Engine.

  4. Log in to the NetApp Deployment Engine by providing the NetApp HCI storage cluster administrator credentials.

  5. On the Welcome page, select Yes.

  6. On the End User License page, perform the following actions:

    1. Read the VMware End User License Agreement.

    2. If you accept the terms, select I accept at the end of the agreement text.

  7. Click Continue.

  8. On the vCenter page, perform the following steps:

    1. Enter a FQDN or IP address and administrator credentials for the vCenter instance associated with your NetApp HCI installation.

    2. Select Continue.

    3. Select an existing vSphere datacenter to which to add the new compute nodes, or select Create New Datacenter to add the new compute nodes to a new datacenter.

      If you select Create New Datacenter, the Cluster field is automatically populated.
    4. If you selected an existing datacenter, select a vSphere cluster with which the new compute nodes should be associated.

      If NetApp HCI cannot recognize the network settings of the cluster that you selected for expansion, ensure that the vmkernel and vmnic mappings for the management, storage, and vMotion networks are set to the deployment defaults.
    5. Select Continue.

  9. On the ESXi Credentials page, enter an ESXi root password for the compute node or nodes you are adding.
    You should use the same password that was created during the initial NetApp HCI deployment.

  10. Select Continue.

  11. If you created a new vSphere datacenter cluster, on the Network Topology page, select a network topology to match the new compute nodes you are adding.

    You can only select the two-cable option if your compute nodes are using the two-cable topology and the existing NetApp HCI deployment is configured with VLAN IDs.
  12. On the Available Inventory page, select the node to add to the existing NetApp HCI installation.

    For some compute nodes, you might need to enable EVC at the highest level your vCenter version supports before you can add them to your installation. You should use the vSphere client to enable EVC for these compute nodes. After you enable it, refresh the Inventory page and try adding the compute nodes again.
  13. Select Continue.

  14. Optional: If you created a new vSphere datacenter cluster, on the Network Settings page, import network information from an existing NetApp HCI deployment by selecting the Copy Setting from an Existing Cluster checkbox.
    This populates the default gateway and subnet information for each network.

  15. On the Network Settings page, some of the network information has been detected from the initial deployment. Each new compute node is listed by serial number, and you should assign new network information to it. For each new compute node, perform the following steps:

    1. If NetApp HCI detected a naming prefix, copy it from the Detected Naming Prefix field, and insert it as the prefix for the new unique hostname you add in the Hostname field.

    2. In the Management IP Address field, enter a management IP address for the compute node that is within the management network subnet.

    3. In the vMotion IP Address field, enter a vMotion IP address for the compute node that is within the vMotion network subnet.

    4. In the iSCSI A - IP Address field, enter an IP address for the first iSCSI port of the compute node that is within the iSCSI network subnet.

    5. In the iSCSI B - IP Address field, enter an IP address for the second iSCSI port of the compute node that is within the iSCSI network subnet.

  16. Select Continue.

  17. On the Review page in the Network Settings section, the new node is shown in bold text. If you need to make changes to information in any section, perform the following steps:

    1. Select Edit for that section.

    2. When finished making changes, select Continue on any subsequent pages to return to the Review page.

  18. Optional: If you do not want to send cluster statistics and support information to NetApp-hosted SolidFire Active IQ servers, clear the final checkbox.
    This disables real-time health and diagnostic monitoring for NetApp HCI. Disabling this feature removes the ability for NetApp to proactively support and monitor NetApp HCI to detect and resolve problems before production is affected.

  19. Select Add Nodes.
    You can monitor the progress while NetApp HCI adds and configures the resources.

  20. Optional: Verify that any new compute nodes are visible in vCenter.
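
The per-node address checks behind step 15 can be sketched in a few lines: each address you assign must fall inside its network's subnet. The subnets and addresses below are illustrative placeholders, not values from a real deployment:

```python
# Minimal sketch of validating new compute node addresses before entering
# them on the Network Settings page. All subnets and IPs are placeholder
# documentation ranges, not real deployment values.
import ipaddress

def in_subnet(ip: str, subnet: str) -> bool:
    """Return True if the address falls inside the given subnet."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(subnet)

node_addresses = {
    "management": ("192.0.2.50", "192.0.2.0/24"),
    "vmotion":    ("198.51.100.50", "198.51.100.0/24"),
    "iscsi_a":    ("203.0.113.50", "203.0.113.0/24"),
    "iscsi_b":    ("203.0.113.51", "203.0.113.0/24"),
}
for network, (ip, subnet) in node_addresses.items():
    print(network, in_subnet(ip, subnet))
```

A check like this catches a mistyped octet before the NetApp Deployment Engine rejects it during the expansion run.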