Replace H610C and H615C nodes

You should replace the chassis if the compute node has a failure related to the CPU or the motherboard, or if the node does not power on. If an H610C compute node running NetApp HCI Bootstrap OS version 1.6 or later has a faulty DIMM, you can replace the DIMM and do not have to replace the chassis. Similarly, for H615C nodes you do not need to replace the chassis if a DIMM fails; you can replace only the failed DIMM.

Note For H610C and H615C, the terms "node" and "chassis" are used interchangeably, because the node and chassis are not separate components.

NetApp recommends using the NetApp Deployment Engine to add a replacement compute node. If you cannot proceed with using the NetApp Deployment Engine for ESXi installation, see the NetApp Knowledge Base article How to install ESXi on NetApp HCI compute node manually.

What you'll need
  • You have verified that the node has failed.

  • You have a replacement chassis.
    To order a replacement, you should contact NetApp Support.

  • You have an electrostatic discharge (ESD) wristband, or you have taken other antistatic protection.

  • You have labeled each cable that is connected to the chassis.

About this task

Alarms in the VMware vSphere Web Client alert you when a host fails. You must match the serial number of the failed host from the VMware vSphere Web Client with the serial number on the sticker at the back of the node.
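
If the failed node still responds over SSH, you can also read its serial number from the ESXi command line. This is an optional cross-check, not part of the documented procedure; <ESXi_IP_ADDR> is a placeholder for the node's management address.

    # Optional cross-check: the Serial Number field in the output should match
    # the number reported by vSphere and the sticker on the back of the node.
    ssh root@<ESXi_IP_ADDR> 'esxcli hardware platform get'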

Step 1: Prepare to replace the node

Before you replace the node, you should migrate the virtual machines (VMs) hosted on the node to an available host, and remove the node from the cluster. You should record details about the node, such as the serial number and networking information. Migrating the VMs and recording the node details also applies in the case of component failures where the node is still online and functioning, for example, a dual inline memory module (DIMM) failure.

Steps
  1. In the VMware vSphere Web Client, perform the steps to migrate the VMs to another available host. (An optional command-line check that no VMs remain on the node is sketched after these steps.)

    Note See the VMware documentation for the migration steps.
  2. Select the failed node, and select Monitor > Hardware Status > Sensors.

  3. Make a note of the serial number of the failed node. The following screenshot is only an example:

    [Screenshot example: the serial number of the failed node as shown in the VMware vSphere Web Client.]

    You need the serial number to identify the chassis by matching the number that you noted with the serial number on the sticker at the back of the node.

  4. Right-click the failed node and select Connection > Disconnect.

  5. Select Yes to confirm the action.

  6. Right-click the failed node and select Remove from Inventory.

  7. Select Yes to confirm the action.
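
After the VMs are migrated and before you disconnect the node (step 4), you can optionally confirm over SSH that no VMs remain registered on it. This is a sketch only, not part of the documented procedure; <ESXi_IP_ADDR> is a placeholder for the failed node's management address.

    # Optional: list the VMs still registered on the node.
    # Output that contains only the header line means no VMs remain.
    ssh root@<ESXi_IP_ADDR> 'vim-cmd vmsvc/getallvms'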

Step 2: Replace the node

After you remove the failed node from the cluster, you can remove the failed chassis, and install the replacement chassis.

Note Ensure that you have antistatic protection before you perform the steps here.
Steps
  1. Unpack the new chassis, and set it on a level surface.
    Keep the packaging material for when you return the failed chassis to NetApp.

  2. Label each cable that is inserted at the back of the chassis that you are going to remove.
    After you install the new chassis, you must insert the cables back into the original ports.

  3. Disconnect all the cables from the back of the chassis.

  4. Remove the chassis by unscrewing the thumbscrews on the mounting ears.
    You must package and return the failed chassis to NetApp.

  5. Slide the replacement chassis onto the rails.

    Caution Ensure that you do not use excessive force when sliding the chassis onto the rails.
  6. Only for H615C. Remove the DIMMs from the failed chassis and insert these DIMMs in the replacement chassis.

    Note You should replace the DIMMs in the same slots they were removed from in the failed node.
  7. Remove the two power supply units on either side of the failed chassis and insert them in the replacement chassis.

  8. Reconnect the cables to the ports from which you originally disconnected them.
    The labels that you added to the cables when you disconnected them will guide you.

    Caution If the airflow vents at the rear of the chassis are blocked by cables or labels, it can lead to premature component failures due to overheating.
    Do not force the cables into the ports; you might damage the cables, ports, or both.
  9. Power on the chassis.

Step 3: Add the node to the cluster

You should configure NetApp HCI to use the new compute node.

What you'll need
  • The vSphere instance that NetApp HCI uses has vSphere Enterprise Plus licensing if you are adding the node to a deployment with Virtual Distributed Switches.

  • None of the vCenter or vSphere instances in use with NetApp HCI have expired licenses.

  • You have free and unused IPv4 addresses on the same network segment as existing nodes (the new node must be installed on the same network as existing nodes of its type). An optional check for candidate addresses is sketched after this list.

  • You have the vCenter administrator account credentials ready.
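
Before you start the wizard, you might want to confirm that the addresses you plan to assign (management, vMotion, and iSCSI) are not already in use. This is a rough sanity check only, run from a Linux host on the same subnets; the angle-bracket values are placeholders.

    # Quick check of candidate addresses (angle-bracket values are placeholders).
    # A reply means the address is already in use; no reply is suggestive but not
    # definitive, because some hosts block ICMP.
    for ip in <MGMT_IP> <VMOTION_IP> <ISCSI_A_IP> <ISCSI_B_IP>; do
        ping -c 2 "$ip" >/dev/null 2>&1 && echo "$ip is in use" || echo "$ip appears free"
    done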

Steps
  1. Open the IP address of the management node in a web browser. For example:

    https://<ManagementNodeIP>
  2. Log in to NetApp Hybrid Cloud Control by providing the NetApp HCI storage cluster administrator credentials.

  3. In the Expand Installation pane, select Expand.

    The browser opens the NetApp Deployment Engine.

  4. Log in to the NetApp Deployment Engine by providing the local NetApp HCI storage cluster administrator credentials.

    Note You cannot log in using Lightweight Directory Access Protocol credentials.
  5. On the Welcome page, select Yes.

  6. On the End User License page, perform the following actions:

    1. Read the VMware End User License Agreement.

    2. If you accept the terms, select I accept at the end of the agreement text.

  7. Click Continue.

  8. On the vCenter page, perform the following steps:

    1. Enter an FQDN or IP address and administrator credentials for the vCenter instance associated with your NetApp HCI installation.

    2. Select Continue.

    3. Select an existing vSphere datacenter to which to add the new compute nodes, or select Create New Datacenter to add the new compute nodes to a new datacenter.

      Note If you select Create New Datacenter, the Cluster field is automatically populated.
    4. If you selected an existing datacenter, select a vSphere cluster with which the new compute nodes should be associated.

      Note If NetApp HCI cannot recognize the network settings of the cluster you selected for expansion, ensure that the vmkernel and vmnic mappings for the management, storage, and vMotion networks are set to the deployment defaults.
    5. Select Continue.

  9. On the ESXi Credentials page, enter an ESXi root password for the compute node or nodes you are adding.
    You should use the same password that was created during the initial NetApp HCI deployment.

  10. Select Continue.

  11. If you created a new vSphere datacenter cluster, on the Network Topology page, select a network topology to match the new compute nodes you are adding.

    Note You can only select the two-cable option if your compute nodes are using the two-cable topology and the existing NetApp HCI deployment is configured with VLAN IDs.
  12. On the Available Inventory page, select the node to add to the existing NetApp HCI installation.

    Tip For some compute nodes, you might need to enable EVC at the highest level your vCenter version supports before you can add them to your installation. You should use the vSphere client to enable EVC for these compute nodes. After you enable it, refresh the Inventory page and try adding the compute nodes again.
  13. Select Continue.

  14. Optional: If you created a new vSphere datacenter cluster, on the Network Settings page, import network information from an existing NetApp HCI deployment by selecting the Copy Setting from an Existing Cluster checkbox.
    This populates the default gateway and subnet information for each network.

  15. On the Network Settings page, some of the network information has been detected from the initial deployment. Each new compute node is listed by serial number, and you should assign new network information to it. For each new compute node, perform the following steps:

    1. If NetApp HCI detected a naming prefix, copy it from the Detected Naming Prefix field, and insert it as the prefix for the new unique hostname you add in the Hostname field.

    2. In the Management IP Address field, enter a management IP address for the compute node that is within the management network subnet.

    3. In the vMotion IP Address field, enter a vMotion IP address for the compute node that is within the vMotion network subnet.

    4. In the iSCSI A - IP Address field, enter an IP address for the first iSCSI port of the compute node that is within the iSCSI network subnet.

    5. In the iSCSI B - IP Address field, enter an IP address for the second iSCSI port of the compute node that is within the iSCSI network subnet.

  16. Select Continue.

  17. On the Review page in the Network Settings section, the new node is shown in bold text. If you need to make changes to information in any section, perform the following steps:

    1. Select Edit for that section.

    2. When finished making changes, select Continue on any subsequent pages to return to the Review page.

  18. Optional: If you do not want to send cluster statistics and support information to NetApp-hosted SolidFire Active IQ servers, clear the final checkbox.
    This disables real-time health and diagnostic monitoring for NetApp HCI. Disabling this feature removes the ability for NetApp to proactively support and monitor NetApp HCI to detect and resolve problems before production is affected.

  19. Select Add Nodes.
    You can monitor the progress while NetApp HCI adds and configures the resources.

  20. Optional: Verify that any new compute nodes are visible in vCenter.
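
As an alternative to checking vCenter, you can confirm the applied settings from the new compute node itself. This is an optional sketch; <NEW_NODE_MGMT_IP> is a placeholder for the management IP address that you assigned in step 15.

    # Optional: confirm the hostname and the IPv4 addresses on the VMkernel interfaces.
    ssh root@<NEW_NODE_MGMT_IP> 'esxcli system hostname get; esxcli network ip interface ipv4 get'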

Step 4: Install the GPU drivers

Compute nodes with NVIDIA graphics processing units (GPUs), like the H610C node, need the NVIDIA software drivers installed in VMware ESXi so that they can take advantage of the increased processing power. To install the GPU drivers, the compute node must have a GPU card.
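
Before you install the drivers, you can optionally confirm that ESXi detects the GPU hardware on the node. This sketch is not part of the documented procedure; <ESXi_IP_ADDR> is a placeholder for the compute node's management address.

    # Optional: list NVIDIA PCI devices that ESXi detects on the compute node.
    # Each GPU appears with a Vendor Name of NVIDIA Corporation.
    ssh root@<ESXi_IP_ADDR> 'esxcli hardware pci list | grep -i nvidia'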

Steps
  1. Open a browser and browse to the NVIDIA licensing portal at the following URL:
    https://nvid.nvidia.com/dashboard/

  2. Download the driver package for your environment to your computer.

    The following examples show the driver package versions for vSphere 6.0, 6.5, and 6.7:

    vSphere version    Driver package
    vSphere 6.0        NVIDIA-GRID-vSphere-6.0-390.94-390.96-392.05.zip
    vSphere 6.5        NVIDIA-GRID-vSphere-6.5-410.92-410.91-412.16.zip
    vSphere 6.7        NVIDIA-GRID-vSphere-6.7-410.92-410.91-412.16.zip

  3. Extract the driver package on your computer.
    The resulting .VIB file is the uncompressed driver file.

  4. Copy the .VIB driver file from your computer to ESXi running on the compute node. The Secure Copy Protocol (SCP) utility is included in most Linux distributions and is available as a downloadable utility for all versions of Windows.

    The following example shows the commands for ESXi 6.0, 6.5, and 6.7. The commands assume that the driver is located in the $HOME/NVIDIA/ESX6.x/ directory on the management host:

    ESXi version    Command
    ESXi 6.0        scp $HOME/NVIDIA/ESX6.0/NVIDIA**.vib root@<ESXi_IP_ADDR>:/.
    ESXi 6.5        scp $HOME/NVIDIA/ESX6.5/NVIDIA**.vib root@<ESXi_IP_ADDR>:/.
    ESXi 6.7        scp $HOME/NVIDIA/ESX6.7/NVIDIA**.vib root@<ESXi_IP_ADDR>:/.

  5. Use the following steps to log in as root to the ESXi host and install the NVIDIA vGPU manager in ESXi.

    1. Run the following command to log in to the ESXi host as the root user:
      ssh root@<ESXi_IP_ADDRESS>

    2. Run the following command to verify that no NVIDIA GPU drivers are currently installed:
      nvidia-smi
      This command should return the message nvidia-smi: not found.

    3. Run the following commands to enable maintenance mode on the host and install the NVIDIA vGPU Manager from the VIB file:
      esxcli system maintenanceMode set --enable true
      esxcli software vib install -v /NVIDIA**.vib
      You should see the message Operation finished successfully.

    4. Run the following command and verify that all eight GPU drivers are listed in the command output:
      nvidia-smi

    5. Run the following command to verify that the NVIDIA vGPU package was installed and loaded correctly:
      vmkload_mod -l | grep nvidia
      The command should return output similar to the following: nvidia 816 13808

    6. Run the following commands to exit maintenance mode and reboot the host:
      esxcli system maintenanceMode set --enable false
      reboot -f

  6. Repeat steps 4 and 5 for any other newly deployed compute nodes with NVIDIA GPUs. (An optional verification sketch is provided at the end of these steps.)

  7. Perform the following tasks using the instructions on the NVIDIA documentation site:

    1. Install the NVIDIA license server.

    2. Configure the virtual machine guests for NVIDIA vGPU software.

    3. If you are using vGPU-enabled desktops in a virtual desktop infrastructure (VDI) context, configure VMware Horizon View for NVIDIA vGPU software.
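
After the driver installation is complete on all GPU compute nodes (steps 4 and 5), you can optionally confirm on each node that the NVIDIA vGPU Manager VIB is installed and that the GPUs are visible. This is a sketch only; the angle-bracket values are placeholders for your GPU compute node addresses.

    # Optional: verify the NVIDIA VIB and the detected GPUs on each GPU compute node.
    for host in <ESXi_IP_ADDR_1> <ESXi_IP_ADDR_2>; do
        echo "== ${host} =="
        ssh root@"${host}" 'esxcli software vib list | grep -i nvidia; nvidia-smi -L'
    done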