Preparing for DIMM replacement

When a dual inline memory module (DIMM) develops a fault, VMware ESXi displays alerts such as Memory Configuration Error, Memory Uncorrectable ECC, Memory Transition to Critical, and Memory Critical Overtemperature. Even if the alerts clear after a while, the hardware problem persists, so you must diagnose and address the failed DIMM. You can get information about the failed DIMM from the VMware vCenter Web Client. If you need more information than vCenter provides, run the hardware check in the terminal user interface (TUI).
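As a rough illustration of how the alert names above could be used to pick DIMM-related entries out of a list of alert messages, here is a minimal Python sketch. The alert names come from the text, which presents them as examples rather than a complete list. The function name `find_dimm_alerts`, the message format, and the host names are hypothetical and do not reflect any VMware API or log format.

```python
# Alert names as listed in the text above. The text says "such as",
# so this tuple may not cover every DIMM-related alert.
DIMM_ALERTS = (
    "Memory Configuration Error",
    "Memory Uncorrectable ECC",
    "Memory Transition to Critical",
    "Memory Critical Overtemperature",
)


def find_dimm_alerts(messages):
    """Return (message, alert name) pairs for messages that mention a listed alert.

    Matching is a case-insensitive substring check; each message is
    reported at most once, under the first alert name it matches.
    """
    hits = []
    for message in messages:
        lowered = message.lower()
        for alert in DIMM_ALERTS:
            if alert.lower() in lowered:
                hits.append((message, alert))
                break
    return hits


if __name__ == "__main__":
    # Hypothetical sample messages, made up for illustration only.
    sample = [
        "host-01: Memory Uncorrectable ECC",
        "host-02: Fan speed low",
    ]
    for message, alert in find_dimm_alerts(sample):
        print(f"DIMM alert '{alert}' found in: {message}")
```

A match only indicates that a node needs attention. Identifying the specific failed DIMM still requires vCenter or the hardware check described in the steps below.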

Before you begin

Steps

  1. Log in to the VMware vCenter Web Client, right-click the node that is reporting the error, and select the option to place the node in maintenance mode.
  2. Migrate the virtual machines (VMs) to another available host.
    See the VMware documentation for the migration steps.
  3. Power down the compute node from vSphere.
    Note: If you already know which DIMM needs to be replaced, you can skip the remaining steps and go to the next task, Removing and replacing the failed DIMM.

    Perform the following steps only if you need to access the TUI to further diagnose the DIMM issue.
  4. Connect a keyboard, video, and mouse (KVM) to the back of the node that reported the error in vSphere.
  5. Press the power button at the front of the node.
    It takes approximately six minutes for the node to boot. The screen displays a boot menu when the node boots up.
  6. Use the keyboard to select NetApp Safe Mode.
    Warning: You must do this within three seconds. If you miss the window, you must go through the boot process again.
  7. In the TUI window that opens, navigate to Maintenance Tasks > Check Hardware, and select OK.
    A window opens with the results of the hardware check. If the check detects a DIMM failure, the results include a timestamp and a slot identifier. You must record the CPU number and the DIMM slot number/ID; this will help you identify the failed DIMM in the chassis.

    The following screenshot shows sample output from the hardware check:

  8. Applies only to H410C and H615C. Perform the steps to identify the DIMM manufacturer.
    This step is important because H410C and H615C nodes include DIMMs from different manufacturers. You need to identify the manufacturer of the failed DIMM so that you can order the correct replacement.

    See Identifying the DIMM manufacturer for H410C and H615C.
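The details that the steps above ask you to record (timestamp, CPU number, and DIMM slot ID) and the model check in step 8 can be sketched as a small Python record. This is only an illustration: the class name `FailedDimm`, its field names, and the sample values are hypothetical, and the timestamp is stored as plain text because the format of the hardware check output is not assumed here.

```python
from dataclasses import dataclass

# Models that, according to the text, include DIMMs from different manufacturers.
MIXED_MANUFACTURER_MODELS = frozenset({"H410C", "H615C"})


@dataclass(frozen=True)
class FailedDimm:
    """Details noted from the hardware check results (hypothetical field names)."""

    node_model: str
    timestamp: str   # copied as displayed; no particular format is assumed
    cpu_number: str
    slot_id: str

    @property
    def needs_manufacturer_check(self) -> bool:
        """True when the node model is one that step 8 applies to."""
        return self.node_model.strip().upper() in MIXED_MANUFACTURER_MODELS


if __name__ == "__main__":
    # Placeholder values for illustration, not real hardware check output.
    dimm = FailedDimm(
        node_model="H410C",
        timestamp="(as shown in the results)",
        cpu_number="1",
        slot_id="A1",
    )
    if dimm.needs_manufacturer_check:
        print("Identify the DIMM manufacturer before ordering a replacement.")
```

Keeping the CPU number and slot ID together with the node model makes it easier to locate the DIMM in the chassis and to decide whether the manufacturer lookup is needed.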