Replace a DIMM - FAS500f

Contributors dougthomp netapp-martyh

You must replace a DIMM in the controller module when your system registers an increasing number of correctable error correction code (ECC) errors; failure to do so causes a system panic.

All other components in the system must be functioning properly; if not, you must contact technical support.

You must replace the failed component with a replacement FRU component you received from your provider.

Step 1: Shut down the impaired controller

You can shut down or take over the impaired controller using different procedures, depending on the storage system hardware configuration.

Option 1: Most configurations

To shut down the impaired node, you must determine the status of the node and, if necessary, take over the node so that the healthy node continues to serve data from the impaired node storage.

About this task

If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy node shows false for eligibility and health, you must correct the issue before shutting down the impaired node; see the Administration overview with the CLI.
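As a rough illustration of the pre-check described above, the following Python sketch flags any node other than the impaired one that reports false for health or eligibility. The dictionary keys and the sample data are hypothetical, chosen only for this example. The sketch assumes you have already collected each node's health and eligibility values yourself, and it does not communicate with ONTAP.

```python
from __future__ import annotations

from typing import Iterable, Mapping


def nodes_blocking_shutdown(
    nodes: Iterable[Mapping[str, object]], impaired: str
) -> list[str]:
    """Return the names of nodes, other than the impaired one, that report
    false for health or eligibility."""
    blocking = []
    for node in nodes:
        if node["name"] == impaired:
            continue  # the impaired node itself is not part of this check
        if not (node["health"] and node["eligibility"]):
            blocking.append(str(node["name"]))
    return blocking


# Hypothetical sample data, not captured from a real cluster.
sample = [
    {"name": "node1", "health": True, "eligibility": True},
    {"name": "node2", "health": False, "eligibility": True},
    {"name": "node3", "health": True, "eligibility": False},
]

print(nodes_blocking_shutdown(sample, impaired="node2"))  # prints ['node3']
```

An empty result means that no other node stands in the way of this particular check. A non-empty result lists the nodes whose issues must be corrected before you shut down the impaired node.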

Steps
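  1. If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport message, where number_of_hours_down is the length of the maintenance window in hours and the trailing h is literal: system node autosupport invoke -node * -type all -message MAINT=number_of_hours_downh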

    The following AutoSupport message suppresses automatic case creation for two hours: cluster1:*> system node autosupport invoke -node * -type all -message MAINT=2h

  2. Disable automatic giveback from the console of the healthy node: storage failover modify -node local -auto-giveback false

  3. Take the impaired node to the LOADER prompt:

    If the impaired node is displaying…        Then…

    The LOADER prompt                          Go to the next step.

    Waiting for giveback…                      Press Ctrl-C, and then respond y when prompted.

    System prompt or password prompt           Take over or halt the impaired node:
    (enter system password)
                                               • For an HA pair, take over the impaired node from the
                                                 healthy node: storage failover takeover -ofnode impaired_node_name

                                                 When the impaired node shows Waiting for giveback…,
                                                 press Ctrl-C, and then respond y.
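The command lines in the steps above can be assembled in order, which may help when preparing a runbook. The following is a minimal Python sketch under stated assumptions: the function name and its parameters are hypothetical, the command strings are copied from the steps above, and the sketch only builds text. It does not connect to or run anything on a storage system.

```python
from __future__ import annotations


def shutdown_prep_commands(
    hours_down: int,
    impaired_node: str,
    autosupport_enabled: bool = True,
    needs_takeover: bool = True,
) -> list[str]:
    """Return the CLI lines from the steps above, in order, as plain strings."""
    if hours_down < 1:
        raise ValueError("hours_down must be a positive number of hours")
    commands = []
    if autosupport_enabled:
        # Step 1 applies only when AutoSupport is enabled.
        commands.append(
            "system node autosupport invoke -node * -type all "
            f"-message MAINT={hours_down}h"
        )
    # Step 2 is issued from the console of the healthy node.
    commands.append("storage failover modify -node local -auto-giveback false")
    if needs_takeover:
        # Step 3 needs a takeover only when the impaired node is at the
        # system prompt or password prompt in an HA pair.
        commands.append(f"storage failover takeover -ofnode {impaired_node}")
    return commands


for line in shutdown_prep_commands(2, "impaired_node_name"):
    print(line)
```

Keeping the conditions as explicit parameters mirrors the wording of the steps: the AutoSupport message is skipped when AutoSupport is disabled, and the takeover is skipped when the impaired node is already at the LOADER prompt.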

Option 2: Controller is in a MetroCluster

Note Do not use this procedure if your system is in a two-node MetroCluster configuration.

To shut down the impaired node, you must determine the status of the node and, if necessary, take over the node so that the healthy node continues to serve data from the impaired node storage.

  • If you have a cluster with more than two nodes, it must be in quorum. If the cluster is not in quorum or a healthy node shows false for eligibility and health, you must correct the issue before shutting down the impaired node; see the Administration overview with the CLI.

  • If you have a MetroCluster configuration, you must have confirmed that the MetroCluster Configuration State is configured and that the nodes are in an enabled and normal state (metrocluster node show).
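The MetroCluster precondition above amounts to three checks per node. The following Python sketch shows one way to express it, assuming the values reported by metrocluster node show have already been transcribed into dictionaries. The key names (configuration_state, mirroring, mode) and the sample rows are hypothetical and are not the command's actual output format.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def metrocluster_ready(nodes: Sequence[Mapping[str, str]]) -> bool:
    """Return True only if every node is configured, enabled, and normal."""
    if not nodes:
        return False  # no data means the state has not been confirmed
    return all(
        node["configuration_state"] == "configured"
        and node["mirroring"] == "enabled"
        and node["mode"] == "normal"
        for node in nodes
    )


# Hypothetical rows, transcribed by hand for illustration only.
rows = [
    {"name": "site_a_node1", "configuration_state": "configured",
     "mirroring": "enabled", "mode": "normal"},
    {"name": "site_b_node1", "configuration_state": "configured",
     "mirroring": "enabled", "mode": "normal"},
]

print(metrocluster_ready(rows))
```

Treating an empty list as not ready is a deliberate choice in this sketch, so that missing data is never mistaken for a confirmed state.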

Steps
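  1. If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport message, where number_of_hours_down is the length of the maintenance window in hours and the trailing h is literal: system node autosupport invoke -node * -type all -message MAINT=number_of_hours_downh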

    The following AutoSupport message suppresses automatic case creation for two hours: cluster1:*> system node autosupport invoke -node * -type all -message MAINT=2h

  2. Disable automatic giveback from the console of the healthy node: storage failover modify -node local -auto-giveback false

  3. Take the impaired node to the LOADER prompt:

    If the impaired node is displaying…        Then…

    The LOADER prompt                          Go to the next step.

    Waiting for giveback…                      Press Ctrl-C, and then respond y when prompted.

    System prompt or password prompt           Take over or halt the impaired node:
    (enter system password)
                                               • For an HA pair, take over the impaired node from the
                                                 healthy node: storage failover takeover -ofnode impaired_node_name

                                                 When the impaired node shows Waiting for giveback…,
                                                 press Ctrl-C, and then respond y.
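The "If the impaired node is displaying / Then" table in step 3 is a lookup from console state to operator action. The following Python sketch restates it in that form. The enum and function names are hypothetical, and the action text is paraphrased from the table rather than taken from any tool.

```python
from __future__ import annotations

from enum import Enum


class ConsoleState(Enum):
    """What the impaired node's console is currently displaying."""
    LOADER = "LOADER prompt"
    WAITING_FOR_GIVEBACK = "Waiting for giveback"
    SYSTEM_OR_PASSWORD_PROMPT = "System prompt or password prompt"


# One entry per table row; each value is the ordered list of actions.
ACTIONS = {
    ConsoleState.LOADER: ["Go to the next step."],
    ConsoleState.WAITING_FOR_GIVEBACK: [
        "Press Ctrl-C.",
        "Respond y when prompted.",
    ],
    ConsoleState.SYSTEM_OR_PASSWORD_PROMPT: [
        "From the healthy node, run: "
        "storage failover takeover -ofnode impaired_node_name",
        "When the impaired node shows Waiting for giveback, press Ctrl-C.",
        "Respond y.",
    ],
}


def next_actions(state: ConsoleState) -> list[str]:
    """Return the ordered operator actions for the given console state."""
    return ACTIONS[state]


for step in next_actions(ConsoleState.SYSTEM_OR_PASSWORD_PROMPT):
    print(step)
```

Note that the third row ends in the same Ctrl-C and y sequence as the second row, because a successful takeover leaves the impaired node at Waiting for giveback.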

Step 2: Remove the controller module

You must remove the controller module from the chassis when you replace a component inside the controller module.

Make sure that you label the cables so that you know where they came from.

  1. If you are not already grounded, properly ground yourself.

  2. Unplug the controller module power supplies from the source.

  3. Release the power cable retainers, and then unplug the cables from the power supplies.

  4. Insert your forefinger into the latching mechanism on either side of the controller module, press the lever with your thumb, and gently pull the controller a few inches out of the chassis.

    Note If you have difficulty removing the controller module, place your index fingers through the finger holes from the inside (by crossing your arms).

    [Figure: removing the controller module. 1: Lever; 2: Latching mechanism]

  5. Using both hands, grasp the controller module sides and gently pull it out of the chassis and set it on a flat, stable surface.

  6. Turn the thumbscrew on the front of the controller module anti-clockwise and open the controller module cover.

    [Figure: opening the controller module cover. 1: Thumbscrew; 2: Controller module cover]

  7. Lift out the air duct cover.

    [Figure: removing the air duct cover]

Step 3: Replace a DIMM

To replace a DIMM, you must locate it in the controller module, using either the DIMM map label on top of the air duct or the LED next to the DIMM, and then replace it following the specific sequence of steps.

You can use the following video or the tabulated steps to replace a DIMM:

  1. Replace the impaired DIMM on your controller module.

    The DIMMs are in slots 1 and 3 on the motherboard. Slots 2 and 4 are left empty. Do not attempt to install DIMMs into these slots.

    Note The fault LED located on the board next to each DIMM blinks every two seconds.
    [Figure: replacing a DIMM]

    1. Note the orientation of the DIMM in the socket so that you can insert the replacement DIMM in the proper orientation.

    2. Slowly push apart the DIMM ejector tabs on either side of the DIMM, and slide the DIMM out of the slot.

    3. Leave DIMM ejector tabs on the connector in the open position.

    4. Remove the replacement DIMM from the antistatic shipping bag, hold the DIMM by the corners, and align it to the slot.

      Note Hold the DIMM by the edges to avoid pressure on the components on the DIMM circuit board.
    5. Insert the replacement DIMM squarely into the slot.

      The DIMM fits tightly in the socket. If it does not seat properly, remove it, realign it with the socket, and reinsert it.

    6. Visually inspect the DIMM to verify that it is evenly aligned and fully inserted into the socket.
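The slot rule stated in step 1 (DIMMs in slots 1 and 3, slots 2 and 4 left empty) can be captured as a small guard, for example in a checklist or inventory script. The following Python sketch is illustrative only: the function and constant names are hypothetical, and it encodes nothing beyond the slot numbers given above.

```python
POPULATED_SLOTS = frozenset({1, 3})  # slots that hold DIMMs
EMPTY_SLOTS = frozenset({2, 4})      # slots that must stay empty


def check_dimm_slot(slot: int) -> None:
    """Raise ValueError if a DIMM must not be installed in the given slot."""
    if slot in EMPTY_SLOTS:
        raise ValueError(f"slot {slot} must be left empty")
    if slot not in POPULATED_SLOTS:
        raise ValueError(f"slot {slot} is not a DIMM slot on this motherboard")


for candidate in (1, 2, 3, 4):
    try:
        check_dimm_slot(candidate)
        print(f"slot {candidate}: OK to install")
    except ValueError as err:
        print(f"slot {candidate}: {err}")
```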

Step 4: Install the controller module

After you have replaced the component in the controller module, you must reinstall the controller module into the chassis, and then boot it to Maintenance mode.

You can use the following illustration or the written steps to install the replacement controller module in the chassis.

  1. If you have not already done so, install the air duct.

    [Figure: installing the air duct cover]

  2. Close the controller module cover and tighten the thumbscrew.

    [Figure: closing the controller module cover. 1: Controller module cover; 2: Thumbscrew]

  3. Insert the controller module into the chassis:

    1. Ensure the latching mechanism arms are locked in the fully extended position.

    2. Using both hands, align and gently slide the controller module into the latching mechanism arms until it stops.

    3. Place your index fingers through the finger holes from the inside of the latching mechanism.

    4. Press your thumbs down on the orange tabs on top of the latching mechanism and gently push the controller module over the stop.

    5. Release your thumbs from the top of the latching mechanisms and continue pushing until the latching mechanisms snap into place.

      The controller module begins to boot as soon as it is fully seated in the chassis. Be prepared to interrupt the boot process.

    The controller module should be fully inserted and flush with the edges of the chassis.

  4. Cable the management and console ports only, so that you can access the system to perform the tasks in the following sections.

    Note You will connect the rest of the cables to the controller module later in this procedure.

Step 5: Run diagnostics

After you have replaced a component in your system, you should run diagnostic tests on that component.

Your system must be at the LOADER prompt to start diagnostics.

All commands in the diagnostic procedures are issued from the node where the component is being replaced.

  1. If the node to be serviced is not at the LOADER prompt, halt the node: system node halt -node node_name

    After you issue the command, you should wait until the system stops at the LOADER prompt.

  2. At the LOADER prompt, load the special drivers that system-level diagnostics require to function properly: boot_diags

  3. Select Scan System from the displayed menu to enable running the diagnostic tests.

  4. Select Test Memory from the displayed menu.

  5. Proceed based on the result of the preceding step:

    • If the test failed, correct the failure, and then rerun the test.

    • If the test reported no failures, select Reboot from the menu to reboot the system.

Step 6: Return the failed part to NetApp

After you replace the part, you can return the failed part to NetApp, as described in the RMA instructions shipped with the kit. Contact technical support at NetApp Support, 888-463-8277 (North America), 00-800-44-638277 (Europe), or +800-800-80-800 (Asia/Pacific) if you need the RMA number or additional help with the replacement procedure.