
Hot swap an I/O module - AFF C80


You can hot swap an Ethernet I/O module in your AFF C80 storage system if a module fails and your storage system meets all ONTAP version requirements.

To hot swap an I/O module, first make sure your storage system is running ONTAP 9.18.1 GA or later. Then prepare the storage system and I/O module, hot-swap the failed module, bring the replacement module online, restore the storage system to normal operation, and return the failed module to NetApp.

About this task
  • You do not need to perform a manual takeover before replacing the failed I/O module.

  • Apply commands to the correct controller and I/O slot during the hot-swap:

    • The impaired controller is the controller where you are replacing the I/O module.

    • The healthy controller is the HA partner of the impaired controller.

  • You can turn on the storage system location (blue) LEDs to aid in physically locating the affected storage system. Log in to the BMC using SSH and enter the system location-led on command.

    The storage system includes three location LEDs: one on the operator display panel and one on each controller. The LEDs remain illuminated for 30 minutes.

    You can turn them off by entering the system location-led off command. If you are unsure if the LEDs are on or off, you can check their state by entering the system location-led show command.
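    For example, from a BMC SSH session:

      system location-led on      (turn the blue location LEDs on)
      system location-led show    (check whether the LEDs are on or off)
      system location-led off     (turn the LEDs off)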

Step 1: Ensure the storage system meets the procedure requirements

To use this procedure, your storage system must be running ONTAP 9.18.1 GA or later and must meet all of the following requirements.

Note If your storage system is not running ONTAP 9.18.1 GA or later, you cannot use this procedure; instead, use the replace an I/O module procedure.
  • You are hot swapping an Ethernet I/O module with an equivalent I/O module; the slot can have any combination of ports used for cluster, HA, and client traffic. You cannot change the I/O module type.

    Ethernet I/O modules with ports used for storage or MetroCluster are not hot-swappable.

  • Your storage system can be in a switchless or switched cluster configuration with any supported number of nodes.

  • All nodes in the cluster must be running the same ONTAP version (ONTAP 9.18.1 GA or later); nodes can run different patch levels of the same version.

    If nodes in your cluster are running different ONTAP versions, this is considered a mixed-version cluster and hot-swapping an I/O module is not supported.

  • The controllers in your storage system can be in either of the following states (you can confirm the state with the command shown after this list):

    • Both controllers can be up and running I/O (serving data).

    • Either controller can be in a takeover state if the takeover was caused by the failed I/O module and the nodes are otherwise functioning properly.

      In certain situations, ONTAP can automatically perform a takeover of either controller due to the failed I/O module. For example, if the failed I/O module contains all of the cluster ports on a controller (all of that controller's cluster links go down), ONTAP automatically performs a takeover.

  • All other components in the storage system must be functioning properly; if not, contact NetApp Support before continuing with this procedure.
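To confirm which state your controllers are in, you can run the storage failover show command from the cluster shell; if a takeover has occurred, the State Description column reports it. The node names and output below are illustrative:

node2::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------
node1          node2          true     Connected to node2
node2          node1          true     Connected to node1
2 entries were displayed.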

Step 2: Prepare the storage system and I/O module slot

Prepare the storage system and I/O module slot so that it is safe to remove the failed I/O module:

Steps
  1. Properly ground yourself.

  2. Label the cables to identify where they came from, and then unplug all cables from the target I/O module.

    Note

    The I/O module should be failed (ports should be in the link down state); however, if the links are still up and they contain the last functioning cluster port, unplugging the cables triggers an automatic takeover.

    Wait five minutes after unplugging the cables to ensure any takeovers or LIF failovers complete before continuing with this procedure.
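    If you are unsure of the link state, you can check it before unplugging with the network port show command; the Link column reports up or down for each port. The node name here is illustrative:

      node2::> network port show -node node2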

  3. If AutoSupport is enabled, suppress automatic case creation by invoking an AutoSupport message:

    system node autosupport invoke -node * -type all -message MAINT=<number of hours down>h

    For example, the following AutoSupport message suppresses automatic case creation for two hours:

    node2::> system node autosupport invoke -node * -type all -message MAINT=2h

  4. Disable automatic giveback if the partner node has been taken over:

    If either controller took over its partner automatically, disable automatic giveback:

    1. Enter the following command from the console of the controller that took over its partner:

      storage failover modify -node local -auto-giveback false

    2. Enter y when you see the prompt Do you want to disable auto-giveback?

    If both controllers are up and running I/O (serving data), go to the next step.
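    For example, from the console of the controller that took over its partner (any warning text that ONTAP prints before the prompt is omitted here):

      node2::> storage failover modify -node local -auto-giveback false

      Do you want to disable auto-giveback? {y|n}: y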

  5. Prepare the failed I/O module for removal by removing it from service and powering it off:

    1. Enter the following command:

      system controller slot module remove -node impaired_node_name -slot slot_number

    2. Enter y when you see the prompt Do you want to continue?

      For example, the following command prepares the failed module in slot 7 on node 2 (the impaired controller) for removal, and displays a message that it is safe to remove:

      node2::> system controller slot module remove -node node2 -slot 7
      
      Warning: IO_2X_100GBE_NVDA_NIC module in slot 7 of node node2 will be powered off for removal.
      
      Do you want to continue? {y|n}: y
      
      The module has been successfully removed from service and powered off. It can now be safely removed.
  6. Verify the failed I/O module is powered off:

    system controller slot module show

    The output should show powered-off in the status column for the failed module and its slot number.
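    For example, output similar to the following confirms that the module in slot 7 is powered off. The layout is illustrative and can vary by ONTAP release:

      node2::> system controller slot module show

      Node    Slot  Module                  Status
      ------- ----- ----------------------- ------------
      node2   7     IO_2X_100GBE_NVDA_NIC   powered-off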

Step 3: Replace the failed I/O module

Replace the failed I/O module with an equivalent I/O module.

Steps
  1. If you are not already grounded, properly ground yourself.

  2. Rotate the cable management tray down by pulling the buttons on the inside of the tray and rotating it down.

  3. Remove the I/O module from the controller module:

    Note The following illustration shows removing a horizontal and a vertical I/O module. Typically, you remove only one I/O module.

    [Illustration: Removing an I/O module. Callout 1: cam locking button.]

    1. Depress the cam latch button.

    2. Rotate the cam latch away from the module as far as it will go.

    3. Remove the module from the controller module by hooking your finger into the cam lever opening and pulling the module out of the controller module.

      Keep track of which slot the I/O module was in.

  4. Set the I/O module aside.

  5. Install the replacement I/O module into the target slot:

    1. Align the I/O module with the edges of the slot.

    2. Gently slide the module into the slot all the way into the controller module, and then rotate the cam latch all the way up to lock the module in place.

  6. Cable the I/O module.

  7. Rotate the cable management tray into the locked position.

Step 4: Bring the replacement I/O module online

Bring the replacement I/O module online, verify the I/O module ports initialized successfully, verify the slot is powered on, and then verify the I/O module is online and recognized.

About this task

After the I/O module is replaced and the ports return to a healthy state, LIFs are reverted to the replacement I/O module.

Steps
  1. Bring the replacement I/O module online:

    1. Enter the following command:

      system controller slot module insert -node impaired_node_name -slot slot_number

    2. Enter y when you see the prompt Do you want to continue?

      The output should confirm the I/O module was successfully brought online (powered on, initialized, and placed into service).

      For example, the following command brings slot 7 on node 2 (the impaired controller) online, and displays a message that the process was successful:

      node2::> system controller slot module insert -node node2 -slot 7
      
      Warning: IO_2X_100GBE_NVDA_NIC module in slot 7 of node node2 will be powered on and initialized.
      
      Do you want to continue? {y|n}: y
      
      The module has been successfully powered on, initialized and placed into service.
  2. Verify that each port on the I/O module successfully initialized:

    1. Enter the following command from the console of the impaired controller:

      event log show -event *hotplug.init*

      Note Any required firmware updates and port initialization might take several minutes.

      The output should show one or more hotplug.init.success EMS events in the Event column, indicating that each port on the I/O module initialized successfully.

      For example, the following output shows initialization succeeded for I/O ports e7b and e7a:

      node2::> event log show -event *hotplug.init*
      
      Time                Node             Severity      Event
      
      ------------------- ---------------- ------------- ---------------------------
      
      7/11/2025 16:04:06  node2      NOTICE        hotplug.init.success: Initialization of ports "e7b" in slot 7 succeeded
      
      7/11/2025 16:04:06  node2      NOTICE        hotplug.init.success: Initialization of ports "e7a" in slot 7 succeeded
      
      2 entries were displayed.
    2. If the port initialization fails, review the EMS log for the next steps to take.
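      For example, you can filter the log for errors reported by the impaired controller (the parameter values shown are illustrative):

        node2::> event log show -node node2 -severity ERROR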

  3. Verify the I/O module slot is powered on and ready for operation:

    system controller slot module show

    The output should show powered-on in the status column, indicating the I/O module slot is ready for operation.

  4. Verify that the I/O module is online and recognized.

    Enter the following command from the console of the impaired controller:

    system controller config show -node local -slot slot_number

    If the I/O module was successfully brought online and is recognized, the output shows I/O module information, including port information for the slot.

    For example, you should see output similar to the following for an I/O module in slot 7:

    node2::> system controller config show -node local -slot 7
    
    Node: node2
    Sub- Device/
    Slot slot Information
    ---- ---- -----------------------------
       7    - Dual 40G/100G Ethernet Controller CX6-DX
                      e7a MAC Address: d0:39:ea:59:69:74 (auto-100g_cr4-fd-up)
                              QSFP Vendor:        CISCO-BIZLINK
                              QSFP Part Number:   L45593-D218-D10
                              QSFP Serial Number: LCC2807GJFM-B
                      e7b MAC Address: d0:39:ea:59:69:75 (auto-100g_cr4-fd-up)
                              QSFP Vendor:        CISCO-BIZLINK
                              QSFP Part Number:   L45593-D218-D10
                              QSFP Serial Number: LCC2809G26F-A
                      Device Type:        CX6-DX PSID(NAP0000000027)
                      Firmware Version:   22.44.1700
                      Part Number:        111-05341
                      Hardware Revision:  20
                      Serial Number:      032403001370

Step 5: Restore the storage system to normal operation

Restore your storage system to normal operation by giving back storage to the controller that was taken over (as needed), restoring automatic giveback (as needed), verifying LIFs are on their home ports, and reenabling AutoSupport automatic case creation.

Steps
  1. Depending on the state of the controllers, give back storage and restore automatic giveback on the controller that was taken over, as needed:

    If either controller took over its partner automatically, give back storage and restore automatic giveback:

    1. Return the controller that was taken over to normal operation by giving back its storage:

      storage failover giveback -ofnode impaired_node_name

    2. Restore automatic giveback from the console of the controller that was taken over:

      storage failover modify -node local -auto-giveback true

    If both controllers are up and running I/O (serving data), go to the next step.
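    For example, assuming node2 is the controller that was taken over:

      node2::> storage failover giveback -ofnode node2
      node2::> storage failover modify -node local -auto-giveback true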

  2. Verify that the logical interfaces are reporting to their home node and ports:

    network interface show -is-home false

    If any LIFs are listed as false, revert them to their home ports:

    network interface revert -vserver * -lif *
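    For example, when all LIFs are already on their home ports, the show command returns no entries (output illustrative):

      node2::> network interface show -is-home false
      There are no entries matching your query.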

  3. If AutoSupport is enabled, restore automatic case creation:

    system node autosupport invoke -node * -type all -message MAINT=end

Step 6: Return the failed part to NetApp

Return the failed part to NetApp, as described in the RMA instructions shipped with the kit. See the Part Return and Replacements page for further information.