Skip to main content
Cluster and storage switches

Replace a NVIDIA SN2100 storage switch

Contributors netapp-yvonneo netapp-jolieg

You must be aware of certain configuration information, port connections and cabling requirements when you replace NVIDIA SN2100 storage switches.

Before you begin

You must verify that the following conditions exist before installing the Cumulus software and RCFs on a NVIDIA SN2100 storage switch:

  • Your system can support NVIDIA SN2100 storage switches.

  • You must have downloaded the applicable RCFs.

  • The Hardware Universe provides full details of supported ports and their configurations.

About this task

The existing network configuration must have the following characteristics:

  • Make sure that all troubleshooting steps have been completed to confirm that your switch needs replacing.

  • Management connectivity must exist on both switches.

    Note Make sure that all troubleshooting steps have been completed to confirm that your switch needs replacing.

The replacement NVIDIA SN2100 switch must have the following characteristics:

  • Management network connectivity must be functional.

  • Console access to the replacement switch must be in place.

  • The appropriate RCF and Cumulus operating system image must be loaded onto the switch.

  • Initial customization of the switch must be complete.

Procedure summary

This procedure replaces the second NVIDIA SN2100 storage switch sw2 with the new NVIDIA SN2100 switch nsw2. The two nodes are node1 and node2.

Steps to complete:

  • Confirm the switch to be replaced is sw2.

  • Disconnect the cables from switch sw2.

  • Reconnect the cables to switch nsw2.

  • Verify all device configurations on switch nsw2.

Steps
  1. If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message: system node autosupport invoke -node * -type all - message MAINT=xh

    x is the duration of the maintenance window in hours.

  2. Change the privilege level to advanced, entering y when prompted to continue: set -privilege advanced

  3. Check on the health status of the storage node ports to make sure that there is connection to storage switch S1:

    storage port show -port-type ENET

    Show example
    cluster1::*> storage port show -port-type ENET
                                      Speed                     VLAN
    Node           Port Type  Mode    (Gb/s) State    Status      ID
    -------------- ---- ----- ------- ------ -------- --------- ----
    node1
                   e3a  ENET  storage 100    enabled  online      30
                   e3b  ENET  storage   0    enabled  offline     30
                   e7a  ENET  storage   0    enabled  offline     30
                   e7b  ENET  storage 100    enabled  online      30
    node2
                   e3a  ENET  storage 100    enabled  online      30
                   e3b  ENET  storage   0    enabled  offline     30
                   e7a  ENET  storage   0    enabled  offline     30
                   e7b  ENET  storage 100    enabled  online      30
    cluster1::*>
  4. Verify that storage switch sw1 is available: network device-discovery show

    Show example
    cluster1::*> network device-discovery show protocol lldp
    Node/      Local Discovered
    Protocol   Port	 Device (LLDP: ChassisID)  Interface   Platform
    --------   ----  -----------------------   ---------   ---------
    node1/lldp
               e3a   sw1 (b8:ce:f6:19:1b:42)   swp3        -
    node2/lldp
               e3a   sw1 (b8:ce:f6:19:1b:42)   swp4        -
    cluster1::*>
  5. Run the net show interface command on the working switch to confirm that you can see both nodes and all shelves: net show interface

    Show example
    cumulus@sw1:~$ net show interface
    
    State  Name    Spd   MTU    Mode        LLDP                  Summary
    -----  ------  ----  -----  ----------  --------------------  --------------------
    ...
    ...
    UP     swp1    100G  9216   Trunk/L2   node1 (e3a)             Master: bridge(UP)
    UP     swp2    100G  9216   Trunk/L2   node2 (e3a)             Master: bridge(UP)
    UP     swp3    100G  9216   Trunk/L2   SHFFG1826000112 (e0b)   Master: bridge(UP)
    UP     swp4    100G  9216   Trunk/L2   SHFFG1826000112 (e0b)   Master: bridge(UP)
    UP     swp5    100G  9216   Trunk/L2   SHFFG1826000102 (e0b)   Master: bridge(UP)
    UP     swp6    100G  9216   Trunk/L2   SHFFG1826000102 (e0b)   Master: bridge(UP))
    ...
    ...
  6. Verify the shelf ports in the storage system: storage shelf port show -fields remote-device, remote-port

    Show example
    cluster1::*> storage shelf port show -fields remote-device, remote-port
    shelf   id  remote-port   remote-device
    -----   --  -----------   -------------
    3.20    0   swp3          sw1
    3.20    1   -             -
    3.20    2   swp4          sw1
    3.20    3   -             -
    3.30    0   swp5          sw1
    3.20    1   -             -
    3.30    2   swp6          sw1
    3.20    3   -             -
    cluster1::*>
  7. Remove all cables attached to storage switch sw2.

  8. Reconnect all cables to the replacement switch nsw2.

  9. Recheck the health status of the storage node ports: storage port show -port-type ENET

    Show example
    cluster1::*> storage port show -port-type ENET
                                        Speed                     VLAN
    Node             Port Type  Mode    (Gb/s) State    Status      ID
    ---------------- ---- ----- ------- ------ -------- --------- ----
    node1
                     e3a  ENET  storage 100    enabled  online      30
                     e3b  ENET  storage   0    enabled  offline     30
                     e7a  ENET  storage   0    enabled  offline     30
                     e7b  ENET  storage 100    enabled  online      30
    node2
                     e3a  ENET  storage 100    enabled  online      30
                     e3b  ENET  storage   0    enabled  offline     30
                     e7a  ENET  storage   0    enabled  offline     30
                     e7b  ENET  storage 100    enabled  online      30
    cluster1::*>
  10. Verify that both switches are available: net device-discovery show

    Show example
    cluster1::*> network device-discovery show protocol lldp
    Node/     Local Discovered
    Protocol  Port  Device (LLDP: ChassisID)  Interface	  Platform
    --------  ----  -----------------------   ---------   ---------
    node1/lldp
              e3a  sw1 (b8:ce:f6:19:1b:96)    swp1        -
              e7b  nsw2 (b8:ce:f6:19:1a:7e)   swp1        -
    node2/lldp
              e3a  sw1 (b8:ce:f6:19:1b:96)    swp2        -
              e7b  nsw2 (b8:ce:f6:19:1a:7e)   swp2        -
    cluster1::*>
  11. Verify the shelf ports in the storage system: storage shelf port show -fields remote-device, remote-port

    Show example
    cluster1::*> storage shelf port show -fields remote-device, remote-port
    shelf   id    remote-port     remote-device
    -----   --    -----------     -------------
    3.20    0     swp3            sw1
    3.20    1     swp3            nsw2
    3.20    2     swp4            sw1
    3.20    3     swp4            nsw2
    3.30    0     swp5            sw1
    3.20    1     swp5            nsw2
    3.30    2     swp6            sw1
    3.20    3     swp6            nsw2
    cluster1::*>
  12. Change the privilege level back to admin:

    set -privilege admin

  13. If you suppressed automatic case creation, re-enable it by invoking an AutoSupport message:

    system node autosupport invoke -node * -type all -message MAINT=END