Skip to main content
Cluster and storage switches

Replace a NVIDIA SN2100 storage switch

Contributors netapp-yvonneo netapp-jolieg

You can replace a defective NVIDIA SN2100 storage switch. This is a nondisruptive procedure.

What you'll need

Before installing the Cumulus software and RCFs on a NVIDIA SN2100 storage switch, ensure that:

  • Your system can support NVIDIA SN2100 storage switches.

  • You have downloaded the applicable RCFs.

The Hardware Universe provides full details of supported ports and their configurations.

The existing network configuration must have the following characteristics:

  • Make sure that all troubleshooting steps have been completed to confirm that your switch needs replacing.

  • Management connectivity must exist on both switches.

    Note Make sure that all troubleshooting steps have been completed to confirm that your switch needs replacing.

The replacement NVIDIA SN2100 switch must have the following characteristics:

  • Management network connectivity is functional.

  • Console access to the replacement switch is in place.

  • The appropriate RCF and Cumulus operating system image is loaded onto the switch.

  • Initial customization of the switch is complete.

Procedure summary

This procedure replaces the second NVIDIA SN2100 storage switch sw2 with the new NVIDIA SN2100 switch nsw2. The two nodes are node1 and node2.

Steps to complete:

  • Confirm the switch to be replaced is sw2.

  • Disconnect the cables from switch sw2.

  • Reconnect the cables to switch nsw2.

  • Verify all device configurations on switch nsw2.

Steps
  1. If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message:

    system node autosupport invoke -node * -type all - message MAINT=xh

    x is the duration of the maintenance window in hours.

  2. Change the privilege level to advanced, entering y when prompted to continue:

    set -privilege advanced

  3. Check on the health status of the storage node ports to make sure that there is connection to storage switch S1:

    storage port show -port-type ENET

    Show example
    cluster1::*> storage port show -port-type ENET
                                      Speed                     VLAN
    Node           Port Type  Mode    (Gb/s) State    Status      ID
    -------------- ---- ----- ------- ------ -------- --------- ----
    node1
                   e3a  ENET  storage 100    enabled  online      30
                   e3b  ENET  storage   0    enabled  offline     30
                   e7a  ENET  storage   0    enabled  offline     30
                   e7b  ENET  storage 100    enabled  online      30
    node2
                   e3a  ENET  storage 100    enabled  online      30
                   e3b  ENET  storage   0    enabled  offline     30
                   e7a  ENET  storage   0    enabled  offline     30
                   e7b  ENET  storage 100    enabled  online      30
    cluster1::*>
  4. Verify that storage switch sw1 is available:

    network device-discovery show -protocol lldp

    Show example
    cluster1::*> network device-discovery show -protocol lldp
    Node/       Local  Discovered
    Protocol    Port   Device (LLDP: ChassisID)  Interface         Platform
    ----------- ------ ------------------------- ----------------  ----------------
    node1/lldp
                e0M    sw1 (00:ea:bd:68:6a:e8)   Eth1/46           -
                e0b    sw2 (6c:b2:ae:5f:a5:b2)   Ethernet1/16      -
                e0c    SHFFG1827000286 (d0:39:ea:1c:16:92)
                                                 e0a               -
                e0e    sw3 (6c:b2:ae:5f:a5:ba)   Ethernet1/18      -
                e0f    SHFFG1827000286 (00:a0:98:fd:e4:a9)
                                                 e0b               -
                e0g    sw4 (28:ac:9e:d5:4a:9c)   Ethernet1/11      -
                e0h    sw5 (6c:b2:ae:5f:a5:ca)   Ethernet1/22      -
                e1a    sw6 (00:f6:63:10:be:7c)   Ethernet1/33      -
                e1b    sw7 (00:f6:63:10:be:7d)   Ethernet1/34      -
                e2a    sw8 (b8:ce:f6:91:3d:88)   Ethernet1/35      -
    Press <space> to page down, <return> for next line, or 'q' to quit...
    10 entries were displayed.
  5. Run the net show interface command on the working switch to confirm that you can see both nodes and all shelves:

    net show interface

    Show example
    cumulus@sw1:~$ net show interface
    
    State  Name    Spd   MTU    Mode        LLDP                  Summary
    -----  ------  ----  -----  ----------  --------------------  --------------------
    ...
    ...
    UP     swp1    100G  9216   Trunk/L2   node1 (e3a)             Master: bridge(UP)
    UP     swp2    100G  9216   Trunk/L2   node2 (e3a)             Master: bridge(UP)
    UP     swp3    100G  9216   Trunk/L2   SHFFG1826000112 (e0b)   Master: bridge(UP)
    UP     swp4    100G  9216   Trunk/L2   SHFFG1826000112 (e0b)   Master: bridge(UP)
    UP     swp5    100G  9216   Trunk/L2   SHFFG1826000102 (e0b)   Master: bridge(UP)
    UP     swp6    100G  9216   Trunk/L2   SHFFG1826000102 (e0b)   Master: bridge(UP))
    ...
    ...
  6. Verify the shelf ports in the storage system:

    storage shelf port show -fields remote-device, remote-port

    Show example
    cluster1::*> storage shelf port show -fields remote-device, remote-port
    shelf   id  remote-port   remote-device
    -----   --  -----------   -------------
    3.20    0   swp3          sw1
    3.20    1   -             -
    3.20    2   swp4          sw1
    3.20    3   -             -
    3.30    0   swp5          sw1
    3.20    1   -             -
    3.30    2   swp6          sw1
    3.20    3   -             -
    cluster1::*>
  7. Remove all cables attached to storage switch sw2.

  8. Reconnect all cables to the replacement switch nsw2.

  9. Recheck the health status of the storage node ports:

    storage port show -port-type ENET

    Show example
    cluster1::*> storage port show -port-type ENET
                                        Speed                     VLAN
    Node             Port Type  Mode    (Gb/s) State    Status      ID
    ---------------- ---- ----- ------- ------ -------- --------- ----
    node1
                     e3a  ENET  storage 100    enabled  online      30
                     e3b  ENET  storage   0    enabled  offline     30
                     e7a  ENET  storage   0    enabled  offline     30
                     e7b  ENET  storage 100    enabled  online      30
    node2
                     e3a  ENET  storage 100    enabled  online      30
                     e3b  ENET  storage   0    enabled  offline     30
                     e7a  ENET  storage   0    enabled  offline     30
                     e7b  ENET  storage 100    enabled  online      30
    cluster1::*>
  10. Verify that both switches are available:

    net device-discovery show -protocol lldp

    Show example
    cluster1::*> network device-discovery show -protocol lldp
    Node/       Local  Discovered
    Protocol    Port   Device (LLDP: ChassisID)  Interface         Platform
    ----------- ------ ------------------------- ----------------  ----------------
    node1/lldp
                e0M    sw1 (00:ea:bd:68:6a:e8)   Eth1/46           -
                e0b    sw2 (6c:b2:ae:5f:a5:b2)   Ethernet1/16      -
                e0c    SHFFG1827000286 (d0:39:ea:1c:16:92)
                                                 e0a               -
                e0e    sw3 (6c:b2:ae:5f:a5:ba)   Ethernet1/18      -
                e0f    SHFFG1827000286 (00:a0:98:fd:e4:a9)
                                                 e0b               -
                e0g    sw4 (28:ac:9e:d5:4a:9c)   Ethernet1/11      -
                e0h    sw5 (6c:b2:ae:5f:a5:ca)   Ethernet1/22      -
                e1a    sw6 (00:f6:63:10:be:7c)   Ethernet1/33      -
                e1b    sw7 (00:f6:63:10:be:7d)   Ethernet1/34      -
                e2a    sw8 (b8:ce:f6:91:3d:88)   Ethernet1/35      -
    Press <space> to page down, <return> for next line, or 'q' to quit...
    10 entries were displayed.
  11. Verify the shelf ports in the storage system:

    storage shelf port show -fields remote-device, remote-port

    Show example
    cluster1::*> storage shelf port show -fields remote-device, remote-port
    shelf   id    remote-port     remote-device
    -----   --    -----------     -------------
    3.20    0     swp3            sw1
    3.20    1     swp3            nsw2
    3.20    2     swp4            sw1
    3.20    3     swp4            nsw2
    3.30    0     swp5            sw1
    3.20    1     swp5            nsw2
    3.30    2     swp6            sw1
    3.20    3     swp6            nsw2
    cluster1::*>
  12. Enable the Ethernet switch health monitor log collection feature for collecting switch-related log files, using the two commands:

    system switch ethernet log setup-password and system switch ethernet log enable-collection

    Enter: system switch ethernet log setup-password

    Show example
    cluster1::*> system switch ethernet log setup-password
    Enter the switch name: <return>
    The switch name entered is not recognized.
    Choose from the following list:
    sw1
    nsw2
    
    cluster1::*> system switch ethernet log setup-password
    
    Enter the switch name: sw1
    RSA key fingerprint is e5:8b:c6:dc:e2:18:18:09:36:63:d9:63:dd:03:d9:cc
    Do you want to continue? {y|n}::[n] y
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> system switch ethernet log setup-password
    
    Enter the switch name: nsw2
    RSA key fingerprint is 57:49:86:a1:b9:80:6a:61:9a:86:8e:3c:e3:b7:1f:b1
    Do you want to continue? {y|n}:: [n] y
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>

    Followed by:

    system switch ethernet log enable-collection

    Show example
    cluster1::*> system  switch ethernet log enable-collection
    
    Do you want to enable cluster log collection for all nodes in the cluster?
    {y|n}: [n] y
    
    Enabling cluster switch log collection.
    
    cluster1::*>
    Note If any of these commands return an error, contact NetApp support.
  13. Test the switch log collection feature:

    system switch ethernet log collect -device *

    Wait for 10 minutes and then check that the log collection was successful using the command: system switch ethernet log show

    Show example
    cluster1::*> system switch ethernet log show
    Log Collection Enabled: true
    
    Index  Switch                       Log Timestamp        Status
    ------ ---------------------------- -------------------  ---------    
    1      sw1 (b8:ce:f6:19:1b:42)      4/29/2022 03:05:25   complete   
    2      nsw2 (b8:ce:f6:19:1b:96)     4/29/2022 03:07:42   complete
  14. Change the privilege level back to admin:

    set -privilege admin

  15. If you suppressed automatic case creation, re-enable it by invoking an AutoSupport message:

    system node autosupport invoke -node * -type all -message MAINT=END