
Replace an NVIDIA SN2100 cluster switch


Follow this procedure to replace a defective NVIDIA SN2100 switch in a cluster network. This is a nondisruptive procedure (NDU).

Review requirements

Existing cluster and network infrastructure

Ensure that:

  • The existing cluster is verified as completely functional, with at least one fully connected cluster switch.

  • All cluster ports are up.

  • All cluster logical interfaces (LIFs) are up and on their home ports.

  • The ONTAP cluster ping-cluster -node node1 command indicates that basic connectivity and larger than PMTU communication are successful on all paths.
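
For example, a minimal check from the advanced privilege level looks like the following sketch (the cluster1::*> prompt and the node name follow the example nomenclature defined later in this procedure); confirm in the output that basic connectivity and larger-than-PMTU communication succeed on every path:

cluster1::*> set -privilege advanced
cluster1::*> cluster ping-cluster -node node1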

NVIDIA SN2100 replacement switch

Ensure that:

  • Management network connectivity on the replacement switch is functional.

  • Console access to the replacement switch is in place.

  • The node connections are ports swp1 through swp14.

  • The Inter-Switch Link (ISL) ports swp15 and swp16 are disabled.

  • The desired reference configuration file (RCF) and Cumulus operating system image are loaded onto the switch.

  • Initial customization of the switch is complete.

Also make sure that any previous site customizations, such as STP, SNMP, and SSH, are copied to the new switch.

Note You must execute the command for migrating a cluster LIF from the node where the cluster LIF is hosted.
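
For example, assuming node1_clus2 is currently hosted on node1, a manual migration might look like the following sketch (names are from the example nomenclature below; in this procedure the cluster LIFs normally fail over on their own when the switch ports are shut down):

cluster1::*> network interface migrate -vserver Cluster -lif node1_clus2 -destination-node node1 -destination-port e3a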

Replace the switch

About the examples

The examples in this procedure use the following switch and node nomenclature:

  • The names of the existing NVIDIA SN2100 switches are sw1 and sw2.

  • The name of the new NVIDIA SN2100 switch is nsw2.

  • The node names are node1 and node2.

  • The cluster ports on each node are named e3a and e3b.

  • The cluster LIF names are node1_clus1 and node1_clus2 for node1, and node2_clus1 and node2_clus2 for node2.

  • The prompt for changes to all cluster nodes is cluster1::*>

  • Breakout ports take the format: swp[port]s[breakout port 0-3]. For example, four breakout ports on swp1 are swp1s0, swp1s1, swp1s2, and swp1s3.
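
If you need to break out a switch port, the NCLU commands look similar to the following sketch (shown only to illustrate the naming convention; the breakout mode used here, 4x25G, is an assumption that depends on your cabling):

cumulus@nsw2:~$ net add interface swp1 breakout 4x25G
cumulus@nsw2:~$ net pending
cumulus@nsw2:~$ net commit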

About the cluster network topology

This procedure is based on the following cluster network topology:

Show example topology
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false

Node: node2
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false


cluster1::*> network interface show -vserver Cluster

            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
            node1_clus1  up/up    169.254.209.69/16  node1         e3a     true
            node1_clus2  up/up    169.254.49.125/16  node1         e3b     true
            node2_clus1  up/up    169.254.47.194/16  node2         e3a     true
            node2_clus2  up/up    169.254.19.183/16  node2         e3b     true


cluster1::*> network device-discovery show -protocol lldp
Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface     Platform
----------- ------ ------------------------- ------------  ----------------
node1      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3          -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp3          -
node2      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4          -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp4          -


cumulus@sw1:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3a
swp4       100G   Trunk/L2    node2              e3a
swp15      100G   BondMember  sw2                swp15
swp16      100G   BondMember  sw2                swp16


cumulus@sw2:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3b
swp4       100G   Trunk/L2    node2              e3b
swp15      100G   BondMember  sw1                swp15
swp16      100G   BondMember  sw1                swp16

Step 1: Prepare for replacement

  1. If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message:

    system node autosupport invoke -node * -type all -message MAINT=xh

    where x is the duration of the maintenance window in hours.
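
    For example, to suppress automatic case creation for a two-hour maintenance window:

    cluster1::*> system node autosupport invoke -node * -type all -message MAINT=2h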

  2. Change the privilege level to advanced, entering y when prompted to continue:

    set -privilege advanced

    The advanced prompt (*>) appears.

  3. Install the appropriate RCF and image on the switch, nsw2, and make any necessary site preparations.

    If necessary, verify, download, and install the appropriate versions of the RCF and Cumulus software for the new switch.

    1. You can download the applicable Cumulus software for your cluster switches from the NVIDIA Support site. Follow the steps on the Download page to download the Cumulus Linux for the version of ONTAP software you are installing.

    2. The appropriate RCF is available from the NVIDIA Cluster and Storage Switches page. Follow the steps on the Download page to download the correct RCF for the version of ONTAP software you are installing.
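
    For example, you can confirm the Cumulus Linux release currently running on nsw2 before and after the installation (which release you need depends on the compatibility information referenced above):

    cumulus@nsw2:~$ net show system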

Step 2: Configure ports and cabling

  1. On the new switch nsw2, log in as admin and shut down all of the ports that will be connected to the node cluster interfaces (ports swp1 to swp14).

    The LIFs on the cluster nodes should have already failed over to the other cluster port for each node.

    Show example
    cumulus@nsw2:~$ net add interface swp1s0-3, swp2s0-3, swp3-14 link down
    cumulus@nsw2:~$ net pending
    cumulus@nsw2:~$ net commit
  2. Disable auto-revert on the cluster LIFs:

    network interface modify -vserver Cluster -lif * -auto-revert false

    Show example
    cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert false
    
    Warning: Disabling the auto-revert feature of the cluster logical interface may effect the availability of your cluster network. Are you sure you want to continue? {y|n}: y
  3. Verify that auto-revert is disabled on all cluster LIFs:

    net interface show -vserver Cluster -fields auto-revert

  4. Shut down the ISL ports swp15 and swp16 on the SN2100 switch sw1.

    Show example
    cumulus@sw1:~$ net add interface swp15-16 link down
    cumulus@sw1:~$ net pending
    cumulus@sw1:~$ net commit
  5. Remove all the cables from the SN2100 sw2 switch, and then connect them to the same ports on the SN2100 nsw2 switch.

  6. Bring up the ISL ports swp15 and swp16 between the sw1 and nsw2 switches.

    Show example

    The following commands enable ISL ports swp15 and swp16 on switch sw1:

    cumulus@sw1:~$ net del interface swp15-16 link down
    cumulus@sw1:~$ net pending
    cumulus@sw1:~$ net commit

    The following example shows that the ISL ports are up on switch sw1:

    cumulus@sw1:~$ net show interface
    
    State  Name         Spd   MTU    Mode        LLDP           Summary
    -----  -----------  ----  -----  ----------  -------------- ----------------------
    ...
    ...
    UP     swp15        100G  9216   BondMember  nsw2 (swp15)   Master: cluster_isl(UP)
    UP     swp16        100G  9216   BondMember  nsw2 (swp16)   Master: cluster_isl(UP)

    The following example shows that the ISL ports are up on switch nsw2:

    cumulus@nsw2:~$ net show interface
    
    State  Name         Spd   MTU    Mode        LLDP           Summary
    -----  -----------  ----  -----  ----------  -------------  -----------------------
    ...
    ...
    UP     swp15        100G  9216   BondMember  sw1 (swp15)    Master: cluster_isl(UP)
    UP     swp16        100G  9216   BondMember  sw1 (swp16)    Master: cluster_isl(UP)
  7. Verify that port e3b is up on all nodes:

    network port show -ipspace Cluster

    Show example

    The output should be similar to the following:

    cluster1::*> network port show -ipspace Cluster
    
    Node: node1
                                                                             Ignore
                                                       Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU   Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ----- ------------ -------- -------
    e3a       Cluster      Cluster          up   9000  auto/100000  healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000  healthy  false
    
    
    Node: node2
                                                                             Ignore
                                                       Speed(Mbps) Health    Health
    Port      IPspace      Broadcast Domain Link MTU   Admin/Oper  Status    Status
    --------- ------------ ---------------- ---- ----- ----------- --------- -------
    e3a       Cluster      Cluster          up   9000  auto/100000  healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000  healthy  false
  8. The cluster ports on each node are now connected to cluster switches in the following way, from the nodes' perspective:

    Show example
    cluster1::*> network device-discovery show -protocol lldp
    Node/       Local  Discovered
    Protocol    Port   Device (LLDP: ChassisID)  Interface     Platform
    ----------- ------ ------------------------- ------------  ----------------
    node1      /lldp
                e3a    sw1  (b8:ce:f6:19:1a:7e)   swp3          -
                e3b    nsw2 (b8:ce:f6:19:1b:b6)   swp3          -
    node2      /lldp
                e3a    sw1  (b8:ce:f6:19:1a:7e)   swp4          -
                e3b    nsw2 (b8:ce:f6:19:1b:b6)   swp4          -
  9. Verify that all node cluster ports are up:

    net show interface

    Show example
    cumulus@nsw2:~$ net show interface
    
    State  Name         Spd   MTU    Mode        LLDP              Summary
    -----  -----------  ----  -----  ----------  ----------------- ----------------------
    ...
    ...
    UP     swp3         100G  9216   Trunk/L2                      Master: bridge(UP)
    UP     swp4         100G  9216   Trunk/L2                      Master: bridge(UP)
    UP     swp15        100G  9216   BondMember  sw1 (swp15)       Master: cluster_isl(UP)
    UP     swp16        100G  9216   BondMember  sw1 (swp16)       Master: cluster_isl(UP)
  10. Verify that each node has one connection to each switch:

    net show lldp

    Show example

    The following example shows the appropriate results for both switches:

    cumulus@sw1:~$ net show lldp
    
    LocalPort  Speed  Mode        RemoteHost         RemotePort
    ---------  -----  ----------  -----------------  -----------
    swp3       100G   Trunk/L2    node1              e3a
    swp4       100G   Trunk/L2    node2              e3a
    swp15      100G   BondMember  nsw2               swp15
    swp16      100G   BondMember  nsw2               swp16
    
    
    cumulus@nsw2:~$ net show lldp
    
    LocalPort  Speed  Mode        RemoteHost         RemotePort
    ---------  -----  ----------  -----------------  -----------
    swp3       100G   Trunk/L2    node1              e3b
    swp4       100G   Trunk/L2    node2              e3b
    swp15      100G   BondMember  sw1                swp15
    swp16      100G   BondMember  sw1                swp16
  11. Enable auto-revert on the cluster LIFs:

    cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert true

  12. On switch nsw2, bring up the ports connected to the network ports of the nodes.

    Show example
    cumulus@nsw2:~$ net del interface swp1-14 link down
    cumulus@nsw2:~$ net pending
    cumulus@nsw2:~$ net commit
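
    After these ports come up, the cluster LIFs with auto-revert enabled should return to their home ports. As an optional check (a sketch, not part of the original steps), you can confirm this from ONTAP and, if needed, revert the LIFs manually:

    cluster1::*> network interface show -vserver Cluster -fields is-home
    cluster1::*> network interface revert -vserver Cluster -lif *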
  13. Display information about the nodes in a cluster:

    cluster show

    Show example

    This example shows that the node health for node1 and node2 in this cluster is true:

    cluster1::*> cluster show
    
    Node          Health  Eligibility
    ------------- ------- ------------
    node1         true    true
    node2         true    true
  14. Verify that all physical cluster ports are up:

    network port show -ipspace Cluster

    Show example
    cluster1::*> network port show -ipspace Cluster
    
    Node: node1
                                                                             Ignore
                                                       Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU   Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ----- ------------ -------- -------
    e3a       Cluster      Cluster          up   9000  auto/100000  healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000  healthy  false
    
    Node: node2
                                                                             Ignore
                                                       Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU   Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ----- ------------ -------- -------
    e3a       Cluster      Cluster          up   9000  auto/100000  healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000  healthy  false

Step 3: Complete the procedure

  1. Verify that the cluster network is healthy.

    Show example
    cumulus@sw1:~$ net show lldp
    
    LocalPort  Speed  Mode        RemoteHost      RemotePort
    ---------  -----  ----------  --------------  -----------
    swp3       100G   Trunk/L2    node1           e3a
    swp4       100G   Trunk/L2    node2           e3a
    swp15      100G   BondMember  nsw2            swp15
    swp16      100G   BondMember  nsw2            swp16
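
    In addition to the switch-side LLDP output above, you can repeat the ONTAP-side check from the requirements section (a sketch; the node name follows the example nomenclature):

    cluster1::*> cluster ping-cluster -node node1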
  2. Create a password for the Ethernet switch health monitor log collection feature:

    system switch ethernet log setup-password

    Show example
    cluster1::*> system switch ethernet log setup-password
    Enter the switch name: <return>
    The switch name entered is not recognized.
    Choose from the following list:
    cs1
    cs2
    
    cluster1::*> system switch ethernet log setup-password
    
    Enter the switch name: cs1
    Would you like to specify a user other than admin for log collection? {y|n}: n
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> system switch ethernet log setup-password
    
    Enter the switch name: cs2
    Would you like to specify a user other than admin for log collection? {y|n}: n
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
  3. Enable the Ethernet switch health monitor log collection feature:

    system switch ethernet log modify -device <switch-name> -log-request true

    Show example
    cluster1::*> system switch ethernet log modify -device cs1 -log-request true
    
    Do you want to modify the cluster switch log collection configuration? {y|n}: [n] y
    
    Enabling cluster switch log collection.
    
    cluster1::*> system switch ethernet log modify -device cs2 -log-request true
    
    Do you want to modify the cluster switch log collection configuration? {y|n}: [n] y
    
    Enabling cluster switch log collection.

    Wait for 10 minutes and then check that the log collection completes:

    system switch ethernet log show

    Show example
    cluster1::*> system switch ethernet log show
    Log Collection Enabled: true
    
    Index  Switch                       Log Timestamp        Status
    ------ ---------------------------- -------------------  ---------    
    1      cs1 (b8:ce:f6:19:1b:42)      4/29/2022 03:05:25   complete   
    2      cs2 (b8:ce:f6:19:1b:96)      4/29/2022 03:07:42   complete
    Caution If any of these commands return an error or if the log collection does not complete, contact NetApp support.
  4. Change the privilege level back to admin:

    set -privilege admin

  5. If you suppressed automatic case creation, re-enable it by invoking an AutoSupport message:

    system node autosupport invoke -node * -type all -message MAINT=END