Replace an NVIDIA SN2100 cluster switch
Follow this procedure to replace a defective NVIDIA SN2100 switch in a cluster network. This is a nondisruptive procedure (NDU).
Review requirements
Ensure that:
-
The existing cluster is verified as completely functional, with at least one fully connected cluster switch.
-
All cluster ports are up.
-
All cluster logical interfaces (LIFs) are up and on their home ports.
-
The ONTAP cluster ping-cluster -node node1 command indicates that basic connectivity and larger-than-PMTU communication are successful on all paths.
For the replacement switch, ensure that:
-
Management network connectivity on the replacement switch is functional.
-
Console access to the replacement switch is in place.
-
The node connections are on ports swp1 through swp14.
-
All Inter-Switch Link (ISL) ports are disabled on ports swp15 and swp16.
-
The desired reference configuration file (RCF) and Cumulus operating system image are loaded onto the switch.
-
Initial customization of the switch is complete.
Also make sure that any previous site customizations, such as STP, SNMP, and SSH, are copied to the new switch.
You must execute the command for migrating a cluster LIF from the node where the cluster LIF is hosted.
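For example, to move node1_clus2 off its current port ahead of the maintenance, a command similar to the following could be issued; the LIF, node, and port names are taken from this procedure's examples and might differ in your environment, and, as the note above states, the command must be run from the node that currently hosts the LIF:
network interface migrate -vserver Cluster -lif node1_clus2 -destination-node node1 -destination-port e3a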
Enable console logging
NetApp strongly recommends that you enable console logging on the devices that you are using and take the following actions when replacing your switch:
-
Leave AutoSupport enabled during maintenance.
-
Trigger a maintenance AutoSupport before and after maintenance to disable case creation for the duration of the maintenance. See this Knowledge Base article SU92: How to suppress automatic case creation during scheduled maintenance windows for further details.
-
Enable session logging for any CLI sessions. For instructions on how to enable session logging, review the "Logging Session Output" section in this Knowledge Base article How to configure PuTTY for optimal connectivity to ONTAP systems.
Replace the switch
The examples in this procedure use the following switch and node nomenclature:
-
The names of the existing NVIDIA SN2100 switches are sw1 and sw2.
-
The name of the new NVIDIA SN2100 switch is nsw2.
-
The node names are node1 and node2.
-
The cluster ports on each node are named e3a and e3b.
-
The cluster LIF names are node1_clus1 and node1_clus2 for node1, and node2_clus1 and node2_clus2 for node2.
-
The prompt for changes to all cluster nodes is
cluster1::*>
-
Breakout ports take the format: swp[port]s[breakout port 0-3]. For example, four breakout ports on swp1 are swp1s0, swp1s1, swp1s2, and swp1s3.
This procedure is based on the following cluster network topology:
Show example topology
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000 auto/100000  healthy  false
e3b       Cluster      Cluster          up   9000 auto/100000  healthy  false

Node: node2
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000 auto/100000  healthy  false
e3b       Cluster      Cluster          up   9000 auto/100000  healthy  false

cluster1::*> network interface show -vserver Cluster

            Logical     Status     Network            Current       Current Is
Vserver     Interface   Admin/Oper Address/Mask       Node          Port    Home
----------- ----------- ---------- ------------------ ------------- ------- ----
Cluster
            node1_clus1 up/up      169.254.209.69/16  node1         e3a     true
            node1_clus2 up/up      169.254.49.125/16  node1         e3b     true
            node2_clus1 up/up      169.254.47.194/16  node2         e3a     true
            node2_clus2 up/up      169.254.19.183/16  node2         e3b     true

cluster1::*> network device-discovery show -protocol lldp

Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface    Platform
----------- ------ ------------------------- ------------ ----------------
node1      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3         -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp3         -
node2      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4         -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp4         -
cumulus@sw1:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3a
swp4       100G   Trunk/L2    node2              e3a
swp15      100G   BondMember  sw2                swp15
swp16      100G   BondMember  sw2                swp16

cumulus@sw2:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3b
swp4       100G   Trunk/L2    node2              e3b
swp15      100G   BondMember  sw1                swp15
swp16      100G   BondMember  sw1                swp16
Step 1: Prepare for replacement
-
If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message:
system node autosupport invoke -node * -type all -message MAINT=xh
where x is the duration of the maintenance window in hours.
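For example, the following command suppresses automatic case creation for a two-hour maintenance window:
system node autosupport invoke -node * -type all -message MAINT=2h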
-
Change the privilege level to advanced, entering y when prompted to continue:
set -privilege advanced
The advanced prompt (*>) appears.
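The output is similar to the following; the exact warning text can vary by ONTAP release:
cluster1::> set -privilege advanced

Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
Do you want to continue? {y|n}: y

cluster1::*>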
-
Install the appropriate RCF and image on the switch, nsw2, and make any necessary site preparations.
If necessary, verify, download, and install the appropriate versions of the RCF and Cumulus software for the new switch.
-
You can download the applicable Cumulus software for your cluster switches from the NVIDIA Support site. Follow the steps on the Download page to download the Cumulus Linux software for the version of ONTAP software you are installing.
-
The appropriate RCF is available from the NVIDIA Cluster and Storage Switches page. Follow the steps on the Download page to download the correct RCF for the version of ONTAP software you are installing.
Step 2: Configure ports and cabling
-
On the new switch nsw2, log in as admin and shut down all of the ports that will be connected to the node cluster interfaces (ports swp1 to swp14).
The LIFs on the cluster nodes should have already failed over to the other cluster port for each node.
Show example
cumulus@nsw2:~$ net add interface swp1s0-3, swp2s0-3, swp3-14 link down
cumulus@nsw2:~$ net pending
cumulus@nsw2:~$ net commit
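If you want to confirm the failover before continuing, you can check the "Is Home" column from any cluster node; this quick check is not part of the original command set, and the LIFs that were homed on ports connected to the defective switch should show false:
cluster1::*> network interface show -vserver Cluster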
-
Disable auto-revert on the cluster LIFs:
network interface modify -vserver Cluster -lif * -auto-revert false
Show example
cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert false

Warning: Disabling the auto-revert feature of the cluster logical interface may
effect the availability of your cluster network. Are you sure you want to
continue? {y|n}: y
-
Verify that auto-revert is disabled on all cluster LIFs:
net interface show -vserver Cluster -fields auto-revert
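The output should be similar to the following; the LIF names follow this procedure's examples, and every entry should report false:
cluster1::*> network interface show -vserver Cluster -fields auto-revert

vserver lif         auto-revert
------- ----------- -----------
Cluster node1_clus1 false
Cluster node1_clus2 false
Cluster node2_clus1 false
Cluster node2_clus2 false
4 entries were displayed.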
-
Shut down the ISL ports swp15 and swp16 on the SN2100 switch sw1.
Show example
cumulus@sw1:~$ net add interface swp15-16 link down
cumulus@sw1:~$ net pending
cumulus@sw1:~$ net commit
-
Remove all the cables from the SN2100 sw2 switch, and then connect them to the same ports on the SN2100 nsw2 switch.
-
Bring up the ISL ports swp15 and swp16 between the sw1 and nsw2 switches.
Show example
The following commands enable ISL ports swp15 and swp16 on switch sw1:
cumulus@sw1:~$ net del interface swp15-16 link down
cumulus@sw1:~$ net pending
cumulus@sw1:~$ net commit
The following example shows that the ISL ports are up on switch sw1:
cumulus@sw1:~$ net show interface

State  Name         Spd   MTU    Mode        LLDP            Summary
-----  -----------  ----  -----  ----------  --------------  -----------------------
...
...
UP     swp15        100G  9216   BondMember  nsw2 (swp15)    Master: cluster_isl(UP)
UP     swp16        100G  9216   BondMember  nsw2 (swp16)    Master: cluster_isl(UP)
The following example shows that the ISL ports are up on switch nsw2:
cumulus@nsw2:~$ net show interface

State  Name         Spd   MTU    Mode        LLDP            Summary
-----  -----------  ----  -----  ----------  --------------  -----------------------
...
...
UP     swp15        100G  9216   BondMember  sw1 (swp15)     Master: cluster_isl(UP)
UP     swp16        100G  9216   BondMember  sw1 (swp16)     Master: cluster_isl(UP)
-
Verify that port e3b is up on all nodes:
network port show -ipspace Cluster
Show example
The output should be similar to the following:
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000 auto/100000  healthy  false
e3b       Cluster      Cluster          up   9000 auto/100000  healthy  false

Node: node2
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000 auto/100000  healthy  false
e3b       Cluster      Cluster          up   9000 auto/100000  healthy  false
-
Verify that the cluster ports on each node are now connected to the cluster switches in the following way, from the nodes' perspective:
network device-discovery show -protocol lldp
Show example
cluster1::*> network device-discovery show -protocol lldp

Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface    Platform
----------- ------ ------------------------- ------------ ----------------
node1      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3         -
            e3b    nsw2 (b8:ce:f6:19:1b:b6)  swp3         -
node2      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4         -
            e3b    nsw2 (b8:ce:f6:19:1b:b6)  swp4         -
-
On switch nsw2, verify that all node cluster ports are up:
net show interface
Show example
cumulus@nsw2:~$ net show interface

State  Name         Spd   MTU    Mode        LLDP              Summary
-----  -----------  ----  -----  ----------  ----------------  -----------------------
...
...
UP     swp3         100G  9216   Trunk/L2                      Master: bridge(UP)
UP     swp4         100G  9216   Trunk/L2                      Master: bridge(UP)
UP     swp15        100G  9216   BondMember  sw1 (swp15)       Master: cluster_isl(UP)
UP     swp16        100G  9216   BondMember  sw1 (swp16)       Master: cluster_isl(UP)
-
Verify that each node has one connection to each switch:
net show lldp
Show example
The following example shows the appropriate results for both switches:
cumulus@sw1:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3a
swp4       100G   Trunk/L2    node2              e3a
swp15      100G   BondMember  nsw2               swp15
swp16      100G   BondMember  nsw2               swp16

cumulus@nsw2:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3b
swp4       100G   Trunk/L2    node2              e3b
swp15      100G   BondMember  sw1                swp15
swp16      100G   BondMember  sw1                swp16
-
Enable auto-revert on the cluster LIFs:
cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert true
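With auto-revert enabled, the LIFs should return to their home ports on their own. If any LIF has not reverted after a short wait, you can optionally revert it manually and then recheck; these commands are not part of the original steps:
cluster1::*> network interface revert -vserver Cluster -lif *
cluster1::*> network interface show -vserver Cluster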
-
On switch nsw2, bring up the ports connected to the network ports of the nodes.
Show example
cumulus@nsw2:~$ net del interface swp1-14 link down
cumulus@nsw2:~$ net pending
cumulus@nsw2:~$ net commit
-
Display information about the nodes in a cluster:
cluster show
Show example
This example shows that the node health for node1 and node2 in this cluster is true:
cluster1::*> cluster show

Node          Health  Eligibility
------------- ------- ------------
node1         true    true
node2         true    true
-
Verify that all physical cluster ports are up:
network port show -ipspace Cluster
Show example
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000 auto/100000  healthy  false
e3b       Cluster      Cluster          up   9000 auto/100000  healthy  false

Node: node2
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000 auto/100000  healthy  false
e3b       Cluster      Cluster          up   9000 auto/100000  healthy  false
Step 3: Verify the configuration
-
Verify that the cluster network is healthy.
Show example
cumulus@sw1:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost      RemotePort
---------  -----  ----------  --------------  -----------
swp3       100G   Trunk/L2    node1           e3a
swp4       100G   Trunk/L2    node2           e3a
swp15      100G   BondMember  nsw2            swp15
swp16      100G   BondMember  nsw2            swp16
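You can also rerun the connectivity check from the requirements section against each node to confirm that basic and larger-than-PMTU communication succeed on all paths (output omitted here):
cluster1::*> cluster ping-cluster -node node1
cluster1::*> cluster ping-cluster -node node2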
-
Change the privilege level back to admin:
set -privilege admin
-
If you suppressed automatic case creation, re-enable it by invoking an AutoSupport message:
system node autosupport invoke -node * -type all -message MAINT=END