Migrate from a Cisco cluster switch to an NVIDIA SN2100 cluster switch
You can migrate Cisco cluster switches for an ONTAP cluster to NVIDIA SN2100 cluster switches. This is a nondisruptive procedure.
Review requirements
You must be aware of certain configuration information, port connections, and cabling requirements when you replace certain older Cisco cluster switches with NVIDIA SN2100 cluster switches. See Overview of installation and configuration for NVIDIA SN2100 switches.
The following Cisco cluster switches are supported:
- Nexus 9336C-FX2
- Nexus 92300YC
- Nexus 5596UP
- Nexus 3232C
- Nexus 3132Q-V

For details of supported ports and their configurations, see the Hardware Universe.
Ensure that:
- The existing cluster is properly set up and functioning.
- All cluster ports are in the up state to ensure nondisruptive operations.
- The NVIDIA SN2100 cluster switches are configured and running the proper version of Cumulus Linux with the reference configuration file (RCF) applied (see the verification sketch after these requirements).
- The existing cluster network configuration has the following:
  - A redundant and fully functional NetApp cluster using both older Cisco switches.
  - Management connectivity and console access to both the older Cisco switches and the new switches.
  - All cluster LIFs in the up state, with the cluster LIFs on their home ports.
  - ISL ports enabled and cabled between the older Cisco switches and between the new switches.
- Some of the ports on the NVIDIA SN2100 switches are configured to run at 40 GbE or 100 GbE.
- You have planned, migrated, and documented 40 GbE and 100 GbE connectivity from nodes to the NVIDIA SN2100 cluster switches.
NOTE: If you are changing the port speed of the e0a and e1a cluster ports on AFF A800 or AFF C800 systems, you might observe malformed packets being received after the speed conversion. See Bug 1570339 and the Knowledge Base article CRC errors on T6 ports after converting from 40GbE to 100GbE for guidance.
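To satisfy the switch requirement above, you can confirm the Cumulus Linux release and review the configuration applied on each SN2100 switch before you begin. This is a minimal sketch using NCLU commands; the output depends on your Cumulus Linux release and the RCF in use:

cumulus@sw1:~$ net show system
cumulus@sw1:~$ net show configuration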
Migrate the switches
In this procedure, Cisco Nexus 3232C cluster switches are used for example commands and outputs.
The examples in this procedure use the following switch and node nomenclature:
- The existing Cisco Nexus 3232C cluster switches are c1 and c2.
- The new NVIDIA SN2100 cluster switches are sw1 and sw2.
- The nodes are node1 and node2.
- The cluster LIFs are node1_clus1 and node1_clus2 on node1, and node2_clus1 and node2_clus2 on node2, respectively.
- The cluster1::*> prompt indicates the name of the cluster.
- The cluster ports used in this procedure are e3a and e3b.
- Breakout ports take the format swp[port]s[breakout port 0-3]. For example, four breakout ports on swp1 are swp1s0, swp1s1, swp1s2, and swp1s3. (A breakout configuration sketch follows this list.)
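If your cabling requires breakout ports and they are not already defined by the RCF, you can configure them with NCLU. This is a minimal sketch, assuming swp1 is split into four 25 GbE ports; adjust the port and breakout speed to match your environment and verify against your RCF before committing:

cumulus@sw1:~$ net add interface swp1 breakout 4x25G
cumulus@sw1:~$ net pending
cumulus@sw1:~$ net commit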
This procedure covers the following scenario:
- Switch c2 is replaced by switch sw2 first.
  - Shut down the ports to the cluster nodes. All ports must be shut down simultaneously to avoid cluster instability.
  - Cabling between the nodes and c2 is then disconnected from c2 and reconnected to sw2.
- Switch c1 is then replaced by switch sw1.
  - Shut down the ports to the cluster nodes. All ports must be shut down simultaneously to avoid cluster instability.
  - Cabling between the nodes and c1 is then disconnected from c1 and reconnected to sw1.
Step 1: Prepare for migration
- If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message:
  system node autosupport invoke -node * -type all -message MAINT=xh
  where x is the duration of the maintenance window in hours.
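  For example, to suppress automatic case creation for a two-hour maintenance window:
  system node autosupport invoke -node * -type all -message MAINT=2h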
- Change the privilege level to advanced, entering y when prompted to continue:
  set -privilege advanced
  The advanced prompt (*>) appears.
- Disable auto-revert on the cluster LIFs:
  network interface modify -vserver Cluster -lif * -auto-revert false
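  Optionally, you can confirm that auto-revert is now disabled on all cluster LIFs; the auto-revert field should report false for each LIF. A quick check:
  network interface show -vserver Cluster -fields auto-revert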
Step 2: Configure ports and cabling
- Determine the administrative or operational status for each cluster interface. Each port should display up for Link and healthy for Health Status.
  - Display the network port attributes:
    network port show -ipspace Cluster
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                       Ignore
                                                  Speed(Mbps)   Health Health
Port      IPspace    Broadcast Domain Link MTU    Admin/Oper    Status Status
--------- ---------- ---------------- ---- ----- ------------ -------- ------
e3a       Cluster    Cluster          up   9000   auto/100000  healthy false
e3b       Cluster    Cluster          up   9000   auto/100000  healthy false

Node: node2
                                                                       Ignore
                                                  Speed(Mbps)   Health Health
Port      IPspace    Broadcast Domain Link MTU    Admin/Oper    Status Status
--------- ---------- ---------------- ---- ----- ------------ -------- ------
e3a       Cluster    Cluster          up   9000   auto/100000  healthy false
e3b       Cluster    Cluster          up   9000   auto/100000  healthy false
  - Display information about the logical interfaces and their designated home nodes:
    network interface show -vserver Cluster
    Each LIF should display up/up for Status Admin/Oper and true for Is Home.
cluster1::*> network interface show -vserver Cluster
            Logical     Status     Network            Current     Current Is
Vserver     Interface   Admin/Oper Address/Mask       Node        Port    Home
----------- ----------- ---------- ------------------ ----------- ------- ----
Cluster
            node1_clus1 up/up      169.254.209.69/16  node1       e3a     true
            node1_clus2 up/up      169.254.49.125/16  node1       e3b     true
            node2_clus1 up/up      169.254.47.194/16  node2       e3a     true
            node2_clus2 up/up      169.254.19.183/16  node2       e3b     true
- Verify that the cluster ports on each node are connected to the existing cluster switches in the following way, from the nodes' perspective:
  network device-discovery show -protocol lldp
cluster1::*> network device-discovery show -protocol lldp
Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface         Platform
----------- ------ ------------------------- ----------------  ----------------
node1      /lldp
            e3a    c1 (6a:ad:4f:98:3b:3f)    Eth1/1            -
            e3b    c2 (6a:ad:4f:98:4c:a4)    Eth1/1            -
node2      /lldp
            e3a    c1 (6a:ad:4f:98:3b:3f)    Eth1/2            -
            e3b    c2 (6a:ad:4f:98:4c:a4)    Eth1/2            -
- Verify that the cluster ports and switches are connected in the following way, from the switches' perspective:
  show cdp neighbors
c1# show cdp neighbors

Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater,
                  V - VoIP-Phone, D - Remotely-Managed-Device,
                  s - Supports-STP-Dispute

Device-ID          Local Intrfce  Hldtme Capability  Platform      Port ID
node1              Eth1/1         124    H           AFF-A400      e3a
node2              Eth1/2         124    H           AFF-A400      e3a
c2                 Eth1/31        179    S I s       N3K-C3232C    Eth1/31
c2                 Eth1/32        175    S I s       N3K-C3232C    Eth1/32

c2# show cdp neighbors

Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater,
                  V - VoIP-Phone, D - Remotely-Managed-Device,
                  s - Supports-STP-Dispute

Device-ID          Local Intrfce  Hldtme Capability  Platform      Port ID
node1              Eth1/1         124    H           AFF-A400      e3b
node2              Eth1/2         124    H           AFF-A400      e3b
c1                 Eth1/31        175    S I s       N3K-C3232C    Eth1/31
c1                 Eth1/32        175    S I s       N3K-C3232C    Eth1/32
- Verify the connectivity of the remote cluster interfaces.
  You can use the network interface check cluster-connectivity command to start an accessibility check for cluster connectivity and then display the details:
  network interface check cluster-connectivity start
  network interface check cluster-connectivity show

  cluster1::*> network interface check cluster-connectivity start

  NOTE: Wait several seconds before running the show command to display the details.
cluster1::*> network interface check cluster-connectivity show
                                  Source           Destination      Packet
Node   Date                       LIF              LIF              Loss
------ -------------------------- ---------------- ---------------- -----------
node1
       3/5/2022 19:21:18 -06:00   node1_clus2      node2_clus1      none
       3/5/2022 19:21:20 -06:00   node1_clus2      node2_clus2      none
node2
       3/5/2022 19:21:18 -06:00   node2_clus2      node1_clus1      none
       3/5/2022 19:21:20 -06:00   node2_clus2      node1_clus2      none
  For all ONTAP releases, you can also use the cluster ping-cluster command to check the connectivity:
  cluster ping-cluster -node <name>
cluster1::*> cluster ping-cluster -node local
Host is node2
Getting addresses from network interface table...
Cluster node1_clus1 169.254.209.69 node1     e3a
Cluster node1_clus2 169.254.49.125 node1     e3b
Cluster node2_clus1 169.254.47.194 node2     e3a
Cluster node2_clus2 169.254.19.183 node2     e3b
Local = 169.254.47.194 169.254.19.183
Remote = 169.254.209.69 169.254.49.125
Cluster Vserver Id = 4294967293
Ping status:
....
Basic connectivity succeeds on 4 path(s)
Basic connectivity fails on 0 path(s)
................
Detected 9000 byte MTU on 4 path(s):
    Local 169.254.19.183 to Remote 169.254.209.69
    Local 169.254.19.183 to Remote 169.254.49.125
    Local 169.254.47.194 to Remote 169.254.209.69
    Local 169.254.47.194 to Remote 169.254.49.125
Larger than PMTU communication succeeds on 4 path(s)
RPC status:
2 paths up, 0 paths down (tcp check)
2 paths up, 0 paths down (udp check)
- On switch c2, shut down the ports connected to the cluster ports of the nodes in order to fail over the cluster LIFs.
(c2)# configure
Enter configuration commands, one per line. End with CNTL/Z.
(c2)(Config)# interface <interface_list>
(c2)(config-if-range)# shutdown
(c2)(config-if-range)# exit
(c2)(Config)# exit
(c2)#
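For example, assuming the node cluster ports connect to Ethernet 1/1 and 1/2 on c2, as shown in the show cdp neighbors output above, the shutdown might look like this (adjust the interface range to match your cabling):

(c2)# configure
(c2)(Config)# interface Eth1/1-2
(c2)(config-if-range)# shutdown
(c2)(config-if-range)# exit
(c2)(Config)# exit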
- Move the node cluster ports from the old switch c2 to the new switch sw2, using appropriate cabling supported by NVIDIA SN2100.
- Display the network port attributes:
  network port show -ipspace Cluster
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                       Ignore
                                                  Speed(Mbps)   Health Health
Port      IPspace    Broadcast Domain Link MTU    Admin/Oper    Status Status
--------- ---------- ---------------- ---- ----- ------------ -------- ------
e3a       Cluster    Cluster          up   9000   auto/100000  healthy false
e3b       Cluster    Cluster          up   9000   auto/100000  healthy false

Node: node2
                                                                       Ignore
                                                  Speed(Mbps)   Health Health
Port      IPspace    Broadcast Domain Link MTU    Admin/Oper    Status Status
--------- ---------- ---------------- ---- ----- ------------ -------- ------
e3a       Cluster    Cluster          up   9000   auto/100000  healthy false
e3b       Cluster    Cluster          up   9000   auto/100000  healthy false
- Verify that the cluster ports on each node are now connected to the cluster switches in the following way, from the nodes' perspective:
  network device-discovery show -protocol lldp
cluster1::*> network device-discovery show -protocol lldp
Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface         Platform
----------- ------ ------------------------- ----------------  ----------------
node1      /lldp
            e3a    c1 (6a:ad:4f:98:3b:3f)    Eth1/1            -
            e3b    sw2 (b8:ce:f6:19:1a:7e)   swp3              -
node2      /lldp
            e3a    c1 (6a:ad:4f:98:3b:3f)    Eth1/2            -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp4              -
- On switch sw2, verify that all node cluster ports are up:
  net show interface
cumulus@sw2:~$ net show interface

State  Name         Spd   MTU    Mode        LLDP              Summary
-----  -----------  ----  -----  ----------  ----------------- -----------------------
...
...
UP     swp3         100G  9216   Trunk/L2    e3b               Master: bridge(UP)
UP     swp4         100G  9216   Trunk/L2    e3b               Master: bridge(UP)
UP     swp15        100G  9216   BondMember  sw1 (swp15)       Master: cluster_isl(UP)
UP     swp16        100G  9216   BondMember  sw1 (swp16)       Master: cluster_isl(UP)
- On switch c1, shut down the ports connected to the cluster ports of the nodes in order to fail over the cluster LIFs.
(c1)# configure
Enter configuration commands, one per line. End with CNTL/Z.
(c1)(Config)# interface <interface_list>
(c1)(config-if-range)# shutdown
(c1)(config-if-range)# exit
(c1)(Config)# exit
(c1)#
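As with switch c2, assuming the node cluster ports connect to Ethernet 1/1 and 1/2 on c1 (adjust the range to match your cabling):

(c1)# configure
(c1)(Config)# interface Eth1/1-2
(c1)(config-if-range)# shutdown
(c1)(config-if-range)# exit
(c1)(Config)# exit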
- Move the node cluster ports from the old switch c1 to the new switch sw1, using appropriate cabling supported by NVIDIA SN2100.
- Verify the final configuration of the cluster:
  network port show -ipspace Cluster
  Each port should display up for Link and healthy for Health Status.
cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                       Ignore
                                                  Speed(Mbps)   Health Health
Port      IPspace    Broadcast Domain Link MTU    Admin/Oper    Status Status
--------- ---------- ---------------- ---- ----- ------------ -------- ------
e3a       Cluster    Cluster          up   9000   auto/100000  healthy false
e3b       Cluster    Cluster          up   9000   auto/100000  healthy false

Node: node2
                                                                       Ignore
                                                  Speed(Mbps)   Health Health
Port      IPspace    Broadcast Domain Link MTU    Admin/Oper    Status Status
--------- ---------- ---------------- ---- ----- ------------ -------- ------
e3a       Cluster    Cluster          up   9000   auto/100000  healthy false
e3b       Cluster    Cluster          up   9000   auto/100000  healthy false
- Verify that the cluster ports on each node are now connected to the cluster switches in the following way, from the nodes' perspective:
  network device-discovery show -protocol lldp
cluster1::*> network device-discovery show -protocol lldp
Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface       Platform
----------- ------ ------------------------- --------------  ----------------
node1      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3            -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp3            -
node2      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4            -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp4            -
- On switches sw1 and sw2, verify that all node cluster ports are up:
  net show interface
cumulus@sw1:~$ net show interface

State  Name         Spd   MTU    Mode        LLDP              Summary
-----  -----------  ----  -----  ----------  ----------------- -----------------------
...
...
UP     swp3         100G  9216   Trunk/L2    e3a               Master: bridge(UP)
UP     swp4         100G  9216   Trunk/L2    e3a               Master: bridge(UP)
UP     swp15        100G  9216   BondMember  sw2 (swp15)       Master: cluster_isl(UP)
UP     swp16        100G  9216   BondMember  sw2 (swp16)       Master: cluster_isl(UP)

cumulus@sw2:~$ net show interface

State  Name         Spd   MTU    Mode        LLDP              Summary
-----  -----------  ----  -----  ----------  ----------------- -----------------------
...
...
UP     swp3         100G  9216   Trunk/L2    e3b               Master: bridge(UP)
UP     swp4         100G  9216   Trunk/L2    e3b               Master: bridge(UP)
UP     swp15        100G  9216   BondMember  sw1 (swp15)       Master: cluster_isl(UP)
UP     swp16        100G  9216   BondMember  sw1 (swp16)       Master: cluster_isl(UP)
- Verify that both nodes each have one connection to each switch:
  net show lldp
The following example shows the appropriate results for both switches:
cumulus@sw1:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost          RemotePort
---------  -----  ----------  ------------------  -----------
swp3       100G   Trunk/L2    node1               e3a
swp4       100G   Trunk/L2    node2               e3a
swp15      100G   BondMember  sw2                 swp15
swp16      100G   BondMember  sw2                 swp16

cumulus@sw2:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost          RemotePort
---------  -----  ----------  ------------------  -----------
swp3       100G   Trunk/L2    node1               e3b
swp4       100G   Trunk/L2    node2               e3b
swp15      100G   BondMember  sw1                 swp15
swp16      100G   BondMember  sw1                 swp16
Step 3: Verify the configuration
- Enable auto-revert on the cluster LIFs:
  network interface modify -vserver Cluster -lif * -auto-revert true
- Verify that all cluster network LIFs are back on their home ports:
  network interface show
cluster1::*> network interface show -vserver Cluster
            Logical     Status     Network            Current       Current Is
Vserver     Interface   Admin/Oper Address/Mask       Node          Port    Home
----------- ----------- ---------- ------------------ ------------- ------- ----
Cluster
            node1_clus1 up/up      169.254.209.69/16  node1         e3a     true
            node1_clus2 up/up      169.254.49.125/16  node1         e3b     true
            node2_clus1 up/up      169.254.47.194/16  node2         e3a     true
            node2_clus2 up/up      169.254.19.183/16  node2         e3b     true
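  If any cluster LIF has not returned to its home port, you can revert the cluster LIFs manually. A minimal sketch (you can also revert an individual LIF by name instead of using the wildcard):
  network interface revert -vserver Cluster -lif *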
- Change the privilege level back to admin:
  set -privilege admin
- If you suppressed automatic case creation, re-enable it by invoking an AutoSupport message:
  system node autosupport invoke -node * -type all -message MAINT=END