Migrate to a two-node switched cluster with NVIDIA SN2100 cluster switches

01/12/2023

If you have an existing two-node switchless cluster environment, you can migrate to a two-node switched cluster environment that uses NVIDIA SN2100 switches, which lets you scale beyond two nodes in the cluster.

The procedure you use depends on whether you have two dedicated cluster-network ports on each controller or a single cluster port on each controller. The documented process works for all nodes that use optical or Twinax ports, but it is not supported on this switch if the nodes use onboard 10GBASE-T RJ45 ports for the cluster-network ports.

Review requirements

What you'll need

For the two-node switchless configuration, ensure that:

The two-node switchless configuration is properly set up and functioning.
The nodes are running ONTAP 9.10.1P3 or later.
All cluster ports are in the up state.
All cluster logical interfaces (LIFs) are in the up state and on their home ports.

For the NVIDIA SN2100 cluster switch configuration, ensure that:

Both switches have management network connectivity.
There is console access to the cluster switches.
NVIDIA SN2100 node-to-switch and switch-to-switch connections use Twinax or fiber cables.

See Cabling and configuration considerations for caveats and further details.

The Hardware Universe - Switches contains more information about cabling.

Inter-Switch Link (ISL) cables are connected to ports swp15 and swp16 on both NVIDIA SN2100 switches.
Initial customization of both SN2100 switches is completed, so that:
- SN2100 switches are running the latest version of Cumulus Linux.
- Reference Configuration Files (RCFs) have been applied to the switches.
- Any site customization, such as SMTP, SNMP, and SSH, is configured on the new switches.

Migrate the switches

About the examples

The examples in this procedure use the following cluster switch and node nomenclature:

The names of the SN2100 switches are sw1 and sw2.
The names of the cluster SVMs are node1 and node2.
The names of the LIFs are node1_clus1 and node1_clus2 on node1, and node2_clus1 and node2_clus2 on node2.
The cluster1::*> prompt indicates the name of the cluster.
The cluster ports used in this procedure are e3a and e3b.
Breakout ports take the format: swp[port]s[breakout port 0-3]. For example, four breakout ports on swp1 are swp1s0, swp1s1, swp1s2, and swp1s3.
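
The naming rule above can be sketched in a few lines of Python. This is only an illustration of the swp[port]s[breakout port] pattern; the function name breakout_ports is hypothetical and not part of Cumulus Linux or ONTAP.

```python
def breakout_ports(port: int, lanes: int = 4) -> list[str]:
    """Return the breakout interface names for one physical switch port.

    Assumes the swp[port]s[0-3] naming described in this procedure.
    """
    if port < 1:
        raise ValueError("switch port numbers start at 1")
    return [f"swp{port}s{lane}" for lane in range(lanes)]


print(breakout_ports(1))  # ['swp1s0', 'swp1s1', 'swp1s2', 'swp1s3']
```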

The Hardware Universe contains the latest information about the actual cluster ports for your platforms.

Step 1: Prepare for migration

If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message:

system node autosupport invoke -node * -type all -message MAINT=xh

where x is the duration of the maintenance window in hours.

Change the privilege level to advanced, entering y when prompted to continue:

set -privilege advanced

The advanced prompt (*>) appears.

Step 2: Configure cables and ports

Disable all node-facing ports (not ISL ports) on both the new cluster switches sw1 and sw2.

You must not disable the ISL ports.

Show example

The following commands disable the node-facing ports on switches sw1 and sw2:

cumulus@sw1:~$ net add interface swp1s0-3,swp2s0-3,swp3-14 link down
cumulus@sw1:~$ net pending
cumulus@sw1:~$ net commit

cumulus@sw2:~$ net add interface swp1s0-3,swp2s0-3,swp3-14 link down
cumulus@sw2:~$ net pending
cumulus@sw2:~$ net commit
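
Before committing, it can help to confirm offline that the interface list covers only node-facing ports and leaves out the ISL ports swp15 and swp16. The following Python sketch expands a comma-separated list of ranges like the one used above. The helper expand_interfaces is hypothetical, and its range handling is an assumption made for illustration rather than a reimplementation of the switch's own parser.

```python
import re


def expand_interfaces(spec: str) -> list[str]:
    """Expand a list such as 'swp1s0-3,swp3-14' into individual interface names."""
    names = []
    for part in spec.split(","):
        part = part.strip()
        # A trailing "<first>-<last>" is treated as an inclusive numeric range.
        match = re.fullmatch(r"(.*?)(\d+)-(\d+)", part)
        if match:
            prefix = match.group(1)
            first, last = int(match.group(2)), int(match.group(3))
            names.extend(f"{prefix}{n}" for n in range(first, last + 1))
        elif part:
            names.append(part)
    return names


ports = expand_interfaces("swp1s0-3,swp2s0-3,swp3-14")
isl_ports = {"swp15", "swp16"}
print(len(ports), isl_ports.isdisjoint(ports))  # 20 True
```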

Verify that the ISL and the physical ports on the ISL between the two SN2100 switches sw1 and sw2 are up on ports swp15 and swp16:

net show interface

Show example

The following example shows that the ISL ports are up on switch sw1:

cumulus@sw1:~$ net show interface

State  Name       Spd   MTU    Mode        LLDP         Summary
-----  ---------  ----  -----  ----------  -----------  -----------------------
...
...
UP     swp15      100G  9216   BondMember  sw2 (swp15)  Master: cluster_isl(UP)
UP     swp16      100G  9216   BondMember  sw2 (swp16)  Master: cluster_isl(UP)

The following example shows that the ISL ports are up on switch sw2:

cumulus@sw2:~$ net show interface

State  Name       Spd   MTU    Mode        LLDP         Summary
-----  ---------  ----  -----  ----------  -----------  -----------------------
...
...
UP     swp15      100G  9216   BondMember  sw1 (swp15)  Master: cluster_isl(UP)
UP     swp16      100G  9216   BondMember  sw1 (swp16)  Master: cluster_isl(UP)
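
If you capture the net show interface output as text (for example, from an SSH session), the ISL check can be automated. The sketch below assumes the column layout shown in the examples above; isl_ports_up is a hypothetical helper, not a Cumulus Linux command.

```python
def isl_ports_up(output: str, isl_ports=("swp15", "swp16")) -> bool:
    """Return True only if every ISL port is listed with State UP."""
    states = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows start with the state (UP, DN, ...) followed by the port name.
        if len(fields) >= 2 and fields[1] in isl_ports:
            states[fields[1]] = fields[0]
    return all(states.get(port) == "UP" for port in isl_ports)


sample = """\
State  Name       Spd   MTU    Mode        LLDP         Summary
-----  ---------  ----  -----  ----------  -----------  -----------------------
UP     swp15      100G  9216   BondMember  sw2 (swp15)  Master: cluster_isl(UP)
UP     swp16      100G  9216   BondMember  sw2 (swp16)  Master: cluster_isl(UP)
"""
print(isl_ports_up(sample))  # True
```

A port that is missing from the output is treated the same as a port that is down.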

Verify that all cluster ports are up:

network port show

Each port should display up for Link and healthy for Health Status.

Show example

cluster1::*> network port show

Node: node1

                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false

Node: node2

                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false
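
The same check can be scripted against captured network port show output. This is a minimal sketch that assumes the eight-column layout shown above and a one-word broadcast domain name; unhealthy_cluster_ports is a hypothetical helper, not an ONTAP command.

```python
def unhealthy_cluster_ports(output: str) -> list[tuple[str, str]]:
    """Return (node, port) pairs that are not both link 'up' and 'healthy'."""
    problems = []
    node = None
    for line in output.splitlines():
        fields = line.split()
        if line.startswith("Node:") and len(fields) == 2:
            node = fields[1]
        # Port rows: Port, IPspace, Broadcast Domain, Link, MTU, Speed, Health, Ignore
        elif len(fields) == 8 and fields[1] == "Cluster":
            port, link, health = fields[0], fields[3], fields[6]
            if link != "up" or health != "healthy":
                problems.append((node, port))
    return problems


sample = """\
Node: node1
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false
"""
print(unhealthy_cluster_ports(sample))  # []
```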

Verify that all cluster LIFs are up and operational:

network interface show -vserver Cluster

Each cluster LIF should display true for Is Home and have a Status Admin/Oper of up/up.

Show example

cluster1::*> network interface show -vserver Cluster

            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- -----
Cluster
            node1_clus1  up/up    169.254.209.69/16  node1         e3a     true
            node1_clus2  up/up    169.254.49.125/16  node1         e3b     true
            node2_clus1  up/up    169.254.47.194/16  node2         e3a     true
            node2_clus2  up/up    169.254.19.183/16  node2         e3b     true
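
As a sketch of how the two conditions (Status Admin/Oper of up/up and Is Home of true) could be checked against captured output, the Python below lists any LIF that fails either one. It assumes the six-column row layout shown above; lifs_not_ready is a hypothetical name.

```python
def lifs_not_ready(output: str) -> list[str]:
    """Return the names of LIFs that are not up/up or not on their home port."""
    bad = []
    for line in output.splitlines():
        fields = line.split()
        # LIF rows: Interface, Admin/Oper, Address/Mask, Node, Port, Is Home
        if len(fields) == 6 and fields[5] in ("true", "false"):
            lif, status, is_home = fields[0], fields[1], fields[5]
            if status != "up/up" or is_home != "true":
                bad.append(lif)
    return bad


sample = """\
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
Cluster
            node1_clus1  up/up    169.254.209.69/16  node1         e3a     true
            node1_clus2  up/up    169.254.49.125/16  node1         e3b     true
"""
print(lifs_not_ready(sample))  # []
```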

Disable auto-revert on the cluster LIFs:

network interface modify -vserver Cluster -lif * -auto-revert false

Show example

cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert false

          Logical
Vserver   Interface     Auto-revert
--------- ------------- ------------
Cluster
          node1_clus1   false
          node1_clus2   false
          node2_clus1   false
          node2_clus2   false

Disconnect the cable from cluster port e3a on node1, and then connect e3a to port 3 on cluster switch sw1, using the appropriate cabling supported by the SN2100 switches.

The Hardware Universe - Switches contains more information about cabling.

Disconnect the cable from cluster port e3a on node2, and then connect e3a to port 4 on cluster switch sw1, using the appropriate cabling supported by the SN2100 switches.

On switch sw1, enable all node-facing ports.

Show example

The following command enables all node-facing ports on switch sw1:

cumulus@sw1:~$ net del interface swp1s0-3,swp2s0-3,swp3-14 link down
cumulus@sw1:~$ net pending
cumulus@sw1:~$ net commit

On switch sw1, verify that the connected node ports and the ISL ports are up:

net show interface all

Show example

cumulus@sw1:~$ net show interface all

State  Name      Spd   MTU    Mode       LLDP            Summary
-----  --------- ----  -----  ---------- --------------- --------
...
DN     swp1s0    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp1s1    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp1s2    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp1s3    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s0    25G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s1    25G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s2    25G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s3    25G   9216   Trunk/L2                   Master: br_default(UP)
UP     swp3      100G  9216   Trunk/L2    node1 (e3a)    Master: br_default(UP)
UP     swp4      100G  9216   Trunk/L2    node2 (e3a)    Master: br_default(UP)
...
...
UP     swp15     100G  9216   BondMember  swp15          Master: cluster_isl(UP)
UP     swp16     100G  9216   BondMember  swp16          Master: cluster_isl(UP)
...

Verify that all cluster ports are up:

network port show -ipspace Cluster

Show example

The following example shows that all of the cluster ports are up on node1 and node2:

cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false

Node: node2
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false

Display information about the status of the nodes in the cluster:

cluster show

Show example

The following example displays information about the health and eligibility of the nodes in the cluster:

cluster1::*> cluster show

Node                 Health  Eligibility   Epsilon
-------------------- ------- ------------  ------------
node1                true    true          false
node2                true    true          false

Disconnect the cable from cluster port e3b on node1, and then connect e3b to port 3 on cluster switch sw2, using the appropriate cabling supported by the SN2100 switches.

Disconnect the cable from cluster port e3b on node2, and then connect e3b to port 4 on cluster switch sw2, using the appropriate cabling supported by the SN2100 switches.

On switch sw2, enable all node-facing ports.

Show example

The following commands enable the node-facing ports on switch sw2:

cumulus@sw2:~$ net del interface swp1s0-3,swp2s0-3,swp3-14 link down
cumulus@sw2:~$ net pending
cumulus@sw2:~$ net commit

On switch sw2, verify that the connected node ports and the ISL ports are up:

net show interface all

Show example

cumulus@sw2:~$ net show interface all

State  Name      Spd   MTU    Mode       LLDP            Summary
-----  --------- ----  -----  ---------- --------------- --------
...
DN     swp1s0    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp1s1    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp1s2    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp1s3    10G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s0    25G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s1    25G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s2    25G   9216   Trunk/L2                   Master: br_default(UP)
DN     swp2s3    25G   9216   Trunk/L2                   Master: br_default(UP)
UP     swp3      100G  9216   Trunk/L2    node1 (e3b)    Master: br_default(UP)
UP     swp4      100G  9216   Trunk/L2    node2 (e3b)    Master: br_default(UP)
...
...
UP     swp15     100G  9216   BondMember  swp15          Master: cluster_isl(UP)
UP     swp16     100G  9216   BondMember  swp16          Master: cluster_isl(UP)
...

On both switches sw1 and sw2, verify that both nodes each have one connection to each switch:

net show lldp

Show example

The following example shows the appropriate results for both switches sw1 and sw2:

cumulus@sw1:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3a
swp4       100G   Trunk/L2    node2              e3a
swp15      100G   BondMember  sw2                swp15
swp16      100G   BondMember  sw2                swp16

cumulus@sw2:~$ net show lldp

LocalPort  Speed  Mode        RemoteHost         RemotePort
---------  -----  ----------  -----------------  -----------
swp3       100G   Trunk/L2    node1              e3b
swp4       100G   Trunk/L2    node2              e3b
swp15      100G   BondMember  sw1                swp15
swp16      100G   BondMember  sw1                swp16
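
The "one connection from each node to each switch" condition can also be checked programmatically. The sketch below counts how often each remote host appears in captured net show lldp output from both switches. It assumes the five-column layout shown above; the function names are hypothetical.

```python
from collections import Counter


def remote_host_counts(output: str) -> Counter:
    """Count LLDP neighbors per RemoteHost in pasted `net show lldp` output."""
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        # Rows: LocalPort, Speed, Mode, RemoteHost, RemotePort
        if len(fields) == 5 and fields[0].startswith("swp"):
            counts[fields[3]] += 1
    return counts


def one_link_per_switch(sw1_output: str, sw2_output: str,
                        nodes=("node1", "node2")) -> bool:
    """Return True if every node appears exactly once on each switch."""
    per_switch = [remote_host_counts(sw1_output), remote_host_counts(sw2_output)]
    return all(counts[node] == 1 for counts in per_switch for node in nodes)


sw1 = "swp3 100G Trunk/L2 node1 e3a\nswp4 100G Trunk/L2 node2 e3a\n"
sw2 = "swp3 100G Trunk/L2 node1 e3b\nswp4 100G Trunk/L2 node2 e3b\n"
print(one_link_per_switch(sw1, sw2))  # True
```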

Step 3: Complete the procedure

Display information about the discovered network devices in your cluster:

network device-discovery show -protocol lldp

Show example

cluster1::*> network device-discovery show -protocol lldp
Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface     Platform
----------- ------ ------------------------- ------------  ----------------
node1      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3          -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp3          -
node2      /lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4          -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp4          -

Verify that all cluster ports are up:

network port show -ipspace Cluster

Show example

The following example shows that all of the cluster ports are up on node1 and node2:

cluster1::*> network port show -ipspace Cluster

Node: node1
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false

Node: node2
                                                                        Ignore
                                                  Speed(Mbps)  Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
--------- ------------ ---------------- ---- ---- ------------ -------- ------
e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false

Enable auto-revert on all cluster LIFs:

net interface modify -vserver Cluster -lif * -auto-revert true

Show example

cluster1::*> net interface modify -vserver Cluster -lif * -auto-revert true

          Logical
Vserver   Interface     Auto-revert
--------- ------------- ------------
Cluster
          node1_clus1   true
          node1_clus2   true
          node2_clus1   true
          node2_clus2   true

Verify that all interfaces display true for Is Home:

net interface show -vserver Cluster

This might take a minute to complete.

Show example

The following example shows that all LIFs are up on node1 and node2 and that Is Home results are true:

cluster1::*> net interface show -vserver Cluster

          Logical      Status     Network            Current    Current Is
Vserver   Interface    Admin/Oper Address/Mask       Node       Port    Home
--------- ------------ ---------- ------------------ ---------- ------- ----
Cluster
          node1_clus1  up/up      169.254.209.69/16  node1      e3a     true
          node1_clus2  up/up      169.254.49.125/16  node1      e3b     true
          node2_clus1  up/up      169.254.47.194/16  node2      e3a     true
          node2_clus2  up/up      169.254.19.183/16  node2      e3b     true

Verify that the switchless cluster setting is disabled:

network options switchless-cluster show

Show example

The false output in the following example shows that the switchless cluster setting is disabled:

cluster1::*> network options switchless-cluster show
Enable Switchless Cluster: false

Verify the status of the node members in the cluster:

cluster show

Show example

The following example shows information about the health and eligibility of the nodes in the cluster:

cluster1::*> cluster show

Node                 Health  Eligibility   Epsilon
-------------------- ------- ------------  --------
node1                true    true          false
node2                true    true          false

Ensure that the cluster network has full connectivity:

cluster ping-cluster -node node-name

Show example

cluster1::*> cluster ping-cluster -node node1
Host is node1
Getting addresses from network interface table...
Cluster node1_clus1 169.254.209.69 node1 e3a
Cluster node1_clus2 169.254.49.125 node1 e3b
Cluster node2_clus1 169.254.47.194 node2 e3a
Cluster node2_clus2 169.254.19.183 node2 e3b
Local = 169.254.47.194 169.254.19.183
Remote = 169.254.209.69 169.254.49.125
Cluster Vserver Id = 4294967293
Ping status:

Basic connectivity succeeds on 4 path(s)
Basic connectivity fails on 0 path(s)

Detected 9000 byte MTU on 4 path(s):
Local 169.254.47.194 to Remote 169.254.209.69
Local 169.254.47.194 to Remote 169.254.49.125
Local 169.254.19.183 to Remote 169.254.209.69
Local 169.254.19.183 to Remote 169.254.49.125
Larger than PMTU communication succeeds on 4 path(s)
RPC status:
2 paths up, 0 paths down (tcp check)
2 paths up, 0 paths down (udp check)
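
When the output is long, the lines that matter are the failed-path count and the RPC status. The following sketch scans captured cluster ping-cluster output for those lines. It assumes the message wording shown in the example above; ping_cluster_ok is a hypothetical helper.

```python
import re


def ping_cluster_ok(output: str) -> bool:
    """Return True if no basic-connectivity path failed and no RPC path is down."""
    failed = re.search(r"Basic connectivity fails on (\d+) path\(s\)", output)
    rpc = re.findall(r"(\d+) paths up, (\d+) paths down", output)
    if failed is None or not rpc:
        # Missing summary lines are treated as a failed check.
        return False
    return int(failed.group(1)) == 0 and all(int(down) == 0 for _, down in rpc)


sample = """\
Basic connectivity succeeds on 4 path(s)
Basic connectivity fails on 0 path(s)
RPC status:
2 paths up, 0 paths down (tcp check)
2 paths up, 0 paths down (udp check)
"""
print(ping_cluster_ok(sample))  # True
```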

Enable the Ethernet switch health monitor log collection feature to collect switch-related log files, using the following two commands:

system switch ethernet log setup-password and system switch ethernet log enable-collection

Enter: system switch ethernet log setup-password

Show example

cluster1::*> system switch ethernet log setup-password
Enter the switch name: <return>
The switch name entered is not recognized.
Choose from the following list:
sw1
sw2

cluster1::*> system switch ethernet log setup-password

Enter the switch name: sw1
RSA key fingerprint is e5:8b:c6:dc:e2:18:18:09:36:63:d9:63:dd:03:d9:cc
Do you want to continue? {y|n}::[n] y

Enter the password: <enter switch password>
Enter the password again: <enter switch password>

cluster1::*> system switch ethernet log setup-password

Enter the switch name: sw2
RSA key fingerprint is 57:49:86:a1:b9:80:6a:61:9a:86:8e:3c:e3:b7:1f:b1
Do you want to continue? {y|n}:: [n] y

Enter the password: <enter switch password>
Enter the password again: <enter switch password>

Followed by:

system switch ethernet log enable-collection

Show example

cluster1::*> system switch ethernet log enable-collection

Do you want to enable cluster log collection for all nodes in the cluster?
{y|n}: [n] y

Enabling cluster switch log collection.

cluster1::*>

If any of these commands return an error, contact NetApp support.

Initiate the switch log collection feature:

system switch ethernet log collect -device *

Wait for 10 minutes and then check that the log collection was successful using the command:

system switch ethernet log show

Show example

cluster1::*> system switch ethernet log show
Log Collection Enabled: true

Index  Switch                       Log Timestamp        Status
------ ---------------------------- -------------------  ---------    
1      sw1 (b8:ce:f6:19:1b:42)      4/29/2022 03:05:25   complete   
2      sw2 (b8:ce:f6:19:1b:96)      4/29/2022 03:07:42   complete
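
To check the result after the wait, the Status column of captured system switch ethernet log show output can be scanned for anything other than complete. This sketch assumes the row layout shown above; incomplete_log_collections is a hypothetical name.

```python
def incomplete_log_collections(output: str) -> list[str]:
    """Return the switches whose log collection status is not 'complete'."""
    pending = []
    for line in output.splitlines():
        fields = line.split()
        # Rows: Index, switch name, (chassis ID), date, time, status
        if len(fields) == 6 and fields[0].isdigit():
            switch, status = fields[1], fields[-1]
            if status != "complete":
                pending.append(switch)
    return pending


sample = """\
Log Collection Enabled: true
Index  Switch                       Log Timestamp        Status
1      sw1 (b8:ce:f6:19:1b:42)      4/29/2022 03:05:25   complete
2      sw2 (b8:ce:f6:19:1b:96)      4/29/2022 03:07:42   complete
"""
print(incomplete_log_collections(sample))  # []
```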

Change the privilege level back to admin:

set -privilege admin

If you suppressed automatic case creation, reenable it by invoking an AutoSupport message:

system node autosupport invoke -node * -type all -message MAINT=END
