Migrate to a two-node switched cluster with NVIDIA SN2100 cluster switches
If you have an existing two-node switchless cluster environment, you can migrate to a two-node switched cluster environment using NVIDIA SN2100 switches to enable you to scale beyond two nodes in the cluster.
The procedure you use depends on whether you have two dedicated cluster-network ports on each controller or a single cluster port on each controller. The process documented works for all nodes using optical or Twinax ports, but is not supported on this switch if nodes are using onboard 10GBASE-T RJ45 ports for the cluster-network ports.
Review requirements
For the two-node switchless configuration, ensure that:
-
The two-node switchless configuration is properly set up and functioning.
-
The nodes is running ONTAP 9.10.1P3 and later.
-
All cluster ports are in the up state.
-
All cluster logical interfaces (LIFs) are in the up state and on their home ports.
For the NVIDIA SN2100 cluster switch configuration, ensure that:
-
Both switches have management network connectivity.
-
There is console access to the cluster switches.
-
NVIDIA SN2100 node-to-node switch and switch-to-switch connections use Twinax or fiber cables.
See Cabling and configuration considerations for caveats and further details. The Hardware Universe - Switches contains more information about cabling.
-
Inter-Switch Link (ISL) cables are connected to ports swp15 and swp16 on both NVIDIA SN2100 switches.
-
Initial customization of both the SN2100 switches are completed, so that:
-
SN2100 switches are running the latest version of Cumulus Linux
-
Reference Configuration Files (RCFs) have been applied to the switches
-
Any site customization, such as SMTP, SNMP, and SSH are configured on the new switches.
-
Migrate the switches
The examples in this procedure use the following cluster switch and node nomenclature:
-
The names of the SN2100 switches are sw1 and sw2.
-
The names of the cluster SVMs are node1 and node2.
-
The names of the LIFs are node1_clus1 and node1_clus2 on node 1, and node2_clus1 and node2_clus2 on node 2 respectively.
-
The
cluster1::*>
prompt indicates the name of the cluster. -
The cluster ports used in this procedure are e3a and e3b.
-
Breakout ports take the format: swp[port]s[breakout port 0-3]. For example, four breakout ports on swp1 are swp1s0, swp1s1, swp1s2, and swp1s3.
The Hardware Universe contains the latest information about the actual cluster ports for your platforms.
Step 1: Prepare for migration
-
If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message:
system node autosupport invoke -node * -type all -message MAINT=xh
where x is the duration of the maintenance window in hours.
-
Change the privilege level to advanced, entering
y
when prompted to continue:set -privilege advanced
The advanced prompt (
*>
) appears.
Step 2: Configure cables and ports
-
Disable all node-facing ports (not ISL ports) on both the new cluster switches sw1 and sw2.
You must not disable the ISL ports.
Show example
The following commands disable the node-facing ports on switches sw1 and sw2:
cumulus@sw1:~$ net add interface swp1s0-3, swp2s0-3, swp3-14 link down cumulus@sw1:~$ net pending cumulus@sw1:~$ net commit cumulus@sw2:~$ net add interface swp1s0-3, swp2s0-3, swp3-14 link down cumulus@sw2:~$ net pending cumulus@sw2:~$ net commit
-
Verify that the ISL and the physical ports on the ISL between the two SN2100 switches sw1 and sw2 are up on ports swp15 and swp16:
net show interface
Show example
The following example shows that the ISL ports are up on switch sw1:
cumulus@sw1:~$ net show interface State Name Spd MTU Mode LLDP Summary ----- --------- ---- ----- ---------- ----------- ----------------------- ... ... UP swp15 100G 9216 BondMember sw2 (swp15) Master: cluster_isl(UP) UP swp16 100G 9216 BondMember sw2 (swp16) Master: cluster_isl(UP)
+ The following example shows that the ISL ports are up on switch sw2:
+
cumulus@sw2:~$ net show interface State Name Spd MTU Mode LLDP Summary ----- --------- ---- ----- ---------- ----------- ----------------------- ... ... UP swp15 100G 9216 BondMember sw1 (swp15) Master: cluster_isl(UP) UP swp16 100G 9216 BondMember sw1 (swp16) Master: cluster_isl(UP)
-
Verify that all cluster ports are up:
network port show
Each port should display up for
Link
and healthy forHealth Status
.Show example
cluster1::*> network port show Node: node1 Ignore Speed(Mbps) Health Health Port IPspace Broadcast Domain Link MTU Admin/Oper Status Status --------- ------------ ---------------- ---- ---- ------------ -------- ------ e3a Cluster Cluster up 9000 auto/100000 healthy false e3b Cluster Cluster up 9000 auto/100000 healthy false Node: node2 Ignore Speed(Mbps) Health Health Port IPspace Broadcast Domain Link MTU Admin/Oper Status Status --------- ------------ ---------------- ---- ---- ------------ -------- ------ e3a Cluster Cluster up 9000 auto/100000 healthy false e3b Cluster Cluster up 9000 auto/100000 healthy false
-
Verify that all cluster LIFs are up and operational:
network interface show
Each cluster LIF should display true for
Is Home
and have aStatus Admin/Oper
of up/upShow example
cluster1::*> network interface show -vserver Cluster Logical Status Network Current Current Is Vserver Interface Admin/Oper Address/Mask Node Port Home ----------- ---------- ---------- ------------------ ------------- ------- ----- Cluster node1_clus1 up/up 169.254.209.69/16 node1 e3a true node1_clus2 up/up 169.254.49.125/16 node1 e3b true node2_clus1 up/up 169.254.47.194/16 node2 e3a true node2_clus2 up/up 169.254.19.183/16 node2 e3b true
-
Disable auto-revert on the cluster LIFs:
network interface modify -vserver Cluster -lif * -auto-revert false
Show example
cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert false Logical Vserver Interface Auto-revert --------- ------------- ------------ Cluster node1_clus1 false node1_clus2 false node2_clus1 false node2_clus2 false
-
Disconnect the cable from cluster port e3a on node1, and then connect e3a to port 3 on cluster switch sw1, using the appropriate cabling supported by the SN2100 switches.
The Hardware Universe - Switches contains more information about cabling.
-
Disconnect the cable from cluster port e3a on node2, and then connect e3a to port 4 on cluster switch sw1, using the appropriate cabling supported by the SN2100 switches.
-
On switch sw1, enable all node-facing ports.
Show example
The following command enables all node-facing ports on switch sw1:
cumulus@sw1:~$ net del interface swp1s0-3, swp2s0-3, swp3-14 link down cumulus@sw1:~$ net pending cumulus@sw1:~$ net commit
-
On switch sw1, verify that all ports are up:
net show interface all
Show example
cumulus@sw1:~$ net show interface all State Name Spd MTU Mode LLDP Summary ----- --------- ---- ----- ---------- --------------- -------- ... DN swp1s0 10G 9216 Trunk/L2 Master: br_default(UP) DN swp1s1 10G 9216 Trunk/L2 Master: br_default(UP) DN swp1s2 10G 9216 Trunk/L2 Master: br_default(UP) DN swp1s3 10G 9216 Trunk/L2 Master: br_default(UP) DN swp2s0 25G 9216 Trunk/L2 Master: br_default(UP) DN swp2s1 25G 9216 Trunk/L2 Master: br_default(UP) DN swp2s2 25G 9216 Trunk/L2 Master: br_default(UP) DN swp2s3 25G 9216 Trunk/L2 Master: br_default(UP) UP swp3 100G 9216 Trunk/L2 node1 (e3a) Master: br_default(UP) UP swp4 100G 9216 Trunk/L2 node2 (e3a) Master: br_default(UP) ... ... UP swp15 100G 9216 BondMember swp15 Master: cluster_isl(UP) UP swp16 100G 9216 BondMember swp16 Master: cluster_isl(UP) ...
-
Verify that all cluster ports are up:
network port show -ipspace Cluster
Show example
The following example shows that all of the cluster ports are up on node1 and node2:
cluster1::*> network port show -ipspace Cluster Node: node1 Ignore Speed(Mbps) Health Health Port IPspace Broadcast Domain Link MTU Admin/Oper Status Status --------- ------------ ---------------- ---- ---- ------------ -------- ------ e3a Cluster Cluster up 9000 auto/100000 healthy false e3b Cluster Cluster up 9000 auto/100000 healthy false Node: node2 Ignore Speed(Mbps) Health Health Port IPspace Broadcast Domain Link MTU Admin/Oper Status Status --------- ------------ ---------------- ---- ---- ------------ -------- ------ e3a Cluster Cluster up 9000 auto/100000 healthy false e3b Cluster Cluster up 9000 auto/100000 healthy false
-
Display information about the status of the nodes in the cluster:
cluster show
Show example
The following example displays information about the health and eligibility of the nodes in the cluster:
cluster1::*> cluster show Node Health Eligibility Epsilon -------------------- ------- ------------ ------------ node1 true true false node2 true true false
-
Disconnect the cable from cluster port e3b on node1, and then connect e3b to port 3 on cluster switch sw2, using the appropriate cabling supported by the SN2100 switches.
-
Disconnect the cable from cluster port e3b on node2, and then connect e3b to port 4 on cluster switch sw2, using the appropriate cabling supported by the SN2100 switches.
-
On switch sw2, enable all node-facing ports.
Show example
The following commands enable the node-facing ports on switch sw2:
cumulus@sw2:~$ net del interface swp1s0-3, swp2s0-3, swp3-14 link down cumulus@sw2:~$ net pending cumulus@sw2:~$ net commit
-
On switch sw2, verify that all ports are up:
net show interface all
Show example
cumulus@sw2:~$ net show interface all State Name Spd MTU Mode LLDP Summary ----- --------- ---- ----- ---------- --------------- -------- ... DN swp1s0 10G 9216 Trunk/L2 Master: br_default(UP) DN swp1s1 10G 9216 Trunk/L2 Master: br_default(UP) DN swp1s2 10G 9216 Trunk/L2 Master: br_default(UP) DN swp1s3 10G 9216 Trunk/L2 Master: br_default(UP) DN swp2s0 25G 9216 Trunk/L2 Master: br_default(UP) DN swp2s1 25G 9216 Trunk/L2 Master: br_default(UP) DN swp2s2 25G 9216 Trunk/L2 Master: br_default(UP) DN swp2s3 25G 9216 Trunk/L2 Master: br_default(UP) UP swp3 100G 9216 Trunk/L2 node1 (e3b) Master: br_default(UP) UP swp4 100G 9216 Trunk/L2 node2 (e3b) Master: br_default(UP) ... ... UP swp15 100G 9216 BondMember swp15 Master: cluster_isl(UP) UP swp16 100G 9216 BondMember swp16 Master: cluster_isl(UP) ...
-
On both switches sw1 and sw2, verify that both nodes each have one connection to each switch:
net show lldp
Show example
The following example shows the appropriate results for both switches sw1 and sw2:
cumulus@sw1:~$ net show lldp LocalPort Speed Mode RemoteHost RemotePort --------- ----- ---------- ----------------- ----------- swp3 100G Trunk/L2 node1 e3a swp4 100G Trunk/L2 node2 e3a swp15 100G BondMember sw2 swp15 swp16 100G BondMember sw2 swp16 cumulus@sw2:~$ net show lldp LocalPort Speed Mode RemoteHost RemotePort --------- ----- ---------- ----------------- ----------- swp3 100G Trunk/L2 node1 e3b swp4 100G Trunk/L2 node2 e3b swp15 100G BondMember sw1 swp15 swp16 100G BondMember sw1 swp16
Step 3: Complete the procedure
-
Display information about the discovered network devices in your cluster:
net device-discovery show -protocol lldp
Show example
cluster1::*> network device-discovery show -protocol lldp Node/ Local Discovered Protocol Port Device (LLDP: ChassisID) Interface Platform ----------- ------ ------------------------- ------------ ---------------- node1 /lldp e3a sw1 (b8:ce:f6:19:1a:7e) swp3 - e3b sw2 (b8:ce:f6:19:1b:96) swp3 - node2 /lldp e3a sw1 (b8:ce:f6:19:1a:7e) swp4 - e3b sw2 (b8:ce:f6:19:1b:96) swp4 -
-
Verify that all cluster ports are up:
network port show -ipspace Cluster
Show example
The following example shows that all of the cluster ports are up on node1 and node2:
cluster1::*> network port show -ipspace Cluster Node: node1 Ignore Speed(Mbps) Health Health Port IPspace Broadcast Domain Link MTU Admin/Oper Status Status --------- ------------ ---------------- ---- ---- ----------- -------- ------ e3a Cluster Cluster up 9000 auto/10000 healthy false e3b Cluster Cluster up 9000 auto/10000 healthy false Node: node2 Ignore Speed(Mbps) Health Health Port IPspace Broadcast Domain Link MTU Admin/Oper Status Status --------- ------------ ---------------- ---- ---- ----------- -------- ------ e3a Cluster Cluster up 9000 auto/10000 healthy false e3b Cluster Cluster up 9000 auto/10000 healthy false
-
Enable auto-revert on all cluster LIFs:
net interface modify -vserver Cluster -lif * -auto-revert true
Show example
cluster1::*> net interface modify -vserver Cluster -lif * -auto-revert true Logical Vserver Interface Auto-revert --------- ------------- ------------ Cluster node1_clus1 true node1_clus2 true node2_clus1 true node2_clus2 true
-
Verify that all interfaces display true for
Is Home
:net interface show -vserver Cluster
This might take a minute to complete. Show example
The following example shows that all LIFs are up on node1 and node2 and that
Is Home
results are true:cluster1::*> net interface show -vserver Cluster Logical Status Network Current Current Is Vserver Interface Admin/Oper Address/Mask Node Port Home --------- ------------ ---------- ------------------ ---------- ------- ---- Cluster node1_clus1 up/up 169.254.209.69/16 node1 e3a true node1_clus2 up/up 169.254.49.125/16 node1 e3b true node2_clus1 up/up 169.254.47.194/16 node2 e3a true node2_clus2 up/up 169.254.19.183/16 node2 e3b true
-
Verify that the settings are disabled:
network options switchless-cluster show
Show example
The false output in the following example shows that the configuration settings are disabled:
cluster1::*> network options switchless-cluster show Enable Switchless Cluster: false
-
Verify the status of the node members in the cluster:
cluster show
Show example
The following example shows information about the health and eligibility of the nodes in the cluster:
cluster1::*> cluster show Node Health Eligibility Epsilon -------------------- ------- ------------ -------- node1 true true false node2 true true false
-
Ensure that the cluster network has full connectivity:
cluster ping-cluster -node node-name
Show example
cluster1::*> cluster ping-cluster -node node1 Host is node1 Getting addresses from network interface table... Cluster node1_clus1 169.254.209.69 node1 e3a Cluster node1_clus2 169.254.49.125 node1 e3b Cluster node2_clus1 169.254.47.194 node2 e3a Cluster node2_clus2 169.254.19.183 node2 e3b Local = 169.254.47.194 169.254.19.183 Remote = 169.254.209.69 169.254.49.125 Cluster Vserver Id = 4294967293 Ping status: Basic connectivity succeeds on 4 path(s) Basic connectivity fails on 0 path(s) Detected 9000 byte MTU on 4 path(s): Local 169.254.47.194 to Remote 169.254.209.69 Local 169.254.47.194 to Remote 169.254.49.125 Local 169.254.19.183 to Remote 169.254.209.69 Local 169.254.19.183 to Remote 169.254.49.125 Larger than PMTU communication succeeds on 4 path(s) RPC status: 2 paths up, 0 paths down (tcp check) 2 paths up, 0 paths down (udp check)
-
Enable the Ethernet switch health monitor log collection feature for collecting switch-related log files, using the commands:
system switch ethernet log setup-password
andsystem switch ethernet log enable-collection
Enter:
system switch ethernet log setup-password
Show example
cluster1::*> system switch ethernet log setup-password Enter the switch name: <return> The switch name entered is not recognized. Choose from the following list: sw1 sw2 cluster1::*> system switch ethernet log setup-password Enter the switch name: sw1 RSA key fingerprint is e5:8b:c6:dc:e2:18:18:09:36:63:d9:63:dd:03:d9:cc Do you want to continue? {y|n}::[n] y Enter the password: <enter switch password> Enter the password again: <enter switch password> cluster1::*> system switch ethernet log setup-password Enter the switch name: sw2 RSA key fingerprint is 57:49:86:a1:b9:80:6a:61:9a:86:8e:3c:e3:b7:1f:b1 Do you want to continue? {y|n}:: [n] y Enter the password: <enter switch password> Enter the password again: <enter switch password>
Followed by:
system switch ethernet log enable-collection
Show example
cluster1::*> system switch ethernet log enable-collection Do you want to enable cluster log collection for all nodes in the cluster? {y|n}: [n] y Enabling cluster switch log collection. cluster1::*>
If any of these commands return an error, contact NetApp support. -
Initiate the switch log collection feature:
system switch ethernet log collect -device *
Wait for 10 minutes and then check that the log collection was successful using the command:
system switch ethernet log show
Show example
cluster1::*> system switch ethernet log show Log Collection Enabled: true Index Switch Log Timestamp Status ------ ---------------------------- ------------------- --------- 1 sw1 (b8:ce:f6:19:1b:42) 4/29/2022 03:05:25 complete 2 sw2 (b8:ce:f6:19:1b:96) 4/29/2022 03:07:42 complete
-
Change the privilege level back to admin:
set -privilege admin
-
If you suppressed automatic case creation, reenable it by invoking an AutoSupport message:
system node autosupport invoke -node * -type all -message MAINT=END