Migrate to a two-node switched cluster with NVIDIA SN2100 cluster switches

Contributors netapp-yvonneo

You must be aware of certain configuration information, port connections, and cabling requirements when you migrate a two-node switchless cluster, non-disruptively, to a cluster with NVIDIA SN2100 cluster switches. The procedure you use depends on whether you have two dedicated cluster-network ports on each controller or a single cluster port on each controller. The process documented works for all nodes using optical or Twinax ports but is not supported on this switch if nodes are using onboard 10GBASE-T RJ45 ports for the cluster-network ports.

Two-node switchless configuration
  • The two-node switchless configuration must be properly set up and functioning.

  • The nodes must be running ONTAP 9.10.1P3 and later.

  • All cluster ports must be in the up state.

  • All cluster logical interfaces (LIFs) must be in the up state and on their home ports.

NVIDIA SN2100 cluster switch configuration
  • Both switches must have management network connectivity.

  • There must be console access to the cluster switches.

  • NVIDIA SN2100 node-to-node switch and switch-to-switch connections must use Twinax or fiber cables.

    Important See Cabling and configuration considerations for caveats and further details.

    The Hardware Universe - Switches contains more information about cabling.

  • Inter-Switch Link (ISL) cables must be connected to ports swp15 and swp16 on both NVIDIA SN2100 switches.

  • Initial customization of both the SN2100 switches must be completed. So that the:

    • SN2100 switches are running the latest version of Cumulus Linux

    • Reference Configuration Files (RCFs) have been applied to the switches

    • Any site customization, such as SMTP, SNMP, and SSH must be configured on the new switches.

About this task

The examples in this procedure use the following cluster switch and node nomenclature:

  • The names of the SN2100 switches are sw1 and sw2.

  • The names of the cluster SVMs are node1 and node2.

  • The names of the LIFs are node1_clus1 and node1_clus2 on node 1, and node2_clus1 and node2_clus2 on node 2 respectively.

  • The cluster1::*> prompt indicates the name of the cluster.

  • The cluster ports used in this procedure are e3a and e3b.

  • Breakout ports take the format: swp[port]s[breakout port 0-3]. For example, four breakout ports on swp1 are swp1s0, swp1s1, swp1s2, and swp1s3.

    The Hardware Universe contains the latest information about the actual cluster ports for your platforms.

Steps
  1. If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message: system node autosupport invoke -node * -type all -message MAINT=xh

    where x is the duration of the maintenance window in hours.

  2. Change the privilege level to advanced, entering y when prompted to continue: set -privilege advanced

    The advanced prompt (*>) appears.

  3. Disable all node-facing ports (not ISL ports) on both the new cluster switches sw1 and sw2.

    You must not disable the ISL ports.

    The following commands disable the node-facing ports on switches sw1 and sw2:

    cumulus@sw1:~$ net add interface swp1s0-3, swp2s0-3, swp3-14 link down
    cumulus@sw1:~$ net pending
    cumulus@sw1:~$ net commit
    
    cumulus@sw2:~$ net add interface swp1s0-3, swp2s0-3, swp3-14 link down
    cumulus@sw2:~$ net pending
    cumulus@sw2:~$ net commit
  4. Verify that the ISL and the physical ports on the ISL between the two SN2100 switches sw1 and sw2 are up on ports swp15 and swp16: net show interface

    The following example shows that the ISL ports are up on switch sw1:

    cumulus@sw1:~$ net show interface
    
    State  Name       Spd   MTU    Mode        LLDP         Summary
    -----  ---------  ----  -----  ----------  -----------  -----------------------
    ...
    ...
    UP     swp15      100G  9216   BondMember  sw2 (swp15)  Master: cluster_isl(UP)
    UP     swp16      100G  9216   BondMember  sw2 (swp16)  Master: cluster_isl(UP)

    The following example shows that the ISL ports are up on switch sw2:

    cumulus@sw2:~$ net show interface
    
    State  Name       Spd   MTU    Mode        LLDP         Summary
    -----  ---------  ----  -----  ----------  -----------  -----------------------
    ...
    ...
    UP     swp15      100G  9216   BondMember  sw1 (swp15)  Master: cluster_isl(UP)
    UP     swp16      100G  9216   BondMember  sw1 (swp16)  Master: cluster_isl(UP)
  5. Verify that all cluster ports are up: network port show

    Each port should display up for Link and healthy for Health Status.

    cluster1::*> network port show
    
    Node: node1
    
                                                                            Ignore
                                                      Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ---- ------------ -------- ------
    e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false
    
    Node: node2
    
                                                                            Ignore
                                                      Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ---- ------------ -------- ------
    e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false
  6. Verify that all cluster LIFs are up and operational: network interface show

    Each cluster LIF should display true for Is Home and have a Status Admin/Oper of up/up

    cluster1::*> network interface show -vserver Cluster
    
                Logical    Status     Network            Current       Current Is
    Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
    ----------- ---------- ---------- ------------------ ------------- ------- -----
    Cluster
                node1_clus1  up/up    169.254.209.69/16  node1         e3a     true
                node1_clus2  up/up    169.254.49.125/16  node1         e3b     true
                node2_clus1  up/up    169.254.47.194/16  node2         e3a     true
                node2_clus2  up/up    169.254.19.183/16  node2         e3b     true
  7. Disable auto-revert on the cluster LIFs: network interface modify -vserver Cluster -lif * -auto-revert false

    cluster1::*> network interface modify -vserver Cluster -lif * -auto-revert false
    
              Logical
    Vserver   Interface     Auto-revert
    --------- ------------- ------------
    Cluster
              node1_clus1   false
              node1_clus2   false
              node2_clus1   false
              node2_clus2   false
  8. Disconnect the cable from cluster port e3a on node1, and then connect e3a to port 3 on cluster switch sw1, using the appropriate cabling supported by the SN2100 switches.

    The Hardware Universe - Switches contains more information about cabling.

  9. Disconnect the cable from cluster port e3a on node2, and then connect e3a to port 4 on cluster switch sw1, using the appropriate cabling supported by the SN2100 switches.

  10. On switch sw1, enable all node-facing ports.

    The following command enables all node-facing ports on switch sw1:

    cumulus@sw1:~$ net del interface swp1s0-3, swp2s0-3, swp3-14 link down
    cumulus@sw1:~$ net pending
    cumulus@sw1:~$ net commit
  11. On switch sw1, verify that all ports are up: net show interface all

    cumulus@sw1:~$ net show interface all
    
    State  Name      Spd   MTU    Mode       LLDP            Summary
    -----  --------- ----  -----  ---------- --------------- --------
    ...
    DN     swp1s0    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp1s1    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp1s2    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp1s3    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s0    25G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s1    25G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s2    25G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s3    25G   9216   Trunk/L2                   Master: br_default(UP)
    UP     swp3      100G  9216   Trunk/L2    node1 (e3a)    Master: br_default(UP)
    UP     swp4      100G  9216   Trunk/L2    node2 (e3a)    Master: br_default(UP)
    ...
    ...
    UP     swp15     100G  9216   BondMember  swp15          Master: cluster_isl(UP)
    UP     swp16     100G  9216   BondMember  swp16          Master: cluster_isl(UP)
    ...
  12. Verify that all cluster ports are up: network port show -ipspace Cluster

    The following example shows that all of the cluster ports are up on node1 and node2:

    cluster1::*> network port show -ipspace Cluster
    
    Node: node1
                                                                            Ignore
                                                      Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ---- ------------ -------- ------
    e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false
    
    Node: node2
                                                                            Ignore
                                                      Speed(Mbps)  Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper   Status   Status
    --------- ------------ ---------------- ---- ---- ------------ -------- ------
    e3a       Cluster      Cluster          up   9000  auto/100000 healthy  false
    e3b       Cluster      Cluster          up   9000  auto/100000 healthy  false
  13. Display information about the status of the nodes in the cluster: cluster show

    The following example displays information about the health and eligibility of the nodes in the cluster:

    cluster1::*> cluster show
    
    Node                 Health  Eligibility   Epsilon
    -------------------- ------- ------------  ------------
    node1                true    true          false
    node2                true    true          false
  14. Disconnect the cable from cluster port e3b on node1, and then connect e3b to port 3 on cluster switch sw2, using the appropriate cabling supported by the SN2100 switches.

  15. Disconnect the cable from cluster port e3b on node2, and then connect e3b to port 4 on cluster switch sw2, using the appropriate cabling supported by the SN2100 switches.

  16. On switch sw2, enable all node-facing ports.

    The following commands enable the node-facing ports on switch sw2:

    cumulus@sw2:~$ net del interface swp1s0-3, swp2s0-3, swp3-14 link down
    cumulus@sw2:~$ net pending
    cumulus@sw2:~$ net commit
  17. On switch sw2, verify that all ports are up: net show interface all

    cumulus@sw2:~$ net show interface all
    
    State  Name      Spd   MTU    Mode       LLDP            Summary
    -----  --------- ----  -----  ---------- --------------- --------
    ...
    DN     swp1s0    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp1s1    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp1s2    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp1s3    10G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s0    25G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s1    25G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s2    25G   9216   Trunk/L2                   Master: br_default(UP)
    DN     swp2s3    25G   9216   Trunk/L2                   Master: br_default(UP)
    UP     swp3      100G  9216   Trunk/L2    node1 (e3b)    Master: br_default(UP)
    UP     swp4      100G  9216   Trunk/L2    node2 (e3b)    Master: br_default(UP)
    ...
    ...
    UP     swp15     100G  9216   BondMember  swp15          Master: cluster_isl(UP)
    UP     swp16     100G  9216   BondMember  swp16          Master: cluster_isl(UP)
    ...
  18. On both switches sw1 and sw2, verify that both nodes each have one connection to each switch: net show lldp

    The following example shows the appropriate results for both switches sw1 and sw2:

    cumulus@sw1:~$ net show lldp
    
    LocalPort  Speed  Mode        RemoteHost         RemotePort
    ---------  -----  ----------  -----------------  -----------
    swp3       100G   Trunk/L2    node1              e3a
    swp4       100G   Trunk/L2    node2              e3a
    swp15      100G   BondMember  sw2                swp15
    swp16      100G   BondMember  sw2                swp16
    
    cumulus@sw2:~$ net show lldp
    
    LocalPort  Speed  Mode        RemoteHost         RemotePort
    ---------  -----  ----------  -----------------  -----------
    swp3       100G   Trunk/L2    node1              e3b
    swp4       100G   Trunk/L2    node2              e3b
    swp15      100G   BondMember  sw1                swp15
    swp16      100G   BondMember  sw1                swp16
  19. Display information about the discovered network devices in your cluster: net device-discovery show -protocol lldp

    cluster1::*> network device-discovery show -protocol lldp
    Node/       Local  Discovered
    Protocol    Port   Device (LLDP: ChassisID)  Interface     Platform
    ----------- ------ ------------------------- ------------  ----------------
    node1      /lldp
                e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3          -
                e3b    sw2 (b8:ce:f6:19:1b:96)   swp3          -
    node2      /lldp
                e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4          -
                e3b    sw2 (b8:ce:f6:19:1b:96)   swp4          -
  20. Verify that all cluster ports are up: network port show -ipspace Cluster

    The following example shows that all of the cluster ports are up on node1 and node2:

    cluster1::*> network port show -ipspace Cluster
    
    Node: node1
                                                                           Ignore
                                                      Speed(Mbps) Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
    --------- ------------ ---------------- ---- ---- ----------- -------- ------
    e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
    e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
    
    Node: node2
                                                                           Ignore
                                                      Speed(Mbps) Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
    --------- ------------ ---------------- ---- ---- ----------- -------- ------
    e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
    e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
  21. Enable auto-revert on all cluster LIFs: net interface modify -vserver Cluster -lif * -auto-revert true

    cluster1::*> net interface modify -vserver Cluster -lif * -auto-revert true
    
              Logical
    Vserver   Interface     Auto-revert
    --------- ------------- ------------
    Cluster
              node1_clus1   true
              node1_clus2   true
              node2_clus1   true
              node2_clus2   true
  22. Verify that all interfaces display true for Is Home: net interface show -vserver Cluster

    Note This might take a minute to complete.

    The following example shows that all LIFs are up on node1 and node2 and that Is Home results are true:

    cluster1::*> net interface show -vserver Cluster
    
              Logical      Status     Network            Current    Current Is
    Vserver   Interface    Admin/Oper Address/Mask       Node       Port    Home
    --------- ------------ ---------- ------------------ ---------- ------- ----
    Cluster
              node1_clus1  up/up      169.254.209.69/16  node1      e3a     true
              node1_clus2  up/up      169.254.49.125/16  node1      e3b     true
              node2_clus1  up/up      169.254.47.194/16  node2      e3a     true
              node2_clus2  up/up      169.254.19.183/16  node2      e3b     true
  23. Verify that the settings are disabled: network options switchless-cluster show

    The false output in the following example shows that the configuration settings are disabled:

    cluster1::*> network options switchless-cluster show
    Enable Switchless Cluster: false
  24. Verify the status of the node members in the cluster: cluster show

    The following example shows information about the health and eligibility of the nodes in the cluster:

    cluster1::*> cluster show
    
    Node                 Health  Eligibility   Epsilon
    -------------------- ------- ------------  --------
    node1                true    true          false
    node2                true    true          false
  25. Ensure that the cluster network has full connectivity: cluster ping-cluster -node node-name

    cluster1::*> cluster ping-cluster -node node1
    Host is node1
    Getting addresses from network interface table...
    Cluster node1_clus1 169.254.209.69 node1 e3a
    Cluster node1_clus2 169.254.49.125 node1 e3b
    Cluster node2_clus1 169.254.47.194 node2 e3a
    Cluster node2_clus2 169.254.19.183 node2 e3b
    Local = 169.254.47.194 169.254.19.183
    Remote = 169.254.209.69 169.254.49.125
    Cluster Vserver Id = 4294967293
    Ping status:
    
    Basic connectivity succeeds on 4 path(s)
    Basic connectivity fails on 0 path(s)
    
    Detected 9000 byte MTU on 4 path(s):
    Local 169.254.47.194 to Remote 169.254.209.69
    Local 169.254.47.194 to Remote 169.254.49.125
    Local 169.254.19.183 to Remote 169.254.209.69
    Local 169.254.19.183 to Remote 169.254.49.125
    Larger than PMTU communication succeeds on 4 path(s)
    RPC status:
    2 paths up, 0 paths down (tcp check)
    2 paths up, 0 paths down (udp check)
  26. Enable the Ethernet switch health monitor log collection feature for collecting switch-related log files, using the commands: system switch ethernet log setup-password and system switch ethernet log enable-collection

    Enter: system switch ethernet log setup-password

    cluster1::*> system switch ethernet log setup-password
    Enter the switch name: <return>
    The switch name entered is not recognized.
    Choose from the following list:
    sw1
    sw2
    
    cluster1::*> system switch ethernet log setup-password
    
    Enter the switch name: sw1
    RSA key fingerprint is e5:8b:c6:dc:e2:18:18:09:36:63:d9:63:dd:03:d9:cc
    Do you want to continue? {y|n}::[n] y
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> system switch ethernet log setup-password
    
    Enter the switch name: sw2
    RSA key fingerprint is 57:49:86:a1:b9:80:6a:61:9a:86:8e:3c:e3:b7:1f:b1
    Do you want to continue? {y|n}:: [n] y
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>

    Followed by: system switch ethernet log enable-collection

    cluster1::*> system switch ethernet log enable-collection
    
    Do you want to enable cluster log collection for all nodes in the cluster?
    {y|n}: [n] y
    
    Enabling cluster switch log collection.
    
    cluster1::*>
    Note If any of these commands return an error, contact NetApp support.
  27. Initiate the switch log collection feature: system switch ethernet log collect -device *

    Wait for 10 minutes and then check that the log collection was successful using the command: system switch ethernet log show

    cluster1::*> system switch ethernet log show
    Log Collection Enabled: true
    
    Index  Switch                       Log Timestamp        Status
    ------ ---------------------------- -------------------  ---------    
    1      sw1 (b8:ce:f6:19:1b:42)      4/29/2022 03:05:25   complete   
    2      sw2 (b8:ce:f6:19:1b:96)      4/29/2022 03:07:42   complete
  28. Change the privilege level back to admin: set -privilege admin

  29. If you suppressed automatic case creation, reenable it by invoking an AutoSupport message: system node autosupport invoke -node * -type all -message MAINT=END