Replace a Cisco Nexus 9336C-FX2 cluster switch

Contributors netapp-martyh

Replacing a defective Nexus 9336C-FX2 switch in a cluster network is a nondisruptive procedure (NDU).

Before you begin

The following conditions must exist before performing the switch replacement in the current environment and on the replacement switch.

  • Existing cluster and network infrastructure:

    • The existing cluster must be verified as completely functional, with at least one fully connected cluster switch.

    • All cluster ports must be up.

    • All cluster logical interfaces (LIFs) must be up and on their home ports.

    • The ONTAP cluster ping-cluster -node node1 command must indicate that basic connectivity and larger than PMTU communication are successful on all paths.

  • Nexus 9336C-FX2 replacement switch:

    • Management network connectivity on the replacement switch must be functional.

    • Console access to the replacement switch must be in place.

    • The node connections are ports 1/1 through 1/34.

    • All Inter-Switch Link (ISL) ports must be disabled on ports 1/35 and 1/36.

    • The desired reference configuration file (RCF) and NX-OS operating system image switch must be loaded onto the switch.

    • Initial customization of the switch must be complete, as detailed in:

      Any previous site customizations, such as STP, SNMP, and SSH, should be copied to the new switch.

You must execute the command for migrating a cluster LIF from the node where the cluster LIF is hosted.

The examples in this procedure use the following switch and node nomenclature:

  • The names of the existing Nexus 9336C-FX2 switches are cs1 and cs2.

  • The name of the new Nexus 9336C-FX2 switch is newcs2.

  • The node names are node1 and node2.

  • The cluster ports on each node are named e0a and e0b.

  • The cluster LIF names are node1_clus1 and node1_clus2 for node1, and node2_clus1 and node2_clus2 for node2.

  • The prompt for changes to all cluster nodes is cluster1::*>

Note The following procedure is based on the following cluster network topology:
cluster1::*> **network port show -ipspace Cluster**

Node: node1
                                                                       Ignore
                                                  Speed(Mbps) Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
--------- ------------ ---------------- ---- ---- ----------- -------- ------
e0a       Cluster      Cluster          up   9000  auto/10000 healthy  false
e0b       Cluster      Cluster          up   9000  auto/10000 healthy  false

Node: node2
                                                                       Ignore
                                                  Speed(Mbps) Health   Health
Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
--------- ------------ ---------------- ---- ---- ----------- -------- ------
e0a       Cluster      Cluster          up   9000  auto/10000 healthy  false
e0b       Cluster      Cluster          up   9000  auto/10000 healthy  false
4 entries were displayed.



cluster1::*> **network interface show -vserver Cluster**
            Logical    Status     Network            Current       Current Is
Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
----------- ---------- ---------- ------------------ ------------- ------- ----
Cluster
            node1_clus1  up/up    169.254.209.69/16  node1         e0a     true
            node1_clus2  up/up    169.254.49.125/16  node1         e0b     true
            node2_clus1  up/up    169.254.47.194/16  node2         e0a     true
            node2_clus2  up/up    169.254.19.183/16  node2         e0b     true
4 entries were displayed.



cluster1::*> **network device-discovery show -protocol cdp**
Node/       Local  Discovered
Protocol    Port   Device (LLDP: ChassisID)  Interface         Platform
----------- ------ ------------------------- ----------------  ----------------
node2      /cdp
            e0a    cs1                       Eth1/2            N9K-C9336C
            e0b    cs2                       Eth1/2            N9K-C9336C
node1      /cdp
            e0a    cs1                       Eth1/1            N9K-C9336C
            e0b    cs2                       Eth1/1            N9K-C9336C
4 entries were displayed.



cs1# **show cdp neighbors**

Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater,
                  V - VoIP-Phone, D - Remotely-Managed-Device,
                  s - Supports-STP-Dispute

Device-ID          Local Intrfce  Hldtme Capability  Platform      Port ID
node1              Eth1/1         144    H           FAS2980       e0a
node2              Eth1/2         145    H           FAS2980       e0a
cs2                Eth1/35        176    R S I s     N9K-C9336C    Eth1/35
cs2(FDO220329V5)   Eth1/36        176    R S I s     N9K-C9336C    Eth1/36

Total entries displayed: 4


cs2# **show cdp neighbors**

Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                  S - Switch, H - Host, I - IGMP, r - Repeater,
                  V - VoIP-Phone, D - Remotely-Managed-Device,
                  s - Supports-STP-Dispute

Device-ID          Local Intrfce  Hldtme Capability  Platform      Port ID
node1              Eth1/1         139    H           FAS2980       e0b
node2              Eth1/2         124    H           FAS2980       e0b
cs1                Eth1/35        178    R S I s     N9K-C9336C    Eth1/35
cs1                Eth1/36        178    R S I s     N9K-C9336C    Eth1/36

Total entries displayed: 4
Steps
  1. If AutoSupport is enabled on this cluster, suppress automatic case creation by invoking an AutoSupport message: system node autosupport invoke -node * -type all -message MAINT=xh

    where x is the duration of the maintenance window in hours.

    Note The AutoSupport message notifies technical support of this maintenance task so that automatic case creation is suppressed during the maintenance window.
  2. Install the appropriate RCF and image on the switch, newcs2, and make any necessary site preparations.

    If necessary, verify, download, and install the appropriate versions of the RCF and NX-OS software for the new switch. If you have verified that the new switch is correctly set up and does not need updates to the RCF and NX-OS software, continue to step 2.

    1. Go to the NetApp Cluster and Management Network Switches Reference Configuration File Description Page on the NetApp Support Site.

    2. Click the link for the Cluster Network and Management Network Compatibility Matrix, and then note the required switch software version.

    3. Click your browser’s back arrow to return to the Description page, click CONTINUE, accept the license agreement, and then go to the Download page.

    4. Follow the steps on the Download page to download the correct RCF and NX-OS files for the version of ONTAP software you are installing.

  3. On the new switch, log in as admin and shut down all of the ports that will be connected to the node cluster interfaces (ports 1/1 to 1/34).

    If the switch that you are replacing is not functional and is powered down, go to Step 4. The LIFs on the cluster nodes should have already failed over to the other cluster port for each node.

    newcs2# **config**
    Enter configuration commands, one per line. End with CNTL/Z.
    newcs2(config)# **interface e1/1-34**
    newcs2(config-if-range)# **shutdown**
  4. Verify that all cluster LIFs have auto-revert enabled: network interface show -vserver Cluster -fields auto-revert

    cluster1::> **network interface show -vserver Cluster -fields auto-revert**
    
                 Logical
    Vserver      Interface     Auto-revert
    ------------ ------------- -------------
    Cluster      node1_clus1   true
    Cluster      node1_clus2   true
    Cluster      node2_clus1   true
    Cluster      node2_clus2   true
    
    4 entries were displayed.
  5. Verify that all the cluster LIFs can communicate: cluster ping-cluster

    cluster1::*> **cluster ping-cluster node1**
    
    Host is node2
    Getting addresses from network interface table...
    Cluster node1_clus1 169.254.209.69 node1 e0a
    Cluster node1_clus2 169.254.49.125 node1 e0b
    Cluster node2_clus1 169.254.47.194 node2 e0a
    Cluster node2_clus2 169.254.19.183 node2 e0b
    Local = 169.254.47.194 169.254.19.183
    Remote = 169.254.209.69 169.254.49.125
    Cluster Vserver Id = 4294967293
    Ping status:
    ....
    Basic connectivity succeeds on 4 path(s)
    Basic connectivity fails on 0 path(s)
    ................
    Detected 9000 byte MTU on 4 path(s):
    Local 169.254.47.194 to Remote 169.254.209.69
    Local 169.254.47.194 to Remote 169.254.49.125
    Local 169.254.19.183 to Remote 169.254.209.69
    Local 169.254.19.183 to Remote 169.254.49.125
    Larger than PMTU communication succeeds on 4 path(s)
    RPC status:
    2 paths up, 0 paths down (tcp check)
    2 paths up, 0 paths down (udp check)
  6. Shut down the ISL ports 1/35 and 1/36 on the Nexus 9336C-FX2 switch cs1:

    cs1# **configure**
    Enter configuration commands, one per line. End with CNTL/Z.
    cs1(config)# **interface e1/35-36**
    cs1(config-if-range)# **shutdown**
    cs1(config-if-range)#
  7. Remove all of the cables from the Nexus 9336C-FX2 cs2 switch, and then connect them to the same ports on the Nexus C9336C-FX2 newcs2 switch.

  8. Bring up the ISLs ports 1/35 and 1/36 between the cs1 and newcs2 switches, and then verify the port channel operation status.

    Port-Channel should indicate Po1(SU) and Member Ports should indicate Eth1/35(P) and Eth1/36(P).

    This example enables ISL ports 1/35 and 1/36 and displays the port channel summary on switch cs1:

    cs1# **configure**
    Enter configuration commands, one per line. End with CNTL/Z.
    cs1(config)# **int e1/35-36**
    cs1(config-if-range)# **no shutdown**
    
    cs1(config-if-range)# show port-channel summary
    Flags:  D - Down        P - Up in port-channel (members)
            I - Individual  H - Hot-standby (LACP only)
            s - Suspended   r - Module-removed
            b - BFD Session Wait
            S - Switched    R - Routed
            U - Up (port-channel)
            p - Up in delay-lacp mode (member)
            M - Not in use. Min-links not met
    --------------------------------------------------------------------------------
    Group Port-       Type     Protocol  Member       Ports
          Channel
    --------------------------------------------------------------------------------
    1     Po1(SU)     Eth      LACP      Eth1/35(P)   Eth1/36(P)
    
    cs1(config-if-range)#
  9. Verify that port e0b is up on all nodes: network port show ipspace Cluster

    The output should be similar to the following:

    cluster1::*> **network port show -ipspace Cluster**
    
    Node: node1
                                                                            Ignore
                                                       Speed(Mbps) Health   Health
    Port      IPspace      Broadcast Domain Link MTU   Admin/Oper  Status   Status
    --------- ------------ ---------------- ---- ----- ----------- -------- -------
    e0a       Cluster      Cluster          up   9000  auto/10000  healthy  false
    e0b       Cluster      Cluster          up   9000  auto/10000  healthy  false
    
    Node: node2
                                                                            Ignore
                                                       Speed(Mbps) Health   Health
    Port      IPspace      Broadcast Domain Link MTU   Admin/Oper  Status   Status
    --------- ------------ ---------------- ---- ----- ----------- -------- -------
    e0a       Cluster      Cluster          up   9000  auto/10000  healthy  false
    e0b       Cluster      Cluster          up   9000  auto/auto   -        false
    
    4 entries were displayed.
  10. On the same node you used in the previous step, revert the cluster LIF associated with the port in the previous step by using the network interface revert command.

    In this example, LIF node1_clus2 on node1 is successfully reverted if the Home value is true and the port is e0b.

    The following commands return LIF node1_clus2 on node1 to home port e0a and displays information about the LIFs on both nodes. Bringing up the first node is successful if the Is Home column is true for both cluster interfaces and they show the correct port assignments, in this example e0a and e0b on node1.

    cluster1::*> **network interface show -vserver Cluster**
    
                Logical      Status     Network            Current    Current Is
    Vserver     Interface    Admin/Oper Address/Mask       Node       Port    Home
    ----------- ------------ ---------- ------------------ ---------- ------- -----
    Cluster
                node1_clus1  up/up      169.254.209.69/16  node1      e0a     true
                node1_clus2  up/up      169.254.49.125/16  node1      e0b     true
                node2_clus1  up/up      169.254.47.194/16  node2      e0a     true
                node2_clus2  up/up      169.254.19.183/16  node2      e0a     false
    
    4 entries were displayed.
  11. Display information about the nodes in a cluster: cluster show

    This example shows that the node health for node1 and node2 in this cluster is true:

    cluster1::*> **cluster show**
    
    Node          Health  Eligibility
    ------------- ------- ------------
    node1         false   true
    node2         true    true
  12. Verify that all physical cluster ports are up: network port show ipspace Cluster

    cluster1::*> **network port show -ipspace Cluster**
    
    Node node1                                                               Ignore
                                                        Speed(Mbps) Health   Health
    Port      IPspace     Broadcast Domain  Link  MTU   Admin/Oper  Status   Status
    --------- ----------- ----------------- ----- ----- ----------- -------- ------
    e0a       Cluster     Cluster           up    9000  auto/10000  healthy  false
    e0b       Cluster     Cluster           up    9000  auto/10000  healthy  false
    
    Node: node2
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
    Port      IPspace      Broadcast Domain Link  MTU   Admin/Oper  Status   Status
    --------- ------------ ---------------- ----- ----- ----------- -------- ------
    e0a       Cluster      Cluster          up    9000  auto/10000  healthy  false
    e0b       Cluster      Cluster          up    9000  auto/10000  healthy  false
    
    4 entries were displayed.
  13. Verify that all the cluster LIFs can communicate: cluster ping-cluster

    cluster1::*> **cluster ping-cluster -node node2**
    Host is node2
    Getting addresses from network interface table...
    Cluster node1_clus1 169.254.209.69 node1 e0a
    Cluster node1_clus2 169.254.49.125 node1 e0b
    Cluster node2_clus1 169.254.47.194 node2 e0a
    Cluster node2_clus2 169.254.19.183 node2 e0b
    Local = 169.254.47.194 169.254.19.183
    Remote = 169.254.209.69 169.254.49.125
    Cluster Vserver Id = 4294967293
    Ping status:
    ....
    Basic connectivity succeeds on 4 path(s)
    Basic connectivity fails on 0 path(s)
    ................
    Detected 9000 byte MTU on 4 path(s):
    Local 169.254.47.194 to Remote 169.254.209.69
    Local 169.254.47.194 to Remote 169.254.49.125
    Local 169.254.19.183 to Remote 169.254.209.69
    Local 169.254.19.183 to Remote 169.254.49.125
    Larger than PMTU communication succeeds on 4 path(s)
    RPC status:
    2 paths up, 0 paths down (tcp check)
    2 paths up, 0 paths down (udp check)
  14. Confirm the following cluster network configuration: network port show

    cluster1::*> **network port show -ipspace Cluster**
    Node: node1
                                                                           Ignore
                                           Speed(Mbps)            Health   Health
    Port      IPspace     Broadcast Domain Link MTU   Admin/Oper  Status   Status
    --------- ----------- ---------------- ---- ----- ----------- -------- ------
    e0a       Cluster     Cluster          up   9000  auto/10000  healthy  false
    e0b       Cluster     Cluster          up   9000  auto/10000  healthy  false
    
    Node: node2
                                                                           Ignore
                                            Speed(Mbps)           Health   Health
    Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
    --------- ------------ ---------------- ---- ---- ----------- -------- ------
    e0a       Cluster      Cluster          up   9000 auto/10000  healthy  false
    e0b       Cluster      Cluster          up   9000 auto/10000  healthy  false
    
    4 entries were displayed.
    
    
    cluster1::*> **network interface show -vserver Cluster**
    
                Logical    Status     Network            Current       Current Is
    Vserver     Interface  Admin/Oper Address/Mask       Node          Port    Home
    ----------- ---------- ---------- ------------------ ------------- ------- ----
    Cluster
                node1_clus1  up/up    169.254.209.69/16  node1         e0a     true
                node1_clus2  up/up    169.254.49.125/16  node1         e0b     true
                node2_clus1  up/up    169.254.47.194/16  node2         e0a     true
                node2_clus2  up/up    169.254.19.183/16  node2         e0b     true
    
    4 entries were displayed.
    
    cluster1::> **network device-discovery show -protocol cdp**
    
    Node/       Local  Discovered
    Protocol    Port   Device (LLDP: ChassisID)  Interface         Platform
    ----------- ------ ------------------------- ----------------  ----------------
    node2      /cdp
                e0a    cs1                       0/2               N9K-C9336C
                e0b    newcs2                    0/2               N9K-C9336C
    node1      /cdp
                e0a    cs1                       0/1               N9K-C9336C
                e0b    newcs2                    0/1               N9K-C9336C
    
    4 entries were displayed.
    
    
    cs1# **show cdp neighbors**
    
    Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                      S - Switch, H - Host, I - IGMP, r - Repeater,
                      V - VoIP-Phone, D - Remotely-Managed-Device,
                      s - Supports-STP-Dispute
    
    Device-ID            Local Intrfce  Hldtme Capability  Platform      Port ID
    node1                Eth1/1         144    H           FAS2980       e0a
    node2                Eth1/2         145    H           FAS2980       e0a
    newcs2               Eth1/35        176    R S I s     N9K-C9336C    Eth1/35
    newcs2               Eth1/36        176    R S I s     N9K-C9336C    Eth1/36
    
    Total entries displayed: 4
    
    
    cs2# **show cdp neighbors**
    
    Capability Codes: R - Router, T - Trans-Bridge, B - Source-Route-Bridge
                      S - Switch, H - Host, I - IGMP, r - Repeater,
                      V - VoIP-Phone, D - Remotely-Managed-Device,
                      s - Supports-STP-Dispute
    
    Device-ID          Local Intrfce  Hldtme Capability  Platform      Port ID
    node1              Eth1/1         139    H           FAS2980       e0b
    node2              Eth1/2         124    H           FAS2980       e0b
    cs1                Eth1/35        178    R S I s     N9K-C9336C    Eth1/35
    cs1                Eth1/36        178    R S I s     N9K-C9336C    Eth1/36
    
    Total entries displayed: 4
  15. For ONTAP 9.8 and later, enable the Ethernet switch health monitor log collection feature for collecting switch-related log files, using the commands: system switch ethernet log setup-password``system switch ethernet log enable-collection

    cluster1::*> **system switch ethernet log setup-password**
    Enter the switch name: <return>
    The switch name entered is not recognized.
    Choose from the following list:
    cs1
    cs2
    
    cluster1::*> **system switch ethernet log setup-password**
    
    Enter the switch name: **cs1**
    RSA key fingerprint is e5:8b:c6:dc:e2:18:18:09:36:63:d9:63:dd:03:d9:cc
    Do you want to continue? {y|n}::[n] **y**
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> **system switch ethernet log setup-password**
    
    Enter the switch name: **cs2**
    RSA key fingerprint is 57:49:86:a1:b9:80:6a:61:9a:86:8e:3c:e3:b7:1f:b1
    Do you want to continue? {y|n}:: [n] **y**
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> **system  switch ethernet log enable-collection**
    
    Do you want to enable cluster log collection for all nodes in the cluster?
    {y|n}: [n] **y**
    
    Enabling cluster switch log collection.
    
    cluster1::*>
    Note If any of these commands return an error, contact NetApp support.
  16. For ONTAP releases 9.5P16, 9.6P12, and 9.7P10 and later patch releases, enable the Ethernet switch health monitor log collection feature for collecting switch-related log files, using the commands: system cluster-switch log setup-password``system cluster-switch log enable-collection

    cluster1::*> **system cluster-switch log setup-password**
    Enter the switch name: <return>
    The switch name entered is not recognized.
    Choose from the following list:
    cs1
    cs2
    
    cluster1::*> **system cluster-switch log setup-password**
    
    Enter the switch name: **cs1**
    RSA key fingerprint is e5:8b:c6:dc:e2:18:18:09:36:63:d9:63:dd:03:d9:cc
    Do you want to continue? {y|n}::[n] **y**
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> **system cluster-switch log setup-password**
    
    Enter the switch name: **cs2**
    RSA key fingerprint is 57:49:86:a1:b9:80:6a:61:9a:86:8e:3c:e3:b7:1f:b1
    Do you want to continue? {y|n}:: [n] **y**
    
    Enter the password: <enter switch password>
    Enter the password again: <enter switch password>
    
    cluster1::*> **system cluster-switch log enable-collection**
    
    Do you want to enable cluster log collection for all nodes in the cluster?
    {y|n}: [n] **y**
    
    Enabling cluster switch log collection.
    
    cluster1::*>
    Note If any of these commands return an error, contact NetApp support.
  17. If you suppressed automatic case creation, re-enable it by invoking an AutoSupport message: system node autosupport invoke -node * -type all -message MAINT=END