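Install or upgrade the Reference Configuration File (RCF) script

Contributors: netapp-yvonneo netapp-jolieg

Follow this procedure to install or upgrade the RCF script.

Before you begin

Before you install or upgrade the RCF script, make sure that the following are in place on the switch:

  • Cumulus Linux is installed. See the "Hardware Universe" for supported versions.

  • The IP address, subnet mask, and default gateway are defined through DHCP or configured manually.

Note: In addition to the admin user, you must specify a user in the RCF to be used specifically for log collection.

Customer configurations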

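The following reference configuration categories are available:

Cluster

One port is configured for 4x10GbE breakout, one port is configured for 4x25GbE breakout, and the other ports are configured for 40/100GbE. Shared cluster/HA traffic on ports is supported for nodes that use shared cluster/HA ports. See the platform table in the Knowledge Base article "What AFF, ASA, and FAS platforms use shared cluster and HA Ethernet ports?". All ports can also be used as dedicated cluster ports.

Storage

All ports are configured for 100GbE NVMe storage connections.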
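
The Cluster port layout described above can be written out as a short sketch. The function name `cluster_rcf_ports` is hypothetical, and the `swp` port names are assumed from the RCF banner shown in the procedure in this document; the sketch covers the Cluster category only and is not part of the RCF itself.

```python
def cluster_rcf_ports():
    """Return switch port names grouped by role for the Cluster RCF layout.

    Assumption: names follow the RCF banner (swp1s0-3, swp2s0-3, swp3-14, swp15-16).
    """
    return {
        "4x10G breakout": [f"swp1s{i}" for i in range(4)],
        "4x25G breakout": [f"swp2s{i}" for i in range(4)],
        "40/100G": [f"swp{n}" for n in range(3, 15)],
        "100G ISL": [f"swp{n}" for n in range(15, 17)],
    }


layout = cluster_rcf_ports()
total_ports = sum(len(names) for names in layout.values())

for role, names in layout.items():
    print(f"{role}: {names[0]} to {names[-1]}")
print(total_ports)
```

A mapping like this can be handy when you compare the interface list reported by the switch against what the RCF is expected to create.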

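Current RCF script versions

Two RCF scripts are available, one for cluster applications and one for storage applications. Download the RCFs from the "NVIDIA SN2100 Software Download" page. The procedure is the same for each.

  • Cluster: MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP

  • Storage: MSN2100-RCF-v1.x-Storage

About the examples

The following example procedure shows how to download and apply the RCF script for cluster switches.

The example command output uses switch management IP address 10.233.204.71, netmask 255.255.254.0, and default gateway 10.233.204.1.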
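
The example addressing can be sanity-checked before you start. The following is a minimal sketch using Python's standard `ipaddress` module; the variable names are illustrative and not part of the RCF. It checks that the management IP address and the default gateway from the example fall in the same subnet.

```python
import ipaddress

# Values taken from the example above.
mgmt_ip = "10.233.204.71"
netmask = "255.255.254.0"
gateway = "10.233.204.1"

# Build the interface (address plus mask) and derive its network.
iface = ipaddress.ip_interface(f"{mgmt_ip}/{netmask}")
network = iface.network

# The gateway is only directly reachable if it sits inside the management subnet.
gateway_in_subnet = ipaddress.ip_address(gateway) in network

print(network)
print(gateway_in_subnet)
```

Substitute your own values for the three variables; if the last line prints `False`, recheck the management interface settings before continuing.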

Example 1. Steps
Cumulus Linux 4.4.3
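  1. Connect the cluster switch to the management network.

  2. Use the `ping` command to verify connectivity to the server hosting Cumulus Linux and the RCF.

  3. Display the cluster ports on each node that are connected to the cluster switches: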

    network device-discovery show

  4. Check the administrative and operational status of each cluster port.

    1. Verify that all cluster ports are up with a healthy status:

      network port show -role cluster

    2. Verify that all cluster interfaces (LIFs) are on their home ports:

      network interface show -role cluster

    3. Verify that the cluster displays information for both cluster switches:

      system cluster-switch show -is-monitoring-enabled-operational true

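  5. Disable auto-revert on the cluster LIFs. The cluster LIFs fail over to the partner cluster switch and remain there while you perform the upgrade procedure on the targeted switch: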

    network interface modify -vserver Cluster -lif * -auto-revert false

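    • If you are upgrading the RCF, you must disable auto-revert in this step.

    • If you have just upgraded your Cumulus Linux version, you do not need to disable auto-revert in this step because it is already disabled.

  6. Display the available interfaces on the SN2100 switch: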

    admin@sw1:mgmt:~$ net show interface all
    
    State  Name   Spd  MTU    Mode         LLDP                Summary
    -----  -----  ---  -----  -----------  ------------------  --------------
    ...
    ...
    ADMDN  swp1   N/A  9216   NotConfigured
    ADMDN  swp2   N/A  9216   NotConfigured
    ADMDN  swp3   N/A  9216   NotConfigured
    ADMDN  swp4   N/A  9216   NotConfigured
    ADMDN  swp5   N/A  9216   NotConfigured
    ADMDN  swp6   N/A  9216   NotConfigured
    ADMDN  swp7   N/A  9216   NotConfigured
    ADMDN  swp8   N/A  9216   NotConfigured
    ADMDN  swp9   N/A  9216   NotConfigured
    ADMDN  swp10  N/A  9216   NotConfigured
    ADMDN  swp11  N/A  9216   NotConfigured
    ADMDN  swp12  N/A  9216   NotConfigured
    ADMDN  swp13  N/A  9216   NotConfigured
    ADMDN  swp14  N/A  9216   NotConfigured
    ADMDN  swp15  N/A  9216   NotConfigured
    ADMDN  swp16  N/A  9216   NotConfigured
  7. Copy the RCF Python script to the switch.

    cumulus@cumulus:mgmt:~$ cd /tmp
    cumulus@cumulus:mgmt:/tmp$ scp <user>@<host>:/<path>/MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP .
    ssologin@10.233.204.71's password:
    MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP         100% 8607   111.2KB/s         00:00
    Note: Although `scp` is used in the example, you can use your preferred method of file transfer, such as SFTP, HTTPS, or FTP.
  8. Apply the RCF Python script MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP.

    cumulus@cumulus:mgmt:/tmp$ sudo python3 MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP
    [sudo] password for cumulus:
    ...
    Step 1: Creating the banner file
    Step 2: Registering banner message
    Step 3: Updating the MOTD file
    Step 4: Ensuring passwordless use of cl-support command by admin
    Step 5: Disabling apt-get
    Step 6: Creating the interfaces
    Step 7: Adding the interface config
    Step 8: Disabling cdp
    Step 9: Adding the lldp config
    Step 10: Adding the RoCE base config
    Step 11: Modifying RoCE Config
    Step 12: Configure SNMP
    Step 13: Reboot the switch

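    The RCF script completes the steps listed in the example above.

    Note: In "Step 3: Updating the MOTD file" above, the command `cat /etc/motd` is run. This lets you verify the RCF filename, RCF version, ports to use, and other important information in the RCF banner.
    Note: If you encounter any RCF Python script issues that you cannot resolve, contact "NetApp Support" for assistance.
  9. Reapply any previous customizations to the switch configuration. See "Review cabling and configuration considerations" for details of any further changes required.

  10. Verify the configuration after the reboot: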

    admin@sw1:mgmt:~$ net show interface all
    
    State  Name      Spd   MTU    Mode       LLDP              Summary
    -----  --------- ----  -----  ---------- ----------------- --------
    ...
    ...
    DN     swp1s0    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp1s1    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp1s2    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp1s3    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp2s0    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp2s1    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp2s2    N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp2s3    N/A   9216   Trunk/L2                     Master: bridge(UP)
    UP     swp3      100G  9216   Trunk/L2                     Master: bridge(UP)
    UP     swp4      100G  9216   Trunk/L2                     Master: bridge(UP)
    DN     swp5      N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp6      N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp7      N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp8      N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp9      N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp10     N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp11     N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp12     N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp13     N/A   9216   Trunk/L2                     Master: bridge(UP)
    DN     swp14     N/A   9216   Trunk/L2                     Master: bridge(UP)
    UP     swp15     N/A   9216   BondMember                   Master: bond_15_16(UP)
    UP     swp16     N/A   9216   BondMember                   Master: bond_15_16(UP)
    ...
    ...
    
    admin@sw1:mgmt:~$ net show roce config
    RoCE mode.......... lossless
    Congestion Control:
      Enabled SPs.... 0 2 5
      Mode........... ECN
      Min Threshold.. 150 KB
      Max Threshold.. 1500 KB
    PFC:
      Status......... enabled
      Enabled SPs.... 2 5
      Interfaces......... swp10-16,swp1s0-3,swp2s0-3,swp3-9
    
    DSCP                     802.1p  switch-priority
    -----------------------  ------  ---------------
    0 1 2 3 4 5 6 7               0                0
    8 9 10 11 12 13 14 15         1                1
    16 17 18 19 20 21 22 23       2                2
    24 25 26 27 28 29 30 31       3                3
    32 33 34 35 36 37 38 39       4                4
    40 41 42 43 44 45 46 47       5                5
    48 49 50 51 52 53 54 55       6                6
    56 57 58 59 60 61 62 63       7                7
    
    switch-priority  TC  ETS
    ---------------  --  --------
    0 1 3 4 6 7       0  DWRR 28%
    2                 2  DWRR 28%
    5                 5  DWRR 43%
  11. Verify the information for the transceivers in the interfaces:

    admin@sw1:mgmt:~$ net show interface pluggables
    Interface  Identifier     Vendor Name  Vendor PN        Vendor SN       Vendor Rev
    ---------  -------------  -----------  ---------------  --------------  ----------
    swp3       0x11 (QSFP28)  Amphenol     112-00574        APF20379253516  B0
    swp4       0x11 (QSFP28)  AVAGO        332-00440        AF1815GU05Z     A0
    swp15      0x11 (QSFP28)  Amphenol     112-00573        APF21109348001  B0
    swp16      0x11 (QSFP28)  Amphenol     112-00573        APF21109347895  B0
  12. Verify that each node has a connection to each switch:

    admin@sw1:mgmt:~$ net show lldp
    
    LocalPort  Speed  Mode        RemoteHost              RemotePort
    ---------  -----  ----------  ----------------------  -----------
    swp3       100G   Trunk/L2    sw1                     e3a
    swp4       100G   Trunk/L2    sw2                     e3b
    swp15      100G   BondMember  sw13                    swp15
    swp16      100G   BondMember  sw14                    swp16
  13. Verify the health of the cluster ports on the cluster.

    1. Verify that the cluster ports are up and healthy across all nodes in the cluster:

      cluster1::*> network port show -role cluster
      
      Node: node1
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
      Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
      --------- ------------ ---------------- ---- ---- ----------- -------- ------
      e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
      e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
      
      Node: node2
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
      Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
      --------- ------------ ---------------- ---- ---- ----------- -------- ------
      e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
      e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
    2. Verify the switch health from the cluster (this might not show switch sw2, because LIFs are not homed on e0d).

      cluster1::*> network device-discovery show -protocol lldp
      Node/       Local  Discovered
      Protocol    Port   Device (LLDP: ChassisID)  Interface Platform
      ----------- ------ ------------------------- --------- ----------
      node1/lldp
                  e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3      -
                  e3b    sw2 (b8:ce:f6:19:1b:96)   swp3      -
      
      node2/lldp
                  e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4      -
                  e3b    sw2 (b8:ce:f6:19:1b:96)   swp4      -
      
      
      cluster1::*> system switch ethernet show -is-monitoring-enabled-operational true
      Switch                      Type               Address          Model
      --------------------------- ------------------ ---------------- -----
      sw1                         cluster-network    10.233.205.90    MSN2100-CB2RC
           Serial Number: MNXXXXXXGD
            Is Monitored: true
                  Reason: None
        Software Version: Cumulus Linux version 4.4.3 running on Mellanox
                          Technologies Ltd. MSN2100
          Version Source: LLDP
      
      sw2                         cluster-network    10.233.205.91    MSN2100-CB2RC
           Serial Number: MNCXXXXXXGS
            Is Monitored: true
                  Reason: None
        Software Version: Cumulus Linux version 4.4.3 running on Mellanox
                          Technologies Ltd. MSN2100
          Version Source: LLDP
  14. Verify that the cluster is healthy:

    cluster show

  15. Repeat steps 1 to 14 on the second switch.

  16. Enable auto-revert on the cluster LIFs.

    network interface modify -vserver Cluster -lif * -auto-revert true
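
The post-reboot check of `net show interface all` can be scripted if you capture the output as text. The following is a minimal sketch, not a NetApp or NVIDIA tool: the function name `parse_interfaces` is hypothetical, the sample rows are copied from the example output in this procedure, and the set of state strings (`UP`, `DN`, `ADMDN`) is assumed from that output alone.

```python
# Sample rows copied from the post-reboot `net show interface all` example.
SAMPLE = """\
State  Name      Spd   MTU    Mode       LLDP              Summary
-----  --------- ----  -----  ---------- ----------------- --------
...
DN     swp1s0    N/A   9216   Trunk/L2                     Master: bridge(UP)
UP     swp3      100G  9216   Trunk/L2                     Master: bridge(UP)
UP     swp4      100G  9216   Trunk/L2                     Master: bridge(UP)
DN     swp5      N/A   9216   Trunk/L2                     Master: bridge(UP)
UP     swp15     N/A   9216   BondMember                   Master: bond_15_16(UP)
UP     swp16     N/A   9216   BondMember                   Master: bond_15_16(UP)
"""

# Assumption: these are the only state values, as seen in the examples.
KNOWN_STATES = {"UP", "DN", "ADMDN"}


def parse_interfaces(text):
    """Return one dict per interface row with state, name, speed, mtu, and mode."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header, the separator, and "..." lines.
        if len(fields) < 5 or fields[0] not in KNOWN_STATES:
            continue
        state, name, speed, mtu, mode = fields[:5]
        rows.append({"state": state, "name": name, "speed": speed,
                     "mtu": int(mtu), "mode": mode})
    return rows


interfaces = parse_interfaces(SAMPLE)
up_ports = [row["name"] for row in interfaces if row["state"] == "UP"]
wrong_mtu = [row["name"] for row in interfaces if row["mtu"] != 9216]

print(up_ports)
print(wrong_mtu)
```

In the example output every port reports an MTU of 9216, so a non-empty `wrong_mtu` list would be worth investigating before you return the switch to service.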

Cumulus Linux 5.4.0
  1. Connect the cluster switch to the management network.

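  2. Use the `ping` command to verify connectivity to the server hosting Cumulus Linux and the RCF.

  3. Display the cluster ports on each node that are connected to the cluster switches: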

    network device-discovery show

  4. Check the administrative and operational status of each cluster port.

    1. Verify that all cluster ports are up with a healthy status:

      network port show -role cluster

    2. Verify that all cluster interfaces (LIFs) are on their home ports:

      network interface show -role cluster

    3. Verify that the cluster displays information for both cluster switches:

      system cluster-switch show -is-monitoring-enabled-operational true

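  5. Disable auto-revert on the cluster LIFs. The cluster LIFs fail over to the partner cluster switch and remain there while you perform the upgrade procedure on the targeted switch: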

    network interface modify -vserver Cluster -lif * -auto-revert false

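    • If you are upgrading the RCF, you must disable auto-revert in this step.

    • If you have just upgraded your Cumulus Linux version, you do not need to disable auto-revert in this step because it is already disabled.

  6. Display the available interfaces on the SN2100 switch: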

    admin@sw1:mgmt:~$ nv show interface
    Interface     MTU   Speed State Remote Host         Remote Port- Type      Summary
    ------------- ----- ----- ----- ------------------- ------------ --------- -------------
    + cluster_isl 9216  200G  up                                      bond
    + eth0        1500  100M  up    mgmt-sw1            Eth105/1/14   eth       IP Address: 10.231.80 206/22
      eth0                                                                      IP Address: fd20:8b1e:f6ff:fe31:4a0e/64
    + lo          65536       up                                      loopback  IP Address: 127.0.0.1/8
      lo                                                                        IP Address: ::1/128
    + swp1s0      9216 10G    up cluster01                e0b         swp
    .
    .
    .
    + swp15      9216 100G    up sw2                      swp15       swp
    + swp16      9216 100G    up sw2                      swp16       swp
  7. Copy the RCF Python script to the switch.

    cumulus@cumulus:mgmt:~$ cd /tmp
    cumulus@cumulus:mgmt:/tmp$ scp <user>@<host>:/<path>/MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP .
    ssologin@10.233.204.71's password:
    MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP          100% 8607   111.2KB/s         00:00
    Note: Although `scp` is used in the example, you can use your preferred method of file transfer, such as SFTP, HTTPS, or FTP.
  8. Apply the RCF Python script MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP.

    cumulus@cumulus:mgmt:/tmp$ sudo python3 MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP
    [sudo] password for cumulus:
    .
    .
    Step 1: Creating the banner file
    Step 2: Registering banner message
    Step 3: Updating the MOTD file
    Step 4: Ensuring passwordless use of cl-support command by admin
    Step 5: Disabling apt-get
    Step 6: Creating the interfaces
    Step 7: Adding the interface config
    Step 8: Disabling cdp
    Step 9: Adding the lldp config
    Step 10: Adding the RoCE base config
    Step 11: Modifying RoCE Config
    Step 12: Configure SNMP
    Step 13: Reboot the switch

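    The RCF script completes the steps listed in the example above.

    Note: In "Step 3: Updating the MOTD file" above, the command `cat /etc/issue.net` is run. This lets you verify the RCF filename, RCF version, ports to use, and other important information in the RCF banner.

    For example: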

    admin@sw1:mgmt:~$ cat /etc/issue.net
    ******************************************************************************
    *
    * NetApp Reference Configuration File (RCF)
    * Switch       : Mellanox MSN2100
    * Filename     : MSN2100-RCF-1.x-Cluster-HA-Breakout-LLDP
    * Release Date : 13-02-2023
    * Version      : 1.x-Cluster-HA-Breakout-LLDP
    *
    * Port Usage:
    * Port 1      : 4x10G Breakout mode for Cluster+HA Ports, swp1s0-3
    * Port 2      : 4x25G Breakout mode for Cluster+HA Ports, swp2s0-3
    * Ports 3-14  : 40/100G for Cluster+HA Ports, swp3-14
    * Ports 15-16 : 100G Cluster ISL Ports, swp15-16
    *
    * NOTE:
    *   RCF manually sets swp1s0-3 link speed to 10000 and
    *   auto-negotiation to off for Intel 10G
    *   RCF manually sets swp2s0-3 link speed to 25000 and
    *   auto-negotiation to off for Chelsio 25G
    *
    *
    * IMPORTANT: Perform the following steps to ensure proper RCF installation:
    * - Copy the RCF file to /tmp
    * - Ensure the file has execute permission
    * - From /tmp run the file as sudo python3 <filename>
    *
    ******************************************************************************
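    Note: If you encounter any RCF Python script issues that you cannot resolve, contact "NetApp Support" for assistance.
  9. Reapply any previous customizations to the switch configuration. See "Review cabling and configuration considerations" for details of any further changes required.

  10. Verify the configuration after the reboot: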

    admin@sw1:mgmt:~$ nv show interface
    Interface     MTU   Speed State Remote Host         Remote Port- Type      Summary
    ------------- ----- ----- ----- ------------------- ------------ --------- -------------
    + cluster_isl 9216  200G  up                                      bond
    + eth0        1500  100M  up    mgmt-sw1            Eth105/1/14   eth       IP Address: 10.231.80 206/22
      eth0                                                                      IP Address: fd20:8b1e:f6ff:fe31:4a0e/64
    + lo          65536       up                                      loopback  IP Address: 127.0.0.1/8
      lo                                                                        IP Address: ::1/128
    + swp1s0      9216 10G    up cluster01                e0b         swp
    .
    .
    .
    + swp15      9216 100G    up sw2                      swp15       swp
    + swp16      9216 100G    up sw2                      swp16       swp
    
    admin@sw1:mgmt:~$ nv show qos roce
                       operational  applied   description
    -----------------  -----------  --------- ----------------------------------------
    enable             on                     Turn feature 'on' or 'off'. This feature is disabled by default.
    mode               lossless     lossless  Roce Mode
    congestion-control
      congestion-mode   ECN,RED                Congestion config mode
      enabled-tc        0,2,5                  Congestion config enabled Traffic Class
      max-threshold     195.31 KB              Congestion config max-threshold
      min-threshold     39.06 KB               Congestion config min-threshold
      probability       100
    lldp-app-tlv
      priority          3                      switch-priority of roce
      protocol-id       4791                   L4 port number
      selector          UDP                    L4 protocol
    pfc
      pfc-priority      2, 5                   switch-prio on which PFC is enabled
      rx-enabled        enabled                PFC Rx Enabled status
      tx-enabled        enabled                PFC Tx Enabled status
    trust
      trust-mode        pcp,dscp               Trust Setting on the port for packet classification
    
    RoCE PCP/DSCP->SP mapping configurations
    ===========================================
            pcp  dscp                     switch-prio
        --  ---  -----------------------  -----------
        0   0    0,1,2,3,4,5,6,7          0
        1   1    8,9,10,11,12,13,14,15    1
        2   2    16,17,18,19,20,21,22,23  2
        3   3    24,25,26,27,28,29,30,31  3
        4   4    32,33,34,35,36,37,38,39  4
        5   5    40,41,42,43,44,45,46,47  5
        6   6    48,49,50,51,52,53,54,55  6
        7   7    56,57,58,59,60,61,62,63  7
    
    RoCE SP->TC mapping and ETS configurations
    =============================================
            switch-prio  traffic-class  scheduler-weight
        --  -----------  -------------  ----------------
        0   0            0              DWRR-28%
        1   1            0              DWRR-28%
        2   2            2              DWRR-28%
        3   3            0              DWRR-28%
        4   4            0              DWRR-28%
        5   5            5              DWRR-43%
        6   6            0              DWRR-28%
        7   7            0              DWRR-28%
    
    RoCE pool config
    ===================
            name                   mode     size  switch-priorities  traffic-class
        --  ---------------------  -------  ----  -----------------  -------------
        0   lossy-default-ingress  Dynamic  50%   0,1,3,4,6,7        -
        1   roce-reserved-ingress  Dynamic  50%   2,5                -
        2   lossy-default-egress   Dynamic  50%   -                  0
        3   roce-reserved-egress   Dynamic  inf   -                  2,5
    
    Exception List
    =================
            description
        --  -----------------------------------------------------------------------…
        1   RoCE PFC Priority Mismatch.Expected pfc-priority: 3.
        2   Congestion Config TC Mismatch.Expected enabled-tc: 0,3.
        3   Congestion Config mode Mismatch.Expected congestion-mode: ECN.
        4   Congestion Config min-threshold Mismatch.Expected min-threshold: 150000.
        5   Congestion Config max-threshold Mismatch.Expected max-threshold:
            1500000.
        6   Scheduler config mismatch for traffic-class mapped to switch-prio0.
            Expected scheduler-weight: DWRR-50%.
        7   Scheduler config mismatch for traffic-class mapped to switch-prio1.
            Expected scheduler-weight: DWRR-50%.
        8   Scheduler config mismatch for traffic-class mapped to switch-prio2.
            Expected scheduler-weight: DWRR-50%.
        9   Scheduler config mismatch for traffic-class mapped to switch-prio3.
            Expected scheduler-weight: DWRR-50%.
        10  Scheduler config mismatch for traffic-class mapped to switch-prio4.
            Expected scheduler-weight: DWRR-50%.
        11  Scheduler config mismatch for traffic-class mapped to switch-prio5.
            Expected scheduler-weight: DWRR-50%.
        12  Scheduler config mismatch for traffic-class mapped to switch-prio6.
            Expected scheduler-weight: strict-priority.
        13  Scheduler config mismatch for traffic-class mapped to switch-prio7.
            Expected scheduler-weight: DWRR-50%.
        14  Invalid reserved config for ePort.TC[2].Expected 0 Got 1024
        15  Invalid reserved config for ePort.TC[5].Expected 0 Got 1024
        16  Invalid traffic-class mapping for switch-priority 2.Expected 0 Got 2
        17  Invalid traffic-class mapping for switch-priority 3.Expected 3 Got 0
        18  Invalid traffic-class mapping for switch-priority 5.Expected 0 Got 5
        19  Invalid traffic-class mapping for switch-priority 6.Expected 6 Got 0
    Incomplete Command: set interface swp3-16 link fast-linkupp3-16 link fast-linkup
    Incomplete Command: set interface swp3-16 link fast-linkupp3-16 link fast-linkup
    Incomplete Command: set interface swp3-16 link fast-linkupp3-16 link fast-linkup
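    Note: The exceptions listed do not affect performance and can be safely ignored.
  11. Verify the information for the transceivers in the interfaces: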

    admin@sw1:mgmt:~$ nv show interface --view=pluggables
    Interface  Identifier     Vendor Name  Vendor PN        Vendor SN       Vendor Rev
    ---------  -------------  -----------  ---------------  --------------  ----------
    swp1s0     0x00 None
    swp1s1     0x00 None
    swp1s2     0x00 None
    swp1s3     0x00 None
    swp2s0     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp2s1     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp2s2     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp2s3     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp3       0x00 None
    swp4       0x00 None
    swp5       0x00 None
    swp6       0x00 None
    .
    .
    .
    swp15      0x11 (QSFP28)  Amphenol     112-00595        APF20279210117  B0
    swp16      0x11 (QSFP28)  Amphenol     112-00595        APF20279210166  B0
  12. Verify that each node has a connection to each switch:

    admin@sw1:mgmt:~$ nv show interface --view=lldp
    
    LocalPort  Speed  Mode        RemoteHost               RemotePort
    ---------  -----  ----------  -----------------------  -----------
    eth0       100M   Mgmt        mgmt-sw1                 Eth110/1/29
    swp2s1     25G    Trunk/L2    node1                    e0a
    swp15      100G   BondMember  sw2                      swp15
    swp16      100G   BondMember  sw2                      swp16
  13. Verify the health of the cluster ports on the cluster.

    1. Verify that the cluster ports are up and healthy across all nodes in the cluster:

      cluster1::*> network port show -role cluster
      
      Node: node1
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
      Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
      --------- ------------ ---------------- ---- ---- ----------- -------- ------
      e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
      e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
      
      Node: node2
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
      Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
      --------- ------------ ---------------- ---- ---- ----------- -------- ------
      e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
      e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
    2. Verify the switch health from the cluster (this might not show switch sw2, because LIFs are not homed on e0d).

      cluster1::*> network device-discovery show -protocol lldp
      Node/       Local  Discovered
      Protocol    Port   Device (LLDP: ChassisID)  Interface Platform
      ----------- ------ ------------------------- --------- ----------
      node1/lldp
                  e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3      -
                  e3b    sw2 (b8:ce:f6:19:1b:96)   swp3      -
      
      node2/lldp
                  e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4      -
                  e3b    sw2 (b8:ce:f6:19:1b:96)   swp4      -
      
      
      cluster1::*> system switch ethernet show -is-monitoring-enabled-operational true
      Switch                      Type               Address          Model
      --------------------------- ------------------ ---------------- -----
      sw1                         cluster-network    10.233.205.90    MSN2100-CB2RC
           Serial Number: MNXXXXXXGD
            Is Monitored: true
                  Reason: None
        Software Version: Cumulus Linux version 5.4.0 running on Mellanox
                          Technologies Ltd. MSN2100
          Version Source: LLDP
      
      sw2                         cluster-network    10.233.205.91    MSN2100-CB2RC
           Serial Number: MNCXXXXXXGS
            Is Monitored: true
                  Reason: None
        Software Version: Cumulus Linux version 5.4.0 running on Mellanox
                          Technologies Ltd. MSN2100
          Version Source: LLDP
  14. Verify that the cluster is healthy:

    cluster show

  15. Repeat steps 1 to 14 on the second switch.

  16. Enable auto-revert on the cluster LIFs.

    network interface modify -vserver Cluster -lif * -auto-revert true
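
The RCF banner printed by `cat /etc/issue.net` in this procedure has a simple `key : value` layout, so the filename and version can be pulled out programmatically. The following is a minimal sketch under that assumption; the function name `parse_banner` is hypothetical, and the sample text is abbreviated from the banner example, with `x` left as the version placeholder used throughout this document.

```python
# Abbreviated copy of the banner example; "x" is a version placeholder.
BANNER = """\
******************************************************************************
*
* NetApp Reference Configuration File (RCF)
* Switch       : Mellanox MSN2100
* Filename     : MSN2100-RCF-1.x-Cluster-HA-Breakout-LLDP
* Release Date : 13-02-2023
* Version      : 1.x-Cluster-HA-Breakout-LLDP
*
* Port Usage:
* Port 1      : 4x10G Breakout mode for Cluster+HA Ports, swp1s0-3
* Ports 15-16 : 100G Cluster ISL Ports, swp15-16
*
******************************************************************************
"""


def parse_banner(text):
    """Collect 'key : value' pairs from banner lines into a dict."""
    fields = {}
    for line in text.splitlines():
        # Drop the leading "* " decoration and trailing whitespace.
        body = line.lstrip("* ").rstrip()
        key, sep, value = body.partition(" : ")
        # Keep only lines that really contain a separated key and value.
        if sep and value:
            fields[key.strip()] = value.strip()
    return fields


info = parse_banner(BANNER)
print(info["Filename"])
print(info["Version"])
```

Comparing `info["Filename"]` with the name of the script you copied to the switch is a quick way to confirm that the intended RCF was applied.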

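  1. Connect the cluster switch to the management network.

  2. Use the `ping` command to verify connectivity to the server hosting Cumulus Linux and the RCF.

  3. Display the cluster ports on each node that are connected to the cluster switches: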

    network device-discovery show

  4. Check the administrative and operational status of each cluster port.

    1. Verify that all cluster ports are up with a healthy status:

      network port show -role cluster

    2. Verify that all cluster interfaces (LIFs) are on their home ports:

      network interface show -role cluster

    3. Verify that the cluster displays information for both cluster switches:

      system cluster-switch show -is-monitoring-enabled-operational true

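  5. Disable auto-revert on the cluster LIFs. The cluster LIFs fail over to the partner cluster switch and remain there while you perform the upgrade procedure on the targeted switch: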

    network interface modify -vserver Cluster -lif * -auto-revert false

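    • If you are upgrading the RCF, you must disable auto-revert in this step.

    • If you have just upgraded your Cumulus Linux version, you do not need to disable auto-revert in this step because it is already disabled.

  6. Display the available interfaces on the SN2100 switch: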

    admin@sw1:mgmt:~$ nv show interface
    Interface     MTU   Speed State Remote Host         Remote Port- Type      Summary
    ------------- ----- ----- ----- ------------------- ------------ --------- -------------
    + cluster_isl 9216  200G  up                                      bond
    + eth0        1500  100M  up    mgmt-sw1            Eth105/1/14   eth       IP Address: 10.231.80 206/22
      eth0                                                                      IP Address: fd20:8b1e:f6ff:fe31:4a0e/64
    + lo          65536       up                                      loopback  IP Address: 127.0.0.1/8
      lo                                                                        IP Address: ::1/128
    + swp1s0      9216 10G    up cluster01                e0b         swp
    .
    .
    .
    + swp15      9216 100G    up sw2                      swp15       swp
    + swp16      9216 100G    up sw2                      swp16       swp
  7. Copy the RCF Python script to the switch.

    cumulus@cumulus:mgmt:~$ cd /tmp
    cumulus@cumulus:mgmt:/tmp$ scp <user>@<host>:/<path>/MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP .
    ssologin@10.233.204.71's password:
    MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP          100% 8607   111.2KB/s         00:00
    Note: Although `scp` is used in the example, you can use your preferred method of file transfer, such as SFTP, HTTPS, or FTP.
  8. Apply the RCF Python script MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP.

    cumulus@cumulus:mgmt:/tmp$ sudo python3 MSN2100-RCF-v1.x-Cluster-HA-Breakout-LLDP
    [sudo] password for cumulus:
    .
    .
    Step 1: Creating the banner file
    Step 2: Registering banner message
    Step 3: Updating the MOTD file
    Step 4: Ensuring passwordless use of cl-support command by admin
    Step 5: Disabling apt-get
    Step 6: Creating the interfaces
    Step 7: Adding the interface config
    Step 8: Disabling cdp
    Step 9: Adding the lldp config
    Step 10: Adding the RoCE base config
    Step 11: Modifying RoCE Config
    Step 12: Configure SNMP
    Step 13: Reboot the switch

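    The RCF script completes the steps listed in the example above.

    Note: In "Step 3: Updating the MOTD file" above, the command `cat /etc/issue.net` is run. This lets you verify the RCF filename, RCF version, ports to use, and other important information in the RCF banner.

    For example: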

    admin@sw1:mgmt:~$ cat /etc/issue.net
    ******************************************************************************
    *
    * NetApp Reference Configuration File (RCF)
    * Switch       : Mellanox MSN2100
    * Filename     : MSN2100-RCF-1.x-Cluster-HA-Breakout-LLDP
    * Release Date : 13-02-2023
    * Version      : 1.x-Cluster-HA-Breakout-LLDP
    *
    * Port Usage:
    * Port 1      : 4x10G Breakout mode for Cluster+HA Ports, swp1s0-3
    * Port 2      : 4x25G Breakout mode for Cluster+HA Ports, swp2s0-3
    * Ports 3-14  : 40/100G for Cluster+HA Ports, swp3-14
    * Ports 15-16 : 100G Cluster ISL Ports, swp15-16
    *
    * NOTE:
    *   RCF manually sets swp1s0-3 link speed to 10000 and
    *   auto-negotiation to off for Intel 10G
    *   RCF manually sets swp2s0-3 link speed to 25000 and
    *   auto-negotiation to off for Chelsio 25G
    *
    *
    * IMPORTANT: Perform the following steps to ensure proper RCF installation:
    * - Copy the RCF file to /tmp
    * - Ensure the file has execute permission
    * - From /tmp run the file as sudo python3 <filename>
    *
    ******************************************************************************
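    Note: If you encounter any RCF Python script issues that you cannot resolve, contact "NetApp Support" for assistance.
  9. Reapply any previous customizations to the switch configuration. See "Review cabling and configuration considerations" for details of any further changes required.

  10. Verify the configuration after the reboot: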

    admin@sw1:mgmt:~$ nv show interface
    Interface     MTU   Speed State Remote Host         Remote Port- Type      Summary
    ------------- ----- ----- ----- ------------------- ------------ --------- -------------
    + cluster_isl 9216  200G  up                                      bond
    + eth0        1500  100M  up    mgmt-sw1            Eth105/1/14   eth       IP Address: 10.231.80 206/22
      eth0                                                                      IP Address: fd20:8b1e:f6ff:fe31:4a0e/64
    + lo          65536       up                                      loopback  IP Address: 127.0.0.1/8
      lo                                                                        IP Address: ::1/128
    + swp1s0      9216 10G    up cluster01                e0b         swp
    .
    .
    .
    + swp15      9216 100G    up sw2                      swp15       swp
    + swp16      9216 100G    up sw2                      swp16       swp
    
    admin@sw1:mgmt:~$ nv show qos roce
                       operational  applied   description
    -----------------  -----------  --------- ----------------------------------------
    enable             on                     Turn feature 'on' or 'off'. This feature is disabled by default.
    mode               lossless     lossless  Roce Mode
    congestion-control
      congestion-mode   ECN,RED                Congestion config mode
      enabled-tc        0,2,5                  Congestion config enabled Traffic Class
      max-threshold     195.31 KB              Congestion config max-threshold
      min-threshold     39.06 KB               Congestion config min-threshold
      probability       100
    lldp-app-tlv
      priority          3                      switch-priority of roce
      protocol-id       4791                   L4 port number
      selector          UDP                    L4 protocol
    pfc
      pfc-priority      2, 5                   switch-prio on which PFC is enabled
      rx-enabled        enabled                PFC Rx Enabled status
      tx-enabled        enabled                PFC Tx Enabled status
    trust
      trust-mode        pcp,dscp               Trust Setting on the port for packet classification
    
    RoCE PCP/DSCP->SP mapping configurations
    ===========================================
            pcp  dscp                     switch-prio
        --  ---  -----------------------  -----------
        0   0    0,1,2,3,4,5,6,7          0
        1   1    8,9,10,11,12,13,14,15    1
        2   2    16,17,18,19,20,21,22,23  2
        3   3    24,25,26,27,28,29,30,31  3
        4   4    32,33,34,35,36,37,38,39  4
        5   5    40,41,42,43,44,45,46,47  5
        6   6    48,49,50,51,52,53,54,55  6
        7   7    56,57,58,59,60,61,62,63  7
    
    RoCE SP->TC mapping and ETS configurations
    =============================================
            switch-prio  traffic-class  scheduler-weight
        --  -----------  -------------  ----------------
        0   0            0              DWRR-28%
        1   1            0              DWRR-28%
        2   2            2              DWRR-28%
        3   3            0              DWRR-28%
        4   4            0              DWRR-28%
        5   5            5              DWRR-43%
        6   6            0              DWRR-28%
        7   7            0              DWRR-28%
    
    RoCE pool config
    ===================
            name                   mode     size  switch-priorities  traffic-class
        --  ---------------------  -------  ----  -----------------  -------------
        0   lossy-default-ingress  Dynamic  50%   0,1,3,4,6,7        -
        1   roce-reserved-ingress  Dynamic  50%   2,5                -
        2   lossy-default-egress   Dynamic  50%   -                  0
        3   roce-reserved-egress   Dynamic  inf   -                  2,5
    
    Exception List
    =================
            description
        --  -----------------------------------------------------------------------…
        1   RoCE PFC Priority Mismatch.Expected pfc-priority: 3.
        2   Congestion Config TC Mismatch.Expected enabled-tc: 0,3.
        3   Congestion Config mode Mismatch.Expected congestion-mode: ECN.
        4   Congestion Config min-threshold Mismatch.Expected min-threshold: 150000.
        5   Congestion Config max-threshold Mismatch.Expected max-threshold:
            1500000.
        6   Scheduler config mismatch for traffic-class mapped to switch-prio0.
            Expected scheduler-weight: DWRR-50%.
        7   Scheduler config mismatch for traffic-class mapped to switch-prio1.
            Expected scheduler-weight: DWRR-50%.
        8   Scheduler config mismatch for traffic-class mapped to switch-prio2.
            Expected scheduler-weight: DWRR-50%.
        9   Scheduler config mismatch for traffic-class mapped to switch-prio3.
            Expected scheduler-weight: DWRR-50%.
        10  Scheduler config mismatch for traffic-class mapped to switch-prio4.
            Expected scheduler-weight: DWRR-50%.
        11  Scheduler config mismatch for traffic-class mapped to switch-prio5.
            Expected scheduler-weight: DWRR-50%.
        12  Scheduler config mismatch for traffic-class mapped to switch-prio6.
            Expected scheduler-weight: strict-priority.
        13  Scheduler config mismatch for traffic-class mapped to switch-prio7.
            Expected scheduler-weight: DWRR-50%.
        14  Invalid reserved config for ePort.TC[2].Expected 0 Got 1024
        15  Invalid reserved config for ePort.TC[5].Expected 0 Got 1024
        16  Invalid traffic-class mapping for switch-priority 2.Expected 0 Got 2
        17  Invalid traffic-class mapping for switch-priority 3.Expected 3 Got 0
        18  Invalid traffic-class mapping for switch-priority 5.Expected 0 Got 5
        19  Invalid traffic-class mapping for switch-priority 6.Expected 6 Got 0
    Incomplete Command: set interface swp3-16 link fast-linkupp3-16 link fast-linkup
    Incomplete Command: set interface swp3-16 link fast-linkupp3-16 link fast-linkup
    Incomplete Command: set interface swp3-16 link fast-linkupp3-16 link fast-linkup
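    Note: The listed exceptions do not affect performance and can be safely ignored.
  6. Verify the transceiver information for the interfaces: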

    admin@sw1:mgmt:~$ nv show platform transceiver
    Interface  Identifier     Vendor Name  Vendor PN        Vendor SN       Vendor Rev
    ---------  -------------  -----------  ---------------  --------------  ----------
    swp1s0     0x00 None
    swp1s1     0x00 None
    swp1s2     0x00 None
    swp1s3     0x00 None
    swp2s0     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp2s1     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp2s2     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp2s3     0x11 (QSFP28)  CISCO-LEONI  L45593-D278-D20  LCC2321GTTJ     00
    swp3       0x00 None
    swp4       0x00 None
    swp5       0x00 None
    swp6       0x00 None
    .
    .
    .
    swp15      0x11 (QSFP28)  Amphenol     112-00595        APF20279210117  B0
    swp16      0x11 (QSFP28)  Amphenol     112-00595        APF20279210166  B0
  7. Verify that each node has a connection to each switch:

    admin@sw1:mgmt:~$ nv show interface lldp
    
    LocalPort  Speed  Mode        RemoteHost               RemotePort
    ---------  -----  ----------  -----------------------  -----------
    eth0       100M   Mgmt        mgmt-sw1                 Eth110/1/29
    swp2s1     25G    Trunk/L2    node1                    e0a
    swp15      100G   BondMember  sw2                      swp15
    swp16      100G   BondMember  sw2                      swp16
  8. Verify the health of the cluster ports on the cluster.

    1. Verify that the cluster ports are up and healthy across all nodes in the cluster:

      cluster1::*> network port show -role cluster
      
      Node: node1
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
      Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
      --------- ------------ ---------------- ---- ---- ----------- -------- ------
      e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
      e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
      
      Node: node2
                                                                             Ignore
                                                        Speed(Mbps) Health   Health
      Port      IPspace      Broadcast Domain Link MTU  Admin/Oper  Status   Status
      --------- ------------ ---------------- ---- ---- ----------- -------- ------
      e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
      e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
    2. Verify the switch health from the cluster (this might not show switch sw2, because the LIFs are not homed on e0d).

      cluster1::*> network device-discovery show -protocol lldp
      Node/       Local  Discovered
      Protocol    Port   Device (LLDP: ChassisID)  Interface Platform
      ----------- ------ ------------------------- --------- ----------
      node1/lldp
                  e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3      -
                  e3b    sw2 (b8:ce:f6:19:1b:96)   swp3      -
      
      node2/lldp
                  e3a    sw1 (b8:ce:f6:19:1a:7e)   swp4      -
                  e3b    sw2 (b8:ce:f6:19:1b:96)   swp4      -
      
      
      cluster1::*> system switch ethernet show -is-monitoring-enabled-operational true
      Switch                      Type               Address          Model
      --------------------------- ------------------ ---------------- -----
      sw1                         cluster-network    10.233.205.90    MSN2100-CB2RC
           Serial Number: MNXXXXXXGD
            Is Monitored: true
                  Reason: None
        Software Version: Cumulus Linux version 5.4.0 running on Mellanox
                          Technologies Ltd. MSN2100
          Version Source: LLDP
      
      sw2                         cluster-network    10.233.205.91    MSN2100-CB2RC
           Serial Number: MNCXXXXXXGS
            Is Monitored: true
                  Reason: None
        Software Version: Cumulus Linux version 5.4.0 running on Mellanox
                          Technologies Ltd. MSN2100
          Version Source: LLDP
  9. Verify that the cluster is healthy:

    cluster show

  10. Repeat steps 1 to 14 on the second switch.

  11. Enable auto-revert on the cluster LIFs:

    network interface modify -vserver Cluster -lif * -auto-revert true
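
The health checks in this procedure (`network port show -role cluster` and `network device-discovery show -protocol lldp`) are easy to misread when a cluster has many nodes. The following is a minimal Python sketch, not a NetApp tool, that parses saved text output of those two commands and reports ports that are not up and healthy, and nodes that do not see both switches. The function names are hypothetical, and the parser assumes the column layout shown in the example output above (including a broadcast domain name without spaces); adjust it if your output differs.

```python
import re

def parse_cluster_ports(text):
    """Parse `network port show -role cluster` output into one dict per port."""
    rows, node = [], None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0] == "Node:":
            node = fields[1]
        # Data rows have 8 columns with a numeric MTU in the fifth column;
        # this test skips the header and separator lines.
        elif node and len(fields) == 8 and fields[4].isdigit():
            rows.append({"node": node, "port": fields[0],
                         "link": fields[3], "health": fields[6]})
    return rows

def unhealthy_ports(rows):
    """Return the ports whose link is not 'up' or whose health is not 'healthy'."""
    return [r for r in rows if r["link"] != "up" or r["health"] != "healthy"]

# Neighbor row layout assumed: <local port> <device> (<chassis ID>) <interface> ...
NEIGHBOR_ROW = re.compile(r"^(\S+)\s+(\S+)\s+\(([0-9A-Fa-f:]+)\)\s+(\S+)")

def parse_lldp_neighbors(text):
    """Map each node to the set of device names it discovered over LLDP."""
    neighbors, node = {}, None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.fullmatch(r"(\S+)/lldp", stripped)
        if header:
            node = header.group(1)
            neighbors.setdefault(node, set())  # keep nodes with no neighbors
            continue
        row = NEIGHBOR_ROW.match(stripped)
        if row and node:
            neighbors[node].add(row.group(2))
    return neighbors

def missing_switches(neighbors, expected):
    """Return {node: [switches not seen]} for nodes missing an expected switch."""
    result = {}
    for node, seen in neighbors.items():
        missing = sorted(set(expected) - seen)
        if missing:
            result[node] = missing
    return result

if __name__ == "__main__":
    # Sample rows copied from the example output in this procedure.
    port_output = """Node: node1
e3a       Cluster      Cluster          up   9000  auto/10000 healthy  false
e3b       Cluster      Cluster          up   9000  auto/10000 healthy  false
"""
    lldp_output = """node1/lldp
            e3a    sw1 (b8:ce:f6:19:1a:7e)   swp3      -
            e3b    sw2 (b8:ce:f6:19:1b:96)   swp3      -
"""
    print("Unhealthy ports:", unhealthy_ports(parse_cluster_ports(port_output)))
    print("Missing switch links:",
          missing_switches(parse_lldp_neighbors(lldp_output), {"sw1", "sw2"}))
```

Both reports should be empty before you re-enable auto-revert. The switch names `sw1` and `sw2` come from the example output; substitute the names used in your environment.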

What's next?

After you've installed the RCF, you can "install the CSHM file".