Configure Network Switches (Automated Deployment)
Prepare Required VLAN IDs
The following table lists the necessary VLANs for deployment, as outlined in this solution validation. You should configure these VLANs on the network switches prior to executing NDE.
| Network Segment | Details | VLAN ID |
| --- | --- | --- |
| Out-of-band management network | Network for HCI terminal user interface (TUI) | 16 |
| In-band management network | Network for accessing management interfaces of nodes, hosts, and guests | 3488 |
| VMware vMotion | Network for live migration of VMs | 3489 |
| iSCSI SAN storage | Network for iSCSI storage traffic | 3490 |
| Application | Network for application traffic | 3487 |
| NFS | Network for NFS storage traffic | 3491 |
| IPL* | Inter-peer link between Mellanox switches | 4000 |
| Native | Native VLAN | 2 |
*Only for Mellanox switches
Switch Configuration
This solution uses Mellanox SN2010 switches running Onyx. The Mellanox switches are configured using an Ansible playbook. Prior to running the Ansible playbook, you should perform the initial configuration of the switches manually:
- Install and cable the switches to the uplink switch, compute nodes, and storage nodes.
- Power on the switches and configure them with the following details:
  - Host name
  - Management IP address and gateway
  - NTP
- Log in to the Mellanox switches and run the following commands:

  configuration write to pre-ansible
  configuration write to post-ansible

  The pre-ansible configuration file can be used to restore the switch’s configuration to the state before the Ansible playbook execution. The switch configuration for this solution is stored in the post-ansible configuration file.
- Download the configuration playbook for Mellanox switches, which follows the best practices and requirements for NetApp HCI, from the NetApp HCI Toolkit.
  The HCI Toolkit also provides a playbook to set up Cisco Nexus switches with similar best practices and requirements for NetApp HCI. Additional guidance on populating the variables and executing the playbook is available in the respective switch README.md file.
- Fill out the credentials to access the switches and the variables needed for the environment. The following text is a sample of the variable file for this solution; a sketch of a possible inventory follows the sample.
```yaml
# vars file for nar_hci_mellanox_deploy
# This set of variables sets up the Mellanox switches for NetApp HCI using the 2-cable compute connectivity option.

# Ansible connection variables for mellanox
ansible_connection: network_cli
ansible_network_os: onyx

#--------------------
# Primary Variables
#--------------------

# Necessary VLANs for Standard NetApp HCI Deployment [native, Management, iSCSI_Storage, vMotion, VM_Network, IPL]
# Any additional VLANs can be added to this in the prescribed format below
netapp_hci_vlans:
  - {vlan_id: 2 , vlan_name: "Native" }
  - {vlan_id: 3488 , vlan_name: "IB-Management" }
  - {vlan_id: 3490 , vlan_name: "iSCSI_Storage" }
  - {vlan_id: 3489 , vlan_name: "vMotion" }
  - {vlan_id: 3491 , vlan_name: "NFS" }
  - {vlan_id: 3487 , vlan_name: "App_Network" }
  - {vlan_id: 4000 , vlan_name: "IPL" }
# Modify the VLAN IDs to suit your environment

# Spanning-tree protocol type for uplink connections.
# The valid options are 'network' and 'normal'; selection depends on the uplink switch model.
uplink_stp_type: network

#----------------------
# IPL variables
#----------------------

# Inter-Peer Link Portchannel
# ipl_portchannel to be defined in the format - Po100
ipl_portchannel: Po100

# Inter-Peer Link Addresses
# The IPL IP address should not be part of the management network. This is typically a private network.
ipl_ipaddr_a: 10.0.0.1
ipl_ipaddr_b: 10.0.0.2

# Define the subnet mask in CIDR number format. Eg: For subnet /22, use ipl_ip_subnet: 22
ipl_ip_subnet: 24

# Inter-Peer Link Interfaces
# members to be defined with Eth in the format. Eg: Eth1/1
peer_link_interfaces:
  members: ['Eth1/20', 'Eth1/22']
  description: "peer link interfaces"

# MLAG VIP IP address should be in the same subnet as that of the switches' mgmt0 interface subnet
# mlag_vip_ip to be defined in the format - <vip_ip>/<subnet_mask>. Eg: x.x.x.x/y
mlag_vip_ip: <<mlag_vip_ip>>

# MLAG VIP Domain Name
# The mlag domain must be a unique name for each mlag domain.
# If you have more than one pair of MLAG switches on the same network, each domain (consisting of two switches) should be configured with a different name.
mlag_domain_name: MLAG-VIP-DOM

#---------------------
# Interface Details
#---------------------

# Storage Bond10G Interface details
# members to be defined with Eth in the format. Eg: Eth1/1
# Only numerical digits between 100 and 1000 allowed for mlag_id
# Operational link speed [variable 'speed' below] to be defined in terms of bytes.
# For 10 Gigabyte operational speed, define 10G. [Possible values - 10G and 25G]
# Interface descriptions append storage node data port numbers assuming all Storage Nodes' Port C -> Mellanox Switch A and all Storage Nodes' Port D -> Mellanox Switch B
# List the storage Bond10G interfaces, their description, speed and MLAG IDs in list of dictionaries format
storage_interfaces:
  - {members: "Eth1/1", description: "HCI_Storage_Node_01", mlag_id: 101, speed: 25G}
  - {members: "Eth1/2", description: "HCI_Storage_Node_02", mlag_id: 102, speed: 25G}
# In case of additional storage nodes, add them here

# Storage Bond1G Interface
# Mention whether or not these Mellanox switches will also be used for Storage Node Mgmt connections
# Possible inputs for storage_mgmt are 'yes' and 'no'
storage_mgmt: <<yes or no>>

# Storage Bond1G (Mgmt) interface details. Only if 'storage_mgmt' is set to 'yes'
# Members to be defined with Eth in the format. Eg: Eth1/1
# Interface descriptions append storage node management port numbers assuming all Storage Nodes' Port A -> Mellanox Switch A and all Storage Nodes' Port B -> Mellanox Switch B
# List the storage Bond1G interfaces and their description in list of dictionaries format
storage_mgmt_interfaces:
  - {members: "Ethx/y", description: "HCI_Storage_Node_01"}
  - {members: "Ethx/y", description: "HCI_Storage_Node_02"}
# In case of additional storage nodes, add them here

# LACP load balancing algorithm for IP hash method
# Possible options are: 'destination-mac', 'destination-ip', 'destination-port', 'source-mac', 'source-ip', 'source-port', 'source-destination-mac', 'source-destination-ip', 'source-destination-port'
# This variable takes multiple options in a single go
# For eg: if you want to configure load to be distributed in the port-channel based on the traffic source and destination IP address and port number, use 'source-destination-ip source-destination-port'
# By default, Mellanox sets it to source-destination-mac. Enter the values below only if you intend to configure any other load balancing algorithm
# Make sure the load balancing algorithm that is set here is also replicated on the host side
# Recommended algorithm is source-destination-ip source-destination-port
# Fill the lacp_load_balance variable only if you are configuring interfaces on compute nodes in bond or LAG with LACP
lacp_load_balance: "source-destination-ip source-destination-port"

# Compute Interface details
# Members to be defined with Eth in the format. Eg: Eth1/1
# Fill the mlag_id field only if you intend to configure interfaces of compute nodes into bond or LAG with LACP
# If you do not intend to configure LACP on interfaces of compute nodes, either leave the mlag_id field unfilled or comment it or enter NA in the mlag_id field
# If you have a mixed architecture where some compute nodes require LACP and some don't:
# 1. Fill the mlag_id field with the appropriate MLAG ID for interfaces that connect to compute nodes requiring LACP
# 2. Either fill NA or leave the mlag_id field blank or comment it for interfaces connecting to compute nodes that do not require LACP
# Only numerical digits between 100 and 1000 allowed for mlag_id.
# Operational link speed [variable 'speed' below] to be defined in terms of bytes.
# For 10 Gigabyte operational speed, define 10G. [Possible values - 10G and 25G]
# Interface descriptions append compute node port numbers assuming all Compute Nodes' Port D -> Mellanox Switch A and all Compute Nodes' Port E -> Mellanox Switch B
# List the compute interfaces, their speed, MLAG IDs and their description in list of dictionaries format
compute_interfaces:
  - members: "Eth1/7"     # Compute Node for ESXi, setup by NDE
    description: "HCI_Compute_Node_01"
    mlag_id:              # Fill the mlag_id only if you wish to use LACP on interfaces towards compute nodes
    speed: 25G
  - members: "Eth1/8"     # Compute Node for ESXi, setup by NDE
    description: "HCI_Compute_Node_02"
    mlag_id:              # Fill the mlag_id only if you wish to use LACP on interfaces towards compute nodes
    speed: 25G
  # In case of additional compute nodes, add them here in the same format as above
  - members: "Eth1/9"     # Compute Node for Kubernetes Worker node
    description: "HCI_Compute_Node_01"
    mlag_id: 109          # Fill the mlag_id only if you wish to use LACP on interfaces towards compute nodes
    speed: 10G
  - members: "Eth1/10"    # Compute Node for Kubernetes Worker node
    description: "HCI_Compute_Node_02"
    mlag_id: 110          # Fill the mlag_id only if you wish to use LACP on interfaces towards compute nodes
    speed: 10G

# Uplink Switch LACP support
# Possible options are 'yes' and 'no' - Set to 'yes' only if your uplink switch supports LACP
uplink_switch_lacp: <<yes or no>>

# Uplink Interface details
# Members to be defined with Eth in the format. Eg: Eth1/1
# Only numerical digits between 100 and 1000 allowed for mlag_id.
# Operational link speed [variable 'speed' below] to be defined in terms of bytes.
# For 10 Gigabyte operational speed, define 10G. [Possible values in Mellanox are 1G, 10G and 25G]
# List the uplink interfaces, their description, MLAG IDs and their speed in list of dictionaries format
uplink_interfaces:
  - members: "Eth1/18"
    description_switch_a: "SwitchA:Ethx/y -> Uplink_Switch:Ethx/y"
    description_switch_b: "SwitchB:Ethx/y -> Uplink_Switch:Ethx/y"
    mlag_id: 118          # Fill the mlag_id only if 'uplink_switch_lacp' is set to 'yes'
    speed: 10G
    mtu: 1500
```
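Before running the playbook, the switches must also be reachable from the Ansible control machine. As a point of reference only, the sketch below shows one way an inventory for the two switches might look; the group name, host names, IP addresses, and credentials are hypothetical placeholders, and the exact group and variable names the playbook expects are documented in the switch README.md in the NetApp HCI Toolkit.

```yaml
# Hypothetical inventory (inventory.yml) for a pair of Mellanox switches.
# Replace the host names, management IPs, and credentials with values from
# your environment; consult the toolkit README.md for the exact group and
# variable names the playbook expects.
all:
  children:
    mellanox_switches:
      hosts:
        hci-mlnx-sw-a:
          ansible_host: 172.21.232.21    # switch A management IP (example)
        hci-mlnx-sw-b:
          ansible_host: 172.21.232.22    # switch B management IP (example)
      vars:
        ansible_connection: network_cli
        ansible_network_os: onyx
        ansible_user: admin                                # switch login (example)
        ansible_password: "{{ vault_switch_password }}"    # keep secrets in Ansible Vault
```

With an inventory like this in place, the playbook is typically executed with a command of the form `ansible-playbook -i inventory.yml <playbook_name>.yml`; the actual playbook file name and any additional variables are described in the toolkit documentation.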
The fingerprint of the switch’s SSH key must match the one present on the host machine from which the playbook is executed. To ensure this, add the key to /root/.ssh/known_hosts or to any other appropriate location.
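If you prefer to automate that step as well, the following is a minimal sketch of a play that records the switch host key with the ansible.builtin.known_hosts module; the switch management IP address is a hypothetical example, and an RSA host key is assumed. Running ssh-keyscan manually and appending its output to the known_hosts file accomplishes the same thing.

```yaml
# Hypothetical play to pre-populate the switch SSH host key on the Ansible
# control machine; replace 172.21.232.21 with your switch management IP.
- name: Record Mellanox switch host keys
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Add switch host key to known_hosts
      ansible.builtin.known_hosts:
        path: /root/.ssh/known_hosts
        name: 172.21.232.21
        key: "{{ lookup('pipe', 'ssh-keyscan -t rsa 172.21.232.21') }}"   # assumes an RSA host key
```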
Roll Back the Switch Configuration
- In case of any timeout failures or partial configuration, run the following command to roll back the switch to its initial state, that is, to the configuration saved before the Ansible playbook was run. This operation requires a reboot of the switch.

  configuration switch-to pre-ansible

- Delete the post-ansible file that holds the configuration applied by the Ansible playbook.

  configuration delete post-ansible

- Create a new file with the same name, post-ansible, write the current (pre-ansible) configuration to it, and switch to this new file as the active configuration so that configuration can be restarted.

  configuration write to post-ansible
IP Address Requirements
The deployment of the NetApp HCI inferencing platform with VMware and Kubernetes requires multiple IP addresses to be allocated. The following table lists the number of IP addresses required. Unless otherwise indicated, addresses are assigned automatically by NDE.
| IP Address Quantity | Details | VLAN ID | IP Address |
| --- | --- | --- | --- |
| One per storage and compute node* | HCI terminal user interface (TUI) addresses | 16 | |
| One per vCenter Server (VM) | vCenter Server management address | 3488 | |
| One per management node (VM) | Management node IP address | 3488 | |
| One per ESXi host | ESXi compute management addresses | 3488 | |
| One per storage/witness node | NetApp HCI storage node management addresses | 3488 | |
| One per storage cluster | Storage cluster management address | 3488 | |
| One per ESXi host | VMware vMotion address | 3489 | |
| Two per ESXi host | ESXi host initiator addresses for iSCSI storage traffic | 3490 | |
| Two per storage node | Storage node target addresses for iSCSI storage traffic | 3490 | |
| Two per storage cluster | Storage cluster target addresses for iSCSI storage traffic | 3490 | |
| Two for mNode | mNode iSCSI storage access | 3490 | |
The following IPs are assigned manually when the respective components are configured.
| IP Address Quantity | Details | VLAN ID | IP Address |
| --- | --- | --- | --- |
| One for Deployment Jump – management network | Deployment Jump VM to execute Ansible playbooks and configure other parts of the system – management connectivity | 3488 | |
| One per Kubernetes master node – management network | Kubernetes master node VMs (three nodes) | 3488 | |
| One per Kubernetes worker node – management network | Kubernetes worker node VMs (two nodes) | 3488 | |
| One per Kubernetes worker node – NFS network | Kubernetes worker node VMs (two nodes) | 3491 | |
| One per Kubernetes worker node – application network | Kubernetes worker node VMs (two nodes) | 3487 | |
| Three for ONTAP Select – management network | ONTAP Select VM | 3488 | |
| One for ONTAP Select – NFS network | ONTAP Select VM – NFS data traffic | 3491 | |
| At least two for Triton Inference Server load balancer – application network | Load balancer IP range for the Kubernetes load balancer service | 3487 | |
*This validation requires the initial setup of the first storage node TUI address. NDE automatically assigns the TUI address for subsequent nodes.
DNS and Timekeeping Requirement
Depending on your deployment, you might need to prepare DNS records for your NetApp HCI system. NetApp HCI requires a valid NTP server for timekeeping; you can use a publicly available time server if you do not have one in your environment.
This validation involves deploying NetApp HCI with a new VMware vCenter Server instance using a fully qualified domain name (FQDN). Before deployment, you must have one Pointer (PTR) record and one Address (A) record created on the DNS server.
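For illustration only, the sketch below shows how the two records could be created with the community.general.nsupdate module against a DNS server that accepts dynamic updates; the FQDN (vcenter.example.local), IP addresses, and zone names are placeholders, and in many environments these records are simply added through the DNS server’s management interface instead.

```yaml
# Hypothetical example: create the A and PTR records for the vCenter Server
# FQDN before deployment. All names, addresses, and zones are placeholders.
- name: Create DNS records for the vCenter Server
  hosts: localhost
  gather_facts: false
  tasks:
    - name: A record (FQDN -> IP)
      community.general.nsupdate:
        server: 172.21.232.10                 # DNS server (example)
        zone: example.local
        record: vcenter
        type: A
        value: 172.21.232.100                 # vCenter management IP (example)

    - name: PTR record (IP -> FQDN)
      community.general.nsupdate:
        server: 172.21.232.10
        zone: 232.21.172.in-addr.arpa
        record: "100"
        type: PTR
        value: vcenter.example.local.
```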