Specify Common File Node Configuration

01/31/2023 Contributors

PDFs

Specify common file node configuration using group variables (group_vars).

Overview

Configuration that should apple to all file nodes is defined at group_vars/ha_cluster.yml. It commonly includes:

Details on how to connect and login to each file node.
Common networking configuration.
Whether automatic reboots are allowed.
How firewall and selinux states should be configured.
Cluster configuration including alerting and fencing.
Performance tuning.
Common BeeGFS service configuration.

The options set in this file can also be defined on individual file nodes, for example if mixed hardware models are in use, or you have different passwords for each node. Configuration on individual file nodes will take precedence over the configuration in this file.

Steps

Create the file group_vars/ha_cluster.yml and populate it as follows:

Indicate how the Ansible Control node should authenticate with the remote hosts:

ansible_ssh_user: root
ansible_become_password: <PASSWORD>

Particularly for production environments, do not store passwords in plain text. Instead, use Ansible Vault (see Encrypting content with Ansible Vault) or the --ask-become-pass option when running the playbook. If the ansible_ssh_user is already root, then you can optionally omit the ansible_become_password.

If you are configuring static IPs on ethernet or InfiniBand interfaces (for example cluster IPs) and multiple interfaces are in the same IP subnet (for example if ib0 is using 192.168.1.10/24 and ib1 is using 192.168.1.11/24), additional IP routing tables and rules must be setup for multi-homed support to work properly. Simply enable the provided network interface configuration hook as follows:
```
eseries_ip_default_hook_templates:
  - 99-multihoming.j2
```
When deploying the cluster, depending on the storage protocol it may be necessary for nodes to be rebooted to facilitate discovering remote block devices (E-Series volumes) or apply other aspects of the configuration. By default nodes will prompt before rebooting, but you can allow nodes to restart automatically by specifying the following:
```
eseries_common_allow_host_reboot: true
```
1. By default after a reboot, to ensure block devices and other services are ready Ansible will wait until the systemd default.target is reached before continuing with the deployment. In some scenarios when NVMe/IB is in use, this may not be long enough to initialize, discover, and connect to remote devices. This can result in the automated deployment continuing prematurely and failing. To avoid this when using NVMe/IB also define the following:
  eseries_common_reboot_test_command: "! systemctl status eseries_nvme_ib.service || systemctl --state=exited | grep eseries_nvme_ib.service"
A number of firewall ports are required for BeeGFS and HA cluster services to communicate. Unless you wish to configure the firwewall manually (not recommended), specify the following to have required firewall zones created and ports opened automatically:
```
beegfs_ha_firewall_configure: True
```
At this time SELinux is not supported, and it is is recommended the state be set to disabled to avoid conflicts (especially when RDMA is in use). Set the following to ensure SELinux is disabled:
```
eseries_beegfs_ha_disable_selinux: True
eseries_selinux_state: disabled
```

Configure authentication so file nodes are able to communicate, adjusting the defaults as needed based on your organizations policies:

beegfs_ha_cluster_name: hacluster                  # BeeGFS HA cluster name.
beegfs_ha_cluster_username: hacluster              # BeeGFS HA cluster username.
beegfs_ha_cluster_password: hapassword             # BeeGFS HA cluster username's password.
beegfs_ha_cluster_password_sha512_salt: randomSalt # BeeGFS HA cluster username's password salt.

Based on the Plan the File System section specify the BeeGFS management IP for this file system:

beegfs_ha_mgmtd_floating_ip: <IP ADDRESS>

While seemingly redundant, beegfs_ha_mgmtd_floating_ip is important when you scale the BeeGFS file system beyond a single HA cluster. Subsequent HA clusters are deployed without an additional BeeGFS management service and point at the management service provided by the first cluster.

Enable email alerts if desired:

beegfs_ha_enable_alerts: True
# E-mail recipient list for notifications when BeeGFS HA resources change or fail.
beegfs_ha_alert_email_list: ["<EMAIL>"]
# This dictionary is used to configure postfix service (/etc/postfix/main.cf) which is required to set email alerts.
beegfs_ha_alert_conf_ha_group_options:
      # This parameter specifies the local internet domain name. This is optional when the cluster nodes have fully qualified hostnames (i.e. host.example.com)
      mydomain: <MY_DOMAIN>
beegfs_ha_alert_verbosity: 3
#  1) high-level node activity
#  3) high-level node activity + fencing action information + resources (filter on X-monitor)
#  5) high-level node activity + fencing action information + resources

Enabling fencing is strongly recommended, otherwise services can be blocked from starting on secondary nodes when the primary node fails.
1. Enable fencing globally by specifying the following:
  beegfs_ha_cluster_crm_config_options: stonith-enabled: True
  1. Note any supported cluster property can also be specified here if needed. Adjusting these is not typically needed, as the BeeGFS HA role ships with a number of well tested defaults.
2. Next select and configure a fencing agent:
  1. OPTION 1: To enable fencing using APC Power Distribution Units (PDUs):
    
    beegfs_ha_fencing_agents: fence_apc: - ipaddr: <PDU_IP_ADDRESS> login: <PDU_USERNAME> passwd: <PDU_PASSWORD> pcmk_host_map: "<HOSTNAME>:<PDU_PORT>,<PDU_PORT>;<HOSTNAME>:<PDU_PORT>,<PDU_PORT>"
  2. OPTION 2: To enable fencing using the Redfish APIs provided by the Lenovo XCC (and other BMCs):
    
    redfish: &redfish username: <BMC_USERNAME> password: <BMC_PASSWORD> ssl_insecure: 1 # If a valid SSL certificate is not available specify “1”. beegfs_ha_fencing_agents: fence_redfish: - pcmk_host_list: <HOSTNAME> ip: <BMC_IP> <<: *redfish - pcmk_host_list: <HOSTNAME> ip: <BMC_IP> <<: *redfish
  3. For details on configuring other fencing agents refer to the RedHat Documentation.
The BeeGFS HA role can apply many different tuning parameters to help further optimize performance. These include optimizing kernel memory utilization and block device I/O, among other parameters. The role ships with a reasonable set of defaults based on testing with NetApp E-Series block nodes, but by default these aren't applied unless you specify:
```
beegfs_ha_enable_performance_tuning: True
```
1. If needed also specify any changes to the default performance tuning here. See the full performance tuning parameters documentation for additional details.

To ensure floating IP addresses (sometimes known as logical interfaces) used for BeeGFS services can fail over between file nodes, all network interfaces must be named consistently. By default network interface names are generated by the kernel, which is not guaranteed to generate consistent names, even across identical server models with network adapters installed in the same PCIe slots. This is also useful when creating inventories before the equipment is deployed and generated interface names are known. To ensure consistent device names, based on a block diagram of the server or lshw -class network -businfo output, specify the desired PCIe address-to-logical interface mapping as follows:

For InfiniBand (IPoIB) network interfaces:

eseries_ipoib_udev_rules:
  "<PCIe ADDRESS>": <NAME> # Ex: 0000:41:00.0: i1a

For Ethernet network interfaces:

eseries_ip_udev_rules:
  "<PCIe ADDRESS>": <NAME> # Ex: 0000:41:00.0: e1a

To avoid conflicts when interfaces are renamed (preventing them from being renamed), you should not use any potential default names such as eth0, ens9f0, ib0, or ibs4f0. A common naming convention is to use 'e' or 'i' for Ethernet or InfiniBand, followed by the PCIe slot number, and a letter to indicate the the port. For example the second port of an InfiniBand adapter installed in slot 3 would be: i3b.

If you are using a verified file node model, click here example PCIe address-to-logical port mappings.

Optionally specify configuration that should apply to all BeeGFS services in the cluster. Default configuration values can be found here, and per-service configuration is specified elsewhere:
1. BeeGFS Management service:
  beegfs_ha_beegfs_mgmtd_conf_ha_group_options: <OPTION>: <VALUE>
2. BeeGFS Metadata services:
  beegfs_ha_beegfs_meta_conf_ha_group_options: <OPTION>: <VALUE>
3. BeeGFS Storage services:
  beegfs_ha_beegfs_storage_conf_ha_group_options: <OPTION>: <VALUE>
As of BeeGFS 7.2.7 and 7.3.1 connection authentication must be configured or explicitly disabled. There are a few ways this can be configured using the Ansible based deployment:
1. By default the deployment will automatically configure connection authentication, and generate a connauthfile that will be distributed to all file nodes and used with the BeeGFS services. This file will also be placed/maintained on the Ansible control node at <INVENTORY>/files/beegfs/<sysMgmtdHost>_connAuthFile where it should be maintained (securely) for reuse with clients that need to access this file system.
  1. To generate a new key specify -e "beegfs_ha_conn_auth_force_new=True when running the Ansible playbook. Note this is ignored if a beegfs_ha_conn_auth_secret is defined.
  2. For advanced options refer to the full list of defaults included with the BeeGFS HA role.
2. A custom secret can be used by defining the following in ha_cluster.yml:
  beegfs_ha_conn_auth_secret: <SECRET>
3. Connection authentication can be disabled entirely (NOT recommended):
  beegfs_ha_conn_auth_enabled: false

Click here for an example of a complete inventory file representing common file node configuration.

Using HDR (200Gb) InfiniBand with NetApp EF600 block nodes:

To use HDR (200Gb) InfiniBand with the EF600 the subnet manager must support virtualization. If file and block nodes are connected using a switch, this will need to be enabled on the subnet manager manager for the overall fabric.

If block and file nodes are directly connected using InfiniBand, an instance of opensm must be configured on each file node for each interface directly connected to a block node. This is done by specifying configure: true when configuring file node storage interfaces.

Currently the inbox version of opensm shipped with supported Linux distributions does not support virtualization. Instead it is required you install and configure version of opensm from the Mellanox OpenFabrics Enterprise Distribution (OFED). Although deployment using Ansible is still supported, a few additional steps are required:

Using curl or your desired tool, download the packages for the version of OpenSM listed in the technology requirements section from Mellanox’s website to the <INVENTORY>/packages/ directory. For example:

curl -o packages/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm https://linux.mellanox.com/public/repo/mlnx_ofed/5.4-1.0.3.0/rhel8.4/x86_64/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm

curl -o packages/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm https://linux.mellanox.com/public/repo/mlnx_ofed/5.4-1.0.3.0/rhel8.4/x86_64/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm

Under group_vars/ha_cluster.yml define the following configuration:

### OpenSM package and configuration information
eseries_ib_opensm_allow_upgrades: true
eseries_ib_opensm_skip_package_validation: true
eseries_ib_opensm_rhel_packages: []
eseries_ib_opensm_custom_packages:
  install:
    - files:
        add:
          "packages/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm": "/tmp/"
          "packages/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm": "/tmp/"
    - packages:
        add:
          - /tmp/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
          - /tmp/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
  uninstall:
    - packages:
        remove:
          - opensm
          - opensm-libs
      files:
        remove:
          - /tmp/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
          - /tmp/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm

eseries_ib_opensm_options:
  virt_enabled: "2"

Specify Common File Node Configuration

Creating your file...

Overview

Steps

Using HDR (200Gb) InfiniBand with NetApp EF600 block nodes: