Create the Ansible inventory
To define the configuration for file and block nodes, you create an Ansible inventory that represents the BeeGFS file system you want to deploy. The inventory includes hosts, groups, and variables describing the desired BeeGFS file system.
Step 1: Define configuration for all building blocks
Define the configuration that applies to all building blocks, regardless of which configuration profile you may apply to them individually.
- Choose a subnet addressing scheme for your deployment. Due to the benefits listed in the software architecture, it is recommended to use a single subnet addressing scheme.
- On your Ansible control node, identify a directory that you want to use to store the Ansible inventory and playbook files. Unless otherwise noted, all files and directories created in this step and the following steps are created relative to this directory.
- Create the following subdirectories; the resulting directory layout is sketched below:
  - host_vars
  - group_vars
  - packages
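For orientation, this is roughly the layout the directory grows into over the remaining steps. Only files named in this procedure are shown, and the placeholder names are illustrative:

host_vars/
  <FILE_NODE_HOSTNAME>.yml          # one file per file node (Step 2)
  <BLOCK_NODE_HOSTNAME>.yml         # one file per block node (Step 2)
group_vars/
  all.yml                           # Step 3
  ha_cluster.yml                    # Step 4
  eseries_storage_systems.yml       # Step 5
packages/                           # firmware and NVSRAM files referenced in Step 5
<inventory and playbook files>      # created towards the end of this procedure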
Step 2: Define configuration for individual file and block nodes
Define the configuration that applies to individual file nodes and individual block nodes.
- Under host_vars/, create a file for each BeeGFS file node named <HOSTNAME>.yml with the following content, paying special attention to the notes regarding the content to populate for BeeGFS cluster IPs and for hostnames ending in odd versus even numbers. Initially, the file node interface names do not match what is listed here (they appear with default names such as ib0 or ibs1f0); these custom names are configured in Step 4: Define configuration that should apply to all file nodes.
ansible_host: "<MANAGEMENT_IP>"
eseries_ipoib_interfaces:  # Used to configure BeeGFS cluster IP addresses.
  - name: i1b
    address: 100.127.100.<NUMBER_FROM_HOSTNAME>/16
  - name: i4b
    address: 100.127.100.<NUMBER_FROM_HOSTNAME>/16
beegfs_ha_cluster_node_ips:
  - <MANAGEMENT_IP>
  - <i1b_BEEGFS_CLUSTER_IP>
  - <i4b_BEEGFS_CLUSTER_IP>
# NVMe over InfiniBand storage communication protocol information
# For odd numbered file nodes (i.e., h01, h03, ..):
eseries_nvme_ib_interfaces:
  - name: i1a
    address: 192.168.1.10/24
    configure: true
  - name: i2a
    address: 192.168.3.10/24
    configure: true
  - name: i3a
    address: 192.168.5.10/24
    configure: true
  - name: i4a
    address: 192.168.7.10/24
    configure: true
# For even numbered file nodes (i.e., h02, h04, ..):
# NVMe over InfiniBand storage communication protocol information
eseries_nvme_ib_interfaces:
  - name: i1a
    address: 192.168.2.10/24
    configure: true
  - name: i2a
    address: 192.168.4.10/24
    configure: true
  - name: i3a
    address: 192.168.6.10/24
    configure: true
  - name: i4a
    address: 192.168.8.10/24
    configure: true
If you have already deployed the BeeGFS cluster, you must stop the cluster before adding or changing statically configured IP addresses, including cluster IPs and IPs used for NVMe/IB. This is required so these changes take effect properly and do not disrupt cluster operations.
- Under host_vars/, create a file for each BeeGFS block node named <HOSTNAME>.yml and populate it with the following content. Pay special attention to the notes regarding the content to populate for storage array names ending in odd versus even numbers. For each block node, create one file and specify the <MANAGEMENT_IP> for one of the two controllers (usually A).

eseries_system_name: <STORAGE_ARRAY_NAME>
eseries_system_api_url: https://<MANAGEMENT_IP>:8443/devmgr/v2/
eseries_initiator_protocol: nvme_ib
# For odd numbered block nodes (i.e., a01, a03, ..):
eseries_controller_nvme_ib_port:
  controller_a:
    - 192.168.1.101
    - 192.168.2.101
    - 192.168.1.100
    - 192.168.2.100
  controller_b:
    - 192.168.3.101
    - 192.168.4.101
    - 192.168.3.100
    - 192.168.4.100
# For even numbered block nodes (i.e., a02, a04, ..):
eseries_controller_nvme_ib_port:
  controller_a:
    - 192.168.5.101
    - 192.168.6.101
    - 192.168.5.100
    - 192.168.6.100
  controller_b:
    - 192.168.7.101
    - 192.168.8.101
    - 192.168.7.100
    - 192.168.8.100
Step 3: Define configuration that should apply to all file and block nodes
You can define configuration common to a group of hosts under group_vars, in a file whose name corresponds to the group. This avoids repeating shared configuration in multiple places.
Hosts can be in more than one group, and at runtime, Ansible chooses what variables apply to a particular host based on its variable precedence rules. (For more information on these rules, see the Ansible documentation for Using variables.)
Host-to-group assignments are defined in the actual Ansible inventory file, which is created towards the end of this procedure.
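To make the group structure concrete, a minimal inventory file might look like the following sketch. The file name and hostnames are illustrative only; the real inventory file is built at the end of this procedure, and the ha_cluster and eseries_storage_systems groups are described in Steps 4 and 5:

# inventory.yml (example; hostnames are illustrative)
all:
  children:
    ha_cluster:                # BeeGFS file nodes (group_vars/ha_cluster.yml applies here)
      hosts:
        h01:
        h02:
    eseries_storage_systems:   # E-Series block nodes (group_vars/eseries_storage_systems.yml applies here)
      hosts:
        a01:
        a02: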
In Ansible, any configuration you want to apply to all hosts can be defined in a group called all. Create the file group_vars/all.yml with the following content:

ansible_python_interpreter: /usr/bin/python3
beegfs_ha_ntp_server_pools:  # Modify the NTP server addresses if desired.
  - "pool 0.pool.ntp.org iburst maxsources 3"
  - "pool 1.pool.ntp.org iburst maxsources 3"
Step 4: Define configuration that should apply to all file nodes
The shared configuration for file nodes is defined in a group called ha_cluster. The steps in this section build out the configuration that should be included in the group_vars/ha_cluster.yml file.
- At the top of the file, define the defaults, including the password to use as the sudo user on the file nodes.

### ha_cluster Ansible group inventory file.
# Place all default/common variables for BeeGFS HA cluster resources below.

### Cluster node defaults
ansible_ssh_user: root
ansible_become_password: <PASSWORD>
eseries_ipoib_default_hook_templates:
  - 99-multihoming.j2  # This is required for single subnet deployments, where static IPs for multiple IB ports are in the same IPoIB subnet (i.e., cluster IPs, multirail, single subnet, etc.).
# If the following options are specified, then Ansible will automatically reboot nodes when necessary for changes to take effect:
eseries_common_allow_host_reboot: true
eseries_common_reboot_test_command: "! systemctl status eseries_nvme_ib.service || systemctl --state=exited | grep eseries_nvme_ib.service"
eseries_ib_opensm_options:
  virt_enabled: "2"
  virt_max_ports_in_process: "0"
Particularly for production environments, do not store passwords in plain text. Instead, use Ansible Vault (see Encrypting content with Ansible Vault) or the --ask-become-pass option when running the playbook. If the ansible_ssh_user is already root, then you can optionally omit the ansible_become_password. A vaulted value can also be embedded directly in the file, as sketched below.
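As an illustration of the Vault approach, a value encrypted with ansible-vault encrypt_string can be placed in group_vars/ha_cluster.yml instead of the plain-text password; the ciphertext shown here is only a placeholder for real ansible-vault output:

ansible_become_password: !vault |
          $ANSIBLE_VAULT;1.1;AES256
          <CIPHERTEXT_GENERATED_BY_ANSIBLE_VAULT>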
- Optionally, configure a name for the high-availability (HA) cluster and specify a user for intra-cluster communication.
If you are modifying the private IP addressing scheme, you must also update the default beegfs_ha_mgmtd_floating_ip. This must match what you configure later for the BeeGFS Management resource group. Specify one or more emails that should receive alerts for cluster events using beegfs_ha_alert_email_list.

### Cluster information
beegfs_ha_firewall_configure: True
eseries_beegfs_ha_disable_selinux: True
eseries_selinux_state: disabled
# The following variables should be adjusted depending on the desired configuration:
beegfs_ha_cluster_name: hacluster                    # BeeGFS HA cluster name.
beegfs_ha_cluster_username: hacluster                # BeeGFS HA cluster username.
beegfs_ha_cluster_password: hapassword               # BeeGFS HA cluster username's password.
beegfs_ha_cluster_password_sha512_salt: randomSalt   # BeeGFS HA cluster username's password salt.
beegfs_ha_mgmtd_floating_ip: 100.127.101.0           # BeeGFS management service IP address.
# Email Alerts Configuration
beegfs_ha_enable_alerts: True
beegfs_ha_alert_email_list: ["email@example.com"]  # E-mail recipient list for notifications when BeeGFS HA resources change or fail. Often a distribution list for the team responsible for managing the cluster.
beegfs_ha_alert_conf_ha_group_options:
  mydomain: "example.com"  # The mydomain parameter specifies the local internet domain name. This is optional when the cluster nodes have fully qualified hostnames (i.e. host.example.com).
# Adjusting the following parameters is optional:
beegfs_ha_alert_timestamp_format: "%Y-%m-%d %H:%M:%S.%N"  # %H:%M:%S.%N
beegfs_ha_alert_verbosity: 3
#  1) high-level node activity
#  3) high-level node activity + fencing action information + resources (filter on X-monitor)
#  5) high-level node activity + fencing action information + resources
While seemingly redundant, beegfs_ha_mgmtd_floating_ip is important when you scale the BeeGFS file system beyond a single HA cluster. Subsequent HA clusters are deployed without an additional BeeGFS management service and point at the management service provided by the first cluster.
- Configure a fencing agent. (For more details, see Configure fencing in a Red Hat High Availability cluster.) The following examples show configurations for common fencing agents; choose one of these options.
For this step, be aware that:
  - By default, fencing is enabled, but you need to configure a fencing agent.
  - The <HOSTNAME> specified in the pcmk_host_map or pcmk_host_list must correspond with the hostname in the Ansible inventory.
  - Running the BeeGFS cluster without fencing is not supported, particularly in production. This is largely to ensure that when BeeGFS services, including any resource dependencies like block devices, fail over due to an issue, there is no risk of concurrent access by multiple nodes that could result in file system corruption or other undesirable or unexpected behavior. If fencing must be disabled, refer to the general notes in the BeeGFS HA role's getting started guide and set beegfs_ha_cluster_crm_config_options["stonith-enabled"] to false in ha_cluster.yml.
  - There are multiple node-level fencing devices available, and the BeeGFS HA role can configure any fencing agent available in the Red Hat HA package repository. When possible, use a fencing agent that works through the uninterruptible power supply (UPS) or rack power distribution unit (rPDU), because some fencing agents such as the baseboard management controller (BMC) or other lights-out devices that are built into the server might not respond to the fence request under certain failure scenarios.
### Fencing configuration:
# OPTION 1: To enable fencing using APC Power Distribution Units (PDUs):
beegfs_ha_fencing_agents:
  fence_apc:
    - ipaddr: <PDU_IP_ADDRESS>
      login: <PDU_USERNAME>
      passwd: <PDU_PASSWORD>
      pcmk_host_map: "<HOSTNAME>:<PDU_PORT>,<PDU_PORT>;<HOSTNAME>:<PDU_PORT>,<PDU_PORT>"

# OPTION 2: To enable fencing using the Redfish APIs provided by the Lenovo XCC (and other BMCs):
redfish: &redfish
  username: <BMC_USERNAME>
  password: <BMC_PASSWORD>
  ssl_insecure: 1  # If a valid SSL certificate is not available, specify "1".
beegfs_ha_fencing_agents:
  fence_redfish:
    - pcmk_host_list: <HOSTNAME>
      ip: <BMC_IP>
      <<: *redfish
    - pcmk_host_list: <HOSTNAME>
      ip: <BMC_IP>
      <<: *redfish

# For details on configuring other fencing agents see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/configuring_and_managing_high_availability_clusters/assembly_configuring-fencing-configuring-and-managing-high-availability-clusters.
- Enable recommended performance tuning in the Linux OS.
While many users find the default settings for the performance parameters generally work well, you can optionally change the default settings for a particular workload. As such, these recommendations are included in the BeeGFS role, but are not enabled by default to ensure users are aware of the tuning applied to their file system.
To enable performance tuning, specify:
### Performance Configuration:
beegfs_ha_enable_performance_tuning: True
- (Optional) You can adjust the performance tuning parameters in the Linux OS as needed.
For a comprehensive list of the available tuning parameters that you can adjust, see the Performance Tuning Defaults section of the BeeGFS HA role on the E-Series BeeGFS GitHub site. The default values can be overridden for all nodes in the cluster in this file, or in the host_vars file for an individual node, as sketched below.
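As a rough sketch only: the parameter name below is hypothetical and not taken from the role, so substitute the actual variables documented under Performance Tuning Defaults. An override placed in group_vars/ha_cluster.yml applies to every file node, while the same variable in a node's host_vars file applies only to that node and takes precedence:

# group_vars/ha_cluster.yml -- cluster-wide override (hypothetical variable name)
example_tuning_parameter: 4096

# host_vars/<HOSTNAME>.yml -- per-node override; wins over the group value
example_tuning_parameter: 8192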
- To allow full 200Gb/HDR connectivity between block and file nodes, use the Open Subnet Manager (OpenSM) package from the NVIDIA Open Fabrics Enterprise Distribution (MLNX_OFED). The MLNX_OFED version listed in the file node requirements comes bundled with the recommended OpenSM packages. Although deployment using Ansible is supported, you must first install the MLNX_OFED driver on all file nodes.
  - Populate the following parameters in group_vars/ha_cluster.yml (adjust packages as needed):

### OpenSM package and configuration information
eseries_ib_opensm_options:
  virt_enabled: "2"
  virt_max_ports_in_process: "0"
- Configure the udev rule to ensure consistent mapping of logical InfiniBand port identifiers to underlying PCIe devices. The udev rule must be unique to the PCIe topology of each server platform used as a BeeGFS file node. Use the following values for verified file nodes:
### Ensure Consistent Logical IB Port Numbering
# OPTION 1: Lenovo SR665 V3 PCIe address-to-logical IB port mapping:
eseries_ipoib_udev_rules:
  "0000:01:00.0": i1a
  "0000:01:00.1": i1b
  "0000:41:00.0": i2a
  "0000:41:00.1": i2b
  "0000:81:00.0": i3a
  "0000:81:00.1": i3b
  "0000:a1:00.0": i4a
  "0000:a1:00.1": i4b

# OPTION 2: Lenovo SR665 PCIe address-to-logical IB port mapping:
eseries_ipoib_udev_rules:
  "0000:41:00.0": i1a
  "0000:41:00.1": i1b
  "0000:01:00.0": i2a
  "0000:01:00.1": i2b
  "0000:a1:00.0": i3a
  "0000:a1:00.1": i3b
  "0000:81:00.0": i4a
  "0000:81:00.1": i4b
- (Optional) Update the metadata target selection algorithm.

beegfs_ha_beegfs_meta_conf_ha_group_options:
  tuneTargetChooser: randomrobin
In verification testing, randomrobin was typically used to ensure that test files were evenly distributed across all BeeGFS storage targets during performance benchmarking (for more information on benchmarking, see the BeeGFS site for Benchmarking a BeeGFS System). With real-world use, this might cause lower numbered targets to fill up faster than higher numbered targets. Omitting randomrobin and just using the default randomized value has been shown to provide good performance while still utilizing all available targets.
Step 5: Define configuration that should apply to all block nodes
The shared configuration for block nodes is defined in a group called eseries_storage_systems. The steps in this section build out the configuration that should be included in the group_vars/eseries_storage_systems.yml file.
- Set the Ansible connection to local, provide the system password, and specify if SSL certificates should be verified. (Normally, Ansible uses SSH to connect to managed hosts, but in the case of the NetApp E-Series storage systems used as block nodes, the modules use the REST API for communication.) At the top of the file, add the following:

### eseries_storage_systems Ansible group inventory file.
# Place all default/common variables for NetApp E-Series Storage Systems here:
ansible_connection: local
eseries_system_password: <PASSWORD>
eseries_validate_certs: false
Listing any passwords in plain text is not recommended. Use Ansible Vault or provide the eseries_system_password when running Ansible using --extra-vars.
- To ensure optimal performance, install the controller firmware and NVSRAM versions listed for block nodes in Technical requirements.
Download the corresponding files from the NetApp Support site. You can either upgrade them manually or include them in the packages/ directory of the Ansible control node, and then populate the following parameters in eseries_storage_systems.yml to upgrade using Ansible:

# Firmware, NVSRAM, and Drive Firmware (modify the filenames as needed):
eseries_firmware_firmware: "packages/RCB_11.80GA_6000_64cc0ee3.dlp"
eseries_firmware_nvsram: "packages/N6000-880834-D08.dlp"
- Download and install the latest drive firmware available for the drives installed in your block nodes from the NetApp Support site. You can either upgrade them manually or include them in the packages/ directory of the Ansible control node, and then populate the following parameters in eseries_storage_systems.yml to upgrade using Ansible:

eseries_drive_firmware_firmware_list:
  - "packages/<FILENAME>.dlp"
eseries_drive_firmware_upgrade_drives_online: true
Setting eseries_drive_firmware_upgrade_drives_online to false will speed up the upgrade, but this should not be done after BeeGFS is deployed. This is because that setting requires stopping all I/O to the drives before the upgrade to avoid application errors. Although performing an online drive firmware upgrade before configuring volumes is still quick, we recommend you always set this value to true to avoid issues later.
To optimize performance, make the following changes to the global configuration:
# Global Configuration Defaults eseries_system_cache_block_size: 32768 eseries_system_cache_flush_threshold: 80 eseries_system_default_host_type: linux dm-mp eseries_system_autoload_balance: disabled eseries_system_host_connectivity_reporting: disabled eseries_system_controller_shelf_id: 99 # Required.
- To ensure optimal volume provisioning and behavior, specify the following parameters:

# Storage Provisioning Defaults
eseries_volume_size_unit: pct
eseries_volume_read_cache_enable: true
eseries_volume_read_ahead_enable: false
eseries_volume_write_cache_enable: true
eseries_volume_write_cache_mirror_enable: true
eseries_volume_cache_without_batteries: false
eseries_storage_pool_usable_drives: "99:0,99:23,99:1,99:22,99:2,99:21,99:3,99:20,99:4,99:19,99:5,99:18,99:6,99:17,99:7,99:16,99:8,99:15,99:9,99:14,99:10,99:13,99:11,99:12"
The value specified for eseries_storage_pool_usable_drives is specific to NetApp EF600 block nodes and controls the order in which drives are assigned to new volume groups. This ordering ensures that the I/O to each group is evenly distributed across backend drive channels.