BeeGFS on NetApp with E-Series Storage

Create the Ansible inventory


To define the configuration for file and block nodes, you create an Ansible inventory that represents the BeeGFS file system you want to deploy. The inventory includes hosts, groups, and variables describing the desired BeeGFS file system.

Step 1: Define configuration for all building blocks

Define the configuration that applies to all building blocks, regardless of which configuration profile you may apply to them individually.

Before you begin
  • Use a source control system like BitBucket or Git to store the contents of the directory containing the Ansible inventory and playbook files.

  • Create a .gitignore file that specifies the files that Git should ignore. This helps avoid storing large files in Git.
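    For example, a minimal .gitignore might exclude the large package and firmware files downloaded later in this procedure (this is only a sketch; adjust the patterns to match what you actually store locally):

    # Example .gitignore contents
    packages/*.rpm
    packages/*.dlp
    *.retry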

Steps
  1. On your Ansible control node, identify a directory that you want to use to store the Ansible inventory and playbook files.

    Unless otherwise noted, all files and directories in this and the following steps are created relative to this directory.

  2. Create the following subdirectories:

    host_vars

    group_vars

    packages
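
    For example, from the inventory directory on the Ansible control node, the subdirectories can be created with a single command (shown as a sketch; any method of creating them works):

    mkdir -p host_vars group_vars packages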

Step 2: Define configuration for individual file and block nodes

Define the configuration that applies to individual file nodes and individual block nodes.

  1. Under host_vars/, create a file for each BeeGFS file node named <HOSTNAME>.yml with the following content, paying special attention to the notes regarding content to populate for BeeGFS cluster IPs and host names ending in odd versus even numbers.

    Initially, the file node interface names might not match what is listed here (they typically appear with default names such as ib0 or ibs1f0). The custom names used below are configured in Step 4: Define configuration that should apply to all file nodes.

    ansible_host: "<MANAGEMENT_IP>"
    eseries_ipoib_interfaces:  # Used to configure BeeGFS cluster IP addresses.
      - name: i1b
        address: 100.127.100.<NUMBER_FROM_HOSTNAME>/16
      - name: i4b
        address: 100.128.100.<NUMBER_FROM_HOSTNAME>/16
    beegfs_ha_cluster_node_ips:
      - <MANAGEMENT_IP>
      - <i1b_BEEGFS_CLUSTER_IP>
      - <i4b_BEEGFS_CLUSTER_IP>
    # NVMe over InfiniBand storage communication protocol information
    # For odd numbered file nodes (i.e., h01, h03, ..):
    eseries_nvme_ib_interfaces:
      - name: i1a
        address: 192.168.1.10/24
        configure: true
      - name: i2a
        address: 192.168.3.10/24
        configure: true
      - name: i3a
        address: 192.168.5.10/24
        configure: true
      - name: i4a
        address: 192.168.7.10/24
        configure: true
    # For even numbered file nodes (i.e., h02, h04, ..):
    # NVMe over InfiniBand storage communication protocol information
    eseries_nvme_ib_interfaces:
      - name: i1a
        address: 192.168.2.10/24
        configure: true
      - name: i2a
        address: 192.168.4.10/24
        configure: true
      - name: i3a
        address: 192.168.6.10/24
        configure: true
      - name: i4a
        address: 192.168.8.10/24
        configure: true
    Note If you have already deployed the BeeGFS cluster, you must stop the cluster before adding or changing statically configured IP addresses, including cluster IPs and IPs used for NVMe/IB. This is required so these changes take effect properly and do not disrupt cluster operations.
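    For reference, a Pacemaker-based BeeGFS HA cluster managed with the pcs tooling (an assumption here; adapt to however your cluster is managed) can be stopped on all nodes with:

    pcs cluster stop --all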
  2. Under host_vars/, create a file for each BeeGFS block node named <HOSTNAME>.yml and populate it with the following content.

    Pay special attention to the notes regarding content to populate for storage array names ending in odd versus even numbers.

    For each block node, create one file and specify the <MANAGEMENT_IP> for one of the two controllers (usually A).

    eseries_system_name: <STORAGE_ARRAY_NAME>
    eseries_system_api_url: https://<MANAGEMENT_IP>:8443/devmgr/v2/
    eseries_initiator_protocol: nvme_ib
    # For odd numbered block nodes (i.e., a01, a03, ..):
    eseries_controller_nvme_ib_port:
      controller_a:
        - 192.168.1.101
        - 192.168.2.101
        - 192.168.1.100
        - 192.168.2.100
      controller_b:
        - 192.168.3.101
        - 192.168.4.101
        - 192.168.3.100
        - 192.168.4.100
    # For even numbered block nodes (i.e., a02, a04, ..):
    eseries_controller_nvme_ib_port:
      controller_a:
        - 192.168.5.101
        - 192.168.6.101
        - 192.168.5.100
        - 192.168.6.100
      controller_b:
        - 192.168.7.101
        - 192.168.8.101
        - 192.168.7.100
        - 192.168.8.100

Step 3: Define configuration that should apply to all file and block nodes

You can define configuration common to a group of hosts in a file under group_vars/ named after the group. This avoids repeating shared configuration in multiple places.

About this task

Hosts can be in more than one group, and at runtime, Ansible chooses what variables apply to a particular host based on its variable precedence rules. (For more information on these rules, see the Ansible documentation for Using variables.)

Host-to-group assignments are defined in the actual Ansible inventory file, which is created towards the end of this procedure.
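
For illustration only, host-to-group assignments in that inventory file typically take the following shape. The hostnames and the inventory file name are placeholders; the group names ha_cluster and eseries_storage_systems are defined in the steps that follow.

# Hypothetical excerpt of the inventory file created later in this procedure
all:
  children:
    ha_cluster:
      hosts:
        <FILE_NODE_HOSTNAME>:
    eseries_storage_systems:
      hosts:
        <BLOCK_NODE_HOSTNAME>: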

Step

In Ansible, any configuration you want to apply to all hosts can be defined in the built-in group named all. Create the file group_vars/all.yml with the following content:

ansible_python_interpreter: /usr/bin/python3
beegfs_ha_ntp_server_pools:  # Modify the NTP server addresses if desired.
  - "pool 0.pool.ntp.org iburst maxsources 3"
  - "pool 1.pool.ntp.org iburst maxsources 3"

Step 4: Define configuration that should apply to all file nodes

The shared configuration for file nodes is defined in a group called ha_cluster. The steps in this section build out the configuration that should be included in the group_vars/ha_cluster.yml file.

Steps
  1. At the top of the file, define the defaults, including the password to use as the sudo user on the file nodes.

    ### ha_cluster Ansible group inventory file.
    # Place all default/common variables for BeeGFS HA cluster resources below.
    ### Cluster node defaults
    ansible_ssh_user: root
    ansible_become_password: <PASSWORD>
    eseries_ipoib_default_hook_templates:
      - 99-multihoming.j2 # This is required when configuring additional static IPs (for example cluster IPs) when multiple IB ports are in the same IPoIB subnet.
    # If the following options are specified, then Ansible will automatically reboot nodes when necessary for changes to take effect:
    eseries_common_allow_host_reboot: true
    eseries_common_reboot_test_command: "systemctl --state=active,exited | grep eseries_nvme_ib.service"
    Note Particularly for production environments, do not store passwords in plain text. Instead, use the Ansible Vault (see Encrypting content with Ansible Vault) or the --ask-become-pass option when running the playbook. If the ansible_ssh_user is already root, then you can optionally omit the ansible_become_password.
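    For example, rather than storing the password in ha_cluster.yml, you can prompt for it at runtime or store an encrypted value with Ansible Vault. The following commands are a sketch; the playbook and inventory file names are placeholders:

     # Prompt for the become password when running the playbook:
     ansible-playbook -i inventory.yml playbook.yml --ask-become-pass
     # Or generate an encrypted value to paste into ha_cluster.yml in place of the plaintext password:
     ansible-vault encrypt_string '<PASSWORD>' --name 'ansible_become_password'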
  2. Optionally, configure a name for the high-availability (HA) cluster and specify a user for intra-cluster communication.

    If you are modifying the private IP addressing scheme, you must also update the default beegfs_ha_mgmtd_floating_ip. This must match what you configure later for the BeeGFS Management resource group.

    Specify one or more emails that should receive alerts for cluster events using beegfs_ha_alert_email_list.

    ### Cluster information
    beegfs_ha_firewall_configure: True
    eseries_beegfs_ha_disable_selinux: True
    eseries_selinux_state: disabled
    # The following variables should be adjusted depending on the desired configuration:
    beegfs_ha_cluster_name: hacluster                  # BeeGFS HA cluster name.
    beegfs_ha_cluster_username: hacluster              # BeeGFS HA cluster username.
    beegfs_ha_cluster_password: hapassword             # BeeGFS HA cluster username's password.
    beegfs_ha_cluster_password_sha512_salt: randomSalt # BeeGFS HA cluster username's password salt.
    beegfs_ha_mgmtd_floating_ip: 100.127.101.0         # BeeGFS management service IP address.
    # Email Alerts Configuration
    beegfs_ha_enable_alerts: True
    beegfs_ha_alert_email_list: ["email@example.com"]  # E-mail recipient list for notifications when BeeGFS HA resources change or fail.  Often a distribution list for the team responsible for managing the cluster.
    beegfs_ha_alert_conf_ha_group_options:
       mydomain: "example.com"
    # The mydomain parameter specifies the local internet domain name. This is optional when the cluster nodes have fully qualified hostnames (i.e. host.example.com).
    # Adjusting the following parameters is optional:
    beegfs_ha_alert_timestamp_format: "%Y-%m-%d %H:%M:%S.%N" #%H:%M:%S.%N
    beegfs_ha_alert_verbosity: 3
    #  1) high-level node activity
    #  3) high-level node activity + fencing action information + resources (filter on X-monitor)
    #  5) high-level node activity + fencing action information + resources
    Note While seemingly redundant, beegfs_ha_mgmtd_floating_ip is important when you scale the BeeGFS file system beyond a single HA cluster. Subsequent HA clusters are deployed without an additional BeeGFS management service and point at the management service provided by the first cluster.
  3. Configure a fencing agent. (For more details, see Configure fencing in a Red Hat High Availability cluster.) The following output shows examples for configuring common fencing agents. Choose one of these options.

    For this step, be aware that:

    • By default, fencing is enabled, but you need to configure a fencing agent.

    • The <HOSTNAME> specified in the pcmk_host_map or pcmk_host_list must correspond with the hostname in the Ansible inventory.

    • Running the BeeGFS cluster without fencing is not supported, particularly in production. This is largely to ensure that when BeeGFS services, including any resource dependencies such as block devices, fail over due to an issue, there is no risk of concurrent access by multiple nodes that could result in file system corruption or other undesirable or unexpected behavior. If fencing must be disabled, refer to the general notes in the BeeGFS HA role’s getting started guide and set beegfs_ha_cluster_crm_config_options["stonith-enabled"] to false in ha_cluster.yml (a sketch follows the fencing examples below).

    • There are multiple node-level fencing devices available, and the BeeGFS HA role can configure any fencing agent available in the Red Hat HA package repository. When possible, use a fencing agent that works through the uninterruptible power supply (UPS) or rack power distribution unit (rPDU), because some fencing agents such as the baseboard management controller (BMC) or other lights-out devices that are built into the server might not respond to the fence request under certain failure scenarios.

      ### Fencing configuration:
      # OPTION 1: To enable fencing using APC Power Distribution Units (PDUs):
      beegfs_ha_fencing_agents:
        fence_apc:
          - ipaddr: <PDU_IP_ADDRESS>
            login: <PDU_USERNAME>
            passwd: <PDU_PASSWORD>
            pcmk_host_map: "<HOSTNAME>:<PDU_PORT>,<PDU_PORT>;<HOSTNAME>:<PDU_PORT>,<PDU_PORT>"
      # OPTION 2: To enable fencing using the Redfish APIs provided by the Lenovo XCC (and other BMCs):
      redfish: &redfish
        username: <BMC_USERNAME>
        password: <BMC_PASSWORD>
        ssl_insecure: 1 # If a valid SSL certificate is not available, specify "1".
      beegfs_ha_fencing_agents:
        fence_redfish:
          - pcmk_host_list: <HOSTNAME>
            ip: <BMC_IP>
            <<: *redfish
          - pcmk_host_list: <HOSTNAME>
            ip: <BMC_IP>
            <<: *redfish
      # For details on configuring other fencing agents see https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_high_availability_clusters/assembly_configuring-fencing-configuring-and-managing-high-availability-clusters.
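    If fencing must be disabled despite the caution above, a minimal sketch of the corresponding setting in ha_cluster.yml is shown below (keep any other crm_config options your configuration requires):

      # Not supported for production; see the fencing note above.
      beegfs_ha_cluster_crm_config_options:
        "stonith-enabled": false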
  4. Enable recommended performance tuning in the Linux OS.

    The default values of the performance parameters generally work well, but they can be adjusted for a particular workload. These recommendations are therefore included in the BeeGFS role but are not enabled by default, so that users are aware of the tuning applied to their file system.

    To enable performance tuning, specify:

    ### Performance Configuration:
    beegfs_ha_enable_performance_tuning: True
  5. (Optional) You can adjust the performance tuning parameters in the Linux OS as needed.

    For a comprehensive list of the available tuning parameters that you can adjust, see the Performance Tuning Defaults section of the BeeGFS HA role in E-Series BeeGFS GitHub site. The default values can be overridden for all nodes in the cluster in this file or the host_vars file for an individual node.

  6. To allow full 200Gb/HDR connectivity between block and file nodes, use the Open Subnet Manager (OpenSM) package from the Mellanox Open Fabrics Enterprise Distribution (MLNX_OFED). (The inbox opensm package does not support the necessary virtualization functionality.) Although deployment using Ansible is supported, you must first download the desired packages to the Ansible control node used to run the BeeGFS role.

    1. Using curl or your desired tool, download the packages for the version of OpenSM listed in the technical requirements section from Mellanox’s website to the packages/ directory. For example:

      curl -o packages/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm https://linux.mellanox.com/public/repo/mlnx_ofed/5.4-1.0.3.0/rhel8.4/x86_64/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
      
      curl -o packages/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm https://linux.mellanox.com/public/repo/mlnx_ofed/5.4-1.0.3.0/rhel8.4/x86_64/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
    2. Populate the following parameters in group_vars/ha_cluster.yml (adjust packages as needed):

      ### OpenSM package and configuration information
      eseries_ib_opensm_allow_upgrades: true
      eseries_ib_opensm_skip_package_validation: true
      eseries_ib_opensm_rhel_packages: []
      eseries_ib_opensm_custom_packages:
        install:
          - files:
              add:
                "packages/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm": "/tmp/"
                "packages/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm": "/tmp/"
          - packages:
              add:
                - /tmp/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
                - /tmp/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
        uninstall:
          - packages:
              remove:
                - opensm
                - opensm-libs
            files:
              remove:
                - /tmp/opensm-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
                - /tmp/opensm-libs-5.9.0.MLNX20210617.c9f2ade-0.1.54103.x86_64.rpm
      eseries_ib_opensm_options:
        virt_enabled: "2"
  7. Configure the udev rule to ensure consistent mapping of logical InfiniBand port identifiers to underlying PCIe devices.

    The udev rule must be unique to the PCIe topology of each server platform used as a BeeGFS file node.

    Use the following values for verified file nodes:

    ### Ensure Consistent Logical IB Port Numbering
    # OPTION 1: Lenovo SR665 PCIe address-to-logical IB port mapping:
    eseries_ipoib_udev_rules:
      "0000:41:00.0": i1a
      "0000:41:00.1": i1b
      "0000:01:00.0": i2a
      "0000:01:00.1": i2b
      "0000:a1:00.0": i3a
      "0000:a1:00.1": i3b
      "0000:81:00.0": i4a
      "0000:81:00.1": i4b
    
    # Note: At this time no other x86 servers have been qualified. Configuration for future qualified file nodes will be added here.
  8. (Optional) Update the metadata target selection algorithm.

    beegfs_ha_beegfs_meta_conf_ha_group_options:
      tuneTargetChooser: randomrobin
    Note In verification testing, randomrobin was typically used to ensure that test files were evenly distributed across all BeeGFS storage targets during performance benchmarking (for more information on benchmarking, see the BeeGFS site for Benchmarking a BeeGFS System). With real-world use, this might cause lower numbered targets to fill up faster than higher numbered targets. Omitting randomrobin and using the default randomized value has been shown to provide good performance while still utilizing all available targets.

Step 5: Define the configuration for the common block node

The shared configuration for block nodes is defined in a group called eseries_storage_systems. The steps in this section build out the configuration that should be included in the group_vars/eseries_storage_systems.yml file.

Steps
  1. Set the Ansible connection to local, provide the system password, and specify if SSL certificates should be verified. (Normally, Ansible uses SSH to connect to managed hosts, but in the case of the NetApp E-Series storage systems used as block nodes, the modules use the REST API for communication.) At the top of the file, add the following:

    ### eseries_storage_systems Ansible group inventory file.
    # Place all default/common variables for NetApp E-Series Storage Systems here:
    ansible_connection: local
    eseries_system_password: <PASSWORD>
    eseries_validate_certs: false
    Note Listing any passwords in plaintext is not recommended. Use Ansible vault or provide the eseries_system_password when running Ansible using --extra-vars.
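    For example, the system password can be supplied at runtime instead of being stored in the file (a sketch; the playbook and inventory file names are placeholders):

     ansible-playbook -i inventory.yml playbook.yml --extra-vars "eseries_system_password=<PASSWORD>"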
  2. To ensure optimal performance, install the controller firmware (SANtricity OS) and NVSRAM versions listed for block nodes in Technical requirements.

    Download the corresponding files from the NetApp Support site. You can either upgrade them manually or include them in the packages/ directory of the Ansible control node, and then populate the following parameters in eseries_storage_systems.yml to upgrade using Ansible:

    # Firmware, NVSRAM, and Drive Firmware (modify the filenames as needed):
    eseries_firmware_firmware: "packages/RCB_11.70.2_6000_61b1131d.dlp"
    eseries_firmware_nvsram: "packages/N6000-872834-D06.dlp"
  3. Download and install the latest drive firmware available for the drives installed in your block nodes from the NetApp Support site. You can either upgrade them manually or include them in the packages/ directory of the Ansible control node, and then populate the following parameters in eseries_storage_systems.yml to upgrade using Ansible:

    eseries_drive_firmware_firmware_list:
      - "packages/<FILENAME>.dlp"
    eseries_drive_firmware_upgrade_drives_online: true
    Note Setting eseries_drive_firmware_upgrade_drives_online to false speeds up the upgrade, but it should not be done after BeeGFS is deployed, because that setting requires stopping all I/O to the drives before the upgrade to avoid application errors. Although performing an online drive firmware upgrade before configuring volumes is still quick, we recommend always setting this value to true to avoid issues later.
  4. To optimize performance, make the following changes to the global configuration:

    # Global Configuration Defaults
    eseries_system_cache_block_size: 32768
    eseries_system_cache_flush_threshold: 80
    eseries_system_default_host_type: linux dm-mp
    eseries_system_autoload_balance: disabled
    eseries_system_host_connectivity_reporting: disabled
    eseries_system_controller_shelf_id: 99 # Required.
  5. To ensure optimal volume provisioning and behavior, specify the following parameters:

    # Storage Provisioning Defaults
    eseries_volume_size_unit: pct
    eseries_volume_read_cache_enable: true
    eseries_volume_read_ahead_enable: false
    eseries_volume_write_cache_enable: true
    eseries_volume_write_cache_mirror_enable: true
    eseries_volume_cache_without_batteries: false
    eseries_storage_pool_usable_drives: "99:0,99:23,99:1,99:22,99:2,99:21,99:3,99:20,99:4,99:19,99:5,99:18,99:6,99:17,99:7,99:16,99:8,99:15,99:9,99:14,99:10,99:13,99:11,99:12"
    Note The value specified for eseries_storage_pool_usable_drives is specific to NetApp EF600 block nodes and controls the order in which drives are assigned to new volume groups. This ordering ensures that the I/O to each group is evenly distributed across backend drive channels.