What's new in StorageGRID 11.4

StorageGRID 11.4 formally introduces the alert system as the primary framework for system notification; adds Grid Manager support for traffic classification policies, SNMP monitoring, AutoSupport on Demand, and SANtricity OS software upgrades; and provides erasure coding enhancements. In addition, S3 bucket tagging is now supported, and S3 tenants can now delete their buckets from the Tenant Manager. The 11.4 release also allows you to double or triple the capacity of an installed SG6060 storage appliance by adding expansion shelves.

Alert system is now primary

The StorageGRID alert system, which was available to preview in StorageGRID 11.3, has been substantially enhanced in StorageGRID 11.4. The alert system is now intended to be your primary tool for monitoring any issues that might occur in your StorageGRID system. The alert system provides an easy-to-use interface for detecting, evaluating, and resolving issues.

Note: While the alarm system continues to be supported in StorageGRID 11.4, the new alert system offers significant benefits and is easier to use.

The enhancements to the alert system for StorageGRID 11.4 include the following:

The following new alerts were added for StorageGRID 11.4:
  • Appliance battery expired
  • Appliance battery failed
  • Appliance battery has insufficient learned capacity
  • Appliance battery near expiration
  • Appliance battery removed
  • Appliance battery too hot
  • Appliance cache backup device failed
  • Appliance cache backup device insufficient capacity
  • Appliance cache backup device write-protected
  • Appliance cache memory size mismatch
  • Appliance compute controller chassis temperature too high
  • Appliance compute controller CPU temperature too high
  • Appliance compute controller needs attention
  • Appliance compute controller power supply A has a problem
  • Appliance compute controller power supply B has a problem
  • Appliance compute hardware monitor service stalled
  • Appliance flash cache drives non-optimal
  • Appliance interconnect/battery canister removed
  • Appliance overall power supply degraded
  • Appliance storage connectivity degraded
  • Appliance storage controller A failure
  • Appliance storage controller B failure
  • Appliance storage controller drive failure
  • Appliance storage controller hardware issue
  • Appliance storage controller power supply A failure
  • Appliance storage controller power supply B failure
  • Appliance storage hardware monitor service stalled
  • Appliance storage shelves power supply degraded
  • Appliance temperature exceeded
  • Appliance temperature sensor removed
  • Cassandra communication error
  • Cassandra repair metrics out of date
  • Cassandra repair progress slow
  • Cassandra repair service not available
  • DHCP lease expired
  • DHCP lease expiring soon
  • DHCP server unavailable
  • Email notification failure
  • Expiration of load balancer endpoint certificate
  • Grid Network MTU mismatch
  • High Java heap use
  • Identity federation synchronization failure
  • ILM placement unachievable
  • ILM scan period too long
  • ILM scan rate low
  • Non appliance node network down
  • Node network down
  • Node not locked with NTP server
  • S3 multipart part too small
  • Services appliance link down on Admin Network port 1
  • Services appliance link down on Grid Network (or Admin Network or Client Network)
  • Services appliance link down on network port 1, 2, 3, or 4
  • Storage appliance link down on Admin Network port 1
  • Storage appliance link down on Grid Network (or Admin Network or Client Network)
  • Storage appliance link down on network port 1, 2, 3, or 4
  • Unidentified corrupt object detected
The following alerts were modified in StorageGRID 11.4:
  • The Unable to communicate with node alert is no longer triggered when a node is gracefully shut down, for example, as part of a maintenance procedure.
  • Three alerts were renamed:
    New name                           Original name
    High latency for metadata queries  Low metadata query performance
    Low system data capacity           Low volume disk capacity
    Node not in sync with NTP server   Node not in sync with time source
Note: Alerts might be triggered during an upgrade. See the information about how your system is affected during an upgrade. For details about specific alerts, see the instructions for monitoring and troubleshooting StorageGRID.

Monitoring and troubleshooting StorageGRID

History of legacy attributes only retained for three years

As part of the move to Prometheus metrics and the new alert system, StorageGRID now retains legacy attribute values for a maximum of three years to save disk space on Admin Nodes. Previously, attribute values were saved for seven years. This change affects the historical charts and text reports that are available from the Support > Grid Topology page as well as the historical chart pop-ups that are available from the Grid Manager Dashboard.

When you upgrade to StorageGRID 11.4, the history for most legacy attributes is trimmed to 30 days. After you upgrade, StorageGRID will allow the attribute history to grow for up to three years. The exception to this is the history for node capacity attributes, S3 rate attributes, and ILM summary attributes, which is trimmed to three years when you upgrade, instead of 30 days.
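
The trimming rule at upgrade time can be sketched as a small calculation. This is only an illustration, not StorageGRID code: the category labels and the function name are hypothetical, and three years is approximated as 3 × 365 days.

```python
from datetime import datetime, timedelta

# Hypothetical labels for the attribute groups that keep three years of
# history at upgrade time (node capacity, S3 rate, and ILM summary attributes).
LONG_HISTORY_CATEGORIES = {"node_capacity", "s3_rate", "ilm_summary"}

# Assumption: "three years" is approximated as 3 * 365 days.
THREE_YEARS = timedelta(days=3 * 365)
THIRTY_DAYS = timedelta(days=30)


def upgrade_trim_cutoff(category: str, upgrade_time: datetime) -> datetime:
    """Return the oldest timestamp still kept right after the upgrade."""
    if category in LONG_HISTORY_CATEGORIES:
        return upgrade_time - THREE_YEARS
    return upgrade_time - THIRTY_DAYS
```

After the upgrade, history in every category is allowed to grow again, up to the three-year maximum.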

Monitoring and troubleshooting StorageGRID

Support for traffic classification policies

To enhance your quality-of-service (QoS) offerings, you can now use the Grid Manager to configure traffic classification policies (Configuration > Traffic Classification). Within each policy, you can create rules for identifying different types of network traffic, including traffic related to specific buckets, tenants, client subnets, or load balancer endpoints. Traffic classification policies can assist with traffic limiting and monitoring. For any existing policy, you can view a graph of traffic over time to determine how often the policy is limiting traffic or if you need to adjust the policy.
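
As a rough mental model of how matching rules identify traffic, the following sketch checks a request against a list of rules. It is not the Grid Manager implementation: the `Request` and `Rule` types, the rule kinds, and the assumption that a policy applies when any one of its rules matches are all hypothetical. Load balancer endpoint rules are omitted for brevity.

```python
import ipaddress
from typing import List, NamedTuple


class Request(NamedTuple):
    # Hypothetical view of an incoming client request.
    client_ip: str
    tenant: str
    bucket: str


class Rule(NamedTuple):
    # kind is one of "tenant", "bucket", or "client_subnet" (hypothetical names).
    kind: str
    value: str


def rule_matches(rule: Rule, req: Request) -> bool:
    """Return True if a single rule identifies this request."""
    if rule.kind == "tenant":
        return req.tenant == rule.value
    if rule.kind == "bucket":
        return req.bucket == rule.value
    if rule.kind == "client_subnet":
        # Subnet rules are written in CIDR notation, for example 10.0.0.0/24.
        network = ipaddress.ip_network(rule.value)
        return ipaddress.ip_address(req.client_ip) in network
    raise ValueError(f"unknown rule kind: {rule.kind}")


def policy_matches(rules: List[Rule], req: Request) -> bool:
    """Assumption: a policy applies when any one of its rules matches."""
    return any(rule_matches(rule, req) for rule in rules)
```

For example, a policy holding one tenant rule and one client subnet rule would apply to a request from inside that subnet even when the tenant differs.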

Administering StorageGRID

Grid Manager support for Simple Network Management Protocol (SNMP)

You can now use the Grid Manager to configure the StorageGRID SNMP agent (Configuration > SNMP Agent). You can configure the agent for read-only MIB access and for trap and inform notifications. In addition, the StorageGRID SNMP agent now supports all three versions of the SNMP protocol.

As part of this change, the StorageGRID management information base (MIB) has been updated for version 11.4. The updated MIB contains table and notification definitions for current alerts. Information about alarms is marked deprecated, but can still be used.

Monitoring and troubleshooting StorageGRID

AutoSupport on Demand

AutoSupport on Demand (ASUP on Demand) can assist in solving issues that technical support is actively working on. When you enable AutoSupport on Demand (Support > AutoSupport > Weekly), technical support can request that AutoSupport messages be sent without the need for your intervention.

Administering StorageGRID

Change to metric used for Load Balancer Incoming Request Rate

In StorageGRID 11.3, the Load Balancer Incoming Request Rate chart on the Nodes > Load Balancer tab used the following metric:
storagegrid_private_load_balancer_storage_request_accept_count
In StorageGRID 11.4, this chart uses the following new metric, which more accurately tracks the number of requests the load balancer is receiving, instead of the number of requests being sent to Storage Nodes:
storagegrid_private_load_balancer_storage_request_incoming_count

When you upgrade to StorageGRID 11.4, the chart resets to use the new metric.
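
If you maintain your own Prometheus queries or dashboards that mirror this chart, the metric name needs to change with the release. The following sketch selects the metric by release and builds a query string. Wrapping the metric in `rate()` with a five-minute window is an assumption about how such a counter is typically charted, not a statement of the exact expression the Grid Manager uses, and the function name is hypothetical.

```python
# Metric behind the Load Balancer Incoming Request Rate chart, per release.
METRIC_BY_RELEASE = {
    "11.3": "storagegrid_private_load_balancer_storage_request_accept_count",
    "11.4": "storagegrid_private_load_balancer_storage_request_incoming_count",
}


def incoming_request_rate_query(release: str, window: str = "5m") -> str:
    """Build a PromQL expression for the per-second request rate.

    Assumption: the metric is a counter, so rate() over a time window
    is a reasonable way to chart it.
    """
    metric = METRIC_BY_RELEASE[release]
    return f"rate({metric}[{window}])"
```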

Monitoring and troubleshooting StorageGRID

Enhancements to software update procedures

You can now perform StorageGRID software upgrades, StorageGRID hotfixes, and SANtricity OS upgrades from the same location in the Grid Manager (Configuration > Software Update). You can also approve when a hotfix or SANtricity OS upgrade is applied to each node.

The process of upgrading SANtricity OS software on the storage controllers in an appliance has been simplified. You no longer need to manually enter maintenance mode before the upgrade, and you do not have to download a custom StorageGRID NVSRAM file in a separate step. In addition, the new upgrade process ensures you do not load the wrong firmware onto a controller.

Recovery and maintenance

SG6000 appliance installation and maintenance

SG5700 appliance installation and maintenance

SG5600 appliance installation and maintenance

ILM enhancements

Several enhancements were made to ILM:

Administering StorageGRID

Transport Layer Security (TLS) 1.3 support

StorageGRID now supports TLS 1.3 for the following types of connections:

The following TLS 1.3 ciphers are supported:

Administering StorageGRID

Implementing S3 client applications

Implementing Swift client applications

Enhancements to the Grid Manager

Administering StorageGRID

Expanding a StorageGRID system

Monitoring and troubleshooting StorageGRID

Enhancements to the Tenant Manager

Users can now delete their S3 buckets from the Tenant Manager. Buckets must be empty before they can be deleted.

Using tenant accounts

Enhancements to S3 REST API support

Implementing S3 client applications

Monitoring and troubleshooting StorageGRID

Support for SG6060 field expansions

Previously, expansion shelves had to be installed during the initial installation of the SG6060 storage appliance. Starting with StorageGRID 11.4, you can install expansion shelves after the SG6060 is deployed and operating in a grid. You can install one or two expansion shelves to double or triple the capacity of an existing SG6060.

The sizes of the disks in the expansion shelves do not have to be the same as the sizes of the disks in the original shelf or shelves.

SG6000 appliance installation and maintenance

New MTU setting and alert for network communications

You can now set the maximum transmission unit (MTU) for the Grid Network, Admin Network, and Client Network.

If you are using DHCP addressing, you can configure the DHCP server to set the MTU.

If you want to use jumbo frames, change the MTU setting to a value suitable for jumbo frames, such as 9000. Otherwise, keep the default value of 1500 (or 1400 for VMware).

Attention: For the best network performance, all nodes should be configured with similar MTU values on their Grid Network interfaces. The Grid Network MTU mismatch alert is triggered if there is a significant difference in MTU settings for the Grid Network on individual nodes.
  • During initial deployment, use the following methods to set the MTU:
    • For appliance nodes, use the StorageGRID Appliance Installer (Configure Networking > IP Configuration).
    • For Linux-based nodes, set the MTU in the node configuration file.
    • For VMware-based nodes, set the MTU option on the vSphere VM Properties page.
  • After deployment, use the following methods to change the MTU:
    • For appliance nodes, place the appliance into maintenance mode and use the StorageGRID Appliance Installer.
    • For Linux- or VMware-based nodes, use the change-mtu.py script as described in the monitoring and troubleshooting instructions.
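
Before changing MTU values, it can help to see which nodes are out of step on the Grid Network. The sketch below works on a plain mapping of node name to MTU that you have collected yourself. It does not query StorageGRID, the function names and node names are hypothetical, and it does not reproduce the threshold the Grid Network MTU mismatch alert uses, which is not stated here.

```python
from collections import Counter
from typing import Dict, List


def grid_mtu_spread(mtu_by_node: Dict[str, int]) -> int:
    """Return the gap between the largest and smallest Grid Network MTU.

    Assumes the mapping is not empty.
    """
    values = list(mtu_by_node.values())
    return max(values) - min(values)


def nodes_off_majority(mtu_by_node: Dict[str, int]) -> List[str]:
    """List the nodes whose MTU differs from the most common value."""
    most_common_mtu, _ = Counter(mtu_by_node.values()).most_common(1)[0]
    return sorted(
        node for node, mtu in mtu_by_node.items() if mtu != most_common_mtu
    )
```

For example, with two nodes at 9000 and one at 1500, the spread is 7500 and the node at 1500 is the one reported.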

SG100 and SG1000 appliance installation and maintenance

SG6000 appliance installation and maintenance

SG5700 appliance installation and maintenance

SG5600 appliance installation and maintenance

Red Hat Enterprise Linux or CentOS installation

Monitoring and troubleshooting StorageGRID

Updates to lost object procedures

The procedures that describe how to troubleshoot potentially lost objects have been updated. Some commands have been updated to identify objects by their UUID rather than by their CBID. The procedure for objects stored on Archive Nodes has been updated to clarify that its commands apply only to objects stored on tape.

Monitoring and troubleshooting StorageGRID

Recovery and maintenance

Audit message changes

Understanding audit messages

Site decommission procedure

A new site decommission procedure is available to permanently remove a non-functional data center site from the StorageGRID system. The procedure is intended to be used only to clean up after a disaster, for example, to remove a site that was destroyed by a fire or flood. You must consider all nodes at the site to be unrecoverable; any data remaining at the site will become inaccessible.
Attention: The Decommission Site page is disabled by default in StorageGRID 11.4. Before performing this procedure, you must contact your NetApp account representative. NetApp will review your requirements before enabling the page.

Recovery and maintenance

StorageGRID SG100 services appliance

The new StorageGRID SG100 services appliance is a one-rack-unit (1U) enclosure with four 10/25-GbE ports. The SG100 can serve as a Gateway Node or Admin Node to provide high-availability load balancing services in a StorageGRID system. The SG100 is designed to be deployed with eight or fewer Storage Nodes and with limited use of traffic classifiers.

SG100 and SG1000 appliance installation and maintenance