StorageGRID 11.4 formally introduces the alert system as the primary framework for system notification; adds Grid Manager support for traffic classification policies, SNMP monitoring, AutoSupport on Demand, and SANtricity OS software upgrades; and provides erasure coding enhancements. In addition, S3 bucket tagging is now supported, and S3 tenants can now delete their buckets from the Tenant Manager. The 11.4 release also allows you to double or triple the capacity of an installed SG6060 storage appliance by adding expansion shelves.
Alert system is now primary
The StorageGRID alert system, which was available to preview in StorageGRID 11.3, has been substantially enhanced in StorageGRID 11.4. The alert system is now intended to be your primary tool for monitoring any issues that might occur in your StorageGRID system. The alert system provides an easy-to-use interface for detecting, evaluating, and resolving issues.
Note: While the alarm system continues to be supported in StorageGRID 11.4, the new alert system offers significant benefits and is easier to use.
The enhancements to the alert system for StorageGRID 11.4 include the following:
- Menus in the Grid Manager have been updated. You can now select Alerts directly from the top menu bar. The Alarms (legacy) options have been moved to the Support menu.
- The Health panel on the Grid Manager Dashboard now shows a system-wide summary of node connection state (blue and gray icons), current alerts (yellow, orange, and red icons), recently resolved alerts, legacy alarms, and license issues. If a health-related issue exists anywhere in your system, you can click a link directly from the Dashboard to quickly identify the problem.
- The Nodes page now uses alert icons, instead of alarm icons, in the tree view. In addition, the tab for a specific node shows the node connection state (Connected, Unknown, or Administratively Down) as well as any alerts currently affecting the node.
- You can now search and view a history of the alerts that have been resolved. By default, the Resolved Alerts page shows all resolved alerts that were triggered in the last week, but you can view a list of resolved alerts for different time periods. You can also filter the list by alert name, severity, or affected node.
- When configuring the SMTP server for alert email notifications, you can now specify a username and password if your SMTP server requires authentication. In addition, you can now enable Transport Layer Security (TLS) for communications with the SMTP server.
- Alert email notifications are now sent by whichever Admin Node is configured to be the preferred sender. Previously, all Admin Nodes sent alert email notifications.
The following new alerts were added for StorageGRID 11.4:
- Appliance battery expired
- Appliance battery failed
- Appliance battery has insufficient learned capacity
- Appliance battery near expiration
- Appliance battery removed
- Appliance battery too hot
- Appliance cache backup device failed
- Appliance cache backup device insufficient capacity
- Appliance cache backup device write-protected
- Appliance cache memory size mismatch
- Appliance compute controller chassis temperature too high
- Appliance compute controller CPU temperature too high
- Appliance compute controller needs attention
- Appliance compute controller power supply A has a problem
- Appliance compute controller power supply B has a problem
- Appliance compute hardware monitor service stalled
- Appliance flash cache drives non-optimal
- Appliance interconnect/battery canister removed
- Appliance overall power supply degraded
- Appliance storage connectivity degraded
- Appliance storage controller A failure
- Appliance storage controller B failure
- Appliance storage controller drive failure
- Appliance storage controller hardware issue
- Appliance storage controller power supply A failure
- Appliance storage controller power supply B failure
- Appliance storage hardware monitor service stalled
- Appliance storage shelves power supply degraded
- Appliance temperature exceeded
- Appliance temperature sensor removed
- Cassandra communication error
- Cassandra repair metrics out of date
- Cassandra repair progress slow
- Cassandra repair service not available
- DHCP lease expired
- DHCP lease expiring soon
- DHCP server unavailable
- Email notification failure
- Expiration of load balancer endpoint certificate
- Grid Network MTU mismatch
- High Java heap use
- Identity federation synchronization failure
- ILM placement unachievable
- ILM scan period too long
- ILM scan rate low
- Non appliance node network down
- Node network down
- Node not locked with NTP server
- S3 multipart part too small
- Services appliance link down on Admin Network port 1
- Services appliance link down on Grid Network (or Admin Network or Client Network)
- Services appliance link down on network port 1, 2, 3, or 4
- Storage appliance link down on Admin Network port 1
- Storage appliance link down on Grid Network (or Admin Network or Client Network)
- Storage appliance link down on network port 1, 2, 3, or 4
- Unidentified corrupt object detected
The following alerts were modified in StorageGRID 11.4:
- The Unable to communicate with node alert is no longer triggered when a node is gracefully shut down, for example, as part of a maintenance procedure.
- Three alerts were renamed:
New name | Original name
High latency for metadata queries | Low metadata query performance
Low system data capacity | Low volume disk capacity
Node not in sync with NTP server | Node not in sync with time source
Note: Alerts might be triggered during an upgrade. For background, see the information about how your system is affected during an upgrade. For details about specific alerts, see the instructions for monitoring and troubleshooting StorageGRID.
Monitoring and troubleshooting StorageGRID
History of legacy attributes only retained for three years
As part of the move to Prometheus metrics and the new alert system, StorageGRID now retains legacy attribute values for a maximum of three years to save disk space on Admin Nodes. Previously, attribute values were saved for seven years. This change affects the historical charts and text reports that are available from the page as well as the historical chart pop-ups that are available from the Grid Manager Dashboard.
When you upgrade to StorageGRID 11.4, the history for most legacy attributes is trimmed to 30 days. After you upgrade, StorageGRID will allow the attribute history to grow for up to three years. The exception to this is the history for node capacity attributes, S3 rate attributes, and ILM summary attributes, which is trimmed to three years when you upgrade, instead of 30 days.
Monitoring and troubleshooting StorageGRID
Support for traffic classification policies
To enhance your quality-of-service (QoS) offerings, you can now use the Grid Manager to configure traffic classification policies. Within each policy, you can create rules for identifying different types of network traffic, including traffic related to specific buckets, tenants, client subnets, or load balancer endpoints. Traffic classification policies can assist with traffic limiting and monitoring. For any existing policy, you can view a graph of traffic over time to determine how often the policy is limiting traffic or if you need to adjust the policy.
Administering StorageGRID
Grid Manager support for Simple Network Management Protocol (SNMP)
You can now use the Grid Manager to configure the StorageGRID SNMP agent. You can configure the agent for read-only MIB access and for trap and inform notifications. In addition, the StorageGRID SNMP agent now supports all three versions of the SNMP protocol.
As part of this change, the StorageGRID management information base (MIB) has been updated for version 11.4. The updated MIB contains table and notification definitions for current alerts. Information about alarms is marked deprecated, but can still be used.
Monitoring and troubleshooting StorageGRID
AutoSupport on Demand
AutoSupport on Demand (ASUP on Demand) can assist in solving issues that technical support is actively working on. When you enable AutoSupport on Demand, technical support can request that AutoSupport messages be sent without the need for your intervention.
Administering StorageGRID
Change to metric used for Load Balancer Incoming Request Rate
In StorageGRID 11.3, the Load Balancer Incoming Request Rate chart on the tab used the following metric:
storagegrid_private_load_balancer_storage_request_accept_count
In StorageGRID 11.4, this chart uses the following new metric, which more accurately tracks the number of requests the load balancer is receiving, instead of the number of requests being sent to Storage Nodes:
storagegrid_private_load_balancer_storage_request_incoming_count
When you upgrade to StorageGRID 11.4, the chart resets to use the new metric.
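If you keep your own saved queries or dashboards that reference the 11.3 metric, a small helper can flag them after the upgrade. The following is only a sketch: the two metric names come from this section, while the `saved_queries` dictionary and its query strings are hypothetical. Because the two metrics count different things (requests sent to Storage Nodes versus requests received by the load balancer), review each migrated query rather than assuming the values are comparable.

```python
# Metric names as given in these release notes.
OLD_METRIC = "storagegrid_private_load_balancer_storage_request_accept_count"
NEW_METRIC = "storagegrid_private_load_balancer_storage_request_incoming_count"


def find_old_metric_uses(saved_queries):
    """Return the names of saved queries that still reference the 11.3 metric.

    saved_queries is a hypothetical mapping of query name -> query expression.
    """
    return [name for name, expr in saved_queries.items() if OLD_METRIC in expr]


def migrate_query(expr):
    """Swap the 11.3 metric name for the 11.4 one in a query expression."""
    return expr.replace(OLD_METRIC, NEW_METRIC)


if __name__ == "__main__":
    # Illustrative data only; these are not queries shipped with the product.
    saved_queries = {
        "lb-requests": "rate(" + OLD_METRIC + "[5m])",
        "node-up": "up",
    }
    for name in find_old_metric_uses(saved_queries):
        print(name, "->", migrate_query(saved_queries[name]))
```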
Monitoring and troubleshooting StorageGRID
Enhancements to software update procedures
You can now perform StorageGRID software upgrades, StorageGRID hotfixes, and SANtricity OS upgrades from the same location in the Grid Manager. You can also approve when a hotfix or SANtricity OS upgrade is applied to each node.
The process of upgrading SANtricity OS software on the storage controllers in an appliance has been simplified. You no longer need to manually enter maintenance mode before the upgrade, and you do not have to download a custom StorageGRID NVSRAM file in a separate step. In addition, the new upgrade process ensures you do not load the wrong firmware onto a controller.
Recovery and maintenance
SG6000 appliance installation and maintenance
SG5700 appliance installation and maintenance
SG5600 appliance installation and maintenance
Transport Layer Security (TLS) 1.3 support
StorageGRID now supports TLS 1.3 for the following types of connections:
- Administrative connections to the Grid Manager, the Tenant Manager, the Grid Management API, and the Tenant Management API
- S3 or Swift client connections
- Connections to external systems used for Cloud Storage Pools and identity federation
The following TLS 1.3 ciphers are supported:
- TLS_AES_256_GCM_SHA384
- TLS_CHACHA20_POLY1305_SHA256
- TLS_AES_128_GCM_SHA256
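A client can be restricted to TLS 1.3 and can check which cipher was negotiated. The sketch below uses Python's standard `ssl` module and assumes a Python build linked against an OpenSSL version with TLS 1.3 support. The cipher names are the ones listed above; the helper function names are illustrative and are not part of any StorageGRID tooling.

```python
import ssl

# Cipher suite names as listed in these release notes.
SUPPORTED_TLS13_CIPHERS = {
    "TLS_AES_256_GCM_SHA384",
    "TLS_CHACHA20_POLY1305_SHA256",
    "TLS_AES_128_GCM_SHA256",
}


def make_tls13_only_context():
    """Build a client context that refuses anything older than TLS 1.3."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    return ctx


def negotiated_cipher_is_listed(tls_socket):
    """Check the cipher of an already-connected ssl.SSLSocket.

    SSLSocket.cipher() returns (name, protocol, secret_bits), or None
    when no connection has been established.
    """
    info = tls_socket.cipher()
    return info is not None and info[0] in SUPPORTED_TLS13_CIPHERS
```

To use the context, pass it to `wrap_socket()` with the `server_hostname` of your own endpoint, and then call `negotiated_cipher_is_listed()` on the resulting socket.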
Administering StorageGRID
Implementing S3 client applications
Implementing Swift client applications
Enhancements to the Grid Manager
- The user interface used to configure StorageGRID grid options has been improved. As part of this enhancement, the following changes were made:
  - The Stored Object Compression option was renamed as Compress Stored Objects.
  - The Prevent Client Modify option was renamed as Prevent Client Modification.
  - The internal CA certificate for StorageGRID was moved to the page.
- A new Service Endpoint section on the Create Cloud Storage Pool page improves the usability of this page.
- A new Diagnostics page is available. You can use the page to perform a set of pre-constructed diagnostic checks on the current state of the grid.
- The URLs for the Grafana metrics have changed. After upgrading to StorageGRID 11.4, you must update the URLs if you have bookmarked or embedded any charts.
Administering StorageGRID
Expanding a StorageGRID system
Monitoring and troubleshooting StorageGRID
Enhancements to the Tenant Manager
Users can now delete their S3 buckets from the Tenant Manager. Buckets must be empty before they can be deleted.
Using tenant accounts
Enhancements to S3 REST API support
- The PUT Bucket tagging, GET Bucket tagging, and DELETE Bucket tagging operations are now supported. You can use these operations to add, retrieve, and delete a set of tags for a bucket. Cost allocation tags are not supported.
- For buckets created in StorageGRID 11.4, you no longer need to restrict object key names to meet performance best practices. For example, you can now use random values for the first four characters of object key names.
- Configuring bucket notifications for the s3:ObjectRestore:Post event type is now supported.
- AWS size limits for multipart parts are now enforced. Each part in a multipart upload must be between 5 MiB and 5 GiB. Only the last part can be smaller than 5 MiB (5,242,880 bytes).
The new S3 multipart part too small alert is triggered if an S3 client attempts to complete a multipart upload with parts that do not meet the Amazon S3 size limits.
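A client can check its planned part sizes against these limits before completing an upload. The following is a minimal sketch of the rule as stated above, not StorageGRID's own enforcement code. It assumes the parts are supplied as an ordered list of sizes in bytes, and the function name is illustrative.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

MIN_PART_SIZE = 5 * MIB  # 5,242,880 bytes
MAX_PART_SIZE = 5 * GIB


def invalid_parts(part_sizes):
    """Return the 1-based positions of parts that violate the size limits.

    Every part must be at most 5 GiB, and every part except the last
    must be at least 5 MiB.
    """
    bad = []
    last = len(part_sizes)
    for number, size in enumerate(part_sizes, start=1):
        too_small = size < MIN_PART_SIZE and number != last
        too_large = size > MAX_PART_SIZE
        if too_small or too_large:
            bad.append(number)
    return bad


if __name__ == "__main__":
    # The first part is only 1 MiB and is not the last part, so it is flagged.
    print(invalid_parts([1 * MIB, 5 * MIB, 1024]))
```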
Attention: To give clients time to adjust their multipart upload settings, the troubleshooting steps for the alert describe how to run a script to temporarily disable the enforcement of minimum part size. This script will be removed in StorageGRID 11.5. After you upgrade to StorageGRID 11.5, all S3 clients must use part sizes between 5 MiB and 5 GiB.
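For the bucket tagging operations listed above, the request body of PUT Bucket tagging is an XML document that holds a set of key-value tags. The sketch below builds such a body with Python's standard library. The element layout follows the Amazon S3 `Tagging` structure, with the XML namespace attribute omitted for brevity. The function name and the sample tags are illustrative, and sending the request (signing, endpoint, bucket name) is left to your S3 client.

```python
import xml.etree.ElementTree as ET


def build_tagging_body(tags):
    """Build an S3-style Tagging XML body from a dict of tag key -> value."""
    root = ET.Element("Tagging")
    tag_set = ET.SubElement(root, "TagSet")
    for key, value in tags.items():
        tag = ET.SubElement(tag_set, "Tag")
        ET.SubElement(tag, "Key").text = key
        ET.SubElement(tag, "Value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample tags are placeholders, not values required by StorageGRID.
    print(build_tagging_body({"project": "alpha", "owner": "storage-team"}))
```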
Implementing S3 client applications
Monitoring and troubleshooting StorageGRID
Support for SG6060 field expansions
Previously, expansion shelves had to be installed during the initial installation of the SG6060 storage appliance. Starting with StorageGRID 11.4, you can install expansion shelves after the SG6060 is deployed and operating in a grid. You can install one or two expansion shelves to double or triple the capacity of an existing SG6060.
The sizes of the disks in the expansion shelves do not have to be the same as the sizes of the disks in the original shelf or shelves.
SG6000 appliance installation and maintenance
New MTU setting and alert for network communications
You can now set the maximum transmission unit (MTU) for the Grid Network, Admin Network, and Client Network.
If you are using DHCP addressing, you can configure the DHCP server to set the MTU.
If you want to use jumbo frames, change the MTU setting to a value suitable for jumbo frames, such as 9000. Otherwise, keep the default value of 1500 (or 1400 for VMware).
Attention: For the best network performance, all nodes should be configured with similar MTU values on their Grid Network interfaces. The Grid Network MTU mismatch alert is triggered if there is a significant difference in MTU settings for the Grid Network on individual nodes.
- During initial deployment, use the following methods to set the MTU:
  - For appliance nodes, use the StorageGRID Appliance Installer.
  - For Linux-based nodes, set the MTU in the node configuration file.
  - For VMware-based nodes, set the MTU option on the vSphere VM Properties page.
- After deployment, use the following methods to change the MTU:
  - For appliance nodes, place the appliance into maintenance mode and use the StorageGRID Appliance Installer.
  - For Linux- or VMware-based nodes, use the change-mtu.py script as described in the monitoring and troubleshooting instructions.
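Before changing MTU values, it can help to compare the Grid Network MTU you have recorded for each node. The following sketch assumes you have collected those values yourself into a dictionary; the node names are made up. These release notes do not say how large a "significant difference" must be to trigger the Grid Network MTU mismatch alert, so the threshold below is a hypothetical placeholder.

```python
from collections import Counter

# Hypothetical threshold; the actual alert condition is not given here.
ASSUMED_MAX_SPREAD = 100


def grid_mtu_spread(node_mtus):
    """Return the difference between the largest and smallest MTU."""
    values = list(node_mtus.values())
    return max(values) - min(values)


def mtu_outliers(node_mtus):
    """Return the nodes whose MTU differs from the most common value."""
    most_common_mtu, _count = Counter(node_mtus.values()).most_common(1)[0]
    return sorted(n for n, mtu in node_mtus.items() if mtu != most_common_mtu)


def looks_mismatched(node_mtus, max_spread=ASSUMED_MAX_SPREAD):
    """Apply the assumed threshold to the observed MTU spread."""
    return grid_mtu_spread(node_mtus) > max_spread


if __name__ == "__main__":
    # Illustrative inventory: two nodes use jumbo frames, one uses the default.
    node_mtus = {"dc1-s1": 9000, "dc1-s2": 9000, "dc1-g1": 1500}
    print(mtu_outliers(node_mtus), looks_mismatched(node_mtus))
```

The functions expect at least one node in the dictionary.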
SG100 and SG1000 appliance installation and maintenance
SG6000 appliance installation and maintenance
SG5700 appliance installation and maintenance
SG5600 appliance installation and maintenance
Red Hat Enterprise Linux or CentOS installation
Monitoring and troubleshooting StorageGRID
Updates to lost object procedures
The procedures that describe how to troubleshoot potentially lost objects have been updated. Some commands have been updated to identify objects by their UUID rather than by their CBID. The procedure for objects stored on Archive Nodes has been updated to clarify that its commands apply only to objects stored on tape.
Monitoring and troubleshooting StorageGRID
Recovery and maintenance
Audit message changes
- For S3 or Swift requests that are routed by a trusted Layer 7 load balancer, the audit messages include a new TLIP (Trusted Load Balancer IP Address) field, which indicates the IP address of the load balancer.
- The HTRH (HTTP Request Header) field in S3 and Swift audit messages now automatically includes the X-Forwarded-For header, if this header was present in the request and if the X-Forwarded-For value is different from the request sender IP address (SAIP audit field).
- The LLST (Location Lost) audit message has two new fields: UUID (Universally Unique Identifier) and PCLD (path to the disk location of the affected replicated object).
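The condition for including X-Forwarded-For in the HTRH field can be expressed as a small predicate, which may be useful when predicting what your audit logs will contain. This is a sketch of the rule as described above, not StorageGRID's implementation. The function name is illustrative, and it assumes a plain string comparison between the header value and the sender IP address.

```python
def should_log_x_forwarded_for(request_headers, sender_ip):
    """Return True when X-Forwarded-For would be added to HTRH.

    The header must be present in the request, and its value must
    differ from the request sender IP address (SAIP audit field).
    """
    for name, value in request_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-forwarded-for":
            return value.strip() != sender_ip
    return False


if __name__ == "__main__":
    # Addresses are examples only.
    headers = {"Host": "s3.example.com", "X-Forwarded-For": "10.0.0.5"}
    print(should_log_x_forwarded_for(headers, "192.168.1.10"))
```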
Understanding audit messages
Site decommission procedure
A new site decommission procedure is available to permanently remove a non-functional data center site from the StorageGRID system. The procedure is intended to be used only to clean up after a disaster, for example, to remove a site that was destroyed by a fire or flood. You must consider all nodes at the site to be unrecoverable; any data remaining at the site will become inaccessible.
Attention: The Decommission Site page is disabled by default in StorageGRID 11.4. Before performing this procedure, you must contact your NetApp account representative. NetApp will review your requirements before enabling the page.
Recovery and maintenance
StorageGRID SG100 services appliance
The new StorageGRID SG100 services appliance is a one rack-unit (1U) enclosure with four 10/25-GbE ports. The SG100 can serve as a Gateway Node or Admin Node to provide high availability load balancing services in a StorageGRID system. The SG100 is designed to be deployed with eight or fewer storage nodes and with limited use of traffic classifiers.
SG100 and SG1000 appliance installation and maintenance