Monitoring alarms

StorageGRID alarms help you evaluate and quickly resolve trouble spots that sometimes occur during normal operation. Alarms can be triggered if attributes meet certain conditions or exceed configured thresholds.

Alarm severity levels

Alarms are organized into a hierarchy of five severity levels, from Normal (no alarm or notification) to Critical. In addition, two service states (Administratively Down and Unknown) indicate when a node becomes disconnected from the grid.

Icon | Node state | Alarm severity | Meaning
Green check mark | Connected | Normal | The node is functioning normally. It is connected to the grid and there are no alarms.
Yellow square | Connected | Notice | The node is connected to the grid, but an unusual condition exists that does not affect normal operations.
Light orange diamond | Connected | Minor | The node is connected to the grid, but an abnormal condition exists that could affect operation in the future. You should investigate to prevent escalation.
Dark orange diamond | Connected | Major | The node is connected to the grid, but an abnormal condition exists that currently affects operation. This requires prompt attention to prevent escalation.
Red X | Connected | Critical | The node is connected to the grid, but an abnormal condition exists that has stopped normal operations. You should address the issue immediately.
Gray question mark | Disconnected | Administratively Down | The node is not connected to the grid for an expected reason. For example, the node, or services on the node, have been gracefully shut down, the node is rebooting, or the software is being upgraded.
Blue question mark | Disconnected | Unknown | The node is not connected to the grid, for example because the network connection between nodes has been lost or the power is down. This is the most severe condition and requires immediate attention.
Note: You might see nodes briefly shown with the blue Unknown icon during managed shutdown operations. You can ignore these transient alarms.
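The table implies a simple rule for the status a node displays: a disconnected node shows one of the two service states, and a connected node shows its most severe active alarm, or Normal if it has none. The Python sketch below models that rule under stated assumptions. `Severity` and `node_status` are hypothetical names, not part of any StorageGRID API, and the `expected_disconnect` flag is an assumed stand-in for however the grid distinguishes an expected shutdown from an unexpected loss of contact.

```python
from enum import IntEnum
from typing import Iterable


class Severity(IntEnum):
    # Alarm severities for a connected node, ordered from least to most severe.
    NORMAL = 0
    NOTICE = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def node_status(connected: bool,
                active_alarms: Iterable[Severity],
                expected_disconnect: bool = False) -> str:
    """Return the status label a node would display (hypothetical helper)."""
    if not connected:
        # A disconnected node reports a service state instead of an alarm severity.
        # expected_disconnect is an assumed flag, not a real StorageGRID attribute.
        return "Administratively Down" if expected_disconnect else "Unknown"
    # A connected node shows its most severe active alarm, or Normal if there are none.
    worst = max(active_alarms, default=Severity.NORMAL)
    return worst.name.capitalize()
```

For example, a connected node with active Minor and Major alarms would be reported as Major, while any node that drops off the grid unexpectedly is reported as Unknown regardless of its earlier alarms.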

Alarm classes

There are three classes of alarms:
  • Default alarms are the alarms provided with each StorageGRID system. Each default alarm tracks the value of a specific attribute. For example, the AMQS (Audit Messages Queued) Default alarm tracks the count of messages in the audit message queue at any given time. This alarm is triggered at different severities when the number of queued messages reaches certain threshold values.

    [Figure: Default Alarms]

    Default alarms cannot be modified. However, you can disable Default alarms or override them by defining Global Custom alarms or Custom alarms.

  • Global Custom alarms monitor the status of all services of a given type in the StorageGRID system. You can create a Global Custom alarm to override a Default alarm system-wide. You can also create a new Global Custom alarm to monitor a condition system-wide, which is useful for tracking conditions specific to your StorageGRID system.
  • Custom alarms monitor the status of a single service or component. You can create a Custom alarm to override a Default alarm or Global Custom alarm at the service or component level. You can also create new Custom alarms based on the service’s unique requirements.
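The relationship between the three classes can be sketched as a lookup that picks the most specific definition for an attribute (Custom for the individual service, then Global Custom for the service type, then Default) and evaluates the attribute value against that definition's thresholds. This is a hypothetical Python model, not StorageGRID's implementation: the class names, the registry keys, and the `>=` threshold comparison are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List, Optional, Tuple


class Severity(IntEnum):
    NORMAL = 0
    NOTICE = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


@dataclass
class AlarmDef:
    # (severity, threshold) pairs; assumed rule: a severity fires once value >= threshold.
    thresholds: List[Tuple[Severity, float]]
    enabled: bool = True

    def evaluate(self, value: float) -> Severity:
        if not self.enabled:
            return Severity.NORMAL  # A disabled alarm never triggers.
        triggered = [sev for sev, limit in self.thresholds if value >= limit]
        return max(triggered, default=Severity.NORMAL)


@dataclass
class AlarmConfig:
    # Hypothetical registries. Default alarms are keyed by attribute code (e.g. "AMQS").
    default: Dict[str, AlarmDef] = field(default_factory=dict)
    # Global Custom alarms are keyed by (service type, attribute code).
    global_custom: Dict[Tuple[str, str], AlarmDef] = field(default_factory=dict)
    # Custom alarms are keyed by (individual service, attribute code).
    custom: Dict[Tuple[str, str], AlarmDef] = field(default_factory=dict)

    def effective(self, service_type: str, service_id: str,
                  attr: str) -> Optional[AlarmDef]:
        # The most specific definition wins: Custom, then Global Custom, then Default.
        if (service_id, attr) in self.custom:
            return self.custom[(service_id, attr)]
        if (service_type, attr) in self.global_custom:
            return self.global_custom[(service_type, attr)]
        return self.default.get(attr)

    def severity(self, service_type: str, service_id: str,
                 attr: str, value: float) -> Severity:
        alarm = self.effective(service_type, service_id, attr)
        return alarm.evaluate(value) if alarm else Severity.NORMAL
```

As an illustration with invented numbers (not the product's defaults), a Default AMQS definition with Notice, Minor, and Major thresholds at 100, 500, and 1000 queued messages would report Minor for 600 messages on every service. Adding a Custom definition for one service changes the result for that service only, and disabling the Default definition leaves services without a Global Custom or Custom override at Normal.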

Alarm notifications

When an alarm is triggered or a service state change occurs, an email notification lets the designated personnel know that the system requires attention. A notification is also sent when the alarm leaves its current severity level, either because it is resolved or because it moves to a different severity level.
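The notification rule amounts to comparing an alarm's previous and current severity: a notification goes out when the alarm is triggered, when it is resolved, or when it moves to another severity level, and nothing is sent while the level stays the same. The Python sketch below covers alarm transitions only (service state changes could follow the same pattern). `notification_for` and the message wording are hypothetical and invented for illustration; they do not reflect how StorageGRID formats its emails.

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    NORMAL = 0
    NOTICE = 1
    MINOR = 2
    MAJOR = 3
    CRITICAL = 4


def notification_for(alarm: str, previous: Severity,
                     current: Severity) -> Optional[str]:
    """Return a notification message when an alarm's level changes, else None."""
    if previous == current:
        return None  # Level unchanged, so no notification is sent.
    if current == Severity.NORMAL:
        # The alarm left its level by being resolved.
        return f"{alarm}: resolved (was {previous.name})"
    if previous == Severity.NORMAL:
        # The alarm was newly triggered.
        return f"{alarm}: triggered at {current.name}"
    # The alarm moved to a different severity level.
    return f"{alarm}: changed from {previous.name} to {current.name}"
```

For example, an alarm escalating from Minor to Major produces one "changed" message, and its later return to Normal produces one "resolved" message.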