Manage alerts: overview
PDF of this doc site
- Get started
Install and maintain appliance hardware
SG100 and SG1000 services appliances
- Prepare for installation (SG100 and SG1000)
SG6000 storage appliances
- Prepare for installation (SG6000)
- Configure hardware (SG6000)
SG5700 storage appliances
- Prepare for installation (SG5700)
- Configure hardware (SG5700)
SG5600 storage appliances
- Prepare for installation (SG5600)
- Configure hardware (SG5600)
- SG100 and SG1000 services appliances
Install and upgrade software
- Upgrade StorageGRID software
- Install Red Hat Enterprise Linux or CentOS
- Install Ubuntu or Debian
Perform system administration
- Manage security settings
- Manage Admin Nodes
- Manage Archive Nodes
Manage objects with ILM
- ILM and object lifecycle
- Create storage grades, storage pools, EC profiles, and regions
- Administer StorageGRID
- Use a tenant account
- S3 REST API supported operations and limitations
Monitor and maintain StorageGRID
Monitor and troubleshoot
- Troubleshoot a StorageGRID system
- Expand your grid
Recover and maintain
Grid node recovery procedures
- Recover from Storage Node failures
- Recover from Admin Node failures
- All grid node types: Replace Linux node
- Grid node decommission
- Network maintenance procedures
- Grid node procedures
- Grid node recovery procedures
Review audit logs
- Audit messages and the object lifecycle
- Monitor and troubleshoot
Alerts allow you to monitor various events and conditions within your StorageGRID system. You can manage alerts by creating custom alerts, editing or disabling the default alerts, setting up email notifications for alerts, and silencing alert notifications.
About StorageGRID alerts
The alert system provides an easy-to-use interface for detecting, evaluating, and resolving the issues that can occur during StorageGRID operation.
The alert system focuses on actionable problems in the system. Alerts are triggered for events that require your immediate attention, not for events that can safely be ignored.
The Current Alerts page provides a user-friendly interface for viewing current problems. You can sort the listing by individual alerts and alert groups. For example, you might want to sort all alerts by node/site to see which alerts are affecting a specific node. Or, you might want to sort the alerts in a group by time triggered to find the most recent instance of a specific alert.
The Resolved Alerts page provides similar information as on the Current Alerts page, but it allows you to search and view a history of the alerts that have been resolved, including when the alert was triggered and when it was resolved.
Multiple alerts of the same type are grouped into one email to reduce the number of notifications. In addition, multiple alerts of the same type are shown as a group on the Alerts page. You can expand and collapse alert groups to show or hide the individual alerts. For example, if several nodes report the Unable to communicate with node alert at about the same time, only one email is sent and the alert is shown as a group on the Alerts page.
Alerts use intuitive names and descriptions to help you quickly understand the problem. Alert notifications include details about the node and site affected, the alert severity, the time when the alert rule was triggered, and the current value of metrics related to the alert.
Alert emails notifications and the alert listings on the Current Alerts and Resolved Alerts pages provide recommended actions for resolving an alert. These recommended actions often include direct links to the StorageGRID documentation center to make it easier to find and access more detailed troubleshooting procedures.
If you need to temporarily suppress the notifications for an alert at one or more severity levels, you can easily silence a specific alert rule for a specified duration and for the entire grid, a single site, or a single node. You can also silence all alert rules, for example, during a planned maintenance procedure such as a software upgrade.
You can edit the default alert rules as required. You can disable an alert rule completely, or change its trigger conditions and duration.
You can create custom alert rules to target the specific conditions that are relevant to your situation and to provide your own recommended actions. To define the conditions for a custom alert, you create expressions using the Prometheus metrics available from the Metrics section of the Grid Management API.