Create custom alert rules
You can create custom alert rules to define your own conditions for triggering alerts.
You are signed in to the Grid Manager using a supported web browser
You have the Manage Alerts or Root Access permission
You are familiar with the commonly used Prometheus metrics
You understand the syntax of Prometheus queries
Optionally, you have watched the video “Using Metrics to Create Custom Alerts”
StorageGRID does not validate custom alerts. If you decide to create custom alert rules, follow these general guidelines:
Look at the conditions for the default alert rules, and use them as examples for your custom alert rules.
If you define more than one condition for an alert rule, use the same expression for all conditions. Then, change the threshold value for each condition.
Carefully check each condition for typos and logic errors.
Use only the metrics listed in the Grid Management API.
When testing an expression using the Grid Management API, be aware that a “successful” response might simply be an empty response body (no alert triggered). To check whether the alert is actually triggered, you can temporarily set the threshold to a value that you expect to be true now.
For example, to test the expression node_memory_MemTotal_bytes < 24000000000, first execute node_memory_MemTotal_bytes >= 0 and ensure you get the expected results (all nodes return a value). Then, change the operator and the threshold back to the intended values and execute again. No results indicate that there are no current alerts for this expression.
Do not assume a custom alert is working unless you have validated that the alert is triggered when expected.
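The testing approach in the guidelines above can be sketched in Python. This is a minimal illustration, not part of StorageGRID: sanity_check_query and firing_instances are hypothetical helpers, the parser accepts only the basic [metric] [operator] [value] form, and the response dictionary is assumed to follow the standard Prometheus HTTP API instant-query format with an instance label per node. The response returned by the Grid Management API might be shaped differently, and no API call is made here.

```python
import re

# Hypothetical helper: matches only the basic "[metric] [operator] [value]"
# form, for example "node_memory_MemTotal_bytes < 24000000000".
# Label selectors, functions, and arithmetic are not supported.
BASIC_EXPR = re.compile(
    r"^\s*([A-Za-z_:][A-Za-z0-9_:]*)\s*(==|!=|>=|<=|>|<)\s*(-?\d+(?:\.\d+)?)\s*$"
)


def sanity_check_query(expression: str) -> str:
    """Rewrite a basic expression so that it should match every node.

    Running "<metric> >= 0" first confirms that the metric name is correct
    and that every node reports a value.
    """
    match = BASIC_EXPR.match(expression)
    if match is None:
        raise ValueError(f"not a basic [metric] [operator] [value] expression: {expression!r}")
    return f"{match.group(1)} >= 0"


def firing_instances(response: dict) -> list[str]:
    """Return the instance label of every series in a query response.

    Assumes the standard Prometheus instant-query shape:
    {"status": "success", "data": {"resultType": "vector", "result": [...]}}.
    An empty result list means nothing matched, so no alert would be
    triggered right now; it does not prove the expression is correct.
    """
    results = response.get("data", {}).get("result", [])
    return [series.get("metric", {}).get("instance", "<unknown>") for series in results]


# Usage with a hypothetical empty response.
print(sanity_check_query("node_memory_MemTotal_bytes < 24000000000"))  # node_memory_MemTotal_bytes >= 0
empty = {"status": "success", "data": {"resultType": "vector", "result": []}}
print(firing_instances(empty))  # [] means no alert right now
```

Run the sanity-check query first; only when it returns every node is an empty result for the real expression meaningful.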
Select ALERTS > Rules.
The Alert Rules page appears.
Select Create custom rule.
The Create Custom Rule dialog box appears.
Select or clear the Enabled check box to determine whether this alert rule is currently enabled.
If an alert rule is disabled, its expressions are not evaluated and no alerts are triggered.
Enter the following information:
Unique name: A unique name for this rule. The alert rule name is shown on the Alerts page and is also the subject for email notifications. Names for alert rules can be between 1 and 64 characters.
Description: A description of the problem that is occurring. The description is the alert message shown on the Alerts page and in email notifications. Descriptions for alert rules can be between 1 and 128 characters.
Recommended actions (optional): The recommended actions to take when this alert is triggered. Enter recommended actions as plain text (no formatting codes). Recommended actions for alert rules can be between 0 and 1,024 characters.
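Because StorageGRID does not validate custom alerts, it can help to check the text fields before entering them. The following Python sketch checks only the length limits listed above plus name uniqueness. The dictionary keys and the validate_rule_fields helper are hypothetical, not Grid Management API field names, and the exact, case-sensitive uniqueness check is an assumption.

```python
# Length limits for the Create Custom Rule dialog box, as listed above.
# The keys are hypothetical names for this sketch, not API field names.
FIELD_LIMITS = {
    "name": (1, 64),
    "description": (1, 128),
    "recommended_actions": (0, 1024),
}


def validate_rule_fields(fields: dict[str, str], existing_names=()) -> list[str]:
    """Return a list of problems; an empty list means the fields look valid.

    Missing fields are treated as empty strings. Uniqueness is checked with
    an exact, case-sensitive match against existing_names (an assumption).
    """
    problems = []
    for field, (low, high) in FIELD_LIMITS.items():
        length = len(fields.get(field, ""))
        if not low <= length <= high:
            problems.append(f"{field}: {length} characters (allowed {low}-{high})")
    name = fields.get("name", "")
    if name in existing_names:
        problems.append(f"name: {name!r} is already used by another rule")
    return problems


# Usage with a hypothetical rule.
rule = {
    "name": "Low installed RAM",
    "description": "A node has less than 24 GB of installed RAM.",
    "recommended_actions": "",
}
print(validate_rule_fields(rule))  # []
```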
In the Conditions section, enter a Prometheus expression for one or more of the alert severity levels.
A basic expression is usually of the form:
[metric] [operator] [value]
Expressions can be any length, but appear on a single line in the user interface. At least one expression is required.
For example, the following expression causes an alert to be triggered if the amount of installed RAM for a node is less than 24,000,000,000 bytes (24 GB):
node_memory_MemTotal_bytes < 24000000000
To see available metrics and to test Prometheus expressions, select the help icon and follow the link to the Metrics section of the Grid Management API.
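The guideline of using the same expression for every severity level and changing only the threshold can be sketched in Python. The helpers are hypothetical; the lowercase keys stand in for the Minor, Major, and Critical levels in the Conditions section, the RAM thresholds are example values only, and reporting just the most severe matching level is an assumption of this sketch rather than documented StorageGRID behavior.

```python
import operator
from typing import Optional

# Comparison operators accepted by this sketch's basic expressions.
COMPARISONS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}

# Checked from most to least severe.
SEVERITIES = ("critical", "major", "minor")


def build_conditions(metric: str, op: str, thresholds: dict[str, float]) -> dict[str, str]:
    """Return one expression per severity: same metric and operator, different thresholds."""
    if op not in COMPARISONS:
        raise ValueError(f"unsupported operator: {op!r}")
    unknown = set(thresholds) - set(SEVERITIES)
    if unknown:
        raise ValueError(f"unknown severity levels: {sorted(unknown)}")
    return {severity: f"{metric} {op} {value}" for severity, value in thresholds.items()}


def most_severe_match(sample: float, op: str, thresholds: dict[str, float]) -> Optional[str]:
    """Return the most severe level whose condition holds for sample, or None."""
    compare = COMPARISONS[op]
    for severity in SEVERITIES:
        if severity in thresholds and compare(sample, thresholds[severity]):
            return severity
    return None


# Hypothetical thresholds for installed RAM, in bytes.
ram_thresholds = {"minor": 24000000000, "major": 16000000000, "critical": 8000000000}
conditions = build_conditions("node_memory_MemTotal_bytes", "<", ram_thresholds)
print(conditions["major"])  # node_memory_MemTotal_bytes < 16000000000
print(most_severe_match(12000000000, "<", ram_thresholds))  # major
```

Generating every condition from one metric and operator keeps the conditions consistent, so only the thresholds need to be reviewed for typos.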
In the Duration field, enter the amount of time a condition must continuously remain in effect before the alert is triggered, and select a unit of time.
To trigger an alert immediately when a condition becomes true, enter 0. Increase this value to prevent temporary conditions from triggering alerts.
The default is 5 minutes.
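The effect of the Duration field can be modeled with a short Python sketch. This is a simplified model, assuming the condition is evaluated at discrete points in time, similar to the for clause of a Prometheus alerting rule: any evaluation where the condition is false restarts the wait. first_alert_time is a hypothetical helper, and actual trigger times also depend on how often rules are evaluated.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def first_alert_time(
    samples: Iterable[Tuple[datetime, bool]], duration: timedelta
) -> Optional[datetime]:
    """Return the time an alert would trigger, or None if it never does.

    samples: (evaluation time, condition is true) pairs, oldest first.
    The condition must stay true for at least `duration`; a zero duration
    triggers on the first evaluation where the condition is true.
    """
    true_since = None
    for timestamp, condition_true in samples:
        if not condition_true:
            true_since = None  # a false evaluation restarts the wait
            continue
        if true_since is None:
            true_since = timestamp
        if timestamp - true_since >= duration:
            return timestamp
    return None


# Hypothetical one-minute evaluations: the condition is briefly false at minute 3.
start = datetime(2024, 1, 1, 0, 0)
samples = [(start + timedelta(minutes=m), m != 3) for m in range(11)]
print(first_alert_time(samples, timedelta(minutes=5)))  # 2024-01-01 00:09:00
print(first_alert_time(samples, timedelta(0)))          # 2024-01-01 00:00:00
```

With the default of 5 minutes, the brief false reading at minute 3 delays the alert until minute 9; with a duration of 0, the alert triggers at the first true evaluation.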
Select Save.
The dialog box closes, and the new custom alert rule appears in the Alert Rules table.