Ways to respond to system health alerts

Contributors

When a system health alert occurs, you can acknowledge it, learn more about it, repair the underlying condition, and prevent it from occurring again.

When a health monitor raises an alert, you can respond in any of the following ways:

  • Get information about the alert, which includes the affected resource, alert severity, probable cause, possible effect, and corrective actions.

  • Get detailed information about the alert, such as the time when the alert was raised and whether anyone else has acknowledged the alert already.

  • Get health-related information about the state of the affected resource or subsystem, such as a specific shelf or disk.

  • Acknowledge the alert to indicate that someone is working on the problem, and identify yourself as the “Acknowledger.”

  • Resolve the problem by taking the corrective actions provided in the alert, such as fixing cabling to resolve a connectivity problem.

  • Delete the alert, if the system did not automatically clear it.

  • Suppress an alert to prevent it from affecting the health status of a subsystem.

    Suppressing is useful when you understand a problem. After you suppress an alert, it can still occur, but the subsystem health displays as “ok-with-suppressed.” when the suppressed alert occurs.