How health monitoring works

Contributors

Individual health monitors have a set of policies that trigger alerts when certain conditions occur. Understanding how health monitoring works can help you respond to problems and control future alerts.

Health monitoring consists of the following components:

  • Individual health monitors for specific subsystems, each of which has its own health status

    For example, the Storage subsystem has a node connectivity health monitor.

  • An overall system health monitor that consolidates the health status of the individual health monitors

    A degraded status in any single subsystem results in a degraded status for the entire system. If no subsystems have alerts, the overall system status is OK.

Each health monitor is made up of the following key elements:

  • Alerts that the health monitor can potentially raise

    Each alert has a definition, which includes details such as the severity of the alert and its probable cause.

  • Health policies that identify when each alert is triggered

    Each health policy has a rule expression, which is the exact condition or change that triggers the alert.

A health monitor continuously monitors and validates the resources in its subsystem for condition or state changes. When a condition or state change matches a rule expression in a health policy, the health monitor raises an alert. An alert causes the subsystem’s health status and the overall system health status to become degraded.