Data Infrastructure Insights

Monitor Infrastructure Health

Contributors netapp-alavoie

Data Infrastructure Insights provides comprehensive infrastructure health monitoring that tracks the performance, capacity, configuration, and component status of your storage environment. Health scores are calculated based on monitor alerts across these categories, giving you a unified view of system health and enabling proactive issue resolution.

The Infrastructure Health dashboard

Note: Monitoring Infrastructure Health is a Preview feature and is subject to change.

Navigate to Observability > Analyze and select Infrastructure Health. The dashboard provides an overview of your system health, based on monitor alert categories and scores as explained below. Set filters at the top to narrow down the focus of your investigation.

[Figure: Infrastructure Health overview]

By default, health scores are grouped by data center; you can select the grouping that works best for your session.

Configure Monitors to use for infrastructure health

Health scores are driven by alerts that are configured for inclusion in system health calculations.

When creating a monitor for an infrastructure object, you can choose whether alerts from that monitor are included in the health calculations. At the bottom of the screen, expand Advanced Configuration and select Include in Infrastructure Health Calculation. Then select the category to which the monitor's alerts apply:

  • Component Health - fan failure, service processor offline, etc.

  • Performance Health - high storage node utilization, abnormal spike in node latency, etc.

  • Capacity Health - storage pool capacity approaching full, insufficient space for LUN snapshot, etc.

  • Configuration Health - cloud tier unreachable, SnapMirror relationship out of sync, etc.

[Figure: monitor Advanced Configuration for adding a monitor to health calculations]

Health scores explained

Scores are presented on a scale of 0 to 100, where 100 represents full health. Monitored infrastructure objects that are currently experiencing issues, or experienced them recently, lower this score according to the following category weights:

  • Components, Performance, and Capacity: 30% each

  • Configuration: 10%

Health scores are impacted by alerts generated by the monitors you configured to include in infrastructure health calculations in the following ways:

  • Critical alerts drop the health score by the full category weight.

  • Warning alerts drop the score by half the category weight.

If any categories are not reporting, the weighted average will adjust accordingly.

For example, one critical alert on Components (-30) and one warning alert on Performance (50% of 30 = -15) yield a health score of 55 (100 minus 45).
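The arithmetic above can be sketched in a few lines of Python. This is an illustration only, not the product's implementation: the function name health_score and the dictionary layout are hypothetical. It also makes two assumptions that the text does not state: only the worst active alert in a category counts, and the weights of non-reporting categories are dropped while the remaining weights are rescaled proportionally.

```python
# Category weights (percent of the total score), as listed above.
WEIGHTS = {
    "components": 30,
    "performance": 30,
    "capacity": 30,
    "configuration": 10,
}

# Fraction of the category weight removed per severity, as described above.
SEVERITY_FACTOR = {"critical": 1.0, "warning": 0.5}


def health_score(worst_alert, reporting=None):
    """Return a health score between 0 and 100.

    worst_alert: dict mapping a category name to "critical" or "warning"
        for each category that has an active alert (hypothetical layout;
        assumes only the worst alert per category counts).
    reporting: iterable of categories that are reporting; defaults to all.
        ASSUMPTION: weights of reporting categories are rescaled
        proportionally when some categories are missing.
    """
    categories = set(WEIGHTS) if reporting is None else set(reporting)
    total_weight = sum(WEIGHTS[c] for c in categories)
    if total_weight == 0:
        raise ValueError("at least one category must be reporting")

    # Sum the penalties for categories that are reporting and have alerts.
    penalty = sum(
        WEIGHTS[category] * SEVERITY_FACTOR[severity]
        for category, severity in worst_alert.items()
        if category in categories
    )
    return 100.0 * (total_weight - penalty) / total_weight


# The example from the text: critical on Components, warning on Performance.
print(health_score({"components": "critical", "performance": "warning"}))  # 55.0
```

With every category reporting, the total weight is 100, so the result matches the subtraction shown in the example.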

When alerts are resolved, these health score reductions gradually fade, and the score fully recovers within 2 hours.
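To picture the recovery, the sketch below models the fade as a straight line over the two-hour window. The text only says that reductions fade gradually and are gone within 2 hours, so the linear shape is purely an assumption for illustration, and remaining_penalty is a hypothetical helper name.

```python
# Recovery window from the text above: full recovery within 2 hours.
RECOVERY_WINDOW_MINUTES = 120


def remaining_penalty(initial_penalty, minutes_since_resolved):
    """Return the score reduction still applied after an alert is resolved.

    ASSUMPTION: the reduction fades linearly to zero over the recovery
    window; the actual fade curve is not documented.
    """
    if minutes_since_resolved < 0:
        raise ValueError("minutes_since_resolved must be non-negative")
    fraction_left = max(0.0, 1 - minutes_since_resolved / RECOVERY_WINDOW_MINUTES)
    return initial_penalty * fraction_left


# Under the linear assumption, a resolved critical Components alert (-30)
# would still reduce the score by 15 points one hour after resolution.
print(remaining_penalty(30, 60))  # 15.0
```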