Performance event analysis and notification

Performance events notify you about I/O performance issues on a volume workload caused by contention on a cluster component. Unified Manager analyzes the event to identify all workloads involved, the component in contention, and whether the event is still an issue that you might need to resolve.

Unified Manager monitors the I/O latency (response time) and IOPS (operations per second) for volumes on a cluster. For example, when some workloads overuse a cluster component, that component is in contention and cannot perform at an optimal level to meet workload demands. Other workloads that use the same component might then see their performance degrade and their latencies increase. If a workload's latency crosses the performance threshold, Unified Manager triggers a performance event and sends an email alert to notify you.

Event analysis

Unified Manager analyzes the previous 15 days of performance statistics to identify the victim workloads, the bully workloads, and the cluster component involved in an event.

An event might last only a brief moment and then correct itself once the component it is using is no longer in contention. A continuous event is one that recurs for the same cluster component within a five-minute interval and remains in the New state. For continuous events, Unified Manager triggers an alert after detecting the same event during two consecutive analysis intervals. Unresolved events, which have a state of New, can display different description messages as the workloads involved in the event change.
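The alerting rule for continuous events can be sketched in a few lines. This is a minimal Python illustration, not Unified Manager's implementation: it assumes that each analysis interval yields the set of cluster components found in contention (IDs such as `aggr1` and `node2` are hypothetical), and that an alert fires once, at the point where a component has been detected in two consecutive intervals.

```python
from typing import Iterable

# Per the text: alert after the same event is seen in two consecutive intervals.
CONSECUTIVE_INTERVALS_FOR_ALERT = 2


def alert_intervals(detections: Iterable[set[str]]) -> list[tuple[int, str]]:
    """Return (interval_index, component) pairs at which an alert would fire.

    `detections` holds, for each analysis interval in order, the set of
    cluster components (hypothetical string IDs) found in contention.
    """
    streak: dict[str, int] = {}
    alerts: list[tuple[int, str]] = []
    for i, components in enumerate(detections):
        # Components not detected in this interval lose their streak, so a
        # brief event that corrects itself never reaches the alert count.
        streak = {c: streak.get(c, 0) + 1 for c in components}
        for c in sorted(components):
            # Assumption: alert only once, when the streak first hits the limit.
            if streak[c] == CONSECUTIVE_INTERVALS_FOR_ALERT:
                alerts.append((i, c))
    return alerts
```

For example, `alert_intervals([{"aggr1"}, set(), {"aggr1"}, {"aggr1", "node2"}, {"aggr1", "node2"}])` returns `[(3, "aggr1"), (4, "node2")]`: the first detection of `aggr1` corrects itself and produces no alert, while the later runs of two consecutive detections do.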

When an event is resolved, it remains available in Unified Manager as part of the record of past performance issues for a volume. Each event has a unique ID that identifies the event type and the volumes, cluster, and cluster components involved.
Note: A single volume can be involved in more than one event at the same time.

Event state

Events can be in one of the following states:
New
Indicates that the performance event is currently active. The issue causing the event has neither corrected itself nor been resolved. The performance counter for the storage object remains above the performance threshold.
Obsolete
Indicates that the event is no longer active. The issue causing the event has corrected itself or has been resolved. The performance counter for the storage object is no longer above the performance threshold.
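The relationship between the performance counter and the event state amounts to a single comparison. The sketch below is a hypothetical helper, assuming one latest counter sample and a fixed threshold value; it does not model how Unified Manager actually computes its thresholds or counters.

```python
from enum import Enum


class EventState(Enum):
    NEW = "New"
    OBSOLETE = "Obsolete"


def event_state(counter_value: float, threshold: float) -> EventState:
    """Classify an event from its latest counter sample (hypothetical helper)."""
    # Counter still above the threshold: the issue persists, event stays New.
    if counter_value > threshold:
        return EventState.NEW
    # Counter no longer above the threshold: the issue has cleared.
    return EventState.OBSOLETE
```

For example, with an assumed threshold of 10.0, `event_state(12.5, 10.0)` returns `EventState.NEW`, while `event_state(10.0, 10.0)` returns `EventState.OBSOLETE`, because a counter exactly at the threshold is no longer above it.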

Event notification

Event alerts are displayed on the Dashboards/Overview page, the Dashboards/Performance page, and the Performance/Volume Details page, and are also sent to the specified email addresses. You can view detailed analysis information about an event, and get suggestions for resolving it, on the Dynamic Threshold Event Details page.


Single event on Latency chart in Performance Manager

In this example, an event is indicated by a red dot on the Latency chart on the Performance/Volume Details page. Hovering the mouse pointer over the red dot displays a pop-up with more details about the event and options for analyzing it.

Event interaction

On the Performance/Volume Details page, you can interact with events in the following ways: