Responding to QoS policy group performance events

Unified Manager generates QoS policy warning events when workload throughput (IOPS, IOPS/TB, or MBps) has exceeded the defined QoS policy setting and workload latency is becoming affected. These system-defined events provide the opportunity to correct potential performance issues before many workloads are affected by latency.

Before you begin

About this task

Unified Manager generates warning events for QoS policy breaches when workload throughput has exceeded the defined QoS policy setting during each performance collection period for the previous hour. Workload throughput may exceed the QoS threshold for only a short period of time during each collection period, but Unified Manager displays only the "average" throughput during the collection period on the chart. For this reason you may receive QoS events while the throughput for a workload might not have crossed the policy threshold shown in the chart.

You can use System Manager or the ONTAP commands to manage policy groups, including the following tasks:
  • Creating a new policy group for the workload
  • Adding or removing workloads in a policy group
  • Moving a workload between policy groups
  • Changing the throughput limit of a policy group
  • Moving a workload to a different aggregate or node

Steps

  1. Display the System-Defined Threshold Event Details page to view information about the event.
  2. Review the top of the page to view the storage object, for example, LUN or volume, on which the event occurred.
  3. Under the Summary section, read the Description, which describes the threshold breach that caused the event.
    For example, the message "IOPS value of 1,352 IOPS on vol1_NFS1 has triggered a WARNING event to identify potential performance problems for the workload" indicates that a QoS Max IOPS event occurred on volume vol1_NFS1.
  4. Under the System Diagnosis section, review the two charts: one for total average IOPS or MBps (depending on the event), and one for latency. When arranged this way you can see which cluster components are most affecting latency when the workload approached the QoS max limit.
    Note that for adaptive QoS policy events that the IOPS chart shows IOPS values that have been converted from the assigned IOPS/TB threshold policy based on the size of the volume.
  5. Under Suggested Actions section, review the suggestions and determine which actions you should perform to avoid an increase in latency for the workload.
    If required, click the Help button to view more details about the suggested actions you can perform to try and resolve the performance event.