Responding to node resources overutilized performance events

Unified Manager generates node resources overutilized warning events when a single node is operating above the bounds of its operational efficiency, and therefore potentially affecting workload latencies. These system-defined events provide the opportunity to correct potential performance issues before many workloads are affected by latency.

Before you begin

About this task

Unified Manager generates warning events for node resources overutilized policy breaches by looking for nodes that are using more than 100% of their performance capacity for more than 30 minutes.

You can use System Manager or the ONTAP commands to correct this type of performance issue, including the following tasks:
  • Creating and applying a QoS policy to any volumes or LUNs that are overusing system resources
  • Reducing the QoS maximum throughput limit of a policy group to which workloads have been applied
  • Moving a workload to a different aggregate or node
  • Increasing capacity by adding disks to the node, or by upgrading to a node with a faster CPU and more RAM

Steps

  1. Display the System-Defined Threshold Event Details page to view information about the event.
  2. Under the Summary section, read the Description, which describes the threshold breach that caused the event.
    For example, the message "Perf. Capacity Used value of 139% on simplicity-02 has triggered a WARNING event to identify potential performance problems in the data processing unit." indicates that performance capacity on node simplicity-02 is overused and affecting node performance.
  3. Under the System Diagnosis section, review the three charts: one for performance capacity used on the node, one for average storage IOPS being used by the top workloads, and one for latency on the top workloads. When arranged in this way you can see which workloads are the cause of the latency on the node.
    You can view which workloads have QoS policies applied, and which do not, by moving your cursor over the IOPS chart.
  4. Under Suggested Actions section, review the suggestions and determine which actions you should perform to avoid an increase in latency for the workload.
    If required, click the Help button to view more details about the suggested actions you can perform to try and resolve the performance event.