Responding to a dynamic performance event caused by HA takeover

11/30/2022 Contributors

You can use Unified Manager to investigate a performance event caused by high data processing on a cluster node that is in a high-availability (HA) pair. You can also use Unified Manager to check the health of the nodes to see whether any recent health events detected on the nodes contributed to the performance event.

Before you begin

You must have the Operator, OnCommand Administrator, or Storage Administrator role.
There must be new, acknowledged, or obsolete performance events.

Steps

Display the Event details page to view information about the event.
Read the Description, which describes the workloads involved in the event and the cluster component in contention.

There is one victim volume whose latency was impacted by the cluster component in contention. The data processing node, which took over all workloads from its partner node, is the cluster component in contention. Under Component in Contention, the Data Processing icon is highlighted red and the name of the node that was handling data processing at the time of the event is displayed in parentheses.
In the Description, click the name of the victim volume.

The Performance/Volume Details page is displayed. At the bottom of the page, in the Events time line, a change event icon () indicates the time that Unified Manager detected the start of the HA takeover.
Point your cursor to the change event icon for the HA takeover.

Details about the HA takeover are displayed in the Events List table. In the Latency chart, an event indicates that the selected volume crossed the performance threshold due to high latency around the same time as the HA takeover.
Select Break down data by.
Under Latency, select Cluster Components.
Click Submit.

The Cluster Components chart is displayed. The chart breaks down the total latency by cluster component.
At the bottom of the page, point your mouse cursor to the change event icon for the start of the HA takeover.
In the Cluster Components chart, compare the latency for data processing to the total latency in the Latency chart.

At the time of the HA takeover, there was a spike in data processing from the increased workload demand on the data processing node. The increased CPU utilization drove up the latency and triggered the event.
After fixing the failed node, use OnCommand System Manager to perform an HA giveback, which moves the workloads from the partner node to the fixed node.
After the HA giveback is complete, in Unified Manager, search for the event ID you recorded in Step 2.

The event triggered by the HA takeover is displayed on the Event details page. The event now has a state of obsolete, which indicates that the event is resolved.
In the Description, click the name of the victim volume.

The Performance/Volume Details page is displayed. At the bottom of the page, in the Events time line, a change event icon indicates the time that Unified Manager detected the completion of the HA giveback.
Select Break down data by.
Under Latency, select Cluster Components.

The Cluster Components chart is displayed.
At the bottom of the page, point your cursor to the change event icon for the HA giveback.

The change event is highlighted in the Events List table and indicates that the HA giveback was completed successfully.
In the Cluster Components chart, compare the latency for data processing to the total latency in the Latency chart.

The latency at the data processing component has decreased, which has decreased the total latency. The node that the selected volume is now using for data processing has resolved the event.

Responding to a dynamic performance event caused by HA takeover

Creating your file...

Before you begin

Steps