Learn about latency monitoring in Workload Factory for EDA

07/07/2026

Latency monitoring in Workload Factory for EDA helps you find and fix performance slowdowns in your FSx for ONTAP volumes. It tracks read and write latency using CloudWatch metrics and automatically analyzes the data to help identify the cause of performance issues.

How latency monitoring works

Latency monitoring collects CloudWatch metrics for read and write activity on all FSx for ONTAP volumes connected to your AWS account. It continuously checks these metrics against the thresholds you define to detect performance problems early.

If latency rises, Workload Factory automatically reviews ONTAP QoS delay metrics to identify the main cause of the slowdown. For more complex issues involving data or cluster components, you can run an optional AI analysis that provides the likely root cause, identifies affected clients, and suggests steps to resolve the problem.

Alert generation

An alert triggers only when both latency and IOPS stay above their thresholds for the entire selected time range. Requiring both reduces false alarms by ensuring that high latency occurs while the system is handling real workload.

You can configure separate thresholds for:

Read operations
Write operations
Warning severity
Critical severity

All detected events appear in the latency events table. If notifications are set up, you also receive an email or Amazon SNS message with details about the affected volumes. You can choose how often you receive notifications: daily per file system or every 20 minutes.

Understanding alerts

Understanding how alerts are triggered helps you configure appropriate thresholds and interpret the results.

Metrics collected

The system collects the following CloudWatch metrics for each volume and derives latency, in milliseconds, from them:

Read latency: Calculated as 1000 * m2/(m1+0.000001), where m1 = DataReadOperations and m2 = DataReadOperationTime
Write latency: Calculated as 1000 * m2/(m1+0.000001), where m1 = DataWriteOperations and m2 = DataWriteOperationTime

The small constant in the denominator keeps the expression defined when a volume has no operations in the interval.
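
As a rough illustration of the formula above, the following Python sketch computes latency from an operation count and a total operation time, and builds an equivalent CloudWatch metric math query in the shape that the GetMetricData API accepts. It assumes the operation-time metrics are reported in seconds (hence the factor of 1000). The function names, the 60-second period, and the Sum statistic are illustrative assumptions, not Workload Factory's actual implementation.

```python
from typing import Any, Dict, List


def latency_ms(operations: float, operation_time_s: float) -> float:
    """Average latency in ms, following 1000 * m2 / (m1 + 0.000001)."""
    # The small constant keeps the division defined when operations == 0.
    return 1000 * operation_time_s / (operations + 0.000001)


def build_latency_queries(
    file_system_id: str,
    volume_id: str,
    kind: str = "Read",
    period_s: int = 60,  # assumption: the real collection period is not documented here
) -> List[Dict[str, Any]]:
    """Build GetMetricData-style queries for read or write latency (hypothetical helper)."""
    if kind not in ("Read", "Write"):
        raise ValueError("kind must be 'Read' or 'Write'")

    dimensions = [
        {"Name": "FileSystemId", "Value": file_system_id},
        {"Name": "VolumeId", "Value": volume_id},
    ]

    def stat(query_id: str, metric_name: str) -> Dict[str, Any]:
        # Raw metrics feed the expression and are not returned themselves.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/FSx",
                    "MetricName": metric_name,
                    "Dimensions": dimensions,
                },
                "Period": period_s,
                "Stat": "Sum",  # assumption: totals per period
            },
            "ReturnData": False,
        }

    return [
        stat("m1", f"Data{kind}Operations"),
        stat("m2", f"Data{kind}OperationTime"),
        {
            "Id": "latency",
            "Expression": "1000 * m2/(m1+0.000001)",
            "Label": f"{kind} latency (ms)",
            "ReturnData": True,
        },
    ]


# 200 operations taking 1.2 s in total works out to roughly 6 ms per operation.
print(latency_ms(200, 1.2))
# An idle interval yields 0.0 rather than a division error.
print(latency_ms(0, 0))
```

The list returned by build_latency_queries could be passed as the MetricDataQueries argument of a CloudWatch client's get_metric_data call, together with a start and end time.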

Alert trigger conditions

An alert is triggered when all of the following conditions are met:

The latency threshold is exceeded for the operation type (read or write).
The IOPS threshold is exceeded for the operation type.
Both conditions persist for all data points within the configured time range.

For example, with default warning thresholds, a read alert triggers only if read latency exceeds 6 ms AND read IOPS exceeds 100 ops/sec for all data points within a 10-minute period.
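
The trigger conditions above can be sketched in Python as follows. This is a minimal illustration, not Workload Factory's code: the DataPoint type and alert_triggered function are hypothetical names, the defaults mirror the 6 ms and 100 ops/sec warning example, and treating an empty window as "no alert" is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DataPoint:
    """One sample in the evaluation window (hypothetical structure)."""
    latency_ms: float
    iops: float


def alert_triggered(
    points: Sequence[DataPoint],
    latency_threshold_ms: float = 6.0,
    iops_threshold: float = 100.0,
) -> bool:
    """Return True only if every data point exceeds both thresholds."""
    if not points:
        # Assumption: a window with no data never raises an alert.
        return False
    return all(
        p.latency_ms > latency_threshold_ms and p.iops > iops_threshold
        for p in points
    )


# Latency and IOPS stay high for the whole window: the alert fires.
busy = [DataPoint(7.5, 450), DataPoint(8.1, 500), DataPoint(6.4, 380)]

# Latency is high throughout, but IOPS drops to 20 at one point: no alert.
idle_spike = [DataPoint(7.5, 450), DataPoint(9.0, 20), DataPoint(6.4, 380)]

print(alert_triggered(busy))        # True
print(alert_triggered(idle_spike))  # False
```

A single data point that falls to or below either threshold is enough to suppress the alert, which is how the combined condition filters out latency spikes on nearly idle volumes.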

Event severity

Warning events: Indicate elevated latency that might need attention
Critical events: Indicate severe latency that requires immediate investigation
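
One way to picture how the two severities relate is the Python sketch below. It is only an illustration under stated assumptions: the classify function is hypothetical, the warning defaults come from the 6 ms and 100 ops/sec example, the critical values (12 ms and 100 ops/sec) are placeholders rather than documented defaults, and checking critical before warning is an assumed precedence.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]      # (latency_ms, iops)
Threshold = Tuple[float, float]  # (latency_ms, iops)


def exceeds(points: Sequence[Point], latency_ms: float, iops: float) -> bool:
    """True if the window is non-empty and every point exceeds both limits."""
    return bool(points) and all(l > latency_ms and i > iops for l, i in points)


def classify(
    points: Sequence[Point],
    warning: Threshold = (6.0, 100.0),
    critical: Threshold = (12.0, 100.0),  # placeholder, not a documented default
) -> Optional[str]:
    """Return 'critical', 'warning', or None for one evaluation window."""
    # Assumption: the more severe level wins when both sets of limits are exceeded.
    if exceeds(points, *critical):
        return "critical"
    if exceeds(points, *warning):
        return "warning"
    return None


print(classify([(13.0, 200.0), (15.0, 300.0)]))  # critical
print(classify([(7.0, 200.0), (13.0, 300.0)]))   # warning
print(classify([(3.0, 200.0)]))                  # None
```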

Latency analysis

Workload Factory provides two levels of analysis to help you troubleshoot latency issues.

Basic analysis

When a latency event occurs, Workload Factory automatically runs a basic analysis to find the cause. It uses ONTAP QoS delay center metrics to see which component is responsible for the slowdown, such as FlexCache, the capacity pool, QoS limits, disks, data, the cluster, or another subsystem. This quickly identifies the source of the latency without requiring manual investigation.
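
Conceptually, attributing latency to a component amounts to comparing per-component delays and picking the largest contributor. The Python sketch below illustrates that idea only. The dominant_component function, the component keys, and the sample delay values are all made up for illustration and do not reflect the actual ONTAP metric names or Workload Factory's analysis logic.

```python
from typing import Dict, Tuple


def dominant_component(delays_ms: Dict[str, float]) -> Tuple[str, float]:
    """Return the component with the largest delay and its share of the total."""
    total = sum(delays_ms.values())
    if total <= 0:
        raise ValueError("no delay recorded for any component")
    name = max(delays_ms, key=delays_ms.get)
    return name, delays_ms[name] / total


# Illustrative per-component delays in milliseconds (invented sample values).
sample = {
    "flexcache": 0.1,
    "capacity_pool": 0.3,
    "qos_limit": 0.2,
    "disk": 4.2,
    "data": 0.9,
    "cluster": 0.3,
}

name, share = dominant_component(sample)
print(f"{name} accounts for about {share:.0%} of the delay")
```

In this sample the disk component dominates, so it would be the first place to investigate.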

You can see a component breakdown only when a link is associated with the FSx for ONTAP file system. If there is no link, you can still view graphs for latency, IOPS, and throughput.

Latency values from ONTAP QoS analysis and CloudWatch might differ slightly because they collect data in different ways. The basic analysis uses ONTAP data to identify the root cause.

AI analysis

Basic analysis can identify the source of latency, but more complex situations involving data or cluster components often need deeper investigation. AI analysis provides this by finding problems that basic analysis may miss, such as overloaded volumes, poor configuration, or the need to add more capacity.

When you run AI analysis, the system provides:

Potential root cause: Detailed explanation of what's causing the latency issue
Affected clients: List of EC2 instance names impacted by the latency
Potential remediation steps: Two or more specific actions to resolve the issue

AI analysis requires an Amazon Bedrock model ARN in your Workload Factory settings. If Bedrock isn't set up, you can still use latency monitoring and basic automated analysis.