Skip to main content

Monitor Trident

Contributors juliantap netapp-aruldeepa netapp-asubhas

Trident provides a set of Prometheus metrics endpoints that you can use to monitor Trident performance.

Overview

The metrics provided by Trident enable you to do the following:

  • Keep tabs on Trident's health and configuration. You can examine how successful operations are and if it can communicate with the backends as expected.

  • Examine backend usage information and understand how many volumes are provisioned on a backend and the amount of space consumed, and so on.

  • Maintain a mapping of the amount of volumes provisioned on available backends.

  • Track performance. You can take a look at how long it takes for Trident to communicate to backends and perform operations.

Note By default, Trident's metrics are exposed on the target port 8001 at the /metrics endpoint. These metrics are enabled by default when Trident is installed.
What you'll need

Step 1: Define a Prometheus target

You should define a Prometheus target to gather the metrics and obtain information about the backends Trident manages, the volumes it creates, and so on. This blog explains how you can use Prometheus and Grafana with Trident to retrieve metrics. The blog explains how you can run Prometheus as an operator in your Kubernetes cluster and the creation of a ServiceMonitor to obtain Trident metrics.

Step 2: Create a Prometheus ServiceMonitor

To consume the Trident metrics, you should create a Prometheus ServiceMonitor that watches the trident-csi service and listens on the metrics port. A sample ServiceMonitor looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: trident-sm
  namespace: monitoring
  labels:
      release: prom-operator
spec:
  jobLabel: trident
  selector:
    matchLabels:
      app: controller.csi.trident.netapp.io
  namespaceSelector:
    matchNames:
    - trident
  endpoints:
  - port: metrics
    interval: 15s

This ServiceMonitor definition retrieves metrics returned by the trident-csi service and specifically looks for the metrics endpoint of the service. As a result, Prometheus is now configured to understand Trident's
metrics.

In addition to metrics available directly from Trident, kubelet exposes many kubelet_volume_* metrics via it's own metrics endpoint. Kubelet can provide information about the volumes that are attached, and pods and other internal operations it handles. Refer to here.

Step 3: Query Trident metrics with PromQL

PromQL is good for creating expressions that return time-series or tabular data.

Here are some PromQL queries that you can use:

Get Trident health information

  • Percentage of HTTP 2XX responses from Trident

(sum (trident_rest_ops_seconds_total_count{status_code=~"2.."} OR on() vector(0)) / sum (trident_rest_ops_seconds_total_count)) * 100
  • Percentage of REST responses from Trident via status code

(sum (trident_rest_ops_seconds_total_count) by (status_code)  / scalar (sum (trident_rest_ops_seconds_total_count))) * 100
  • Average duration in ms of operations performed by Trident

sum by (operation) (trident_operation_duration_milliseconds_sum{success="true"}) / sum by (operation) (trident_operation_duration_milliseconds_count{success="true"})

Get Trident usage information

  • Average volume size

trident_volume_allocated_bytes/trident_volume_count
  • Total volume space provisioned by each backend

sum (trident_volume_allocated_bytes) by (backend_uuid)

Get individual volume usage

Note This is enabled only if kubelet metrics are also gathered.
  • Percentage of used space for each volume

kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100

Learn about Trident AutoSupport telemetry

By default, Trident sends Prometheus metrics and basic backend information to NetApp on a daily cadence.

  • To stop Trident from sending Prometheus metrics and basic backend information to NetApp, pass the --silence-autosupport flag during Trident installation.

  • Trident can also send container logs to NetApp Support on-demand via tridentctl send autosupport. You will need to trigger Trident to upload it's logs. Before you submit logs, you should accept NetApp's
    privacy policy.

  • Unless specified, Trident fetches the logs from the past 24 hours.

  • You can specify the log retention time frame with the --since flag. For example: tridentctl send autosupport --since=1h. This information is collected and sent via a trident-autosupport container
    that is installed alongside Trident. You can obtain the container image at Trident AutoSupport.

  • Trident AutoSupport does not gather or transmit Personally Identifiable Information (PII) or Personal Information. It comes with a EULA that is not applicable to the Trident container image itself. You can learn more about NetApp's commitment to data security and trust here.

An example payload sent by Trident looks like this:

---
items:
- backendUUID: ff3852e1-18a5-4df4-b2d3-f59f829627ed
  protocol: file
  config:
    version: 1
    storageDriverName: ontap-nas
    debug: false
    debugTraceFlags:
    disableDelete: false
    serialNumbers:
    - nwkvzfanek_SN
    limitVolumeSize: ''
  state: online
  online: true
  • The AutoSupport messages are sent to NetApp's AutoSupport endpoint. If you are using a private registry to store container images, you can use the --image-registry flag.

  • You can also configure proxy URLs by generating the installation YAML files. This can be done by using tridentctl install --generate-custom-yaml to create the YAML files and adding the --proxy-url argument for the trident-autosupport container in trident-deployment.yaml.

Disable Trident metrics

To disable metrics from being reported, you should generate custom YAMLs (using the --generate-custom-yaml flag) and edit them to remove the --metrics flag from being invoked for the trident-main
container.