
Monitor Trident


Trident provides a set of Prometheus metrics endpoints that you can use to monitor Trident performance.

Overview

The metrics provided by Trident enable you to do the following:

  • Keep tabs on Trident's health and configuration. You can examine whether operations are succeeding and whether Trident can communicate with its backends as expected.

  • Examine backend usage information, such as how many volumes are provisioned on a backend, how much space is consumed, and so on.

  • Maintain a mapping of the number of volumes provisioned on the available backends.

  • Track performance. You can see how long it takes Trident to communicate with backends and perform operations.

Note By default, Trident metrics are exposed on target port 8001 at the /metrics endpoint and are enabled automatically when Trident is installed. You can also configure Trident to serve metrics over HTTPS on port 8444.
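
For a quick check that the endpoint is reachable before you configure Prometheus, you can port-forward the trident-csi service and fetch the metrics directly. This is a minimal sketch; it assumes Trident is installed in the trident namespace and that the service exposes its metrics port under the name metrics, as referenced by the ServiceMonitor examples below:

# Forward the service's metrics port locally, then fetch a few sample metrics.
kubectl port-forward -n trident svc/trident-csi 8001:metrics &
curl -s http://localhost:8001/metrics | head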
What you'll need

Step 1: Define a Prometheus target

You should define a Prometheus target to gather the metrics and obtain information about the backends Trident manages, the volumes it creates, and so on. See the Prometheus Operator documentation.
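
If you do not already have the Prometheus Operator running, one common way to deploy it is the kube-prometheus-stack Helm chart. The following is only a sketch; the release name prom-operator and the monitoring namespace are assumptions chosen to match the labels used in the ServiceMonitor samples below:

# Assumed setup: deploy the Prometheus Operator via the kube-prometheus-stack chart.
# The release name "prom-operator" matches the "release: prom-operator" label
# used by the sample ServiceMonitors in this section.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prom-operator prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace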

Step 2: Create a Prometheus ServiceMonitor

To consume the Trident metrics, you should create a Prometheus ServiceMonitor that watches the trident-csi service and listens on the metrics port. A sample ServiceMonitor looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: trident-sm
  namespace: monitoring
  labels:
    release: prom-operator
spec:
  jobLabel: trident
  selector:
    matchLabels:
      app: controller.csi.trident.netapp.io
  namespaceSelector:
    matchNames:
      - trident
  endpoints:
    - port: metrics
      interval: 15s

This ServiceMonitor definition watches the trident-csi service and scrapes its metrics endpoint. With it in place, Prometheus is configured to gather Trident's metrics.
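
If the target does not show up in Prometheus, a quick way to confirm that the selector and namespace match is to list the service by its label and check that the ServiceMonitor was created (the names below are the ones used in the sample above):

kubectl get svc -n trident -l app=controller.csi.trident.netapp.io
kubectl get servicemonitor trident-sm -n monitoring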

In addition to the metrics available directly from Trident, kubelet exposes many kubelet_volume_* metrics via its own metrics endpoint. Kubelet provides information about attached volumes, pods, and other internal operations it handles. Refer to the Kubernetes documentation for details.

Consume Trident metrics over HTTPS

To consume Trident metrics over HTTPS (port 8444), you must modify the ServiceMonitor definition to include TLS configuration. You also need to copy the trident-csi secret from the trident namespace to the namespace where Prometheus is running. You can do this using the following command:

kubectl get secret trident-csi -n trident -o yaml | sed 's/namespace: trident/namespace: monitoring/' | kubectl apply -f -
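
Before applying the ServiceMonitor, you can confirm that the copied secret exists in the namespace where Prometheus runs (monitoring, as assumed in the command above):

kubectl get secret trident-csi -n monitoring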

A sample ServiceMonitor for HTTPS metrics looks like this:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: trident-sm
  namespace: monitoring
  labels:
    release: prom-operator
spec:
  jobLabel: trident
  selector:
    matchLabels:
      app: controller.csi.trident.netapp.io
  namespaceSelector:
    matchNames:
      - trident
  endpoints:
    - interval: 15s
      path: /metrics
      port: https-metrics
      scheme: https
      tlsConfig:
        ca:
          secret:
            key: caCert
            name: trident-csi
        cert:
          secret:
            key: clientCert
            name: trident-csi
        keySecret:
          key: clientKey
          name: trident-csi
        serverName: trident-csi

Trident supports HTTPS metrics with all installation methods (tridentctl, Helm chart, and operator):

  • If you are using the tridentctl install command, you can pass the --https-metrics flag to enable HTTPS metrics.

  • If you are using the Helm chart, you can set the httpsMetrics parameter to enable HTTPS metrics.

  • If you are using YAML files, you can add the --https_metrics flag to the trident-main container in the trident-deployment.yaml file.
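
As an illustration, an HTTPS-metrics installation with tridentctl might look like the following; the trident namespace is an assumption, and the flag is the one described in the list above:

tridentctl install -n trident --https-metrics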

Step 3: Query Trident metrics with PromQL

PromQL is good for creating expressions that return time-series or tabular data.

Here are some PromQL queries that you can use:

Get Trident health information

  • Percentage of HTTP 2XX responses from Trident

(sum (trident_rest_ops_seconds_total_count{status_code=~"2.."} OR on() vector(0)) / sum (trident_rest_ops_seconds_total_count)) * 100
  • Percentage of REST responses from Trident via status code

(sum (trident_rest_ops_seconds_total_count) by (status_code)  / scalar (sum (trident_rest_ops_seconds_total_count))) * 100
  • Average duration in ms of operations performed by Trident

sum by (operation) (trident_operation_duration_milliseconds_sum{success="true"}) / sum by (operation) (trident_operation_duration_milliseconds_count{success="true"})

Get Trident usage information

  • Average volume size

trident_volume_allocated_bytes/trident_volume_count
  • Total volume space provisioned by each backend

sum (trident_volume_allocated_bytes) by (backend_uuid)

Get individual volume usage

Note This is enabled only if kubelet metrics are also gathered.
  • Percentage of used space for each volume

kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100

Learn about Trident AutoSupport telemetry

By default, Trident sends Prometheus metrics and basic backend information to NetApp on a daily cadence.

  • To stop Trident from sending Prometheus metrics and basic backend information to NetApp, pass the --silence-autosupport flag during Trident installation.

  • Trident can also send container logs to NetApp Support on demand via tridentctl send autosupport. You must trigger Trident to upload its logs. Before you submit logs, you should accept NetApp's privacy policy.

  • Unless specified, Trident fetches the logs from the past 24 hours.

  • You can specify the time frame of logs to collect with the --since flag. For example: tridentctl send autosupport --since=1h. This information is collected and sent via a trident-autosupport container that is installed alongside Trident. You can obtain the container image at Trident AutoSupport.

  • Trident AutoSupport does not gather or transmit Personally Identifiable Information (PII) or Personal Information. It comes with a EULA that is not applicable to the Trident container image itself. You can learn more about NetApp's commitment to data security and trust on the NetApp website.

An example payload sent by Trident looks like this:

---
items:
  - backendUUID: ff3852e1-18a5-4df4-b2d3-f59f829627ed
    protocol: file
    config:
      version: 1
      storageDriverName: ontap-nas
      debug: false
      debugTraceFlags: null
      disableDelete: false
      serialNumbers:
        - nwkvzfanek_SN
      limitVolumeSize: ""
    state: online
    online: true
  • The AutoSupport messages are sent to NetApp's AutoSupport endpoint. If you are using a private registry to store container images, you can use the --image-registry flag.

  • You can also configure proxy URLs by generating the installation YAML files. Use tridentctl install --generate-custom-yaml to create the YAML files, then add the --proxy-url argument to the trident-autosupport container in trident-deployment.yaml, as sketched below.
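
As a rough sketch, the resulting edit in trident-deployment.yaml might look like the excerpt below; the proxy URL is a placeholder, and the surrounding fields are abbreviated:

# Excerpt from a generated trident-deployment.yaml (abbreviated).
# The proxy URL is a placeholder; replace it with your own.
containers:
  # ...other containers unchanged...
  - name: trident-autosupport
    args:
      # ...other arguments unchanged...
      - --proxy-url=http://my-proxy.example.com:3128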

Disable Trident metrics

To disable metrics from being reported, generate custom YAML files (using the --generate-custom-yaml flag) and edit them to remove the --metrics flag from the arguments of the trident-main container.
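
As an illustration, the relevant portion of the generated trident-deployment.yaml looks roughly like the excerpt below (other arguments abbreviated); deleting the --metrics line disables the Prometheus endpoint:

# Excerpt from a generated trident-deployment.yaml (abbreviated).
containers:
  - name: trident-main
    args:
      # ...other arguments unchanged...
      - --metrics   # remove this argument to disable the metrics endpoint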