Monitor Trident
Trident provides a set of Prometheus metrics that you can use to monitor Trident's performance.
Overview
The metrics provided by Trident enable you to do the following:
- Keep tabs on Trident's health and configuration. You can examine how successful operations are and whether Trident can communicate with its backends as expected.
- Examine backend usage information and understand how many volumes are provisioned on a backend, the amount of space consumed, and so on.
- Maintain a mapping of the number of volumes provisioned on available backends.
- Track performance. You can see how long Trident takes to communicate with backends and perform operations.
Trident's metrics are enabled by default when Trident is installed and are exposed on target port 8001 at the /metrics endpoint.
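To confirm that the endpoint is live before wiring up Prometheus, you can port-forward the Trident service and fetch the metrics directly. A minimal sketch, assuming the default trident namespace and the trident-csi service's named metrics port:

# Forward the named metrics port of the trident-csi service to localhost
kubectl port-forward -n trident svc/trident-csi 8001:metrics
# In another shell, fetch the raw Prometheus metrics
curl http://localhost:8001/metrics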
Before you begin
You will need the following:
- A Kubernetes cluster with Trident installed.
- A Prometheus instance. This can be a containerized Prometheus deployment, or you can choose to run Prometheus as a native application.
Step 1: Define a Prometheus target
You should define a Prometheus target to gather the metrics and obtain information about the backends Trident manages, the volumes it creates, and so on. A NetApp blog post explains how you can use Prometheus and Grafana with Trident to retrieve metrics; it covers running Prometheus as an operator in your Kubernetes cluster and creating a ServiceMonitor to obtain Trident metrics.
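If you run Prometheus as a native application rather than via the operator, you can instead add a scrape target directly to prometheus.yml. A minimal sketch; the target address below is a placeholder for wherever the trident-csi service is reachable from Prometheus:

# prometheus.yml excerpt (sketch): scrape Trident's /metrics endpoint.
# Replace the target with the address of the trident-csi service as seen
# from your Prometheus instance.
scrape_configs:
- job_name: trident
  metrics_path: /metrics
  scrape_interval: 15s
  static_configs:
  - targets: ['trident-csi.trident.svc.cluster.local:8001']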
Step 2: Create a Prometheus ServiceMonitor
To consume the Trident metrics, you should create a Prometheus ServiceMonitor that watches the trident-csi service and listens on the metrics port. A sample ServiceMonitor looks like this:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: trident-sm
  namespace: monitoring
  labels:
    release: prom-operator
spec:
  jobLabel: trident
  selector:
    matchLabels:
      app: controller.csi.trident.netapp.io
  namespaceSelector:
    matchNames:
    - trident
  endpoints:
  - port: metrics
    interval: 15s
This ServiceMonitor definition retrieves metrics returned by the trident-csi service and specifically looks for the metrics endpoint of the service. As a result, Prometheus is now configured to understand Trident's metrics.
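A short usage sketch for the definition above; the filename is an assumption:

# Apply the ServiceMonitor and confirm it was created
kubectl apply -f trident-servicemonitor.yaml
kubectl get servicemonitor trident-sm -n monitoring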
In addition to the metrics available directly from Trident, kubelet exposes many kubelet_volume_* metrics via its own metrics endpoint. Kubelet can provide information about the volumes that are attached, the pods it handles, and other internal operations. Refer to the Kubernetes documentation for details.
Step 3: Query Trident metrics with PromQL
PromQL is good for creating expressions that return time-series or tabular data.
Here are some PromQL queries that you can use:
Get Trident health information
- Percentage of HTTP 2XX responses from Trident:
(sum (trident_rest_ops_seconds_total_count{status_code=~"2.."} OR on() vector(0)) / sum (trident_rest_ops_seconds_total_count)) * 100
- Percentage of REST responses from Trident by status code:
(sum (trident_rest_ops_seconds_total_count) by (status_code) / scalar (sum (trident_rest_ops_seconds_total_count))) * 100
- Average duration in ms of operations performed by Trident:
sum by (operation) (trident_operation_duration_milliseconds_sum{success="true"}) / sum by (operation) (trident_operation_duration_milliseconds_count{success="true"})
Get Trident usage information
- Average volume size:
trident_volume_allocated_bytes / trident_volume_count
- Total volume space provisioned by each backend:
sum (trident_volume_allocated_bytes) by (backend_uuid)
Get individual volume usage
This is enabled only if kubelet metrics are also gathered.
- Percentage of used space for each volume:
kubelet_volume_stats_used_bytes / kubelet_volume_stats_capacity_bytes * 100
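You are not limited to the Prometheus UI; any of these queries can also be run programmatically against Prometheus's HTTP API. A minimal sketch, assuming Prometheus is reachable at localhost:9090:

# Query total volume space provisioned by each backend via the query API
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum (trident_volume_allocated_bytes) by (backend_uuid)'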
Learn about Trident AutoSupport telemetry
By default, Trident sends Prometheus metrics and basic backend information to NetApp on a daily cadence.
- To stop Trident from sending Prometheus metrics and basic backend information to NetApp, pass the --silence-autosupport flag during Trident installation (see the command sketch after this list).
- Trident can also send container logs to NetApp Support on demand via tridentctl send autosupport. You will need to trigger Trident to upload its logs. Before you submit logs, you should accept NetApp's privacy policy.
- Unless specified otherwise, Trident fetches the logs from the past 24 hours.
- You can specify the log retention time frame with the --since flag, for example: tridentctl send autosupport --since=1h. This information is collected and sent via a trident-autosupport container that is installed alongside Trident. You can obtain the container image at Trident AutoSupport.
- Trident AutoSupport does not gather or transmit Personally Identifiable Information (PII) or Personal Information. It comes with an EULA that is not applicable to the Trident container image itself. You can learn more about NetApp's commitment to data security and trust here.
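As a sketch of the commands referenced above, assuming tridentctl and the default trident namespace:

# Install Trident without AutoSupport telemetry
tridentctl install -n trident --silence-autosupport
# Upload only the last hour of logs on demand
tridentctl send autosupport --since=1h -n trident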
An example payload sent by Trident looks like this:
---
items:
- backendUUID: ff3852e1-18a5-4df4-b2d3-f59f829627ed
  protocol: file
  config:
    version: 1
    storageDriverName: ontap-nas
    debug: false
    debugTraceFlags:
    disableDelete: false
    serialNumbers:
    - nwkvzfanek_SN
    limitVolumeSize: ''
  state: online
  online: true
- The AutoSupport messages are sent to NetApp's AutoSupport endpoint. If you are using a private registry to store container images, you can use the --image-registry flag.
- You can also configure proxy URLs by generating the installation YAML files. This can be done by using tridentctl install --generate-custom-yaml to create the YAML files and adding the --proxy-url argument for the trident-autosupport container in trident-deployment.yaml, as in the sketch below.
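A hedged sketch of that edit; the image tag and proxy URL shown here are illustrative placeholders, and the surrounding fields are abbreviated:

# Excerpt from the generated trident-deployment.yaml (sketch): add
# --proxy-url to the trident-autosupport container's arguments
- name: trident-autosupport
  image: netapp/trident-autosupport:21.01
  args:
  - --proxy-url=http://my-proxy.example.com:3128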
Disable Trident metrics
To disable metrics from being reported, you should generate custom YAMLs (using the --generate-custom-yaml flag) and edit them to remove the --metrics flag from the trident-main container's arguments.
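A minimal sketch of that workflow, assuming tridentctl's --use-custom-yaml option for installing from the edited files:

# Generate the YAML files so they can be edited before installation
tridentctl install --generate-custom-yaml
# Edit trident-deployment.yaml: delete the --metrics argument from the
# trident-main container's args, then install with the customized files
tridentctl install --use-custom-yaml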