Monitor Trident protect resources

You can use the kube-state-metrics, Prometheus, and Alertmanager open source tools to monitor the health of the resources protected by Trident protect.

The kube-state-metrics service generates metrics from Kubernetes API communication. Using it with Trident protect exposes useful information about the state of resources in your environment.

Prometheus is a toolkit that can ingest the data generated by kube-state-metrics and present it as easily readable information about these objects. Together, kube-state-metrics and Prometheus provide a way for you to monitor the health and status of the resources you are managing with Trident protect.

Alertmanager is a service that ingests the alerts sent by tools such as Prometheus and routes them to destinations that you configure.

Note

The configurations and guidance included in these steps are only examples; you need to customize them to match your environment. Refer to the official documentation for kube-state-metrics, Prometheus, and Alertmanager for specific instructions and support.

Step 1: Install the monitoring tools

To enable resource monitoring in Trident protect, you need to install and configure kube-state-metrics, Prometheus, and Alertmanager.

Install kube-state-metrics

You can install kube-state-metrics using Helm.

Steps
  1. Add the kube-state-metrics Helm chart. For example:

    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update
    Console
  2. Create a configuration file for the Helm chart (for example, metrics-config.yaml). You can customize the following example configuration to match your environment:

    metrics-config.yaml: kube-state-metrics Helm chart configuration
    ---
    extraArgs:
      # Collect only custom metrics
      - --custom-resource-state-only=true
    
    customResourceState:
      enabled: true
      config:
        kind: CustomResourceStateMetrics
        spec:
          resources:
          - groupVersionKind:
              group: protect.trident.netapp.io
              kind: "Backup"
              version: "v1"
            labelsFromPath:
              backup_uid: [metadata, uid]
              backup_name: [metadata, name]
              creation_time: [metadata, creationTimestamp]
            metrics:
            - name: backup_info
              help: "Exposes details about the Backup state"
              each:
                type: Info
                info:
                  labelsFromPath:
                    appVaultReference: ["spec", "appVaultRef"]
                    appReference: ["spec", "applicationRef"]
    rbac:
      extraRules:
      - apiGroups: ["protect.trident.netapp.io"]
        resources: ["backups"]
        verbs: ["list", "watch"]
    
    # Collect metrics from all namespaces
    namespaces: ""
    
    # Ensure that the metrics are collected by Prometheus
    prometheus:
      monitor:
        enabled: true
    YAML
  3. Install kube-state-metrics by deploying the Helm chart. For example:

    helm install custom-resource -f metrics-config.yaml prometheus-community/kube-state-metrics --version 5.21.0
    Console
  4. Configure kube-state-metrics to generate metrics for the custom resources used by Trident protect by following the instructions in the kube-state-metrics custom resource documentation.
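
If you want to confirm that kube-state-metrics is generating the custom resource metrics, you can query its metrics endpoint directly. The following commands are only an example; the service name custom-resource-kube-state-metrics is derived from the Helm release name used earlier, and you might need to add a namespace flag (-n <namespace>) to match where you installed the chart:

kubectl port-forward svc/custom-resource-kube-state-metrics 8080:8080
# In a separate terminal session
curl -s http://localhost:8080/metrics | grep kube_customresource_backup_info
Console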

Install Prometheus

You can install Prometheus by following the instructions in the Prometheus documentation.
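
For example, one common approach is to install Prometheus from the prometheus-community Helm repository that was added earlier. This is only an illustrative command, and the release and namespace names are assumptions; your installation method might differ:

helm install prometheus prometheus-community/prometheus --namespace astra-connector --create-namespace
Console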

Install Alertmanager

You can install Alertmanager by following the instructions in the Alertmanager documentation.
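
For example, Alertmanager is also available as a chart in the same prometheus-community Helm repository. As with the Prometheus example, this command is only an illustration with assumed release and namespace names:

helm install alertmanager prometheus-community/alertmanager --namespace astra-connector
Console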

Step 2: Configure the monitoring tools to work together

After you install the monitoring tools, you need to configure them to work together.

Steps
  1. Integrate kube-state-metrics with Prometheus. Edit the Prometheus configuration file (prometheus.yml) and add the kube-state-metrics service information. For example:

    prometheus.yml: kube-state-metrics service integration with Prometheus
    ---
    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: prometheus-config
      namespace: astra-connector
    data:
      prometheus.yml: |
        global:
          scrape_interval: 15s
        scrape_configs:
          - job_name: 'kube-state-metrics'
            static_configs:
              - targets: ['kube-state-metrics.astra-connector.svc:8080']
    YAML
  2. Configure Prometheus to route alerts to Alertmanager. Edit the Prometheus configuration file (prometheus.yml) and add the following section:

    prometheus.yml: Send alerts to Alertmanager
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - alertmanager.astra-connector.svc:9093
    YAML
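
After you complete both steps, you can validate the updated configuration and apply it so that Prometheus loads the changes. The following commands are only an example; they assume you saved the ConfigMap shown above as prometheus-config.yaml, keep a local copy of the prometheus.yml contents for validation, and run Prometheus as a deployment named prometheus in the astra-connector namespace. Adjust the names to match your environment:

# Optional: validate a local copy of the prometheus.yml contents (promtool ships with Prometheus)
promtool check config prometheus.yml

# Apply the updated ConfigMap and restart Prometheus so it reloads the configuration
kubectl apply -f prometheus-config.yaml
kubectl -n astra-connector rollout restart deployment prometheus
Console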
Result

Prometheus can now gather metrics from kube-state-metrics, and can send alerts to Alertmanager. You are now ready to configure what conditions trigger an alert and where the alerts should be sent.

Step 3: Configure alerts and alert destinations

After you configure the tools to work together, you need to configure what type of information triggers alerts, and where the alerts should be sent.

Alert example: backup failure

The following example defines a critical alert that is triggered when the status of the Backup custom resource is set to Error for 5 seconds or longer. You can customize this example to match your environment, and include this YAML snippet as a rules.yml entry in the ConfigMap that holds your Prometheus configuration:

rules.yml: Define a Prometheus alert for failed backups
rules.yml: |
  groups:
    - name: fail-backup
      rules:
        - alert: BackupFailed
          expr: kube_customresource_backup_info{status="Error"}
          for: 5s
          labels:
            severity: critical
          annotations:
            summary: "Backup failed"
            description: "A backup has failed."
YAML
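
For Prometheus to evaluate these rules, the prometheus.yml configuration must also reference the rules file. The path in the following example is an assumption based on the rules.yml key being mounted at /etc/prometheus in the Prometheus container; use the path that matches your deployment:

prometheus.yml: Reference the alert rules file
rule_files:
  - /etc/prometheus/rules.yml
YAML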

Configure Alertmanager to send alerts to other channels

You can configure Alertmanager to send notifications to other channels, such as e-mail, PagerDuty, Microsoft Teams, or other notification services by specifying the respective configuration in the alertmanager.yml file.

The following example configures Alertmanager to send notifications to a Slack channel. To customize this example to your environment, replace the value of the api_url key with the Slack webhook URL used in your environment:

alertmanager.yml: Send alerts to a Slack channel
data:
  alertmanager.yml: |
    global:
      resolve_timeout: 5m
    route:
      receiver: 'slack-notifications'
    receivers:
      - name: 'slack-notifications'
        slack_configs:
          - api_url: '<your-slack-webhook-url>'
            channel: '#failed-backups-channel'
            send_resolved: false
YAML
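
Before you apply the updated configuration, you can optionally validate it with amtool, which is distributed with Alertmanager. The file name in this example is an assumption; point the command at a local copy of your alertmanager.yml contents:

amtool check-config alertmanager.yml
Console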