Skip to main content
Data Infrastructure Insights
本繁體中文版使用機器翻譯,譯文僅供參考,若與英文版本牴觸,應以英文版本為準。

Kubernetes 監控操作員組態選項

貢獻者

"Kubernetes 監控營運者"您可以自訂組態。

下表列出 AgentConfiguration 檔案的可能選項:

元件 選項 說明

代理程式

操作員可以安裝的所有元件通用的組態選項。這些選項可視為「整體」選項。

dockerRepo

相較於 Data Infrastructure Insights 泊塢視窗 repo 、 dockerRepo 會置換以從「客戶」的「私有」泊塢視窗中提取影像。預設為 Data Infrastructure Insights 泊塢視窗 repo

dockerImagePullSecret

選用:客戶的秘密私人回購

叢集名稱

可唯一識別所有客戶叢集的任意文字欄位。這在 Data Infrastructure Insights 租戶中應該是獨一無二的。預設是客戶在 UI 中輸入的「叢集名稱」欄位

Proxy 格式: Proxy :伺服器:連接埠:使用者名稱:密碼: NoProxy :啟用 ISTelegrafProxy: 啟用 isAuProxy: 啟用 isFluentbitProxy: 啟用 isCollectorProxy: 啟用 isCollectorProxy:

可選擇設定 Proxy 。這通常是客戶的公司代理。

Telegraf

可自訂電信業者安裝的組態選項

CollectionInterval

度量收集時間間隔(以秒為單位)(最大 = 60 秒)

dsCpuLimit

Telegraf DS 的 CPU 限制

dsMemLimit

Telegraf DS 的記憶體限制

dsCpuRequest

對 Telegraf DS 的 CPU 要求

dsMemRequest

對 Telegraf DS 的記憶體要求

rsCpuLimit

Telegraf RS 的 CPU 限制

rsMemLimit

Telegraf RS 的記憶體限制

rsCpuRequest

適用於 Telegraf RS 的 CPU 要求

rsMemRequest

對 Telegraf RS 的記憶體要求

RunPrivileged

在特權模式下運行 Telegraf demonSet 的 _telegraf-mountstats-poller-Container 。如果在 Kubernetes 節點上啟用 SELinux 、請將此設定為 True 。

RunDsPrivileged

將 RunDsPrivileged 設為 true 、以特權模式執行 Telgraf demonSet 的 Telgraf 容器。

批次大小

請參閱 "Telegraf 組態文件"

bufferLimit

請參閱 "Telegraf 組態文件"

圓週期間隔

請參閱 "Telegraf 組態文件"

CollectionJitter

請參閱 "Telegraf 組態文件"

精度

請參閱 "Telegraf 組態文件"

FlushInterval

請參閱 "Telegraf 組態文件"

FlushJitter

請參閱 "Telegraf 組態文件"

輸出逾時

請參閱 "Telegraf 組態文件"

dsTolerations

Telegraf-DS 額外的容忍度。

RsTolerations

Telegraf-RS 額外容忍度。

skipProcessorsAfterAggreaters

請參閱 "Telegraf 組態文件"

未受保護

請參閱本"已知的 Telegraf 問題"。設定 _ 無保護 _ 會指示 Kubernetes 監控營運者以旗標執行 Telegraf 。 --unprotected

Kube-state 指標

可自訂操作員的 kbe 狀態度量安裝的組態選項

cpuLimit

kube 狀態度量部署的 CPU 限制

MemLimit

kube 狀態度量部署的記憶體限制

cpuRequest

CPU 要求進行 kube 狀態指標部署

MemRequest

kube 狀態指標部署的記憶體要求

資源

要擷取的資源清單以逗號分隔。例如: cronjobs , daemonsets ,部署,擷取,工作,命名空間,節點,持續磁碟區,持續磁碟區,群組,複本集,資源等量,服務,完整狀態集

公差

Kube-state - 衡量其他容忍度。

標籤

以逗號分隔的資源清單應擷取 + 範例: cronjobs=[],daemonsets=[],plasing=[],inges=[],jobs=[],vedemas=[],nodes=[],sterentvolumes=[,equpos=[],equas*[*

記錄

可自訂記錄收集和安裝操作員的組態選項

readFromHead

是非題、應能流暢地從標頭讀取記錄

逾時

逾時、以秒為單位

dnsMode

TCP/UDP 、 DNS 模式

流暢的位元容忍度

Fluent-bit-DS 額外公差。

事件導出者容忍度

事件導出者額外容忍度。

事件導出者 -maxEventAgeSeconds

事件導出者最大事件發生時間。請參閱 https://github.com/jkroepke/resmoio-kubernetes-event-exporter

工作負載對應

可自訂工作負載對應集合及安裝 Operator 的組態選項。

cpuLimit

Net 觀察者 DS 的 CPU 上限

MemLimit

net 觀察者 DS 的記憶體限制

cpuRequest

CPU 要求取得 Net 觀察者 DS

MemRequest

net 觀察者 DS 的記憶體要求

MetricAggergationInterval.

度量集合時間間隔(以秒為單位)

bpfPollInterval.

BPF 輪詢時間間隔(秒)

enabledDNSookup

是非題、啟用 DNS 查詢

L4-公差

net-觀察者 -L4-DS 額外容忍度。

RunPrivileged

是非題:如果在 Kubernetes 節點上啟用 SELinux 、請將 RunPrivileged 設為 true 。

變更管理

Kubernetes 變更管理與分析的組態選項

cpuLimit

change-觀察者 water-RS 的 CPU 上限

MemLimit

change-觀察者 water-RS 的記憶體限制

cpuRequest

CPU 要求變更觀察者手錶 -RS

MemRequest

mem 要求 change-觀察者 water-RS

Failure宣言 IntermalMins

未成功部署工作負載的時間間隔(以分鐘為單位)將標示為失敗

deployAggrIntervalSeconds

工作負載部署進行中事件的傳送頻率

NonWorkloadAggrIntervalSeconds

非工作負載部署的組合與傳送頻率

termsToRedact

env 名稱和資料對應中使用的一組規則運算式,其值將會被編修範例詞彙: "pwd" , "password" , "toke" , "apikey" , "api-key" , "JWt"

其他 KindsToWatch

以逗號分隔的其他種類清單、可從收集器所監控的預設種類集觀看

KindsToIgnoreFromWatch

從收集器所監控的預設種類集中、忽略的種類清單、以逗號分隔

LogRecordAggrIntervalSeconds

從收集器傳送記錄至 CI 的頻率

監看容忍度

change-觀察者 water-DS 額外容忍度。僅限精簡單行格式。範例: ' { key : tint1 、 operator : Exists 、 effect : NoSchedule } 、 { key : tint2 、 operator : Exists 、 effect : NoExecute } '

AgentConfiguration 檔案範例

以下是 AgentConfiguration 檔案範例。

apiVersion: monitoring.netapp.com/v1alpha1
kind: AgentConfiguration
metadata:
  name: netapp-ci-monitoring-configuration
  namespace: "netapp-monitoring"
  labels:
    installed-by: nkmo-netapp-monitoring

spec:
  # # You can modify the following fields to configure the operator.
  # # Optional settings are commented out and include default values for reference
  # #   To update them, uncomment the line, change the value, and apply the updated AgentConfiguration.
  agent:
    # # [Required Field] A uniquely identifiable user-friendly clustername.
    # # clusterName must be unique across all clusters in your Data Infrastructure Insights environment.
    clusterName: "my_cluster"

    # # Proxy settings. The proxy that the operator should use to send metrics to Data Infrastructure Insights.
    # # Please see documentation here: https://docs.netapp.com/us-en/cloudinsights/task_config_telegraf_agent_k8s.html#configuring-proxy-support
    # proxy:
    #   server:
    #   port:
    #   noproxy:
    #   username:
    #   password:
    #   isTelegrafProxyEnabled:
    #   isFluentbitProxyEnabled:
    #   isCollectorsProxyEnabled:

    # # [Required Field] By default, the operator uses the CI repository.
    # # To use a private repository, change this field to your repository name.
    # # Please see documentation here: https://docs.netapp.com/us-en/cloudinsights/task_config_telegraf_agent_k8s.html#using-a-custom-or-private-docker-repository
    dockerRepo: 'docker.c01.cloudinsights.netapp.com'
    # # [Required Field] The name of the imagePullSecret for dockerRepo.
    # # If you are using a private repository, change this field from 'netapp-ci-docker' to the name of your secret.
    dockerImagePullSecret: 'netapp-ci-docker'

    # # Allow the operator to automatically rotate its ApiKey before expiration.
    # tokenRotationEnabled: 'true'
    # # Number of days before expiration that the ApiKey should be rotated. This must be less than the total ApiKey duration.
    # tokenRotationThresholdDays: '30'

  telegraf:
    # # Settings to fine-tune metrics data collection. Telegraf config names are included in parenthesis.
    # # See https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#agent

    # # The default time telegraf will wait between inputs for all plugins (interval). Max=60
    # collectionInterval: '60s'
    # # Maximum number of records per output that telegraf will write in one batch (metric_batch_size).
    # batchSize: '10000'
    # # Maximum number of records per output that telegraf will cache pending a successful write (metric_buffer_limit).
    # bufferLimit: '150000'
    # # Collect metrics on multiples of interval (round_interval).
    # roundInterval: 'true'
    # # Each plugin waits a random amount of time between the scheduled collection time and that time + collection_jitter before collecting inputs (collection_jitter).
    # collectionJitter: '0s'
    # # Collected metrics are rounded to the precision specified. When set to "0s" precision will be set by the units specified by interval (precision).
    # precision: '0s'
    # # Time telegraf will wait between writing outputs (flush_interval). Max=collectionInterval
    # flushInterval: '60s'
    # # Each output waits a random amount of time between the scheduled write time and that time + flush_jitter before writing outputs (flush_jitter).
    # flushJitter: '0s'
    # # Timeout for writing to outputs (timeout).
    # outputTimeout: '5s'

    # # telegraf-ds CPU/Mem limits and requests.
    # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    # dsCpuLimit: '750m'
    # dsMemLimit: '800Mi'
    # dsCpuRequest: '100m'
    # dsMemRequest: '500Mi'

    # # telegraf-rs CPU/Mem limits and requests.
    # rsCpuLimit: '3'
    # rsMemLimit: '4Gi'
    # rsCpuRequest: '100m'
    # rsMemRequest: '500Mi'

    # # Skip second run of processors after aggregators
    # skipProcessorsAfterAggregators: 'true'

    # # telegraf additional tolerations. Use the following abbreviated single line format only.
    # # Inspect telegraf-rs/-ds to view tolerations which are always present.
    # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
    # dsTolerations: ''
    # rsTolerations: ''


    # If telegraf warns of insufficient lockable memory, try increasing the limit of lockable memory for Telegraf in the underlying operating system/node.  If increasing the limit is not an option, set this to true to instruct Telegraf to not attempt to reserve locked memory pages.  While this might pose a security risk as decrypted secrets might be swapped out to disk, it allows for execution in environments where reserving locked memory is not possible.
    # unprotected: 'false'

    # # Run the telegraf DaemonSet's telegraf-mountstats-poller container in privileged mode.  Set runPrivileged to true if SELinux is enabled on your Kubernetes nodes.
    # runPrivileged: '{{ .Values.telegraf_installer.kubernetes.privileged_mode }}'

    # # Set runDsPrivileged to true to run the telegraf DaemonSet's telegraf container in privileged mode
    # runDsPrivileged: '{{ .Values.telegraf_installer.kubernetes.ds.privileged_mode }}'

    # # Collect container Block IO metrics.
    # dsBlockIOEnabled: 'true'

    # # Collect NFS IO metrics.
    # dsNfsIOEnabled: 'true'

    # # Collect kubernetes.system_container metrics and objects in the kube-system|cattle-system namespaces for managed kubernetes clusters (EKS, AKS, GKE, managed Rancher).  Set this to true if you want collect these metrics.
    # managedK8sSystemMetricCollectionEnabled: 'false'

    # # Collect kubernetes.pod_volume (pod ephemeral storage) metrics.  Set this to true if you want to collect these metrics.
    # podVolumeMetricCollectionEnabled: 'false'

    # # Declare Rancher cluster as managed.  Set this to true if your Rancher cluster is managed as opposed to on-premise.
    # isManagedRancher: 'false'

    # # If telegraf-rs fails to start due to being unable to find the etcd crt and key, manually specify the appropriate path here.
    # rsHostEtcdCrt: ''
    # rsHostEtcdKey: ''

  # kube-state-metrics:
    # # kube-state-metrics CPU/Mem limits and requests.
    # cpuLimit: '500m'
    # memLimit: '1Gi'
    # cpuRequest: '100m'
    # memRequest: '500Mi'

    # # Comma-separated list of resources to enable.
    # # See resources in https://github.com/kubernetes/kube-state-metrics/blob/main/docs/cli-arguments.md
    # resources: 'cronjobs,daemonsets,deployments,ingresses,jobs,namespaces,nodes,persistentvolumeclaims,persistentvolumes,pods,replicasets,resourcequotas,services,statefulsets'

    # # Comma-separated list of metrics to enable.
    # # See metric-allowlist in https://github.com/kubernetes/kube-state-metrics/blob/main/docs/cli-arguments.md
    # metrics: 'kube_cronjob_created,kube_cronjob_status_active,kube_cronjob_labels,kube_daemonset_created,kube_daemonset_status_current_number_scheduled,kube_daemonset_status_desired_number_scheduled,kube_daemonset_status_number_available,kube_daemonset_status_number_misscheduled,kube_daemonset_status_number_ready,kube_daemonset_status_number_unavailable,kube_daemonset_status_observed_generation,kube_daemonset_status_updated_number_scheduled,kube_daemonset_metadata_generation,kube_daemonset_labels,kube_deployment_status_replicas,kube_deployment_status_replicas_available,kube_deployment_status_replicas_unavailable,kube_deployment_status_replicas_updated,kube_deployment_status_observed_generation,kube_deployment_spec_replicas,kube_deployment_spec_paused,kube_deployment_spec_strategy_rollingupdate_max_unavailable,kube_deployment_spec_strategy_rollingupdate_max_surge,kube_deployment_metadata_generation,kube_deployment_labels,kube_deployment_created,kube_job_created,kube_job_owner,kube_job_status_active,kube_job_status_succeeded,kube_job_status_failed,kube_job_labels,kube_job_status_start_time,kube_job_status_completion_time,kube_namespace_created,kube_namespace_labels,kube_namespace_status_phase,kube_node_info,kube_node_labels,kube_node_role,kube_node_spec_unschedulable,kube_node_created,kube_persistentvolume_capacity_bytes,kube_persistentvolume_status_phase,kube_persistentvolume_labels,kube_persistentvolume_info,kube_persistentvolume_claim_ref,kube_persistentvolumeclaim_access_mode,kube_persistentvolumeclaim_info,kube_persistentvolumeclaim_labels,kube_persistentvolumeclaim_resource_requests_storage_bytes,kube_persistentvolumeclaim_status_phase,kube_pod_info,kube_pod_start_time,kube_pod_completion_time,kube_pod_owner,kube_pod_labels,kube_pod_status_phase,kube_pod_status_ready,kube_pod_status_scheduled,kube_pod_container_info,kube_pod_container_status_waiting,kube_pod_container_status_waiting_reason,kube_pod_container_status_running,kube_pod_container_state_started,kube_pod_container_status_terminated,kube_pod_container_status_terminated_reason,kube_pod_container_status_last_terminated_reason,kube_pod_container_status_ready,kube_pod_container_status_restarts_total,kube_pod_overhead_cpu_cores,kube_pod_overhead_memory_bytes,kube_pod_created,kube_pod_deletion_timestamp,kube_pod_init_container_info,kube_pod_init_container_status_waiting,kube_pod_init_container_status_waiting_reason,kube_pod_init_container_status_running,kube_pod_init_container_status_terminated,kube_pod_init_container_status_terminated_reason,kube_pod_init_container_status_last_terminated_reason,kube_pod_init_container_status_ready,kube_pod_init_container_status_restarts_total,kube_pod_status_scheduled_time,kube_pod_status_unschedulable,kube_pod_spec_volumes_persistentvolumeclaims_readonly,kube_pod_container_resource_requests_cpu_cores,kube_pod_container_resource_requests_memory_bytes,kube_pod_container_resource_requests_storage_bytes,kube_pod_container_resource_requests_ephemeral_storage_bytes,kube_pod_container_resource_limits_cpu_cores,kube_pod_container_resource_limits_memory_bytes,kube_pod_container_resource_limits_storage_bytes,kube_pod_container_resource_limits_ephemeral_storage_bytes,kube_pod_init_container_resource_limits_cpu_cores,kube_pod_init_container_resource_limits_memory_bytes,kube_pod_init_container_resource_limits_storage_bytes,kube_pod_init_container_resource_limits_ephemeral_storage_bytes,kube_pod_init_container_resource_requests_cpu_cores,kube_pod_init_container_resource_requests_memory_bytes,kube_pod_init_container_resource_requests_storage_bytes,kube_pod_init_container_resource_requests_ephemeral_storage_bytes,kube_replicaset_status_replicas,kube_replicaset_status_ready_replicas,kube_replicaset_status_observed_generation,kube_replicaset_spec_replicas,kube_replicaset_metadata_generation,kube_replicaset_labels,kube_replicaset_created,kube_replicaset_owner,kube_resourcequota,kube_resourcequota_created,kube_service_info,kube_service_labels,kube_service_created,kube_service_spec_type,kube_statefulset_status_replicas,kube_statefulset_status_replicas_current,kube_statefulset_status_replicas_ready,kube_statefulset_status_replicas_updated,kube_statefulset_status_observed_generation,kube_statefulset_replicas,kube_statefulset_metadata_generation,kube_statefulset_created,kube_statefulset_labels,kube_statefulset_status_current_revision,kube_statefulset_status_update_revision,kube_node_status_capacity,kube_node_status_allocatable,kube_node_status_condition,kube_pod_container_resource_requests,kube_pod_container_resource_limits,kube_pod_init_container_resource_limits,kube_pod_init_container_resource_requests'

    # # Comma-separated list of Kubernetes label keys that will be used in the resources' labels metric.
    # # See metric-labels-allowlist in https://github.com/kubernetes/kube-state-metrics/blob/main/docs/cli-arguments.md
    # labels: 'cronjobs=[*],daemonsets=[*],deployments=[*],ingresses=[*],jobs=[*],namespaces=[*],nodes=[*],persistentvolumeclaims=[*],persistentvolumes=[*],pods=[*],replicasets=[*],resourcequotas=[*],services=[*],statefulsets=[*]'

    # # kube-state-metrics additional tolerations. Use the following abbreviated single line format only.
    # # No tolerations are applied by default
    # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
    # tolerations: ''

    # # kube-state-metrics shards.  Increase the number of shards for larger clusters if telegraf RS pod(s) experience collection timeouts
    # shards: '2'

  # # Settings for the Events Log feature.
  # logs:
    # # Set runPrivileged to true if Fluent Bit fails to start, trying to open/create its database.
    # runPrivileged: 'false'

    # # If Fluent Bit should read new files from the head, not tail.
    # # See Read_from_Head in https://docs.fluentbit.io/manual/pipeline/inputs/tail
    # readFromHead: "true"

    # # Network protocol that Fluent Bit should use for DNS: "UDP" or "TCP".
    # dnsMode: "UDP"

    # # DNS resolver that Fluent Bit should use: "LEGACY" or "ASYNC"
    # fluentBitDNSResolver: "LEGACY"

    # # Logs additional tolerations. Use the following abbreviated single line format only.
    # # Inspect fluent-bit-ds to view tolerations which are always present. No tolerations are applied by default for event-exporter.
    # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
    # fluent-bit-tolerations: ''
    # event-exporter-tolerations: ''

    # # event-exporter CPU/Mem limits and requests.
    # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    # event-exporter-cpuLimit: '500m'
    # event-exporter-memLimit: '1Gi'
    # event-exporter-cpuRequest: '50m'
    # event-exporter-memRequest: '100Mi'

    # # event-exporter max event age.
    # # See https://github.com/jkroepke/resmoio-kubernetes-event-exporter
    # event-exporter-maxEventAgeSeconds: '10'

    # # event-exporter client-side throttling
    # # Set kubeBurst to roughly match your events per minute and kubeQPS=kubeBurst/5
    # # See https://github.com/resmoio/kubernetes-event-exporter#troubleshoot-events-discarded-warning
    # event-exporter-kubeQPS: 20
    # event-exporter-kubeBurst: 100

    # # fluent-bit CPU/Mem limits and requests.
    # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    # fluent-bit-cpuLimit: '500m'
    # fluent-bit-memLimit: '1Gi'
    # fluent-bit-cpuRequest: '50m'
    # fluent-bit-memRequest: '100Mi'

  # # Settings for the Network Performance and Map feature.
  # workload-map:
    # # netapp-ci-net-observer-l4-ds CPU/Mem limits and requests.
    # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    # cpuLimit: '500m'
    # memLimit: '500Mi'
    # cpuRequest: '100m'
    # memRequest: '500Mi'

    # # Metric aggregation interval in seconds. Min=30, Max=120
    # metricAggregationInterval: '60'

    # # Interval for bpf polling. Min=3, Max=15
    # bpfPollInterval: '8'

    # # Enable performing reverse DNS lookups on observed IPs.
    # enableDNSLookup: 'true'

    # # netapp-ci-net-observer-l4-ds additional tolerations. Use the following abbreviated single line format only.
    # # Inspect netapp-ci-net-observer-l4-ds to view tolerations which are always present.
    # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
    # l4-tolerations: ''

    # # Set runPrivileged to true if SELinux is enabled on your Kubernetes nodes.
    # # Note: In OpenShift environments, this is set to true automatically.
    # runPrivileged: 'false'

  # change-management:
    # # change-observer-watch-rs CPU/Mem limits and requests.
    # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
    # cpuLimit: '1'
    # memLimit: '1Gi'
    # cpuRequest: '500m'
    # memRequest: '500Mi'

    # # Interval in minutes after which a non-successful deployment of a workload will be marked as failed
    # failureDeclarationIntervalMins: '30'

    # # Frequency at which workload deployment in-progress events are sent
    # deployAggrIntervalSeconds: '300'

    # # Frequency at which non-workload deployments are combined and sent
    # nonWorkloadAggrIntervalSeconds: '15'

    # # A set of regular expressions used in env names and data maps whose value will be redacted
    # termsToRedact: '"pwd", "password", "token", "apikey", "api-key", "api_key", "jwt", "accesskey", "access_key", "access-key", "ca-file", "key-file", "cert", "cafile", "keyfile", "tls", "crt", "salt", ".dockerconfigjson", "auth", "secret"'

    # # A comma separated list of additional kinds to watch from the default set of kinds watched by the collector
    # # Each kind will have to be prefixed by its apigroup
    # # Example: '"authorization.k8s.io.subjectaccessreviews"'
    # additionalKindsToWatch: ''

    # # A comma separated list of additional field paths whose diff is ignored as part of change analytics. This list in addition to the default set of field paths ignored by the collector.
    # # Example: '"metadata.specTime", "data.status"'
    # additionalFieldsDiffToIgnore: ''

    # # A comma separated list of kinds to ignore from watching from the default set of kinds watched by the collector
    # # Each kind will have to be prefixed by its apigroup
    # # Example: '"networking.k8s.io.networkpolicies,batch.jobs", "authorization.k8s.io.subjectaccessreviews"'
    # kindsToIgnoreFromWatch: ''

    # # Frequency with which log records are sent to CI from the collector
    # logRecordAggrIntervalSeconds: '20'

    # # change-observer-watch-ds additional tolerations. Use the following abbreviated single line format only.
    # # Inspect change-observer-watch-ds to view tolerations which are always present.
    # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
    # watch-tolerations: ''