Kubernetes Monitoring Operator configuration options
The "Kubernetes Monitoring Operator" configuration can be customized.
The following table lists the options available in the _AgentConfiguration_ file.
| Component | Option | Description |
|---|---|---|
| agent | | Configuration options that are common to all components that the operator can install. These can be considered "global" options. |
| | dockerRepo | A dockerRepo override to pull images from the customer's private Docker repository instead of the Data Infrastructure Insights Docker repository. Default is the Data Infrastructure Insights Docker repository. |
| | dockerImagePullSecret | Optional: a secret for the customer's private repository. |
| | clusterName | Free-text field that uniquely identifies a cluster across all of the customer's clusters. It must be unique across a Data Infrastructure Insights tenant. Default is the value entered in the UI "Cluster Name" field. |
| | proxy (format: proxy: server: port: username: password: noProxy: isTelegrafProxyEnabled: isAuProxyEnabled: isFluentbitProxyEnabled: isCollectorProxyEnabled:) | Optional; sets a proxy. This is usually the customer's corporate proxy. |
| telegraf | | Configuration options that can customize the telegraf installation of the operator. |
| | collectionInterval | Metrics collection interval, in seconds (max = 60s). |
| | dsCpuLimit | CPU limit for telegraf-ds. |
| | dsMemLimit | Memory limit for telegraf-ds. |
| | dsCpuRequest | CPU request for telegraf-ds. |
| | dsMemRequest | Memory request for telegraf-ds. |
| | rsCpuLimit | CPU limit for telegraf-rs. |
| | rsMemLimit | Memory limit for telegraf-rs. |
| | rsCpuRequest | CPU request for telegraf-rs. |
| | rsMemRequest | Memory request for telegraf-rs. |
| | runPrivileged | Run the telegraf DaemonSet's _telegraf-mountstats-poller_ container in privileged mode. Set this to true if SELinux is enabled on your Kubernetes nodes. |
| | runDsPrivileged | Set runDsPrivileged to true to run the telegraf DaemonSet's telegraf container in privileged mode. |
| | batchSize | See "Telegraf configuration documentation". |
| | bufferLimit | See "Telegraf configuration documentation". |
| | roundInterval | See "Telegraf configuration documentation". |
| | collectionJitter | See "Telegraf configuration documentation". |
| | precision | See "Telegraf configuration documentation". |
| | flushInterval | See "Telegraf configuration documentation". |
| | flushJitter | See "Telegraf configuration documentation". |
| | outputTimeout | See "Telegraf configuration documentation". |
| | dsTolerations | Additional tolerations for telegraf-ds. |
| | rsTolerations | Additional tolerations for telegraf-rs. |
| | skipProcessorsAfterAggregators | See "Telegraf configuration documentation". |
| | unprotected | See this "known Telegraf issue". Setting _unprotected_ instructs the Kubernetes Monitoring Operator to run Telegraf with the `--unprotected` flag. |
| kube-state-metrics | | Configuration options that can customize the kube-state-metrics installation of the operator. |
| | cpuLimit | CPU limit for the kube-state-metrics deployment. |
| | memLimit | Memory limit for the kube-state-metrics deployment. |
| | cpuRequest | CPU request for the kube-state-metrics deployment. |
| | memRequest | Memory request for the kube-state-metrics deployment. |
| | resources | Comma-separated list of resources to capture. Example: cronjobs,daemonsets,deployments,ingresses,jobs,namespaces,nodes,persistentvolumes,pods,replicasets,resourcequotas,services,statefulsets |
| | tolerations | Additional tolerations for kube-state-metrics. |
| | labels | Comma-separated list of per-resource label allowlists that kube-state-metrics should capture. Example: cronjobs=[\*],daemonsets=[\*],deployments=[\*],ingresses=[\*],jobs=[\*],namespaces=[\*],nodes=[\*],persistentvolumes=[\*],pods=[\*],replicasets=[\*],resourcequotas=[\*],services=[\*],statefulsets=[\*] |
| logs | | Configuration options that can customize log collection and installation of the operator. |
| | readFromHead | true/false. Whether Fluent Bit should read logs from the head. |
| | timeout | Timeout, in seconds. |
| | dnsMode | TCP/UDP. Mode for DNS. |
| | fluent-bit-tolerations | Additional tolerations for fluent-bit-ds. |
| | event-exporter-tolerations | Additional tolerations for event-exporter. |
| | event-exporter-maxEventAgeSeconds | Maximum event age for event-exporter. See https://github.com/jkroepke/resmoio-kubernetes-event-exporter |
| workload-map | | Configuration options that can customize workload map collection and installation of the operator. |
| | cpuLimit | CPU limit for the net-observer DS. |
| | memLimit | Memory limit for the net-observer DS. |
| | cpuRequest | CPU request for the net-observer DS. |
| | memRequest | Memory request for the net-observer DS. |
| | metricAggregationInterval | Metric aggregation interval, in seconds. |
| | bpfPollInterval | BPF poll interval, in seconds. |
| | enableDNSLookup | true/false. Enables DNS lookup. |
| | l4-tolerations | Additional tolerations for net-observer-l4-ds. |
| | runPrivileged | true/false. Set runPrivileged to true if SELinux is enabled on your Kubernetes nodes. |
| change-management | | Configuration options for Kubernetes Change Management and Analysis. |
| | cpuLimit | CPU limit for change-observer-watch-rs. |
| | memLimit | Memory limit for change-observer-watch-rs. |
| | cpuRequest | CPU request for change-observer-watch-rs. |
| | memRequest | Memory request for change-observer-watch-rs. |
| | failureDeclarationIntervalMins | Interval, in minutes, after which an unsuccessful deployment of a workload is marked as failed. |
| | deployAggrIntervalSeconds | Frequency at which workload deployment in-progress events are sent. |
| | nonWorkloadAggrIntervalSeconds | Frequency at which non-workload deployments are combined and sent. |
| | termsToRedact | A set of regular expressions, matched against env names and data maps, whose values will be redacted. Example terms: "pwd", "password", "token", "apiKey", "api-key", "jwt" |
| | additionalKindsToWatch | A comma-separated list of additional kinds to watch, beyond the default set of kinds watched by the collector. |
| | kindsToIgnoreFromWatch | A comma-separated list of kinds to ignore from the default set of kinds watched by the collector. |
| | logRecordAggrIntervalSeconds | Frequency with which log records are sent to CI from the collector. |
| | watch-tolerations | Additional tolerations for change-observer-watch-ds. Abbreviated single-line format only. Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}' |
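
As a rough illustration of how the options in the table are laid out in the file, the fragment below nests a few of them under their component keys inside `spec`, using the key spellings from the sample file later in this document. It is a sketch, not a complete or recommended configuration: the cluster name, interval, taint names, and resource subset are hypothetical placeholders, and required fields such as `dockerRepo` are left out for brevity.

```yaml
# Illustrative fragment only. All values are hypothetical placeholders,
# not defaults or recommendations.
spec:
  agent:
    # Must be unique across the Data Infrastructure Insights tenant.
    clusterName: "prod-cluster-east"
  telegraf:
    # The table states a maximum of 60 seconds.
    collectionInterval: '30s'
    # Tolerations use the abbreviated single-line format.
    dsTolerations: '{key: taint1, operator: Exists, effect: NoSchedule}'
  kube-state-metrics:
    # A subset of the resource names listed in the table.
    resources: 'deployments,nodes,pods'
  change-management:
    termsToRedact: '"pwd", "password", "token"'
    watch-tolerations: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
```

Several components share option names such as cpuLimit and runPrivileged, so the component key an option sits under determines which workload it affects.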

Sample AgentConfiguration file
The following is a sample _AgentConfiguration_ file.

    apiVersion: monitoring.netapp.com/v1alpha1
    kind: AgentConfiguration
    metadata:
      name: netapp-ci-monitoring-configuration
      namespace: "netapp-monitoring"
      labels:
        installed-by: nkmo-netapp-monitoring

    spec:
      # # You can modify the following fields to configure the operator.
      # # Optional settings are commented out and include default values for reference
      # #   To update them, uncomment the line, change the value, and apply the updated AgentConfiguration.
      agent:
        # # [Required Field] A uniquely identifiable user-friendly clustername.
        # # clusterName must be unique across all clusters in your Data Infrastructure Insights environment.
        clusterName: "my_cluster"
        # # Proxy settings. The proxy that the operator should use to send metrics to Data Infrastructure Insights.
        # # Please see documentation here: https://docs.netapp.com/us-en/cloudinsights/task_config_telegraf_agent_k8s.html#configuring-proxy-support
        # proxy:
        #   server:
        #   port:
        #   noproxy:
        #   username:
        #   password:
        #   isTelegrafProxyEnabled:
        #   isFluentbitProxyEnabled:
        #   isCollectorsProxyEnabled:
        # # [Required Field] By default, the operator uses the CI repository.
        # # To use a private repository, change this field to your repository name.
        # # Please see documentation here: https://docs.netapp.com/us-en/cloudinsights/task_config_telegraf_agent_k8s.html#using-a-custom-or-private-docker-repository
        dockerRepo: 'docker.c01.cloudinsights.netapp.com'
        # # [Required Field] The name of the imagePullSecret for dockerRepo.
        # # If you are using a private repository, change this field from 'netapp-ci-docker' to the name of your secret.
        dockerImagePullSecret: 'netapp-ci-docker'
        # # Allow the operator to automatically rotate its ApiKey before expiration.
        # tokenRotationEnabled: 'true'
        # # Number of days before expiration that the ApiKey should be rotated. This must be less than the total ApiKey duration.
        # tokenRotationThresholdDays: '30'

      telegraf:
        # # Settings to fine-tune metrics data collection. Telegraf config names are included in parenthesis.
        # # See https://github.com/influxdata/telegraf/blob/master/docs/CONFIGURATION.md#agent
        # # The default time telegraf will wait between inputs for all plugins (interval). Max=60
        # collectionInterval: '60s'
        # # Maximum number of records per output that telegraf will write in one batch (metric_batch_size).
        # batchSize: '10000'
        # # Maximum number of records per output that telegraf will cache pending a successful write (metric_buffer_limit).
        # bufferLimit: '150000'
        # # Collect metrics on multiples of interval (round_interval).
        # roundInterval: 'true'
        # # Each plugin waits a random amount of time between the scheduled collection time and that time + collection_jitter before collecting inputs (collection_jitter).
        # collectionJitter: '0s'
        # # Collected metrics are rounded to the precision specified. When set to "0s" precision will be set by the units specified by interval (precision).
        # precision: '0s'
        # # Time telegraf will wait between writing outputs (flush_interval). Max=collectionInterval
        # flushInterval: '60s'
        # # Each output waits a random amount of time between the scheduled write time and that time + flush_jitter before writing outputs (flush_jitter).
        # flushJitter: '0s'
        # # Timeout for writing to outputs (timeout).
        # outputTimeout: '5s'
        # # telegraf-ds CPU/Mem limits and requests.
        # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        # dsCpuLimit: '750m'
        # dsMemLimit: '800Mi'
        # dsCpuRequest: '100m'
        # dsMemRequest: '500Mi'
        # # telegraf-rs CPU/Mem limits and requests.
        # rsCpuLimit: '3'
        # rsMemLimit: '4Gi'
        # rsCpuRequest: '100m'
        # rsMemRequest: '500Mi'
        # # Skip second run of processors after aggregators
        # skipProcessorsAfterAggregators: 'true'
        # # telegraf additional tolerations. Use the following abbreviated single line format only.
        # # Inspect telegraf-rs/-ds to view tolerations which are always present.
        # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
        # dsTolerations: ''
        # rsTolerations: ''
        # # If telegraf warns of insufficient lockable memory, try increasing the limit of lockable memory for Telegraf in the underlying operating system/node. If increasing the limit is not an option, set this to true to instruct Telegraf to not attempt to reserve locked memory pages. While this might pose a security risk as decrypted secrets might be swapped out to disk, it allows for execution in environments where reserving locked memory is not possible.
        # unprotected: 'false'
        # # Run the telegraf DaemonSet's telegraf-mountstats-poller container in privileged mode. Set runPrivileged to true if SELinux is enabled on your Kubernetes nodes.
        # runPrivileged: '{{ .Values.telegraf_installer.kubernetes.privileged_mode }}'
        # # Set runDsPrivileged to true to run the telegraf DaemonSet's telegraf container in privileged mode
        # runDsPrivileged: '{{ .Values.telegraf_installer.kubernetes.ds.privileged_mode }}'
        # # Collect container Block IO metrics.
        # dsBlockIOEnabled: 'true'
        # # Collect NFS IO metrics.
        # dsNfsIOEnabled: 'true'
        # # Collect kubernetes.system_container metrics and objects in the kube-system|cattle-system namespaces for managed kubernetes clusters (EKS, AKS, GKE, managed Rancher). Set this to true if you want collect these metrics.
        # managedK8sSystemMetricCollectionEnabled: 'false'
        # # Collect kubernetes.pod_volume (pod ephemeral storage) metrics. Set this to true if you want to collect these metrics.
        # podVolumeMetricCollectionEnabled: 'false'
        # # Declare Rancher cluster as managed. Set this to true if your Rancher cluster is managed as opposed to on-premise.
        # isManagedRancher: 'false'
        # # If telegraf-rs fails to start due to being unable to find the etcd crt and key, manually specify the appropriate path here.
        # rsHostEtcdCrt: ''
        # rsHostEtcdKey: ''

      # kube-state-metrics:
        # # kube-state-metrics CPU/Mem limits and requests.
        # cpuLimit: '500m'
        # memLimit: '1Gi'
        # cpuRequest: '100m'
        # memRequest: '500Mi'
        # # Comma-separated list of resources to enable.
        # # See resources in https://github.com/kubernetes/kube-state-metrics/blob/main/docs/cli-arguments.md
        # resources: 'cronjobs,daemonsets,deployments,ingresses,jobs,namespaces,nodes,persistentvolumeclaims,persistentvolumes,pods,replicasets,resourcequotas,services,statefulsets'
        # # Comma-separated list of metrics to enable.
        # # See metric-allowlist in https://github.com/kubernetes/kube-state-metrics/blob/main/docs/cli-arguments.md
        # metrics: 'kube_cronjob_created,kube_cronjob_status_active,kube_cronjob_labels,kube_daemonset_created,kube_daemonset_status_current_number_scheduled,kube_daemonset_status_desired_number_scheduled,kube_daemonset_status_number_available,kube_daemonset_status_number_misscheduled,kube_daemonset_status_number_ready,kube_daemonset_status_number_unavailable,kube_daemonset_status_observed_generation,kube_daemonset_status_updated_number_scheduled,kube_daemonset_metadata_generation,kube_daemonset_labels,kube_deployment_status_replicas,kube_deployment_status_replicas_available,kube_deployment_status_replicas_unavailable,kube_deployment_status_replicas_updated,kube_deployment_status_observed_generation,kube_deployment_spec_replicas,kube_deployment_spec_paused,kube_deployment_spec_strategy_rollingupdate_max_unavailable,kube_deployment_spec_strategy_rollingupdate_max_surge,kube_deployment_metadata_generation,kube_deployment_labels,kube_deployment_created,kube_job_created,kube_job_owner,kube_job_status_active,kube_job_status_succeeded,kube_job_status_failed,kube_job_labels,kube_job_status_start_time,kube_job_status_completion_time,kube_namespace_created,kube_namespace_labels,kube_namespace_status_phase,kube_node_info,kube_node_labels,kube_node_role,kube_node_spec_unschedulable,kube_node_created,kube_persistentvolume_capacity_bytes,kube_persistentvolume_status_phase,kube_persistentvolume_labels,kube_persistentvolume_info,kube_persistentvolume_claim_ref,kube_persistentvolumeclaim_access_mode,kube_persistentvolumeclaim_info,kube_persistentvolumeclaim_labels,kube_persistentvolumeclaim_resource_requests_storage_bytes,kube_persistentvolumeclaim_status_phase,kube_pod_info,kube_pod_start_time,kube_pod_completion_time,kube_pod_owner,kube_pod_labels,kube_pod_status_phase,kube_pod_status_ready,kube_pod_status_scheduled,kube_pod_container_info,kube_pod_container_status_waiting,kube_pod_container_status_waiting_reason,kube_pod_container_status_running,kube_pod_container_state_started,kube_pod_container_status_terminated,kube_pod_container_status_terminated_reason,kube_pod_container_status_last_terminated_reason,kube_pod_container_status_ready,kube_pod_container_status_restarts_total,kube_pod_overhead_cpu_cores,kube_pod_overhead_memory_bytes,kube_pod_created,kube_pod_deletion_timestamp,kube_pod_init_container_info,kube_pod_init_container_status_waiting,kube_pod_init_container_status_waiting_reason,kube_pod_init_container_status_running,kube_pod_init_container_status_terminated,kube_pod_init_container_status_terminated_reason,kube_pod_init_container_status_last_terminated_reason,kube_pod_init_container_status_ready,kube_pod_init_container_status_restarts_total,kube_pod_status_scheduled_time,kube_pod_status_unschedulable,kube_pod_spec_volumes_persistentvolumeclaims_readonly,kube_pod_container_resource_requests_cpu_cores,kube_pod_container_resource_requests_memory_bytes,kube_pod_container_resource_requests_storage_bytes,kube_pod_container_resource_requests_ephemeral_storage_bytes,kube_pod_container_resource_limits_cpu_cores,kube_pod_container_resource_limits_memory_bytes,kube_pod_container_resource_limits_storage_bytes,kube_pod_container_resource_limits_ephemeral_storage_bytes,kube_pod_init_container_resource_limits_cpu_cores,kube_pod_init_container_resource_limits_memory_bytes,kube_pod_init_container_resource_limits_storage_bytes,kube_pod_init_container_resource_limits_ephemeral_storage_bytes,kube_pod_init_container_resource_requests_cpu_cores,kube_pod_init_container_resource_requests_memory_bytes,kube_pod_init_container_resource_requests_storage_bytes,kube_pod_init_container_resource_requests_ephemeral_storage_bytes,kube_replicaset_status_replicas,kube_replicaset_status_ready_replicas,kube_replicaset_status_observed_generation,kube_replicaset_spec_replicas,kube_replicaset_metadata_generation,kube_replicaset_labels,kube_replicaset_created,kube_replicaset_owner,kube_resourcequota,kube_resourcequota_created,kube_service_info,kube_service_labels,kube_service_created,kube_service_spec_type,kube_statefulset_status_replicas,kube_statefulset_status_replicas_current,kube_statefulset_status_replicas_ready,kube_statefulset_status_replicas_updated,kube_statefulset_status_observed_generation,kube_statefulset_replicas,kube_statefulset_metadata_generation,kube_statefulset_created,kube_statefulset_labels,kube_statefulset_status_current_revision,kube_statefulset_status_update_revision,kube_node_status_capacity,kube_node_status_allocatable,kube_node_status_condition,kube_pod_container_resource_requests,kube_pod_container_resource_limits,kube_pod_init_container_resource_limits,kube_pod_init_container_resource_requests'
        # # Comma-separated list of Kubernetes label keys that will be used in the resources' labels metric.
        # # See metric-labels-allowlist in https://github.com/kubernetes/kube-state-metrics/blob/main/docs/cli-arguments.md
        # labels: 'cronjobs=[*],daemonsets=[*],deployments=[*],ingresses=[*],jobs=[*],namespaces=[*],nodes=[*],persistentvolumeclaims=[*],persistentvolumes=[*],pods=[*],replicasets=[*],resourcequotas=[*],services=[*],statefulsets=[*]'
        # # kube-state-metrics additional tolerations. Use the following abbreviated single line format only.
        # # No tolerations are applied by default
        # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
        # tolerations: ''
        # # kube-state-metrics shards. Increase the number of shards for larger clusters if telegraf RS pod(s) experience collection timeouts
        # shards: '2'

      # # Settings for the Events Log feature.
      # logs:
        # # Set runPrivileged to true if Fluent Bit fails to start, trying to open/create its database.
        # runPrivileged: 'false'
        # # If Fluent Bit should read new files from the head, not tail.
        # # See Read_from_Head in https://docs.fluentbit.io/manual/pipeline/inputs/tail
        # readFromHead: "true"
        # # Network protocol that Fluent Bit should use for DNS: "UDP" or "TCP".
        # dnsMode: "UDP"
        # # DNS resolver that Fluent Bit should use: "LEGACY" or "ASYNC"
        # fluentBitDNSResolver: "LEGACY"
        # # Logs additional tolerations. Use the following abbreviated single line format only.
        # # Inspect fluent-bit-ds to view tolerations which are always present. No tolerations are applied by default for event-exporter.
        # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
        # fluent-bit-tolerations: ''
        # event-exporter-tolerations: ''
        # # event-exporter CPU/Mem limits and requests.
        # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        # event-exporter-cpuLimit: '500m'
        # event-exporter-memLimit: '1Gi'
        # event-exporter-cpuRequest: '50m'
        # event-exporter-memRequest: '100Mi'
        # # event-exporter max event age.
        # # See https://github.com/jkroepke/resmoio-kubernetes-event-exporter
        # event-exporter-maxEventAgeSeconds: '10'
        # # event-exporter client-side throttling
        # # Set kubeBurst to roughly match your events per minute and kubeQPS=kubeBurst/5
        # # See https://github.com/resmoio/kubernetes-event-exporter#troubleshoot-events-discarded-warning
        # event-exporter-kubeQPS: 20
        # event-exporter-kubeBurst: 100
        # # fluent-bit CPU/Mem limits and requests.
        # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        # fluent-bit-cpuLimit: '500m'
        # fluent-bit-memLimit: '1Gi'
        # fluent-bit-cpuRequest: '50m'
        # fluent-bit-memRequest: '100Mi'

      # # Settings for the Network Performance and Map feature.
      # workload-map:
        # # netapp-ci-net-observer-l4-ds CPU/Mem limits and requests.
        # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        # cpuLimit: '500m'
        # memLimit: '500Mi'
        # cpuRequest: '100m'
        # memRequest: '500Mi'
        # # Metric aggregation interval in seconds. Min=30, Max=120
        # metricAggregationInterval: '60'
        # # Interval for bpf polling. Min=3, Max=15
        # bpfPollInterval: '8'
        # # Enable performing reverse DNS lookups on observed IPs.
        # enableDNSLookup: 'true'
        # # netapp-ci-net-observer-l4-ds additional tolerations. Use the following abbreviated single line format only.
        # # Inspect netapp-ci-net-observer-l4-ds to view tolerations which are always present.
        # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
        # l4-tolerations: ''
        # # Set runPrivileged to true if SELinux is enabled on your Kubernetes nodes.
        # # Note: In OpenShift environments, this is set to true automatically.
        # runPrivileged: 'false'

      # change-management:
        # # change-observer-watch-rs CPU/Mem limits and requests.
        # # See https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
        # cpuLimit: '1'
        # memLimit: '1Gi'
        # cpuRequest: '500m'
        # memRequest: '500Mi'
        # # Interval in minutes after which a non-successful deployment of a workload will be marked as failed
        # failureDeclarationIntervalMins: '30'
        # # Frequency at which workload deployment in-progress events are sent
        # deployAggrIntervalSeconds: '300'
        # # Frequency at which non-workload deployments are combined and sent
        # nonWorkloadAggrIntervalSeconds: '15'
        # # A set of regular expressions used in env names and data maps whose value will be redacted
        # termsToRedact: '"pwd", "password", "token", "apikey", "api-key", "api_key", "jwt", "accesskey", "access_key", "access-key", "ca-file", "key-file", "cert", "cafile", "keyfile", "tls", "crt", "salt", ".dockerconfigjson", "auth", "secret"'
        # # A comma separated list of additional kinds to watch from the default set of kinds watched by the collector
        # # Each kind will have to be prefixed by its apigroup
        # # Example: '"authorization.k8s.io.subjectaccessreviews"'
        # additionalKindsToWatch: ''
        # # A comma separated list of additional field paths whose diff is ignored as part of change analytics. This list in addition to the default set of field paths ignored by the collector.
        # # Example: '"metadata.specTime", "data.status"'
        # additionalFieldsDiffToIgnore: ''
        # # A comma separated list of kinds to ignore from watching from the default set of kinds watched by the collector
        # # Each kind will have to be prefixed by its apigroup
        # # Example: '"networking.k8s.io.networkpolicies,batch.jobs", "authorization.k8s.io.subjectaccessreviews"'
        # kindsToIgnoreFromWatch: ''
        # # Frequency with which log records are sent to CI from the collector
        # logRecordAggrIntervalSeconds: '20'
        # # change-observer-watch-ds additional tolerations. Use the following abbreviated single line format only.
        # # Inspect change-observer-watch-ds to view tolerations which are always present.
        # # Example: '{key: taint1, operator: Exists, effect: NoSchedule},{key: taint2, operator: Exists, effect: NoExecute}'
        # watch-tolerations: ''
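
The sample's instruction to "uncomment the line, change the value, and apply the updated AgentConfiguration" can be sketched for one section as follows. This shows only the `workload-map` part of `spec` after editing, and the new values are hypothetical. For sections whose header is commented out in the sample, the header line (here `workload-map:`) has to be uncommented together with the option; otherwise the option would be read as belonging to the preceding section.

```yaml
# Sketch of an edited section. The chosen values are hypothetical and only
# illustrate the uncomment-and-change workflow described in the sample.
spec:
  workload-map:
    # # Set runPrivileged to true if SELinux is enabled on your Kubernetes nodes.
    runPrivileged: 'true'
    # # Metric aggregation interval in seconds. Min=30, Max=120
    metricAggregationInterval: '90'
    # # Left commented out, so the default shown in the sample stays in effect.
    # cpuLimit: '500m'
```

Values are kept as quoted strings, following the quoting used for most settings in the sample.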