Monitor the health of Keystone Collector
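You can monitor the health of Keystone Collector with any monitoring system that supports HTTP requests. Monitoring its health helps ensure that data is available on the Keystone dashboard.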
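By default, the Keystone health service does not accept connections from any IP other than localhost. The Keystone health endpoint is `/uber/health`, and it listens on port `7777` on all interfaces of the Keystone Collector server. When queried, the endpoint responds with an HTTP status code and a JSON body that describes the status of the Keystone Collector system. The JSON body provides the overall health status in the `is_healthy` attribute, which is a boolean, and a detailed per-component status list in the `component_details` attribute. Here is an example: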
$ curl http://127.0.0.1:7777/uber/health
{"is_healthy": true, "component_details": {"vicmet": "Running", "ks-collector": "Running", "ks-billing": "Running", "chronyd": "Running"}}
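The following status codes are returned:
-
200: indicates that all monitored components are healthy
-
503: indicates that one or more components are unhealthy
-
403: indicates that the HTTP client querying the health status is not on the _allow_ list, which is a list of allowed network CIDRs. No health information is returned for this status. The _allow_ list uses network CIDRs to control which network devices are permitted to query the Keystone health system. If you receive this error, add your monitoring system to the _allow_ list from *Keystone Collector management TUI > Configure > Health Monitoring*.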
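The status codes above can be turned into monitoring outcomes with a small probe. The following is a minimal sketch rather than an official client: the names `classify_health` and `probe` are hypothetical, the URL assumes the probe runs on the Collector host itself (or from a network on the allow list), and treating any component state other than "Running" as unhealthy is an assumption based on the sample response.

```python
import json
import urllib.error
import urllib.request

# Assumption: the probe runs on the Collector host; change the host if your
# monitoring system has been added to the allow list.
HEALTH_URL = "http://127.0.0.1:7777/uber/health"


def classify_health(status_code, body):
    """Map an HTTP status code and response body to (state, unhealthy_components)."""
    if status_code == 403:
        # Clients outside the allow list receive no health information.
        return "forbidden", []
    if status_code not in (200, 503):
        return "unknown", []
    try:
        payload = json.loads(body)
    except ValueError:
        return "unknown", []
    if not isinstance(payload, dict):
        return "unknown", []
    details = payload.get("component_details") or {}
    # Assumption: "Running" is the only healthy component state.
    unhealthy = sorted(name for name, state in details.items() if state != "Running")
    healthy = status_code == 200 and payload.get("is_healthy") is True
    return ("healthy" if healthy else "unhealthy"), unhealthy


def probe(url=HEALTH_URL, timeout=5.0):
    """Query the health endpoint once and classify the response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the body is still readable.
        return classify_health(err.code, err.read().decode("utf-8", "replace"))
    except (urllib.error.URLError, OSError):
        return "unreachable", []
```

Keeping the classification separate from the network call makes the mapping easy to check offline. A monitoring job would call `probe()` on a schedule and alert on any state other than `"healthy"`.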
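Note for Linux users regarding this known issue:
Issue description: Keystone Collector runs a number of containers as part of the usage metering system. When a Red Hat Enterprise Linux 8.x server is hardened with the U.S. Defense Information Systems Agency (DISA) Security Technical Implementation Guides (STIG) policies, a known issue with fapolicyd (File Access Policy Daemon) occurs intermittently. This issue is identified as "bug 1907870".
Workaround: Until Red Hat Enterprise resolves it, NetApp recommends working around this issue by putting `fapolicyd` into permissive mode. In `/etc/fapolicyd/fapolicyd.conf`, set the value `permissive = 1`.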
View system logs
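You can view the Keystone Collector system logs to review system information and use them for troubleshooting. Keystone Collector uses the host's journald logging system, and the system logs can be viewed through the standard journalctl system utility. The following key services are useful when examining the logs: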
-
ks-collector
-
ks-health
-
ks-autoupdate
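Reading these logs comes down to running journalctl once per service. The sketch below builds that command line and runs it. It assumes that the systemd unit names are `ks-collector`, `ks-health`, and `ks-autoupdate`, matching the services listed above; the helper names `journalctl_args` and `read_logs` are hypothetical.

```python
import subprocess

# Assumption: the systemd unit names match the service names listed above.
SERVICES = ("ks-collector", "ks-health", "ks-autoupdate")


def journalctl_args(unit, since="1 hour ago", lines=None):
    """Build a journalctl command that prints raw messages for one unit."""
    args = ["journalctl", "--no-pager", "--output=cat",
            "--unit", unit, "--since", since]
    if lines is not None:
        # Limit output to the most recent N entries.
        args += ["--lines", str(lines)]
    return args


def read_logs(unit, since="1 hour ago"):
    """Run journalctl for the unit and return its output as a list of lines."""
    result = subprocess.run(journalctl_args(unit, since),
                            capture_output=True, text=True, check=False)
    return result.stdout.splitlines()
```

The `--output=cat` option prints only the message text, which is convenient when the messages themselves are JSON. Depending on host permissions, reading the journal may require root or membership in a group that is allowed to read it.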
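The main data collection service, ks-collector, produces logs in JSON format with a `run-id` attribute associated with each scheduled data collection job. The following is an example of a successful job for standard usage data collection: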
{"level":"info","time":"2022-10-31T05:20:01.831Z","caller":"light-collector/main.go:31","msg":"initialising light collector with run-id cdflm0f74cgphgfon8cg","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:04.624Z","caller":"ontap/service.go:215","msg":"223 volumes collected for cluster a2049dd4-bfcf-11ec-8500-00505695ce60","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:18.821Z","caller":"ontap/service.go:215","msg":"697 volumes collected for cluster 909cbacc-bfcf-11ec-8500-00505695ce60","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:41.598Z","caller":"ontap/service.go:215","msg":"7 volumes collected for cluster f7b9a30c-55dc-11ed-9c88-005056b3d66f","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:48.247Z","caller":"ontap/service.go:215","msg":"24 volumes collected for cluster a9e2dcff-ab21-11ec-8428-00a098ad3ba2","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:48.786Z","caller":"worker/collector.go:75","msg":"4 clusters collected","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:48.839Z","caller":"reception/reception.go:75","msg":"Sending file 65a71542-cb4d-bdb2-e9a7-a826be4fdcb7_1667193648.tar.gz type=ontap to reception","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:48.840Z","caller":"reception/reception.go:76","msg":"File bytes 123425","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"reception/reception.go:99","msg":"uploaded usage file to reception with status 201 Created","run-id":"cdflm0f74cgphgfon8cg"}
The following is an example of a successful job for optional performance data collection:
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"sql/service.go:28","msg":"initialising MySql service at 10.128.114.214"}
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"sql/service.go:55","msg":"Opening MySql db connection at server 10.128.114.214"}
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"sql/service.go:39","msg":"Creating MySql db config object"}
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"sla_reporting/service.go:69","msg":"initialising SLA service"}
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"sla_reporting/service.go:71","msg":"SLA service successfully initialised"}
{"level":"info","time":"2022-10-31T05:20:51.324Z","caller":"worker/collector.go:217","msg":"Performance data would be collected for timerange: 2022-10-31T10:24:52~2022-10-31T10:29:52"}
{"level":"info","time":"2022-10-31T05:21:31.385Z","caller":"worker/collector.go:244","msg":"New file generated: 65a71542-cb4d-bdb2-e9a7-a826be4fdcb7_1667193651.tar.gz"}
{"level":"info","time":"2022-10-31T05:21:31.385Z","caller":"reception/reception.go:75","msg":"Sending file 65a71542-cb4d-bdb2-e9a7-a826be4fdcb7_1667193651.tar.gz type=ontap-perf to reception","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:21:31.386Z","caller":"reception/reception.go:76","msg":"File bytes 17767","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:21:33.025Z","caller":"reception/reception.go:99","msg":"uploaded usage file to reception with status 201 Created","run-id":"cdflm0f74cgphgfon8cg"}
{"level":"info","time":"2022-10-31T05:21:33.025Z","caller":"light-collector/main.go:88","msg":"exiting","run-id":"cdflm0f74cgphgfon8cg"}
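Because each ks-collector entry is a JSON object that usually carries a `run-id`, the entries for one collection job can be grouped and checked programmatically. The sketch below is illustrative only: the function names are hypothetical, and treating the "uploaded usage file to reception with status 201" message as the success marker is an assumption drawn from the sample logs, not a documented contract.

```python
import json
from collections import defaultdict


def group_by_run_id(lines):
    """Group JSON log lines by their run-id; entries without one go under ""."""
    runs = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except ValueError:
            continue  # Skip anything that is not a JSON log entry.
        if not isinstance(entry, dict):
            continue
        runs[entry.get("run-id", "")].append(entry)
    return dict(runs)


def upload_succeeded(entries):
    """Return True if any entry reports a 201 upload to reception."""
    # Assumption: this message text marks a successful upload, as in the samples.
    marker = "uploaded usage file to reception with status 201"
    return any(marker in str(entry.get("msg", "")) for entry in entries)
```

Some entries in the performance sample have no `run-id`, so they are collected under the empty-string key instead of being dropped.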
Generate and collect support bundles
The Keystone Collector TUI enables you to generate support bundles and add them to service requests to resolve support issues. Follow these steps:
-
Start the Keystone Collector management TUI utility:
$ keystone-collector-tui
-
Go to *Troubleshooting > Generate Support Bundle*.
-
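When the bundle is generated, the location where it is saved is displayed. Use FTP, SFTP, or SCP to connect to that location and download the log file to a local system.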
-
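After the file is downloaded, you can attach it to the Keystone ServiceNow support ticket. For information about raising tickets, see "Generating service requests".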