Contributors netapp-alavoie

Cloud Insights uses this data collector to gather metrics from Kafka.

Installation

  1. From Admin > Data Collectors, click +Data Collector. Under Services, choose Kafka.

    Select the Operating System or Platform on which the Telegraf agent is installed.

  2. If you haven’t already installed an Agent for collection, or you wish to install an Agent for a different Operating System or Platform, click Show Instructions to expand the Agent installation instructions.

  3. Select the Agent Access Key for use with this data collector. You can add a new Agent Access Key by clicking the + Agent Access Key button. Best practice: Use a different Agent Access Key only when you want to group data collectors, for example, by OS/Platform.

  4. Follow the configuration steps to configure the data collector. The instructions vary depending on the type of Operating System or Platform you are using to collect data.

Kafka configuration

Setup

The Kafka plugin is based on the telegraf’s Jolokia plugin. As such as a requirement to gather info from all Kafka brokers, JMX needs to be configured and exposed via Jolokia on all components.

Compatibility

Configuration was developed against Kafka version 0.11.0.2.

Setting up

All the instructions below assume your install location for kafka is '/opt/kafka'. You can adapt instructions below to reflect your install location.

Jolokia Agent Jar

A version the Jolokia agent jar file must be downloaded. The version tested against was Jolokia agent 1.6.0.

Instructions below assume that the downloaded jar file (jolokia-jvm-1.6.0-agent.jar) is placed under the location '/opt/kafka/libs/'.

Kafka Brokers

To configure Kafka Brokers to expose the Jolokia API, you can add the following in <KAFKA_HOME>/bin/kafka-server-start.sh, just before the 'kafka-run-class.sh' call:

export JMX_PORT=9999
export RMI_HOSTNAME=`hostname -I`
export KAFKA_JMX_OPTS="-javaagent:/opt/kafka/libs/jolokia-jvm-1.6.0-agent.jar=port=8778,host=0.0.0.0  -Dcom.sun.management.jmxremote.password.file=/opt/kafka/config/jmxremote.password -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=$RMI_HOSTNAME -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"

Note that example above is using 'hostname -I' to setup the 'RMI_HOSTNAME' environment variable. In multiple IP machines, this will need to be tweaked to gather the IP you care about for RMI connections.

You can choose a different port for JMX (9999 above) and Jolokia (8778). If you have an internal IP to lock Jolokia onto you can replace the "catch all" 0.0.0.0 by your own IP. Notice this IP needs to be accessible from the telegraf plugin. You can use the option '-Dcom.sun.management.jmxremote.authenticate=false' if you don’t want to authenticate. Use at your own risk.

Objects and Counters

The following objects and their counters are collected:

Object: Identifiers: Attributes: Datapoints:

Kafka Broker

Cluster
Namespace
Broker

Node Name
Node IP

Replica Manager Fetcher Max Lag
Zookeeper Client Connections
Zookeeper Client Connections (15m rate)
Zookeeper Client Connections (5m rate)
Zookeeper Client Connections (mean rate)
Zookeeper Client Connections (1m rate)
Replica Manager Partition Count
Thread Count Daemon
Thread Count Peak
Thread Count Current
Thread Count Total Started
Offline Partitions
Produce Requests Total Time (50th Percentile)
Produce Requests Total Time (75th Percentile)
Produce Requests Total Time (95th Percentile)
Produce Requests Total Time (98 Percentile)
Produce Requests Total Time (999th Percentile)
Produce Requests Total Time (99th Percentile)
Produce Requests Total Time
Produce Requests Total Time Max
Produce Requests Total Time Mean
Produce Requests Total Time Min
Produce Requests Total Time Stddev
Replica Manager ISR Shrinks
Replica Manager ISR Shrinks (15m rate)
Replica Manager ISR Shrinks (5m rate)
Replica Manager ISR Shrinks (mean rate)
Replica Manager ISR Shrinks (1m rate)
Request Handler Avg Idle
Request Handler Avg Idle (15m rate)
Request Handler Avg Idle (5m rate)
Request Handler Avg Idle (mean rate)
Request Handler Avg Idle (1m rate)
Garbage Collection G1 Old Generation Count
Garbage Collection G1 Old Generation Time
Garbage Collection G1 Young Generation Count
Garbage Collection G1 Young Generation Time
Zookeeper Read Only Connects
Zookeeper Read Only Connects (15m rate)
Zookeeper Read Only Connects (5m rate)
Zookeeper Read Only Connects (mean rate)
Zookeeper Read Only Connects (1m rate)
Network Processor Avg Idle
Requests Fetch Follower Total Time (50th percentile)
Requests Fetch Follower Total Time (75th percentile)
Requests Fetch Follower Total Time (95th percentile)
Requests Fetch Follower Total Time (98th percentile)
Requests Fetch Follower Total Time (999th percentile)
Requests Fetch Follower Total Time (99th percentile)
Requests Fetch Follower Total Time
Requests Fetch Follower Total Time Max
Requests Fetch Follower Total Time Mean
Requests Fetch Follower Total Time Min
Requests Fetch Follower Total Time Stddev
Requests Waiting in Produce Purgatory
Network Requests Fetch Consumer
Network Requests Fetch Consumer (5m rate)
Network Requests Fetch Consumer (15m rate)
Network Requests Fetch Consumer (mean rate)
Network Requests Fetch Consumer (1m rate)
Unclean Leader Elections
Unclean Leader Elections (15m rate)
Unclean Leader Elections (5m rate)
Unclean Leader Elections (mean rate)
Unclean Leader Elections (1m rate)
Active Controllers
Heap Memory Committed
Heap Memory Init
Heap Memory Max
Heap Memory Used
Zookeeper Session Expires
Zookeeper Session Expires (15m rate)
Zookeeper Session Expires (5m rate)
Zookeeper Session Expires (mean rate)
Zookeeper Session Expires (1m rate)
Zookeeper Authentication Failures
Zookeeper Authentication Failures (15m rate)
Zookeeper Authentication Failures (5m rate)
Zookeeper Authentication Failures (mean rate)
Zookeeper Authentication Failures (1m rate)
Leader Election Time (50th percentile)
Leader Election Time (75th percentile)
Leader Election Time (95th percentile)
Leader Election Time (98th percentile)
Leader Election Time (999th percentile)
Leader Election Time (99th percentile)
Leader Election Count
Leader Election Time (15m rate)
Leader Election Time (5m rate)
Leader Election Time Max
Leader Election Time Mean
Leader Election Time (mean rate)
Leader Election Time Min
Leader Election Time (1m rate)
Leader Election Time (stddev)
Network Requests Fetch Follower
Network Requests Fetch Follower (15m rate)
Network Requests Fetch Follower (5m rate)
Network Requests Fetch Follower (mean rate)
Network Requests Fetch Follower (1m rate)
Broker Topic Messages
Broker Topic Messages (15m rate)
Broker Topic Messages (5m rate)
Broker Topic Messages (mean rate)
Broker Topic Messages (1m rate)
Broker Topic Bytes In
Broker Topic Bytes In (15m rate)
Broker Topic Bytes In (5m rate)
Broker Topic Bytes In (mean rate)
Broker Topic Bytes In (1m rate)
Zookeeper Disconnects Count
Zookeeper Disconnects (15m rate)
Zookeeper Disconnects (5m rate)
Zookeeper Disconnects (mean rate)
Zookeeper Disconnects (1m rate)
Network Requests Fetch Consumer Total Time (50th percentile)
Network Requests Fetch Consumer Total Time (75th percentile)
Network Requests Fetch Consumer Total Time (95th percentile)
Network Requests Fetch Consumer Total Time (98th percentile)
Network Requests Fetch Consumer Total Time (999th percentile)
Network Requests Fetch Consumer Total Time (99th percentile)
Network Requests Fetch Consumer Total Time
Network Requests Fetch Consumer Total Time Max
Network Requests Fetch Consumer Total Time Mean
Network Requests Fetch Consumer Total Time Min
Network Requests Fetch Consumer Total Time Stddev
LeaderCount
Requests Waiting in Fetch Purgatory
Broker Topic Bytes Out
Broker Topic Bytes Out (15m rate)
Broker Topic Bytes Out (5m rate)
Broker Topic Bytes Out (mean rate)
Broker Topic Bytes Out (1m rate)
Zookeeper Authentications
Zookeeper Authentications (15m rate)
Zookeeper Authentications (5m rate)
Zookeeper Authentications (mean rate)
Zookeeper Authentications (1m rate)
Requests Produce Count
Requests Produce (15m rate)
Requests Produce (5m rate)
Requests Produce (mean rate)
Requests Produce (1m rate)
Replica Manager ISR Expands
Replica Manager ISR Expands (15m rate)
Replica Manager ISR Expands (5m rate)
Replica Manager ISR Expands (mean rate)
Replica Manager ISR Expands (1m rate)
Replica Manager Under Replicated Partitions

Troubleshooting

Additional information may be found from the Support page.