Existing Kafka lag exporters are either broken, slow, or only send metrics to InfluxDB. Yakle is my attempt to write my own. It is inspired by burrowx, but simplified, with the most robust logic I could think of. Yakle exports essentially the same set of metrics as danielqsj/kafka_exporter and kminion. The dashboards are based on kminion's, slightly extended.

One notable feature compared to other exporters is that yakle can report not only offset lag but also time lag (real time lag, not interpolated). Be aware that this only works for non-compacted topics and that this feature is relatively slow, because it needs to fetch a lot of individual offsets.

Yakle is production tested and has been running for months in our environment (dozens of Kafka clusters, hundreds of brokers/topics/groups with many partitions), where it has proven fast enough. Parallelism is set to 10 workers by default. You can try increasing it, but in our measurements this was not faster and it puts a lot of pressure on the Kafka cluster.
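
The bounded parallelism described above can be pictured as a fixed pool of workers draining a queue of tasks. This is only a sketch of the general pattern, not yakle's actual code: `fetchAll`, the string task type, and the fake fetch function are hypothetical names, and the fetch function stands in for a real Kafka request.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchAll runs fetch for every task using at most `workers` goroutines
// and returns the results keyed by task name. All names are illustrative.
func fetchAll(tasks []string, workers int, fetch func(string) int64) map[string]int64 {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	results := make(map[string]int64, len(tasks))
	var mu sync.Mutex
	var wg sync.WaitGroup

	// Start a fixed number of workers; each one pulls tasks until the channel closes.
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for task := range jobs {
				v := fetch(task) // stand-in for a Kafka offset request
				mu.Lock()
				results[task] = v
				mu.Unlock()
			}
		}()
	}

	// Feed the queue, then wait for the workers to finish.
	for _, t := range tasks {
		jobs <- t
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	tasks := []string{"orders/0", "orders/1", "payments/0"}
	// Fake fetch: returns the length of the task name instead of a real offset.
	res := fetchAll(tasks, 10, func(t string) int64 { return int64(len(t)) })
	fmt.Println(len(res), res["orders/0"])
}
```

With this pattern, raising the worker count only increases the number of requests in flight at once, which is consistent with the observation that more workers mostly add load on the brokers.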
Usage of ./yakle:
-kafka-brokers="localhost:9092": address list of kafka brokers to connect to
-kafka-fetch-timestamp=false: enable timestamps calculation
-kafka-label="kafka-cluster": kafka cluster name for labeling metrics
-kafka-workers=10: number of parallel workers for fetching metrics
-log-debug=false: enable debug and sarama logging
-refresh-interval=30: interval for refreshing metrics
-topic-filter="^__.*": regex for excluding topics, default to internal topics
-group-filter="^__.*": regex for excluding groups, default to internal groups
-web-listen-address=":8080": address (host:port) to listen on for telemetry
-web-telemetry-path="/metrics": path under which to expose metrics
Flags can also be passed as environment variables.
A Docker image is available on Docker Hub: ut0mt8/yakle:latest

Labels:

- cluster: Cluster name
- topic: Topic name
- partition: Partition ID
- group: Consumer group name


Cluster and broker metrics:

| Metric | Description |
|---|---|
| kafka_cluster_info{cluster, broker_count, controller_id, group_count, topic_count} | General information about the cluster |
| kafka_broker_info{cluster, broker_id, address, is_controller, rack_id} | Information about a given broker |


Topic metrics:

| Metric | Description |
|---|---|
| kafka_topic_info{cluster, topic, partition_count, replication_factor} | Information about a given topic |
| kafka_topic_broker_logdir_size{cluster, topic, broker, path} | Logdir size for a given topic/broker |


Partition metrics:

| Metric | Description |
|---|---|
| kafka_topic_partition_info{cluster, topic, partition, leader, replicas, insync_replicas} | Information about a given topic/partition |
| kafka_topic_partition_not_preferred{cluster, topic, partition} | Boolean indicating whether the leader of a given topic/partition is not its preferred broker |
| kafka_topic_partition_under_replicated{cluster, topic, partition} | Boolean indicating whether a given topic/partition is under-replicated (not all replicas are in sync) |
| kafka_topic_partition_newest_offset{cluster, topic, partition} | Latest committed offset for a given topic/partition |
| kafka_topic_partition_oldest_offset{cluster, topic, partition} | Oldest offset available for a given topic/partition |
| kafka_topic_partition_oldest_time{cluster, topic, partition} | Timestamp in ms of the oldest offset available for a given topic/partition |

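
The two boolean partition metrics can be derived from the leader, the replica list, and the in-sync replica list. The sketch below shows the usual Kafka definitions (the preferred leader is the first broker in the replica assignment); the `partition` struct and function names are hypothetical and not taken from yakle's source.

```go
package main

import "fmt"

// partition holds the fields needed for the checks (hypothetical struct).
type partition struct {
	Leader   int32
	Replicas []int32 // replica assignment; the first entry is the preferred leader
	ISR      []int32 // in-sync replicas
}

// notPreferred reports whether the current leader is not the preferred replica.
func notPreferred(p partition) bool {
	return len(p.Replicas) > 0 && p.Leader != p.Replicas[0]
}

// underReplicated reports whether some replicas are missing from the ISR.
func underReplicated(p partition) bool {
	return len(p.ISR) < len(p.Replicas)
}

func main() {
	healthy := partition{Leader: 1, Replicas: []int32{1, 2, 3}, ISR: []int32{1, 2, 3}}
	degraded := partition{Leader: 2, Replicas: []int32{1, 2, 3}, ISR: []int32{2, 3}}

	fmt.Println(notPreferred(healthy), underReplicated(healthy))
	fmt.Println(notPreferred(degraded), underReplicated(degraded))
}
```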

Consumer group metrics:

| Metric | Description |
|---|---|
| kafka_group_info{cluster, group, coordinator_id, state, members_count} | Information about a given group |
| kafka_group_topic_partition_current_offset{cluster, group, topic, partition} | Current offset for a given group/topic/partition |
| kafka_group_topic_partition_offset_lag{cluster, group, topic, partition} | Offset lag for a given group/topic/partition |
| kafka_group_topic_partition_time_lag{cluster, group, topic, partition} | Time lag (in ms) for a given group/topic/partition |

- Add unit tests...