Collect consumer metrics #1143

erikvanoosten · 2023-12-24T19:34:49Z

Collect metrics for the consumer using the zio-metrics API. This allows any zio-metrics backend to access and process the observed values.

By default, no tags are added, but this can be configured via the new method ConsumerSettings.withMetricsLabels.

The following metrics are collected (kudos to @svroonland for most of the ideas):

- Poll metrics: poll count (counter), number of records per poll (histogram), poll latency (histogram).
- Partition stream metrics: queue size per partition (histogram), total queue size per consumer (histogram), number of polls during which records sit idle in the queue (histogram).
- The number of paused and resumed partitions (gauge).
- Rebalance metrics: currently assigned partitions count (gauge), assigned/revoked/lost partitions (counter).
- Commit metrics: commit count (counter), commit latency (histogram). These metrics measure commit requests issued through zio-kafka's API.
- Aggregated commit metrics: commit count (counter), commit latency (histogram), commit size (number of offsets per commit) (histogram). After every poll, zio-kafka combines all outstanding commit requests into one aggregated commit. These metrics are for the aggregated commits.
- Number of entries in the command and commit queues (histogram).
- Subscription state: 1 for subscribed, 0 for unsubscribed (gauge).
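The aggregation step behind the aggregated commit metrics can be pictured as a per-partition merge. The following is a minimal, stdlib-only sketch, not zio-kafka's Runloop code: it assumes each commit request is a map from partition to offset and that, per partition, the highest requested offset wins. `TopicPartition` is a hypothetical stand-in for Kafka's class, and `CommitAggregation` is a hypothetical name.

```scala
// Hypothetical stand-in for org.apache.kafka.common.TopicPartition,
// so that this sketch has no Kafka dependency.
final case class TopicPartition(topic: String, partition: Int)

object CommitAggregation {

  // Merges all outstanding commit requests into a single commit.
  // For each partition only the highest requested offset is kept,
  // because committing a higher offset supersedes the lower ones.
  def aggregate(requests: Seq[Map[TopicPartition, Long]]): Map[TopicPartition, Long] =
    requests.foldLeft(Map.empty[TopicPartition, Long]) { (acc, request) =>
      request.foldLeft(acc) { case (merged, (tp, offset)) =>
        merged.updated(tp, merged.get(tp).fold(offset)(existing => math.max(existing, offset)))
      }
    }

  // "Commit size" as in the metric: the number of offsets in the aggregated commit.
  def commitSize(aggregated: Map[TopicPartition, Long]): Int = aggregated.size
}
```

Because the commit size counts distinct partitions, many requests for the same partition still yield an aggregated commit of size 1, which is why the aggregated commit metrics can differ from the per-request commit metrics.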

Like the zio-metrics API, we follow Prometheus conventions. This means that:

- durations are expressed in seconds,
- counters can only increase,
- metric names use snake_case and end in the unit where possible.
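The three conventions can be illustrated with small helpers. This is a hedged sketch, not zio-kafka or ZIO code: `PromConventions`, `snakeCase`, `metricName` and `Counter` are hypothetical names, and the example name `poll_latency_seconds` is illustrative rather than an actual zio-kafka metric name.

```scala
import java.time.Duration
import java.util.Locale

object PromConventions {

  // Durations are exported as (fractional) seconds, not milliseconds.
  def toSeconds(d: Duration): Double = d.toNanos.toDouble / 1e9

  // Converts a camelCase identifier to snake_case, e.g. "pollLatency" => "poll_latency".
  def snakeCase(s: String): String =
    s.replaceAll("([a-z0-9])([A-Z])", "$1_$2").toLowerCase(Locale.ROOT)

  // Metric names end in their unit, e.g. ("pollLatency", "seconds") => "poll_latency_seconds".
  def metricName(camel: String, unit: String): String = s"${snakeCase(camel)}_$unit"

  // A counter that can only increase: negative increments are rejected.
  final class Counter {
    private var value: Double = 0.0
    def increment(by: Double = 1.0): Unit = {
      require(by >= 0, "counters can only increase")
      value += by
    }
    def current: Double = value
  }
}
```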

The histograms each use 10 buckets. To reach a decent range while keeping sufficient accuracy at the low end, most bucket boundaries use an exponential series based on 𝑒.
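To make the bucket layout concrete, here is a stdlib-only sketch of 10 boundaries where each one is 𝑒 times the previous. The start value of 0.01 (seconds) used below is an assumption for illustration; the actual start value per histogram is not stated here. In ZIO 2, `MetricKeyType.Histogram.Boundaries.exponential(start, factor, count)` should produce a similar series when called with `factor = math.E`.

```scala
object ExponentialBuckets {

  // Returns `count` bucket boundaries: start, start*e, start*e^2, ...
  // Consecutive boundaries always differ by a factor of e (about 2.718),
  // so small values get narrow buckets and large values get wide ones.
  def boundaries(start: Double, count: Int = 10): Vector[Double] = {
    require(start > 0 && count > 0, "start and count must be positive")
    Vector.tabulate(count)(i => start * math.exp(i.toDouble))
  }
}
```

With that start value the boundaries run from 0.01 s to about 81 s (0.01 × 𝑒⁹), whereas 10 linear buckets of width 0.01 s would stop at 0.1 s. That is the trade-off described above: fine resolution at the low end and still a wide overall range.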

The following metric ideas were also raised, but these are kept for future work:

- histogram of number of records returned in a poll, tagged per topic-partition,
- number of records ignored (see PollResult),
- number of in-flight records (last fetched offset - last committed offset), per partition or perhaps just the raw fetched and committed offsets per partition.
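The in-flight idea boils down to a subtraction per partition; this sketch says nothing about how the offsets would be collected inside the consumer. It is hypothetical (`InFlight`, `TopicPartition` and both input maps are illustrative) and assumes both maps hold offsets of records, not Kafka's "next offset to consume" convention for committed offsets, which would shift each result by one.

```scala
// Hypothetical stand-in for org.apache.kafka.common.TopicPartition.
final case class TopicPartition(topic: String, partition: Int)

object InFlight {

  // In-flight records per partition: last fetched offset minus last committed offset.
  // Partitions that have no committed offset yet are left out.
  def perPartition(
    lastFetched: Map[TopicPartition, Long],
    lastCommitted: Map[TopicPartition, Long]
  ): Map[TopicPartition, Long] =
    lastFetched.flatMap { case (tp, fetched) =>
      lastCommitted.get(tp).map(committed => tp -> (fetched - committed))
    }

  // A single per-consumer number, avoiding one metric per partition.
  def total(lastFetched: Map[TopicPartition, Long], lastCommitted: Map[TopicPartition, Long]): Long =
    perPartition(lastFetched, lastCommitted).values.sum
}
```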

svroonland

Nice, we should definitely have this!

I added suggestions for additional metrics. Let's think about the histogram dimension.

zio-kafka/src/main/scala/zio/kafka/consumer/ConsumerSettings.scala

zio-kafka/src/main/scala/zio/kafka/consumer/internal/ConsumerMetrics.scala

zio-kafka/src/main/scala/zio/kafka/consumer/ConsumerSettings.scala

erikvanoosten · 2024-01-06T11:41:23Z

@svroonland I would like to stop here and merge the PR. The following metrics from your list were not implemented:

- histogram of number of records returned in a poll, tagged per topic-partition
  Tagging this per partition generates too many metrics. Therefore, this is only available as a histogram for all partitions combined.
- number of records ignored (see PollResult)
  We only ignore records just before a seek, when offsets are managed manually. I am not sure this is an interesting metric.
- number of in-flight records (last fetched offset - last committed offset), per partition or perhaps just the raw fetched and committed offsets per partition
  An interesting idea, but the first variant is not easy to implement and the second one generates too many metrics.
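The "too many metrics" argument can be quantified. In the Prometheus exposition format, a histogram with b finite bucket boundaries produces b + 1 `_bucket` series (including `le="+Inf"`) plus `_sum` and `_count` for each label set. The sketch assumes the 10 buckets mentioned in the description are 10 finite boundaries; `Cardinality` is a hypothetical helper.

```scala
object Cardinality {

  // Series exposed by one Prometheus histogram for a single label set:
  // one `_bucket` series per finite boundary, one for le="+Inf",
  // plus `_sum` and `_count`.
  def seriesPerHistogram(finiteBoundaries: Int): Int = finiteBoundaries + 1 + 2

  // Tagging the histogram per topic-partition multiplies that by the partition count.
  def seriesWhenTaggedPerPartition(finiteBoundaries: Int, partitions: Int): Int =
    seriesPerHistogram(finiteBoundaries) * partitions
}
```

For example, a consumer assigned 200 partitions would expose 13 × 200 = 2600 series for that one histogram instead of 13.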

Your review and ideas are welcome and valued as always.

svroonland · 2024-01-06T18:49:44Z

Of course, we can always implement the other metrics as follow-ups; consider them more as ideas. Will have a look!

svroonland

Great additions. More to consider in the comments.

zio-kafka/src/main/scala/zio/kafka/consumer/internal/ConsumerMetrics.scala

zio-kafka/src/main/scala/zio/kafka/consumer/internal/Runloop.scala

zio-kafka/src/main/scala/zio/kafka/consumer/internal/ConsumerMetrics.scala

svroonland · 2024-01-20T19:15:45Z

zio-kafka/src/main/scala/zio/kafka/consumer/internal/Runloop.scala

  maxPollInterval: Duration,
-  commitTimeout: Duration,


Good to have fewer parameters here.

svroonland

One tiny thing to think about, but otherwise looks great!

zio-kafka/src/main/scala/zio/kafka/consumer/internal/Runloop.scala

Collect consumer metrics

cb595dc

erikvanoosten requested review from guizmaii and svroonland December 24, 2023 19:34

erikvanoosten added 2 commits December 25, 2023 08:26

Improvements:

8ea6721

- add metric `allQueueSizeHistogram` - better metrics descriptions - better bucket boundaries - observe metrics in the background

Update metrics names to zio naming conventions

0b220dc

svroonland reviewed Dec 30, 2023

erikvanoosten added 2 commits December 30, 2023 19:50

Use group instance id if possible

b1803d2

As last fallback use hash code of ConsumerSettings instead of random value.

Give user full control of metric labels

fa3d780

svroonland reviewed Dec 31, 2023

zio-kafka/src/main/scala/zio/kafka/consumer/ConsumerSettings.scala (outdated, resolved)

erikvanoosten added 12 commits December 31, 2023 10:48

Add hint about using group id as label

3ad1fda

Add metric for the number of polls records are idling in the queue

15a0326

Change metric In type to Int iso Double

12ac387

Collect rebalance metrics

f9d9abb

Collect poll metrics

9fbaab1

Collect commit metrics (WIP)

a3e0bad

Collect partition stream metrics on a fixed schedule

d02480b

Complete commit metrics

3bdfe0e

Add resume/pause metrics

392033a

Add command/commit queue metrics

bd7a439

Add command/commit queue metrics

aceb325

Fix tests, simplify runloop metrics

749b642

Minimize code diff

af097d3

svroonland reviewed Jan 13, 2024

erikvanoosten added 5 commits January 14, 2024 09:31

Merge branch 'master' into consumer-metrics

ce6d0df

Address review comments

f137a70

Use ZIO's timed, and make runloopMetricsSchedule configurable

90ff904

Also: simplify runloop construction.

Fix for scala 3

0737fe3

Prepare for user customization

3f3e2c7

erikvanoosten mentioned this pull request Jan 16, 2024

partitionsFor method missing from zio.kafka.producer.Producer #1146

Closed

erikvanoosten added 5 commits January 18, 2024 20:46

Merge branch 'master' into consumer-metrics

624a8c4

Merge branch 'master' into consumer-metrics

4812592

Measure commit requests _and_ aggregated commits

8cb0552

Latency of aggregated commits no longer includes the lead time from commit request to start of commit. Also: use unit in metric name as recommended by Prometheus guide.

Fix typo in comment

05810df

Better metric descriptions

06e2c3d

svroonland reviewed Jan 20, 2024

svroonland approved these changes Jan 20, 2024

zio-kafka/src/main/scala/zio/kafka/consumer/internal/Runloop.scala (resolved)

erikvanoosten merged commit 7b2093e into master Jan 21, 2024
14 checks passed

erikvanoosten deleted the consumer-metrics branch January 21, 2024 07:10

erikvanoosten mentioned this pull request Jan 21, 2024

Make consumer metric collection more configurable #1153

Open
