
chore(pubsub): additional buckets to consume_latency prom metrics #697

Merged: juanamari94 merged 1 commit into master from chore/pubsub-buckets on Mar 14, 2024

Conversation

juanamari94 (Contributor)

Summary

We're increasing the number of buckets in
`gcloud_aio_pubsub_subscriber_consume_latency_seconds_bucket` to enable better tracking of tail-end latencies.
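
For context, Prometheus histogram buckets are fixed at declaration time, and each bucket counts observations at or below its upper bound, so adding upper bounds is what buys resolution on the tail. A minimal sketch of the kind of declaration being changed here, using `prometheus_client` (the namespace/subsystem values and the bucket tuple below are illustrative assumptions, not necessarily the exact code in this repo):

```python
from prometheus_client import Histogram

# Illustrative declaration: the real metric in this repo is named
# gcloud_aio_pubsub_subscriber_consume_latency_seconds; the namespace,
# subsystem, and bucket values here are assumptions for the sketch.
CONSUME_LATENCY = Histogram(
    'consume_latency',
    'Time spent consuming a pubsub message',
    namespace='gcloud_aio',
    subsystem='pubsub_subscriber',
    unit='seconds',
    buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0,
             20.0, 30.0, float('inf')),
)

# Each observation increments every bucket whose bound is >= the value,
# so the spacing of the upper buckets determines tail-latency resolution.
CONSUME_LATENCY.observe(12.3)
```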

@juanamari94 juanamari94 requested a review from a team as a code owner March 13, 2024 18:00
@juanamari94 juanamari94 requested review from leanaha and removed request for a team March 13, 2024 18:00
@@ -35,6 +35,8 @@
namespace=_NAMESPACE,
subsystem=_SUBSYSTEM,
unit='seconds',
buckets=(.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5,

Member:

Can probably trim off some of these smaller ones; that level of granularity is going to be dominated by transit time anyway. Maybe just do something like:

buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 30.0, float('inf'))

Contributor Author:

Done!

Contributor:

nit: while you're here, float('inf') is unnecessary

Contributor Author:

Is it? I've seen it used in the Prometheus libraries themselves as well.

Contributor:

Yeah, we leave it out elsewhere and still receive values for those buckets. Interesting though, do you have a link? I only mention it to avoid confusion in the places where we've omitted it, but it's fine if we want to keep it here to be explicit.

edit: for reference https://github.com/prometheus/client_python/blob/4535ce0f43097aa48e44a65747d82064f2aadaf5/prometheus_client/metrics.py#L618-L619

But I also see they explicitly mention inf here: https://github.com/prometheus/client_python/blob/4535ce0f43097aa48e44a65747d82064f2aadaf5/prometheus_client/metrics.py#L586

I conclude with no conclusion.

Contributor Author:

The second link is the one I saw.

Member:

I believe they always add an inf bucket at the end if it's missing, so including it ourselves is a noop/stylistic choice.
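
A quick way to check that behavior (a small sketch; `_upper_bounds` is a private attribute, used here only for inspection):

```python
from prometheus_client import Histogram

# Declare buckets without an explicit infinity; client_python appends
# float('inf') automatically when it's missing from the tuple.
h = Histogram('demo', 'inf-bucket check', unit='seconds',
              buckets=(.01, .1, 1.0))
print(h._upper_bounds)  # [0.01, 0.1, 1.0, inf]
```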

@juanamari94 juanamari94 force-pushed the chore/pubsub-buckets branch 2 times, most recently from cf40913 to b2e0f9b Compare March 13, 2024 18:54
@@ -35,6 +35,8 @@
namespace=_NAMESPACE,
subsystem=_SUBSYSTEM,
unit='seconds',
buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0,


The default buckets seem to be:

from prometheus_client import Histogram

# Create a Histogram without specifying buckets to see the default ones
histogram = Histogram('request_duration_seconds', 'Description of histogram', unit='seconds')

# Access the default buckets (_upper_bounds is a private attribute,
# used here only for inspection)
default_buckets = histogram._upper_bounds
print(default_buckets)

Results:

[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, inf]

Given the 2-plus minutes we're seeing reported, we might even need bigger buckets, say 60, 90, and 120.

Contributor:

I'd be fine going up to 2 minutes as well. I'm not sure the 90-second bucket is necessarily useful: for our purposes, if we're something like 30 seconds behind then we're already way too far behind. I think we just wanted a better chance at capturing more accurate upper bounds (or, perhaps more accurately, non-infinite upper bounds!). Also good to keep in mind this applies to all of our apps, plus the apps of our users who choose to make use of the metrics.
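
If the buckets did get extended toward two minutes, the tail of the tuple might look something like this (purely hypothetical; the values actually merged in this PR may differ):

```python
# Hypothetical bucket tuple extended to two minutes; not the merged change.
buckets = (.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0,
           20.0, 30.0, 60.0, 120.0)
```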

@juanamari94 juanamari94 force-pushed the chore/pubsub-buckets branch 2 times, most recently from 88c1cff to b19c2e9 Compare March 14, 2024 14:26
@juanamari94 juanamari94 changed the title from "chore(pubsub): add 20s, 30s buckets to consume_latency prom metrics" to "chore(pubsub): additional buckets to consume_latency prom metrics" on Mar 14, 2024
We're increasing the number of buckets in
`gcloud_aio_pubsub_subscriber_consume_latency_seconds_bucket` to enable
better tracking of tail-end latencies.
@juanamari94 juanamari94 merged commit 9263a38 into master Mar 14, 2024
85 checks passed
@juanamari94 juanamari94 deleted the chore/pubsub-buckets branch March 14, 2024 14:40