
chore(pubsub): additional buckets to consume_latency prom metrics #697

Merged: juanamari94 merged 1 commit into master from chore/pubsub-buckets on Mar 14, 2024

Conversation

juanamari94 (Contributor)

Summary

We're increasing the number of buckets in
`gcloud_aio_pubsub_subscriber_consume_latency_seconds_bucket` to enable better tracking of tail-end latencies.
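
For context, Prometheus histogram buckets are fixed at declaration time, and each bucket counts observations at or below its upper bound, so adding upper bounds is what buys resolution on the tail. A minimal sketch of the kind of declaration being changed here, using `prometheus_client` (the namespace/subsystem values and the bucket tuple below are illustrative assumptions, not necessarily the exact code in this repo):

```python
from prometheus_client import Histogram

# Illustrative declaration: the real metric in this repo is named
# gcloud_aio_pubsub_subscriber_consume_latency_seconds; the namespace,
# subsystem, and bucket values here are assumptions for the sketch.
CONSUME_LATENCY = Histogram(
    'consume_latency',
    'Time spent consuming a pubsub message',
    namespace='gcloud_aio',
    subsystem='pubsub_subscriber',
    unit='seconds',
    buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0,
             20.0, 30.0, float('inf')),
)

# Each observation increments every bucket whose bound is >= the value,
# so the spacing of the upper buckets determines tail-latency resolution.
CONSUME_LATENCY.observe(12.3)
```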

@juanamari94 juanamari94 requested a review from a team as a code owner March 13, 2024 18:00
@juanamari94 juanamari94 requested review from leanaha and removed request for a team March 13, 2024 18:00
@@ -35,6 +35,8 @@
namespace=_NAMESPACE,
subsystem=_SUBSYSTEM,
unit='seconds',
buckets=(.005, .01, .025, .05, .075, .1, .25, .5, .75, 1.0, 2.5,

Member:

Can probably trim off some of these smaller ones; that level of granularity is going to be dominated by transit time anyway. Maybe just do something like:

buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0, 30.0, float('inf'))

Contributor Author:

Done!

Contributor:

nit: while you're here, float('inf') is unnecessary

Contributor Author:

Is it? I've seen it used in the Prometheus libraries themselves as well.

Contributor:

Yeah, we leave it out elsewhere and still receive values for those buckets. Interesting though, do you have a link? I only mention it to avoid confusion in the places where we've omitted it, but it's fine if we want to keep it here to be explicit.

edit: for reference https://github.com/prometheus/client_python/blob/4535ce0f43097aa48e44a65747d82064f2aadaf5/prometheus_client/metrics.py#L618-L619

But I also see they explicitly mention inf here: https://github.com/prometheus/client_python/blob/4535ce0f43097aa48e44a65747d82064f2aadaf5/prometheus_client/metrics.py#L586

I conclude with no conclusion.

Contributor Author:

The second link is the one I saw.

Member:

I believe they always add an inf bucket at the end if it's missing, so including it ourselves is a noop/stylistic choice.
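
A quick way to check that behavior (a small sketch; `_upper_bounds` is a private attribute, used here only for inspection):

```python
from prometheus_client import Histogram

# Declare buckets without an explicit infinity; client_python appends
# float('inf') automatically when it's missing from the tuple.
h = Histogram('demo', 'inf-bucket check', unit='seconds',
              buckets=(.01, .1, 1.0))
print(h._upper_bounds)  # [0.01, 0.1, 1.0, inf]
```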

@juanamari94 juanamari94 force-pushed the chore/pubsub-buckets branch 2 times, most recently from cf40913 to b2e0f9b Compare March 13, 2024 18:54
@@ -35,6 +35,8 @@
namespace=_NAMESPACE,
subsystem=_SUBSYSTEM,
unit='seconds',
buckets=(.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0, 20.0,


The default buckets seem to be:

from prometheus_client import Histogram

# Create a Histogram without specifying buckets to see the default ones
histogram = Histogram('request_duration_seconds', 'Description of histogram', unit='seconds')

# Access the default buckets (_upper_bounds is a private attribute,
# used here only for inspection)
default_buckets = histogram._upper_bounds
print(default_buckets)

Results:

[0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5, 0.75, 1.0, 2.5, 5.0, 7.5, 10.0, inf]

Given the 2-plus minutes we're seeing reported, we might even need bigger buckets, say 60, 90, and 120.

Contributor:

I'd be fine going up to 2 minutes as well. I'm not sure the 90-second bucket is necessarily useful: for our purposes, if we're something like 30 seconds behind then we're already way too far behind. I think we just wanted a better chance at capturing more accurate upper bounds (or, perhaps more accurately, non-infinite upper bounds!). Also good to keep in mind this applies to all of our apps, plus the apps of our users who choose to make use of the metrics.
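
If the buckets did get extended toward two minutes, the tail of the tuple might look something like this (purely hypothetical; the values actually merged in this PR may differ):

```python
# Hypothetical bucket tuple extended to two minutes; not the merged change.
buckets = (.01, .1, .25, .5, 1.0, 2.5, 5.0, 7.5, 10.0,
           20.0, 30.0, 60.0, 120.0)
```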

@juanamari94 juanamari94 force-pushed the chore/pubsub-buckets branch 2 times, most recently from 88c1cff to b19c2e9 Compare March 14, 2024 14:26
@juanamari94 juanamari94 changed the title from "chore(pubsub): add 20s, 30s buckets to consume_latency prom metrics" to "chore(pubsub): additional buckets to consume_latency prom metrics" on Mar 14, 2024
We're increasing the number of buckets in
`gcloud_aio_pubsub_subscriber_consume_latency_seconds_bucket` to enable
better tracking of tail-end latencies.
@juanamari94 juanamari94 merged commit 9263a38 into master Mar 14, 2024
85 checks passed
@juanamari94 juanamari94 deleted the chore/pubsub-buckets branch March 14, 2024 14:40