feat(prometheus sink): Support histograms #675
|
This PR draft is a playground for discussing the UX of metrics pipelines. Specifically, it touches
The intermediate result is that using this config we can transform such log content into such Prometheus plain text exposition |
|
Nice! Are there any specific issues you want us to look at? The config looks great to me, but we can have Luke or Lucio do a review on the direction if it helps. |
|
Now the issues and questions.
Prometheus HELP sections
Adding meta information about the metric seems like a good thing to allow in configs. The question is whether it should be part of metric source,
Labels
Labels are imperative for Prometheus UX. It's hard to imagine how we could express typical Prom patterns like e.g. status code counter (
Sets
Unlike Statsd, Prometheus doesn't support
In general, the idea of handling metric type coercions, implicit or explicit, sounds a bit controversial to me. |
Individual buckets configs
Bucket configuration needs to be done in sinks, and it could be done similar to this. However, I'm not sure this config is ergonomic, mainly because:
|
|
@binarylogic Yeah, need some help to spec this out. |
|
@loony-bean Great questions! I'm not sure of all the answers, but here are my thoughts:
I think the main question here is whether or not this information is specific to prometheus. If it is, any configuration of it should probably be limited to the prometheus sink. But we should also think about the overall user experience here. My suspicion is that this would require a lot of time spent writing configuration before providing much value to the user. And at first glance, it doesn't seem as though tools like https://github.com/prometheus/statsd_exporter have decided it's worthwhile to implement anything like this.
Labels were definitely one of the next things I think we should tackle in the metrics area. I always try to split up tasks into small iterative steps, but if you think it would be better to do labels alongside this work feel free to go ahead. I just don't want any individual change to snowball too much and become hard to review.
Yeah, sets are kinda funky. The simplest thing we could do is simply ignore them, which would be totally valid for this early stage. We've also been talking about trying to drop our dependency on the prometheus crate. It doesn't seem well suited to our use case, and we've experimented with leaving it out of some of our internal metrics implementations. If that would make this easier, feel free to explore that route as well.
I'm not sure I totally understand this example. Do we change the name of the field from
We could provide the option of giving a default bucketing for histograms. That would cut down on the repetitiveness if the user wants the same behavior for most fields. If we wanted to be fancier, we could let them define named bucketings and then just specify the name for each metric. What do you mean by different types for the same thing? |
|
Thanks for input @lukesteensen! Re HELP, Labels, Sets - all clear, will ignore it for now.
Yes, we have a special logic of adding a suffix to counter names. Maybe we can adjust this a bit to simply use the field name as is, like you said? /cc @binarylogic |
Using this config as an example:
In here user can try to connect
The general question is whether we are going to allow the user to convert an internal metric type (internal histogram) into different sink types (prometheus histogram/summary), and how the user should be guided. |
|
I see what you're saying. This seems to boil down to how the sink is configured. I think we should maintain the current prometheus sink behavior of aggregating all the metrics it receives without requiring an explicit configuration for each metric. This makes the sink much easier to use by reducing the burden of writing configs, especially in use cases with many different metrics. Obviously, we'll still need a way for users to specify how certain metrics should be aggregated. I think our approach should be layered here, to try to minimize the amount of config a user would need to write:
Not all of this should be implemented right away, of course. Starting with (1) would be very reasonable, and we could add (2) and (3) as their own followup features. To get back to your question, I think the answer is that we should try to design our configuration to avoid that problem as much as possible. With the design above, nothing necessarily has to match correctly for the system to work. It will do a reasonable default thing on its own, and you have the ability to layer patterned overrides on top. If an override doesn't match an incoming metric, it will simply be handled with the default behavior. |
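The fallback behavior described above (defaults first, patterned overrides on top, unmatched metrics handled by the default) could be sketched roughly as follows. This is only an illustration: `HistogramSettings`, `BucketOverride`, and prefix matching are hypothetical names and choices, not the sink's actual configuration types, and the discussion does not settle how override patterns would be expressed.

```rust
/// Hypothetical override: applies `buckets` to metric names starting with `prefix`.
/// Prefix matching is an assumption; the discussion only says "patterned overrides".
struct BucketOverride {
    prefix: String,
    buckets: Vec<f64>,
}

/// Hypothetical sink settings: one default bucketing plus ordered overrides.
struct HistogramSettings {
    default_buckets: Vec<f64>,
    overrides: Vec<BucketOverride>,
}

impl HistogramSettings {
    /// Returns the buckets of the first override whose prefix matches the
    /// metric name, or the default buckets when no override matches.
    fn buckets_for(&self, metric_name: &str) -> &[f64] {
        self.overrides
            .iter()
            .find(|o| metric_name.starts_with(o.prefix.as_str()))
            .map(|o| o.buckets.as_slice())
            .unwrap_or(self.default_buckets.as_slice())
    }
}
```

With this shape, a metric that no override mentions never causes an error; it simply gets the default buckets, which is the property the comment above is arguing for.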
a5b5b5d to e2256ed
|
Thanks again @lukesteensen for your comments! In this PR the defaults layer (1) is respected, as well as
@binarylogic Within this PR the way we construct a default name for Counters has changed, and a new example config is added. I'd like to understand how much documentation I need to change as part of this PR, so I'm asking for your review. |
|
Nice work!
You're talking specifically about Prometheus metrics, correct? If the user provides an explicit name for a metric (
Good question and thanks for considering that. It appears there are some behavior changes that we can better reflect in the docs. I can take a stab at that and push up a commit. |
Almost. The change happened in the upstream |
|
Sure, I'm ok with that for now. I think any naming changes we need to make for specific storages will become more clear over time. |
lukesteensen
left a comment
Added some comments, but this looks good! I'll defer to @binarylogic for approval on docs and such.
Signed-off-by: Alexey Suslov <alexey.suslov@gmail.com>
        }
    }),
    Metric::Set { .. } => {
        trace!("Sets are not supported in Prometheus sink");
@LucioFranco as part of #628, do you think it makes sense to log this at the warning level and rate limit it? I'm curious what you think.
A small remark here. I believe we will eventually add support for this, and this message will disappear.
Luke: I was also wondering if we could support sets by exposing them as a gauge/counter value. We don't need to do it now, but could be interesting for the future.
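One way to picture the "expose sets as a gauge" idea is to track the unique values seen and report their count. The sketch below is an assumption about what that could look like, not something this PR implements; `SetAggregate` and its methods are hypothetical names.

```rust
use std::collections::HashSet;

/// Hypothetical aggregation for a single set metric: remembers the unique
/// values observed so far.
#[derive(Default)]
struct SetAggregate {
    values: HashSet<String>,
}

impl SetAggregate {
    /// Records one observed value; duplicates are ignored by the HashSet.
    fn record(&mut self, value: &str) {
        self.values.insert(value.to_string());
    }

    /// Cardinality of the set, suitable for exposing as a gauge value.
    fn as_gauge(&self) -> f64 {
        self.values.len() as f64
    }
}
```

The gauge only ever reflects the number of distinct values; when (or whether) the set is reset between scrapes is left open here, as it is in the discussion.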
|
Just to update, I'm in the process of updating the docs. I have a few questions though
|
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
232eac3 to e14e957
|
I've updated the docs to reflect everything going on here. I need answers to my above questions to verify some of the content. |
I'm sure you've seen this comparison. My understanding is that histograms provide more features, and require less work on our side (because the work is delegated to Prometheus server).
Sorry about this. This PR started as a draft, and some changes were not reflected in the upstream comments. I believe I'd need to start a new PR next time, or update the comments retrospectively. Currently our default buckets are precisely Prometheus' defaults, and users should not see 1, 10, 100... buckets.
This is a good question. Actually we have tests, but they live in the Statsd module for historical reasons. We are missing histogram asserts in those tests though (will add some). I'm not sure if we need to copy or move the tests from that module or keep them there. |
|
Thanks @loony-bean
Agree, I was curious to hear your reasoning. That makes sense to me.
Perfect. I've updated the docs to reflect this.
Whatever you think is best from an organization standpoint. I just want to make sure we're testing behavior traits to ensure we don't introduce backwards incompatible changes in the future.
New question! 😄 I think we might have an issue with units. Our default buckets are in seconds, but if a metric has a unit that is not seconds it will fall into the wrong bucket. For example, if we have an internal metric like:
And our buckets are (in seconds):
[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
This would fall into the
I have experience working with the Elixir prometheus client, and they implemented a unit auto-detection feature based on the suffix in the name. Additionally, you could manually specify the unit so it was normalized. I think we should address this in a follow-up PR? Or we can just require the user to redefine the buckets. |
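To make the unit mismatch concrete, here is a small sketch using the default buckets quoted above. The metric name `response_time_ms`, the value 120, and the suffix rules in `to_seconds` are all hypothetical and chosen purely for illustration; they are not the metric from the comment, nor the Elixir client's actual detection logic.

```rust
/// Default bucket upper bounds in seconds, as quoted in the comment above.
const DEFAULT_BUCKETS: [f64; 11] = [
    0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0,
];

/// Returns the upper bound of the first bucket that holds `value`, or `None`
/// when the value only fits the implicit `+Inf` bucket.
fn bucket_for(value: f64) -> Option<f64> {
    DEFAULT_BUCKETS.iter().copied().find(|&le| value <= le)
}

/// Hypothetical suffix-based normalization to seconds. The suffixes handled
/// here are assumptions for the sake of the example.
fn to_seconds(name: &str, value: f64) -> f64 {
    if name.ends_with("_ms") || name.ends_with("_milliseconds") {
        value / 1000.0
    } else if name.ends_with("_us") || name.ends_with("_microseconds") {
        value / 1_000_000.0
    } else {
        value
    }
}
```

Under these assumptions, a raw value of 120 for `response_time_ms` exceeds every default bound and lands only in `+Inf`, while the normalized value of 0.12 seconds lands in the 0.25 bucket. Requiring users to redefine the buckets avoids the normalization step entirely, at the cost of more configuration.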
|
This PR is getting pretty noisy and I think it's a good first step for supporting histograms. I've created follow-up issues to address changes in separate PRs. Going to merge as a result. |
Closes #384