Add support for exporting metrics to Prometheus. #3784
WIP PR to export metrics to Prometheus, as discussed in #3644
I've gotten feedback from @demmer on the approach in this PR, but I'd like to get some feedback from others as well, especially around metric names and the API exposed. Despite the long list of changes below, there are no breaking changes in this PR for anybody that relies on the expvar stats.
Metrics backend changes / naming stuff
Metric types not exported
Additional things to do in follow-on PRs
Reviewed 5 of 43 files at r2, 3 of 19 files at r3, 3 of 12 files at r4.
Is it necessary to have a different namespace for each binary?
If so, what about other binaries which we have e.g. vtctld and vtworker?
Does Prometheus require that? Its internal Google counterpart doesn't care and instead can distinguish binaries by their production job name ;)
As @demmer pointed out before, you'll probably need a
Note that the removal/rename of this will break the Google internal plugins. Given that, please give me or @alainjobart a heads-up before merging this. This way, we can change all internal callers first and verify that they won't break due to this change.
Previously, sougou (Sugu Sougoumarane) wrote…
Turns out snake_case uses underscores but internally we're doing kebab-case (with hyphens).
I've exported the code: #3816
But I'm not sure if it's worth merging it.
Also note that we do not have special handling for certain words e.g. the resulting internal variable name is
Given this style guide recommendation, you shouldn't rename the import to "prom".
(In general, I personally don't like abbreviations (unless it's a local variable or a Go receiver name). E.g. "prombackend" is less descriptive than "prometheus_backend". Similar comment for the other names e.g. "PublishPromMetric".)
Heads-up: This is dead code which I'm going to delete here: #3814
Can you please delete the comments since you copied them into the description now?
Please make sure that no information gets lost, e.g. the description for this stats var is missing the examples
Replying here in a few comments, as I can't respond in Reviewable for now. (I need to check with some internal people about the permissions that responding in Reviewable requires (it asked for write perms to all my orgs...?))
Small stuff first:
@demmer and I talked about this a while back, but I forget why we landed on this. Prometheus does let you specify a namespace per binary in the static scrape config, which we could use instead of putting the namespace here. @demmer, any new thoughts?
That sounds good actually, I'll do a rename pass.
Thanks for the heads-up :)
Updated the PR description for what's left.
Commit I just pushed included responses to all the review comments except:
Also, I still need to add unit tests, and we need to resolve what to do about this toSnakeCase thing.
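For reference while we decide, here is a minimal sketch of the kind of conversion under discussion. The `toSnakeCase` below is a hypothetical stand-in written for illustration, not the function from this PR or from #3816. It assumes the input is CamelCase or kebab-case and, like the internal version mentioned above, has no special handling for acronyms or other particular words.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnakeCase is a hypothetical helper (an assumption, not the PR's code).
// It lowercases the name, turns hyphens into underscores, and inserts an
// underscore before an upper-case letter that follows a lower-case letter
// or a digit. Runs of capitals (acronyms) are not split.
func toSnakeCase(name string) string {
	var b strings.Builder
	var prev rune
	for i, r := range name {
		switch {
		case r == '-':
			b.WriteRune('_')
		case unicode.IsUpper(r):
			if i > 0 && (unicode.IsLower(prev) || unicode.IsDigit(prev)) {
				b.WriteRune('_')
			}
			b.WriteRune(unicode.ToLower(r))
		default:
			b.WriteRune(r)
		}
		prev = r
	}
	return b.String()
}

func main() {
	for _, n := range []string{"QueryCounts", "kebab-case-name", "HTTPErrors"} {
		fmt.Printf("%s -> %s\n", n, toSnakeCase(n))
	}
}
```

With this naive rule, `QueryCounts` becomes `query_counts` and `kebab-case-name` becomes `kebab_case_name`, while `HTTPErrors` collapses to `httperrors`, which is the sort of edge case that special handling for certain words would have to cover.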
Responded to your comments.
I did a best-effort pass over which of the DurationFuncs are gauges vs. counters, but would love an extra pair of eyes on that, @demmer.
Also rebased, made all the other changes I
From my standpoint, the only remaining item before this is good to merge is the minor reorganization of where the snake_case conversion goes.
@sougou told me in DM he doesn't care, so he doesn't get a vote on this one.
It looks like this is good to merge? If so, let's do as follows:
Let me try to get to this today or tomorrow.
I have an internal import open with all the necessary changes to our internal code. Tests are passing now.
Can you please look at my last comments and address them in additional CLs? Once that's done, I'll patch them into my pending import as well and then we can wrap things up.
Signed-off-by: Maggie Zhou <firstname.lastname@example.org>
* Fix the nanoseconds => ints for histogram buckets.
* Gigantic rename of metrics:
  - Int => Counter
  - IntGauge => Gauge
  - IntFunc => GaugeFunc
  - Counters => CountersWithLabels
  - MultiCounters => CountersWithMultiLabels
  - MultiCounterFunc => CountersFuncWithMultiLabels
  - Gauges => GaugesWithLabels
  - MultiGauges => GaugesWithMultiLabels
* Add an explicit labelName for CountersWithLabels & GaugesWithLabels

Signed-off-by: Maggie Zhou <email@example.com>
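To illustrate what an explicit labelName buys us, here is a rough sketch of a labeled counter. The type name mirrors the rename above, but the fields, the constructor signature, and the `Lines` helper are assumptions made for this example and are not taken from the stats package.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// CountersWithLabels is a simplified, hypothetical version of a counter
// family keyed by the values of a single, explicitly named label.
type CountersWithLabels struct {
	mu        sync.Mutex
	help      string
	labelName string
	counts    map[string]int64
}

// NewCountersWithLabels takes the label name up front so a backend can
// export it (assumed signature, for illustration only).
func NewCountersWithLabels(help, labelName string) *CountersWithLabels {
	return &CountersWithLabels{help: help, labelName: labelName, counts: make(map[string]int64)}
}

// Add increments the counter for one label value.
func (c *CountersWithLabels) Add(labelValue string, delta int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[labelValue] += delta
}

// Lines renders one sample per label value, sorted by label value, in a
// text form that resembles a labeled sample line.
func (c *CountersWithLabels) Lines(metric string) []string {
	c.mu.Lock()
	defer c.mu.Unlock()
	keys := make([]string, 0, len(c.counts))
	for k := range c.counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("%s{%s=%q} %d", metric, c.labelName, k, c.counts[k]))
	}
	return lines
}

func main() {
	queries := NewCountersWithLabels("Queries by table", "table")
	queries.Add("users", 3)
	queries.Add("orders", 5)
	for _, line := range queries.Lines("queries") {
		fmt.Println(line)
	}
}
```

Without a label name, an exporter would have to invent one for every map-style counter. Passing it at construction time keeps that decision next to the metric definition.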
pull_backend.go stuff into prombackend/prombackend.go's Register call and make the pull_backend.go an interface only. Also, fix some more types from the gigantic refactor, and run GoLint/GoVet Signed-off-by: Maggie Zhou <firstname.lastname@example.org>
Fix tests to use the new metric names. Fix tests that expected gauges and not counters. (more to come on this thing, but these were the tests that indicated gauges early) Fix the `Set()` implementation for GaugesWithMultiLabels. Signed-off-by: Maggie Zhou <email@example.com>
Add plugins for each of the components.
Rename prombackend => prometheusbackend
Rename publishPromMetric to publishPrometheusMetric
Rename Reset() => ResetAll()
ResetCounter => Reset()
Rename various Counters => Gauges per Mike's helpful pointers.
Rename a few help strings.
Remove the comments in go/vt/worker/worker.go metrics and move them 100% into help functions.
Add a GaugeFuncWithMultiLabels
Add a CounterFunc
Signed-off-by: Maggie Zhou <firstname.lastname@example.org>
…icFunc` that allows us to both export ints (like it did before) and now also durations.
- Moved Prom specific stuff in the plugins to the prometheus backend itself.
- A ton of stats help string fix ups.
- some counters are gauges, and some gauges are counters! Fix those.
Signed-off-by: Maggie Zhou <email@example.com>
case. Log a warning when we call Add() on a counter with a negative number. Fix the double underscore in the Prometheus exported metrics when we dedupe the namespace. Signed-off-by: Maggie Zhou <firstname.lastname@example.org>
counters. Remove some unnecessary commented-out code. Signed-off-by: Maggie Zhou <email@example.com>
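The counter/gauge distinction behind the negative-`Add()` warning can be sketched roughly as follows. The `Counter` and `Gauge` types here are simplified stand-ins, not the stats package's implementations, and whether the real code still applies a negative delta after logging is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"log"
	"sync/atomic"
)

// Counter is a hypothetical stand-in for a metric that should only go up.
type Counter struct {
	value int64
}

// Add increments the counter. A negative delta contradicts counter
// semantics, so it is logged. This sketch still applies the delta, which
// is an assumption rather than confirmed behavior.
func (c *Counter) Add(delta int64) {
	if delta < 0 {
		log.Printf("warning: Add(%d) called on a counter with a negative number", delta)
	}
	atomic.AddInt64(&c.value, delta)
}

// Get returns the current value.
func (c *Counter) Get() int64 { return atomic.LoadInt64(&c.value) }

// Gauge is a hypothetical stand-in for a metric that may go up or down.
type Gauge struct {
	value int64
}

// Set overwrites the gauge value.
func (g *Gauge) Set(v int64) { atomic.StoreInt64(&g.value, v) }

// Add adjusts the gauge by delta, which may be negative.
func (g *Gauge) Add(delta int64) { atomic.AddInt64(&g.value, delta) }

// Get returns the current value.
func (g *Gauge) Get() int64 { return atomic.LoadInt64(&g.value) }

func main() {
	var queries Counter
	queries.Add(2)
	queries.Add(3)

	var conns Gauge
	conns.Set(10)
	conns.Add(-4)

	fmt.Println(queries.Get(), conns.Get()) // prints "5 6"
}
```

A value that legitimately decreases (open connections, queue depth) belongs in a gauge. The warning exists to surface call sites that were classified as counters by mistake.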
Signed-off-by: Maggie Zhou <firstname.lastname@example.org>
An existing test used this functionality. When we stopped exporting the "counters" type, the test lost that functionality. Instead of relying on an implementation detail, there is a proper public method now. Signed-off-by: Michael Berlin <email@example.com>
vitessio#3784 introduced two layers of code for "stats.Counter" (and the new type "stats.Gauge"). As a consequence, it was for example possible to create a "stats.Gauge" for a time.Duration value. This approach did not work well with our internal usage of the stats package. Therefore, I reversed this and simplified the code:
- MetricFunc() interface was removed. Instead, a CounterFunc or a GaugeFunc requires a simple func() int64 as input.
- IntFunc() removed. Before vitessio#3784, it was used to implement the expvar.Var interface. But now, this is taken care of by the types themselves, e.g. "stats.CounterFunc". Therefore, we do not need it anymore and users of the "stats" package can just pass a plain func() int64.
- Added types "Duration" and "DurationFunc" back.
- Added "Variable" interface. This allowed us to simplify the Prometheus code, which depends on the Help() method for each stats variable.
- Prometheus: Conversion to float64 values is now done in prometheusbackend.go and removed from the stats package.

BUG=78571948
Signed-off-by: Michael Berlin <firstname.lastname@example.org>
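A rough sketch of the shape described in this commit message, for readers who have not opened the diff: function-backed metrics wrap a plain `func() int64`, every variable exposes `Help()`, and the float64 conversion lives on the backend side. All names and signatures below are simplified assumptions for illustration. In particular, the `int64Getter` interface and the `export` function are hypothetical and do not come from prometheusbackend.go.

```go
package main

import "fmt"

// Variable is a hypothetical minimal interface: every stats variable can
// describe itself via Help().
type Variable interface {
	Help() string
}

// CounterFunc wraps a plain func() int64 that is expected to only increase.
type CounterFunc struct {
	help string
	f    func() int64
}

// NewCounterFunc is an assumed constructor shape for this sketch.
func NewCounterFunc(help string, f func() int64) *CounterFunc {
	return &CounterFunc{help: help, f: f}
}

func (c *CounterFunc) Help() string { return c.help }
func (c *CounterFunc) Get() int64   { return c.f() }

// GaugeFunc wraps a plain func() int64 whose value may go up or down.
type GaugeFunc struct {
	help string
	f    func() int64
}

// NewGaugeFunc is an assumed constructor shape for this sketch.
func NewGaugeFunc(help string, f func() int64) *GaugeFunc {
	return &GaugeFunc{help: help, f: f}
}

func (g *GaugeFunc) Help() string { return g.help }
func (g *GaugeFunc) Get() int64   { return g.f() }

// int64Getter is a hypothetical helper interface satisfied by both types.
type int64Getter interface {
	Variable
	Get() int64
}

// export mimics a backend: the int64 -> float64 conversion happens here,
// not in the metric types themselves.
func export(name string, v int64Getter) string {
	return fmt.Sprintf("# HELP %s %s\n%s %g", name, v.Help(), name, float64(v.Get()))
}

func main() {
	open := NewGaugeFunc("Number of open connections", func() int64 { return 7 })
	fmt.Println(export("open_conns", open))
}
```

Keeping the metric types on int64 and converting at the edge means callers of the stats package never need to know which backend is plugged in.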