Add support for exporting metrics to Prometheus. #3784
WIP PR to export metrics to Prometheus, as discussed in #3644
I've gotten feedback from @demmer on the approach in this PR, but I'd like feedback from others as well, especially on the metric names and the exposed API. Despite the long list of changes below, this PR contains no breaking changes for anybody who relies on the expvar stats.
Metrics backend changes / naming stuff
Metric types not exported
Additional things to do in follow-on PRs
Reviewed 5 of 43 files at r2, 3 of 19 files at r3, 3 of 12 files at r4.
Is it necessary to have a different namespace for each binary?
If so, what about other binaries which we have e.g. vtctld and vtworker?
Does Prometheus require that? Its internal Google counterpart doesn't care and instead can distinguish binaries by their production job name ;)
As @demmer pointed out before, you'll probably need a
Note that the removal/rename of this will break the Google internal plugins. Given that, please give me or @alainjobart a heads-up before merging this. That way, we can change all internal callers first and verify that they won't break due to this change.
Previously, sougou (Sugu Sougoumarane) wrote…
Turns out snake_case uses underscores but internally we're doing kebab-case (with hyphens).
I've exported the code: #3816
But I'm not sure it's worth merging.
Also note that we do not have special handling for certain words e.g. the resulting internal variable name is
Given this style guide recommendation, you shouldn't rename the import to "prom".
(In general, I personally don't like abbreviations, unless it's a local variable or a Go receiver name. E.g. "prombackend" is less descriptive than "prometheus_backend". The same comment applies to the other names, e.g. "PublishPromMetric".)
Heads-up: This is dead code which I'm going to delete here: #3814
Can you please delete the comments since you copied them into the description now?
Please make sure that no information gets lost, e.g. the description for this stats var is missing the examples
Replying here in a few comments, as I can't respond in Reviewable for now. I need to check with some internal people about the permissions that responding in Reviewable requires (it asked for write perms to all my orgs...?).
Small stuff first:
@demmer and I talked about this a while back, but I forget why we landed on this. Prometheus does let you specify a namespace per binary in the static scrape config, which we could use instead of putting the namespace here. @demmer, any new thoughts?
That sounds good actually, I'll do a rename pass.
Thanks for the heads-up : )
Updated the PR description for what's left.
Commit I just pushed included responses to all the review comments except:
Also, I still need to add unit tests, and we need to resolve what to do about this toSnakeCase thing.
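For reference while the toSnakeCase question is open, here is a minimal sketch of what such a conversion could look like. It is not the function from this PR: the name `toSnakeCase` is reused only for illustration, and the word-boundary rules (split on lower-to-upper transitions, keep acronym runs together, map hyphens, dots, and spaces to underscores) are assumptions. The one hard constraint is that Prometheus metric names must match `[a-zA-Z_:][a-zA-Z0-9_:]*`, so hyphenated (kebab-case) names cannot be exported unchanged.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// toSnakeCase is a hypothetical helper, not the implementation from this PR.
// It lower-cases the name, replaces '-', '.' and ' ' with '_', and inserts
// '_' at CamelCase word boundaries while keeping acronym runs together.
func toSnakeCase(name string) string {
	var b strings.Builder
	runes := []rune(name)
	for i, r := range runes {
		switch {
		case r == '-' || r == '.' || r == ' ':
			b.WriteByte('_')
		case unicode.IsUpper(r):
			if i > 0 {
				prev := runes[i-1]
				nextLower := i+1 < len(runes) && unicode.IsLower(runes[i+1])
				// Boundary after a lower-case letter or digit ("queryCounts"),
				// or at the end of an acronym run ("QPSBy" -> "qps_by").
				if unicode.IsLower(prev) || unicode.IsDigit(prev) ||
					(unicode.IsUpper(prev) && nextLower) {
					b.WriteByte('_')
				}
			}
			b.WriteRune(unicode.ToLower(r))
		default:
			b.WriteRune(r)
		}
	}
	return b.String()
}

func main() {
	for _, n := range []string{"QueryCounts", "QPSByKeyspace", "kebab-case-name"} {
		fmt.Printf("%s -> %s\n", n, toSnakeCase(n))
	}
}
```

Note that this sketch has no special handling for particular words, so how names containing acronyms or product names should be split is still a decision to make.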
demmer left a comment
This is looking really really close!
Most of these comments relate to description nit picks.
The only substantive issue relates to the need for Counter/Gauge variants of DurationFunc.
Also, as I'm sure you noticed, there are merge conflicts to resolve and test / CodeClimate cleanup to do.
Responded to your comments.
I did a best-effort pass over which of the DurationFuncs are gauges and which are counters, but would love an extra pair of eyes on that, @demmer.
Also rebased, made all the other changes I
From my standpoint, the only remaining item before this is good to merge is the minor reorganization of where the snake-case conversion goes.
@sougou told me in DM he doesn't care, so he doesn't get a vote on this one.
It looks like this is good to merge? If so, let's do as follows:
Let me try to get to this today or tomorrow.
I have an internal import open with all the necessary changes to our internal code. Tests are passing now.
Can you please look at my last comments and address them in additional CLs? Once that's done, I'll patch them into my pending import as well and then we can wrap things up.