Prometheus Push Gateway Source #10304

Closed
csolidum opened this issue Dec 6, 2021 · 10 comments · Fixed by #18143
Labels
  • domain: sources (Anything related to the Vector's sources)
  • source: new (A request for a new source)
  • type: feature (A value-adding code addition that introduce new functionality.)

Comments


csolidum commented Dec 6, 2021

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Current Vector Version

0.16.1

Use-cases

  • Use Vector to accept Prometheus Push Gateway metrics requests

We'd like to set up a Vector cluster to accept pushed metrics from Lambda functions and cron jobs and fan them out to several sinks.

Attempted Solutions

We can currently implement this with a combination of the http source and a custom Lua filter. This is not ideal because of the performance penalty that Lua filters incur.

Proposal

Create a new Prometheus Push Gateway source that accepts the Prometheus Push Gateway format and translates it into Vector's internal metric format. We would also like to be able to pull some HTTP headers out of the request and add them as tags to the metric.

@csolidum csolidum added the type: feature A value-adding code addition that introduce new functionality. label Dec 6, 2021
@jszwedko jszwedko added the domain: sources Anything related to the Vector's sources label Dec 6, 2021
Contributor

Sinjo commented Apr 20, 2023

Hey, I was about to write exactly this feature proposal, and then found this one when searching.

TL;DR: I'm down to build this and submit a PR, assuming the maintainers are +1 on the idea.

Why this would be useful

The reason I'm interested in this is that reasonably often I run into workloads - mostly batch jobs like cron jobs - that don't lend themselves to being scraped by Prometheus.

In a few cases, I've been able to get away with using the Pushgateway, but it's extremely limiting. It effectively only works for gauges, since you always want those to be replaced with the latest value. For counters, I've never run into a use-case where its behaviour of replacing the value of a counter under a given grouping key is what I want.

There's also prom-aggregation-gateway, written by Weaveworks and now maintained by Zapier. It's closer to what I want, but feels less complete and rougher around the edges than Vector. It also sums gauges, which doesn't really make sense (I'm going to open an issue with them to see if they'd be open to changing it). I abandoned it pretty quickly when I saw it summing a timestamp I was pushing as a gauge.

Implementation

There are two formats to support: text and protobuf. As far as I can tell, the heavy lifting for the text format is already implemented here, so it shouldn't be too much work to wrap that up in a push server. I might be mistaken, but it looks like the remote write protobuf is different to the one used for metric exposition (push or pull).
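
To make the text format concrete, here is a minimal Python sketch of parsing a pushed body. It is an illustration only, not Vector's parser: `parse_text_exposition` is a hypothetical helper, and it assumes a simplified subset of the format (no escaped label values, no timestamps, no validation of HELP lines).

```python
import re

# `name{labels} value` or `name value`; the label block is optional.
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_text_exposition(body):
    """Parse a simplified subset of the Prometheus text format.

    Returns a list of (name, labels, value, metric_type) tuples.
    Hypothetical helper for illustration; not part of Vector.
    """
    types = {}
    samples = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("# TYPE "):
            # "# TYPE <name> <type>" declares the type of later samples.
            _, _, name, metric_type = line.split(None, 3)
            types[name] = metric_type
            continue
        if line.startswith("#"):
            continue  # HELP lines and other comments are skipped
        match = SAMPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unparseable line: {line!r}")
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value), types.get(name, "untyped")))
    return samples
```

A real implementation would also need to handle escaping, timestamps, and the suffixed series that make up histograms and summaries.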

I'm thinking I'll start with text since it has the broadest support (not all the client libraries support the protobuf format) and looks like the least work.

There's also a bit of work to do parsing the grouping key specified in the URL and using values from it to overwrite values in the parsed metrics body.
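
As a rough sketch of that step in Python: Pushgateway push URLs take the form `/metrics/job/<job>{/<label>/<value>}`, and a label name suffixed with `@base64` carries a URL-safe base64 value. The helper names `parse_grouping_key` and `apply_grouping_key` are hypothetical, and the error handling is an assumption rather than anything Pushgateway or Vector specifies.

```python
import base64
from urllib.parse import unquote

B64_SUFFIX = "@base64"


def parse_grouping_key(path):
    """Turn /metrics/job/<job>{/<label>/<value>} into a dict of labels."""
    parts = path.strip("/").split("/")
    if not parts or parts[0] != "metrics":
        raise ValueError(f"not a push path: {path!r}")
    pairs = parts[1:]
    if not pairs or pairs[0] not in ("job", "job" + B64_SUFFIX):
        raise ValueError("grouping key must start with the job label")
    if len(pairs) % 2 != 0:
        raise ValueError("every label name needs a value")
    labels = {}
    # Alternating path components are label names and label values.
    for raw_name, raw_value in zip(pairs[0::2], pairs[1::2]):
        if raw_name.endswith(B64_SUFFIX):
            name = raw_name[: -len(B64_SUFFIX)]
            # Padding is optional in the URL, so normalise it before decoding.
            stripped = raw_value.rstrip("=")
            padded = stripped + "=" * (-len(stripped) % 4)
            value = base64.urlsafe_b64decode(padded).decode("utf-8")
        else:
            name, value = raw_name, unquote(raw_value)
        labels[name] = value
    return labels


def apply_grouping_key(metric_labels, grouping_key):
    """Labels from the URL win over labels parsed from the body."""
    return {**metric_labels, **grouping_key}
```

For example, `parse_grouping_key("/metrics/job/nightly_backup/instance/db-1")` yields a `job` and an `instance` label, which `apply_grouping_key` then merges over whatever labels each metric in the body already had.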

Caveats

It's worth noting that this solution won't replicate Pushgateway's features 1:1 - specifically when it comes to grouping keys.

When you push metrics to a Prometheus Pushgateway, anything previously PUT to the same grouping key is replaced (quote from these docs):

PUT is used to push a group of metrics. All metrics with the grouping key specified in the URL are replaced by the metrics pushed with PUT.

I'm pretty sure this is impossible to replicate within Vector's architecture. By the time the source metrics have been converted into events, any notion of grouping key is gone.

To reduce confusion, we might want to support POST instead, as its behaviour is more similar to what we'll end up with:

POST works exactly like the PUT method but only metrics with the same name as the newly pushed metrics are replaced (among those with the same grouping key).

Similarly, I don't think DELETE makes sense for us to implement here, and metric removal would have to be handled through flush_period_secs in the prometheus_exporter sink.
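
The difference between the PUT and POST semantics quoted above can be modelled with a toy in-memory store. This is only a sketch of the documented behaviour, not Pushgateway's implementation; `GroupStore` and its methods are hypothetical names.

```python
class GroupStore:
    """Toy model of Pushgateway's per-grouping-key storage."""

    def __init__(self):
        # frozenset of (label, value) pairs -> {metric name: value}
        self.groups = {}

    @staticmethod
    def _key(grouping_key):
        return frozenset(grouping_key.items())

    def put(self, grouping_key, metrics):
        # PUT: everything under this grouping key is replaced.
        self.groups[self._key(grouping_key)] = dict(metrics)

    def post(self, grouping_key, metrics):
        # POST: only metrics with the same name are replaced.
        self.groups.setdefault(self._key(grouping_key), {}).update(metrics)

    def get(self, grouping_key):
        return dict(self.groups.get(self._key(grouping_key), {}))
```

PUT needs the store to remember which metrics belong to which grouping key so it can drop the ones missing from the new push. That per-group memory is the part that has no obvious equivalent once the metrics have become independent events.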


Full disclosure: I'm one of the maintainers of the Prometheus Ruby client, and I'd love to have a usable push aggregation option - both for myself and to recommend to people in similar situations when they ask for help.

Member

bruceg commented Apr 20, 2023

Given that the largest part of this work appears to be the encoding, I wonder how much of this could be accomplished with a codec specific to the Prometheus text or protobuf format combined with the http_server source. Certainly that would not help with any of the caveats listed above, but if it's just a matter of data formatting, we would very much prefer a codec over a new source.

@bruceg bruceg added the source: new A request for a new source label Apr 20, 2023
Contributor

Sinjo commented Apr 22, 2023

So that should be partly possible, but if I've understood the HTTP server source's code, it would introduce another caveat.

From what I can tell, the request path gets encoded as a key on the event, which wouldn't match the behaviour of the Prometheus Pushgateway, where alternating path components are treated as key-value pairs to add to all metrics in that push.

Contributor

Sinjo commented May 3, 2023

@bruceg Just wanted to check in and see if you have thoughts on that.

I think not supporting metric labels from the URL would be a reasonably big caveat to add. I'm not sure if there's something I missed in the HTTP server source's code that would let me access the URL and parse the relevant content from it (rather than the whole thing just being encoded as a key on the event).

Member

jszwedko commented May 8, 2023

Ah, yeah, given the fields are encoded in the URL as well, I don't think a codec would be sufficient. This does seem like it would need to be a standalone source; it could reuse shared bits from the existing http_server source though.

Contributor

Sinjo commented May 31, 2023

Sounds good! I'll look at how this source can make use of the code that's already in http_server, and any commonality that can be extracted out.

I'm just finishing up on an unrelated project at work, and then I'll have some time to work on this.

Contributor

Sinjo commented Jul 25, 2023

Hey, just wanted to give a little update.

I've started working on this properly now. I'm mostly figuring out my way around Rust and the Vector codebase (working here, but there's nothing much to see yet).

One thing I realised during this early part of the work is that there are two pretty distinct use-cases for this source:

  • As an aggregator for receiving metrics at the end of a batch job
    • In this use case, we want counters and histograms to be summed across metric pushes
    • This is the use case that got me interested in building this source to begin with
  • As a push target for a long-running service which periodically pushes its internal registry to a Pushgateway
    • In this case, we want counters to replace their previous values
    • This is closer to what the existing Prometheus Pushgateway does, though with the difference that Vector offers features like TTL, and can re-expose the metrics in formats other than Prometheus

While the first one is my motivation for building this, I think it would be cool to support both. I also don't think it should be too difficult, assuming I've understood Vector's event model properly.

I noticed that there's this MetricKind enum that supports Incremental and Absolute metrics. My thinking right now is to have a config flag (something like enable_aggregation) to toggle between the two. Having looked at a few different sources, I think I've understood this right, but please tell me if I haven't.
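
To illustrate the idea, here is a small Python sketch of how such a flag could choose between the two kinds and what each does to downstream state. It is not Vector code: the enum mirrors the `MetricKind` variants named above, `kind_for` and `apply_sample` are hypothetical helpers, and keeping gauges absolute even when aggregation is on is an assumption.

```python
from enum import Enum


class MetricKind(Enum):
    INCREMENTAL = "incremental"  # value is a delta to add to the running total
    ABSOLUTE = "absolute"        # value replaces whatever was there before


def kind_for(metric_type, enable_aggregation):
    """Pick a kind for a pushed metric (assumed policy, see lead-in)."""
    if enable_aggregation and metric_type in ("counter", "histogram"):
        return MetricKind.INCREMENTAL
    # Gauges stay absolute: summing something like a timestamp is meaningless.
    return MetricKind.ABSOLUTE


def apply_sample(state, name, value, kind):
    """Fold one pushed sample into a dict of current values."""
    if kind is MetricKind.INCREMENTAL:
        state[name] = state.get(name, 0.0) + value
    else:
        state[name] = value
    return state
```

With aggregation enabled, two batch jobs that each push `jobs_total 3` leave a total of 6. With it disabled, the second push simply replaces the first, which matches the long-running service case.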

Contributor

Sinjo commented Aug 2, 2023

I've got something working, and left it in a draft PR at #18143. I figured that was the easiest way to start talking about specific implementation details.

@StephenWakely
Contributor

@Sinjo, @csolidum, and anyone else who can answer this: is there anything that a PushGateway source could do that an OpenTelemetry source couldn't?

My thinking is that both are push-based protocols, but OpenTelemetry is much more likely to be universally supported, at least in due course. If we want to focus our development and maintenance efforts, I would much prefer to focus on OpenTelemetry.

We are planning on implementing OpenTelemetry soon.

Contributor

Sinjo commented Aug 8, 2023

I don't have data on relative usage, but there's a pretty large world of applications out there already using Prometheus for their metrics, and I don't see that disappearing any time soon even as OpenTelemetry Metrics start gaining support.

For anyone who already has a heavily-instrumented codebase with a mixture of long-running (i.e. easy to scrape the normal way with Prometheus) and short-lived batch jobs, using OpenTelemetry for the short-lived jobs is going to involve a big migration.

5 participants