Adding an "instance" tag for the prometheus scraper #9322
This adds a new configuration, `instance_key`, to the `prometheus_scrape` source. If set, it will tag scraped metrics with their instance in the same fashion as the prometheus scraper.

I made this opt-in because this source has been around for a while and the extra tag could surprise some users. Prometheus, when scraping a metric that already has `instance`, avoids resetting it depending on its `honor_labels` configuration.

Ref: https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
Closes: #9322
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
If I read the code in #9330 correctly, it is extracting just the host and port out of the endpoint url, correct? I don't think this logic necessarily needs to be hard-coded. You could have an "endpoint" tag, and let VRL massage it if the user wants it to be more prometheus-like, using parse_url. However, if this logic is going to be built-in, I have a suggestion: if the URL contains a fragment identifier, then use the fragment as the instance tag instead of the address:port. For example:
This would set the instance tag to 'foo' and 'bar' respectively for those scrapes, giving meaningful instance labels. This is the sort of thing which you can do using relabelling in prometheus, but you wouldn't be able to do if the "instance" label has already been stripped down to host and port. Or you could provide both

EDIT: a quick test shows that the fragment is already correctly removed from the path in the HTTP request, i.e. tcpdump shows
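The fragment-or-address rule proposed here can be sketched in a few lines. This is a minimal Python illustration of the logic only, not Vector's implementation: the function names `instance_label` and `request_target` are hypothetical, and falling back to the scheme's default port when the URL has none is an assumption rather than something stated in the thread.

```python
from urllib.parse import urlsplit


def instance_label(endpoint: str) -> str:
    """Return the URL fragment if present, otherwise "host:port"."""
    parts = urlsplit(endpoint)
    if parts.fragment:
        return parts.fragment
    # Assumption: use the scheme's default port when the URL omits one.
    port = parts.port or {"http": 80, "https": 443}.get(parts.scheme)
    return f"{parts.hostname}:{port}" if port else str(parts.hostname)


def request_target(endpoint: str) -> str:
    """Path (plus query) placed on the HTTP request line; the fragment is never sent."""
    parts = urlsplit(endpoint)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path
```

Because the fragment stays on the client side, using it as a label does not change what the scraped target sees.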
Thanks for the thoughts @candlerb. I was thinking it'd be nice to match the prometheus scraper's concept of
I think this would also address #6953? I wonder if this would be enough information to infer the job label and be truly prometheus compatible. Can I have a transform that's like a map from endpoint to job name?
It looks like this issue is indeed a dupe of #6953, yes. When you say "infer the job label": the job label is static in prometheus, fixed for all targets in the scrape job. You can do it like this in VRL:
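This ticket is about the instance label, which identifies the individual scraped endpoint. I'm proposing:
1. If you set endpoint_tag = "endpoint" then you get the whole URL as the tag
2. If you set instance_tag = "instance" then you get the URL fragment, or if that doesn't exist, the address and port.
Given (1) you could implement (2) yourself in VRL, using the parse_url() function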
Having one source+transform per job would work, but is cumbersome for a large/dynamic number of jobs. Typically with prometheus you'd set the job label dynamically in relabel_configs and I am wondering if it's possible to achieve something similar with VRL once you have the endpoint tag as proposed here.
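As a rough illustration of what "a map from endpoint to job name" could look like once the endpoint is available as a tag, here is a minimal Python sketch of the logic. It is not VRL and not a Vector feature: the rule table `JOB_RULES`, the function `assign_job`, the port-to-job pairs, and the `"unknown"` default are all hypothetical.

```python
import re

# Hypothetical rules: (regex matched against the endpoint tag, job name).
JOB_RULES = [
    (re.compile(r":9100/"), "node"),
    (re.compile(r":9115/"), "blackbox"),
]


def assign_job(tags: dict, default: str = "unknown") -> dict:
    """Return a copy of tags with a job tag chosen by the first matching rule."""
    endpoint = tags.get("endpoint", "")
    job = next(
        (name for pattern, name in JOB_RULES if pattern.search(endpoint)),
        default,
    )
    return {**tags, "job": job}
```

A single transform carrying such a table would replace the one-source-plus-one-transform-per-job layout, at the cost of keeping the table in sync with the endpoint list.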
Sep 26, 2021 8:44:25 AM Brian Candler ***@***.***>:
It looks like this issue is indeed a dupe of #6953, yes.
When you say "infer the job label": the *job* label is static in prometheus, fixed for all targets in the scrape job. You can do it like this in VRL:
```toml
[sources.scrape_node]
type = "prometheus_scrape"
endpoints = [
  'http://192.0.2.1:9100/metrics#foo',
  'http://192.0.2.2:9100/metrics#bar',
]
scrape_interval_secs = 60

[transforms.tag_node]
type = "remap"
inputs = ["scrape_node"]
# Every target of this source belongs to one job, so the tag is a constant.
source = '''
.tags.job = "node"
'''
```
This ticket is about the *instance* label, which identifies the individual scraped endpoint.
I'm proposing:
1. If you set *endpoint_tag = "endpoint"* then you get the whole URL as the tag
2. If you set *instance_tag = "instance"* then you get the URL fragment, or if that doesn't exist, the address and port.
…
Given (1) you could implement (2) yourself in VRL, using the *parse_url()* function
With prometheus, normally you'd set the

For Vector: yes, if you've captured the whole endpoint URL in a tag, then using VRL to extract just the part that you want to appear in the

In some cases, you want the instance label to be different to the target endpoint (e.g. you want a meaningful name in

For Vector, since the endpoint is a URL, there is the "fragment" part of the URL (the bit after

👍 #9330 should satisfy both requirements mentioned in #9322 (comment). Namely it adds an
…instance and endpoint (#9330)

* enhancement(prometheus_scrape source): Tag metrics with instance and endpoint

This adds:

* instance_tag to the `prometheus_scrape` source. If set, it will tag scraped metrics with their instance in the same fashion as the prometheus scraper.
* endpoint_tag which optionally adds the scraped endpoint as a tag per discussion in Adding an "instance" tag for the prometheus scraper #9322 (comment)
* an honor_labels config option which mimics Prometheus's https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config to rename conflicting tags if the scraped source already has them. This is useful if Vector is scraping metrics from another scraper that has already added these tags (e.g. another Vector).

I made these new tags opt-in because this source has been around for a while and the extra tags could surprise some users. Prometheus, when scraping a metric that already has `instance`, avoids resetting it depending on its `honor_labels` configuration.

Ref: https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series
Closes: #9322
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
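The conflict handling described for honor_labels can be illustrated with a small Python sketch. It follows Prometheus's documented convention of renaming a clashing scraped label to `exported_<name>`. That Vector uses the same prefix is an assumption here, and the function name `add_scrape_tag` is hypothetical.

```python
def add_scrape_tag(tags: dict, key: str, value: str, honor_labels: bool) -> dict:
    """Add key=value to a metric's tags, resolving a clash with an existing key."""
    result = dict(tags)
    if key in result:
        if honor_labels:
            # Trust the scraped data: keep the tag the target already exposed.
            return result
        # Otherwise preserve the scraped value under an "exported_" name.
        result[f"exported_{key}"] = result[key]
    result[key] = value
    return result
```

With `honor_labels` disabled, a metric that already carries `instance="upstream:9100"` ends up with both `exported_instance="upstream:9100"` and the scraper's own `instance` value, so neither piece of information is lost.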
Discussed in #9321
Originally posted by candlerb September 23, 2021
I've been playing with the prometheus scraper, and it's pretty awesome:
Scraping works instantly. The problem is that I can't see how to distinguish the metrics originating from different hosts. e.g.
Which host does each metric belong to??
With a "real" prometheus server, it would add a label `instance="..."`, which defaults to the `__address__` which was scraped.

What's the standard way in Vector of distinguishing the same metric from multiple sources: a tag? A namespace? Does that association already exist for prometheus scrapes, but is hidden in the "console" sink output?

I can see that internal metrics have `tags.host_key` and `tags.pid_key`, but I don't see anything equivalent for prometheus scraping.

I'm only just getting to grips with the Vector data model, so I apologise if I've missed something obvious.