chore(rfcs): Add RFC for Apache HTTP Server metrics source #3519

Merged: jszwedko merged 13 commits into master from add-apache-metrics-source-rfc on Aug 28, 2020

Conversation

@jszwedko (Member) commented Aug 21, 2020:

#3092

Comment on lines 206 to 207
- Do we want to apply any metric labels based on the other information
available via the status page? I could see labeling the `url` at least
Contributor:

Yes. But I assume #2660 would solve this for both logs and metrics, so it's probably blocked on that. Agree?

Member Author:

Hmm, I think that is just for internal metrics, yes? It looks like the prometheus source adds labels (looks like we call them tags):

https://github.com/timberio/vector/blob/61e806d01d4cc6d2a527b52aa9388d4547f1ebc2/src/sources/prometheus/parser.rs#L43-L51

Contributor:

Good point. I do think many of these event attributes would be better represented as trace context, but we can defer that for a later discussion.

Comment on lines 209 to 210
- Are there preferences between `apache` or `httpd` for the nomenclature? I
feel like `apache` is more well-known though `httpd` is more accurate
Contributor:

I prefer httpd or apache_httpd? Apache feels a little generic since I think of the entire foundation. If we go with the apache_ prefix we should think about apache_kafka and so on :/.

Member Author:

Yeah, I had the same angst about using apache, but it matches how the other two collectors I looked at, Datadog and Telegraf, do it, so people coming from those tools may be more familiar with that keyword as the component name.

I'm open to either of those options though. I guess I prefer httpd over apache_httpd given that kafka is just kafka and not apache_kafka.

later if this does not end up being the case and they really are all the same.

One downside of this is that I think it'd be less discoverable than a
first-class source for each type of endpoint we support scraping.
Contributor:

Yeah, I agree with this.

I can also see us (at some point) having "aliased" components, that merely defer to some other generic component with a default set of configurations.

In this case, we'd have an http_scrape source that is highly configurable, and an apache alias source that makes it more discoverable and easier to configure but defers its actual operations to the http_scrape source.

Having said that, all of this is irrelevant for this initial source implementation, which I like 👍

Contributor:

I believe this is actually what you described in "Future Work", isn't it?

Member Author:

The future work is just to do this internally, but I do like your idea of eventually adding an http_scrape source that would be more flexible and allow people to scrape arbitrary endpoints. I imagine it might require more configuration, which the "aliases" would simply default.
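A rough sketch of how that split might look, purely for illustration (the http_scrape source, its format field, and the endpoint URLs are hypothetical and not part of this RFC):

[sources.apache_status]
# hypothetical generic scraping source, fully spelled out
type = "http_scrape"
endpoints = ["http://localhost:8080/server-status/?auto"]
format = "apache_mod_status"  # hypothetical knob selecting how the response is parsed

[sources.apache]
# the discoverable "alias" form, which would default most of the above
type = "apache"
endpoints = ["http://localhost:8080/server-status/?auto"]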

Contributor:

I can also see us (at some point) having "aliased" components, that merely defer to some other generic component with a default set of configurations.

We do this currently. Sources and sinks can wrap other sources and sinks. For example, the syslog source wraps the socket source.

Contributor:

Given this precedent, do we want to create a generic http_scrape source and the apache source based on this RFC, or do we want to start with the latter and transition it to the former as soon as we need another source based on HTTP scraping?

I assume we'll just start with the singular apache source, but want to make sure we're on the same page.

Member Author:

I think I'd like to defer the http_scrape source as I think it'll require more thought and discussion about how to parse the incoming requests (formats? codecs? etc.).

I guess I figured I'd start with this one and the nginx one (#3091) for now and refactor them internally to share a component (along with the prometheus source).


- `response_timeout`; to cap request lengths
- `tls`: settings to allow setting specific chains of trust and client certs

But I chose to leave those out for now given the prometheus source doesn't
Contributor:

I think the big missing piece here is Basic auth - a lot of people still use this as security for mod_status. Prom just adds it to the scrape URL.

Member Author:

Agreed, I think we could support this in the same way (via the URL), but that it'd be useful to add basic auth options for all HTTP-based client sources. I propose deferring that until after we refactor them as mentioned in "Future Work".
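For illustration, credentials could ride along in the scrape URL, something like the sketch below (the endpoints list follows the multi-endpoint config discussed in this PR; the URL and credentials are placeholders):

[sources.apache]
type = "apache"
endpoints = ["http://user:password@localhost:8080/server-status/?auto"]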

Contributor:

Ah - missed that. +1.

@jszwedko (Member Author) commented Aug 25, 2020:

@jamtur01 @JeanMertz @binarylogic

I believe I've addressed all of the comments here. Given that all three of you have commented extensively, I figured you'd be the best to target for formal RFC approval reviews (step 3), so I re-requested reviews from you three 😄

@jamtur01 (Contributor) left a comment:

LGTM.

Answer: yes, the config has been updated to allow multiple endpoints.
- Are there preferences between `apache` or `httpd` for the nomenclature? I
feel like `apache` is more well-known though `httpd` is more accurate. Answer:
standardize on `apache`.
Contributor:

Just noting, we'll want apache_metrics for clarity.

@jszwedko (Member Author) commented Aug 25, 2020:

I actually think that calling it apache_metrics rather than just apache, while making it clearer that the source ingests metrics, also makes it stand out against the other sources which are not suffixed with their data type. For example, we have statsd rather than statsd_metrics and docker rather than docker_logs. One could argue that statsd is less ambiguously metrics, but docker, in my mind, could be ambiguous: it could be either docker metrics or docker logs (or both).

This is actually standing out to me as something where the precedent will stick around for a while so it's probably worth having some explicit guidance for source naming. Sources could be one of, more than one of, or all of metrics, logs, and traces. It seems like the vision of vector is to treat all observability data (logs, metrics, and traces) as first-class and so it seems like we should reflect that in the component naming and terminology.

I actually kinda like Datadog's model where they just have one apache integration that ingests both metrics and logs from Apache. They configure it via a separate log section: https://github.com/DataDog/integrations-core/blob/master/apache/datadog_checks/apache/data/conf.yaml.example#L326-L347 . Granted, Datadog started with metrics and moved to logs, which I think is reflected in the fact that logs is a separate subsection rather than metrics being so. For us, we might want to have metrics/log/trace configuration at the same level. Users forwarding to Datadog may want similar functionality to what dd-agent currently supports.

Contributor:

Good points.

For example, we have statsd rather than statsd_metrics and docker rather than docker_logs

Then we should rename these sources to statsd_metrics and docker_logs or change our model.

Sources could be one of, more than one of, or all of metrics, logs, and traces. It seems like the vision of vector is to treat all observability data (logs, metrics, and traces) as first-class and so it seems like we should reflect that in the component naming and terminology.

It's a good point. From a UX perspective, drawing lines around components based on types makes it easier to build pipelines. For example, what happens if you connect a multi-type docker source to a regex_parser that only operates on logs? Setting aside the fact that we can't technically do this, I think it complicates the UX. I am very happy to discuss this change in a separate issue, and then an RFC if we deem it worthy. Would also like other people's thoughts on the matter.

Contributor:

I've been going back and forth on this all afternoon. On one hand, having a single source name, docker/apache/etc, references that single point of observability. But I also started thinking "are there circumstances in which I'd only collect metrics from a service and I'd find apache on its own confusing?" Also, I wonder whether having differently named sources might make pipeline configurations more clear?

Member:

Also, I wonder whether having differently named sources might make pipeline configurations more clear?

This is the main concern for me. Combined sources may seem convenient, but how often do you want to route logs and metrics through the same pipeline?

@jszwedko (Member Author) commented Aug 26, 2020:

This is the main concern for me. Combined sources may seem convenient, but how often do you want to route logs and metrics through the same pipeline?

It is a good question. I note that we do have one source (vector), a few sinks (vector, console, blackhole), and a few transforms (filter, lua, remap (soon)) that can handle both logs and metrics, but I can imagine it would be common to need to transform them differently.

I realize this is going off on quite a tangent; we could break this discussion off to a separate issue. I think it is important though.

I currently see 3 approaches:

  1. Call this apache_metrics and move towards renaming all sources / sinks that only handle a single type to have a suffix (like sematext_logs). If/when we add trace as another type, we'd add things like sematext_traces.
  2. Move towards one component for each "integration" for sinks/sources and do what I describe below to control the flow of event types through the pipeline when needed.
  3. Keep doing what we are doing, which seems like a mix of strategies with respect to type handling; defining some sort of "test" to determine when the type suffix should be applied.

I find the current state to be a little confusing. Some source/sink integrations can handle multiple event types (vector, console) and others are split into _metrics and _logs (like datadog_logs/datadog_metrics). The clearest rule I can pull out for deciding is "can the integration handle pulling or pushing both logs and metrics with the same configuration?" If yes, no suffix; if no, two components with _logs and _metrics. One issue with that rule is that downstream support for observability types can change, making it difficult to predict whether they will add metrics or logs support in the future.

I personally would appreciate some guidance laid out for component naming to avoid this question cropping up in the future.

As a thought experiment, maybe it is useful to imagine we add metrics support for clickhouse (#3435). Where would that go? A new component (clickhouse_metrics)? If so, would we rename clickhouse to clickhouse_logs? Or would we add it as part of the existing clickhouse sink?

Expanding on approach 2:

One way we could approach that with a single source is to allow transforms and sinks to ask for only certain data types with something like:

[transforms.regex]
type = "regex_parser"
inputs = ["mysource.logs"]

Where it would only receive logs from the source and not metrics (or traces). If there were no suffix, it would simply receive both. In the case of a transform that only works with a subset of "observability types", we could error at start-up if there is an invalid pipeline.

Another way to limit the event types if you only want certain types from the source would be something like:

[sources.apache]
type = "apache"
event_types = ["log", "metric"]

Perhaps with syntax sugar for one type:

[sources.apache]
type = "apache.logs"

I think there is something attractive about one component to observe one "integration" (like apache, nginx, etc.).

Member:

The clearest rule I can pull out for deciding is "can the integration handle pulling or pushing both logs and metrics with the same configuration?

I agree we should have a stronger convention here and this is a good start. Right now, we handle multi-type sources by automatically inserting a type filter where necessary in front of downstream components. We should make sure that doesn't lead to confusing scenarios if we start using it more widely.

Member Author:

Right now, we handle multi-type sources by automatically inserting a type filter where necessary in front of downstream components.

Interesting, I wasn't aware of that. I could see that being a bit confusing if it is implicit.

As for the convention, I'm ok with starting with that one, but it does mean we'll need to deprecate sources/sinks occasionally. I'm imagining, had vector existed a few years ago, before Datadog had support for logs, we might have just called the sink datadog and then had to deprecate it later in lieu of datadog_metrics when datadog_logs was added.

I'm ok with just calling this apache_metrics for now since, prompted by this discussion, I've observed we already have a mix of suffixed and unsuffixed sinks and sources, so this doesn't seem to make the situation noticeably worse and my concern about precedent is unfounded. I can create a new issue to come up with a convention for component naming. I'm still curious to hear thoughts on separate vs. integrated sources/sinks as well.

Contributor:

I can create a new issue to come up with a convention for component naming. I'm still curious to hear thoughts on separate vs. integrated sources/sinks as well.

👍 it's worth a discussion. I'm not leaning strongly in either direction, but as you can see there are scenarios we should discuss before making a decision. I like the simplicity of single, multi-type components from a documentation standpoint, but I want to make sure it doesn't overcomplicate the actual UX.

@binarylogic (Contributor) left a comment:

Aside from the apache_metrics name, everything looks good.

@binarylogic (Contributor) commented:

Before we merge this, I think it's worth, one last time, considering the Prometheus style naming (#3581 (comment)) since this will be very difficult to change later. Calling this "Prometheus style" is a little misleading since it's just good metric naming practices. From what I can tell there is nothing Prometheus specific about it.

@binarylogic (Contributor) commented Aug 27, 2020:

I should have included my opinion 😄 . I think we should follow the Prometheus style for a number of reasons:

  1. Prometheus found value in this naming scheme, so much so that they updated their exporters to follow it. Clearly, this was driven by the pain of not knowing what metrics represent.
  2. James makes a good point here. That's just a nice side benefit.
  3. The coordination across Vector will make it much nicer to use. In the same way that the coordination across Prometheus' exporters makes it nicer.

@jszwedko (Member Author) commented Aug 27, 2020:

James makes a good point here. That's just a nice side benefit.

Notably, he is matching the node exporter naming, but it looks like Telegraf actually exposes apache metrics with a different style (which I assume most people pushing metrics into prometheus would be using):

➜  telegraf curl -s localhost:9273/metrics | grep apache | grep -v '#'
apache_BusyWorkers{host="a-8cktst39ij0a",port="8080",server="localhost"} 1
apache_BytesPerReq{host="a-8cktst39ij0a",port="8080",server="localhost"} 1024
apache_BytesPerSec{host="a-8cktst39ij0a",port="8080",server="localhost"} 83.5918
apache_CPUChildrenSystem{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_CPUChildrenUser{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_CPULoad{host="a-8cktst39ij0a",port="8080",server="localhost"} 0.408163
apache_CPUSystem{host="a-8cktst39ij0a",port="8080",server="localhost"} 0.02
apache_CPUUser{host="a-8cktst39ij0a",port="8080",server="localhost"} 0.18
apache_ConnsAsyncClosing{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_ConnsAsyncKeepAlive{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_ConnsAsyncWriting{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_ConnsTotal{host="a-8cktst39ij0a",port="8080",server="localhost"} 2
apache_DurationPerReq{host="a-8cktst39ij0a",port="8080",server="localhost"} 0.25
apache_IdleWorkers{host="a-8cktst39ij0a",port="8080",server="localhost"} 74
apache_Load1{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_Load15{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_Load5{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_ParentServerConfigGeneration{host="a-8cktst39ij0a",port="8080",server="localhost"} 1
apache_ParentServerMPMGeneration{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_Processes{host="a-8cktst39ij0a",port="8080",server="localhost"} 3
apache_ReqPerSec{host="a-8cktst39ij0a",port="8080",server="localhost"} 0.0816327
apache_ServerUptimeSeconds{host="a-8cktst39ij0a",port="8080",server="localhost"} 49
apache_Stopping{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_TotalAccesses{host="a-8cktst39ij0a",port="8080",server="localhost"} 4
apache_TotalDuration{host="a-8cktst39ij0a",port="8080",server="localhost"} 1
apache_TotalkBytes{host="a-8cktst39ij0a",port="8080",server="localhost"} 4
apache_Uptime{host="a-8cktst39ij0a",port="8080",server="localhost"} 49
apache_scboard_closing{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_dnslookup{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_finishing{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_idle_cleanup{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_keepalive{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_logging{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_open{host="a-8cktst39ij0a",port="8080",server="localhost"} 325
apache_scboard_reading{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_sending{host="a-8cktst39ij0a",port="8080",server="localhost"} 1
apache_scboard_starting{host="a-8cktst39ij0a",port="8080",server="localhost"} 0
apache_scboard_waiting{host="a-8cktst39ij0a",port="8080",server="localhost"} 7

So to get the benefit of users being able to use preconfigured dashboards, we'd have to match that.

@binarylogic (Contributor) commented:

I know this looks a lot like bikeshedding, but I think it has much broader implications than we think, especially as we set a precedent, so it is worth discussing and obtaining consensus.

So to get the benefit of users being able to use preconfigured dashboards, we'd have to match that.

Sure. I don't think we should try to solve this at the source-level though. We should shift this concern to the sink-level. Our logging components have the same problem; it's why I opened https://github.com/timberio/vector-product/issues/34. In my opinion, this is more about the style we want to adopt.

which I assume most people pushing metrics into prometheus would be using

What makes you think that instead of the Prometheus apache_exporter?


I still lean towards Prometheus' style. It's cleaner, semantically better, and proven by a large community.

@jszwedko (Member Author) commented Aug 27, 2020:

I know this looks a lot like bikeshedding, but I think it has much broader implications than we think, especially as we set a precedent, so it is worth discussing and obtaining consensus.

I agree, what we do here will set a precedent. I appreciate you pushing on this as I do think it is important to attempt to get right to avoid an annoying shift later.

I had missed https://github.com/Lusitaniae/apache_exporter ; I guess that is semi-official since it is the only one listed on https://prometheus.io/docs/instrumenting/exporters/ . If I were picking, I'd probably have gone with telegraf, though, due to broader support.

I still lean towards Prometheus' style. It's cleaner, semantically better, and proven by a large community.

Do we want to exactly match the interface of https://github.com/Lusitaniae/apache_exporter ? Or just the general naming conventions? If we do want to match, a corollary question: do you want to try to match the prometheus exporters (if one exists) for all metric ingestion in vector? It would let people reuse dashboards built from them. The flip side is that I expect many people are using telegraf, which has a different naming format, and may have built dashboards around that.

I do actually like that they rely on labels more heavily; I think I'd like to adopt that pattern regardless.

Example export from apache_exporter:

apache_accesses_total 901
apache_connections{state="closing"} 0
apache_connections{state="keepalive"} 0
apache_connections{state="total"} 2
apache_connections{state="writing"} 0
apache_cpuload 0.0132664
apache_duration_total 229
apache_scoreboard{state="closing"} 0
apache_scoreboard{state="dns"} 0
apache_scoreboard{state="graceful_stop"} 0
apache_scoreboard{state="idle"} 74
apache_scoreboard{state="idle_cleanup"} 0
apache_scoreboard{state="keepalive"} 0
apache_scoreboard{state="logging"} 0
apache_scoreboard{state="open_slot"} 325
apache_scoreboard{state="read"} 0
apache_scoreboard{state="reply"} 1
apache_scoreboard{state="startup"} 0
apache_sent_kilobytes_total 1037
apache_up 1
apache_uptime_seconds_total 8970
apache_version{version="Apache/2.4.46 (Unix)"} 2.4
apache_workers{state="busy"} 1
apache_workers{state="idle"} 74

Some notes on theirs:

  • apache_duration_total 229; I think that is missing a unit (probably seconds though I need to verify)
  • They don't report the CPU user, CPU system, CPU children user, CPU children system, requests/s, bytes/s, bytes/req, # processes, or # stopping metrics, though we certainly could in a backwards-compatible manner
  • apache_connections{state="total"} 2 doesn't seem right; I think you would want to just sum the other connection states
  • I'd originally planned to not report system load as part of this (instead let the upcoming New host_metrics RFC #3191 do it)

I guess my inclination would be to adopt prometheus naming conventions, but not necessarily exactly match existing prometheus exporters unless they are "Official" exporters or we find that we particularly like the existing one as the non-official ones seem to have varying levels of support.

@binarylogic (Contributor) commented:

I guess my inclination would be to adopt prometheus naming conventions, but not necessarily exactly match existing prometheus exporters unless they are "Official" exporters or we find that we particularly like the existing one as the non-official ones seem to have varying levels of support.

👍 Agree. Again, we're going to have to solve mapping at some point. Given that everyone is different, we should focus on building our own quality catalog with consistency and provide migration ramps later. The ultimate fallback is #3463, where users can remap the metrics right in the source to their own liking.

@jszwedko (Member Author) commented:

👍 I'll update this RFC with prometheus-style naming and ping for another round of 👀

And prefer labels over separate metrics for related metrics.

@jszwedko (Member Author) commented Aug 27, 2020:

I pushed 7426b37 which switches this to use the prometheus metric naming conventions. Other differences are:

  • I corrected how "namespace" is used for metrics; rather than a . delimiter, it uses _ which matches what prometheus does with it.
  • I folded the related metrics into individual metrics with labels
  • I dropped processes and stopping as I couldn't quite figure out if processes included stopping or not. Neither of the other exporters (https://github.com/Lusitaniae/apache_exporter and https://github.com/influxdata/telegraf/tree/release-1.15/plugins/inputs/apache) seemed to include them either so they probably aren't terribly useful. We can add them later if someone asks (and maybe we can have them explain what they mean 😄 )
  • I dropped the metrics which can be calculated from other metrics. Specifically: requests_per_second, bytes_per_second, bytes_per_request, duration_per_request
  • Added apache_up which matches other prometheus exporters

This did raise a question for me: how will this play with the namespace configured in the prometheus and other metrics sinks? I think it would be preferable to not prefix an additional namespace, but I don't think users could avoid it currently as it is a required configuration field. I think we may want a follow-up issue to make namespace optional on the prometheus and other metrics sinks. @binarylogic thoughts there?
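To illustrate the concern, a minimal sketch (the values are placeholders and other required sink fields are omitted; the prefixing behavior is the `_`-delimited namespace described above):

[sources.apache]
type = "apache"

[sinks.prometheus]
type = "prometheus"
inputs = ["apache"]
namespace = "service"  # currently required; would turn apache_up into service_apache_up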

@jamtur01 @binarylogic @lukesteensen (and anyone else!) do you mind giving 7426b37 a look?

@binarylogic (Contributor) commented:

Nice work

I think it would be preferable to not prefix an additional namespace, but I don't think users could avoid it currently as it is a required configuration field. I think we may want a follow-up issue to make namespace optional on the prometheus and other metrics sinks.

Agree. #3609


I figure we probably don't want metrics for:

- Load (should be handled by a `cpu` or similar metrics source)
Contributor:

You have this metric listed above. Do you still want to keep it?

Member Author:

That one is actually different; I believe it is just the CPU load of the httpd process, as opposed to the entire system load (which is also available via mod_status). I think the general system load would be better handled by the host_metrics source.

I wish they had better docs for mod_status, but that was my interpretation based on experiments and looking at https://github.com/apache/httpd/blob/0a172a81ec8b9d280349da1dec455a2dba287ace/modules/generators/mod_status.c#L476-L508

Contributor:

That's correct from memory.

Contributor:

(perhaps rename this to System Load to make this less ambiguous)

@binarylogic (Contributor) left a comment:

Excellent work. This will help set a precedent for future integrations.

@JeanMertz (Contributor) left a comment:

Awesome work @jszwedko 👍

I'm also happy to see some other discussions this triggered, such as consistent naming of metrics and components.


@jszwedko merged commit b730196 into master on Aug 28, 2020
@jszwedko deleted the add-apache-metrics-source-rfc branch on August 28, 2020 at 15:46
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020