Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(rfcs): Add RFC for Apache HTTP Server metrics source #3519
chore(rfcs): Add RFC for Apache HTTP Server metrics source #3519
Changes from all commits
b7c3c1c
fd12446
186c1d8
473f128
995d81d
d87d29f
e3e9560
39aaa65
1a07528
96a9412
d0922b3
7426b37
7833587
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the big missing piece here is Basic auth - a lot of people still use this as security for mod_status. Prom just adds it to the scrape URL.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I think we could support this in the same way (via the URL), but that it'd be useful to add basic auth options for all HTTP-based client sources. I propose deferring that until after we refactor them as mentioned in "Future Work".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah - missed that. +1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree with this.
I can also see us (at some point) having "aliased" components, that merely defer to some other generic component with a default set of configurations.
In this case, we'd have a
http_scrape
source that is highly configurable, and aapache
alias source, that makes it more discoverable and easier to configure but defers its actual operations to thehttp_scrape
source.Having said that; all of this is irrelevant for this initial source implementation, which I like 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe this is actually what you described in "Future Work", isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The future work is just to do this internally, but I do like your idea of eventually adding an
http_scrape
source that would be more flexible and allow people to scrape arbitrary endpoints. I imagine it might require more configuration which the "aliases" would simply default.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do this currently. Sources and sinks can wrap other sources and sinks. For example, the
syslog
source wraps thesocket
source.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given this precedence, do we want to create a generic
http_scrape
source and theapache
source based on this RFC, or do we want to start with the latter, and transition it to the former as soon as we need another source based on HTTP scraping?I assume we'll just start with the singular
apache
source, but want to make sure we're on the same page.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'd like to defer the
http_scrape
source as I think it'll require more thought and discussion about how to parse the incoming requests (formats? codecs? etc.).I guess I figured I'd start with this one and the nginx one (#3091) for now and refactor them internally to share a component (along with the
prometheus
source).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting, we'll want
apache_metrics
for clarity.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually think that calling it
apache_metrics
rather than justapache
, while making it clearer that the source ingests metrics, also makes it stand out against the other sources which are not suffixed with their data type. For example, we havestatsd
rather thanstatsd_metrics
anddocker
rather thandocker_logs
. One could argue thatstatsd
is less ambiguously metrics, butdocker
, in my mind, could be ambiguous: it could be either docker metrics or docker logs (or both).This is actually standing out to me as something where the precedent will stick around for a while so it's probably worth having some explicit guidance for source naming. Sources could be one of, more than one of, or all of metrics, logs, and traces. It seems like the vision of
vector
is to treat all observability data (logs, metrics, and traces) as first-class and so it seems like we should reflect that in the component naming and terminology.I actually kinda like Datadog's model where they just have one
apache
integration that ingests both metrics and logs from Apache. They configure it via a separatelog
section: https://github.com/DataDog/integrations-core/blob/master/apache/datadog_checks/apache/data/conf.yaml.example#L326-L347 . Given, Datadag started with metrics and moved to logs, which I think is reflected in the fact that logs is a separate subsection rather than metrics being so. For us, we might want to have metrics/log/trace configuration at the same level. For users forwarding to datadog, they may want to have similar functionality as what dd-agent currently supports.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points.
Then we should rename these sources to
statsd_metrics
anddocker_logs
or change our model.It's a good point. From a UX perspective, drawing lines around components based on types makes it easier to build pipelines. For example, what happens if you connected a multi-type
docker
source toregex_parser
that only operates on logs? Setting aside the fact that we can't technically do this, I think it complicates the UX. I am very happy to discuss this change in a separate issue, and then an RFC if we deem it worthy. Would also like other people's thoughts on the matter.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been going back and forth on this all afternoon. On one hand, having a single source name, docker/apache/etc, references that single point of observability. But I also started thinking "are there circumstances in which I'd only collect metrics from a service and I'd find
apache
on its own confusing?" Also, I wonder whether having differently named sources might make pipeline configurations more clear?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main concern for me. Combined sources may seem convenient, but how often do you want to route logs and metrics through the same pipeline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good question. I note that we do have a one source (
vector
), a few sinks (vector
,console
,blackhole
), and a few transforms (filter
,lua
remap
(soon) that can handle both logs and metrics, but I can imagine it would be common to need to transform them differently.I realize this is going off on quite a tangent; we could break this discussion off to a separate issue. I think it is important though.
I currently see 3 approaches:
apache_metrics
and move towards renaming all sources / sinks that only handle a single type to have a suffix (likesematext_logs
). If/when we addtrace
as another type, we'd add things likesematex_traces
.I find the current state to be a little confusing. Some sources/sinks integrations can handle multiple metric types (
vector
,console
) and other ones are split into_metrics
and_logs
(likedatadog_logs
/datadog_metrics
). The clearest rule I can pull out for deciding is "can the integration handle pulling or pushing both logs and metrics with the same configuration? If yes, no suffix, if no, two components with_logs
and_metrics
". On issue with that rule is that downstream support for observaliblity types can change, making it difficult to predict if they will addmetrics
orlogs
support in the future.I'd personally would appreciate some guidance laid out for component naming to avoid this question cropping up in the future.
As a thought experiment, maybe it is useful to imagining we add metrics support for clickhouse (#3435). Where would that go? A new component (
clickhouse_metrics
)? If so, would we renameclickhouse
toclickhouse_logs
? Or would we add it as part of the existingclickhouse
sink?Expanding on approach 2:
One way we could approach that with a single source, is allow transforms and sinks to ask for only certain datatypes with something like:
Where it would only receive logs from the source and not metrics (or traces). If there was no suffix, it would simply receive both. In the case of a transform that only works with a subset of "observability types" we could error at start-up if there is an invalid pipeline.
Another way to limit the event types if you only want certain types from the source would be something like:
Perhaps with syntax sugar for one type:
I think there is something attractive about one component to observe one "integration" (like apache, nginx, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we should have a stronger convention here and this is a good start. Right now, we handle multi-type sources by automatically inserting a type filter where necessary in front of downstream components. We should make sure that doesn't lead to confusing scenarios if we start using it more widely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, I wasn't aware of that. I could see that be a bit confusing if it is implicit.
As for the convention, I'm ok with starting with that one, but it does mean we'll need to deprecate sources/sinks occasionally. I'm imagining, had vector existed a few years ago, before Datadog had support for logs, we may have just called the sink
datadog
and then had to deprecated it later in-lieu ofdatadog_metrics
whendatadog_logs
was added.I'm ok with just calling this
apache_metrics
for now since, prompted by this discussion, I've observed we already have a mix of suffixed and unsuffixed sinks and sources so this doesn't seem to make the situation noticeably worse and my concern about precedent is unfounded. I can create a new issue to come up with a convention for component naming. I'm still curious to hear thoughts on separate vs. integrated sources/sinks as well.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 it's worth a discussion. I'm not leaning strongly in either direction, but as you can see there are scenarios we should discuss before making a decision. I like the simplicity of single, multi-type components from a documentation standpoint, but I want to make sure it doesn't over complicate the actual UX.