-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(new source): add internal metrics source #1953
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks fantastic!
From planning: looking for feedback around the internal API before we start collecting metrics throughout Vector. |
@Hoverbear @a-rodin - would you mind reviewing this? This is probably our #1 requested feature from customers and I want to unblock this work. This doesn't mean rush the review 😄 . The only concern I have is around sharing context with metrics. As long as we have a clear path for that then I'm ok getting started without that. We'll just have to manually make sure all of the tags are formatted properly. |
Yes, this is the major downside. As I've started instrumenting more things, the need has become clearer. I still think we should get a basic set up metrics up and running ASAP, but work on a |
74ce4f8
to
4d48d69
Compare
Great PR! Please pay attention to the following items before merging: Files matching
This is an automatically generated QA checklist based on modified files |
Agree. We can still add labels manually, right? I haven’t see any in the examples you’ve added. Also, we should agree on a naming and tagging scheme and then just inventory all of it. We’ll have to do this for the docs anyway and I’m happy to do the heavy lifting on that. Otherwise, this will become a mess very fast. I found the Prometheus naming guidelines interesting, but I’m not sure that fits well with Vector. Notice how they recommend very general naming with the use of tags. Ex: ‘tcp_sent_bytes{component_type=‘sinks.vector’}’. I don’t completely agree with this style for Vector, but I understand the appeal. Graphing all bytes sent over TCP, and then grouping by ‘component_type’, is trivial this way. It would not be as easy with separate names. |
Yes, the examples I've added so far have been more classic statsd-style keys. As I mentioned in #2007, we can and maybe should move some of the segments to tags/labels.
I have a rough proposal here with a big checklist. Once we settle on a naming scheme, we can edit that into a big list of specific metrics and start working our way through it. So let me know what you all think! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking great! I am really happy with how this is turning out :)
|
||
let labels = key | ||
.labels() | ||
.map(|label| (String::from(label.key()), String::from(label.value()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unfortunate, I feel like labels don't really change that much would be nice if we could somehow do some sort of ref counting here or have some sort of pool of labels. Not a priority but just thinking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, I can open another issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the choice to use quotes on the measured metrics deliberate? I think it might be possible to support unquoted metrics if its desirable.
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
04b03c4
to
42c9987
Compare
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a good start. I don't want to bikeshed on the metric names given that we'll address that in a separate follow up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, no blockers.
use metrics::counter; | ||
|
||
#[derive(Debug)] | ||
pub struct PrometheusRequestCompleted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: but maybe consider making this PrometheusRequestCompleted {}
to kinda align with our other structs. I personally find plain structs kinda confusing when reading code.
|
||
fn emit_metrics(&self) { | ||
counter!("tcp_connections_established", 1, | ||
"component_kind" => "sink", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these don't have a type because they can be crated from many different types of sinks and we don't have that metadata here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. This is where we'll need to lean on tracing
metadata.
src/sources/internal_metrics.rs
Outdated
.send_all(futures01::stream::iter_ok(metrics)) | ||
.compat() | ||
.await | ||
.expect("sending??"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can only fail if the topology task gets shutdown while we are sending, so I think we can ignore that here, instead of panicking?
Co-Authored-By: Lucio Franco <luciofranco14@gmail.com>
Signed-off-by: Luke Steensen <luke.steensen@gmail.com>
Closes #1761
Closes #68
Ref #230
This is a spike (i.e. still rough) implementation of a system for collecting internal metrics and running them through a "normal" vector source.
It uses the
metrics
crate as the API for instrumentation, not thetracing
-based system we were using before. This was not my first choice, but I believe it's the best solution for right now, as well as forwards-compatible with things we'll want to do later.A big benefit of
tracing
is its ability to propagate contextual information about where you are in the system. We use this extensively for our logging to tell you what type and the name of the current component without having to manually pass that data around everywhere just for log messages. This same functionality would be very beneficial for naming and tagging metric data, so atracing
based solution is what we started with and what I spent quite a bit of time exploring.Unfortunately, the state of using
tracing
to collect metrics is extremely early right now. What we want to do is definitely possible, but both the exact implementation and the public API are still undergoing a lot of experimentation. This seems like an area we could jump in and help advance the state of the art, but my current read is that we need a metrics system sooner than that work will be ready, even with our help.Therefore, I've opted to go with a more traditional metrics library (
metrics
) as the basis of the system for right now. This will let us get started instrumenting our components much more quickly, but it's also flexible enough that we should be able to slide in atracing
-based implementation behind the same interface down the line.The primary disadvantage of the non-
tracing
system right now is that we don't have a good way to tag metrics with the configured name of the current component. Once atracing
-based implementation solidifies, however, we should be able to get that additional data with no change to the existing instrumentation sites.Still to be done:
internal_metrics
?)Sorry for the wall of text, but this turned out to be significantly more complex than I anticipated. Let me know what you think!