New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement(remap): added rate limit parameter to log function. #6622
Conversation
This expands the support for independently rate limiting based on component_name in span keys to additionaly keys useful for VRL rate limiting. I had originally wanted to have the trace line specify these keys, like: ``` info!("hello", internal_log_rate_secs = 5, internal_log_rate_group_by = ["vrl_line_number"]) ``` But tracing doesn't permit lists to be passed through (`tracing::Value` has no `Vec` or similar implementations: https://docs.rs/tracing/0.1.25/tracing/trait.Value.html). So, instead, I just added it as another globally grouped key for now. We can see how this evolves if we need to additional keys, it might be worth revisiting how this is structured. Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Taking a look at this now, it seems like the span isn't being associated for some reason 🤔
I would have expected to see the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about the compile_with_span
solution, but it gets us where we need to be for now, so we can revisit that solution later 👍
I realize this isn't working properly yet, but happy to see this land once it does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good start to me!
My suggestion would be: Instead of adding a compile_with_span
method, have whatever calls execute
on the function create the span. This would ensure that any other tracing calls appearing in VRL will have the same span context. For the topology span (transform{component_kind="transform" component_name=remap component_type=remap}
for example), we actually just use error_span!
for everything, I believe to make sure the span is not filtered out by a lower verbosity level.
I know you mentioned it might not be possible to annotate the line number. Is that the case? I think the line number would be much more useful than the position.
lib/vrl/stdlib/src/log.rs
Outdated
_ => info!("{}", value), | ||
_ => { | ||
let span = info_span!("remap", vrl_position = &self.span.start()); | ||
let _ = span.enter(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, you know what, I forgot that let _ =
doesn't behave the same as let _foo =
. _
causes it to just ignore the binding. We need it to bind to drop later, after the info!()
call. You could write this as:
let _enter = span.enter();
This seems to work for me.
If you prefer, there is also this closure syntax you can use:
span!(Level::TRACE, "inner_child").in_scope(|| {
// and so on...
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let _ =
doesn't behave the same aslet _foo =
.
Ah, wasn't aware of that. Thanks. Yes that fixes it.
I'm not keen on it either. It's just there for expediency at the moment. I think ultimately I need to add |
👍 I think my suggestion would be for none of the functions to be aware of it and instead inject the span at a level above that. |
I like this idea a lot! I presume the performance hit is negligible? |
That would be my assumption at least 😄 I feel like tracing usually likes to be minimally invasive, but I could definitely be wrong in this particular case. I'll run a quick bench to check. I think another advantage here is that any other events ( |
Ran a quick benchmark: use criterion::{black_box, criterion_group, criterion_main, Criterion};
use tracing::{span, Level};
#[inline]
fn fibonacci(n: u64) -> u64 {
match n {
0 => 1,
1 => 1,
n => fibonacci(n - 1) + fibonacci(n - 2),
}
}
fn benchmark(c: &mut Criterion) {
let mut group = c.benchmark_group("tracing");
group.bench_function("no_span", |b| {
b.iter(|| fibonacci(black_box(30)));
});
group.bench_function("span", |b| {
b.iter(|| {
let span = span!(Level::ERROR, "span");
let _enter = span.enter();
fibonacci(black_box(30))
})
});
}
criterion_group!(benches, benchmark);
criterion_main!(benches); The overhead seems to be minimal:
The distribution is not quite so narrow, but still seems fine to me. |
Yeah have got this down now and it works really well, is much cleaner to implement and has all those other advantages! Thanks for the suggestion. |
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
While reviewing #6622 I was going to suggest that users could use 0 there if they wanted no rate limiting; however, when I tried that, I realized it actually hits the panic on line 35 :D This updates the rate limiting logic to allow 0 to disable rate limiting. If any other invalid value is provided (for example -1 or "foo") then it is treated as 0. This seems like acceptable behavior to me over panicking. Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! This looks good!
name: "rate_limit_secs" | ||
description: "Specifies that the log message is output no more than once per the given number of seconds." | ||
required: false | ||
default: 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 it might be worth calling out that they can use 0
here to disable the rate limit after #6629 is merged.
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
* chore(observability): Allow internal_rate_limit_secs to be 0 While reviewing #6622 I was going to suggest that users could use 0 there if they wanted no rate limiting; however, when I tried that, I realized it actually hits the panic on line 35 :D This updates the rate limiting logic to allow 0 to disable rate limiting. If any other invalid value is provided (for example -1 or "foo") then it is treated as 0. This seems like acceptable behavior to me over panicking. Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
@@ -21,6 +21,11 @@ impl Function for Log { | |||
kind: kind::BYTES, | |||
required: false, | |||
}, | |||
Parameter { | |||
keyword: "rate_limit_secs", | |||
kind: kind::INTEGER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit late on the review, but maybe one follow-up here: I can imagine users wanting to specify rate limits such as 0.1s
or 1.5s
.
Does internal_log_rate_secs
not support floats, or is there any other reason why fractions of a second shouldn't be allowed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The internal rate limiting logic only handles seconds right now. Interestingly it looks like tracing doesn't have support for passing floats around (https://docs.rs/tracing/0.1.25/tracing/trait.Value.html) so we'd need to adjust it to be something like internal_rate_limit_ms
to allow finer granularity. Here it could still be represented as a float though and just marshalled (whenever ms support would be added).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Opened #6643 to track this one.
Closes #6307
This adds a rate_limit_secs parameter to the remap log function, defaulting to 1 second.
This changes VRL to wrap each function call in:
thus ensuring that each tracing call made in a VRL function is keyed on the position of the function call in the remap source.
Note, it is keyed on position rather than line number. This is perhaps not as ergonomic to the user should that get reported in the logs. However, it is probably theoretically more correct as it is technically possible to have more than one log message on the same line:
So, really it would need both pieces of information. There need to be some enhancements to the parser
Span
struct to enable it to hold the line number before this can be done.Currently pending #6615
Signed-off-by: Stephen Wakely stephen.wakely@datadoghq.com