
enhancement(remap): added rate limit parameter to log function. #6622

Merged
StephenWakely merged 6 commits into master from vrl-log-rate-limited on Mar 4, 2021

Conversation

@StephenWakely (Contributor) commented Mar 4, 2021

Closes #6307

This adds a rate_limit_secs parameter to the remap log function, defaulting to 1 second.

This changes VRL to wrap each function call in:

```
span!(Level::ERROR, "remap", vrl_position = &self.span.start()).in_scope(|| {
    ...
})
```

thus ensuring that each tracing call made in a VRL function is keyed on the position of the function call in the remap source.

Note that it is keyed on position rather than line number. This is perhaps not as ergonomic for the user should the position get reported in the logs, but it is arguably more correct, since it is technically possible to have more than one log call on the same line:

```
log(.this); log(.that)
```

So, really, it would need both pieces of information. The parser's Span struct needs some enhancements so it can hold the line number before this can be done.

Currently pending #6615

Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>

jszwedko and others added 2 commits March 3, 2021 16:17
This expands the support for independently rate limiting based on
component_name in span keys to additional keys useful for VRL rate
limiting.

I had originally wanted to have the trace line specify these keys, like:

```
info!("hello", internal_log_rate_secs = 5, internal_log_rate_group_by = ["vrl_line_number"])
```

But tracing doesn't permit lists to be passed through (`tracing::Value`
has no `Vec` or similar implementations:
https://docs.rs/tracing/0.1.25/tracing/trait.Value.html).

So, instead, I just added it as another globally grouped key for now. We
can see how this evolves; if we need to add additional keys, it might be
worth revisiting how this is structured.

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
@jszwedko (Member) commented Mar 4, 2021

Taking a look at this now, it seems like the span isn't being associated for some reason 🤔

```
Mar 04 09:32:14.161  INFO transform{component_kind="transform" component_name=remap component_type=remap}: vrl_stdlib::log: "bar" internal_log_rate_secs=1
```

I would have expected to see the remap span added in there too. I'll dig in.

@JeanMertz (Contributor) left a comment:

I'm not sure about the compile_with_span solution, but it gets us where we need to be for now, so we can revisit that solution later 👍

I realize this isn't working properly yet, but happy to see this land once it does.

@jszwedko (Member) left a comment:

This seems like a good start to me!

My suggestion would be: Instead of adding a compile_with_span method, have whatever calls execute on the function create the span. This would ensure that any other tracing calls appearing in VRL will have the same span context. For the topology span (transform{component_kind="transform" component_name=remap component_type=remap} for example), we actually just use error_span! for everything, I believe to make sure the span is not filtered out by a lower verbosity level.

I know you mentioned it might not be possible to annotate the line number. Is that the case? I think the line number would be much more useful than the position.
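A minimal sketch of that suggestion (the caller of execute creates the span, rather than a compile_with_span method); the Expression type and execute_with_span function here are made up for illustration and are not the actual VRL API:

```
use tracing::{error_span, info};

// Stand-in for a compiled VRL expression; not the real type.
struct Expression {
    position: usize, // position of the call in the remap source
}

impl Expression {
    fn execute(&self) {
        // Any event emitted in here (info!, the log function, ...) inherits
        // the fields of the span the caller entered.
        info!("executing expression");
    }
}

fn execute_with_span(expr: &Expression) {
    // The caller creates the span, so individual functions don't need to know
    // about it. error_span! is used so the span survives lower verbosity levels.
    error_span!("remap", vrl_position = expr.position as u64).in_scope(|| expr.execute());
}

fn main() {
    tracing_subscriber::fmt::init();
    execute_with_span(&Expression { position: 42 });
}
```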

```
_ => info!("{}", value),
_ => {
    let span = info_span!("remap", vrl_position = &self.span.start());
    let _ = span.enter();
```
A Member commented:

Ah, you know what, I forgot that `let _ =` doesn't behave the same as `let _foo =`. With `_`, the guard isn't bound, so it is dropped immediately; we need it to stay bound so it is dropped later, after the info!() call. You could write this as:

```
let _enter = span.enter();
```

This seems to work for me.
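A standalone sketch of the difference (not code from the PR; the events and field values are just placeholders):

```
use tracing::{info, info_span};

fn main() {
    tracing_subscriber::fmt::init();

    let span = info_span!("remap", vrl_position = 0u64);

    // `let _ =` does not bind the guard, so it is dropped immediately and the
    // span is never entered for the event below.
    let _ = span.enter();
    info!("not inside the remap span");

    // `let _enter =` keeps the guard alive until the end of the scope, so this
    // event carries the span's context.
    let _enter = span.enter();
    info!("inside the remap span");
}
```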

If you prefer, there is also this closure syntax you can use:

```
span!(Level::TRACE, "inner_child").in_scope(|| {
    // and so on...
});
```

@StephenWakely (Contributor Author) replied:

let _ = doesn't behave the same as let _foo =.

Ah, I wasn't aware of that. Thanks. Yes, that fixes it.

@StephenWakely (Contributor Author) commented:

I'm not sure about the compile_with_span solution

I'm not keen on it either. It's just there for expediency at the moment. I think ultimately I need to add Span as a parameter to compile.. although that's not ideal either as very few functions will use it. Perhaps another struct that contains the Span and the ParameterList would be the tidiest?
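For illustration, a hypothetical shape of that "another struct" idea; none of these names are the actual VRL compiler API:

```
// Hypothetical sketch only; the real VRL compiler types differ.
struct Span {
    start: usize,
    end: usize,
}

struct ParameterList; // placeholder for the function's argument/parameter list

// Bundling the span with the parameters keeps `compile` at a single argument,
// so functions that don't care about the span can simply ignore it.
struct CompileContext {
    span: Span,
    parameters: ParameterList,
}

fn compile(ctx: CompileContext) {
    // A function that wants the call-site position can read it from the context.
    let _position = ctx.span.start;
    let _ = (ctx.span.end, ctx.parameters);
}

fn main() {
    compile(CompileContext {
        span: Span { start: 0, end: 10 },
        parameters: ParameterList,
    });
}
```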

@jszwedko (Member) commented Mar 4, 2021

I'm not sure about the compile_with_span solution

I'm not keen on it either. It's just there for expediency at the moment. I think ultimately I need to add Span as a parameter to compile.. although that's not ideal either as very few functions will use it. Perhaps another struct that contains the Span and the ParameterList would be the tidiest?

👍 I think my suggestion would be for none of the functions to be aware of it and instead inject the span at a level above that.

@StephenWakely (Contributor Author) commented:

I think my suggestion would be for none of the functions to be aware of it and instead inject the span at a level above that.

I like this idea a lot! I presume the performance hit is negligible?

@jszwedko (Member) commented Mar 4, 2021

I think my suggestion would be for none of the functions to be aware of it and instead inject the span at a level above that.

I like this idea a lot! I presume the performance hit is negligible?

That would be my assumption at least 😄 I feel like tracing usually likes to be minimally invasive, but I could definitely be wrong in this particular case. I'll run a quick bench to check. I think another advantage here is that any other events (info!, etc.) will have the span context linking them to a place in the remap source.

@jszwedko (Member) commented Mar 4, 2021

Ran a quick benchmark:

```
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use tracing::{span, Level};

#[inline]
fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 1,
        1 => 1,
        n => fibonacci(n - 1) + fibonacci(n - 2),
    }
}

fn benchmark(c: &mut Criterion) {
    let mut group = c.benchmark_group("tracing");
    group.bench_function("no_span", |b| {
        b.iter(|| fibonacci(black_box(30)));
    });
    group.bench_function("span", |b| {
        b.iter(|| {
            let span = span!(Level::ERROR, "span");
            let _enter = span.enter();
            fibonacci(black_box(30))
        })
    });
}

criterion_group!(benches, benchmark);
criterion_main!(benches);
```

The overhead seems to be minimal:

```
tracing/no_span         time:   [2.4556 ms 2.4605 ms 2.4657 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
tracing/span            time:   [2.7179 ms 2.8010 ms 2.8890 ms]
```

[Screenshot: Criterion distribution plot for the tracing/span benchmark, taken Mar 4, 2021]

The distribution is not quite so narrow, but still seems fine to me.

@StephenWakely (Contributor Author) commented:

I think my suggestion would be for none of the functions to be aware of it and instead inject the span at a level above that.

Yeah, I've got this done now and it works really well; it's much cleaner to implement and has all those other advantages! Thanks for the suggestion.

Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
@StephenWakely StephenWakely marked this pull request as ready for review March 4, 2021 16:17
@StephenWakely StephenWakely requested review from a team and pablosichert and removed request for a team March 4, 2021 16:17
Base automatically changed from log-rate-limit-extra-span-keys to master March 4, 2021 17:41
Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
jszwedko added a commit that referenced this pull request Mar 4, 2021
While reviewing #6622 I was going
to suggest that users could use 0 there if they wanted no rate
limiting; however, when I tried that, I realized it actually hits
the panic on line 35 :D

This updates the rate limiting logic to allow 0 to disable rate
limiting. If any other invalid value is provided (for example -1 or
"foo") then it is treated as 0. This seems like acceptable behavior to
me over panicking.

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
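A minimal sketch of the behavior described in that commit; the function name and string-based parsing are illustrative, not Vector's actual implementation:

```
/// Interpret the raw `internal_log_rate_secs` value: 0 disables rate limiting,
/// and any invalid value (e.g. "-1" or "foo") is treated as 0 instead of panicking.
fn effective_rate_limit_secs(raw: &str) -> u64 {
    raw.parse::<u64>().unwrap_or(0)
}

fn main() {
    assert_eq!(effective_rate_limit_secs("5"), 5);   // emit at most once per 5 seconds
    assert_eq!(effective_rate_limit_secs("0"), 0);   // explicitly disabled
    assert_eq!(effective_rate_limit_secs("-1"), 0);  // invalid: treated as disabled
    assert_eq!(effective_rate_limit_secs("foo"), 0); // invalid: treated as disabled
}
```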
@jszwedko (Member) left a comment:

Nice! This looks good!

name: "rate_limit_secs"
description: "Specifies that the log message is output no more than once per the given number of seconds."
required: false
default: 1
A Member commented:

👍 it might be worth calling out that they can use 0 here to disable the rate limit after #6629 is merged.

Signed-off-by: Stephen Wakely <stephen.wakely@datadoghq.com>
jszwedko added a commit that referenced this pull request Mar 4, 2021

chore(observability): Allow internal_rate_limit_secs to be 0

Signed-off-by: Jesse Szwedko <jesse@szwedko.me>
@StephenWakely StephenWakely merged commit 333ba23 into master Mar 4, 2021
@StephenWakely StephenWakely deleted the vrl-log-rate-limited branch March 4, 2021 23:14
```
@@ -21,6 +21,11 @@ impl Function for Log {
            kind: kind::BYTES,
            required: false,
        },
        Parameter {
            keyword: "rate_limit_secs",
            kind: kind::INTEGER,
```
A Contributor commented:

I'm a bit late on the review, but maybe one follow-up here: I can imagine users wanting to specify rate limits such as 0.1s or 1.5s.

Does internal_log_rate_secs not support floats, or is there any other reason why fractions of a second shouldn't be allowed?

@jszwedko (Member) commented Mar 5, 2021:

The internal rate limiting logic only handles seconds right now. Interestingly it looks like tracing doesn't have support for passing floats around (https://docs.rs/tracing/0.1.25/tracing/trait.Value.html) so we'd need to adjust it to be something like internal_rate_limit_ms to allow finer granularity. Here it could still be represented as a float though and just marshalled (whenever ms support would be added).
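A sketch of the marshalling idea; the `internal_log_rate_ms` field name is hypothetical (only whole-second rate limiting exists today):

```
use tracing::info;

fn main() {
    tracing_subscriber::fmt::init();

    // The user-facing value could stay a float in VRL...
    let rate_limit_secs: f64 = 1.5;

    // ...but since tracing::Value has no float implementations (as of 0.1.25),
    // it would be converted to integer milliseconds before being attached.
    let rate_limit_ms = (rate_limit_secs * 1000.0).round() as u64;

    // `internal_log_rate_ms` is a hypothetical field name, not one Vector reads today.
    info!(internal_log_rate_ms = rate_limit_ms, "hello");
}
```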

A Contributor commented:

Opened #6643 to track this one.


Successfully merging this pull request may close these issues.

Default the Remap log function to implement rate-limiting