Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(vrl): Support DataDog grok parser #7705

Closed
wants to merge 41 commits into from
Closed

Conversation

vladimir-dd
Copy link
Contributor

@vladimir-dd vladimir-dd commented Jun 2, 2021

This is the initial PR to support DD grok parsing.

Here is an example of grok rules:

access.common %{_client_ip} %{_ident} %{_auth} \[%{_date_access}\] "(?>%{_method} |)%{_url}(?> %{_version}|)" %{_status_code} (?>%{_bytes_written}|-)
access.combined %{access.common} (%{number:duration:scale(1000000000)} )?"%{_referer}" "%{_user_agent}"( "%{_x_forwarded_for}")?.*

You can write parsing rules with the %{MATCHER:EXTRACT:FILTER} syntax:

  • Matcher: A rule (possibly a reference to another token rule) that describes what to expect (number, word, notSpace, etc.)
  • Extract (optional): An identifier representing the capture destination for the piece of text matched by the Matcher.
  • Filter (optional): A post-processor of the match to transform it.

Each rule can reference parsing rules defined above itself in the list. Only one can match any given log. The first one that matches, from top to bottom, is the one that does the parsing. For further documentation and the full list of available matcher and filters check out https://docs.datadoghq.com/logs/processing/parsing

We introduce a new parse_datadog_grok VRL function(hidden by FF). An example of usage:

parsed = parse_datadog_grok(value,  
 [
    'access.common %{_client_ip} %{_ident} %{_auth} \[%{_date_access}\] "(?>%{_method} |)%{_url}(?> %{_version}|)" %{_status_code} (?>%{_bytes_written}|-)',
    'access.combined %{access.common} (%{number:duration:scale(1000000000)} )?"%{_referer}" "%{_user_agent}"( "%{_x_forwarded_for}")?.*'
],
[
    '_auth %{notSpace:http.auth:nullIf("-")}',
    '_bytes_written %{integer:network.bytes_written}',
    '_client_ip %{ipOrHost:network.client.ip}',
    '_version HTTP\/(?<http.version>\d+\.\d+)',
    '_url %{notSpace:http.url}',
    '_ident %{notSpace:http.ident}',
    '_user_agent %{regex("[^\\\"]*"):http.useragent}',
    '_referer %{notSpace:http.referer}',
    '_status_code %{integer:http.status_code}',
    '_method %{word:http.method}',
    '_date_access %{date("dd/MMM/yyyy:HH:mm:ss Z"):date_access}',
    '_x_forwarded_for %{regex("[^\\\"]*"):http._x_forwarded_for:nullIf("-")}'
])

In the follow-up PRs we will add all missing matchers and filters.

Signed-off-by: Vladimir Zhuk vladimir.zhuk@datadoghq.com

Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
@vladimir-dd vladimir-dd marked this pull request as ready for review June 16, 2021 14:06
@vladimir-dd vladimir-dd requested review from a team and removed request for a team June 16, 2021 14:06
@vladimir-dd vladimir-dd changed the title feat(remap): Support DataDog grok parsing function feat(remap): Support DataDog grok parser Jun 16, 2021
@vladimir-dd vladimir-dd added provider: datadog Anything `datadog` service provider related type: enhancement A value-adding code change that enhances its existing functionality. labels Jun 16, 2021
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Copy link
Member

@leebenson leebenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still going through this, but wanted to drop a few early comments to be looking at in the meantime.

cargo check is also failing with 13 warnings, and cargo test with 7 errors.

Will continue going through 👍🏻

Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
lib/datadog/grok/Cargo.toml Outdated Show resolved Hide resolved
lib/datadog/grok/patterns/core Outdated Show resolved Hide resolved
lib/datadog/grok/src/build.rs Outdated Show resolved Hide resolved
lib/datadog/grok/src/build.rs Outdated Show resolved Hide resolved
lib/datadog/grok/src/build.rs Outdated Show resolved Hide resolved
lib/datadog/grok/src/build.rs Outdated Show resolved Hide resolved
lib/datadog/grok/src/build.rs Outdated Show resolved Hide resolved
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
use bytes::Bytes;
use std::collections::BTreeMap;

pub fn parse(bytes: &Bytes) -> Result<Value, String> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a general curiosity, but I'm wondering if there was specific motivation for a home-brew version of query string parsing vs. reaching for something off-the-shelf, like serde_urlencoded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point, I believe something off-the-shelf could also work since DD support pretty standard behaviour. Will double-check and replace with the library one then.


impl DiagnosticError for Error {
fn code(&self) -> usize {
109
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has error 109 been documented anywhere? I noticed https://errors.vrl.dev jumps from 108 to 110.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, nice catch, I copied it from 'parse_grok'. @StephenWakely(as the author of the "standard" function) - do you think we need to document this error message?

Copy link
Member

@leebenson leebenson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @vladimir-dd 🎉

Left a few comments, but nothing I'd consider blocking.

Co-authored-by: Lee Benson <lee@leebenson.com>
@netlify
Copy link

netlify bot commented Jul 23, 2021

❌ Deploy Preview for vector-project failed.

🔨 Explore the source changes: 5cbe134

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/6116694aee0656000898ed7f

vladimir-dd and others added 2 commits July 23, 2021 18:30
Co-authored-by: Lee Benson <lee@leebenson.com>
Co-authored-by: Lee Benson <lee@leebenson.com>
})
.map(|e| {
e.try_bytes_utf8_lossy()
.expect("should be string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is possible for a static expression to be types other than a string (numbers, timestamps) so try_bytes could well fail here. The error needs to be propagated up so that VRL can report it properly. expect will cause a panic.

It is fiddly with chained functions, but something along the lines of this should do it:

                    .map(|e| {
                        Ok(e.try_bytes_utf8_lossy()?
                            .into_owned())
                    })
                    .transpose()?

})
.map(|e| {
e.try_bytes_utf8_lossy()
.expect("should be string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.

Comment on lines +70 to +71

fn parse_rules<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to document what the parameters prev_rule_definitions and prev_rule_filters are used for. It's not instantly apparent to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, definitely. Refactored and added some comments in 5f3fc10


// replace grok patterns with "purified" ones
let mut pure_pattern_it = pure_grok_patterns.iter();
for r in raw_grok_patterns {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for r in raw_grok_patterns {
for (r, pure) in raw_grok_patterns.iter().zip(pure_grok_patterns.iter()) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in d924c53

grok_patterns
.iter()
.filter(|&rule| {
rule.destination.is_some() && rule.destination.as_ref().unwrap().filter_fn.is_some()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pattern matching may be nicer here.

Suggested change
rule.destination.is_some() && rule.destination.as_ref().unwrap().filter_fn.is_some()
matches!(
rule,
GrokPattern {
destination: Some(Destination {
filter_fn: Some(_),
..
}),
..
}
)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, definitely, the final version is in dce9511

let filter = GrokFilter::try_from(dest.filter_fn.as_ref().unwrap())?;
Ok((dest, filter))
})
.collect::<Result<Vec<_>, Error>>()?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But given all that, I think this whole chain could be reduced to a single loop which means it could avoid having to create this intermediate Vec.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed! Simplified in 9cd2497.

@binarylogic
Copy link
Contributor

binarylogic commented Jul 26, 2021

Just to bring visibility to discussions we've had across the team, people are having difficulty reviewing this PR, as evident by the number of comments (75+). I hate to say this, but I think we should close this PR and identify incremental changes that can be reviewed appropriately and quickly. I'm open to pushing this over the finish line, but historically that has not worked out well.

Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
@jszwedko
Copy link
Member

@vladimir-dd feel free to flag whenever this is ready for review again.

@vladimir-dd
Copy link
Contributor Author

Closing this PR to break it down in smaller chunks. Will address all left comments in the follow-up PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
provider: datadog Anything `datadog` service provider related type: enhancement A value-adding code change that enhances its existing functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants