New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(vrl): Support DataDog grok parser #7705
Conversation
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
cde0bf8
to
c328282
Compare
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
c328282
to
a486b98
Compare
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still going through this, but wanted to drop a few early comments to be looking at in the meantime.
cargo check
is also failing with 13 warnings, and cargo test
with 7 errors.
Will continue going through 👍🏻
use bytes::Bytes; | ||
use std::collections::BTreeMap; | ||
|
||
pub fn parse(bytes: &Bytes) -> Result<Value, String> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general curiosity, but I'm wondering if there was specific motivation for a home-brew version of query string parsing vs. reaching for something off-the-shelf, like serde_urlencoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point, I believe something off-the-shelf could also work since DD support pretty standard behaviour. Will double-check and replace with the library one then.
|
||
impl DiagnosticError for Error { | ||
fn code(&self) -> usize { | ||
109 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has error 109 been documented anywhere? I noticed https://errors.vrl.dev jumps from 108 to 110.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, nice catch, I copied it from 'parse_grok'. @StephenWakely(as the author of the "standard" function) - do you think we need to document this error message?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @vladimir-dd 🎉
Left a few comments, but nothing I'd consider blocking.
Co-authored-by: Lee Benson <lee@leebenson.com>
❌ Deploy Preview for vector-project failed. 🔨 Explore the source changes: 5cbe134 🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/6116694aee0656000898ed7f |
Co-authored-by: Lee Benson <lee@leebenson.com>
Co-authored-by: Lee Benson <lee@leebenson.com>
}) | ||
.map(|e| { | ||
e.try_bytes_utf8_lossy() | ||
.expect("should be string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is possible for a static expression to be types other than a string (numbers, timestamps) so try_bytes
could well fail here. The error needs to be propagated up so that VRL can report it properly. expect
will cause a panic.
It is fiddly with chained functions, but something along the lines of this should do it:
.map(|e| {
Ok(e.try_bytes_utf8_lossy()?
.into_owned())
})
.transpose()?
}) | ||
.map(|e| { | ||
e.try_bytes_utf8_lossy() | ||
.expect("should be string") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
|
||
fn parse_rules<'a>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to document what the parameters prev_rule_definitions
and prev_rule_filters
are used for. It's not instantly apparent to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, definitely. Refactored and added some comments in 5f3fc10
|
||
// replace grok patterns with "purified" ones | ||
let mut pure_pattern_it = pure_grok_patterns.iter(); | ||
for r in raw_grok_patterns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for r in raw_grok_patterns { | |
for (r, pure) in raw_grok_patterns.iter().zip(pure_grok_patterns.iter()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in d924c53
grok_patterns | ||
.iter() | ||
.filter(|&rule| { | ||
rule.destination.is_some() && rule.destination.as_ref().unwrap().filter_fn.is_some() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pattern matching may be nicer here.
rule.destination.is_some() && rule.destination.as_ref().unwrap().filter_fn.is_some() | |
matches!( | |
rule, | |
GrokPattern { | |
destination: Some(Destination { | |
filter_fn: Some(_), | |
.. | |
}), | |
.. | |
} | |
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, definitely, the final version is in dce9511
let filter = GrokFilter::try_from(dest.filter_fn.as_ref().unwrap())?; | ||
Ok((dest, filter)) | ||
}) | ||
.collect::<Result<Vec<_>, Error>>()? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But given all that, I think this whole chain could be reduced to a single loop which means it could avoid having to create this intermediate Vec
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed! Simplified in 9cd2497.
Just to bring visibility to discussions we've had across the team, people are having difficulty reviewing this PR, as evident by the number of comments (75+). I hate to say this, but I think we should close this PR and identify incremental changes that can be reviewed appropriately and quickly. I'm open to pushing this over the finish line, but historically that has not worked out well. |
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
Signed-off-by: Vladimir Zhuk <vladimir.zhuk@datadoghq.com>
@vladimir-dd feel free to flag whenever this is ready for review again. |
Closing this PR to break it down in smaller chunks. Will address all left comments in the follow-up PRs. |
This is the initial PR to support DD grok parsing.
Here is an example of grok rules:
You can write parsing rules with the
%{MATCHER:EXTRACT:FILTER}
syntax:Each rule can reference parsing rules defined above itself in the list. Only one can match any given log. The first one that matches, from top to bottom, is the one that does the parsing. For further documentation and the full list of available matcher and filters check out https://docs.datadoghq.com/logs/processing/parsing
We introduce a new
parse_datadog_grok
VRL function(hidden by FF). An example of usage:In the follow-up PRs we will add all missing matchers and filters.
Signed-off-by: Vladimir Zhuk vladimir.zhuk@datadoghq.com