Skip to content

feat(new transform): New explode transform#6545

Closed
eunchuldev wants to merge 3 commits intovectordotdev:masterfrom
eunchuldev:transform-explode
Closed

feat(new transform): New explode transform#6545
eunchuldev wants to merge 3 commits intovectordotdev:masterfrom
eunchuldev:transform-explode

Conversation

@eunchuldev
Copy link
Copy Markdown

Ref: #6459

Signed-off-by: Eunchul Song eunchulsong9@gmail.com

Signed-off-by: Eunchul Song <eunchulsong9@gmail.com>
@eunchuldev eunchuldev requested review from a team, JeanMertz, blt and prognant and removed request for a team February 24, 2021 10:56
Signed-off-by: Eunchul Song <eunchulsong9@gmail.com>
@binarylogic binarylogic requested review from jszwedko and removed request for blt and prognant February 24, 2021 15:18
Signed-off-by: Eunchul Song <eunchulsong9@gmail.com>
Copy link
Copy Markdown
Collaborator

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work, thanks @sech9446 ! Thank you for adding all of the tests.

I left a few small comments. I'm also wondering if we might be able to expand the functionality of this transform to handle arrays of objects differently.

I'm imagining a case where a user might have input like:

{
  "events": [
    {
      "message": "event 1"
    },
    {
      "message": "event 2"
    }
  ]
}

And they want to output:

{
  "message": "event 1"
}

and

{
  "message": "event 2"
}

I think this could be handled by checking the type of the array value as we are iterating and, for objects, instead of inserting as a child field, merge with the original event.

I could see this being a configurable option so that users could still have it as a child field when desired, which is how you currently have it.

This will also need docs, but I can help with that once it is ready to be merged.

Comment thread src/transforms/explode.rs
#[derive(Deserialize, Serialize, Debug, Clone)]
#[serde(deny_unknown_fields)]
pub struct ExplodeConfig {
#[serde(default)]
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just pull the serde(default) up to the struct level.

Comment thread src/transforms/explode.rs
new_event.as_mut_log().insert(&target_field, v);
output.push(new_event);
}
} else {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For other parsing transforms, like json_parser, we've typically included a drop_invalid config option that users could use to either drop events that don't match expectations (if true) or pass them along as-is (if false). Could we add that option here as well? We've typically defaulted to false which would pass along non-confirming events as-is.

impl<'a> InternalEvent for ExplodeFieldIsNotArray<'a> {
fn emit_logs(&self) {
warn!(
message = "Cannot explode non array kind field.",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
message = "Cannot explode non array kind field.",
message = "Cannot explode non-array field.",

Comment thread src/transforms/explode.rs
if let Value::Array(array) = value {
if event
.as_mut_log()
.insert(&target_field, Value::Null)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you could just use .get() here to avoid the unnecessary insert operation.

@eunchuldev
Copy link
Copy Markdown
Author

eunchuldev commented Mar 1, 2021

Thank you for the detailed review!

Your suggestion makes sense. I am going to add two new configurable options drop_invalid and flatten, with default value false of both.

@jszwedko
Copy link
Copy Markdown
Collaborator

@sech9446 thanks for your work on this! We used it as inspiration for the unnest function in #7404 which, combined with #7267 will let users do this transformation just in remap. I'll close this out given the other approach, but let us know if that approach doesn't work for you.

@jszwedko jszwedko closed this May 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants