Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(remap): Add join function #6313

Merged
merged 26 commits into from Feb 5, 2021
Merged

enhancement(remap): Add join function #6313

merged 26 commits into from Feb 5, 2021

Conversation

lucperkins
Copy link
Contributor

@lucperkins lucperkins commented Feb 1, 2021

This PR adds a join function to VRL that works like so:

join(["Larry", "Moe", "Curly"], ", ")
# "Larry, Moe, Curly"

Closes #6127

Luc Perkins added 3 commits January 19, 2021 13:56
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
@lucperkins lucperkins requested review from a team and pablosichert and removed request for a team February 1, 2021 23:29
@lucperkins lucperkins marked this pull request as draft February 1, 2021 23:30
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 4 commits February 1, 2021 17:39
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
.map(|s| String::from_utf8_lossy(&s).to_string())
.collect();

// This length comparison is used to discern whether one or more array items was
Copy link
Member

@jszwedko jszwedko Feb 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could see allowing elements like integers to be marshaled to a string.

@jszwedko
Copy link
Member

jszwedko commented Feb 2, 2021

@lucperkins I was able to avoid the clone and, I think, more directly handle errors from non-string types in the array, via the changes you see here: vrl-join-function...vrl-join-function-refactor . There you can see I:

  • Switched the way that errors are handled by collecting a Vec<Result<_, _>> into a Result<Vec<_>, _> which is a useful trick supported by the stdlib. Ana actually wrote about this here: https://hoverbear.org/blog/a-useful-error-pattern/
  • Used try_bytes_utf8_lossy on Value to avoid the clone. That make a Cow<_> of the underlying bytes rather than copying them

I had to play around with the code to see for sure hence the branch 🙂

Signed-off-by: Luc Perkins <luc@timber.io>
@lucperkins
Copy link
Contributor Author

lucperkins commented Feb 2, 2021

@jszwedko That's awesome! Man, I love all of these Result and Option tricks. I did have to make one addition (iterating to convert the Cow<'_, str>s to Strings) but otherwise this worked great.

@lucperkins lucperkins marked this pull request as ready for review February 2, 2021 15:41
@jszwedko
Copy link
Member

jszwedko commented Feb 2, 2021

@jszwedko That's awesome! Man, I love all of these Result and Option tricks. I did have to make one addition (iterating to convert the Cow<'_, str>s to Strings) but otherwise this worked great.

Nice! I think your mapping to String via to_string reintroduces the clone though. It should work if you just leave it as Cow which will avoid the clone, as in my branch.

Signed-off-by: Luc Perkins <luc@timber.io>
@lucperkins
Copy link
Contributor Author

@jszwedko Ah yes, right you are. This Cow sensation is one to watch, folks!

Signed-off-by: Luc Perkins <luc@timber.io>
Copy link
Contributor

@StephenWakely StephenWakely left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think that inner_type_def check should be made. We do need to test that this scenario would work, because I'm not convinced it will - but that should be a separate issue:

.thing = ["a", "b", "c"]
.thing[1] = 1
join(.thing) # Should be fallible now

Copy link
Contributor

@pablosichert pablosichert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! (Agree with the previous suggestions.)

Suggesting to add a test to remap.toml as well.

Luc Perkins added 2 commits February 3, 2021 08:54
Luc Perkins added 4 commits February 3, 2021 09:55
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
.fallible_unless(Kind::Array)
.merge_optional(separator_type)
.with_constraint(Kind::Bytes)
.with_inner_type(self.value.type_def(state).inner_type_def)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? I don't think the output here, Bytes, has an inner type.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, the inner type is being applied the value array, not to the output

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true? with_inner_type seems to set the inner_type of the type being returned by the remap function:

https://github.com/timberio/vector/blob/cf6f56ac088ae525b1e368de64eaf9655487fbd3/lib/remap-lang/src/type_def.rs#L204-L207

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, here it is used to set the inner type for the output of parse_regex: https://github.com/timberio/vector/blob/f2529f091786df59ab9d7155c69483f93fec3041/lib/remap-functions/src/parse_regex.rs#L61

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes you are correct. That with_inner_type shouldn't be there. That's one other reason it would be nice for Kind::Array to be an enum with the inner type. It would then be impossible to specify an inner type for Bytes.

In this case, if you swapped the order of the calls with_constraint and with_inner_type, the with_constraint call would actually wipe out the inner type if the type was a scalar. But since they are in this order the inner_type would linger about..

https://github.com/timberio/vector/blob/master/lib/remap-lang/src/type_def.rs#L194-L199

Copy link
Contributor Author

@lucperkins lucperkins Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I wrote that without seeing Stephen's comment 😄 The issue with eliminating with_inner_type here is that fallible_unless_array_has_inner_type needs to have inner_type_def set to work as expected. I also don't see an issue with calling with_constraint (sets inner_type_def to None) followed by with_inner_type (which could set it to either Some or None).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you just need to move the fallible_unless_array_has_inner_type up before with_constraint is called. It seems like we need to always have the fallible checks before we add the type def constraints since those modify the type.

Copy link
Member

@jszwedko jszwedko Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To put it more clearly, this patch seems to work:

diff --git a/lib/remap-functions/src/join.rs b/lib/remap-functions/src/join.rs
index 35131c4a0..08126c7fb 100644
--- a/lib/remap-functions/src/join.rs
+++ b/lib/remap-functions/src/join.rs
@@ -76,9 +76,8 @@ impl Expression for JoinFn {
             .type_def(state)
             .fallible_unless(Kind::Array)
             .merge_optional(separator_type)
-            .with_constraint(Kind::Bytes)
-            .with_inner_type(self.value.type_def(state).inner_type_def)
             .fallible_unless_array_has_inner_type(Kind::Bytes)
+            .with_constraint(Kind::Bytes)
     }
 }
 
@@ -96,7 +95,7 @@ mod test {
             def: TypeDef {
                 fallible: false,
                 kind: Kind::Bytes,
-                inner_type_def: Some(inner_type_def!([ Kind::Bytes ]))
+                ..Default::default()
             },
         }
 
@@ -108,7 +107,7 @@ mod test {
             def: TypeDef {
                 fallible: true,
                 kind: Kind::Bytes,
-                inner_type_def: Some(inner_type_def!([ Kind::Bytes | Kind::Integer ]))
+                ..Default::default()
             },
         }
 
@@ -132,7 +131,7 @@ mod test {
             def: TypeDef {
                 fallible: true,
                 kind: Kind::Bytes,
-                inner_type_def: Some(inner_type_def!([ Kind::Bytes ]))
+                ..Default::default()
             },
         }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening in this patch, though, is that the inner_type_def simply remains None throughout (which is the default), and that in turn makes the fallible_unless_array_has_inner_type call a no-op.

Copy link
Member

@jszwedko jszwedko Feb 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't remain None though, it actually starts at whatever the input value's inner type is via the self.value.type_def(state) that starts this function chain. It's with_constraint that strips it since we are setting the type to a scalar (Bytes). By moving the fallible check up, it is able to access the inner type of the value being passed to the remap function. You would have run into the same issue had you put .fallible_unless(Kind::Array) after you called with_constraint. The issue is that with_constraint modifies the type.

I think the moral of the story is to just for us to be sure to put the fallible checks before with_constraint (or any other methods that mutate the type information) is called. That is: call with_constraint and with_inner_type (only when the kind is an array or map) last.

/// Applies a type constraint to the items in an array. If you need all items in the array to
/// be integers, for example, set `Kind::Integer`; if items can be either integers or Booleans,
/// set `Kind::Integer | Kind::Boolean`; and so on.
pub fn fallible_unless_array_has_inner_type(mut self, kind: impl Into<value::Kind>) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 to this constraint, but the naming / implementation feels a bit awkward to me. This might be due to the fact that inner_type is a bit awkward right now in-general, but, based on the current naming, I'm wondering if this function shouldn't be setting self.fallible = true if a non-array type is passed into it.

Another suggestion would be to call this fallible_unless_inner_type(...) and implement it such that it could accept any of our types that has an inner type.

In my dream world, I think maybe it'd make sense to have Kind::Array itself be an enum type so that you could do things like: Kind::Array(Kind::Integer | Kind::Boolean) to represent an array of integers/booleans, but that would be a more substantive refactoring.

Copy link
Contributor Author

@lucperkins lucperkins Feb 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jszwedko Good call on that. I've updated the behavior to set fallible to true for cases when there's no inner type (plus a test for that 😎). I do like your suggestion about the desired end state but I'll leave that to one of our Remap Hero(in)es.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function could be made much more generic. Some of the different rules we could want:

  • a map that has a certain field of a certain type
  • an array of maps that has a certain field of a certain type
  • either an array of strings or an array of maps that have a certain field of a certain type (say, either ["a", "b", "c"] or [{message: "a"}, {message: "b"}, {message: "c"}]

I think we should wait to see if we actually ever need that level of flexibility before tackling it because it would take a fair bit of thought to get right. So for now I'd say it's good enough to have a fairly specific function that does what we need, but no more. It does result in the awkward name, but I think that reflects the specific nature of where we are up to so far!

In my dream world.

I share this vision! The issue here is that Kind is a bit flag, so Kind::Array | Kind::String is still a Kind. Which is nice. If we make Kind:Array an an enum type, it could no longer be a bit flag. We would then need to distinguish between primitive Kinds and complex Kinds.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I share this vision! The issue here is that Kind is a bit flag, so Kind::Array | Kind::String is still a Kind. Which is nice. If we make Kind:Array an an enum type, it could no longer be a bit flag. We would then need to distinguish between primitive Kinds and complex Kinds.

Agreed, that seems to be the distinction here 👍

Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 3 commits February 3, 2021 15:19
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 3 commits February 4, 2021 15:04
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 2 commits February 4, 2021 19:44
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for bearing with me @lucperkins 😄

@lucperkins
Copy link
Contributor Author

To paraphrase what I said to @FungusHumungus, you're not "bearing with" someone when they're right 😂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

join function
4 participants