New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhancement(remap): Add join function #6313
Conversation
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
lib/remap-functions/src/join.rs
Outdated
.map(|s| String::from_utf8_lossy(&s).to_string()) | ||
.collect(); | ||
|
||
// This length comparison is used to discern whether one or more array items was |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could see allowing elements like integers to be marshaled to a string.
@lucperkins I was able to avoid the clone and, I think, more directly handle errors from non-string types in the array, via the changes you see here: vrl-join-function...vrl-join-function-refactor . There you can see I:
I had to play around with the code to see for sure hence the branch 🙂 |
Signed-off-by: Luc Perkins <luc@timber.io>
@jszwedko That's awesome! Man, I love all of these |
Nice! I think your mapping to |
Signed-off-by: Luc Perkins <luc@timber.io>
@jszwedko Ah yes, right you are. This |
Signed-off-by: Luc Perkins <luc@timber.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think that inner_type_def
check should be made. We do need to test that this scenario would work, because I'm not convinced it will - but that should be a separate issue:
.thing = ["a", "b", "c"]
.thing[1] = 1
join(.thing) # Should be fallible now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! (Agree with the previous suggestions.)
Suggesting to add a test to remap.toml as well.
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
lib/remap-functions/src/join.rs
Outdated
.fallible_unless(Kind::Array) | ||
.merge_optional(separator_type) | ||
.with_constraint(Kind::Bytes) | ||
.with_inner_type(self.value.type_def(state).inner_type_def) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this right? I don't think the output here, Bytes
, has an inner type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case, the inner type is being applied the value
array, not to the output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true? with_inner_type
seems to set the inner_type of the type being returned by the remap function:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, here it is used to set the inner type for the output of parse_regex
: https://github.com/timberio/vector/blob/f2529f091786df59ab9d7155c69483f93fec3041/lib/remap-functions/src/parse_regex.rs#L61
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes you are correct. That with_inner_type
shouldn't be there. That's one other reason it would be nice for Kind::Array
to be an enum with the inner type. It would then be impossible to specify an inner type for Bytes.
In this case, if you swapped the order of the calls with_constraint
and with_inner_type
, the with_constraint
call would actually wipe out the inner type if the type was a scalar. But since they are in this order the inner_type would linger about..
https://github.com/timberio/vector/blob/master/lib/remap-lang/src/type_def.rs#L194-L199
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I wrote that without seeing Stephen's comment 😄 The issue with eliminating with_inner_type
here is that fallible_unless_array_has_inner_type
needs to have inner_type_def
set to work as expected. I also don't see an issue with calling with_constraint
(sets inner_type_def
to None
) followed by with_inner_type
(which could set it to either Some
or None
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you just need to move the fallible_unless_array_has_inner_type
up before with_constraint
is called. It seems like we need to always have the fallible checks before we add the type def constraints since those modify the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To put it more clearly, this patch seems to work:
diff --git a/lib/remap-functions/src/join.rs b/lib/remap-functions/src/join.rs
index 35131c4a0..08126c7fb 100644
--- a/lib/remap-functions/src/join.rs
+++ b/lib/remap-functions/src/join.rs
@@ -76,9 +76,8 @@ impl Expression for JoinFn {
.type_def(state)
.fallible_unless(Kind::Array)
.merge_optional(separator_type)
- .with_constraint(Kind::Bytes)
- .with_inner_type(self.value.type_def(state).inner_type_def)
.fallible_unless_array_has_inner_type(Kind::Bytes)
+ .with_constraint(Kind::Bytes)
}
}
@@ -96,7 +95,7 @@ mod test {
def: TypeDef {
fallible: false,
kind: Kind::Bytes,
- inner_type_def: Some(inner_type_def!([ Kind::Bytes ]))
+ ..Default::default()
},
}
@@ -108,7 +107,7 @@ mod test {
def: TypeDef {
fallible: true,
kind: Kind::Bytes,
- inner_type_def: Some(inner_type_def!([ Kind::Bytes | Kind::Integer ]))
+ ..Default::default()
},
}
@@ -132,7 +131,7 @@ mod test {
def: TypeDef {
fallible: true,
kind: Kind::Bytes,
- inner_type_def: Some(inner_type_def!([ Kind::Bytes ]))
+ ..Default::default()
},
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening in this patch, though, is that the inner_type_def
simply remains None
throughout (which is the default), and that in turn makes the fallible_unless_array_has_inner_type
call a no-op.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't remain None
though, it actually starts at whatever the input value's inner type is via the self.value.type_def(state)
that starts this function chain. It's with_constraint
that strips it since we are setting the type to a scalar (Bytes
). By moving the fallible check up, it is able to access the inner type of the value being passed to the remap function. You would have run into the same issue had you put .fallible_unless(Kind::Array)
after you called with_constraint
. The issue is that with_constraint
modifies the type.
I think the moral of the story is to just for us to be sure to put the fallible checks before with_constraint
(or any other methods that mutate the type information) is called. That is: call with_constraint
and with_inner_type
(only when the kind is an array or map) last.
/// Applies a type constraint to the items in an array. If you need all items in the array to | ||
/// be integers, for example, set `Kind::Integer`; if items can be either integers or Booleans, | ||
/// set `Kind::Integer | Kind::Boolean`; and so on. | ||
pub fn fallible_unless_array_has_inner_type(mut self, kind: impl Into<value::Kind>) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 to this constraint, but the naming / implementation feels a bit awkward to me. This might be due to the fact that inner_type
is a bit awkward right now in-general, but, based on the current naming, I'm wondering if this function shouldn't be setting self.fallible = true
if a non-array type is passed into it.
Another suggestion would be to call this fallible_unless_inner_type(...)
and implement it such that it could accept any of our types that has an inner type.
In my dream world, I think maybe it'd make sense to have Kind::Array
itself be an enum type so that you could do things like: Kind::Array(Kind::Integer | Kind::Boolean)
to represent an array of integers/booleans, but that would be a more substantive refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jszwedko Good call on that. I've updated the behavior to set fallible
to true
for cases when there's no inner type (plus a test for that 😎). I do like your suggestion about the desired end state but I'll leave that to one of our Remap Hero(in)es.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function could be made much more generic. Some of the different rules we could want:
- a map that has a certain field of a certain type
- an array of maps that has a certain field of a certain type
- either an array of strings or an array of maps that have a certain field of a certain type (say, either
["a", "b", "c"]
or[{message: "a"}, {message: "b"}, {message: "c"}]
I think we should wait to see if we actually ever need that level of flexibility before tackling it because it would take a fair bit of thought to get right. So for now I'd say it's good enough to have a fairly specific function that does what we need, but no more. It does result in the awkward name, but I think that reflects the specific nature of where we are up to so far!
In my dream world.
I share this vision! The issue here is that Kind
is a bit flag, so Kind::Array | Kind::String
is still a Kind
. Which is nice. If we make Kind:Array
an an enum type, it could no longer be a bit flag. We would then need to distinguish between primitive Kinds and complex Kinds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share this vision! The issue here is that
Kind
is a bit flag, soKind::Array | Kind::String
is still aKind
. Which is nice. If we makeKind:Array
an an enum type, it could no longer be a bit flag. We would then need to distinguish between primitive Kinds and complex Kinds.
Agreed, that seems to be the distinction here 👍
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
a335792
to
0d146e7
Compare
Signed-off-by: Luc Perkins <luc@timber.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for bearing with me @lucperkins 😄
To paraphrase what I said to @FungusHumungus, you're not "bearing with" someone when they're right 😂 |
This PR adds a
join
function to VRL that works like so:Closes #6127