Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enhancement(remap): Add push and append functions #5750

Merged
merged 33 commits into from Jan 11, 2021

Conversation

lucperkins
Copy link
Contributor

@lucperkins lucperkins commented Dec 28, 2020

Closes #5749. This PR appends (buh dum crash) an append function to VRL that adds each element in an array to an array and also a push function that adds a single item to an array.

Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 4 commits December 28, 2020 13:32
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Copy link
Member

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question about the return value, otherwise this looks good to me!

As for avoiding duplicate items, I'd be inclined to either have a different function or just have users guard with if (!in_array(...)). Otherwise I think we are introducing set-like semantics to the array type. Either way, this could be an additive change later when we have a user need, even if we decide to add a new parameter here.

We could also add a unique() function that deduplicates.

type: ["any"]
},
]
return: ["array"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might warrant a larger discussion, but I think it would be clearer that append modifies the array in-place if it didn't return anything. The fact that it returns the modified array could lead users to believe it is returning a new array that is a copy of the old array plus the new element.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do know that Jean et al would eventually like to make del, for example, return either the deleted value or null if the field doesn't exist. Returning the new array would enable you to do things like unique(append(.fruits, "mango")) (mentioned in my other comment) and feels more consistent with some programming languages (Go et al) but not others (Rust et al). I don't have strong feelings there, especially on this first iteration. Happy to defer. For now I'll leave the current behavior in place (return the new array) and update the docs.

Copy link
Member

@jszwedko jszwedko Dec 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Go is a bit special because it actually returns a new "struct" representing the array. Usually this is pointing to the same underlying memory, but it'll actually be different if it had to extend the array; this is a common gotcha for people coming to the language. Consider that Go's in-place sorting routines do not return the sorted array. I think this makes it clearer that it is happening in-place.

That being said, maybe I'm in the minority here. I know Ruby returns the same array to make it easier to chain though they also have sort create a copy while sort! sorts in-place.

I guess the most important thing is consistency in VRL. We can choose to always return the modified value, even if it happens in-place, and hopefully that consistency will make it easy to reason about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would update the docs to make it clear that the array itself is modified though, the current wording:

Appends the specified item to an array and returns the new array

Might lead people to imagine it returns a new array. Specifically the word new there. Maybe:

Appends the specified item to an array and returns the modified array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you're saying now. I'll update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jszwedko Every VRL function except del is a "pure" function that generates a value but doesn't modify the object. In this case, the value of the list and the item are merely extracted from the function arguments and used to produce a new array. Compare with the del function logic:

self.paths
    .iter()
    .try_for_each(|path| object.remove(path.as_ref(), false))?; // Object is manipulated

The Object type has methods that give you hooks into the event itself. If you don't invoke those then the underlying event is unchanged. So I'm not sure that tests verifying that the event is unchanged are any more necessary here than in all of the other functions (except del, of course).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, I'm probably missing something in the plumbing of remap-lang. I would have assumed that complex types like the Vec and BTreeMap would have been accessed by reference to avoid unnecessary clones (though we do want the clone here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably missing something in the plumbing of remap-lang.

You're not missing anything 😄. We opted to accept the clone, as a trade-off to keep the initial implementation simpler.

Since we're using the bytes crate, the cloning is actually relatively cheap, since it uses reference-counting internally, and other most other value types implement copy, and are thus cheap to clone (cloning a map is probably most expensive, since we're storing the keys as strings on the heap).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha 🤔 thanks for the explanation!

@lucperkins I think it'd still be nice to add a test that asserts the original array was unchanged in case we change the behavior of always cloning values and need to add an explicit clone here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every VRL function except del is a "pure" function that generates a value but doesn't modify the object.

merge is another one. merge(., .foo) will merge .foo into the root.

@lucperkins
Copy link
Contributor Author

lucperkins commented Dec 29, 2020

@jszwedko I think that keeping this behavior as-is and providing a function like unique or dedupe is the way to go. I prefer unique(append(.fruits, "mango")) to append(.fruits, "mango", overwrite = false) or something along those lines. Set behavior is something we may consider down the road but agreed, not here. Namespaces would also be useful here à la array::append vs. set::add if we go that route.

Luc Perkins added 3 commits December 29, 2020 11:58
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
@jamtur01 jamtur01 added the domain: vrl Anything related to the Vector Remap Language label Jan 1, 2021
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR looks fine to me, but I would probably prefer we rename this function to push:

$foo = [0, 1]
$foo = push($foo, 2)
[0, 1, 2]

And have a separate append function to add one array to another:

$foo = [0, 1]
$foo = append($foo, [2, 3])
[0, 1, 2, 3]

Also, we probably want an unshift function to push to the front of the array:

$foo = [0, 1]
$foo = unshift($foo, -1)
[-1, 0, 1]

Additionally, we could support using the + operator to push elements onto an array:

$foo = [0, 1]
$foo = $foo + 2
[0, 1, 2]

@lucperkins
Copy link
Contributor Author

lucperkins commented Jan 3, 2021

@JeanMertz Okay, push makes sense to me. I've also added append, as it's low-hanging fruit.

Luc Perkins added 5 commits January 2, 2021 20:12
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
@binarylogic binarylogic changed the title enhancement(remap): Add append function enhancement(remap): Add puh function Jan 3, 2021
@binarylogic binarylogic changed the title enhancement(remap): Add puh function enhancement(remap): Add push function Jan 3, 2021
@lucperkins lucperkins changed the title enhancement(remap): Add push function enhancement(remap): Add push and append functions Jan 4, 2021
Copy link
Contributor

@JeanMertz JeanMertz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments, but once those are resolved, this LGTM 👌

docs/reference/remap/append.cue Outdated Show resolved Hide resolved
docs/reference/remap/append.cue Outdated Show resolved Hide resolved
lib/remap-functions/src/append.rs Outdated Show resolved Hide resolved
lib/remap-functions/src/lib.rs Outdated Show resolved Hide resolved
lib/remap-functions/src/push.rs Outdated Show resolved Hide resolved
lib/remap-functions/src/push.rs Outdated Show resolved Hide resolved
tests/behavior/transforms/remap.toml Outdated Show resolved Hide resolved
Luc Perkins added 3 commits January 4, 2021 18:31
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 5 commits January 5, 2021 20:43
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
@lucperkins
Copy link
Contributor Author

lucperkins commented Jan 6, 2021

@JeanMertz Okay, I've added the inner type definitions, though the way I've done it seems pretty inelegant. If there's a smoother path here, please let me know.

Luc Perkins added 4 commits January 5, 2021 23:46
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 2 commits January 7, 2021 10:37
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Luc Perkins added 4 commits January 9, 2021 10:26
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
Signed-off-by: Luc Perkins <luc@timber.io>
@lucperkins lucperkins merged commit 69df85b into master Jan 11, 2021
@lucperkins lucperkins deleted the remap-append-function branch January 11, 2021 20:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VRL: "append" function
6 participants