chore: Remap v1 RFC #3134

Jeffail · 2020-07-21T12:50:41Z

Closes #2744

Rendered

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

binarylogic

This is excellent. One comment on a proposed syntax change that I would like to resolve, but otherwise looks good.

binarylogic · 2020-07-21T21:45:11Z

rfcs/2020-07-21-2744-vicscript.md

+  inputs = [ "foo" ]
+  type = "vicscript"
+  mapping = """
+foo = "hello"


Since this is the first presentation of this syntax, the thing that immediately jumps out to me is the lack of clarity around event field assignment. It looks like a bunch of variables are being set with no real action. Following precedence set by our lua transform, I'm curious what you think about this alternative syntax?

event.log.foo = "hello" event.log.bar = event.log.bar + 10 event.log.baz_is_large = event.log.baz_is_large > 10

I realize the intent of your syntax is to avoid maintaining a runtime environment. So things like this would not be possible:

# not possible baz = 1 event.log.bar = baz

And we could simply throw an error saying that baz is "unknown". The advantage with my syntax is:

It's obvious you are setting an event level field.

The log namespace means we can use this with metrics (unless you were thinking of a better way to solve this).

It's closer in syntax to other transforms that manipulate events (like lua and wasm).

Disadvantages:

It suggests that this is a runtime. Users might be surprised that my baz = 1 example doesn't work.

I think that'd be reasonable, one thing to note is that enforcing a path prefix would be stripped within a map definition, so you'd still get bare assignments like this:

map foo { type = (type.uppercase() | "UNKNOWN") name = name.titlecase() } event.log.metadata = event.log.metadata.apply("foo")

Another thing to note is that both IDML and Bloblang have an optional path prefix root (root.foo = "bar" would set the event field foo to "bar"), which is useful for when a path clashes with a keyword like if or match, but is also implicit when omitted as a convenience. It's assumed it usually won't be used unless you specifically want to assign directly to the root. On the right hand side we have a similar keyword this which represents the source value in the current context.

Adding them to the above example looks like this:

map foo { root.type = (this.type.uppercase() | "UNKNOWN") root.name = this.name.titlecase() } root.event.log.metadata = this.event.log.metadata.apply("foo")

I would opt to leave these in the spec as they are. We could choose to make them mandatory but I wouldn't rename root to something like event as it's somewhat misleading in the context of maps.

On the topic of variables, we can actually support them without a runtime or making the spec turing complete, it's just a mapping that isn't part of the output event. I originally omitted them from the RFC so we could consider them later as a more advanced feature, Bloblang has them and it's definitely useful, I already have an example here that uses them.

I would opt to leave these in the spec as they are.

Sounds good. I've come around to leaving out the prefix, especially as we start to incorporate it in other places where specifying event.log would be confusing. Such as dynamic configuration values.

map foo { type = (type.uppercase() | "UNKNOWN") name = name.titlecase() }

The map syntax feels a little strange to me as well. As I noted in my other comment, we should agree on these details in the form of a spec so that we can obtain consensus. I realize that's the part of what this RFC aims to accomplish, but it still feels hand-wavy to me.

On the topic of variables, we can actually support them without a runtime or making the spec turing complete, it's just a mapping that isn't part of the output event.

Sounds good to me.

I agree it's a little unclear to have assignment look like free variables, but I'm not a fan of the verbosity of event.log..

The jq approach is interesting here, where assignment to a foo key is just .foo = "bar".

Yeah we can do that. One thing that's open to confusion here is that . on the left hand side means something different from the right hand side.

Using root and this makes that distinction clear (even if it's not immediately obvious what they point to), whereas a very common pitfall of jq is that users will naturally assume in many places that a dot on the left and the right are the same thing. It means the docs for lots of features ends up needing to also explain that same concept continuously, making them quite verbose: https://stedolan.github.io/jq/manual/#Assignment.

One thing that's open to confusion here is that . on the left hand side means something different from the right hand side.

Yeah, this is interesting. It looks like in "normal" = assignment they are the same thing, but "most users will want to use modification assignment operators", where the RHS is a filter. I get the feeling there's some history there and I'm curious what led to the current state.

binarylogic · 2020-07-21T21:45:34Z

rfcs/2020-07-21-2744-vicscript.md

+And supports coalescing fields within a path with a pipe operator `|`, where if the first value is null or missing then the next value is selected, and so on:
+
+```coffee
+foo = bar.(baz | buz).bev


👍 really like this.

Same! This is really nice and concise.

binarylogic · 2020-07-21T21:46:26Z

rfcs/2020-07-21-2744-vicscript.md

+You can assign a value to the root of your event by writing to the `root` keyword:
+
+```coffee
+root = nested.object


This is another example where it's clearer to write:

event.log = nested.object

In both of these cases I worry about the user data having root, event, or log fields. If we can sidestep that whole problem it could save us a lot of headache.

binarylogic · 2020-07-21T21:47:21Z

rfcs/2020-07-21-2744-vicscript.md

+Vicscript supports float, int, boolean, string, null, array and object literals:
+
+```coffee
+root = [


I'm 👍 on the primitive types you've laid out. But this reminds me, do we want to allow non "object" types at the root? I'm concerned this will break in other parts of Vector.

rfcs/2020-07-21-2744-vicscript.md

binarylogic · 2020-07-21T21:50:08Z

rfcs/2020-07-21-2744-vicscript.md

+
+It also seems like there are some key limitations with the language that we would need to contribute solutions for (or fork). The main one being simple coalescing expressions, I can't find anything in their docs to allow something like `foo = bar.(baz | buz).bev` without a nasty bunch of nested `match` expressions.
+
+Similar to JQ the syntax is also far enough removed from basic c-style declarations that there's a learning curve and more difficulty in maintaining syntax highlighters.


Agree with all of the above regarding Tremor script. I think it's an impressive feature of Tremor, but would like to avoid it for the reasons you listed.

binarylogic · 2020-07-21T21:51:18Z

rfcs/2020-07-21-2744-vicscript.md

+
+### It's more to support
+
+A key role for us as Vector maintainers is to assist users with their configs and thereby encourage adoption. It stands to reason that making Vector as a project larger would conflict with our ability to do this. However, we already have a range of transforms for mapping and after reviewing some large example configs ([https://github.com/timberio/vector/issues/1965](https://github.com/timberio/vector/issues/1965)) I think scrapping them in favor of a more condensed and readable language would be overall beneficial to us when we're sporting our support hats.


Agree, I think this will aid us in support.

binarylogic · 2020-07-21T21:52:14Z

rfcs/2020-07-21-2744-vicscript.md

+
+### It's a lot of effort
+
+Vicscript is an entirely new language spec where we'll need to implement both the parser and executor, which is clearly going to be a significant chunk of work. The obvious mitigation to this is to simply try it out and put a time cap on it, then review how far along the project got. This is only a speculative drawback and it's only going to draw us back once (until we decide to rewrite it for fun).


The value exceeds the effort in my opinion.

binarylogic · 2020-07-21T21:53:11Z

rfcs/2020-07-21-2744-vicscript.md

+
+Mapping events is a core feature of Vector, and once we introduce a performant and more powerful alternative to our existing mapping transforms it's reasonable to expect it to become the standard for small or moderate sized pipelines. As such, I think it's important for us to have strong ownership of this language. It will allow us to build it into exactly the right solution for Vector users. It will also allow us to guarantee its performance and stability in the future, and provide first class support for it.
+
+Given most of the risk around this idea is simply how difficult it will be I'd say the obvious first step is to test that out with a proof of concept. We can start with a quick implementation of a small subset of the spec and build a parser and execution engine for review. If we end up with nothing, or with a hostile codebase, then we can instead look into adopting other projects and compare.


Agree with everything you've laid out as a rationale. I am big fan of starting with a small experiment.

binarylogic · 2020-07-21T21:57:09Z

rfcs/2020-07-21-2744-vicscript.md

+
+1. Dip our toes and build a new transform with a parser using something like [nom](https://github.com/Geal/nom). This should only go as far as a tiny subset of the spec, where we allow basic operations that are roughly equivalent to our existing mapping transforms and work directly to/from our internal event type.
+2. Review the parser and the codebase, and compare the performance of this transform with the existing mapping transforms. If it sucks then it goes in the bin and we investigate using other existing alternatives.
+3. If instead we're happy with it then at this point we should consider breaking the project out of Vector. Since the spec is mostly a copy of Bloblang it would be sensible to share efforts for community tooling such as syntax highlighters, linters, documentation, etc. Breaking our parser out as a reference Rust implementation would help build that community.


I'd add a step before 3 that would outline an actual spec for the language. We need consensus on the details and priority, but this is not necessary for the initial experiment.

rfcs/2020-07-21-2744-vicscript.md

rfcs/2020-07-21-2744-vicscript/example1.coffee

Jeffail · 2020-07-23T13:33:19Z

I'm thinking in the interest of keeping the RFC simpler in the absence of a more in-depth spec I should remove more advanced features such as map, match expressions and iterator methods for now. We can add them one-by-one once we're agreed on the fundamentals and if we need more discussion they can be broken out into their own RFCs.

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

rfcs/2020-07-21-2744-vicscript.md

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

binarylogic

Nice work!

bruceg

This clears up a lot for me. It also explains why there was no "delete" operation that I had also forgotten to ask about. While mutating the original event may be more obvious for writers (think "I just want to modify this one field"), this approach leaves open opportunities for internal optimizations.

lukesteensen

This is awesome! My biggest questions revolve around what level of complexity we want to target.

If our primary target use case is on the lower end of the complexity spectrum, I'd be tempted to make this a bit more jq-like just to remove some ambiguity and hopefully take advantage of broad user knowledge of that syntax. It definitely doesn't scale as well to large programs, but it's still well within the ideal of "make simple things simple and complicated things possible".

If we're targetting higher on the complexity scale, I think it's reasonable to differ from the one-liner style and focus more on readability. I do worry that this is sacrificing ease of use for the simplest use cases, which are likely to be some of the most popular.

lukesteensen · 2020-07-24T14:48:37Z

rfcs/2020-07-21-2744-vicscript.md

+  inputs = [ "foo" ]
+  type = "vicscript"
+  mapping = """
+foo = "hello"


I agree it's a little unclear to have assignment look like free variables, but I'm not a fan of the verbosity of event.log..

The jq approach is interesting here, where assignment to a foo key is just .foo = "bar".

lukesteensen · 2020-07-24T19:09:20Z

rfcs/2020-07-21-2744-vicscript.md

+And supports coalescing fields within a path with a pipe operator `|`, where if the first value is null or missing then the next value is selected, and so on:
+
+```coffee
+foo = bar.(baz | buz).bev


Same! This is really nice and concise.

lukesteensen · 2020-07-24T19:10:33Z

rfcs/2020-07-21-2744-vicscript.md

+The most common query type is a dot separated path, describing how to reach a target field within the input event:
+
+```coffee
+foo = bar.baz.0.buz


Does this differentiate between array and map indexing at all? Or would .0 work for both {"0": "foo"} and ["foo"]?

Because it's on a right hand side we have the reference object to look at, so this path follows the same rules as JSON pointers where if we approach an array and the path segment can be parsed as a integer then we use it as an index. However, this obviously can't always work on the left hand side unless the array already exists from a previous assignment, so there's room for confusion there.

We could instead opt to force array indexes to be explicit all over and foo.0.bar would always mean an object foo with a key 0, and indexes are expressed as foo[0].bar.

lukesteensen · 2020-07-24T19:11:22Z

rfcs/2020-07-21-2744-vicscript.md

+You can assign a value to the root of your event by writing to the `root` keyword:
+
+```coffee
+root = nested.object


In both of these cases I worry about the user data having root, event, or log fields. If we can sidestep that whole problem it could save us a lot of headache.

lukesteensen · 2020-07-24T20:07:21Z

rfcs/2020-07-21-2744-vicscript.md

+Out: {"id":"second","sorted_foo":["a","b","c","d"]}
+```
+
+The resulting event will only contain fields that are explicitly mapped, in order to begin your map with a full copy of the input event structure you can start by assigning `this` to `root`:


I don't love this pattern. Again, the jq approach here is interesting and doesn't require these special names. A full copy is the default with simple assignment like .foo = "bar", while {foo: "bar"} gives you a fresh object.

I'm also curious if there would be performance implications to this for simple cases like add_fields. With enough work we could definitely optimize away the copying, but I'm not sure we really want to sign ourselves up for that.

lukesteensen · 2020-07-24T20:27:32Z

rfcs/2020-07-21-2744-vicscript.md

+
+### JQ
+
+JQ is a great tool and rightfully gets a lot of love. It also basically solves the same problem as Vicscript. Unfortunately, it's common knowledge that JQ doesn't score high on readability, and therefore it doesn't scale well at all in relation to mapping size and complexity (imagine trying to solve [https://github.com/timberio/vector/issues/1965](https://github.com/timberio/vector/issues/1965) with it).


There are two important factors here, readability and scalability.

Readability is a tricky one. I definitely agree that jq can be hard to understand, but that's largely a non-issue if the user is already familiar with jq from somewhere else. Given that it's pretty popular, it's hard to say how those balance out.

Scalability is one where we need to think about the target use cases. If we're expecting for this transform to handle mostly light reshaping, then scalability is less of an issue. Users will likely have the opportunity to master the basics before diving into something more complicated and may not mind spending a bit more time to figure out something they recognize is pretty complicated. On the other hand, if we expect a meaningful number of users to jump straight into larger mapping declarations, it's more important that we have something that scales smoothly to that size and level of complexity.

binarylogic

After thinking about this more, I'm starting to agree with @lukesteensen that the jq syntax is a better path forward:

I don't particularly like the proposed syntax. I find it mildly confusing for the reasons specified above.
The learning curve favors JQ.
1. Users without JQ experience will likely find JQ easier to understand. Additionally, there are a lot of resources online for JQ.
2. Users with JQ experience will have no learning curve.
It removes the burden of defining a spec and all of the decisions that go along with that. I'm sure there are some tricky questions that JQ has already solved.
It'll be easier to maintain and improve over time.

If others disagree, I'm happy to move forward with the proposed syntax. Otherwise, I would like to obtain consensus and update this RFC to reflect the change.

I'm also curious if we can build off of jq_rs.

Jeffail · 2020-07-27T11:42:02Z

This is worth digging into but I think there's two different mapping syntaxes being conflated here, and with a side order of the path syntax which is also a separate consideration.

Wholesale Mapping

Firstly, there's wholesale "this is the thing" mapping, which is basically just an object literal at the root, this is the traditional jq style:

{foo: .bar, baz: "something"}

And with our current proposal (assuming we eventually implement object literals), it looks like this:

root = {"foo": bar, "baz": "something"}

Or, as I've done with bloblang, the above can be simplified to a single expression, making it even more jq-like:

{"foo": bar, "baz": "something"}

In order to match jq syntax for these mappings we would just need to allow non-quoted keys, the main limitation with doing that is that it makes it awkward to have dynamic key values. In bloblang the mapping {key:value} would have a dynamic key determined by the field key of the input event, but I think it's totally reasonable to decide this is out of scope for vicscript.

Other than that minor distinction (and the rest of the language) this mapping style is basically the same.

Mapping with Assignments

The other style of mapping is a series of assignments that can optionally follow an initial wholesale mapping, this is the primary mapping style of IDML and Bloblang as it's more readable for expressing larger mappings, in jq it looks like this:

.tagPath = (.tagPath // (.key | split("/")))

And in this proposal it actually looks basically the same, the difference is in the other operators and path syntax:

tagPath = tagPath | key.split("/")

They also behave the same, where both make an explicit copy of the right hand side in order to preserve immutability of the right hand side context.

However, where they differ is in the initial state of the mapped object, where jq begins with the original event and this proposal starts with an empty object. The performance merits of either should be minor if any, starting with the input event doesn't automatically remove any copy overhead in practice because you still need an immutable reference of the input event during the sequence of assignments, optimizing ends up looking basically the same (citation needed).

In terms of safety starting with a full copy makes it less likely you'll accidentally drop fields from input events (or ones that are added later), an empty copy makes it less likely you'll accidentally include fields that you don't want to leak (or ones that are added later).

In terms of annoyance they can both be worked around with a single expression. For the current proposal starting with a full copy is root = this, but if we changed it to start with a copy and a user wants it to start empty then it's simple root = {}.

My personal take is that starting empty makes sense from the large mapping perspective, and it's nice to use the mapping as an explicit list of what the output events look like (although you can't quite derive an entire schema from it). However, I also agree that root = this looks naff and in our case it's worth focusing on the smaller use cases. Therefore I'm happy to go with either approach.

Path Syntax

I've mentioned above but will paste again here that we can go for a jq style path syntax but one thing that's open to confusion is that . on the left hand side means something different from the right hand side.

Using root and this makes that distinction clear (even if it's not immediately obvious what they point to), whereas a very common pitfall of jq is that users will naturally assume in many places that a dot on the left and the right are the same thing. It means the docs for lots of features ends up needing to also explain that same concept continuously, making them quite verbose: https://stedolan.github.io/jq/manual/#Assignment.

However, if we adopt this style we can simply reference their docs (which are pretty good) rather than have to explain it ourselves.

Going with JQ

In terms of going full-in on jq syntax we are only starting with very basic assignments and therefore it's totally possible to match it. However, the spec is huge and so ultimately we're either going to end up with a small subset of the spec or we'll end up using a third party implementation (which is great if we can find one that's suitable).

I'm also curious if we can build off of jq_rs.

We can try it, but from what I've understood this library simply calls out to the C-lib and we'd need to serialize/deserialize events, which will likely make it perform significantly worse than the existing transforms.

lukesteensen · 2020-07-28T13:14:17Z

Wholesale Mapping

It sounds like the relevant questions here are:

Assignment to root keyword vs bare object literal
Quoted vs non-quoted keys

I'm relatively strongly in favor of going straight to the bare object literal syntax for (1), mostly to avoid the problem of keywords that may also exist as user keys. For (2), I don't feel strongly and it seems like something we could easily add later, so maybe we start with quoted keys only since they're unambiguous.

Mapping with Assignments

Here it seems like we just need to decide between defaulting to the original vs an empty object. Given that the transforms we're trying to replace are all mutation-based, I assumed a similar default would make sense. It seems like it doesn't make that big a difference since the mode can be switched quite easily.

Path Syntax

I agree that there's definitely some unfortunate complexity around jq's assignment operators, but I'm not sure that's necessarily a downside to adopting their path syntax. To me, removing the need for keywords is a big benefit. That's one less thing for users to have to learn, and I'd just be waiting for the GitHub issue asking why their data disappeared when they tried to assign to a "root" key.

Going with JQ

This seems like the biggest question. Do we go with an actual subset of jq (confusing assignment and all), or do we just borrow the parts of their syntax that we find convenient?

I would lean towards just taking the parts that we like, at least for the time being. It sacrifices some of the potential compatibility benefits, but true compatibility seems out of reach so maybe it's better not to pretend. For simple use cases we should be able to match jq for that basic level of familiarity, but we shouldn't feel like we need to extend that past what we think makes sense.

For example, jq is very much cli-focused and is designed for single line programs. Two assignments would look like a pipeline:

.foo = "bar" | .bar = "baz"

In our case, the mappings are written in in a file, where the more natural style would be to use multiple lines:

.foo = "bar"
.bar = "baz"

Given that there's almost no chance of us reaching true compatibility with jq, I think it'd make sense to make some early breaks like this to avoid setting the expectation of being exactly the same.

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

binarylogic · 2020-07-29T18:40:44Z

Given that there's almost no chance of us reaching true compatibility with jq, I think it'd make sense to make some early breaks like this to avoid setting the expectation of being exactly the same.

Agree, we can note it's heavily inspired by jq.

binarylogic · 2020-07-29T18:46:46Z

rfcs/2020-07-21-2744-vicscript.md

+Remove fields by assigning them the `delete` keyword:
+
+```text
+.foo = delete
+```


Everything else here is great, but this still feels funky to me. I understand why it's this way, the left-hand side is the "target" and the right-hand side is the "action", but the traditional function syntax seems more intuitive. Going with our jq theme, this would just be del(.foo):

del(.foo)

Does this introduce significant complexity? I'm also curious to get @lukesteensen's opinion.

Yeah, this is another case where I think we can try to avoid keywords and use something less ambiguous like a builtin function.

Got options here, bloblang and IDML use foo = deleted(), so it's a function rather than a keyword which we could copy. Implementing deletions as a right-hand query allows you to use it in all the same ways as regular mappings, which means they can be conditional:

.content = if .type == "bar" { .bar.body } else { deleted() }

And also mapped in an iterator in order to remove elements from arrays, etc:

.filtered_foos = .foo.map(if .ele.value < 10 { .ele.value + 10 } else { deleted() })

However, iterators aren't part of the RFC scope and if we're keeping the spec minimal then I think it's reasonable not to support those cases. It's easy to add a del(.foo) left-hand function, but if we want to enable conditional deletions the same as conditional mappings then we'd need to implement if as a statement you can put on the left-hand side:

if .type == "bar" { del(.bar.id) }

At which point we need to decide whether we bother implementing it as a right-hand expression as they'd be parsed and implemented separately. Doing so is useful for expanding the language later, but not so much if we're keeping the language minimal.

I see what you're saying, and it makes sense, but from a UX standpoint I find the following much more intuitive:

if .type == "bar" { del(.bar.id) }

I feel like I might be suggesting something that is fundamentally at odds with the purpose of this language (performance). If that is the case, then I'd prefer to go with your proposal, otherwise, I find the above clearer. Performance is a key requirement of this language and I don't want to sacrifice that.

Ok, apologies for the back and forth. I've been trying to build consensus off line on the syntax proposed here. Just confirming was we settled on for this RFC:

.foo = "bar" del(.baz)

This would:

Add a root foo key with the value "bar".

Delete the root baz key.

binarylogic · 2020-08-02T15:59:51Z

@Jeffail, given the above, can we:

Wrap up the final changes and merge this?
Rename this to Remapping Syntax RFC. It's likely we'll call this ou remapping language and call the transform remap.

Jeffail · 2020-08-03T08:55:17Z

Cool, I've updated to swap vicscript to remap and added the del(.foo) statement. I've also removed if statements for now and we can break that conversation out elsewhere. The syntax of the transform is currently:

[transforms.foo_remapped]
  inputs = [ "foo" ]
  type = "remap"
  mapping = """
.foo = "hello"
.bar = .bar + 10
.baz_is_large = .baz > 10
"""

Where we have a single field mapping which stutters slightly (remap.mapping), maybe it could be script or something instead.

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

binarylogic

Thanks! Looks great.

Signed-off-by: Brian Menges <brian.menges@anaplan.com>

Add vicscript RFC

fbef810

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

Jeffail force-pushed the vicscript-rfc branch from cb93a96 to fbef810 Compare July 21, 2020 12:51

Jeffail added 2 commits July 21, 2020 14:10

fmt

07804ac

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

Markdown lint

c29ac05

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

Jeffail force-pushed the vicscript-rfc branch from 41959ed to c29ac05 Compare July 21, 2020 14:00

Add object walker mapping

946fbeb

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

binarylogic reviewed Jul 21, 2020

View reviewed changes

bruceg approved these changes Jul 22, 2020

View reviewed changes

rfcs/2020-07-21-2744-vicscript.md Outdated Show resolved Hide resolved

rfcs/2020-07-21-2744-vicscript.md Outdated Show resolved Hide resolved

rfcs/2020-07-21-2744-vicscript/example1.coffee Outdated Show resolved Hide resolved

Reduce scope of vicscript RFC

86cca1e

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

Jeffail force-pushed the vicscript-rfc branch from a42d843 to 86cca1e Compare July 23, 2020 13:43

bruceg reviewed Jul 23, 2020

View reviewed changes

rfcs/2020-07-21-2744-vicscript.md Outdated Show resolved Hide resolved

rfcs/2020-07-21-2744-vicscript.md Outdated Show resolved Hide resolved

rfcs/2020-07-21-2744-vicscript.md Outdated Show resolved Hide resolved

Clarify behavior around copies/mutations

e117e40

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

Jeffail force-pushed the vicscript-rfc branch from eebea7e to e117e40 Compare July 24, 2020 10:31

binarylogic approved these changes Jul 24, 2020

View reviewed changes

bruceg approved these changes Jul 24, 2020

View reviewed changes

lukesteensen reviewed Jul 24, 2020

View reviewed changes

binarylogic self-requested a review July 26, 2020 17:34

binarylogic suggested changes Jul 26, 2020

View reviewed changes

Update to a more jq-like syntax and behavior

b9941ab

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

binarylogic reviewed Jul 29, 2020

View reviewed changes

binarylogic mentioned this pull request Aug 2, 2020

Proposal: Compose Transform #1653

Closed

Rename vicscript transform to remap

2213383

Signed-off-by: Ashley Jeffs <ash@jeffail.uk>

Jeffail force-pushed the vicscript-rfc branch from 816ccc7 to 2213383 Compare August 3, 2020 08:55

binarylogic changed the title ~~chore: Vicscript RFC~~ chore: Remap RFC Aug 3, 2020

binarylogic approved these changes Aug 3, 2020

View reviewed changes

binarylogic merged commit 8b0b1c8 into master Aug 3, 2020

binarylogic deleted the vicscript-rfc branch August 3, 2020 15:52

binarylogic changed the title ~~chore: Remap RFC~~ chore: Remap v1 RFC Aug 6, 2020

vector-vic mentioned this pull request Sep 16, 2020

Log schemas #3910

Closed

mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020

chore: Remap RFC (vectordotdev#3134)

e9f8dc2

Signed-off-by: Brian Menges <brian.menges@anaplan.com>

JeanMertz mentioned this pull request Jan 19, 2021

switch VRL parser from Pest to Nom #6139

Closed


		It also seems like there are some key limitations with the language that we would need to contribute solutions for (or fork). The main one being simple coalescing expressions, I can't find anything in their docs to allow something like `foo = bar.(baz \| buz).bev` without a nasty bunch of nested `match` expressions.

		Similar to JQ the syntax is also far enough removed from basic c-style declarations that there's a learning curve and more difficulty in maintaining syntax highlighters.


		### It's more to support

		A key role for us as Vector maintainers is to assist users with their configs and thereby encourage adoption. It stands to reason that making Vector as a project larger would conflict with our ability to do this. However, we already have a range of transforms for mapping and after reviewing some large example configs ([https://github.com/timberio/vector/issues/1965](https://github.com/timberio/vector/issues/1965)) I think scrapping them in favor of a more condensed and readable language would be overall beneficial to us when we're sporting our support hats.


		### It's a lot of effort

		Vicscript is an entirely new language spec where we'll need to implement both the parser and executor, which is clearly going to be a significant chunk of work. The obvious mitigation to this is to simply try it out and put a time cap on it, then review how far along the project got. This is only a speculative drawback and it's only going to draw us back once (until we decide to rewrite it for fun).


		Mapping events is a core feature of Vector, and once we introduce a performant and more powerful alternative to our existing mapping transforms it's reasonable to expect it to become the standard for small or moderate sized pipelines. As such, I think it's important for us to have strong ownership of this language. It will allow us to build it into exactly the right solution for Vector users. It will also allow us to guarantee its performance and stability in the future, and provide first class support for it.

		Given most of the risk around this idea is simply how difficult it will be I'd say the obvious first step is to test that out with a proof of concept. We can start with a quick implementation of a small subset of the spec and build a parser and execution engine for review. If we end up with nothing, or with a hostile codebase, then we can instead look into adopting other projects and compare.


		### JQ

		JQ is a great tool and rightfully gets a lot of love. It also basically solves the same problem as Vicscript. Unfortunately, it's common knowledge that JQ doesn't score high on readability, and therefore it doesn't scale well at all in relation to mapping size and complexity (imagine trying to solve [https://github.com/timberio/vector/issues/1965](https://github.com/timberio/vector/issues/1965) with it).

chore: Remap v1 RFC #3134

chore: Remap v1 RFC #3134

Conversation

Jeffail commented Jul 21, 2020

binarylogic left a comment

Choose a reason for hiding this comment

binarylogic Jul 21, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Jeffail commented Jul 23, 2020

binarylogic left a comment

Choose a reason for hiding this comment

bruceg left a comment

Choose a reason for hiding this comment

lukesteensen left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

binarylogic left a comment • edited Loading

Choose a reason for hiding this comment

Jeffail commented Jul 27, 2020

Wholesale Mapping

Mapping with Assignments

Path Syntax

Going with JQ

lukesteensen commented Jul 28, 2020

binarylogic commented Jul 29, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

binarylogic commented Aug 2, 2020

Jeffail commented Aug 3, 2020

binarylogic left a comment

Choose a reason for hiding this comment

binarylogic Jul 21, 2020 •

edited

Loading

binarylogic left a comment •

edited

Loading