Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(remap): compile-time program result type checking #4902

Merged
merged 30 commits into from Nov 16, 2020

Conversation

JeanMertz
Copy link
Contributor

@JeanMertz JeanMertz commented Nov 6, 2020

This PR implements RFC #4862.

This PR is best reviewed *per commit*. While the full diff is rather large, most of that comes from numerous tests.

User-Level Changes

  1. does not alter any of the grammar of the Remap language.
  2. does not add any new functions.
  3. does not change anything for the Remap transform
  4. does invalidate certain scripts in certain component conditions.

As for the last point, the one place this currently invalidates certain Remap scripts is if you use remap as a condition.

Take for example:

[transforms.reduce]
    type = "reduce"
    starts_when.type = "remap"
    starts_when.source = "1337"

The Condition trait we use only allows returning boolean values (for performance reasons), so the only thing we could do was to accept the above starts_when condition, but always return false because 1337 is not a boolean.

After this PR, we can check the return value at boot-time, which allows us to return an error and prevent the above example config from being valid, pointing out that starts_when.source needs to resolve to a boolean. A user can use any of the available functions or operations to make sure the return value is a boolean:

  • to_bool(1337) — true
  • 1337 >= 0 — true
  • contains(to_string(1337), substring = "133") — true
  • etc

This PR has no impact on the Remap transform, as the return value of the script isn't relevant to the outcome of the transform (the script is used to manipulate events, not to resolve to a final value).

Program Type Check

When compiling a Remap script, the callee informs the program which results it expects. The following constraints can be defined:

  • fallible or not
  • optional value or not (support for optional values will probably be removed from Remap soon, to make things easier to grok)
  • one or more value type constraints

For example:

TypeDef {
	fallible: false,
	optional: false,
	constraint: Constraint::Any,
}

The above constraint requires the program to be infallible, and return a concrete value type of any kind.

Exact value constraints can be defined as Any, Exact(Boolean). Multiple value kinds are defined as OneOf(vec![Integer, Float]).

The program returns an error at compile-time if the result does not match the expectation, e.g.:

remap error: program error: expected to be infallible, but is not

or

remap error: program error: expected to resolve to string or float values, but instead resolves to an error, or integer or boolean values

The fallible check applies to all (root) expressions in a program (since any failing expression will fail the entire program), whereas the optional and constraint checks only apply to the last expression (the one that resolves to the final value of the program).

Expression Type Definitions

To make the above work, every expression now implements:

fn type_def(&self, state: &state::Compiler) -> TypeDef;

There are several rules that apply:

  • Path queries (.foo.bar) resolve to Any (because we don't use any schemas currently, but that could change with Log schemas #3910), unless a path is assigned a value, in which case the path gets whatever constraint the value has (e.g. .foo = true will type check .foo as a Boolean).
  • Literal values type check to their concrete type.
  • Variable references type-check to whatever value is assigned to them last (e.g. $foo = true; $foo = "bar"; $foo type-checks to a string)
  • Some wrapper-expressions type check to their inner expressions (e.g. a block expression delegates type checking to the final expression in the block, and an if-else statement merges the if and else type checks together)
  • Functions implement their own type_def, and have to make sure the contract between implementation and TypeDef holds.

Future Work

While we're currently only using this information on a program-level, we can (and probably will) use this on an expression-level, to reject certain programs.

For example:

$foo = "bar"
if $foo {
	// ...
}

The above will fail at runtime because "foo" is of kind String, whereas the condition in an if-statement has to resolve to a boolean.

In a future PR, we can (at compile-time) ask the constraints of $foo, and know that it'll never resolve to a boolean, and thus return a compile-time error. To resolve this, one could instead write if to_bool($foo). The same applies to certain operators (e.g. "foo" > 2 would fail to compile).

@JeanMertz JeanMertz added the domain: vrl Anything related to the Vector Remap Language label Nov 6, 2020
@JeanMertz JeanMertz added this to the 2020-10-26: Recognizer milestone Nov 6, 2020
@JeanMertz JeanMertz self-assigned this Nov 6, 2020
@JeanMertz JeanMertz linked an issue Nov 9, 2020 that may be closed by this pull request
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Previously, accessing a path was a fallible operation, and the remap
condition implementation is configured to allow a script to fail (in
which case the condition returns `false`).

Paths queries have been changed to never return an error, but return
`None` if the path doesn't exist.

This means the condition check now also has to accept none values, and
consider them `false`.

This is a temporary change, as optional values are going to be removed
from the Remap language. Non-existing paths will instead resolve to
`null`.

Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
Signed-off-by: Jean Mertz <git@jeanmertz.com>
@JeanMertz JeanMertz marked this pull request as ready for review November 12, 2020 21:34
@StephenWakely
Copy link
Contributor

I do really like this. It is a great start, and gives us a lot of potential to take things much further.

Thinking ahead to template fields, it would be really useful if we could limit template fields to just return Strings (or at least reject Map and Array). The issue is that since paths resolve to Any many template fields that use simple paths would be rejected for example {{ .foo }}, {{ .foo + .bar }}.

Obviously, path schemas would alleviate this issue. But in the absense of those, I think it would be useful to specify a type constraint of String if the type is known, but allow it if it isn't. So, it could allow {{ .foo }} - that would have to be a runtime error if .foo was the wrong type, but it would reject {{ contains("..", "..") }} as that is definitely the wrong type.

@JeanMertz
Copy link
Contributor Author

I think it would be useful to specify a type constraint of String if the type is known, but allow it if it isn't.

Yeah, I like that. I'll make a note and tackle that in a follow-up PR, so that we can use it in the templating work we're doing 👍

@JeanMertz
Copy link
Contributor Author

Merging this, as @FungusHumungus has approved it, and there are quite a few PRs dependent on this one being merged.

I messaged @jszwedko that I'd be happy to receive a post-merge review. I'll tackle any review comments in a follow-up PR, if needed.

@JeanMertz JeanMertz merged commit e2d01b8 into master Nov 16, 2020
@JeanMertz JeanMertz deleted the jean/remap-rfc-4862 branch November 16, 2020 10:59
mengesb pushed a commit to jacobbraaten/vector that referenced this pull request Dec 9, 2020
…4902)

Signed-off-by: Brian Menges <brian.menges@anaplan.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain: vrl Anything related to the Vector Remap Language
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve Remap language compile-time checks
2 participants