Modularity in tremor script and query #21

darach · 2020-04-21T13:40:10Z

No description provided.

text/0010-modularity.md

Licenser

👍

m0n0chr0m3 · 2020-04-23T16:09:59Z

I like this RFC. It tackles a problem that clearly exists, and does it in a principled way. The whole spectrum of the problem is well devised, from the language-theoretic aspects around the tricky issue of allowing recursion, to the practical issues like TREMOR_PATH.

In the interest of improving it even further, I provide some "C"s to the "RFC" below.

Detailed Comments

On the Preprocessor

I dislike the way that the preprocessor is made part of "the specification" in this RFC. Using a preprocessor as a transparent way of adding modules to Tremor Script/Trickle is great. However, the specification of a #!line and #!config preprocessor directive cements "the preprocesser" as a part of Tremor; as something that has to be supported in a backwards-compatible way forever.

In many cases, inlining a module is a great solution. On the other hand, I can envision situations where a different semantics would be desired. By the specification of this RFC, if a large (subtree of) module(s) is used from many scripts in the same Tremor deployment, the runtime has to duplicate all data pertaining to those modules, including constants and AST nodes, just because the specification says so. As far as I can tell, the only requirements that this RFC should impose are that

the use-chain is tree-shaped (i.e., no cycles),
the root of each tree is a non-module file, and that
the nodes of the tree do not change after initial resolve.

But it's likely that I'm missing some insight. Perhaps related to that, I don't fully get what is meant by

The latter constraint is to ensure that logic deployed into the runtime is always traceable to source loaded by a user.

On side-effects in Tremor Scripts

The RFC, including the statement

By design constraint at this time, tremor-script is biased towards pure side-effect free functional programming.

leaves open the possibility for side-effecting code in Tremor Script. In practice, only very constrained forms of side-effects exist in Tremor Script. I might be missing something, so to explain my reasoning: emit and drop act as early returns, not side-effects. Re-assignments to variables is logically equivalent to declaring new variables due to the limitations on loops. patches logically yield a new value. The only way to have proper side-effecting actions, is by linking custom natively implemented functions (= "intrinsic"/"built-in"?) into the runtime (and — if we're being pedantic — the functions in the std::random/tremor::origin/tremor::system namespaces are technically not referentially-transparent, yet still deterministic in the context of a certain sequence of events), but it's not clear to me to what extent adding side-effecting functions is supported (e.g., "you can technically do it, but if it breaks it's your problem").

The RFC also states that

side effecting operations such as with the emit or drop are also not allowed in functions.

As mentioned, I don't even really consider these to be side-effects: from the perspective of the script they're just fancy returns. All in all, the RFC leaves the status of side-effects in Tremor Script/Trickle underspecified. Is it the intent that the new functions are purely functional? More to the point: would, e.g., adding an optimizer that assumes that referential transparency holds, be backwards compatible with Tremor Script/Trickle code that is written to this RFC's specification?

On Partial Functions

The part of the RFC specifying "Matching Functions" states that

The match form supports partial functions.

but also

The matching function form imposes a default case requirement so that unmatches cases have error handling defined. Unlike match expressions the default case in user defined functions must not ( and can not ) be omitted.

Does the first quotation refer to match expressions? If not, both quotations appear to be at odds: if a default case must be present, then the matching function automatically becomes a total function, no?

On Recursion

Obviously, not allowing multiple recursion, limits the expressiveness. On the other hand, embedded Tremor scripts probably aren't the place to do a lot of tree-traversals anyway. I support the choice of limiting loops to only for comprehensions and tail recursion. Similarly, higher-order functions seem like a can of worms you shouldn't open unless you have a clear use-case (do make sure that the Tremor Script documentation makes clear that functions are not values that can be assigned to variables!).

A minor "paper-cut" with the current approach of defining functions only when their definition ends, is that the order in which functions are defined in a module matters. Do you know whether your users will expect that? (In any case, this behavior is not specified in the RFC document, and the usability could easily be improved at a later point by adding a specific error message, should it turn out users are confused by this.)

On Terminating when Depth > RECURSION_LIMIT

The hardcoded recursion depth feels like the roughest edge in the RFC. It is the only point where the pragmatism threatens the model. It's difficult to add backwards-compatible improvements later, when currently the language semantics boils down to "don't recurse too often, eh". Any theshold that's picked (at language-runtime-compile-time!) will be either too strict for simple recursive functions (say, a factorial), too loose to ensure good throughput with expensive recursive functions, or both. A function that works fine in a test environment might fail in production because the threshold there is lower, or even because the end-user-provided data gives rise to more recursive calls.

What's perhaps missing in the RFC is an exposition on the rationale behind this design decision. In decreasing order of strictness, the goal could be

to ensure that every event is processed in a specified time frame,
to ensure that every event is either processed, or dropped due to timeout, in a specified time frame,
to ensure that every event is processed in a specified number of steps,
to ensure that every event is either processed, or dropped due to timeout, in a specified number of steps,
to ensure that each function is either processed, or dropped due to timeout, in a specified number of steps, or
to ensure that each function terminates for all its inputs (either statically or by, e.g. run-time size-change termination).

As far as I understand the RFC and its implementation, it currently targets the fifth option. (Is my assessment correct, and) does this behavior align with the expectations of your users?

Summary of Comments

Specifying the preprocessor in this way burdens future evolution paths for little gain right now.
The RFC leaves the question unanswered whether Tremor Script functions are envisioned to be and remain purely functional.
The description on partial functions might be written in a confusing way, or I might have to turn in my CS degree...
The order in which functions are defined matters in the implementation, which is suboptimal but fine.
The recursion depth is arbitrary, but the RFC also does not state what exactly the goal is.

Thanks for the awesome work putting out this RFC and its implementation.

Licenser · 2020-04-23T17:22:07Z

Heya thank you so much for the feedback :) I'll go over it fully tomorrow just a quickly reply here:

Specifying the preprocessor in this way burdens future evolution paths for little gain right now.

Good point! we should probably point out that this is an implementation detail and not to be dependent on!

The RFC leaves the question unanswered whether Tremor Script functions are envisioned to be and remain purely functional.

The description on partial functions might be written in a confusing way, or I might have to turn in my CS degree...

You are in luck you don't have to return your degree, this is confusing we need to rephrase that!

The order in which functions are defined matters in the implementation, which is suboptimal but fine.

That's partially based on the implementation we picked but it has the upside that it naturally protects against multi-function recursion that would potentially break the enforced recursion limit and tail-call recursion only. It's a good point that it might feel unnatural to users, I think since both Darach and I have a good bit of erlang background this felt 'natural' to us. That all said this restriction does significantly simply the logic and code, it's a good question however if that's the right tradeoff and it at least should be discussed/spelled out in the RFC!

The recursion depth is arbitrary, but the RFC also does not state what exactly the goal is.

It is, we should make this at least configurable via a parameter to the runtime instead of hard coding it. An alternative would be a #!config directive that defines it per script?

text/0010-modularity.md

c270564

darach added the enhancement New feature or request label Apr 21, 2020

darach requested a review from Licenser April 21, 2020 13:40

Pin RFC PR to rfc text

0c44975

Licenser reviewed Apr 21, 2020

View reviewed changes

text/0010-modularity.md Outdated Show resolved Hide resolved

Licenser added 2 commits April 21, 2020 18:17

Fix markdown errors

3d7857c

Fix markdown errors

94ab845

Licenser approved these changes Apr 21, 2020

View reviewed changes

Add note about script mod in trickle

57d6098

Licenser mentioned this pull request Apr 22, 2020

Modularity RFC tremor-rs/tremor-runtime#174

Merged

Licenser added 3 commits April 24, 2020 13:27

Clarify based on feedback

82a0630

Note changability of the recursion limit

2d45eda

Add section on tuple patterns

8c7c08d

darach added the active The RFC is active and the implementation is a work in progress label Apr 28, 2020

darach merged commit cc031c9 into master Apr 28, 2020

Licenser deleted the rfc-0010-modularity branch August 10, 2020 09:50

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Modularity in tremor script and query #21

Modularity in tremor script and query #21

darach commented Apr 21, 2020

Licenser left a comment

m0n0chr0m3 commented Apr 23, 2020

Licenser commented Apr 23, 2020

Modularity in tremor script and query #21

Modularity in tremor script and query #21

Conversation

darach commented Apr 21, 2020

Licenser left a comment

Choose a reason for hiding this comment

m0n0chr0m3 commented Apr 23, 2020

Detailed Comments

On the Preprocessor

On side-effects in Tremor Scripts

On Partial Functions

On Recursion

On Terminating when Depth > RECURSION_LIMIT

Summary of Comments

Licenser commented Apr 23, 2020