When do we evaluate the local variables? #299

Closed
mihnita opened this issue Sep 15, 2022 · 16 comments
Labels
blocker-candidate The submitter thinks this might be a block for the Technology Preview

Comments

@mihnita
Collaborator

mihnita commented Sep 15, 2022

It does not matter whether we do the evaluation in one go or in two steps: at some point one needs to go from the arguments to the final result (parts, when formatting to parts, or a string).

And part of that is evaluating the local variables.

When do we do it?
It is not simply an implementation detail, as the evaluation might trigger errors / exceptions.
So this decision would affect the timing of these kinds of errors.

Example 1:

let $exp = {$expiration :date skeleton=abcdEfGhijklmnopqRSTUVwxyz}

This would fail, because the skeleton contains invalid characters.
But only the function knows what's valid and what's not.
So you will get an exception when you invoke the function.

One might say: you should validate earlier, using some regex from the registry, without invoking the function.
But we also support this:

let $exp = {$expiration :date skeleton=$realSkeleton}

And we only know $realSkeleton at evaluation time (might even be another local variable?).

Option 1: when it is "defined"

Option 2: the first time it is referenced (lazy eval + memoization)

Option 3: implementation dependent.

In the ICU4J implementation I went with 2.
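
A minimal TypeScript sketch of what Option 2 (lazy evaluation with memoization) could look like, for illustration only. The `Lazy` class and the `formatDate` stand-in are hypothetical names and are not taken from ICU4J or any other implementation.

```typescript
// Hypothetical sketch of Option 2: a local variable is a thunk that is
// evaluated on first reference and cached afterwards.
class Lazy<T> {
  private done = false;
  private value: T | undefined = undefined;
  private readonly compute: () => T;

  constructor(compute: () => T) {
    this.compute = compute;
  }

  get(): T {
    if (!this.done) {
      this.value = this.compute(); // may throw; nothing is cached in that case
      this.done = true;
    }
    return this.value as T;
  }
}

let calls = 0;

// Stand-in for a formatting function such as :date (assumed, simplified).
function formatDate(isoDate: string): string {
  calls++;
  return "date(" + isoDate + ")";
}

// "let $exp = {$expiration :date ...}": defining the variable runs nothing.
const exp = new Lazy(() => formatDate("2022-09-15"));
const callsAfterDefinition = calls; // 0

// The first reference triggers the function call; the second reuses the result.
const first = exp.get();
const second = exp.get();
const callsAfterTwoReferences = calls; // 1
```

With this shape, any error raised by the function surfaces at the first reference rather than at the definition.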

Some benefits of 2 (why I chose it):

Option 1 might waste time, because not all of the local variables may be used by the selected pattern (if we have selectors).
And it might prevent supporting some use cases.

For example:

let $exp = {$expirationDate :datetime skeleton=yMMMMd}
let $due = {$amount :number unit=currency precision=2 rounding=up currency=$curCode}
match {$expirationDate :isDefined}
  when true {You are due {$due} by {$exp}.}
  when * {You are due {$due}.}

As you can see $exp is only used on the true branch.
Even more, the evaluation of $exp might even fail if expirationDate is undefined (not passed as argument, or is null).
And the condition protects us from this.

Even if we ask that functions never fail (is that a good idea? It puts more restrictions on how one writes the functions), option 1 means that we will evaluate $exp even if branch * is selected. These are wasted cycles.
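
To make the timing difference concrete, here is a small TypeScript sketch of the message above under Option 1 and Option 2. Everything in it is an assumption made for illustration: `formatDateTime` stands in for :datetime, the :isDefined selector is reduced to an `undefined` check, and `$due` is replaced by a pre-formatted `amount` string.

```typescript
// Hypothetical stand-in for :datetime; it fails on a missing argument.
function formatDateTime(value: string | undefined): string {
  if (value === undefined) {
    throw new Error("expirationDate is undefined");
  }
  return value;
}

type Args = { amount: string; expirationDate?: string };

// Option 1: $exp is evaluated at its definition, before the selector runs.
function formatEager(args: Args): string {
  const exp = formatDateTime(args.expirationDate); // throws if undefined
  if (args.expirationDate !== undefined) {
    return "You are due " + args.amount + " by " + exp + ".";
  }
  return "You are due " + args.amount + ".";
}

// Option 2: $exp is only evaluated if the selected pattern refers to it.
function formatLazy(args: Args): string {
  const exp = () => formatDateTime(args.expirationDate);
  if (args.expirationDate !== undefined) {
    return "You are due " + args.amount + " by " + exp() + ".";
  }
  return "You are due " + args.amount + ".";
}

// Turns an exception into a string so both outcomes can be compared.
function tryFormat(format: (args: Args) => string, args: Args): string {
  try {
    return format(args);
  } catch (e) {
    return "error: " + (e as Error).message;
  }
}

const eagerResult = tryFormat(formatEager, { amount: "$5.00" });
const lazyResult = tryFormat(formatLazy, { amount: "$5.00" });
// eagerResult is "error: expirationDate is undefined"
// lazyResult is "You are due $5.00."
```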


Side effect: naming
Late evaluation makes "variable" a bit confusing.
In most programming languages variables are evaluated when assigned (the right-side is evaluated before assignment).

That is why in my proposals this was "macro" or "alias"

@eemeli
Collaborator

eemeli commented Sep 17, 2022

I'd prefer failing as late as possible when formatting, as this allows for usage as in the $expirationDate example above. This probably matches "Option 2", if I've understood right.

On the other hand, I would presume that validation/linting could well present the error much earlier, so that e.g. attempting to assign an invalid literal option value could be noted.

@stasm
Collaborator

stasm commented Sep 23, 2022

I think that in principle the evaluation should be lazy. Specifically:

  • Expressions should be evaluated right before they're formatted (to string or to parts).
  • When referred to inside other expressions, they should be passed into the function without being evaluated. The function may then choose to evaluate if needed.

I'll try to illustrate with an example message. Please bear with me, this is going to be long and complex.

The goal of the message is to be able to construct sentences like "You own 3 red crayons." and "You own 5 blue pencils." (I'm ignoring the singular number for the sake of the example.) In Polish, the noun (crayons, pencils) must appear in plural accusative; the adjective (red, blue) must agree with the noun on the gender, number, and case. In other words, in "You own 3 red crayons", red must be a feminine, plural accusative.

The message takes three arguments:

  • $id: "OBJECT_CRAYON" | "OBJECT_PENCIL"
  • $color: "COLOR_RED" | "COLOR_BLUE"
  • $amount: float

The message:

$myCount = {$amount :number maximumFractionDigits=0}
$myName = {$id :noun case=accusative number=$myCount}
{You own {$myCount} {$color :adjective accord=$myName} {$myName}.}

Going over this step by step:

  1. $myCount = {$amount :number maximumFractionDigits=0}

    A local variable definition. No evaluation happens at this stage.

  2. $myName = {$id :noun case=accusative number=$myCount}

    Another definition of a local variable. Again, $myName doesn't have an evaluated value just yet. There's also a reference to $myCount here; it doesn't cause the evaluation of $myCount. Instead, the local variable definition is passed, to capture the intent rather than the evaluated value.

    In other words, I'd imagine the :noun function to receive one of the following dictionaries as its options bag; the implementation should be able to choose which one it is:

    a) $myCount is passed as-is, without looking up the value of $amount. It's up to :noun to look it up (when :noun is called in step 5). This implies that :noun has access to the arguments passed to the message, but also simplifies the signature of :noun's implementation because the type of the number option is known upfront.

     {
         "case": "accusative",
         "number": {
             argument: Identifier {"name": "$amount"},
             function: ":number",
             options: {"maximumFractionDigits": 0}
         }
     }
    

    b) $myCount is passed with the $amount variable already resolved.

     {
         "case": "accusative",
         "number": {
             argument: 3.0f,
             function: ":number",
             options: {"maximumFractionDigits": 0}
         }
     }
    

    Thanks to $myCount's being passed lazily, unevaluated, the :noun function will be able to inspect the value of $amount, despite its being passed wrapped inside the $myCount definition. Thus, :noun can choose the correct grammatical number for the object's name.

  3. ... You own {$myCount} ...

    Here, we're about to format $myCount, which means it must be evaluated. This is the first time that the :number function from $myCount's definition is being called.

  4. ... {$color :adjective accord=$myName} ...

    Here, $myName is passed into :adjective without being evaluated, similar to $myCount above. This allows :adjective to do 2 things:

    a) Look up the grammatical gender of the object. The gender is an inherent property of the word, so it cannot be given as an option. Instead, it must be defined in a glossary together with the translation.

    b) Inspect the case and number options of $myName's definition.

    These two steps allow :adjective to agree the translation of the color with the gender, number and case of the object.

  5. ... {$myName}. ...

    Here, we finally get to format $myName. This will call :noun, which will in turn inspect $myCount as described in step 2, and look up the case-proper and number-proper form of the translation.
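
The walk-through above could be sketched in TypeScript roughly as follows, using variant (a) from step 2. This is only an illustration under several assumptions: the `Expression` shape, the `GENDER` glossary, and the bracketed output tags (used in place of real Polish word forms) are all hypothetical, and the singular/plural decision is deliberately simplified.

```typescript
// A local variable definition, passed around unevaluated (variant "a" above).
type Expression = {
  argument: string; // name of a message argument
  func: string; // e.g. ":number" or ":noun"
  options: { [name: string]: string | number | Expression };
};

type Args = { [name: string]: string | number };

// Hypothetical glossary: gender is an inherent property of the noun.
const GENDER: { [id: string]: string } = { OBJECT_CRAYON: "feminine" };

// Simplified: derive the grammatical number from the wrapped $amount.
function grammaticalNumber(count: Expression, args: Args): string {
  const digits = Number(count.options["maximumFractionDigits"]);
  const shown = Number(Number(args[count.argument]).toFixed(digits));
  return shown === 1 ? "singular" : "plural";
}

// :noun looks into $myCount's definition only when it is finally called.
function noun(expr: Expression, args: Args): string {
  const num = grammaticalNumber(expr.options["number"] as Expression, args);
  return args[expr.argument] + "[" + expr.options["case"] + "," + num + "]";
}

// :adjective inspects $myName's definition instead of its formatted string.
function adjective(color: string, accord: Expression, args: Args): string {
  const gender = GENDER[String(args[accord.argument])];
  const num = grammaticalNumber(accord.options["number"] as Expression, args);
  return color + "[" + gender + "," + accord.options["case"] + "," + num + "]";
}

const myCount: Expression = {
  argument: "amount",
  func: ":number",
  options: { maximumFractionDigits: 0 },
};
const myName: Expression = {
  argument: "id",
  func: ":noun",
  options: { case: "accusative", number: myCount },
};

const args: Args = { id: "OBJECT_CRAYON", color: "COLOR_RED", amount: 3 };
const nounForm = noun(myName, args);
const adjectiveForm = adjective(String(args["color"]), myName, args);
// nounForm is "OBJECT_CRAYON[accusative,plural]"
// adjectiveForm is "COLOR_RED[feminine,accusative,plural]"
```

Neither function sees a formatted string for $myCount or $myName; both read the definitions, which is what lets the adjective agree on gender, case, and number.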

@macchiati
Member

macchiati commented Sep 23, 2022 via email

@stasm
Collaborator

stasm commented Sep 23, 2022

Thanks for your thoughts, @macchiati.

The messages have to be usable by implementations that don't have that support, or don't have it for all their languages.

My goal here is to allow messages to express complex grammatical relationships. The tool I'm choosing to achieve this goal is lazy evaluation. I'm open to discussing other solutions if you think that lazy evaluation is not feasible.

My operating assumption is that for grammatical agreement to be possible we need to be able to inspect the options used in an expression inside other expressions. The two ways discussed by the group were:

  1. passing the entire lazily-evaluated binding, e.g. {$color :adjective accord=$myName}, so that the outer expression can inspect the configuration of the inner one (here, :adjective can inspect the options passed to :noun),

  2. passing specific parts of the configuration from one expression to another one by one, either directly: {$color :adjective case=accusative ...} or {$color :adjective case=$myName.case ...} (but MF2 doesn't have member expressions), or indirectly: {$color :adjective case=$myName ...}.

My opinion is that (2) is more verbose and more error-prone. Furthermore, it still doesn't solve the question of how to pass the gender of the object to format the color. The gender isn't a configuration knob; it's an inherent property of the object name, and as such there needs to be a lookup involved and it should happen inside :adjective.

Moreover, I suspect that this scenario is actually more complicated than it would appear.

Absolutely, this was just an example, and it's already longer than I wish it was, so I attempted to simplify :)

@macchiati
Member

macchiati commented Sep 23, 2022 via email

@eemeli
Collaborator

eemeli commented Sep 25, 2022

Would this example be perhaps easier to reason about?

let $appleCount = {$amount :number minimumFractionDigits=1}
match {$appleCount :plural}
when one {You have {$appleCount} apple}
when  *  {You have {$appleCount} apples}

My expectation would be that in English, the evaluation of that message would always select the * option. Do others share this expectation, or should e.g. the :plural fail because it's getting a "formatted number" rather than an actual number as its input?

Given that we're not intending to define :number or :plural in the MF2 spec, I would think that the spec language should not explicitly mandate how either approach should be implemented. For plurals in particular, it would even be possible to re-parse a formatted-number string argument to its component parts for the CLDR rule calculations, though that does of course depend on the locale.

In the JS implementation, I've solved this by considering the evaluated value of $appleCount to be a "MessageNumber", an object that holds its resolved argument and options values. It's not completely lazy, as its resolution has looked up the $amount value and is holding that directly rather than a getter for it -- but that's an implementation detail. This allows for the selection to be done based on the value + options needed for plural rule selection, while the actual formatting (either to a string or parts) has similar access to the value and the relevant options.
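
A rough TypeScript sketch of the "resolved value" idea described above. It is not the actual JS implementation: `ResolvedNumber`, `resolveNumber`, `selectPlural` and `formatNumber` are hypothetical names, the plural rule is a hand-written simplification of the English CLDR rule, and `toFixed` is used as a stand-in for real number formatting.

```typescript
// Hypothetical "resolved value": the argument is looked up eagerly, but the
// options travel with it so that selection and formatting agree.
type ResolvedNumber = {
  value: number;
  options: { minimumFractionDigits: number };
};

function resolveNumber(value: number, minimumFractionDigits: number): ResolvedNumber {
  return { value: value, options: { minimumFractionDigits: minimumFractionDigits } };
}

// Simplified English rule: "one" only for the integer 1 with no visible
// fraction digits (CLDR: i = 1 and v = 0); everything else is "other".
function selectPlural(n: ResolvedNumber): "one" | "other" {
  return n.value === 1 && n.options.minimumFractionDigits === 0 ? "one" : "other";
}

// Simplification: toFixed prints exactly minimumFractionDigits digits.
function formatNumber(n: ResolvedNumber): string {
  return n.value.toFixed(n.options.minimumFractionDigits);
}

function formatMessage(amount: number): string {
  const appleCount = resolveNumber(amount, 1); // minimumFractionDigits=1
  const noun = selectPlural(appleCount) === "one" ? "apple" : "apples";
  return "You have " + formatNumber(appleCount) + " " + noun;
}

const appleMessage = formatMessage(1);
// appleMessage is "You have 1.0 apples"
```

Because the selector reads the same options that the formatter later uses, the message takes the * branch for an amount of 1, matching the expectation stated above.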

Given the wide scale of potential users, I think I agree with @macchiati that we shouldn't mandate either behaviour, and support an ecosystem where e.g. an ICU function registry and a JS function registry would behave differently with the above message.

@macchiati
Member

macchiati commented Sep 25, 2022 via email

@stasm
Collaborator

stasm commented Sep 26, 2022

Perhaps we're all trying to achieve the same thing, but had imagined different ways to do it? In the You have X apples example, we want :plural to be able to determine that $amount has been or will be formatted as 1.0. This can be achieved by:

  1. Lazy evaluation (@mihnita and @stasm).
  2. Intermediate result (@eemeli and @macchiati).
  3. Reparsing the formatted string result.

No one is advocating for (3) but I'm putting it in the list as a potential implementation choice, as long as the end result is "You have 1.0 apples." That said, reparsing wouldn't be a good choice in the case of the example I gave, where it's unlikely it would be able to reverse engineer grammatical properties of a noun from a formatted string result.

@macchiati
Member

macchiati commented Sep 27, 2022 via email

@mihnita
Collaborator Author

mihnita commented Oct 16, 2022

I don't think 1 and 2 are orthogonal.
1 is about when you do it, 2 is about what you do.

And I don't think that the @eemeli and @macchiati options are the same.

In fact I am not even sure the @eemeli option even answers the question.
The "resolved value" is something that still does not "know" what the result is, because the formatting function was not invoked yet.

You can only make a decision after you invoke the formatting function:
"That is, the plural categorization must be performed on the formatted number, ..."

We should not think :number & :plural, but new, custom functions.

I would really like to see the JS implementation without any knowledge of plural / number / date and so on.
And then implement those in unit tests.

That would really show that the "engine" does not hard-code any special knowledge about the functions.
It would mean that one can add any function (including new shims for the standard) without touching the engine
(which is often harder to update, for example in node.js, or the browser, or a WebWidget in some OS / framework).
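
As an illustration of an engine with no built-in knowledge of any function, here is a small TypeScript sketch in which every function comes from a caller-supplied registry. The types, `formatPattern`, and the `:shout` function are all hypothetical and do not describe any existing implementation.

```typescript
// Hypothetical engine that knows nothing about numbers, dates or plurals;
// every function comes from a registry supplied by the caller.
type Options = { [name: string]: string };
type FormatFunction = (value: unknown, options: Options) => string;
type Registry = { [name: string]: FormatFunction };

type Placeholder = { arg: string; func?: string; options?: Options };
type Pattern = Array<string | Placeholder>;

function formatPattern(
  pattern: Pattern,
  args: { [name: string]: unknown },
  registry: Registry
): string {
  let out = "";
  for (const part of pattern) {
    if (typeof part === "string") {
      out += part; // literal text
      continue;
    }
    const value = args[part.arg];
    if (part.func === undefined) {
      out += String(value); // no annotation: plain string conversion
      continue;
    }
    const fn = registry[part.func];
    if (fn === undefined) {
      throw new Error("Unknown function " + part.func);
    }
    out += fn(value, part.options || {});
  }
  return out;
}

// A custom function defined outside the engine, as a unit test might do.
const testRegistry: Registry = {
  ":shout": (value, options) =>
    String(value).toUpperCase() + (options["mark"] || "!"),
};

const shouted = formatPattern(
  ["Hello, ", { arg: "name", func: ":shout", options: { mark: "?!" } }],
  { name: "world" },
  testRegistry
);
// shouted is "Hello, WORLD?!"
```

Adding or replacing a function only touches the registry object, not `formatPattern`.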

@eemeli
Collaborator

eemeli commented Oct 24, 2022

It seems to me that we're mixing here concerns from multiple different layers of the implementation. To get at something like a root of this, could we see if we could agree at least on the following statement?

In a message defined using MF2 syntax, it should be possible to ensure that the same basket of options is used when selecting and formatting a message.

A couple of clarifiers are perhaps in order here:

  1. This is referring specifically to some way in the syntax to ensure that the same options are used.
  2. A requirement that the same options must be used would be a stronger form of this statement, and is not precluded here.
  3. No reference is intended here to any specific solutions that might achieve this state of affairs.

@mihnita:
I would really like to see the JS implementation without any knowledge of plural / number / date and so on.
And then implement those in unit tests.

This is available via the runtime option of my implementation. That's used for instance by my MF1/MF2 compatibility package, as it uses a different set of runtime functions.

@mihnita mihnita added the blocker-candidate The submitter thinks this might be a block for the Technology Preview label Nov 3, 2022
@stasm
Collaborator

stasm commented Jul 6, 2023

When I expressed my support for some sort of lazy evaluation strategy, my goal was to allow function implementations to access information about the preceding transformations that a variable went through. An example use-case for this is: in a "You have {$amount} apples" message, we want the :plural selector to be able to determine that $amount has been or will be formatted as 1.0.

I saw lazy evaluation as one way to achieve this, but when I implemented https://github.com/stasm/message2 I realized that it can also be done by passing resolved values in wrappers that carry the formatting configuration at runtime. Here's how the :plural selector inspects the formatting options of the $amount: https://github.com/stasm/message2/blob/4abf43f2023b6e20d8ee1d462684d0741ece791b/registry/plural.ts#L21-L30

@cdaringe
Contributor

Option 2. Give the opportunity to implementations that need max performance the ability to achieve max performance.

@catamorphism
Collaborator

Option 2. Give the opportunity to implementations that need max performance the ability to achieve max performance.

@cdaringe Re: Option 2: you might be interested in reading the comments on #413, where I explained how lazy evaluation with memoization (i.e. call-by-need) affects the meaning of variable resolution. You might wish to chime in there if you think the added complexity is worth it to allow implementations to maximize performance. (Note: memoization itself involves overhead, so I personally don't have a good analysis at this time of which of the three possible evaluation strategies would lead to the best performance, but maybe you have more thoughts on that.)
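
For readers comparing the three strategies, the following TypeScript sketch counts how often a formatting function would run under each one. It is a toy model with hypothetical names (`Strategy`, `countCalls`); it only counts invocations and says nothing about the bookkeeping cost of memoization mentioned above.

```typescript
// Counts how often a (hypothetical) formatting function runs under three
// evaluation strategies when a variable is referenced `references` times.
type Strategy = "eager" | "call-by-name" | "call-by-need";

function countCalls(strategy: Strategy, references: number): number {
  let calls = 0;
  const compute = (): string => {
    calls++;
    return "formatted";
  };

  let cached: string | undefined;
  if (strategy === "eager") {
    cached = compute(); // evaluated at definition, used or not
  }

  for (let i = 0; i < references; i++) {
    if (strategy === "call-by-name") {
      compute(); // re-evaluated on every reference
    } else if (strategy === "call-by-need" && cached === undefined) {
      cached = compute(); // evaluated on first reference, then reused
    }
  }
  return calls;
}

// Variable never referenced (e.g. the selected pattern does not use it).
const unusedCounts = [
  countCalls("eager", 0),
  countCalls("call-by-name", 0),
  countCalls("call-by-need", 0),
]; // [1, 0, 0]

// Variable referenced twice in the selected pattern.
const usedTwiceCounts = [
  countCalls("eager", 2),
  countCalls("call-by-name", 2),
  countCalls("call-by-need", 2),
]; // [1, 2, 1]
```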

@aphillips
Member

Note: this might be partly addressed by #469 (which doesn't directly address when evaluation occurs, but does address mutability and does at least ensure that modify follows let). Re-reading this issue reminds me to make some modifications to that design doc.

aphillips added a commit that referenced this issue Sep 4, 2023
@aphillips
Member

Addressed by #476

aphillips added a commit that referenced this issue Oct 9, 2023
* Design document for variable mutability and namespacing

* style: Apply Prettier

* Partly address #299

* style: Apply Prettier

* Address comments, fix sigil choice

- change `@` to `#` because we want to use `@` for annotations such as `@locale`
- Provide text that considers not making ugly local variables
- Provide use cases for static analysis
- Call out the perfidy of the author in stealing ill-baked requirements

* style: Apply Prettier

* Add @eemelie's `input` proposal as an option considered

* Update exploration/variable-mutability.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

* Add new proposed design

* Update exploration/variable-mutability.md

Co-authored-by: Addison Phillips <addison@unicode.org>

* Address @eemeli's comments

Specifically the one about forward references

* style: Apply Prettier

* Update exploration/variable-mutability.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

* Update exploration/variable-mutability.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

* Update exploration/variable-mutability.md

Co-authored-by: Eemeli Aro <eemeli@mozilla.com>

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
aphillips added a commit that referenced this issue Oct 11, 2023
* Create notes-2023-10-02.md (#486)

* Design document for variable mutability and namespacing (#469)

* Create notes-2023-10-09.md

* Update notes-2023-10-09.md

* Remove the Prettier push action (#491)

Remove the Prettier lint action

* Remove numbers from the existing design proposals (#490)

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Eemeli Aro <eemeli@mozilla.com>
Co-authored-by: Eemeli Aro <eemeli@gmail.com>
Co-authored-by: Stanisław Małolepszy <sta@malolepszy.org>
7 participants