Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add remaining spec text #39

Merged
merged 8 commits into from
Jan 18, 2024
Merged

Add remaining spec text #39

merged 8 commits into from
Jan 18, 2024

Conversation

eemeli
Copy link
Member

@eemeli eemeli commented Jan 12, 2024

This completes the spec draft for the proposal, so... it's a lot. This PR also presupposed that #29 is accepted.

To make reading the text easier, I've made http://tc39.es/proposal-intl-messageformat/ currently serve content from this PR's branch.

Rather than passing around multiple arguments, the mf, onError and values are now packaged together into a MessageFormatContext Record, along with a cache required for resolving declarations. That cache is required to only read input values once, and to not emit errors for the dependencies of unformatted patterns.

For example, with a message like:

.local $foo = {:broken}
.match {$x :number}
1 {{The one {$foo}}}
* {{Otherwise}}

If this is formatted with { x: 2 }, we should not even check whether the :broken function is available, as it's never used for formatting "Otherwise".

The cache gets a bit more complex as we need to allow for the right fallback behaviour when formatting functions fail. For example with:

Bare {$foo}, annotated {$foo :broken}

If this is formatted with { foo: 42 } and calling :broken fails, the expected result is "Bare 42, annotated {$foo}", so we need to retain the foo name of the variable even after its resolution has succeeded.

The :number and :string implementations are included here. Figuring out how to define anonymous built-in options that themselves create abstract closures was... a learning experience. They need to be recreated separately for each MessageFormat instance as they are available via MF.p.resolvedOptions().functions, and assigning properties on them should not be reflected in other MessageFormat instances.

The :number input handling also gets a little tricky, because we need to support e.g.

{1.234e6 :number minimumSignificantDigits=4 useGrouping=false}

where the literal values 1.234e6, 4, and false get passed in as strings.

@eemeli eemeli mentioned this pull request Jan 12, 2024
Copy link
Collaborator

@dminor dminor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good overall. The biggest question I have is whether we need to specify caching behaviour or if it can be left as implementation detail.

This is a big review, I'll do another pass tomorrow.

<td>Object</td>
<td>
Cache for local variables, which may be resolved during the formatting.
The values in the cache are Objects, each of which will have either [[MessageValue]], [[ResolvedValue]], or [[UnresolvedDeclaration]] internal slots.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want this cache to be exposed to the user? This seems like this is something that could be left to implementations rather than specified.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've included this because its presence or absence has an effect on the observable error handling. Because we don't just throw on the first error, we need to account for what happens when formatting a message like:

.local $foo = {:missing-function}
{{First {$foo}, then {$foo} again}}

As currently specified, formatting that without a missing-function will emit one error for the .local declaration. If the cache is left out, then an implementation may emit the same error twice, as there are two {$foo} placeholders.

My presumption is that it's better to avoid such ambiguity, even if the cost of that is a slight increase in spec complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, it's not actually user observable, right? Just more explicit in the spec but that's not a big problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is not directly user observable.

spec.emu Show resolved Hide resolved
</h1>
<dl class="header">
<dt>description</dt>
<dd>It determines the functions available during message formatting.</dd>
</dl>

<emu-alg>
1. ...TODO
1. Let _numberSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-numberfunctions"></emu-xref>.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be more readable if you added a CreateMessageFormatNumberFunction abstract operation, instead of having the _numberSteps_ followed by CreateBuiltinFunction.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I actually had them that way in an earlier draft. Happy to do that, esp. if @ryzokuken agrees.

The current function steps definition language is modelled after that used for e.g. Promise Resolve Functions and Number Format Functions. The language on this line follows what seemed like the most common way of calling CreateBuiltinFunction(), though Intl.NumberFormat.prototype.format does something a bit different:

Let F be a new built-in function object as defined in Number Format Functions (15.5.2).

My main reason for this approach was that this way the :number and :string definitions can be left without the boilerplate code that's now here. But my opinions on this are not at all strong.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has me equally stumped. On the one hand I agree with @dminor that an AO would be much more readable and less arcane than this.

On the other, I understand the connection to the other similar parts of the spec. I still lean towards AOs over dynamically creating ES builtin functions because my understanding of the MF2 builtin registry functions is that they'd not be actually dynamic and therefore we'll be fine by using an AO here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added #42 to continue this.

1. ...TODO
1. Let _numberSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-numberfunctions"></emu-xref>.
1. Let _number_ be CreateBuiltinFunction(_numberSteps_, *3*, *"number"*, « »).
1. Let _stringSteps_ be the algorithm steps defined in <emu-xref href="#sec-messageformat-stringfunctions"></emu-xref>.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same idea here, I'd prefer a CreateMessageFormatStringFunction.

Copy link
Collaborator

@dminor dminor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me :)

spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
1. Let _type_ be ? Get(_value_, *"type"*).
1. If _type_ is *"literal"*, then
1. Let _strValue_ be ? Get(_value_, *"value"*).
1. Modify _strValue_ such that all *"\"* characters are replaced with *"\\"*.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe pedantic, but is it really possible to modify strValue? I thought we'd have to make a copy with the changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually very tempted to leave this out and add an <emu-note type="editor"> about this being a necessary step, as I'm not certain how to better express this in spec language:

strValue = strValue.replace(/[\\|]/g, '\\$&');

spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
eemeli and others added 2 commits January 17, 2024 00:06
Co-authored-by: Daniel Minor <dminor@mozilla.com>
Copy link
Member

@ryzokuken ryzokuken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for a big PR, the spec text is complex but in a way that accurately represents the complexity of this proposal. Some comments inline but nothing major.

1. Let _stringValue_ be _mv_.
1. Else,
1. For each element _el_ of _msg_, do
1. If Type(_el_) is String, then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further down below in the description of the types like Number, String and Object, they contain definitions to explicit type checks like "is not a number". From what I can observe, the verbose Type(x) syntax is always used when there's some dynamic path involved, either checking if an arbitrary type is part of a collection or if the types of two things are the same.

TL;DR this can be simplified but I'm just nitpicking at this point.

Suggested change
1. If Type(_el_) is String, then
1. If _el_ is a String, then

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, I changed all of these from the "x is a String" format to "Type(x) is String" based on @dminor's review.

So rather than toggling them back, I'm going to stick with the current form for now, and once we figure out how they really ought to look like, we can do a separate editorial pass as a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added #41 to continue this.

spec.emu Show resolved Hide resolved
spec.emu Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
<li>
[[RequestedLocales]] is a List of String values
with the canonicalized language tags of the requested locales
to use for message formatting.
</li>
<li>[[Functions]] is an Object ...TODO</li>
<li>[[Functions]] is an Object with function object values.</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since [[Functions]] is an internal slot, wouldn't it be better to make it a Record containing function objects as values instead of a plain JS object?

<td>Object</td>
<td>
Cache for local variables, which may be resolved during the formatting.
The values in the cache are Objects, each of which will have either [[MessageValue]], [[ResolvedValue]], or [[UnresolvedDeclaration]] internal slots.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, it's not actually user observable, right? Just more explicit in the spec but that's not a big problem.

spec.emu Show resolved Hide resolved
spec.emu Outdated Show resolved Hide resolved
Comment on lines +955 to +956
1. Else if an appropriate notification mechanism exists, then
1. Issue a warning for _error_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to say if this is doable, but let's see 🤞

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, is there prior art for this? Maybe I'm just unaware.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some precedent:

Comment on lines +894 to +896
1. If _kind_ is *"open"*, return the string-concatenation of *"#"* and _name_.
1. If _kind_ is *"standalone"*, return the string-concatenation of *"#"*, _name_, and *"/"*.
1. If _kind_ is *"close"*, return the string-concatenation of *"/"* and _name_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, wasn't this + and - for open and close instead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was, but the sigils were changed in unicode-org/message-format-wg#541.

@eemeli eemeli merged commit 1720950 into main Jan 18, 2024
3 checks passed
@eemeli eemeli deleted the specify branch January 18, 2024 16:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants