Editorial: Formally disambiguate the non-Annex-B grammar #1727
base: main
|
@@ -10858,10 +10858,13 @@ <h2>Syntax</h2> | |
CommonToken :: | ||
IdentifierName | ||
Punctuator | ||
NumericLiteral | ||
NumericLiteral [lookahead ∉ DecimalDigit] [lookahead ∉ IdentifierStart] | ||
StringLiteral | ||
Template | ||
</emu-grammar> | ||
<emu-note> | ||
<p>The lookahead restrictions for |NumericLiteral| require that source text like `3in` is rejected rather than processed as the two input elements `3` and `in`.</p> | ||
</emu-note> | ||
<emu-note> | ||
<p>The |DivPunctuator|, |RegularExpressionLiteral|, |RightBracePunctuator|, and |TemplateSubstitutionTail| productions derive additional tokens that are not included in the |CommonToken| production.</p> | ||
</emu-note> | ||
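The lookahead restrictions on `NumericLiteral` can be pictured as a check that a hand-written tokenizer performs right after scanning the digits. This is a minimal sketch, not the spec's algorithm. The helper names `scanDecimalInteger` and `isIdentifierStartApprox` are hypothetical, only a separator-free `DecimalIntegerLiteral` is handled, and `IdentifierStart` is approximated by ASCII letters, `$`, `_`, and `\`. The real production also admits every code point with the Unicode ID_Start property.

```javascript
// Hypothetical approximation of IdentifierStart: ASCII letters, `$`, `_`, and `\`
// (which begins a UnicodeEscapeSequence). The real production also includes
// every code point with the Unicode ID_Start property.
function isIdentifierStartApprox(ch) {
  return /^[A-Za-z$_\\]$/.test(ch);
}

function isDecimalDigit(ch) {
  return ch >= "0" && ch <= "9";
}

// Scans a separator-free DecimalIntegerLiteral at `pos`, then enforces
// [lookahead ∉ DecimalDigit] [lookahead ∉ IdentifierStart].
function scanDecimalInteger(src, pos) {
  let end = pos;
  if (src[end] === "0") {
    end += 1; // DecimalIntegerLiteral :: `0` is exactly one digit
  } else if (end < src.length && isDecimalDigit(src[end])) {
    while (end < src.length && isDecimalDigit(src[end])) end += 1;
  } else {
    throw new SyntaxError(`expected a decimal digit at offset ${pos}`);
  }
  if (end < src.length) {
    const next = src[end];
    if (isDecimalDigit(next) || isIdentifierStartApprox(next)) {
      throw new SyntaxError(`'${next}' may not immediately follow a numeric literal`);
    }
  }
  return { text: src.slice(pos, end), end };
}
```

With this sketch, `scanDecimalInteger("3 in x", 0)` yields the token `3`, while `"3in"` and `"08"` both throw. In real engines `08` is accepted in sloppy mode only because of the Annex B `NonOctalDecimalIntegerLiteral` extension, which this non-Annex-B sketch deliberately omits.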
|
@@ -11164,10 +11167,6 @@ <h2>Syntax</h2> | |
HexDigit :: one of | ||
`0` `1` `2` `3` `4` `5` `6` `7` `8` `9` `a` `b` `c` `d` `e` `f` `A` `B` `C` `D` `E` `F` | ||
</emu-grammar> | ||
<p>The |SourceCharacter| immediately following a |NumericLiteral| must not be an |IdentifierStart| or |DecimalDigit|.</p> | ||
<emu-note> | ||
<p>For example: `3in` is an error and not the two input elements `3` and `in`.</p> | ||
</emu-note> | ||
<p>A conforming implementation, when processing strict mode code, must not extend, as described in <emu-xref href="#sec-additional-syntax-numeric-literals"></emu-xref>, the syntax of |NumericLiteral| to include <emu-xref href="#prod-annexB-LegacyOctalIntegerLiteral"></emu-xref>, nor extend the syntax of |DecimalIntegerLiteral| to include <emu-xref href="#prod-annexB-NonOctalDecimalIntegerLiteral"></emu-xref>.</p> | ||
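The strict mode restriction above can be observed from script by asking the engine to compile function bodies. A minimal sketch, assuming a host that implements the Annex B numeric-literal extensions (V8, and therefore Node.js, does). `compiles` is a hypothetical helper, and code created by the `Function` constructor is sloppy mode unless its body begins with a `"use strict"` directive.

```javascript
// Hypothetical helper: true if `body` compiles as a function body,
// false if the engine throws a SyntaxError while parsing it.
function compiles(body) {
  try {
    new Function(body);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// Sloppy mode: accepted only through the Annex B extensions.
console.log(compiles("return 010;"), compiles("return 08;"));
// Strict mode: a conforming implementation must reject both forms.
console.log(compiles('"use strict"; return 010;'), compiles('"use strict"; return 08;'));
// Ordinary decimal literals are unaffected.
console.log(compiles('"use strict"; return 10;'));
```

In Node.js this prints `true true`, then `false false`, then `true`. An engine without the Annex B extensions would print `false false` on the first line as well.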
|
||
<emu-clause id="sec-static-semantics-mv"> | ||
|
@@ -30493,24 +30492,20 @@ <h2>Syntax</h2> | |
<ZWJ> | ||
|
||
RegExpUnicodeEscapeSequence[U] :: | ||
[+U] `u` LeadSurrogate `\u` TrailSurrogate | ||
[+U] `u` LeadSurrogate | ||
[+U] `u` TrailSurrogate | ||
[+U] `u` NonSurrogate | ||
[+U] RegExpUnicodeSurrogatePair | ||
[+U] [lookahead ∉ RegExpUnicodeSurrogatePair] `u` Hex4Digits | ||
Could this instead be written as
? That feels clearer to me, if it's equivalent. (And if it's not, I'm confused.)

That is invalid (see below), but even with refactoring it would still be a lookahead of six code points. It's also not clearer to me, but I would be willing to switch to it if there's consensus.

Currently, in the ES grammars, a lookahead-constraint either: So the right-hand-side:
would be quite unusual, in that the lookahead-constraint has to "look through" the terminal
as it eliminates the "look through" and winds up with a fairly standard end-of-RHS constraint. (But yes, the nature of its lookahead-sequence would require tweaking 5.1.5 Grammar Notation.) Also, I think I'd prefer it to come after the Lead+Trail right-hand-side:
(The other thing I like about this solution is that just those two lines make it really obvious why the lookahead-constraint is needed.) |
||
[~U] `u` Hex4Digits | ||
[+U] `u{` CodePoint `}` | ||
</emu-grammar> | ||
<p>Each `\\u` |TrailSurrogate| for which the choice of associated `u` |LeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |LeadSurrogate| that would otherwise have no corresponding `\\u` |TrailSurrogate|.</p> | ||
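The `[lookahead ∉ RegExpUnicodeSurrogatePair]` restriction in the new grammar replaces this prose rule: whenever `u` `LeadSurrogate` `\u` `TrailSurrogate` is present, only the pair production applies, so a trail surrogate pairs with the escape immediately before it whenever that escape is a lead surrogate. Below is a minimal left-to-right sketch under stated assumptions: `parseUnicodeEscape`, `parseEscapeRun`, and `hex4At` are hypothetical names, the input is only a run of `\uXXXX` escapes, and the `u{...}` form is not handled.

```javascript
// Reads exactly four hex digits at `pos`; returns their value or undefined.
function hex4At(src, pos) {
  const digits = src.slice(pos, pos + 4);
  return /^[0-9A-Fa-f]{4}$/.test(digits) ? parseInt(digits, 16) : undefined;
}

const isLead = (cu) => cu >= 0xd800 && cu <= 0xdbff;
const isTrail = (cu) => cu >= 0xdc00 && cu <= 0xdfff;

// Parses the [+U] RegExpUnicodeEscapeSequence beginning at the `u` that
// follows a backslash. RegExpUnicodeSurrogatePair is tried first; the lone
// `u` Hex4Digits alternative is used only when that lookahead fails.
function parseUnicodeEscape(src, pos) {
  if (src[pos] !== "u") throw new SyntaxError("expected 'u'");
  const first = hex4At(src, pos + 1);
  if (first === undefined) throw new SyntaxError("expected Hex4Digits");
  // `u` LeadSurrogate `\u` TrailSurrogate spans 11 code points.
  if (isLead(first) && src.slice(pos + 5, pos + 7) === "\\u") {
    const second = hex4At(src, pos + 7);
    if (second !== undefined && isTrail(second)) {
      const cp = (first - 0xd800) * 0x400 + (second - 0xdc00) + 0x10000;
      return { value: cp, end: pos + 11 };
    }
  }
  // Lone lead, lone trail, or non-surrogate: a single code unit value.
  return { value: first, end: pos + 5 };
}

// Parses a string consisting only of consecutive `\uXXXX` escapes.
function parseEscapeRun(src) {
  const values = [];
  let pos = 0;
  while (pos < src.length) {
    if (src[pos] !== "\\") throw new SyntaxError("expected a backslash");
    const { value, end } = parseUnicodeEscape(src, pos + 1);
    values.push(value);
    pos = end;
  }
  return values;
}
```

For example, `parseEscapeRun("\\uD83D\\uD83D\\uDE00")` returns the values `0xD83D` and `0x1F600`. The first lead cannot pair with the second lead, so it stands alone, and the second lead pairs with the trail. This is the "nearest" association the old prose rule described.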
<emu-grammar type="definition"> | ||
|
||
RegExpUnicodeSurrogatePair :: | ||
`u` LeadSurrogate `\u` TrailSurrogate | ||
|
||
LeadSurrogate :: | ||
Hex4Digits [> but only if the SV of |Hex4Digits| is in the inclusive range 0xD800 to 0xDBFF] | ||
|
||
TrailSurrogate :: | ||
Hex4Digits [> but only if the SV of |Hex4Digits| is in the inclusive range 0xDC00 to 0xDFFF] | ||
|
||
NonSurrogate :: | ||
Hex4Digits [> but only if the SV of |Hex4Digits| is not in the inclusive range 0xD800 to 0xDFFF] | ||
|
||
IdentityEscape[U] :: | ||
[+U] SyntaxCharacter | ||
[+U] `/` | ||
|
@@ -30866,42 +30861,26 @@ <h1>Static Semantics: CharacterValue</h1> | |
<emu-alg> | ||
1. Return the numeric value of the code unit that is the SV of |HexEscapeSequence|. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeEscapeSequence :: `u` LeadSurrogate `\u` TrailSurrogate</emu-grammar> | ||
<emu-alg> | ||
1. Let _lead_ be the CharacterValue of |LeadSurrogate|. | ||
1. Let _trail_ be the CharacterValue of |TrailSurrogate|. | ||
1. Let _cp_ be UTF16Decode(_lead_, _trail_). | ||
1. Return the code point value of _cp_. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeEscapeSequence :: `u` LeadSurrogate</emu-grammar> | ||
<emu-alg> | ||
1. Return the CharacterValue of |LeadSurrogate|. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeEscapeSequence :: `u` TrailSurrogate</emu-grammar> | ||
<emu-alg> | ||
1. Return the CharacterValue of |TrailSurrogate|. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeEscapeSequence :: `u` NonSurrogate</emu-grammar> | ||
<emu-alg> | ||
1. Return the CharacterValue of |NonSurrogate|. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeEscapeSequence :: `u` Hex4Digits</emu-grammar> | ||
<emu-alg> | ||
1. Return the Number value for the MV of |Hex4Digits|. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeEscapeSequence :: `u{` CodePoint `}`</emu-grammar> | ||
<emu-alg> | ||
1. Return the Number value for the MV of |CodePoint|. | ||
</emu-alg> | ||
<emu-grammar> | ||
RegExpUnicodeEscapeSequence :: `u` Hex4Digits | ||
|
||
LeadSurrogate :: Hex4Digits | ||
|
||
TrailSurrogate :: Hex4Digits | ||
|
||
NonSurrogate :: Hex4Digits | ||
</emu-grammar> | ||
<emu-alg> | ||
1. Return the Number value for the MV of |HexDigits|. | ||
1. Return the Number value for the MV of |Hex4Digits|. | ||
</emu-alg> | ||
<emu-grammar>RegExpUnicodeSurrogatePair :: `u` LeadSurrogate `\u` TrailSurrogate</emu-grammar> | ||
<emu-alg> | ||
1. Let _lead_ be the CharacterValue of |LeadSurrogate|. | ||
1. Let _trail_ be the CharacterValue of |TrailSurrogate|. | ||
1. Let _cp_ be UTF16Decode(_lead_, _trail_). | ||
1. Return the code point value of _cp_. | ||
</emu-alg> | ||
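The effect of the CharacterValue rule for `RegExpUnicodeSurrogatePair` can be observed with the built-in `RegExp` constructor. A minimal sketch: the variable names are illustrative, and the pattern is built from a string so that its source contains the escapes `\uD83D\uDE00` literally. With the `u` flag the two escapes form one pair whose CharacterValue is UTF16Decode(0xD83D, 0xDE00) = 0x1F600, so the character class holds a single code point.

```javascript
// Pattern source is the six-character-per-escape text ^[\uD83D\uDE00]$
const source = "^[\\uD83D\\uDE00]$";
const grinning = "\uD83D\uDE00"; // U+1F600 as a two-code-unit string

// With u: one RegExpUnicodeSurrogatePair, so the class holds one code point.
const withU = new RegExp(source, "u");
// Without u: each escape is its own code unit, so the class holds two code
// units and `^[...]$` cannot match a two-unit string.
const withoutU = new RegExp(source);

console.log(withU.test(grinning)); // true
console.log(withoutU.test(grinning)); // false
// The built-in decoder agrees with UTF16Decode on the same pair.
console.log(grinning.codePointAt(0).toString(16)); // 1f600
```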
<emu-grammar>CharacterEscape :: IdentityEscape</emu-grammar> | ||
<emu-alg> | ||
|
@@ -41454,11 +41433,9 @@ <h1>Regular Expressions</h1> | |
<emu-prodref name=RegExpIdentifierStart></emu-prodref> | ||
<emu-prodref name=RegExpIdentifierPart></emu-prodref> | ||
<emu-prodref name=RegExpUnicodeEscapeSequence></emu-prodref> | ||
<p>Each `\\u` |TrailSurrogate| for which the choice of associated `u` |LeadSurrogate| is ambiguous shall be associated with the nearest possible `u` |LeadSurrogate| that would otherwise have no corresponding `\\u` |TrailSurrogate|.</p> | ||
<p> </p> | ||
<emu-prodref name=RegExpUnicodeSurrogatePair></emu-prodref> | ||
<emu-prodref name=LeadSurrogate></emu-prodref> | ||
<emu-prodref name=TrailSurrogate></emu-prodref> | ||
<emu-prodref name=NonSurrogate></emu-prodref> | ||
<emu-prodref name=IdentityEscape></emu-prodref> | ||
<emu-prodref name=DecimalEscape></emu-prodref> | ||
<emu-prodref name=CharacterClassEscape></emu-prodref> | ||
|
`RegExpUnicodeSurrogatePair` generates a set of terminal sequences, each of length 11, which doesn't fit the current definition of lookahead-restrictions.