@bhaible (Contributor) commented Mar 13, 2024

In all places where a 'reserved-body' can occur:

  • in a reserved-statement,
  • after a reserved-annotation or private-use-annotation, in a literal-expression, variable-expression, or annotation-expression,

it can be followed by whitespace ('s' nonterminal).

A syntax ambiguity exists because, as reported in #721 and #725, a U+3000 character can occur both as the last character of a 'reserved-body' (via a 'reserved-char') and as the first character of the whitespace that follows ('s' nonterminal).

According to the principles explained in #725, a 'reserved-body' should not end with a U+3000 character; rather, the U+3000 character is meant to be interpreted as part of the following whitespace.

Test cases (written with \u escapes, for legibility):
For reserved-statement:
.regex /foo/\u3000\u3000{xyz}{{hello}}
For reserved-annotation:
{ % foo bar \u3000\u3000 }
For private-use-annotation:
{ & foo bar \u3000\u3000 @x }
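
To feed the test cases above to a parser, the \u escapes first have to be turned into real U+3000 characters. A minimal Python sketch, assuming every escape has the form \uXXXX with four hex digits; the helper name `decode_u_escapes` is made up for illustration and is not part of any MessageFormat tooling:

```python
import re


def decode_u_escapes(text: str) -> str:
    """Replace each literal \\uXXXX sequence with the character it names."""
    return re.sub(
        r"\\u([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


# The three test cases from this description, kept as raw strings so the
# backslashes stay literal until decode_u_escapes() runs.
TEST_CASES = [
    r".regex /foo/\u3000\u3000{xyz}{{hello}}",
    r"{ % foo bar \u3000\u3000 }",
    r"{ & foo bar \u3000\u3000 @x }",
]

DECODED = [decode_u_escapes(case) for case in TEST_CASES]
```

Each entry of `DECODED` then contains two actual U+3000 characters where the escapes were.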

This patch removes the ambiguity by disallowing U+3000 as the last character of a 'reserved-body'.

It thus fixes #725 and the second part of #721.

Details:

  • U+3000 gets removed from 'content-char' and 'reserved-char'.
  • simple-start-char, text-char, and quoted-char stay the same (since U+3000 is already part of 's').
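
The effect of the change can be illustrated with a toy model. The Python sketch below is not the real ABNF: it assumes that whitespace is SP, HTAB, CR, LF, or U+3000, treats every other character as a 'reserved-char', and ignores escapes and quoted literals. The function name `body_ws_splits` and its flag are hypothetical. It enumerates every way to split a string into a 'reserved-body' followed by optional trailing whitespace.

```python
# Assumed whitespace set for this sketch (the 's' nonterminal, simplified).
WS = {" ", "\t", "\r", "\n", "\u3000"}


def body_ws_splits(text: str, u3000_is_reserved_char: bool):
    """Return all (reserved_body, trailing_whitespace) splits of `text`.

    Simplified rule: the body must end in a reserved-char, and whatever
    follows it must consist only of whitespace (possibly nothing).
    """

    def is_reserved_char(ch: str) -> bool:
        if ch == "\u3000":
            # Before the patch U+3000 counts as a reserved-char; after, it does not.
            return u3000_is_reserved_char
        return ch not in WS

    splits = []
    for i in range(1, len(text) + 1):
        body, trailing = text[:i], text[i:]
        if is_reserved_char(body[-1]) and all(ch in WS for ch in trailing):
            splits.append((body, trailing))
    return splits


SAMPLE = "/foo/\u3000\u3000"
BEFORE = body_ws_splits(SAMPLE, u3000_is_reserved_char=True)
AFTER = body_ws_splits(SAMPLE, u3000_is_reserved_char=False)
```

In this model `BEFORE` holds three splits, because each U+3000 can belong to either side, while `AFTER` holds exactly one, with both U+3000 characters assigned to the whitespace.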

@aphillips added the syntax, fast-track, and LDML45 labels Mar 13, 2024
@aphillips (Member) commented

This appears to be an obvious oversight in the addition of U+3000 to whitespace.

Development

Successfully merging this pull request may close these issues.

[FEEDBACK] syntax: an ambiguity regarding resolved-body in expression

3 participants