Editorial: nested surrogate pairs? #969

Open
jmdyck opened this Issue Aug 6, 2017 · 0 comments

jmdyck commented Aug 6, 2017

In the 'Patterns' clause, the Syntax section contains the paragraph:

Each \u TrailSurrogate for which the choice of associated u LeadSurrogate is ambiguous shall be associated with the nearest possible u LeadSurrogate that would otherwise have no corresponding \u TrailSurrogate.

This strikes me as very odd. The wording is exactly parallel to that for resolving the dangling else problem, suggesting that surrogate pairs nest somehow. I.e., given

\u LeadSurrogate \u LeadSurrogate \u TrailSurrogate \u TrailSurrogate

the last TrailSurrogate should be 'associated' with the first LeadSurrogate. But what would that even mean? (Note that the relevant semantics make no mention of "associated" or "corresponding" Surrogates.)
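As a rough illustration (not part of the spec text), the following sketch probes how an engine treats the lead, lead, trail, trail sequence under the `u` flag. The code units `\uD83D` and `\uDE00` are chosen arbitrarily (together they encode U+1F600), and the variable names are hypothetical. If the pattern matches the subject as a whole, the inner lead/trail pair must have been read as a single atom, with the outer two surrogates left standing alone, so nothing resembling nesting takes place.

```javascript
// Pattern source: lead, lead, trail, trail, each written as a \u escape.
// (Code units picked for illustration only; D83D DE00 encodes U+1F600.)
const source = "^\\uD83D\\uD83D\\uDE00\\uDE00$";
const unicodeRe = new RegExp(source, "u");

// The same four code units as a subject string.
// Read by code points, this is: lone lead, U+1F600, lone trail.
const subject = "\uD83D\uD83D\uDE00\uDE00";

// With the u flag, matching proceeds code point by code point, so the
// pattern can only match if its middle two escapes form one atom.
const matchesWhole = unicodeRe.test(subject);

// String iteration splits the subject the same way: three code points.
const codePointCount = Array.from(subject).length;

// "." under the u flag consumes one code point, lone surrogates included.
const threeDots = /^...$/u.test(subject);

console.log(matchesWhole, codePointCount, threeDots);
```

On an engine that follows the current semantics, this should report a match, three code points, and a match for the three-dot pattern.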

Now, granted, the grammar is formally ambiguous here (Alternative[+U] derives \u LeadSurrogate \u TrailSurrogate in two distinct ways: as a single RegExpUnicodeEscapeSequence, or as two consecutive ones), and so requires some disambiguation. But rather than the quoted paragraph, I'd prefer one of these approaches:

  • If some part of the source text matches \u LeadSurrogate \u TrailSurrogate, it must be parsed as a single Atom rather than two.
  • In cases of ambiguity, RegExpUnicodeEscapeSequence's first RHS is preferred over its second and third.
  • Change the second RHS to [+U] u LeadSurrogate [lookahead != \u TrailSurrogate]

(One objection to the last approach is that TrailSurrogate is not a terminal symbol, and so this is not a 'legal' lookahead sequence. However, this doesn't bother me much: we could enlarge the definition of lookahead sequences if we wanted.)
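All three approaches aim at the same observable outcome: under the `u` flag, an escaped lead/trail pair behaves as one Atom. A minimal sketch of how that shows up in practice, assuming an engine that implements the current semantics (the code units and variable names are illustrative, not taken from the spec): a quantifier placed after the pair applies to the whole pair with `u`, but only to the trailing escape without it.

```javascript
// Illustrative pair: the UTF-16 code units of U+1F600.
const withU = /^\uD83D\uDE00{2}$/u;   // {2} applies to the pair as one Atom
const withoutU = /^\uD83D\uDE00{2}$/; // {2} applies to \uDE00 alone

const twoEmoji = "\u{1F600}\u{1F600}";       // D83D DE00 D83D DE00
const leadTrailTrail = "\uD83D\uDE00\uDE00"; // D83D DE00 DE00

const results = {
  uTwoEmoji: withU.test(twoEmoji),                       // pair repeated twice
  uLeadTrailTrail: withU.test(leadTrailTrail),           // pair, then lone trail
  plainTwoEmoji: withoutU.test(twoEmoji),                // code-unit matching
  plainLeadTrailTrail: withoutU.test(leadTrailTrail),    // code-unit matching
};

console.log(results);
```

With the `u` flag only the two-emoji string should match; without it, only the lead, trail, trail string should.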
