New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IdentifierPart is ambiguous re '_' ? #1059

Closed
jmdyck opened this Issue Jan 4, 2018 · 7 comments

Comments

Projects
None yet
5 participants
@jmdyck
Collaborator

jmdyck commented Jan 4, 2018

As I understand it, '_' (U+005F LOW LINE) belongs to Unicode general category 'Pc' (Connector_Punctuation), and so has the 'ID_Continue' property, and so matches the ECMAScript nonterminal UnicodeIDContinue. This means that IdentifierPart derives '_' in two distinct ways (via UnicodeIDContinue and via the '_' literal), and so is technically ambiguous. I don't think this causes any semantic ambiguity (because the spec doesn't much care about how IdentifierPart matches source text), but it's odd.

@mathiasbynens

This comment has been minimized.

Show comment
Hide comment
@mathiasbynens

mathiasbynens Jan 5, 2018

Member

This is correct:

/\p{ID_Continue}/u.test('_');
// → true

We can just drop the _ from IdentifierPart.

spec.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/spec.html b/spec.html
index b552ed8..6b5339e 100644
--- a/spec.html
+++ b/spec.html
@@ -9813,7 +9813,6 @@
       IdentifierPart ::
         UnicodeIDContinue
         `$`
-        `_`
         `\` UnicodeEscapeSequence
         <ZWNJ>
         <ZWJ>
Member

mathiasbynens commented Jan 5, 2018

This is correct:

/\p{ID_Continue}/u.test('_');
// → true

We can just drop the _ from IdentifierPart.

spec.html | 1 -
 1 file changed, 1 deletion(-)

diff --git a/spec.html b/spec.html
index b552ed8..6b5339e 100644
--- a/spec.html
+++ b/spec.html
@@ -9813,7 +9813,6 @@
       IdentifierPart ::
         UnicodeIDContinue
         `$`
-        `_`
         `\` UnicodeEscapeSequence
         <ZWNJ>
         <ZWJ>
@nicolo-ribaudo

This comment has been minimized.

Show comment
Hide comment
@nicolo-ribaudo

nicolo-ribaudo Jan 5, 2018

In that case please add a note: it would be confusing for a non-expert reader to see _ in IdentifierStart but not in IdentifierPart.

nicolo-ribaudo commented Jan 5, 2018

In that case please add a note: it would be confusing for a non-expert reader to see _ in IdentifierStart but not in IdentifierPart.

@mathiasbynens

This comment has been minimized.

Show comment
Hide comment
@mathiasbynens

mathiasbynens Jan 5, 2018

Member

We could also make IdentifierPart consume IdentifierStart to avoid the duplication. IdentifierStart is a guaranteed to be a subset of IdentifierPart because ID_Start is guaranteed to be a subset of ID_Continue.

spec.html | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/spec.html b/spec.html
index b552ed8..a64ca7c 100644
--- a/spec.html
+++ b/spec.html
@@ -9811,10 +9811,8 @@
         `\` UnicodeEscapeSequence
 
       IdentifierPart ::
+        IdentifierStart
         UnicodeIDContinue
-        `$`
-        `_`
-        `\` UnicodeEscapeSequence
         <ZWNJ>
         <ZWJ>
Member

mathiasbynens commented Jan 5, 2018

We could also make IdentifierPart consume IdentifierStart to avoid the duplication. IdentifierStart is a guaranteed to be a subset of IdentifierPart because ID_Start is guaranteed to be a subset of ID_Continue.

spec.html | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/spec.html b/spec.html
index b552ed8..a64ca7c 100644
--- a/spec.html
+++ b/spec.html
@@ -9811,10 +9811,8 @@
         `\` UnicodeEscapeSequence
 
       IdentifierPart ::
+        IdentifierStart
         UnicodeIDContinue
-        `$`
-        `_`
-        `\` UnicodeEscapeSequence
         <ZWNJ>
         <ZWJ>

jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jan 5, 2018

Editorial: Remove '_' RHS for IdentifierPart
... because that RHS made it ambiguous.

(Resolves issue tc39#1059.)
@jmdyck

This comment has been minimized.

Show comment
Hide comment
@jmdyck

jmdyck Jan 5, 2018

Collaborator
  IdentifierPart ::
    IdentifierStart
    UnicodeIDContinue
    <ZWNJ>
    <ZWJ>

With that, IdentifierPart would be ambiguous on all the characters in the intersection of IdentifierStart and UnicodeIDContinue (i.e., _ and p{ID_Start}).

Collaborator

jmdyck commented Jan 5, 2018

  IdentifierPart ::
    IdentifierStart
    UnicodeIDContinue
    <ZWNJ>
    <ZWJ>

With that, IdentifierPart would be ambiguous on all the characters in the intersection of IdentifierStart and UnicodeIDContinue (i.e., _ and p{ID_Start}).

@jmdyck jmdyck referenced this issue Jan 5, 2018

Merged

Misc editorial #1053

@allenwb

This comment has been minimized.

Show comment
Hide comment
@allenwb

allenwb Jan 5, 2018

Member

It seems to me that the simplest way to eliminate this ambiguity is:

UnicodeIDContinue ::
any Unicode code point other than U+005F (LOW LINE) with the Unicode property “ID_Continue”

Member

allenwb commented Jan 5, 2018

It seems to me that the simplest way to eliminate this ambiguity is:

UnicodeIDContinue ::
any Unicode code point other than U+005F (LOW LINE) with the Unicode property “ID_Continue”

jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jan 24, 2018

Editorial: Remove '_' RHS for IdentifierPart
... because that RHS made it ambiguous.

(Resolves issue tc39#1059.)

jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jan 26, 2018

Editorial: Remove '_' RHS for IdentifierPart
... because that RHS made it ambiguous.

(Resolves issue tc39#1059.)

jmdyck added a commit to jmdyck/ecma262 that referenced this issue Jan 26, 2018

Editorial: remove `_` RHS from RegExpIdentifierPart
... because it's already derived by UnicodeIDContinue RHS.

(See issue tc39#1059.)

bterlson pushed a commit that referenced this issue Feb 13, 2018

Editorial: Remove '_' RHS for IdentifierPart
... because that RHS made it ambiguous.

(Resolves issue #1059.)

bterlson pushed a commit that referenced this issue Feb 13, 2018

Editorial: remove `_` RHS from RegExpIdentifierPart
... because it's already derived by UnicodeIDContinue RHS.

(See issue #1059.)
@ljharb

This comment has been minimized.

Show comment
Hide comment
@ljharb

ljharb Apr 1, 2018

Member

@jmdyck Since a1f915b and #1053 are landed, does that mean this can be closed?

Member

ljharb commented Apr 1, 2018

@jmdyck Since a1f915b and #1053 are landed, does that mean this can be closed?

@jmdyck

This comment has been minimized.

Show comment
Hide comment
@jmdyck

jmdyck Apr 1, 2018

Collaborator

Yup, thanks.

Collaborator

jmdyck commented Apr 1, 2018

Yup, thanks.

@jmdyck jmdyck closed this Apr 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment