New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/[\w-e]/ could have been valid #643

Closed
icefapper opened this Issue Jul 23, 2016 · 6 comments

Comments

Projects
None yet
3 participants
@icefapper

icefapper commented Jul 23, 2016

Hello
In B.1.4 In ECMAScript 2015 spec, there were a number of productions that, due to a probably minor ordering mistake, have been actually dropped in the latest version of the spec, namely ES 2017:

NonemptyClassRanges [U] ::
ClassAtom [?U]
ClassAtom [?U] NonemptyClassRangesNoDash [?U]
[~U] ClassAtomInRange - ClassAtomInRange ClassRanges

NonemptyClassRangesNoDash [U] ::
ClassAtom [?U]
ClassAtomNoDash [?U] NonemptyClassRangesNoDash [?U]
[~U] ClassAtomNoDashInRange - ClassAtomInRange ClassRanges

ClassAtomInRange ::

ClassAtomNoDashInRange

ClassAtomNoDashInRange ::
SourceCharacter but not one of \ or ] or -
\ ClassEscape but only if ClassEscape evaluates to a CharSet with exactly one character
\ IdentityEscape

parsing using the rules above, /[\w-e]/ fails (please note the absence of the 'u' flag): ClassAtom [?U] NonemptyClassRangesNoDash [?U] fails to match \w-e, and the next production to test against is this: [~U] ClassAtomInRange - ClassAtomInRange ClassRanges, which is going to fail, since ClassAtomInRange won't match \w, because it is not "evaluat[ing] to a CharSet with exactly one character."

However, /[\w-e]/ would have been a valid RE in extended mode had the first two rules been:

NonemptyClassRanges [U] ::
ClassAtom [?U]
[~U] ClassAtomInRange - ClassAtomInRange ClassRanges
ClassAtom [?U] NonemptyClassRangesNoDash [?U]

NonemptyClassRangesNoDash [U] ::
ClassAtom [?U]
[~U] ClassAtomNoDashInRange - ClassAtomInRange ClassRanges
ClassAtomNoDash [?U] NonemptyClassRangesNoDash [?U]

This way, the RE /[\w-e]/ would have been interpreted as a set consisting of \w, -, and e, not as a range which will, anyway, be a treated as a syntax error. This, indeed, is the way Chrome 51, Safari, and Firefox 47 behave ( for example, "l-e".match( /[\w-e]/g returns ["l","-","e"] ) .

This makes me wonder whether it is a bug in the implementations, or an actually mistaken ordering in the spec (along with the subsequent omission of the productions altogether.) Or it might, more probably, be that I'm miserably mistaken.

Either way, it would be great to have an answer.

Thanks a lot reading this far!

@littledan

This comment has been minimized.

Show comment
Hide comment
@littledan
Member

littledan commented Jul 25, 2016

@hashseed

This comment has been minimized.

Show comment
Hide comment
@hashseed

hashseed Jul 26, 2016

I would argue that [\w-e] should indeed be parsed the same as [-e\w].

hashseed commented Jul 26, 2016

I would argue that [\w-e] should indeed be parsed the same as [-e\w].

@icefapper

This comment has been minimized.

Show comment
Hide comment
@icefapper

icefapper Jul 26, 2016

@hashseed
that is my point actually; But I could not come up with any sub-production of ClassRanges that would lead to [\w-e] being parsed as a class consisting of ClassAtom(\w) , ClassAtom(-), and ClassAtom(e).

Thanks a lot in advance for any clarifications!

icefapper commented Jul 26, 2016

@hashseed
that is my point actually; But I could not come up with any sub-production of ClassRanges that would lead to [\w-e] being parsed as a class consisting of ClassAtom(\w) , ClassAtom(-), and ClassAtom(e).

Thanks a lot in advance for any clarifications!

@icefapper

This comment has been minimized.

Show comment
Hide comment
@icefapper

icefapper commented Jul 30, 2016

/cc @allenwb

@littledan

This comment has been minimized.

Show comment
Hide comment
@littledan

littledan Apr 13, 2017

Member

Looking back at this bug, I'm not sure what the intent was of the filing. I don't see the language "but only if ClassEscape evaluates to a CharSet with exactly one character" in the current spec text. That language was removed by fbdfda6 , which landed before this bug was filed.

In the new spec text that @anba wrote in that patch, rather than using a grammar restriction, there is a SyntaxError thrown in evaluation, as part of CharacterRange, and in Annex B 1.4, this is replaced by CharacterRangeOrUnion, which implements the behavior @icefapper observed in browsers. cc @schuay

Member

littledan commented Apr 13, 2017

Looking back at this bug, I'm not sure what the intent was of the filing. I don't see the language "but only if ClassEscape evaluates to a CharSet with exactly one character" in the current spec text. That language was removed by fbdfda6 , which landed before this bug was filed.

In the new spec text that @anba wrote in that patch, rather than using a grammar restriction, there is a SyntaxError thrown in evaluation, as part of CharacterRange, and in Annex B 1.4, this is replaced by CharacterRangeOrUnion, which implements the behavior @icefapper observed in browsers. cc @schuay

@littledan littledan closed this Apr 13, 2017

@icefapper

This comment has been minimized.

Show comment
Hide comment
@icefapper

icefapper Apr 14, 2017

Thanks for the response.
The reason I opened this issue was because I had used the (outdated?) spec on ECMA's website, rather than the one on this repository (which is getting updated on a regular basis.)

icefapper commented Apr 14, 2017

Thanks for the response.
The reason I opened this issue was because I had used the (outdated?) spec on ECMA's website, rather than the one on this repository (which is getting updated on a regular basis.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment