
Specify web reality RegExp character class invalid control escape semantics #863

Closed
littledan opened this issue Apr 4, 2017 · 0 comments

littledan commented Apr 4, 2017

Sadness: @schuay noted in this V8 bug that browsers accept some invalid character classes. The majority semantics (three of four engines, all but ChakraCore) seem to be to treat /[\c%]/ as /[\\c%]/ and /[\c]/ as /[\\c]/; currently, the grammar permits neither. (ChakraCore's grammar does include them, but with different semantics.) See the bug for more details. Should we standardize the majority semantics?
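A minimal sketch of the majority (V8, JSC, SpiderMonkey) semantics described above, assuming a current Node.js or browser engine that implements the Annex B behavior as later standardized; older engines, and ChakraCore at the time, may behave differently. The helper `classMembers` is hypothetical, introduced only to list which ASCII characters a one-character class matches.

```javascript
// Hypothetical helper: collect every ASCII character that `re` matches
// on its own. Assumes a non-global RegExp, so test() keeps no state.
function classMembers(re) {
  let members = "";
  for (let code = 0; code < 128; code++) {
    const ch = String.fromCharCode(code);
    if (re.test(ch)) members += ch;
  }
  return members;
}

// Malformed \c followed by "%": same members as /[\\c%]/.
const malformedWithChar = classMembers(/[\c%]/);
const explicitWithChar = classMembers(/[\\c%]/);

// Malformed \c at the end of the class: same members as /[\\c]/.
const malformedBare = classMembers(/[\c]/);
const explicitBare = classMembers(/[\\c]/);

console.log(malformedWithChar === explicitWithChar); // true
console.log(malformedBare === explicitBare); // true
console.log(JSON.stringify(malformedWithChar)); // "%\\c"

// The Annex B extensions do not apply in Unicode mode,
// so the same class is rejected there.
let unicodeError = null;
try {
  new RegExp("[\\c%]", "u");
} catch (e) {
  unicodeError = e;
}
console.log(unicodeError instanceof SyntaxError); // true
```

The members are listed in code-point order, so the backslash (U+005C) appears between "%" and "c".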

littledan added a commit to littledan/ecma262 that referenced this issue Apr 4, 2017

Normative: Specify RegExp malformed character class behavior
This patch specifies a previously unspecified piece of RegExp
syntax and semantics: If a character class has a control escape
(\c) which is not followed by an extended control character
[a-zA-Z0-9_], all four engines accept it as syntactically
valid. V8, JSC and SpiderMonkey treat it as the character
class [\\c]; ChakraCore's behavior differs and depends on what
follows.

This patch codifies the majority behavior within the Annex B
RegExp extensions. This is consistent with the behavior of a
malformed \c outside of a character class, which behaves like
\\c rather than as an identity escape.

Closes tc39#863
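The commit message above distinguishes a well-formed \c, followed by a character in [a-zA-Z0-9_], from a malformed one. A hedged sketch of both cases, assuming an engine that implements the Annex B semantics in non-Unicode mode: a well-formed control escape denotes the code point of the following character modulo 32, while a malformed \c inside a class contributes a literal backslash and an ordinary "c", mirroring how /\c%/ outside a class matches the three characters \, c, %.

```javascript
// Well-formed control escapes inside a class (non-Unicode mode).
// \cJ: "J" is U+004A, and 0x4A % 32 === 10, i.e. "\n".
const letterEscape = /^[\cJ]$/.test("\n");

// Annex B also accepts a digit or "_" after \c inside a class:
// "1" is U+0031 -> 0x31 % 32 === 0x11; "_" is U+005F -> 0x5F % 32 === 0x1F.
const digitEscape = /^[\c1]$/.test("\x11");
const underscoreEscape = /^[\c_]$/.test("\x1F");

// Malformed \c inside a class: a literal backslash plus "c".
const classBackslash = /^[\c%]$/.test("\\");
const classC = /^[\c%]$/.test("c");

// Malformed \c outside a class already behaved like \\c:
// /\c%/ matches the three-character string "\c%".
const outsideClass = /^\c%$/.test("\\c%");

console.log(letterEscape, digitEscape, underscoreEscape); // true true true
console.log(classBackslash, classC, outsideClass); // true true true
```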

littledan added a commit to littledan/ecma262 that referenced this issue Jun 13, 2017


bterlson added a commit that referenced this issue Jun 13, 2017

Normative: Specify RegExp malformed character class behavior (#864)

Closes #863

* New definition recommended by @anba

* Revert "Normative: Specify RegExp malformed character class behavior"

The new version is given in a follow-on patch.

* Add a lookahead to avoid an ambiguity for [\]
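The last bullet concerns the lookahead on the bare-backslash alternative. A hedged illustration, assuming the production reads as in the current Annex B grammar (`\` [lookahead = c]): because that alternative applies only when the next character is c, the pattern [\] is not reinterpreted as a class containing a backslash; \] stays an escaped "]" and the class is never closed. The helper `compiles` is hypothetical.

```javascript
// Hypothetical helper: does the pattern source compile (non-Unicode mode)?
function compiles(source) {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// "[\\c]" is the pattern [\c]: "c" follows the backslash, so the
// Annex B alternative applies and the class is {"\", "c"}.
const bareControl = compiles("[\\c]");

// "[\\]" is the pattern [\]: no "c" follows the backslash, so the
// alternative does not apply; "\]" is an escaped "]" and the class
// is unterminated.
const escapedBracket = compiles("[\\]");

// Adding a closing bracket yields a valid class containing "]".
const closedClass = compiles("[\\]]");

console.log(bareControl, escapedBracket, closedClass); // true false true
```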