Analysis of how implementations handle the required escaping in RegExp#source #578

Open
claudepache opened this Issue May 20, 2016 · 2 comments

claudepache commented May 20, 2016

This is a followup of https://bugs.ecmascript.org/show_bug.cgi?id=1470

I’ve made a quick first analysis of how major web browsers implement the not-exactly-specified Step 2 of EscapeRegExpPattern. Recall that, for a regexp rx, we have approximately rx.source = EscapeRegExpPattern(rx.[[OriginalSource]]). That transformation must not change the semantics of the pattern, but it is required so that

    eval("/" + rx.source + "/" + rx.flags)

produces a regexp that is functionally equivalent to rx.
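The round-trip requirement can be checked directly. The following is a minimal sketch; `roundTrip` is a hypothetical helper name introduced only for illustration, and the sample patterns are arbitrary.

```javascript
// Hypothetical helper: rebuilds a regexp from rx.source and rx.flags
// by evaluating the corresponding regexp literal.
function roundTrip(rx) {
  return eval("/" + rx.source + "/" + rx.flags);
}

// A pattern containing an unescaped "/" (only possible via the constructor).
const original = new RegExp("a/b", "i");
const rebuilt = roundTrip(original);

// The empty pattern: without a transformation the literal would be "//",
// which is parsed as a line comment rather than as a regexp.
const rebuiltEmpty = roundTrip(new RegExp(""));

console.log(rebuilt.test("A/B"), rebuiltEmpty instanceof RegExp);
```

If rx.source were not escaped, the first `eval` would see `/a/b/i` and fail to parse it as a single regexp literal.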

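Analysing the grammar that is used to determine the limits of a regexp literal, one can show that it suffices to escape the line terminators, escape / where it occurs outside a RegularExpressionClass, and replace the empty pattern with an equivalent non-empty one such as (?:).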

(Note that, although /* is parsed as the beginning of a multi-line comment rather than of a regular expression, this is not a problem because a regexp can never begin with *.)

The transformations used by the major browsers are detailed below, except that the line terminators are currently not escaped by Chrome (V8 Issue 1982).

| Original source | Transformed into |
| --- | --- |
| `<LF>` or `\<LF>` | `\n` |
| `<CR>` or `\<CR>` | `\r` |
| `<LS>` or `\<LS>` | `\u2028` |
| `<PS>` or `\<PS>` | `\u2029` |
| `/` (outside RegularExpressionClass) | `\/` |
| `/` (inside RegularExpressionClass) | `/` (Firefox, Safari); `\/` (Chrome, Edge) |
| empty pattern | `(?:)` |
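The transformations in the table above can be sketched as a single pass over the pattern. This is only an illustration under stated assumptions, not what any engine actually does: `escapeSource` and its `escapeSlashInClass` option are hypothetical names, and the class tracking assumes that `[` opens a class, `]` closes it, and a backslash protects the next character.

```javascript
// Escapes for the four line terminators listed in the table.
const LINE_TERMINATOR_ESCAPES = new Map([
  ["\n", "\\n"],
  ["\r", "\\r"],
  ["\u2028", "\\u2028"],
  ["\u2029", "\\u2029"],
]);

// Hypothetical sketch of Step 2 of EscapeRegExpPattern.
// escapeSlashInClass = false mimics the Firefox/Safari row,
// escapeSlashInClass = true mimics the Chrome/Edge row.
function escapeSource(pattern, { escapeSlashInClass = false } = {}) {
  if (pattern === "") return "(?:)"; // "//" would be a line comment
  let out = "";
  let inClass = false;
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\" && i + 1 < pattern.length) {
      // Backslash plus next character: copy unchanged, unless the next
      // character is a raw line terminator (e.g. \<LF> becomes \n).
      const next = pattern[++i];
      out += LINE_TERMINATOR_ESCAPES.get(next) ?? "\\" + next;
      continue;
    }
    if (LINE_TERMINATOR_ESCAPES.has(ch)) {
      out += LINE_TERMINATOR_ESCAPES.get(ch);
      continue;
    }
    if (ch === "[") inClass = true;
    else if (ch === "]") inClass = false;
    if (ch === "/" && (!inClass || escapeSlashInClass)) out += "\\/";
    else out += ch;
  }
  return out;
}

console.log(escapeSource("a/b[/]"));
console.log(escapeSource("a/b[/]", { escapeSlashInClass: true }));
```

The two calls differ only in how the `/` inside the class is treated, which is the one row of the table where browsers disagree.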

It does not seem to me that implementations perform other transformations, but that needs confirmation.


In conclusion, the only major difference between implementations seems to be whether / is escaped everywhere or only outside RegularExpressionClass.
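A quick way to see which behaviour a given engine has is to inspect the source of a regexp whose class contains a /. This is a small probe, not a conformance test; the variable names are arbitrary and the result depends on the engine and version it runs in.

```javascript
// Same pattern, once via the constructor and once via a literal.
const fromConstructor = new RegExp("[/]");
const fromLiteral = /[/]/;

// Either "[/]" (slash left alone inside the class)
// or "[\/]" (slash escaped everywhere), depending on the engine.
console.log(JSON.stringify(fromConstructor.source));
console.log(JSON.stringify(fromLiteral.source));

// True when the engine leaves "/" alone inside a class.
const preservesLiteralText = fromLiteral.source === "[/]";
console.log(preservesLiteralText);
```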


claudepache commented May 21, 2016

Personally, I am in favor of not escaping / inside RegularExpressionClass, because that preserves the source text exactly when it originates from a regexp literal.

domenic added the web reality label Jul 28, 2016


ljharb commented Mar 21, 2018

@claudepache could you perhaps prepare a PR for this?
