Analysis of how implementations handle the required escaping in RegExp#source #578
claudepache (Contributor) commented May 21, 2016
Personally, I am for not escaping / inside RegularExpressionClass, because that has the property of preserving exactly the source text when it originated from a regexp literal.
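To make the property concrete, here is a small probe of the running engine. It is only an illustrative sketch: the variable names are hypothetical, and the result depends on which of the two behaviours the engine implements.

```javascript
// The literal is written as /[/]/ in the source text.
const rx = /[/]/;

// Engine-dependent: true where / inside a class is left unescaped,
// so that rx.source reproduces the text between the delimiters exactly.
const preservesLiteralText = rx.source === '[/]';

// Engine-dependent: true where / is escaped everywhere.
const escapesSlashInClass = rx.source === '[\\/]';

console.log({ source: rx.source, preservesLiteralText, escapesSlashInClass });
```

Either way the regexp itself behaves the same; only the reported source text differs.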
domenic added the web reality label Jul 28, 2016
@claudepache could you perhaps prepare a PR for this?
claudepache commented May 20, 2016
This is a follow-up to https://bugs.ecmascript.org/show_bug.cgi?id=1470
I’ve made a first quick analysis of how major web browsers implement the not-exactly-specified Step 2 of EscapeRegExpPattern. Recall that, for a regexp rx, we have approximately rx.source = EscapeRegExpPattern(rx.[[OriginalSource]]). That transformation must not change the semantics of the pattern, but it is required so that the regexp literal formed by concatenating /, rx.source, /, and rx.flags produces a regexp functionally equivalent to rx.

Analysing the grammar that is used to determine the limits of a regexp literal, one can show that it suffices to:
- escape / outside RegularExpressionClass; and
- escape line terminators.

(Note that, although /* is parsed as the beginning of a multiline comment rather than of a regular expression, this is not a problem because a regexp can never begin with *.)

The transformations used by the major browsers are detailed below, except that line terminators are currently not escaped by Chrome (V8 Issue 1982).
| Original | Transformed |
| --- | --- |
| <LF>, \<LF> | \n |
| <CR>, \<CR> | \r |
| <LS>, \<LS> | \u2028 |
| <PS>, \<PS> | \u2029 |
| / (outside RegularExpressionClass) | \/ |
| / (inside RegularExpressionClass) | / (Firefox, Safari); \/ (Chrome, Edge) |
| (empty pattern) | (?:) |

It does not seem to me that implementations perform other transformations, but that needs confirmation.
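The transformations listed above could be sketched roughly as follows. This is not the spec's algorithm nor any engine's code: the function name escapeRegExpPatternSketch and its escapeSlashInClass parameter are hypothetical, and the class tracking is deliberately simple (it assumes a syntactically valid pattern without nested classes).

```javascript
// Hypothetical sketch of the observed transformations; not an engine's implementation.
// escapeSlashInClass = false: leave / inside a class untouched.
// escapeSlashInClass = true:  escape / everywhere.
function escapeRegExpPatternSketch(pattern, escapeSlashInClass = false) {
  if (pattern === '') return '(?:)'; // the empty pattern gets a placeholder

  const lineTerminators = new Map([
    ['\n', '\\n'],
    ['\r', '\\r'],
    ['\u2028', '\\u2028'],
    ['\u2029', '\\u2029'],
  ]);

  let out = '';
  let inClass = false;

  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === '\\' && i + 1 < pattern.length) {
      // Existing escape: copy it as-is, except backslash + line terminator,
      // which is rewritten to the corresponding escape sequence.
      const next = pattern[++i];
      out += lineTerminators.has(next) ? lineTerminators.get(next) : '\\' + next;
    } else if (lineTerminators.has(ch)) {
      out += lineTerminators.get(ch);
    } else if (ch === '/' && (!inClass || escapeSlashInClass)) {
      out += '\\/';
    } else {
      if (ch === '[') inClass = true;
      else if (ch === ']') inClass = false;
      out += ch;
    }
  }
  return out;
}
```

For instance, under these assumptions the pattern a/b becomes a\/b in both modes, while [/] is left unchanged unless escapeSlashInClass is true.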
In conclusion, the only major difference between implementations seems to be whether / is escaped everywhere or only outside RegularExpressionClass.
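One way to check the round-trip requirement on a given engine is sketched below. The helper roundTrips and the sample patterns are hypothetical, and comparing source and flags is only a proxy for functional equivalence, not a proof of it.

```javascript
// Hypothetical helper: rebuild a regexp from its source and flags the way a
// literal would be written, then compare the result with the original.
function roundTrips(rx) {
  const literal = '/' + rx.source + '/' + rx.flags;
  let rebuilt;
  try {
    // Indirect eval parses the text as a program consisting of one regexp literal.
    rebuilt = (0, eval)(literal);
  } catch (e) {
    // For example, an unescaped line terminator makes the literal unparsable.
    return { literal, ok: false };
  }
  // Proxy for equivalence: same source text and same flags.
  const ok = rebuilt instanceof RegExp &&
    rebuilt.source === rx.source &&
    rebuilt.flags === rx.flags;
  return { literal, ok };
}

const samples = [
  new RegExp('a/b'),  // / outside a class
  new RegExp('[/]'),  // / inside a class
  new RegExp(''),     // empty pattern
  new RegExp('\n'),   // raw line terminator
];
const results = samples.map(roundTrips);
console.log(results);
```

An engine that leaves line terminators unescaped would be expected to report ok: false for the last sample, since the rebuilt literal would no longer parse.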