From bb7d2a398dc0774106b4f383c65d7483ead956c9 Mon Sep 17 00:00:00 2001 From: Dan Ehrenberg Date: Fri, 16 Oct 2015 06:54:03 -0700 Subject: [PATCH 1/3] Specialize RegExp methods to only work on real RegExp instances In ES2015, regular expression methods like RegExp.prototype[Symbol.search] invoked the receiver's "exec" method to get at the core of matching. This patch removes that extra degree of genericity to call out to the builtin RegExp exec implementation directly. Type checks are added where appropriate to ensure that this direct call is safe. --- spec.html | 34 +++++++++------------------------- 1 file changed, 9 insertions(+), 25 deletions(-) diff --git a/spec.html b/spec.html index 488a8bc087..2552744ed0 100644 --- a/spec.html +++ b/spec.html @@ -29395,35 +29395,13 @@

RegExp.prototype.exec ( _string_ )

1. If _R_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_). 1. ReturnIfAbrupt(_S_). - 1. Return RegExpBuiltinExec(_R_, _S_). + 1. Return RegExpExec(_R_, _S_). - - + +

Runtime Semantics: RegExpExec ( _R_, _S_ )

The abstract operation RegExpExec with arguments _R_ and _S_ performs the following steps:

- - 1. Assert: Type(_R_) is Object. - 1. Assert: Type(_S_) is String. - 1. Let _exec_ be Get(_R_, `"exec"`). - 1. ReturnIfAbrupt(_exec_). - 1. If IsCallable(_exec_) is *true*, then - 1. Let _result_ be Call(_exec_, _R_, «_S_»). - 1. ReturnIfAbrupt(_result_). - 1. If Type(_result_) is neither Object or Null, throw a *TypeError* exception. - 1. Return _result_. - 1. If _R_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. - 1. Return RegExpBuiltinExec(_R_, _S_). - - -

If a callable `exec` property is not found this algorithm falls back to attempting to use the built-in RegExp matching algorithm. This provides compatible behaviour for code written for prior editions where most built-in algorithms that use regular expressions did not perform a dynamic property lookup of `exec`.
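For illustration, the dynamic `exec` lookup that this note describes can be observed from script in an engine that implements the ES2015 semantics. This is a minimal sketch, not part of the patch; `countingExec` and `calls` are names made up for the example.

```javascript
// Illustrative only: under ES2015, RegExpExec performs Get(R, "exec") and
// calls the result when it is callable, so an override is observable.
const rx = /b/;
let calls = 0;

// Hypothetical instrumented override, installed on the instance itself.
rx.exec = function countingExec(s) {
  calls += 1;
  return RegExp.prototype.exec.call(this, s);
};

// String.prototype.search -> rx[Symbol.search] -> RegExpExec -> rx.exec
const index = "abc".search(rx);
// RegExp.prototype.test -> RegExpExec -> rx.exec
const found = rx.test("abc");

console.log(index, found, calls);
```

With the ES2015 steps, both calls go through the override, so `calls` ends at 2. With the direct call to the built-in matcher proposed in this patch, the override would no longer be consulted.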

-
-
- - - -

Runtime Semantics: RegExpBuiltinExec ( _R_, _S_ )

-

The abstract operation RegExpBuiltinExec with arguments _R_ and _S_ performs the following steps:

1. Assert: _R_ is an initialized RegExp instance. 1. Assert: Type(_S_) is String. @@ -29511,6 +29489,7 @@

get RegExp.prototype.flags

1. Let _R_ be the *this* value. 1. If Type(_R_) is not Object, throw a *TypeError* exception. + 1. If _R_ does not have a [[OriginalFlags]] internal slot, throw a *TypeError* exception. 1. Let _result_ be the empty String. 1. Let _global_ be ToBoolean(Get(_R_, `"global"`)). 1. ReturnIfAbrupt(_global_). @@ -29566,6 +29545,7 @@

RegExp.prototype [ @@match ] ( _string_ )

1. Let _rx_ be the *this* value. 1. If Type(_rx_) is not Object, throw a *TypeError* exception. + 1. If _rx_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_) 1. ReturnIfAbrupt(_S_). 1. Let _global_ be ToBoolean(Get(_rx_, `"global"`)). @@ -29625,6 +29605,7 @@

RegExp.prototype [ @@replace ] ( _string_, _replaceValue_ )

1. Let _rx_ be the *this* value. 1. If Type(_rx_) is not Object, throw a *TypeError* exception. + 1. If _rx_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_). 1. ReturnIfAbrupt(_S_). 1. Let _lengthS_ be the number of code unit elements in _S_. @@ -29705,6 +29686,7 @@

RegExp.prototype [ @@search ] ( _string_ )

1. Let _rx_ be the *this* value. 1. If Type(_rx_) is not Object, throw a *TypeError* exception. + 1. If _rx_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_). 1. ReturnIfAbrupt(_S_). 1. Let _previousLastIndex_ be Get(_rx_, `"lastIndex"`). @@ -29760,6 +29742,7 @@

RegExp.prototype [ @@split ] ( _string_, _limit_ )

1. Let _rx_ be the *this* value. 1. If Type(_rx_) is not Object, throw a *TypeError* exception. + 1. If _rx_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_). 1. ReturnIfAbrupt(_S_). 1. Let _C_ be SpeciesConstructor(_rx_, %RegExp%). @@ -29849,6 +29832,7 @@

RegExp.prototype.test( _S_ )

1. Let _R_ be the *this* value. 1. If Type(_R_) is not Object, throw a *TypeError* exception. + 1. If _R_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _string_ be ToString(_S_). 1. ReturnIfAbrupt(_string_). 1. Let _match_ be RegExpExec(_R_, _string_). From 795cf17fbd291242e1bd2548d7da638906109ba8 Mon Sep 17 00:00:00 2001 From: Dan Ehrenberg Date: Fri, 16 Oct 2015 06:34:16 -0700 Subject: [PATCH 2/3] Replace internal use of flag getters with [[OriginalFlags]] In some places in the ES2015 RegExp specification, flag values are read from [[OriginalFlags]], and in other places, they are read by invoking Get() on the RegExp to get the flag value. This inconsistency does not have any obvious benefits for subclassing and may have some performance cost. This patch standardizes on reading flag values from [[OriginalFlags]]. --- spec.html | 46 ++++++++++++++++------------------------------ 1 file changed, 16 insertions(+), 30 deletions(-) diff --git a/spec.html b/spec.html index 2552744ed0..f9f2f70ee3 100644 --- a/spec.html +++ b/spec.html @@ -29408,13 +29408,11 @@

Runtime Semantics: RegExpExec ( _R_, _S_ )

1. Let _length_ be the number of code units in _S_. 1. Let _lastIndex_ be ToLength(Get(_R_,`"lastIndex"`)). 1. ReturnIfAbrupt(_lastIndex_). - 1. Let _global_ be ToBoolean(Get(_R_, `"global"`)). - 1. ReturnIfAbrupt(_global_). - 1. Let _sticky_ be ToBoolean(Get(_R_, `"sticky"`)). - 1. ReturnIfAbrupt(_sticky_). + 1. Let _flags_ be the value of _R_'s [[OriginalFlags]] internal slot. + 1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*. + 1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*. 1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0. 1. Let _matcher_ be the value of _R_'s [[RegExpMatcher]] internal slot. - 1. Let _flags_ be the value of _R_'s [[OriginalFlags]] internal slot. 1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*. 1. Let _matchSucceeded_ be *false*. 1. Repeat, while _matchSucceeded_ is *false* @@ -29491,21 +29489,12 @@

get RegExp.prototype.flags

1. If Type(_R_) is not Object, throw a *TypeError* exception. 1. If _R_ does not have a [[OriginalFlags]] internal slot, throw a *TypeError* exception. 1. Let _result_ be the empty String. - 1. Let _global_ be ToBoolean(Get(_R_, `"global"`)). - 1. ReturnIfAbrupt(_global_). - 1. If _global_ is *true*, append `"g"` as the last code unit of _result_. - 1. Let _ignoreCase_ be ToBoolean(Get(_R_, `"ignoreCase"`)). - 1. ReturnIfAbrupt(_ignoreCase_). - 1. If _ignoreCase_ is *true*, append `"i"` as the last code unit of _result_. - 1. Let _multiline_ be ToBoolean(Get(_R_, `"multiline"`)). - 1. ReturnIfAbrupt(_multiline_). - 1. If _multiline_ is *true*, append `"m"` as the last code unit of _result_. - 1. Let _unicode_ be ToBoolean(Get(_R_, `"unicode"`)). - 1. ReturnIfAbrupt(_unicode_). - 1. If _unicode_ is *true*, append `"u"` as the last code unit of _result_. - 1. Let _sticky_ be ToBoolean(Get(_R_, `"sticky"`)). - 1. ReturnIfAbrupt(_sticky_). - 1. If _sticky_ is *true*, append `"y"` as the last code unit of _result_. + 1. Let _flags_ be the value of _R_'s [[OriginalFlags]] internal slot. + 1. If _flags_ contains `"g"`, append `"g"` as the last code unit of _result_. + 1. If _flags_ contains `"i"`, append `"i"` as the last code unit of _result_. + 1. If _flags_ contains `"m"`, append `"m"` as the last code unit of _result_. + 1. If _flags_ contains `"u"`, append `"u"` as the last code unit of _result_. + 1. If _flags_ contains `"y"`, append `"y"` as the last code unit of _result_. 1. Return _result_.
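As a rough illustration of what the removed `Get`-based steps allow, the sketch below shows how the ES2015 `flags` getter can be influenced from script. It assumes an engine with ES2015 semantics; `flagsGetter`, `shadowed` and `generic` are names invented for the example.

```javascript
// Illustrative only: the ES2015 getter builds its result from property reads
// (Get(R, "global"), Get(R, "ignoreCase"), ...), not from [[OriginalFlags]].
const flagsGetter = Object.getOwnPropertyDescriptor(RegExp.prototype, "flags").get;

// Case 1: shadow the inherited "global" accessor on a real RegExp.
const rx = /a/i;
Object.defineProperty(rx, "global", { value: true });
const shadowed = rx.flags;

// Case 2: call the getter on a plain object that is not a RegExp at all.
const generic = flagsGetter.call({ global: true, sticky: true });

console.log(shadowed, generic);
```

Under the ES2015 steps, the first read reports a `g` the pattern was never compiled with, and the second succeeds on a non-RegExp receiver. With the steps added in this series, the getter would return the compiled flags in the first case and throw a *TypeError* in the second.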
@@ -29548,13 +29537,12 @@

RegExp.prototype [ @@match ] ( _string_ )

1. If _rx_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_) 1. ReturnIfAbrupt(_S_). - 1. Let _global_ be ToBoolean(Get(_rx_, `"global"`)). - 1. ReturnIfAbrupt(_global_). + 1. Let _flags_ be the value of _rx_'s [[OriginalFlags]] internal slot. + 1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*. 1. If _global_ is *false*, then 1. Return RegExpExec(_rx_, _S_). 1. Else _global_ is *true*, - 1. Let _fullUnicode_ be ToBoolean(Get(_rx_, `"unicode"`)). - 1. ReturnIfAbrupt(_fullUnicode_). + 1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*. 1. Let _setStatus_ be Set(_rx_, `"lastIndex"`, 0, *true*). 1. ReturnIfAbrupt(_setStatus_). 1. Let _A_ be ArrayCreate(0). @@ -29613,11 +29601,10 @@

RegExp.prototype [ @@replace ] ( _string_, _replaceValue_ )

1. If _functionalReplace_ is *false*, then 1. Let _replaceValue_ be ToString(_replaceValue_). 1. ReturnIfAbrupt(_replaceValue_). - 1. Let _global_ be ToBoolean(Get(_rx_, `"global"`)). - 1. ReturnIfAbrupt(_global_). + 1. Let _flags_ be the value of _rx_'s [[OriginalFlags]] internal slot. + 1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*. 1. If _global_ is *true*, then - 1. Let _fullUnicode_ be ToBoolean(Get(_rx_, `"unicode"`)). - 1. ReturnIfAbrupt(_fullUnicode_). + 1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*. 1. Let _setStatus_ be Set(_rx_, `"lastIndex"`, 0, *true*). 1. ReturnIfAbrupt(_setStatus_). 1. Let _results_ be a new empty List. @@ -29747,8 +29734,7 @@

RegExp.prototype [ @@split ] ( _string_, _limit_ )

1. ReturnIfAbrupt(_S_). 1. Let _C_ be SpeciesConstructor(_rx_, %RegExp%). 1. ReturnIfAbrupt(_C_). - 1. Let _flags_ be ToString(Get(_rx_, `"flags"`)). - 1. ReturnIfAbrupt(_flags_). + 1. Let _flags_ be the value of _rx_'s [[OriginalFlags]] internal slot. 1. If _flags_ contains `"u"`, let _unicodeMatching_ be *true*. 1. Else, let _unicodeMatching_ be *false*. 1. If _flags_ contains `"y"`, let _newFlags_ be _flags_. From 9acca2034deb2377fd61dacfc415a409cd3c0087 Mon Sep 17 00:00:00 2001 From: Dan Ehrenberg Date: Fri, 16 Oct 2015 09:12:43 -0700 Subject: [PATCH 3/3] Factor out a stateless InnerRegExpExec Previously, the ES spec matched RegExps using a RegExpExec internal algorithm. This algorithm has a couple of properties which don't completely fit with its usages, leading to awkward workarounds: - It modifies lastIndex - There is no mechanism for passing additional flags Now that "exec" is not called as a method, but rather RegExpExec is directly invoked from methods like RegExp.prototype[Symbol.split], it is possible to refactor RegExpExec to create InnerRegExpExec which does not have these issues. This patch does that refactoring and takes advantage of it to remove the lastIndex "save and restore" from RegExp.prototype[Symbol.search] and, more importantly, the extra RegExp allocation from RegExp.prototype[Symbol.split]. This is a normative change, since it affects when the lastIndex property is read and written. --- spec.html | 77 ++++++++++++++++++++++++++----------------------------- 1 file changed, 36 insertions(+), 41 deletions(-) diff --git a/spec.html b/spec.html index f9f2f70ee3..6d9c53d56d 100644 --- a/spec.html +++ b/spec.html @@ -29398,17 +29398,14 @@

RegExp.prototype.exec ( _string_ )

1. Return RegExpExec(_R_, _S_). - - -

Runtime Semantics: RegExpExec ( _R_, _S_ )

-

The abstract operation RegExpExec with arguments _R_ and _S_ performs the following steps:

+ +

Runtime Semantics: InnerRegExpExec ( _R_, _S_, _lastIndex_, _extraFlags_ )

+

The abstract operation InnerRegExpExec with arguments _R_, _S_, _lastIndex_ and _extraFlags_ performs the following steps:

1. Assert: _R_ is an initialized RegExp instance. 1. Assert: Type(_S_) is String. 1. Let _length_ be the number of code units in _S_. - 1. Let _lastIndex_ be ToLength(Get(_R_,`"lastIndex"`)). - 1. ReturnIfAbrupt(_lastIndex_). - 1. Let _flags_ be the value of _R_'s [[OriginalFlags]] internal slot. + 1. Let _flags_ be the concatenation of _R_'s [[OriginalFlags]] internal slot and _extraFlags_. 1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*. 1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*. 1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0. @@ -29417,15 +29414,11 @@

Runtime Semantics: RegExpExec ( _R_, _S_ )

1. Let _matchSucceeded_ be *false*. 1. Repeat, while _matchSucceeded_ is *false* 1. If _lastIndex_ > _length_, then - 1. Let _setStatus_ be Set(_R_, `"lastIndex"`, 0, *true*). - 1. ReturnIfAbrupt(_setStatus_). - 1. Return *null*. + 1. Return { [[Matches]]: *null*, [[LastIndex]]: 0 }. 1. Let _r_ be _matcher_(_S_, _lastIndex_). 1. If _r_ is ~failure~, then 1. If _sticky_ is *true*, then - 1. Let _setStatus_ be Set(_R_, `"lastIndex"`, 0, *true*). - 1. ReturnIfAbrupt(_setStatus_). - 1. Return *null*. + 1. Return { [[Matches]]: *null*, [[LastIndex]]: 0 }. 1. Let _lastIndex_ be AdvanceStringIndex(_S_, _lastIndex_, _fullUnicode_). 1. Else, 1. Assert: _r_ is a State. @@ -29434,9 +29427,6 @@

Runtime Semantics: RegExpExec ( _R_, _S_ )

1. If _fullUnicode_ is *true*, then 1. _e_ is an index into the _Input_ character list, derived from _S_, matched by _matcher_. Let _eUTF_ be the smallest index into _S_ that corresponds to the character at element _e_ of _Input_. If _e_ is greater than or equal to the length of _Input_, then _eUTF_ is the number of code units in _S_. 1. Let _e_ be _eUTF_. - 1. If _global_ is *true* or _sticky_ is *true*, - 1. Let _setStatus_ be Set(_R_, `"lastIndex"`, _e_, *true*). - 1. ReturnIfAbrupt(_setStatus_). 1. Let _n_ be the length of _r_'s _captures_ List. (This is the same value as 's _NcapturingParens_.) 1. Let _A_ be ArrayCreate(_n_ + 1). 1. Assert: The value of _A_'s `"length"` property is _n_ + 1. @@ -29456,7 +29446,24 @@

Runtime Semantics: RegExpExec ( _R_, _S_ )

1. Assert: _captureI_ is a List of code units. 1. Let _capturedValue_ be a string consisting of the code units of _captureI_. 1. Perform CreateDataProperty(_A_, ToString(_i_) , _capturedValue_). - 1. Return _A_. + 1. If _global_ is *true* or _sticky_ is *true*, + 1. Return { [[Matches]]: _A_, [[LastIndex]]: _e_ }. + 1. Otherwise, return { [[Matches]]: _A_ }. +
+
+ + +

Runtime Semantics: RegExpExec ( _R_, _S_ )

+

The abstract operation RegExpExec with arguments _R_ and _S_ performs the following steps:

+ + 1. Let _lastIndex_ be ToLength(Get(_R_,`"lastIndex"`)). + 1. ReturnIfAbrupt(_lastIndex_). + 1. Let _result_ be InnerRegExpExec(_R_, _S_, _lastIndex_, `""`). + 1. ReturnIfAbrupt(_result_). + 1. If _result_ has a [[LastIndex]] entry, + 1. Let _setStatus_ be Set(_R_, `"lastIndex"`, _result_.[[LastIndex]], *true*). + 1. ReturnIfAbrupt(_setStatus_). + 1. Return _result_.[[Matches]].
@@ -29676,16 +29683,11 @@

RegExp.prototype [ @@search ] ( _string_ )

1. If _rx_ does not have a [[RegExpMatcher]] internal slot, throw a *TypeError* exception. 1. Let _S_ be ToString(_string_). 1. ReturnIfAbrupt(_S_). - 1. Let _previousLastIndex_ be Get(_rx_, `"lastIndex"`). - 1. ReturnIfAbrupt(_previousLastIndex_). - 1. Let _status_ be Set(_rx_, `"lastIndex"`, 0, *true*). - 1. ReturnIfAbrupt(_status_). - 1. Let _result_ be RegExpExec(_rx_, _S_). + 1. Let _result_ be InnerRegExpExec(_rx_, _S_, 0, `""`). 1. ReturnIfAbrupt(_result_). - 1. Let _status_ be Set(_rx_, `"lastIndex"`, _previousLastIndex_, *true*). - 1. ReturnIfAbrupt(_status_). - 1. If _result_ is *null*, return -1. - 1. Return Get(_result_, `"index"`). + 1. Let _matches_ be _result_.[[Matches]]. + 1. If _matches_ is *null*, return -1. + 1. Return Get(_matches_, `"index"`).

The value of the `name` property of this function is `"[Symbol.search]"`.

@@ -29737,10 +29739,6 @@

RegExp.prototype [ @@split ] ( _string_, _limit_ )

1. Let _flags_ be the value of _rx_'s [[OriginalFlags]] internal slot. 1. If _flags_ contains `"u"`, let _unicodeMatching_ be *true*. 1. Else, let _unicodeMatching_ be *false*. - 1. If _flags_ contains `"y"`, let _newFlags_ be _flags_. - 1. Else, let _newFlags_ be the string that is the concatenation of _flags_ and `"y"`. - 1. Let _splitter_ be Construct(_C_, «_rx_, _newFlags_»). - 1. ReturnIfAbrupt(_splitter_). 1. Let _A_ be ArrayCreate(0). 1. Let _lengthA_ be 0. 1. If _limit_ is *undefined*, let _lim_ be 2<sup>53</sup>-1; else let _lim_ be ToLength(_limit_). @@ -29749,22 +29747,19 @@

RegExp.prototype [ @@split ] ( _string_, _limit_ )

1. Let _p_ be 0. 1. If _lim_ = 0, return _A_. 1. If _size_ = 0, then - 1. Let _z_ be RegExpExec(_splitter_, _S_). + 1. Let _z_ be InnerRegExpExec(_rx_, _S_, 0, `"y"`). 1. ReturnIfAbrupt(_z_). - 1. If _z_ is not *null*, return _A_. + 1. If _z_.[[Matches]] is not *null*, return _A_. 1. Assert: The following call will never result in an abrupt completion. 1. Perform CreateDataProperty(_A_, `"0"`, _S_). 1. Return _A_. 1. Let _q_ be _p_. 1. Repeat, while _q_ < _size_ - 1. Let _setStatus_ be Set(_splitter_, `"lastIndex"`, _q_, *true*). - 1. ReturnIfAbrupt(_setStatus_). - 1. Let _z_ be RegExpExec(_splitter_, _S_). + 1. Let _z_ be InnerRegExpExec(_rx_, _S_, _q_, `"y"`). 1. ReturnIfAbrupt(_z_). - 1. If _z_ is *null*, let _q_ be AdvanceStringIndex(_S_, _q_, _unicodeMatching_). - 1. Else _z_ is not *null*, - 1. Let _e_ be ToLength(Get(_splitter_, `"lastIndex"`)). - 1. ReturnIfAbrupt(_e_). + 1. If _z_.[[Matches]] is *null*, let _q_ be AdvanceStringIndex(_S_, _q_, _unicodeMatching_). + 1. Else _z_.[[Matches]] is not *null*, + 1. Let _e_ be _z_.[[LastIndex]]. 1. If _e_ = _p_, let _q_ be AdvanceStringIndex(_S_, _q_, _unicodeMatching_). 1. Else _e_ ≠ _p_, 1. Let _T_ be a String value equal to the substring of _S_ consisting of the elements at indices _p_ (inclusive) through _q_ (exclusive). @@ -29773,12 +29768,12 @@

RegExp.prototype [ @@split ] ( _string_, _limit_ )

1. Let _lengthA_ be _lengthA_ +1. 1. If _lengthA_ = _lim_, return _A_. 1. Let _p_ be _e_. - 1. Let _numberOfCaptures_ be ToLength(Get(_z_, `"length"`)). + 1. Let _numberOfCaptures_ be ToLength(Get(_z_.[[Matches]], `"length"`)). 1. ReturnIfAbrupt(_numberOfCaptures_). 1. Let _numberOfCaptures_ be max(_numberOfCaptures_-1, 0). 1. Let _i_ be 1. 1. Repeat, while _i_ ≤ _numberOfCaptures_. - 1. Let _nextCapture_ be Get(_z_, ToString(_i_)). + 1. Let _nextCapture_ be Get(_z_.[[Matches]], ToString(_i_)). 1. ReturnIfAbrupt(_nextCapture_). 1. Perform CreateDataProperty(_A_, ToString(_lengthA_), _nextCapture_). 1. Let _i_ be _i_ +1.
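
The shape of the refactoring in PATCH 3/3 can be approximated in script. This is a hedged sketch rather than the specified algorithm: script cannot reach [[RegExpMatcher]], so the hypothetical `innerRegExpExec` below stands in for it by running a throwaway copy of the pattern, which allocates (the very cost the spec-level operation avoids). The names `innerRegExpExec`, `regExpExec` and `search`, and the `matches`/`lastIndex` fields, are assumptions that mirror InnerRegExpExec, RegExpExec, @@search and the [[Matches]]/[[LastIndex]] record.

```javascript
// Stateless core: takes lastIndex and extra flags as arguments and returns a
// record instead of writing to rx.lastIndex.
function innerRegExpExec(rx, s, lastIndex, extraFlags) {
  // Concatenate the flags, dropping duplicates (new RegExp rejects "yy").
  const flags = [...new Set(rx.flags + extraFlags)].join("");
  const tracksIndex = flags.includes("g") || flags.includes("y");
  // Stand-in for the internal matcher; rx itself is never mutated.
  const probe = new RegExp(rx.source, flags);
  probe.lastIndex = tracksIndex ? lastIndex : 0;
  const matches = probe.exec(s);
  if (matches === null) return { matches: null, lastIndex: 0 };
  return tracksIndex ? { matches, lastIndex: probe.lastIndex } : { matches };
}

// Stateful wrapper: the only place that reads and writes rx.lastIndex.
function regExpExec(rx, s) {
  const result = innerRegExpExec(rx, s, rx.lastIndex, "");
  if ("lastIndex" in result) rx.lastIndex = result.lastIndex;
  return result.matches;
}

// Search always starts at 0 and needs no save-and-restore of lastIndex.
function search(rx, s) {
  const { matches } = innerRegExpExec(rx, s, 0, "");
  return matches === null ? -1 : matches.index;
}
```

A split-style caller would pass `"y"` as the extra flag and the current position as `lastIndex`, for example `innerRegExpExec(/,/, "a,b", 1, "y")`, and read the end of the match from the returned record instead of from a separately constructed splitter.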