New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#627 broke RegExpBuiltinExec? #777

Closed
anba opened this Issue Jan 23, 2017 · 5 comments

Comments

Projects
None yet
4 participants
@anba
Contributor

anba commented Jan 23, 2017

From https://bugzilla.mozilla.org/show_bug.cgi?id=1317397#c1:

The PR was accepted, but it seems to be erroneous and hence not implementable for us. Consider the following case:

var re = /(.)\1/gu;
re.lastIndex = {
 valueOf() {
   // Recompile without Unicode-flag.
   re.compile(re.source, "g");
   return 0;
 }
};
var r = re.exec("\u{D83D}\u{DCA3}\u{DCA3}");
print(r ? r[0] : null);

With the new semantics, it seems that the expected result value is null. And here's why:

In RegExpBuiltinExec, the [[OriginalFlags]] is retrieved and stored, followed by calling ToLength(Get(R, "lastIndex")). Side-effects in ToLength can recompile the RegExp object! When the [[RegExpMatcher]] tries to find a match at string index 0, a failure is returned, because the recompiled RegExp works on code units and the sequence <U+D83D, U+DCA3> doesn't match "(.)\1". AdvanceStringIndex is called to advance to the next string index, but AdvanceStringIndex uses fullUnicode=true, so it skips over the complete code point U+1F4A3 and returns 2. The matcher also won't be able to find a match at string index 2, and therefore RegExpBuiltinExec returns null.

The obvious band-aid fix is:

@@ -29385,7 +29385,11 @@ THH:mm:ss.sss
             1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
             1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0.
-            1. Else, let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
+            1. Else,
+              1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
+              1. Let _flags_ be _R_.[[OriginalFlags]].
+              1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
+              1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. Let _matcher_ be _R_.[[RegExpMatcher]].
             1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*.
             1. Let _matchSucceeded_ be *false*.

But I'm not sure that's the correct way to fix this bug.

(If anyone wonders why global and sticky are also reset, that's to avoid regressing #494, #489.)

@ljharb

This comment has been minimized.

Show comment
Hide comment
@ljharb
Member

ljharb commented Jan 23, 2017

@claudepache

This comment has been minimized.

Show comment
Hide comment
@claudepache

claudepache Jan 23, 2017

Contributor

In PR #627, I had incorrectly removed a “useless” read access to the lastIndex property for non-global-and-non-sticky regexps, not noticing that the apparently innocuous ToLength() operation may in fact produce a regexp recompilation!

The proper fix is probably to revert that part of the PR, i.e.:

             1. Assert: _R_ is an initialized RegExp instance.
             1. Assert: Type(_S_) is String.
             1. Let _length_ be the number of code units in _S_.
+            1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
             1. Let _flags_ be _R_.[[OriginalFlags]].
             1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
             1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0.
-            1. Else, let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
             1. Let _matcher_ be _R_.[[RegExpMatcher]].
             1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*.
             1. Let _matchSucceeded_ be *false*.
Contributor

claudepache commented Jan 23, 2017

In PR #627, I had incorrectly removed a “useless” read access to the lastIndex property for non-global-and-non-sticky regexps, not noticing that the apparently innocuous ToLength() operation may in fact produce a regexp recompilation!

The proper fix is probably to revert that part of the PR, i.e.:

             1. Assert: _R_ is an initialized RegExp instance.
             1. Assert: Type(_S_) is String.
             1. Let _length_ be the number of code units in _S_.
+            1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
             1. Let _flags_ be _R_.[[OriginalFlags]].
             1. If _flags_ contains `"g"`, let _global_ be *true*, else let _global_ be *false*.
             1. If _flags_ contains `"y"`, let _sticky_ be *true*, else let _sticky_ be *false*.
             1. If _global_ is *false* and _sticky_ is *false*, let _lastIndex_ be 0.
-            1. Else, let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
             1. Let _matcher_ be _R_.[[RegExpMatcher]].
             1. If _flags_ contains `"u"`, let _fullUnicode_ be *true*, else let _fullUnicode_ be *false*.
             1. Let _matchSucceeded_ be *false*.
@anba

This comment has been minimized.

Show comment
Hide comment
@anba

anba Jan 23, 2017

Contributor

In PR #627, I had incorrectly removed a “useless” read access to the lastIndex property for non-global-and-non-sticky regexps, not noticing that the apparently innocuous ToLength() operation may in fact produce a regexp recompilation!

I thought that part was required for #627, but if that's not the case, your fix is certainly the way to go!

Contributor

anba commented Jan 23, 2017

In PR #627, I had incorrectly removed a “useless” read access to the lastIndex property for non-global-and-non-sticky regexps, not noticing that the apparently innocuous ToLength() operation may in fact produce a regexp recompilation!

I thought that part was required for #627, but if that's not the case, your fix is certainly the way to go!

@claudepache

This comment has been minimized.

Show comment
Hide comment
@claudepache

claudepache Jan 23, 2017

Contributor

I thought that part was required for #627

It was not. The motivation was Issue #625, and the required part is avoiding unnecessary writing to the lastIndex property.

In ”reasonable” situations, uselessly reading the lastIndex property has zero side-effect. I was just guilty of premature optimisation.

Contributor

claudepache commented Jan 23, 2017

I thought that part was required for #627

It was not. The motivation was Issue #625, and the required part is avoiding unnecessary writing to the lastIndex property.

In ”reasonable” situations, uselessly reading the lastIndex property has zero side-effect. I was just guilty of premature optimisation.

@littledan

This comment has been minimized.

Show comment
Hide comment
@littledan
Member

littledan commented Jan 23, 2017

cc @schuay

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment