Skip to content
This repository has been archived by the owner on Jan 26, 2022. It is now read-only.

Re-evaluate IsRegExp call in MatchAllIterator? #34

Closed
ljharb opened this issue Mar 5, 2018 · 30 comments
Closed

Re-evaluate IsRegExp call in MatchAllIterator? #34

ljharb opened this issue Mar 5, 2018 · 30 comments

Comments

@ljharb
Copy link
Member

ljharb commented Mar 5, 2018

@anba's concern from #33 (comment):

And I'm not sure about the first IsRegExp call in MatchAllIterator and I wonder if it makes more sense to always take the first branch in MatchAllIterator if called from RegExp.prototype[@@matchall] for consistency with RegExp.prototype[@@split]. But that should definitely be discussed in a different issue.

(Pending merging of #33)

@anba
Copy link
Collaborator

anba commented Apr 12, 2018

Let's talk first about the second IsRegExp call in MatchAllIterator (the one directly after calling RegExpCreate).

We currently have:

  1. Else,
    a. Let flags be "g".
    b. Let matcher be ? RegExpCreate(R, flags).
    c. If ? IsRegExp(matcher) is not true, throw a TypeError exception.
    d. Let global be true.
    e. Let fullUnicode be false.
    f. If ? Get(matcher, "lastIndex") is not 0, throw a TypeError exception.

ECMA-262 never calls IsRegExp after RegExpCreate, so adding it here seems like a bad precedence. And if the IsRegExp call is removed, step 3.f can also be changed back to Assert: Get(matcher, "lastIndex") is 0.. (In general I think we should avoid adding arbitrary checks when there's no apparent reason to do so. For example MatchAllIterator could also throw an exception if %RegExpStringIteratorPrototype%.next is no longer a function object, because that indicates it's not possible to iterate over RegExp String Iterator objects. But adding such a check would be really weird! 😄 )

Now the other IsRegExp call in MatchAllIterator. I have two separate concerns here:

  1. When MatchAllIterator is called from RegExp.prototype [ @@matchAll ], we should/have to assume the user explicitly decided to treat R as a RegExp object, so having an additional IsRegExp call to change this decision seems questionable. It's also not consistent with how the other RegExp.prototype methods work.
  2. When MatchAllIterator is called from String.prototype.matchAll, I'd prefer to handle it more like the other String.prototype methods which create RegExp objects (that means String.prototype.match and String.prototype.search), because I want to avoid adding yet another way to handle RegExp sub-classes: There are already two different RegExp sub-classing/extension interfaces: The RegExp.prototype methods all call RegExpExec, which means sub-classes, or any other classes, only need to provide their own "exec" methods when they want to reuse the other RegExp.prototype methods. And in addition to that, the @@match/replace/search/split interfaces allow sub-classes to provide their own implementations for just these methods. The match-all proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the IsRegExp abstract operation).
    In my opinion we should only support RegExp sub-classing in two ways:
    1. Either the RegExp sub-class has %RegExpPrototype% on its prototype chain,
    2. Or the RegExp sub-class copies the relevant methods from %RegExpPrototype% into its prototype object.

Which means String.prototype.matchAll and RegExp.prototype[@@matchAll] should be changed to:

String.prototype.matchAll(_regexp_)
  1. Let _O_ be ? RequireObjectCoercible(*this* value).
  1. If _regexp_ is neither *undefined* nor *null*, then
    1. Let _matcher_ be ? GetMethod(_regexp_, @@matchAll).
    1. If _matcher_ is not *undefined*, then
      1. Return ? Call(_matcher_, _regexp_, « _O_ »).
  1. Let _string_ be ? ToString(_O_).
  1. Let _rx_ be ? RegExpCreate(_regexp_, `"g"`).
  1. Return ? Invoke(_rx_, @@matchAll, « _string_ »).

RegExp.prototype[@@matchAll](_string_)
  1. Let _R_ be the *this* value.
  1. If Type(_R_) is not Object, throw a *TypeError* exception.
  1. Let _S_ be ? ToString(_string_).
  1. Let _C_ be ? SpeciesConstructor(_R_, %RegExp%).
  1. Let _flags_ be ? ToString(? Get(_R_, `"flags"`)).
  1. Let _matcher_ be ? Construct(_C_, « _R_, _flags_ »).
  1. Let _global_ be ? ToBoolean(? Get(_matcher_, `"global"`)).
  1. Let _fullUnicode_ be ? ToBoolean(? Get(_matcher_, `"unicode"`).
  1. Let _lastIndex_ be ? ToLength(? Get(_R_, `"lastIndex"`)).
  1. Perform ? Set(_matcher_, `"lastIndex"`, _lastIndex_, *true*).
  1. Return ! CreateRegExpStringIterator(_matcher_, _S_, _global_, _fullUnicode_).

While this does create two RegExp objects for String.prototype.matchAll when it is naively implemented, it shouldn't be too hard to optimise in actual implementations to avoid the extra RegExp allocations. Or at least it isn't harder to optimise when compared to the other String.prototype and RegExp.prototype methods.

@littledan
Copy link
Member

Given how there are test262 tests and a V8 implementation, these questions need to be answered quickly. I share @anba's intuitions here.

@ljharb
Copy link
Member Author

ljharb commented Apr 13, 2018

Regarding the first issue, I think that reasoning makes sense - I'll prepare that change.

Regarding the second one: I'll think about your comment and provide a thorough response later today.

@ljharb
Copy link
Member Author

ljharb commented Apr 13, 2018

@anba
In general, I've tried to make matchAll consistent with match above all else - however, "match" is special, because it's the marker for IsRegExp, so there are a few places that I think it makes sense to deviate slightly.

Thus, RegExp.prototype[Symbol.match] doesn't check IsRegExp because it doesn't have to - its presence makes it a regexp. You are correct that no other RegExp.prototype method - including any of the well-known Symbol methods - check IsRegExp. However, I still think there's value in including the check, because then RegExp.prototype[Symbol.match].call(x) can actually ensure you passed a valid x. I'm not going to die on this hill tho; if you think that the implementation cost of having this extra check outweighs the "earlier error" value, then I can remove this one.

Regarding the other part: there's already, sadly, multiple ways to subclass regexes: extends RegExp, to set up the slots, and IsRegExp, which a number of methods use. As much as I'd like to remove these, they're how regexp subclassing is done in the ES6 design. I don't think we can - or should - deviate from "relying on IsRegExp" unless we can do so throughout the spec, and I think @allenwb has a number of reasons why he doesn't want that to happen.

Your suggested change makes .matchAll not be robust against delete RegExp.prototype[Symbol.matchAll], and I'm not sure why that's a good idea, nor why keeping that robustness makes things harder to optimize than with your suggestion.

@ljharb
Copy link
Member Author

ljharb commented Apr 13, 2018

Filed #35 for the first change (of the three discussed).

@anba
Copy link
Collaborator

anba commented Apr 13, 2018

Your suggested change makes .matchAll not be robust against delete RegExp.prototype[Symbol.matchAll], and I'm not sure why that's a good idea, nor why keeping that robustness makes things harder to optimize than with your suggestion.

Why does it need to be robust against deleting RegExp.prototype[@@matchAll]? We have the same situation with String.prototype.match and String.prototype.search, where we don't care if the corresponding RegExp.prototype method was deleted:

delete RegExp.prototype[Symbol.match];
"".match(/a/); // <- Throws a TypeError

delete RegExp.prototype[Symbol.search];
"".search(/a/); // <- Throws a TypeError

@ljharb
Copy link
Member Author

ljharb commented Apr 13, 2018

Indeed; that is the precedent - but I think it's a terrible one, and I wanted to do something better with this proposal than the status quo.

@anba
Copy link
Collaborator

anba commented Apr 13, 2018

If someone forcibly disrupts standard built-in methods or properties, it's their problem. For example we also thrown an error from String.prototype.match if RegExp.prototype[Symbol.match] was changed to a non-callable value (7.3.9 GetMethod throws an error for non-callable values). GetMethod could have been spec'ed to simply ignore non-callables, but we explicitly decided against that.

@ljharb
Copy link
Member Author

ljharb commented Apr 13, 2018

So why is it OK to throw on non-callable values (which I agree with), but not OK here to throw on a non-regex being passed into a method that requires a regex?

@littledan
Copy link
Member

I agree with @anba here. I think it's weird to guard against these things.

We've been discussing various guards for some months now on GitHub. If we're not able to come to agreement here, maybe it should be brought to the committee.

@ljharb
Copy link
Member Author

ljharb commented Apr 14, 2018

I think the real question is, is this an implementation concern? This wasn’t called out during all of the spec reviews prior to stage 3 - I’m not sure how the suggested change eases implementability; either case seems equally optimizeable in the scenario of an unmodified matchAll function; but as-is, it’d perform the same in the case of an absent one.

@littledan
Copy link
Member

As proposals move along between Stage 3 and Stage 4, deeper review ends up happening, and tweaks are made. This has happened with basically all large Stage 3 proposals, even for changes that are not driven my implementation concerns themselves. I am sorry I didn't call this our earlier.

@ljharb
Copy link
Member Author

ljharb commented Apr 14, 2018

Sounds like I’ll have to add this to the May agenda then (altho #35 can be merged in the meantime)

@tschneidereit
Copy link
Member

FWIW, I agree with @anba and @littledan: even agreed that the currently proposed behavior is slightly nicer, consistency beats that in importance.

Regarding the concern about this being raised late in the process: I think it would've helped to actively point this out as deviating from precedent. (AFAIK that hasn't happened - my apologies if that is incorrect!) It might make sense to make this an explicit recommendation in our various docs on championing.

@ljharb
Copy link
Member Author

ljharb commented Apr 16, 2018

I thought this came up in committee around #28, but it's possible that it was overlooked.

@mathiasbynens
Copy link
Member

+@hashseed @schuay

@ljharb
Copy link
Member Author

ljharb commented May 22, 2018

@littledan @anba so after rereading this issue: it seems like the primary remaining objection is the "IsRegExp" check after the RegExpCreate check.

My intention is to get as early an error as possible - specifically, before the iterator is created, instead of during iteration. However, the only error this might guard against is in https://tc39.github.io/ecma262/#sec-regexpexec, whose code path doesn't actually care about Symbol.match anyways.

Would you be content with replacing the IsRegExp check with https://tc39.github.io/ecma262/#sec-regexpexec step 5 - specifically, "If matcher does not have a [[RegExpMatcher]] internal slot, throw a TypeError exception."? This shouldn't be observable in the happy path, and shouldn't impact performance.

@littledan
Copy link
Member

@ljharb I don't think such a check will ever result in a TypeError thrown; that internal slot will always exist on a return value of RegExpCreate AFAICT (though it might have a different prototype or the prototype may have been mutated).

@ljharb
Copy link
Member Author

ljharb commented May 22, 2018

ah, good point - you're right.

It seems that the appropriate resolution here, then, is to just remove the check - which closes the issue and obviates the need to come back to the committee.

@littledan
Copy link
Member

Maybe it'd be worth a 30-second status update to say that you're making this change and that this resolves the issue. I like to do a quick committee presentation for changes to Stage 3 proposals, personally.

@ljharb
Copy link
Member Author

ljharb commented May 22, 2018

Sounds great, I'll do that later today.

@ljharb
Copy link
Member Author

ljharb commented May 22, 2018

Closed with #35.

@ljharb ljharb closed this as completed May 22, 2018
@anba
Copy link
Collaborator

anba commented Jun 13, 2018

#35 only removed one IsRegExp call, but left the other one. I still think the reasoning outlined in #34 (comment) applies, for example when calling RegExp.prototype[@@matchAll] with an object X should always take the RegExp path in MatchAllIterator, independent of whether or not IsRegExp(X) returns true or false.

@ljharb
Copy link
Member Author

ljharb commented Jun 13, 2018

@anba thanks, can you file a new issue for that?

@mathiasbynens mathiasbynens mentioned this issue Jun 20, 2018
28 tasks
@ljharb
Copy link
Member Author

ljharb commented Aug 8, 2018

So, I've taken another look at this, and precisely because of the different behavior in the regexp vs string paths, the remaining IsRegExp call is absolutely necessary, and it's not reasonably to assume that R is only a regex.

I'm going to keep this as-is, and implementations should continue to ship. If there's further arguments, I'd be happy to have them on a new issue.

@mathiasbynens
Copy link
Member

So, I've taken another look at this, and precisely because of the different behavior in the regexp vs string paths, the remaining IsRegExp call is absolutely necessary, and it's not reasonably to assume that R is only a regex.

Can you elaborate on why it's "absolutely necessary"? @anba had a lengthy explanation of his reasoning, including a suggested fix, in #34 (comment). Why is the suggested fix being rejected?

@mathiasbynens
Copy link
Member

Ah, the bit of info I was missing was this comment in another thread:

(the committee has consensus for the dual behavior, and the call is required to maintain that)

IIRC, the last time matchAll was brought to committee was in January. The discussion in #34 (comment) happened later, in March. Given that several folks expressed a preference for @anba's proposed semantics, it would be valuable to get explicit consensus by presenting both options to the committee. What do you think?

@ljharb
Copy link
Member Author

ljharb commented Aug 8, 2018

In January, we talked about what it should do with a string, and the committee consensus at that time (which included both options) was what we ended up with - not creating a regex with the string path, which resulted in the branching for string vs regex.

@ljharb
Copy link
Member Author

ljharb commented Aug 8, 2018

If you think it needs to come back to committee, please file a separate issue - @anba’s comment is exceeding difficult for me to unpack, and to be able to confidently discuss anything about it in committee, id need to see arguments stated separately.

mathiasbynens added a commit to mathiasbynens/proposal-string-matchall that referenced this issue Aug 8, 2018
@anba outlined two separate concerns about the remaining `IsRegExp` call in `MatchAllIterator`. Quoting from tc39#34 (comment):

1. When `MatchAllIterator` is called from `RegExp.prototype [ @@matchall ]`, we should/have to assume the user explicitly decided to treat `R` as a RegExp object, so having an additional `IsRegExp` call to change this decision seems questionable. It’s also not consistent with how the other `RegExp.prototype` methods work.

2. When `MatchAllIterator` is called from `String.prototype.matchAll`, I’d prefer to handle it more like the other `String.prototype` methods which create RegExp objects (that means `String.prototype.match` and `String.prototype.search`), because I want to avoid adding yet another way to handle RegExp sub-classes. There are already two different RegExp sub-classing/extension interfaces: the `RegExp.prototype` methods all call `RegExpExec`, which means sub-classes, or any other classes, only need to provide their own `exec` methods when they want to reuse the other `RegExp.prototype` methods. And in addition to that, the `@@match/replace/search/split` interfaces allow sub-classes to provide their own implementations for just these methods. The `matchAll` proposal in its current form adds another dimension to this by providing different code paths depending on whether or not an object is RegExp-like (as per the `IsRegExp` abstract operation). In my opinion we should only support RegExp sub-classing in two ways: 1) Either the RegExp sub-class has `%RegExpPrototype%` on its prototype chain, or 2) the RegExp sub-class copies the relevant methods from `%RegExpPrototype%` into its prototype object.

Ref. tc39#21, tc39#34.
@mathiasbynens
Copy link
Member

@ljharb I agree @anba’s comment packs a lot of information! I’ve opened #37 based on @anba’s proposed change (as a pull request instead of an issue, in the hopes it helps to make the difference clear to others). Let’s move the discussion there.

ljharb added a commit that referenced this issue Aug 8, 2018
 - `String.prototype.matchAll`:
   - use `RegExpCreate` when `Symbol.prototype.matchAll` is not found
   - fall back to regex coercion otherwise
 - `RegExp.prototype[Symbol.matchAll]`:
   - receiver is assumed to be a regex implicitly
 - remove `MatchAllIterator` abstract operation

Thus, `IsRegExp` call no longer exists.

Addresses #21. Addresses #34. Closes #37.
ljharb added a commit that referenced this issue Aug 9, 2018
 - `String.prototype.matchAll`:
   - use `RegExpCreate` when `Symbol.prototype.matchAll` is not found
   - fall back to regex coercion otherwise
 - `RegExp.prototype[Symbol.matchAll]`:
   - receiver is assumed to be a regex implicitly
 - remove `MatchAllIterator` abstract operation

Thus, `IsRegExp` call no longer exists.

Addresses #21. Addresses #34. Closes #37.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants