Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: Always check regular expression flags by "flags" #2791

Merged
merged 1 commit into from
Aug 4, 2022

Conversation

gibson042
Copy link
Contributor

@gibson042 gibson042 commented Jun 2, 2022

Discussion at #2418 (comment) uncovered a strange inconsistency: unlike other methods on or related to regular expressions, RegExp.prototype[@@match] and RegExp.prototype[@@replace] bypass the "flags" property, instead reading directly (and only) from "global" and "unicode"—and doing so conditionally, checking the latter if and only if the former is coerced to boolean true. This is awkward for the introduction of v-mode regular expressions, which need to check a third flag in at least some cases and therefore require taking a position on when to get the property (which is observable).

But these are the only two methods that have such a problem... String.prototype.matchAll and String.prototype.replaceAll and RegExp.prototype[@@matchAll] and RegExp.prototype[@@split] get "flags", coerce to string, and then check the result for "u"/"g"/etc. (and the built-in RegExp.prototype.flags itself performs an unconditional Get of each specific flag property in a stable order that is compatible with the deviant operations [i.e., "global" before "unicode"]).

I would like to fix the inconsistency and render the negotiation of #2418 (comment) moot by updating @@match and @@replace to align with the other methods in using "flags". This is a normative change, but one that seems likely to be web-compatible because any well-behaved regular expression analog is already required to support "flags" for the other methods, and deviation would actually require special effort—an author would need to copy the built-in @@match and/or @@replace but specifically remove or override the built-in flags getter to be inconsistent with independent global/unicode access. And if we don't fix this now, then the problem is only going to get worse as more flags are added.

@gibson042 gibson042 added the normative change Affects behavior required to correctly evaluate some ECMAScript source text label Jun 2, 2022
@ljharb
Copy link
Member

ljharb commented Jun 2, 2022

This imo goes in the wrong direction - it creates more observable behavior, via the regex subclassing system that none of us want but it's too late to remove, and it may have the unfortunate side effect of causing websites using polyfills to break (I absolutely have tests, at least, that would break with this change, that rely on .flags not being observably read in these methods).

@bakkot
Copy link
Contributor

bakkot commented Jun 2, 2022

@ljharb This seems to me to reduce the amount of observable behavior? Previously @@replace could do two lookups (of .global and .unicode); now it only does one (of .flags).

@ljharb
Copy link
Member

ljharb commented Jun 2, 2022

lol i see that argument, and to be sure, the real reduced one would be checking if g and u are in the flags, fetched from the slot, rather than looking up global/unicode at all.

@mathiasbynens
Copy link
Member

I’d be stoked if we can get away with this (which as @gibson042 points out, actually seems plausible).

@gibson042
Copy link
Contributor Author

it may have the unfortunate side effect of causing websites using polyfills to break (I absolutely have tests, at least, that would break with this change, that rely on .flags not being observably read in these methods).

@ljharb can you elaborate on the purpose of verifying that built-in RegExp.prototype[@@match] and/or RegExp.prototype[@@replace] do not Get "flags"? Is it something like an indirect implementation detection? What would be the consequences of an engine that does not currently perform such a read starting to do so? I'd appreciate a reference to the code in question.

@ljharb
Copy link
Member

ljharb commented Jun 3, 2022

@gibson042 The purpose of verifying it would be that it's in the spec, and thus that's the precise behavior engines are required to have. Since no engines I'm aware of have implemented this incorrectly, i don't think I have any code that would care - but an in-progress shim I'm working on for the regex symbol methods (which isn't public yet) would need to test for this PR's change, and then it would need to patch the relevant methods in every browser prior to it, even if otherwise no patch would be needed.

@gibson042
Copy link
Contributor Author

It's rare for tests to verify that things don't happen, and the set of such (non-)events is unbounded (although in this particular case two get-unicode-error.js tests introduced by @jugglinmike in tc39/test262#352 do in fact attempt to do so).

Regardless, except for hypothetical code that that copies the built-in @@match and/or @@replace but specifically removes or overrides the built-in flags getter (which I posit is negligible in the same sense as other code affected by breaking changes, especially if the alternative is renegotiating every time a relevant new flag is introduced), the changes of this PR observable by ECMAScript code would be purely additive:

Receiver that self-reports as non-global
(ToBoolean(? Get(rx, "global")) is false)

+Get "flags"
+# Built-in "flags" getter:
 Get "global"
+Get "hasIndices"
+Get "global"
+Get "ignoreCase"
+Get "multiline"
+Get "dotAll"
+Get "unicode"
+Get "sticky"

Receiver that self-reports as global
(ToBoolean(? Get(rx, "global")) is true)

+Get "flags"
+# Built-in "flags" getter:
 Get "global"
+Get "hasIndices"
+Get "global"
+Get "ignoreCase"
+Get "multiline"
+Get "dotAll"
 Get "unicode"
+Get "sticky"

The responsible discrepancy seems to date back to ES2015, in which RegExp.prototype [ @@split ] reads "flags" but @@match and @@replace have conditional single-flag reads. The previous edition (ES5.1) didn't specify a flags property at all, and I suspect the difference was just an oversight—split is sensitive only to new "unicode" and "sticky" flags, while match and replace gained sensitivity to the new "unicode" flag in translations of preexisting sensitivity to the "global" flag like «Let global be the result of calling the [[Get]] internal method of rx with argument "global"» and «If searchValue.global is {false,true}, then...». All three exist in support of String method signatures dating back to ES3, and it would surprise me if any code depends upon exactly two of them not reading a property that was introduced later. I think we should at least try to find out.

@ljharb
Copy link
Member

ljharb commented Jun 3, 2022

I would expect that the very inconsistency you're trying to fix here is why this specific non-happening would be tested for, but I agree it's going to be rare in existing code.

@michaelficarra michaelficarra added the needs consensus This needs committee consensus before it can be eligible to be merged. label Jun 6, 2022
@ljharb
Copy link
Member

ljharb commented Jul 20, 2022

On a re-review of this PR and the comment thread, I'm completely convinced that this is a good change that reduces observable lookups.

It will likely have two negative effects for my packages:

  1. some of my tests will break, but these are easily fixed
  2. a faithful polyfill for these two methods will need to apply the shim for this purpose in browsers that ordinarily will not need it. However, none of the existing published es-shims code checks for this, so it will be unlikely to have a major impact before browsers are able to update.

imo, neither of these negative effects should obstruct consensus on this PR.

@gibson042
Copy link
Contributor Author

test262 PR: tc39/test262#3618

@michaelficarra michaelficarra added has consensus This has committee consensus. has test262 tests and removed needs consensus This needs committee consensus before it can be eligible to be merged. labels Jul 27, 2022
@michaelficarra michaelficarra requested a review from a team July 27, 2022 21:40
@michaelficarra michaelficarra added the ready to merge Editors believe this PR needs no further reviews, and is ready to land. label Aug 3, 2022
@ljharb ljharb force-pushed the 2022-06-check-regexp-by-flags branch from 7f45c77 to 35b7eb2 Compare August 3, 2022 22:20
@ljharb ljharb merged commit 35b7eb2 into tc39:main Aug 4, 2022
ljharb pushed a commit to gibson042/test262 that referenced this pull request Aug 4, 2022
DerekNonGeneric pushed a commit to DerekNonGeneric/tc39-ecma262 that referenced this pull request Aug 6, 2022
ptomato pushed a commit to gibson042/test262 that referenced this pull request Aug 9, 2022
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this pull request Sep 9, 2022
shvaikalesh pushed a commit to shvaikalesh/WebKit that referenced this pull request Dec 7, 2022
…ter only

https://bugs.webkit.org/show_bug.cgi?id=248605

Reviewed by NOBODY (OOPS!).

This change implements recent spec change [1], aligning RegExp.prototype's
@@match / @@replace with @@split / @@matchall to use only "flags" getter to
check for flags, which is observable.

Ensures that DFG watches and constant-folds all related invoked prototype getters,
which was proven to be neutral on both JetStream2 and Speedometer2.

[1]: tc39/ecma262#2791

* JSTests/test262/expectations.yaml: Mark 12 test cases as passing.
* Source/JavaScriptCore/builtins/BuiltinNames.h:
* Source/JavaScriptCore/builtins/RegExpPrototype.js:
(linkTimeConstant.hasObservableSideEffectsForRegExpMatch):
(linkTimeConstant.matchSlow):
(overriddenName.string_appeared_here.replace):
(linkTimeConstant.hasObservableSideEffectsForRegExpSplit):
* Source/JavaScriptCore/builtins/StringPrototype.js:
(linkTimeConstant.hasObservableSideEffectsForStringReplace):
* Source/JavaScriptCore/bytecode/LinkTimeConstant.h:
* Source/JavaScriptCore/dfg/DFGAbstractInterpreterInlines.h:
(JSC::DFG::AbstractInterpreter<AbstractStateType>::executeEffects):
* Source/JavaScriptCore/dfg/DFGFixupPhase.cpp:
(JSC::DFG::FixupPhase::addStringReplacePrimordialChecks):
* Source/JavaScriptCore/runtime/JSGlobalObject.cpp:
(JSC::JSGlobalObject::init):
* Source/JavaScriptCore/runtime/JSGlobalObject.h:
* Source/JavaScriptCore/runtime/JSGlobalObjectInlines.h:
(JSC::JSGlobalObject::regExpProtoFlagsGetter const):
(JSC::JSGlobalObject::regExpProtoDotAllGetter const):
(JSC::JSGlobalObject::regExpProtoHasIndicesGetter const):
(JSC::JSGlobalObject::regExpProtoIgnoreCaseGetter const):
(JSC::JSGlobalObject::regExpProtoMultilineGetter const):
(JSC::JSGlobalObject::regExpProtoStickyGetter const):
mathiasbynens added a commit to mathiasbynens/ecma262 that referenced this pull request Apr 14, 2023
mathiasbynens added a commit to mathiasbynens/ecma262 that referenced this pull request Apr 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has consensus This has committee consensus. has test262 tests normative change Affects behavior required to correctly evaluate some ECMAScript source text ready to merge Editors believe this PR needs no further reviews, and is ready to land.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants