Join GitHub today
GitHub is home to over 28 million developers working together to host and review code, manage projects, and build software together.
Sign upAnnex B: Specify non-standard RegExp static properties #137
Comments
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
mathiasbynens
Oct 28, 2015
Member
After that, the only remaining non-standard-but-widely-supported JS feature involving regular expressions is their support for octal escapes, even in strict mode code:
(function() {
'use strict';
console.log(/\123/.test('S'));
}());https://javascript.spec.whatwg.org/#octal-escapes-in-regular-expression-literals
|
After that, the only remaining non-standard-but-widely-supported JS feature involving regular expressions is their support for octal escapes, even in strict mode code: (function() {
'use strict';
console.log(/\123/.test('S'));
}());https://javascript.spec.whatwg.org/#octal-escapes-in-regular-expression-literals |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Oct 28, 2015
Contributor
After that, the only remaining non-standard-but-widely-supported JS feature is octal escapes in regular expressions:
I think those are already specified in B.1.4 (search for "LegacyOctalEscapeSequence")
I think those are already specified in B.1.4 (search for "LegacyOctalEscapeSequence") |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
WebReflection
Oct 28, 2015
AFAIK re.test(str) also changes them, as well as str.split(re) or even a more classic replace so while I agree about having them as configurable accessors, I think RegExp.reset() to clean up all leaks at once would make way more sense, otherwise devs might feel safe to delete RegExp.$1 forgetting to delete RegExp['$&'] too or others.
WebReflection
commented
Oct 28, 2015
|
AFAIK |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Oct 28, 2015
Contributor
AFAIK re.test(str) also changes them, as well as str.split(re) or even a more classic replace
In ES2015, all RegExp methods and all String methods that work with regexp have been rewritten to call internally Regexp#exec, so that interesting stuff is happening in only one place.
I think RegExp.reset() to clean up all leaks at once would make way more sense, otherwise devs might feel safe to delete RegExp.$1 forgetting to delete RegExp['$&'] too or others.
The complete list of offending properties may be disclosed with Object.getOwnProperties(RegExp). Sure, a RemoveAllWarts() method might be considered, but new API should not be proposed here (that should be a separate discussion starting on es-discuss according to CONTRIBUTING.md#Feature Requests).
In ES2015, all RegExp methods and all String methods that work with regexp have been rewritten to call internally Regexp#exec, so that interesting stuff is happening in only one place.
The complete list of offending properties may be disclosed with |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
ljharb
Oct 29, 2015
Member
Please also note there's interesting variations in all these when you have more than 10 matches. For example, in IE, lastParen breaks with more than 11 capturing groups.
You're also omitting the multiline / $* pair, which is in every browser except (I think) IE 9-11.
|
Please also note there's interesting variations in all these when you have more than 10 matches. For example, in IE, You're also omitting the |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Oct 29, 2015
Contributor
You're also omitting the multiline / $* pair, which is in every browser except (I think) IE 9-11.
Yes, I've omitted them because I've noticed that they're absent from IE11 and I couldn't check Edge.
Could you confirm that they're present in Edge?
(Also, just learned what it means: https://twitter.com/bterlson/status/563540725770375168, ugh; I'd prefer to kill it.)
Yes, I've omitted them because I've noticed that they're absent from IE11 and I couldn't check Edge. Could you confirm that they're present in Edge? (Also, just learned what it means: https://twitter.com/bterlson/status/563540725770375168, ugh; I'd prefer to kill it.) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Oct 29, 2015
Contributor
(Also, just learned what it means: https://twitter.com/bterlson/status/563540725770375168, ugh; I'd prefer to kill it.)
Well, just tested it:
(function() {
RegExp.multiline = true;
if (!RegExp.multiline)
return 'RegExp.multiline cannot be set to true...';
if (/foo/.multiline)
return "RegExp.multiline forces multiline flag on new RegExp :-(";
else
return "RegExp.multiline has no effect :-)";
})()Safari and Chrome seem to ignore that misfeature, let's kill it.
Well, just tested it: (function() {
RegExp.multiline = true;
if (!RegExp.multiline)
return 'RegExp.multiline cannot be set to true...';
if (/foo/.multiline)
return "RegExp.multiline forces multiline flag on new RegExp :-(";
else
return "RegExp.multiline has no effect :-)";
})()Safari and Chrome seem to ignore that misfeature, let's kill it. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
anba
Oct 29, 2015
Contributor
V8 bug report to remove multiline: https://code.google.com/p/v8/issues/detail?id=3870
|
V8 bug report to remove |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Oct 29, 2015
Contributor
And just opened Mozilla bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1219757
|
And just opened Mozilla bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1219757 |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
caitp
Nov 9, 2015
Contributor
https://code.google.com/p/v8/issues/detail?id=3870#c11 << these probably do need to be annex-B-ified, since apparently Edge wasn't able to get rid of them :(
|
https://code.google.com/p/v8/issues/detail?id=3870#c11 << these probably do need to be annex-B-ified, since apparently Edge wasn't able to get rid of them :( |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bterlson
Nov 9, 2015
Member
Indeed. @goyakin is working on a complete list which includes most (maybe all?) of those things.
|
Indeed. @goyakin is working on a complete list which includes most (maybe all?) of those things. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 9, 2015
Contributor
https://code.google.com/p/v8/issues/detail?id=3870#c11 << these probably do need to be annex-B-ified, since apparently Edge wasn't able to get rid of them :(
Yes, that corresponds to the list of properties I've given in the first comment (obtained as the intersection of the results of Object.getOwnPropertyNames(RegExp) among various browsers).
Yes, that corresponds to the list of properties I've given in the first comment (obtained as the intersection of the results of |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
allenwb
Nov 9, 2015
Member
Note sure whether this was already covered somewhere above, but I would be great if the Annex B specification of the RefExp static properties was specified as only happening for direct instances of RegExp.
In other words, RegExp subclass should not get these static own properties updated and then this abomination goes away if you use a RegExp subclass. Also, the the static accessor properties for them on RegExp should have an internal check to make sure that the this value is %RegExp%.
Then devs can do:
class RegExpNoStatics extends RegExp {}; //really, that's all there is to it
match = RegExpNoStatics(/somePattern/).exec(aString); //no global state will be modified|
Note sure whether this was already covered somewhere above, but I would be great if the Annex B specification of the RefExp static properties was specified as only happening for direct instances of RegExp. In other words, RegExp subclass should not get these static own properties updated and then this abomination goes away if you use a RegExp subclass. Also, the the static accessor properties for them on RegExp should have an internal check to make sure that the Then devs can do: class RegExpNoStatics extends RegExp {}; //really, that's all there is to it
match = RegExpNoStatics(/somePattern/).exec(aString); //no global state will be modified |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 10, 2015
Contributor
I’ve written a draft here: https://github.com/claudepache/es-regexp-legacy-static-properties
It implements the suggestion of @allenwb in previous comment. The alternative design is to update the values returned by the %RegExp% static properties also for proper subclasses of RegExp. It is, I think, the only design decision that is really debatable.
|
I’ve written a draft here: https://github.com/claudepache/es-regexp-legacy-static-properties It implements the suggestion of @allenwb in previous comment. The alternative design is to update the values returned by the %RegExp% static properties also for proper subclasses of RegExp. It is, I think, the only design decision that is really debatable. |
added a commit
to tc39/proposal-regexp-legacy-features
that referenced
this issue
Nov 10, 2015
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
anba
Nov 10, 2015
Contributor
Another open design decision (related to the [[LegacyConstructor]] internal slot from @claudepache's draft): V8 seems to store the non-standard properties in current Realm whereas SpiderMonkey uses the realm of the RegExp object instance. What do other browsers do?
|
Another open design decision (related to the [[LegacyConstructor]] internal slot from @claudepache's draft): V8 seems to store the non-standard properties in current Realm whereas SpiderMonkey uses the realm of the RegExp object instance. What do other browsers do? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bterlson
Nov 10, 2015
Member
@anba does this codepen show what you're talking about? http://codepen.io/anon/pen/RWqpza I think not because everyone that I see prints "Origin realm".
|
@anba does this codepen show what you're talking about? http://codepen.io/anon/pen/RWqpza I think not because everyone that I see prints "Origin realm". |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
anba
Nov 10, 2015
Contributor
@bterlson If you replace re.exec('ttttt'); with RegExp.prototype.exec.call(re, 'ttttt');, Chrome displays "Current realm" whereas Firefox displays "Origin realm".
|
@bterlson If you replace |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
bterlson
Nov 10, 2015
Member
@anba awesome, thanks, updated the pen.
And, Edge matches Chrome (ie. properties are updated on the %RegExp% from the exec method's realm not the instance's realm). I guess I'd argue for these semantics to be codified on a "majority rules" basis :)
|
@anba awesome, thanks, updated the pen. And, Edge matches Chrome (ie. properties are updated on the %RegExp% from the exec method's realm not the instance's realm). I guess I'd argue for these semantics to be codified on a "majority rules" basis :) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
allenwb
Nov 10, 2015
Member
Note that Claude's proposal cases the legacy static property to all answer the empty string when invoked upon a RegExp subclass constructor.
I think it would be better if they answered undefined in those cases (I could also argue for throwing a TypeError, but I prefer undefined so that, from a subclass class perspective, it looks more like those methods simply aren't there.)
This can be specified by making the first line of the algorithm of the getter function of each of those methods:
- If SameValue(this value, %RegExp%) is false, return undefined.
|
Note that Claude's proposal cases the legacy static property to all answer the empty string when invoked upon a RegExp subclass constructor. I think it would be better if they answered undefined in those cases (I could also argue for throwing a TypeError, but I prefer undefined so that, from a subclass class perspective, it looks more like those methods simply aren't there.) This can be specified by making the first line of the algorithm of the getter function of each of those methods:
|
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 10, 2015
Contributor
Note that Claude's proposal cases the legacy static property to all answer the empty string when invoked upon a RegExp subclass constructor.
Worse: They return answers bound to the last match made with a direct instance of RegExp.
I think it would be better if they answered undefined in those cases (I could also argue for throwing a TypeError, but I prefer undefined so that, from a subclass class perspective, it looks more like those methods simply aren't there.)
I just didn’t try to be too smart. I'm not sure what degree of smartness to include in those inherently unsound methods. I’ve updated my proposal according to your suggestion, because it is more consistent with the "RegExp subclasses don’t use those properties" semantics.
Worse: They return answers bound to the last match made with a direct instance of RegExp.
I just didn’t try to be too smart. I'm not sure what degree of smartness to include in those inherently unsound methods. I’ve updated my proposal according to your suggestion, because it is more consistent with the "RegExp subclasses don’t use those properties" semantics. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 10, 2015
Contributor
V8 seems to store the non-standard properties in current Realm whereas SpiderMonkey uses the realm of the RegExp object instance.
I’ve spontaneously specced what appears to be SpiderMonkey’s semantics, because it is what made sense to me. I would change for the other way if there is consensus for that.
I’ve spontaneously specced what appears to be SpiderMonkey’s semantics, because it is what made sense to me. I would change for the other way if there is consensus for that. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 10, 2015
Contributor
Why would it make more sense to save it in the object's realm?
Because all the real work is made by the [[RegExpMatcher]] internal slot of that object, and because the lastIndex property of that object is updated. It was a philosophical reason, not a practical one.
crossOriginRE.exec(...);
RegExp.lastParen === ???
The "current realm" is the realm of crossOriginRE.exec (rather than the realm of its caller), which coincides with the realm of crossOriginRE, so that the code is broken anyway. It would make a difference if you try RegExp.prototype.exec.call(crossOriginRE, ...) instead.
Because all the real work is made by the [[RegExpMatcher]] internal slot of that object, and because the
The "current realm" is the realm of |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 11, 2015
Contributor
Re: behaviour on RegExp subclass constructors.
@allenwb wrote:
I think it would be better if they answered undefined in those cases (I could also argue for throwing a TypeError, but I prefer undefined so that, from a subclass class perspective, it looks more like those methods simply aren't there.)
After reflection, I think that throwing a TypeError rather than returning undefined would be both more consistent with the rest of the language (arguments.caller in strict-mode functions, Regexp#sticky on non-regexpes, etc.), and more useful for programmers (strong signal that the feature is not available, preventing latent bugs).
Other opinions?
|
Re: behaviour on RegExp subclass constructors.
After reflection, I think that throwing a TypeError rather than returning Other opinions? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 12, 2015
Contributor
Re: RegExp.prototype.exec.call(otherFrame_regexp)
V8 seems to store the non-standard properties in current Realm whereas SpiderMonkey uses the realm of the RegExp object instance.
A third option would be to disable the update of non-standard properties when Realms don't match, the same way it is disabled for instances of proper subclasses of RegExp. What might make sense: if somehow I manage to hack regexpes and/or RegExp.prototype.exec in my frame so that it avoids to litter the RegExp constructor, I might not want my hack to be defeated when another frame plays with my regexp.
As a side note, I think that it might be worthwhile to also neuter RegExp.prototype.compile for proper subclasses of RegExp: if the subclass constructor do non-trivial things, RegExp.prototype.compile might not do what you want.
I've updated my draft in that sense. I will eventually write a motivation for disabling legacy unsound features for non-direct instances of RegExp.
|
Re: RegExp.prototype.exec.call(otherFrame_regexp)
A third option would be to disable the update of non-standard properties when Realms don't match, the same way it is disabled for instances of proper subclasses of RegExp. What might make sense: if somehow I manage to hack regexpes and/or RegExp.prototype.exec in my frame so that it avoids to litter the RegExp constructor, I might not want my hack to be defeated when another frame plays with my regexp. As a side note, I think that it might be worthwhile to also neuter I've updated my draft in that sense. I will eventually write a motivation for disabling legacy unsound features for non-direct instances of RegExp. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 12, 2015
Contributor
I will eventually write a motivation for disabling legacy unsound features for non-direct instances of RegExp.
Done.
Done. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
goyakin
Nov 13, 2015
Member
The proposal diverges from browser implementations in a couple of points and, I think, should be updated to match them.
The properties aren't enumerable in the proposal, whereas most of them actually are on Chrome, Firefox, and Edge. These enumerable properties are:
inputlastMatchlastParenleftContextrightContext$1, ...,$9
The alias properties (e.g., $_ and $&) match the browsers though.
Another thing is that the setter of input (and of its alias $_) is undefined in the proposal. However, input is modifiable on Chrome, Firefox, and Edge, and it returns the string version of the value it is assigned to, e.g.
RegExp.input = 123
// RegExp.input === '123'
There seems to be disagreement among browsers regarding the setters of other properties. Firefox matches the proposal and has them as undefined. Chrome and Edge, on the other hand, implement them as no-op operations. Given that an exception would be thrown when these properties are assigned to in strict mode, I think it would be good to have them as no-op operations in the spec.
|
The proposal diverges from browser implementations in a couple of points and, I think, should be updated to match them. The properties aren't enumerable in the proposal, whereas most of them actually are on Chrome, Firefox, and Edge. These enumerable properties are:
The alias properties (e.g., Another thing is that the setter of
There seems to be disagreement among browsers regarding the setters of other properties. Firefox matches the proposal and has them as |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 14, 2015
Contributor
The properties aren't enumerable in the proposal, whereas most of them actually are on Chrome, Firefox, and Edge.
corrected.
Another thing is that the setter of input (and of its alias $_) is undefined in the proposal. However, input is modifiable on Chrome, Firefox, and Edge, and it returns the string version of the value it is assigned to,
corrected.
There seems to be disagreement among browsers regarding the setters of other properties. Firefox matches the proposal and has them as undefined. Chrome and Edge, on the other hand, implement them as no-op operations. Given that an exception would be thrown when these properties are assigned to in strict mode, I think it would be good to have them as no-op operations in the spec.
I disagree: silent failure is a bad thing; throwing an error (as do Firefox and Safari in strict mode) is better for preventing latent bugs.
corrected.
corrected.
I disagree: silent failure is a bad thing; throwing an error (as do Firefox and Safari in strict mode) is better for preventing latent bugs. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
caitp
Nov 18, 2015
Contributor
https://codereview.chromium.org/1416093006 seems to have made these accessors non-enumerable, so at least in Chrome, for the time being, they match the original proposal (re: enumerability)
If it doesn't break anything, it might be considered safe to change it to make them non-enumerable (does yangguo have a github account? I'd ping him for comment if I knew)
|
https://codereview.chromium.org/1416093006 seems to have made these accessors non-enumerable, so at least in Chrome, for the time being, they match the original proposal (re: enumerability) If it doesn't break anything, it might be considered safe to change it to make them non-enumerable (does yangguo have a github account? I'd ping him for comment if I knew) |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Nov 18, 2015
Contributor
https://codereview.chromium.org/1416093006 seems to have made these accessors non-enumerable, so at least in Chrome, for the time being, they match the original proposal (re: enumerability)
If it doesn't break anything, it might be considered safe to change it to make them non-enumerable
Noted for future consideration when it'll reach stable release: tc39/proposal-regexp-legacy-features#1
Noted for future consideration when it'll reach stable release: tc39/proposal-regexp-legacy-features#1 |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
claudepache
Dec 22, 2015
Contributor
As my draft goes beyond the bare description of legacy RegExp properties, by defining a mechanism that deactivate them for e.g. subclasses, I suppose it should be pushed as proposal.
|
As my draft goes beyond the bare description of legacy RegExp properties, by defining a mechanism that deactivate them for e.g. subclasses, I suppose it should be pushed as proposal. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Feb 11, 2016
At whatwg/javascript#29 (comment) I write
The RegExp statics are mutable global state and a global communications channel.
https://github.com/google/caja/wiki/RegexpsLeakMatchGlobally
Their existence must not be normative, so that a system (like SES's repairES5) that removes them produces a state that is still considered a conformant ES implementation.
Regarding the RegExp statics, the important issue is
http://wiki.ecmascript.org/doku.php?id=conventions:make_non-standard_properties_configurable
which we should propose and make normative. I just verified that Chrome 48.0.2564.103 ships with these statics configurable, and actually deletable, demonstrating that it is web compatible to require this.
erights
commented
Feb 11, 2016
|
At whatwg/javascript#29 (comment) I write The RegExp statics are mutable global state and a global communications channel. Regarding the RegExp statics, the important issue is |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Any objection to standardizing the Chrome 48+ semantics? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
erights
Feb 12, 2016
At I write at whatwg/javascript#29 (comment)
I am certainly against specifying these as normative. As I say above:
Their existence must not be normative, so that a system (like SES's repairES5) that removes them
produces a state that is still considered a conformant ES implementation.
However, in light of Claude's point at tc39/proposal-regexp-legacy-features#3 and my reply at tc39/proposal-regexp-legacy-features#3 (comment) , if inter-realm leakage were also banned, I would see benefit to these being normative-optional rather than continuing as unspecified. i.e., where there is no normative requirement that they be present, but there is a normative requirement that, if they are present, then
- they must actually be deletable. If deleted, they do not magically reappear.
- they must not leak info cross realm.
With these, yes, I think I would support normative-optional.
erights
commented
Feb 12, 2016
|
At I write at whatwg/javascript#29 (comment) I am certainly against specifying these as normative. As I say above: Their existence must not be normative, so that a system (like SES's repairES5) that removes them
With these, yes, I think I would support normative-optional. |
bterlson
added
the
normative change
label
Feb 18, 2016
claudepache
referenced this issue
Feb 22, 2016
Closed
Should legacy RegExp features be disabled for subclasses and/or cross-realm? #2
domenic
added
the
web reality
label
Jul 28, 2016
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Since this is now a stage 3 proposal, should this issue be closed? |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
annevk
Mar 20, 2018
Contributor
Maybe just close it as part of the commit that merges that into master? That'd be cleaner.
|
Maybe just close it as part of the commit that merges that into master? That'd be cleaner. |
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Sounds good, although that's an unspecified time in the future :-) |
ljharb
added
the
proposal
label
Mar 21, 2018
This comment has been minimized.
Show comment
Hide comment
This comment has been minimized.
|
Closing the issue SGTM |
claudepache commentedOct 28, 2015
•
edited
Update: Link to proposal: https://github.com/tc39/proposal-regexp-legacy-features
All web browsers expose non-standard properties on the RegExp constructor, that are updated each time a match is done.
The list of properties that are exposed by all common browsers are the following. They typically expose other properties not mentioned here, but not shared by all implementations:
and (in pairs, where the first and the second one return the same value):
Some observations:
It would be useful to specify them. I think that the following should be required:
RegExp#execmethod. This is useful if we want, in some future, separate the part ofRegExp#execthat does the real work, and the part that does stateful updates on various objects.