[Normative] Add RegExp named capture groups #1027

Merged
merged 1 commit into from
Jan 23, 2018

Member

@mathiasbynens mathiasbynens commented Nov 11, 2017

@littledan asked me to prepare the PR for his proposal adding named capture groups to ECMAScript regular expressions.

Proposal repo: https://github.com/tc39/proposal-regexp-named-groups

@littledan, I’ve added you as the commit author.
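For readers unfamiliar with the feature, here is a minimal sketch of what named capture groups look like from script. The pattern and the input string are made up for illustration; only the `(?<name>...)` syntax and the `groups` property come from the proposal.

```javascript
// A date pattern whose groups are named instead of only numbered.
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const match = datePattern.exec('merged on 2018-01-23');

// Named captures are exposed on the `groups` property of the match result...
const { year, month, day } = match.groups;

// ...and the same captures remain available by number.
const sameYear = match[1];

console.log(year, month, day, sameYear);
```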

1. Let the enclosed substring be _groupName_.
1. Let _capture_ be ? Get(_namedCaptures_, _groupName_).
1. If _capture_ is *undefined*, replace the text through `>` with the empty string.
1. Otherwise, replace the text through this following `>` with ? ToString(_capture_).
Member

this should probably be worded a bit differently to ensure that ToString is only called once on capture - otherwise this could be interpreted as "call ToString for each replacement", which would be observable.

Member Author

How about:

Otherwise, let _replacement_ be ? ToString(_capture_) and replace the text through this following `>` with _replacement_.

Member

That seems completely unambiguous, LGTM

Member Author

Done

Member

@littledan littledan Nov 16, 2017

I don't really understand how this change relates to what @ljharb is getting at. Get and ToString are called multiple times, before and after this editorial change.

Contributor

But it does penalize the common case; if done eagerly, we'd need to iterate all keys on the groups object, while the replacement string may only reference a subset of them.

And since I suppose we wouldn't want to modify the given groups object, we'd have to allocate a new object somewhere to store the ToString normalized group values, even in the lazy case.

I didn't get the feeling that anyone was saying "we missed this stage 2 concern".

Member

The comment about it “being a little late” seemed like that.

At any rate, allocating a new unobservable object somewhere to store a string if Type(replacement) is not String, when in the common case it would be, seems like it would be both a cheap and a rare operation. However, my concern is not for performance, which can always be improved upon later, but rather for observability, which cannot be reduced later.

Member

I don't think there was a lack of review. These semantics were thought out, put out for review 8 months ago, and have now had a thorough review by two implementers. I was aware when writing the spec text that you'd get multiple observable reads. As @schuay raises, it's not clear what the alternative would be, and the downside of the current approach does not seem so high.

RegExps are extremely observable in their behavior in a way which I don't see much of a use case for. I raised this as a concern in late 2015, proposing a more minimal RegExp subclassing mechanism, which would've been much easier to implement and with fewer performance cliffs. The proposal was rejected by the committee. We have a specification with very high observability due to that decision; it will impact many new RegExp features going forward no matter whether we decide to do one or two Get/ToString calls. If we did caching, it would also be observable.

Member

Fair point :-)

Member Author

Misunderstanding noted, and reverted to the original spec text.
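The observability concern in the exchange above can be made concrete with a small sketch. Everything here is hypothetical: `NoisyRegExp` and `noisyCapture` are illustrative names, and the subclass overrides `exec` only so that the `groups` object holds a value that is not already a String. Per the spec steps quoted at the top of this thread, each `$<word>` reference performs its own Get and ToString, so the counter is expected to go up once per reference; the exact count is not asserted here.

```javascript
let toStringCalls = 0;

// Hypothetical capture value whose string conversion has a visible side effect.
const noisyCapture = {
  toString() {
    toStringCalls += 1;
    return 'X';
  },
};

// Hypothetical subclass: it overrides exec so that the `groups` object
// of every successful match holds the non-string value above.
class NoisyRegExp extends RegExp {
  exec(input) {
    const result = super.exec(input);
    if (result !== null) {
      result.groups = { word: noisyCapture };
    }
    return result;
  }
}

// The replacement string references the same named group twice.
const output = 'abc'.replace(new NoisyRegExp('(?<word>b)'), '[$<word>$<word>]');

console.log(output, toStringCalls);
```

Caching the converted value, as discussed above, would change how often `toString` runs, which is itself something script could detect.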

@mathiasbynens mathiasbynens added the "pending stage 4" label (This proposal has not yet achieved stage 4, but may otherwise be ready to merge.) Nov 11, 2017
Member

@littledan littledan left a comment

LGTM, though I'd appreciate it if other reviewers also took a look.

@anba
Contributor

anba commented Dec 8, 2017

I think it'd be nice to resolve tc39/proposal-regexp-named-groups#34 before merging this PR, because neither implementation (JSC or V8) implements this part correctly according to the current spec proposal (both implementations print --$<foo>-- for the test case in that issue, but the current spec proposal requires returning --bar--).

@mathiasbynens
Member Author

tc39/proposal-regexp-named-groups#34 has been resolved in tc39/proposal-regexp-named-groups#40. I’ve updated this patch accordingly.

@mathiasbynens
Member Author

Test262 tests for the recent groups change have landed: tc39/test262#1376

Contributor

@schuay schuay left a comment

LGTM, thanks :)

Member

@ljharb ljharb left a comment

LGTM, but I'm not familiar enough with grammar stuff to review that part.

spec.html Outdated
1. If _s_ is *undefined*, return _c_(_x_).
1. Let _e_ be _x_'s _endIndex_.
1. Let _len_ be the number of elements in _s_.
1. Let _f_ be _e_+_len_.
Member

spaces around the + for clarity?

spec.html Outdated
1. Let _e_ be _x_'s _endIndex_.
1. Let _len_ be the number of elements in _s_.
1. Let _f_ be _e_+_len_.
1. If _f_&gt;_InputLength_, return ~failure~.
Member

also spaces around the >?

spec.html Outdated
1. Let _len_ be the number of elements in _s_.
1. Let _f_ be _e_+_len_.
1. If _f_&gt;_InputLength_, return ~failure~.
1. If there exists an integer _i_ between 0 (inclusive) and _len_ (exclusive) such that Canonicalize(_s_[_i_]) is not the same character value as Canonicalize(_Input_[_e_+_i_]), return ~failure~.
Member

and the e + i?

Member Author

Done, but keep in mind that this is now inconsistent with other similar parts of the RegExp section. There is an open issue to fix this throughout the spec: #925
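The backreference steps quoted in the snippets above can be sketched in plain JavaScript. This is a simplified illustration, not the spec algorithm itself: `matchBackreference` is a hypothetical helper, it returns the new end index (or `null` for ~failure~) instead of calling a continuation, and its `canonicalize` stand-in only upper-cases under an `ignoreCase` flag rather than implementing the spec's Canonicalize.

```javascript
// Hypothetical helper mirroring the quoted steps:
//   s   -> captured (string or undefined)
//   e   -> endIndex
//   len -> captured.length
//   f   -> endIndex + len
function matchBackreference(input, endIndex, captured, ignoreCase = false) {
  // An unset capture matches the empty string: continue from the same position.
  if (captured === undefined) return endIndex;

  // Simplified stand-in for Canonicalize (an assumption, not the spec operation).
  const canonicalize = (ch) => (ignoreCase ? ch.toUpperCase() : ch);

  const len = captured.length;
  const f = endIndex + len;
  if (f > input.length) return null; // failure: not enough input left

  for (let i = 0; i < len; i++) {
    if (canonicalize(captured[i]) !== canonicalize(input[endIndex + i])) {
      return null; // failure: characters differ
    }
  }
  return f; // success: matching continues at index f
}

// The same idea through the real syntax: \k<quote> must repeat what (?<quote>...) captured.
const quoted = /(?<quote>['"])(.*?)\k<quote>/;
console.log(quoted.test('"hi"'), quoted.test('"hi\''));
```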

spec.html Outdated
@@ -28558,6 +28558,8 @@ <h1>Runtime Semantics: GetSubstitution( _matched_, _str_, _position_, _captures_
1. Assert: Type(_replacement_) is String.
1. Let _tailPos_ be _position_ + _matchLength_.
1. Let _m_ be the number of elements in _captures_.
1. If _namedCaptures_ is not *undefined*,
Member

nit: block form of If ends in a ", then":

spec.html Outdated
<emu-alg>
1. If _namedCaptures_ is *undefined*, the replacement text is the String `"$&lt;"`.
1. Otherwise,
1. Scan until the next `>`.
Member

Probably could use the new single unicode codepoint convention here (U+XXXX CHAR NAME)
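As a quick illustration of the `$<` substitution steps quoted in the snippets above, the following sketch shows the three cases from script. The patterns and inputs are made up for the example.

```javascript
// Named groups present: `$<name>` is replaced with the corresponding capture.
const iso = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const reordered = '2018-01-23'.replace(iso, '$<day>/$<month>/$<year>');

// No named groups in the pattern: namedCaptures is undefined,
// so `$<` stays literal text and the rest of the string is copied as-is.
const literal = 'abc'.replace(/b/, '$<x>');

// A named group that did not participate in the match has an undefined
// capture, so the reference is replaced with the empty string.
const unmatched = 'abc'.replace(/(?<x>z)?b/, '[$<x>]');

console.log(reordered); // 23/01/2018
console.log(literal);   // a$<x>c
console.log(unmatched); // a[]c
```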

@bterlson
Member

I think this is ready to merge. I only found a few nits. The grammar all looks good, editorial conventions are good, merges and builds cleanly. Nice work!!

I had a couple minor nits but I'll pull by EOD and just fix them in a subsequent commit unless you want to get to it before me :)

@ljharb
Member

ljharb commented Jan 17, 2018

@mathiasbynens
Member Author

mathiasbynens commented Jan 22, 2018

@bterlson Thanks for the review! Nits addressed.

@ljharb I think it’s officially stage 4 once the PR gets merged.

@littledan
Member

emoji kazoo react

@ljharb ljharb removed the "pending stage 4" label (This proposal has not yet achieved stage 4, but may otherwise be ready to merge.) Jan 23, 2018
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Aug 27, 2019
Specifically, add [N] parameter to
    CharacterClass
    ClassRanges
    NonemptyClassRanges
    NonemptyClassRangesNoDash
    ClassAtom

These were implied when commit 95ec0c6 (of PR tc39#1027)...

- added [?N] to RHS occurrences of CharacterClass
  without explicitly adding [N] to the LHS occurrence CharacterClass; and

- added [N] to the LHS occurrence of ClassAtomNoDash (in Annex B)
  without adding [?N] to any RHS occurrence.

This commit propagates [N] across that gap.

(See issue tc39#1081.)
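For context on why the [N] grammar parameter matters, here is a rough sketch of its script-visible effect, assuming [N] is the parameter that switches on named-group parsing. The patterns are illustrative only and are not taken from the commit.

```javascript
// Without any group name in the pattern (and without the u flag), Annex B
// keeps `\k` as an identity escape, so this matches the literal text "k<a>".
const legacy = /\k<a>/.test('k<a>');

// Once the pattern contains a group name, `\k<a>` is parsed as a named
// backreference and no longer matches that literal text.
const backref = /(?<a>x)\k<a>/.test('xx');
const notLiteral = /(?<a>x)\k<a>/.test('xk<a>');

// With the u flag, `\k` is always backreference syntax, so a reference
// to a group that does not exist is a SyntaxError.
let threw = false;
try {
  new RegExp('\\k<a>', 'u');
} catch (error) {
  threw = error instanceof SyntaxError;
}

console.log(legacy, backref, notLiteral, threw);
```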
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Sep 17, 2019
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Dec 13, 2019
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jan 7, 2020
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jan 25, 2020
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Feb 5, 2020
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Feb 5, 2020
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jun 15, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jun 23, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 11, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 18, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Jul 24, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Aug 17, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Sep 14, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Sep 24, 2021
jmdyck added a commit to jmdyck/ecma262 that referenced this pull request Sep 29, 2021