Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normative: add RegExp lookbehind to annex-B #1102

Merged
merged 1 commit into from
May 14, 2018

Conversation

mysticatea
Copy link
Contributor

@mysticatea mysticatea commented Feb 16, 2018

This PR adds the syntax of RegExp Lookbehind Assertions into B.1.4 Regular Expressions Patterns section.

Since #1029 didn't look to update the Annex-B section but I think it's needed (#1029 (comment)).

I apologize if it's intentional.

@mathiasbynens
Copy link
Member

@mathiasbynens
Copy link
Member

This indeed looks like an oversight to me.

spec.html Outdated
@@ -40918,7 +40922,7 @@ <h1>Pattern Semantics</h1>
1. Return _m_.
</emu-alg>

<p>Assertion (<emu-xref href="#sec-assertion"></emu-xref>) evaluation rules for the <emu-grammar>Assertion :: `(` `?` `=` Disjunction `)`</emu-grammar> and <emu-grammar>Assertion :: `(` `?` `!` Disjunction `)`</emu-grammar> productions are also used for the |QuantifiableAssertion| productions, but with |QuantifiableAssertion| substituted for |Assertion|.</p>
<p>Assertion (<emu-xref href="#sec-assertion"></emu-xref>) evaluation rules for the <emu-grammar>Assertion :: `(` `?` `=` Disjunction `)`</emu-grammar>, <emu-grammar>Assertion :: `(` `?` `!` Disjunction `)`</emu-grammar>, <emu-grammar>Assertion :: `(` `?` `&lt;=` Disjunction `)`</emu-grammar>, and <emu-grammar>Assertion :: `(` `?` `&lt;!` Disjunction `)`</emu-grammar> productions are also used for the |QuantifiableAssertion| productions, but with |QuantifiableAssertion| substituted for |Assertion|.</p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually want to enable lookbehind in QuantifiableAssertion?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make sure to have a test262 test one way or the other.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: I confirmed it on Chrome 65 since Annex B is the section for browser compatibility. Lookbehind assertions are quantifiable on the Chrome.

Copy link
Member

@littledan littledan Feb 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, though I think it's still within our ability to make this choice one way or another: I bet there would be no web compatibility issues with banning this particular construct.

cc @hashseed @schuay

@claudepache
Copy link
Contributor

We have to make a conscious decision between:

(A) be consistent with other assertions of the form (?...) wrt QuantifiableAssertion; or,
(B) limit QuantifiableAssertion to what is strictly needed for legacy reason.

This PR apparently chose (A). I would prefer (B).

(Also, because a choice is needed, this PR is not ”editorial”.)

@mysticatea mysticatea changed the title Editorial: add RegExp lookbehind to annex-B Normative: add RegExp lookbehind to annex-B Feb 16, 2018
@hashseed
Copy link

hashseed commented Feb 16, 2018

So if I understand correctly, the question is whether to allow something like /(?<=a){2,3}b/.

I honestly don't see a reason why this should be allowed since these assertions do not consume any characters, and therefore seem rather pointless.

Btw. if we decide for (B), I will need to fix V8.

spec.html Outdated
[~U] QuantifiableAssertion[?N]

QuantifiableAssertion[N] ::
`(` `?` `=` Disjunction[~U, ?N] `)`
`(` `?` `!` Disjunction[~U, ?N] `)`
`(` `?` `&lt;=` Disjunction[?U, ?N] `)`
`(` `?` `&lt;!` Disjunction[?U, ?N] `)`
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an error to use '?U' here, because QuantifiableAssertion doesn't have a 'U' parameter.

@littledan
Copy link
Member

OK, seems like we're pretty unanimous in this thread so far that QuantifiableAssertion should not support positive or negative lookbehinds. @mysticatea would you be up for writing a new patch with that change, and then we can discuss it at the March 2018 TC39 meeting to confirm broader consensus?

@ljharb ljharb added needs consensus This needs committee consensus before it can be eligible to be merged. spec bug web reality editorial change labels Feb 16, 2018
@mysticatea mysticatea changed the title Normative: add RegExp lookbehind to annex-B Editorial: add RegExp lookbehind to annex-B Feb 17, 2018
@mysticatea
Copy link
Contributor Author

mysticatea commented Feb 17, 2018

I updated this PR.
I have kept Editorial: prefix since I see editorial change label.

@littledan littledan added normative change Affects behavior required to correctly evaluate some ECMAScript source text and removed editorial change labels Feb 17, 2018
mysticatea added a commit to mysticatea/acorn that referenced this pull request Feb 17, 2018
@mysticatea mysticatea changed the title Editorial: add RegExp lookbehind to annex-B Normative: add RegExp lookbehind to annex-B Feb 17, 2018
@hashseed
Copy link

FWIW we have this comment in V8 implemented back in January 2016:

// With /u, lookarounds are not quantifiable.

I guess the change would be to

  • disallow lookbehinds for quantifiable assertions
  • disallow lookaheads for quantifiable assertions in unicode mode
  • allow lookahead for quantifiable assertions in non-unicode mode

@littledan
Copy link
Member

littledan commented Feb 19, 2018

@hashseed I think that's what the current patch does. The grammar here (EDIT: Namely, in Annex B section 1.4) only is relevant on non-Unicode RegExps. From the introduction to the section:

However, none of these extensions change the syntax of Unicode patterns recognized when parsing with the [U] parameter present on the goal symbol.

@bterlson and I were just puzzling over what those [?U] parameters were doing all over the Annex B grammar. Why should we ever be passing around a true parameter? Seems like some cases went for [~U] and others for [?U]. Anyway, a topic for another possible cleanup issue.

@jmdyck
Copy link
Collaborator

jmdyck commented Feb 19, 2018

I think that's what the current patch does.

I think all three of @hashseed's bullets are accomplished in the current spec. This patch does something else.

@hashseed
Copy link

@littledan
Copy link
Member

@jmdyck How does the current spec support lookbehind at all when taking the Annex B grammar into account?

@jmdyck
Copy link
Collaborator

jmdyck commented Feb 20, 2018

The current spec doesn't, this patch does. (Allowing lookbehinds wasn't one of @hashseed's bullets.)

@littledan littledan added the needs test262 tests The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262 label Feb 20, 2018
@littledan
Copy link
Member

@jmdyck I think we actually do want to enable lookbehinds, just not as a quantifiable assertion. Given that goal, do you think this patch seems good?

@jmdyck
Copy link
Collaborator

jmdyck commented Feb 21, 2018

Yes, the patch is necessary. 21.2.1 has lookbehinds, so annex B must have them too.

littledan added a commit to littledan/test262 that referenced this pull request Feb 26, 2018
- Tests that lookahead and lookbehind are not extended to
  QuantifiableAssertion, as in tc39/ecma262#1102
- Additional tests verifying some more combinations of cases for
  QuantifiableAssertion being invalid in Unicode mode.

Based on the tests in https://chromium-review.googlesource.com/c/v8/v8/+/926102

These tests pass on V8 (if the throw for early errors is removed to
work around a V8 issue where RegExps don't have early errors).
@littledan littledan added has test262 tests and removed needs test262 tests The proposal should specify how to test an implementation. Ideally via github.com/tc39/test262 labels Feb 26, 2018
@littledan
Copy link
Member

See tests at tc39/test262#1456

@littledan
Copy link
Member

littledan commented Feb 26, 2018

At this point, we have

  • Spec text, written by someone who I verified signed the IP form (thanks @mysticatea)
  • Test262 tests (thanks @leobalter for the review)
  • A full implementation in V8 (thanks @hashseed)

Do we have consensus on this normative change? I've added it to the March 2018 meeting agenda so we can confirm consensus there if necessary.

@ljharb ljharb added has consensus This has committee consensus. and removed needs consensus This needs committee consensus before it can be eligible to be merged. labels Mar 20, 2018
@bterlson bterlson merged commit e5b6605 into tc39:master May 14, 2018
@bterlson
Copy link
Member

Thanks for this PR, and sorry for the delay in getting it pulled in!

@mysticatea mysticatea deleted the annex-b-regexp branch June 16, 2018 14:45
claudepache added a commit to claudepache/ecma262 that referenced this pull request Aug 21, 2019
…rn evaluate semantics in annex b

Bugfix: When speccing lookbehind assertions (PR tc39#1102), a `direction` parameter were added, in particular to the `evaluate` semantics of Atom productions and to the  CharacterSetMatcher abstract operation. The ExtendedAtom productions found in Annex B were forgotten to be amended in the same way.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
has consensus This has committee consensus. has test262 tests normative change Affects behavior required to correctly evaluate some ECMAScript source text spec bug web reality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants