Use v regexp flag instead of u #178

Closed
nicolo-ribaudo opened this issue Jul 13, 2023 · 1 comment · Fixed by #188
@nicolo-ribaudo

The v flag adds support for more regular expression features:

The HTML <input>'s pattern attribute has also been recently updated to use v instead of u (https://html.spec.whatwg.org/#compiled-pattern-regular-expression).

@bathos

bathos commented Jul 13, 2023

The “set operations” would be useful. I’m pretty sure the second never will be, though, because ...

    new URLPattern({ pathname: "/🍒" }).test("https://bar.com/🍒"); // true
    new URLPattern({ pathname: "/(\p{Emoji})" }).test("https://bar.com/🍒"); // false
    new URLPattern({ pathname: "/(\p{Ll})" }).test("https://bar.com/a"); // true
    new URLPattern({ pathname: "/(\p{Ll})" }).test("https://bar.com/é"); // false
    new URLPattern({ pathname: "/%F0%9F%8D%92" }).test("https://bar.com/🍒"); // true

... the illusion that non-ASCII is matchable is limited to literal input: URLPattern converts non-ASCII input to percent-encoded UTF-8, but regexp pattern components aren’t likewise “translated”, so expressing things like \p{RGI_Emoji_Sequence} seems to always require input that’s similar to what transpilers might produce today for engines that don’t support multi- or single-codepoint properties at all.
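
The mismatch described above can be approximated with the standard `URL` and `RegExp` globals. This is a rough sketch, not URLPattern's actual matching algorithm: it assumes only that the pathname is compared in its percent-encoded form, as the results quoted in this comment suggest, and the variable names are made up for the example.

```javascript
// Approximation using URL + RegExp; this is not URLPattern's own implementation.
const pathname = new URL("https://bar.com/🍒").pathname;
console.log(pathname); // "/%F0%9F%8D%92"

// A property escape is tested against the encoded ASCII text, so it cannot match.
const emojiSegment = /^\/(\p{Emoji})$/u;
console.log(emojiSegment.test(pathname)); // false

// The same regexp matches once the path is decoded back to the original character.
console.log(emojiSegment.test(decodeURIComponent(pathname))); // true

// Matching the encoded form needs a pattern written against the UTF-8 bytes.
const encodedCherries = /^\/%F0%9F%8D%92$/u;
console.log(encodedCherries.test(pathname)); // true
```

A multi-codepoint property under the v flag would hit the same wall: the regexp would see a run of `%XX` triplets rather than the code points the property describes.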

@sisidovski sisidovski self-assigned this Sep 13, 2023
chromium-wpt-export-bot pushed a commit to web-platform-tests/wpt that referenced this issue Nov 8, 2023
This CL follows the recent spec update in [1]. After this CL, the
regular expression works in unicodeSets mode, which allows the API
to interpret set notations, multi-codepoint properties, etc.

Actually, this CL ends up only adding support for set notations. At
the moment the API doesn't accept Unicode character class escapes (\p{},
\P{}) [2], so we don't add the multi-codepoint match functionality.

There are some incompatibilities between "u" and "v": some patterns
that were previously valid now produce errors. However, from UMA the
impact looks very limited; it's only around 0.3% of the total
constructor calls.

So this CL changes the default flag to "v" and also adds a kill switch.

[1] whatwg/urlpattern#178
[2] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Regular_expressions/Unicode_character_class_escape

Change-Id: I14cc3420d57cca44c0c25867d05802a8a666cd8c
Bug: 1482263
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/4861342
Reviewed-by: Jeremy Roman <jbroman@chromium.org>
Commit-Queue: Shunya Shishido <sisidovski@chromium.org>
Cr-Commit-Position: refs/heads/main@{#1221823}
moz-v2v-gh pushed a commit to mozilla/gecko-dev that referenced this issue Nov 22, 2023
…attern, a=testonly

Automatic update from web-platform-tests
Use kUnicodeSets (v regexp flag) in URLPattern


--

wpt-commits: 3ce3e9794fcd97ff24506f5c5325f91fc00ef79c
wpt-pr: 43014
vinnydiehl pushed a commit to vinnydiehl/mozilla-unified that referenced this issue Nov 24, 2023