Attribute to show the regular expression usage. #191

yoshisatoyanagisawa · 2023-10-06T01:35:08Z

URLPattern is now used in several proposals:

URLPattern supports regular expressions, but executing arbitrary user-provided regular expressions in a trusted area of the browser raises security concerns. It has been suggested that some APIs using URLPattern prohibit regular expressions:

To avoid unexpected regular expression use in URLPattern, a new attribute that indicates regular expression usage was considered in #182 (comment). With this flag, web developers can detect unexpected regular expression usage themselves.

An alternative is an option that makes URLPattern throw when a regular expression is used in the pattern. It would have to be an option because whether regular expressions are denied depends on the API using URLPattern: some APIs may allow them and others may not. We should not have such a thing in the regular path.
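To make the two approaches concrete, here is a minimal sketch rather than a proposed API. Every name in it (`hasRegExpGroup`, `compilePattern`, `disallowRegExp`, `usesRegExp`) is hypothetical, and the scanner is a simplified stand-in for the real URLPattern parser: it assumes that any unescaped `(` in the pattern string opens a regular expression group.

```typescript
// Simplified stand-in for the URLPattern parser (an assumption, not the spec
// algorithm): reports whether the pattern string contains an unescaped "(",
// which is treated here as the start of a regular expression group.
function hasRegExpGroup(pattern: string): boolean {
  for (let i = 0; i < pattern.length; i++) {
    if (pattern[i] === "\\") {
      i++; // skip the character that the backslash escapes
    } else if (pattern[i] === "(") {
      return true;
    }
  }
  return false;
}

// Hypothetical result type: the flag approach exposes the information and
// leaves the decision to whichever API consumes the pattern.
interface CompiledPattern {
  pattern: string;
  usesRegExp: boolean;
}

// Hypothetical option for the alternative approach: throw at construction
// time, but only when the caller opts in.
interface CompileOptions {
  disallowRegExp?: boolean;
}

function compilePattern(
  pattern: string,
  options: CompileOptions = {},
): CompiledPattern {
  const usesRegExp = hasRegExpGroup(pattern);
  if (options.disallowRegExp && usesRegExp) {
    throw new TypeError(`Pattern "${pattern}" uses a regular expression group`);
  }
  return { pattern, usesRegExp };
}
```

With the flag, an API that cannot run regular expressions checks `usesRegExp` and rejects the pattern itself. With the option, the same API passes `{ disallowRegExp: true }` and the pattern fails to construct, while APIs that allow regular expressions are unaffected.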

Jamesernator · 2023-10-09T04:33:09Z

The concern here is just ReDoS, right? It seems unfortunate to prohibit patterns like /(one|two)/ in places like service workers. Would it be possible, instead of restricting all regexps, to limit only those that contain potential backtracking?

domenic · 2023-10-09T04:54:39Z

That's not the concern, actually. The issue is that several of these features run in browser processes (usually the network process) which don't have a JavaScript engine, and thus don't have a regular expression engine. Bringing regexp support to the network process would require bringing a whole copy of V8, which as you can imagine, is not trivial.

(I'll also note that browser security doesn't make a distinction of the sort you're discussing, between "dangerous" untrusted input and "non-dangerous" untrusted input. Any untrusted input at all falls afoul of the rule of 2, at least in Chromium.)

annevk · 2023-10-09T06:45:30Z

This concern applies to server operators too, e.g., with compression dictionaries. This again leads me to think that having some kind of subset would be a good idea.

It seems like another way of stating the constraint here is that there's no interest in implementing a regular expression engine in a safe language.

Jamesernator · 2023-10-10T00:44:49Z

> in implementing a regular expression engine in a safe language.

You mean in an unsafe language, right?

> Any untrusted input at all falls afoul of the rule of 2, at least in Chromium.)

The article you've linked mentions that Chromium does have a trusted regex library, RE2. Would it be viable in Chromium to limit regexps in URLPatterns to some common subset that is shared with RE2? From the RE2 docs it does seem like a fairly large subset should be viable.

@annevk I presume Firefox doesn't have this limitation, as any regex engine for this purpose could just be written in Rust, right?

domenic · 2023-10-10T00:58:24Z

I can't say for certain on behalf of the involved teams, but my suspicion is we're not interested in exposing two dialects of regular expressions to the web platform (the standardized JS one, and the non-standardized RE2 one).

annevk · 2023-10-10T06:42:34Z

@Jamesernator I meant what I wrote. If there were interest in redoing a web platform regular expression engine in a safe language, you'd meet the rule of 2.

Last I checked, SpiderMonkey uses V8's regular expression engine, so they're in the same boat. WebKit has its own, but it is also unsafe.

And yeah, exposing two separate implementations seems very risky and not tenable long term.

jeremyroman · 2023-10-10T22:20:57Z

It probably could be done (if only because the regexp use case here is more limited, since URLs can reasonably be expected to be shorter than larger haystacks). But in the short term it doesn't seem likely that any implementer (let alone all) is interested in carving out a more tailored subset of ECMAScript regexes, and then specifying, documenting, and implementing that in a safe language.

At the moment just allowing the things outside of a regexp group (i.e., fixed parts, ? and * wildcards) seems likely to be the most pragmatic compromise, even though a larger subset is in principle possible.
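As a rough illustration of why that subset is cheap to support, the sketch below matches input against a pattern made only of fixed text and `*` wildcards, with no regular expression engine involved. The function name `matchesWildcardPattern` is hypothetical, and the sketch assumes that `*` matches any run of characters (including an empty one). It ignores named groups, modifiers, and escaping, so it is not the URLPattern matching algorithm.

```typescript
// Hypothetical helper: matches `input` against a pattern consisting only of
// literal characters and `*` wildcards. Each `*` may match any run of
// characters, including the empty run. No regular expressions are used.
function matchesWildcardPattern(pattern: string, input: string): boolean {
  let p = 0; // current position in the pattern
  let i = 0; // current position in the input
  let starP = -1; // position of the most recent `*` in the pattern
  let starI = 0; // input position up to which that `*` currently extends

  while (i < input.length) {
    if (p < pattern.length && pattern[p] === "*") {
      // Record the wildcard and first try matching it against nothing.
      starP = p++;
      starI = i;
    } else if (p < pattern.length && pattern[p] === input[i]) {
      // Literal characters agree, so advance both positions.
      p++;
      i++;
    } else if (starP !== -1) {
      // Mismatch: let the last wildcard absorb one more input character.
      p = starP + 1;
      i = ++starI;
    } else {
      return false;
    }
  }

  // Any trailing wildcards can match the empty run.
  while (p < pattern.length && pattern[p] === "*") {
    p++;
  }
  return p === pattern.length;
}
```

For example, `matchesWildcardPattern("/books/*", "/books/123/reviews")` returns `true`, while `matchesWildcardPattern("/about", "/about/")` returns `false`. The only backtracking is a retry from the most recent wildcard, so the worst case is bounded by the product of the pattern and input lengths.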

domenic · 2023-10-27T01:25:47Z

Anyone want to bikeshed the name here? I think we should use "RegExp" (instead of e.g. Regex) since that's what JavaScript uses. With that as a base, some ideas so far:

hasRegExpTokens / hasRegExpGroups
containsRegExpTokens / containsRegExpGroups
requiresRegExp
usesRegExp

jeremyroman · 2023-11-01T19:40:47Z

Slight preference for referring to the parsed state rather than tokens in the input, but I could live with any of those. I think hasRegExpGroups is my narrow favorite, with requiresRegExp as a runner-up.

yoshisatoyanagisawa mentioned this issue Oct 6, 2023

Add a "using URLPattern in other APIs" section #182

Open

domenic added the addition/proposal New features or enhancements label Oct 18, 2023

annevk mentioned this issue Oct 26, 2023

URLPattern usage WICG/compression-dictionary-transport#52

Open

jeremyroman mentioned this issue Nov 21, 2023

Add a section on other specs integrating with URLPattern #199

Merged

5 tasks

jeremyroman closed this as completed Nov 30, 2023
