Attribute to show the regular expression usage. #191

Closed
yoshisatoyanagisawa opened this issue Oct 6, 2023 · 9 comments
Labels
addition/proposal New features or enhancements

Comments

@yoshisatoyanagisawa
Contributor

URLPattern is now used in several proposals:

URLPattern supports regular expressions, but executing arbitrary user-provided regular expressions in a trusted area of the browser raises security concerns. It has been suggested that some APIs using URLPattern prohibit regular expressions:

To help avoid unexpected regular expression use in URLPattern, a new attribute that shows whether a pattern uses regular expressions was considered in #182 (comment). With this flag, web developers can detect unexpected regular expression usage themselves.

An alternative solution is an option that raises an error when a regular expression is used in the pattern. However, whether regular expressions are denied depends on the API using URLPattern: some APIs may allow them and others may not. We should not have such a thing in the regular path.
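
As a rough sketch of the flag-based approach described above: the consuming API reads a boolean on the pattern and decides for itself whether to reject. Everything named here (`PatternLike`, `usesRegularExpression`, `registerRoute`) is hypothetical and only stands in for whatever URLPattern and the consuming APIs would actually expose.

```typescript
// Hypothetical stand-in for a URLPattern that exposes the proposed flag.
interface PatternLike {
  readonly source: string;
  // Assumed attribute name; the real name is not settled in this comment.
  readonly usesRegularExpression: boolean;
}

// Hypothetical consuming API. Each API chooses whether it accepts
// regular expressions, so the check lives here and not in the pattern.
function registerRoute(pattern: PatternLike, allowRegExp: boolean): string {
  if (pattern.usesRegularExpression && !allowRegExp) {
    throw new TypeError(
      `Pattern "${pattern.source}" uses a regular expression, which this API does not accept`
    );
  }
  return `registered ${pattern.source}`;
}
```

With this shape, an API that can run regular expressions passes `allowRegExp = true`, and one that cannot rejects the pattern up front, while the pattern itself stays unchanged.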

@Jamesernator

Jamesernator commented Oct 9, 2023

The concern here is just ReDoS, right? It seems unfortunate to limit patterns like /(one|two)/ in places like service workers. Would it be possible, instead of restricting all regexps, to limit only those that contain potential backtracking?

@domenic
Member

domenic commented Oct 9, 2023

That's not the concern, actually. The issue is that several of these features run in browser processes (usually the network process) which don't have a JavaScript engine, and thus don't have a regular expression engine. Bringing regexp support to the network process would require bringing a whole copy of V8, which as you can imagine, is not trivial.

(I'll also note that browser security doesn't make a distinction of the sort you're discussing, between "dangerous" untrusted input and "non-dangerous" untrusted input. Any untrusted input at all falls afoul of the rule of 2, at least in Chromium.)

@annevk
Member

annevk commented Oct 9, 2023

This concern applies to server operators too, e.g., with compression dictionaries. This again leads me to think that having some kind of subset would be a good idea.

It seems like another way of stating the constraint here is that there's no interest in implementing a regular expression engine in a safe language.

@Jamesernator

Jamesernator commented Oct 10, 2023

in implementing a regular expression engine in a safe language.

You mean in an unsafe language, right?

Any untrusted input at all falls afoul of the rule of 2, at least in Chromium.)

The article you've linked mentions that Chromium does have a trusted regex library, RE2. Would it be viable in Chromium to limit regexps in URLPatterns to some common subset that is shared with RE2? From the RE2 docs, it does seem like a fairly large subset should be viable.

@annevk I presume Firefox doesn't have this limitation, as any regex engine for this purpose could just be written in Rust, right?

@domenic
Member

domenic commented Oct 10, 2023

I can't say for certain on behalf of the involved teams, but my suspicion is we're not interested in exposing two dialects of regular expressions to the web platform (the standardized JS one, and the non-standardized RE2 one).

@annevk
Member

annevk commented Oct 10, 2023

@Jamesernator I meant what I wrote. If there were interest in redoing a web platform regular expression engine in a safe language, you'd meet the rule of 2.

Last I checked, SpiderMonkey uses V8's regular expression engine, so they're in the same boat. WebKit has its own, but it is also unsafe.

And yeah, exposing two separate implementations seems very risky and not tenable long term.

@jeremyroman
Collaborator

It probably could be done (if only because the regexp use case here is more limited, since URLs can reasonably be expected to be shorter than larger haystacks). But in the short term it doesn't seem likely that any implementer (let alone all) is interested in carving out a more tailored subset of ECMAScript regexes, and then specifying, documenting, and implementing that in a safe language.

At the moment just allowing the things outside of a regexp group (i.e., fixed parts, ? and * wildcards) seems likely to be the most pragmatic compromise, even though a larger subset is in principle possible.
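
To illustrate why the regexp-free subset is easy to support without a regular expression engine, here is a minimal sketch of a matcher that handles only fixed text and `*` wildcards. It is a generic glob-style algorithm, not URLPattern's actual matching procedure, and `matchesWildcardPattern` is a name invented for this example. Modifiers such as `?`, named groups, and per-component matching are deliberately left out.

```typescript
// Matches `input` against a pattern made only of fixed text and `*`
// wildcards (each `*` matches any run of characters, including none).
// No regular expression engine is involved.
function matchesWildcardPattern(pattern: string, input: string): boolean {
  let p = 0; // current position in the pattern
  let i = 0; // current position in the input
  let starAt = -1; // position of the most recent `*` in the pattern
  let resumeAt = 0; // input position to retry from for that `*`

  while (i < input.length) {
    if (p < pattern.length && pattern[p] === "*") {
      // Remember the wildcard and first try matching it against nothing.
      starAt = p;
      resumeAt = i;
      p++;
    } else if (p < pattern.length && pattern[p] === input[i]) {
      // Fixed text matches; advance both positions.
      p++;
      i++;
    } else if (starAt !== -1) {
      // Mismatch: let the last `*` absorb one more character and retry.
      p = starAt + 1;
      resumeAt++;
      i = resumeAt;
    } else {
      return false;
    }
  }

  // Any trailing `*` in the pattern can match the empty string.
  while (p < pattern.length && pattern[p] === "*") {
    p++;
  }
  return p === pattern.length;
}
```

For example, `matchesWildcardPattern("/static/*.js", "/static/app.js")` returns `true`, while the same pattern against `"/static/app.css"` returns `false`.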

@domenic
Member

domenic commented Oct 27, 2023

Anyone want to bikeshed the name here? I think we should use "RegExp" (instead of e.g. Regex) since that's what JavaScript uses. With that as a base, some ideas so far:

  • hasRegExpTokens / hasRegExpGroups
  • containsRegExpTokens / containsRegExpGroups
  • requiresRegExp
  • usesRegExp

@jeremyroman
Collaborator

Slight preference for referring to the parsed state rather than tokens in the input, but I could live with any of those. I think hasRegExpGroups is my narrow favorite, with requiresRegExp as a runner-up.
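
One way to read "parsed state" is that the flag is computed from the list of parts produced by parsing, not from scanning the input string for tokens. The sketch below assumes a simplified part model; the `Part` interface and the part type names are assumptions made for illustration, and `hasRegExpGroups` is written here as a free function only to keep the example self-contained.

```typescript
// Simplified model of a parsed pattern. The part kinds are assumptions
// for this sketch, not a definitive list.
type PartType = "fixed-text" | "regexp" | "segment-wildcard" | "full-wildcard";

interface Part {
  type: PartType;
  value: string;
}

// The flag reflects parsed state: it is true if any parsed part is a
// regexp group, regardless of how the original input was spelled.
function hasRegExpGroups(parts: readonly Part[]): boolean {
  return parts.some((part) => part.type === "regexp");
}
```

Under this model, a pattern parsed into only fixed text and wildcard parts reports `false`, and a single regexp part anywhere in the list flips the result to `true`.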


5 participants