Template tag as an alternative, that includes x plus additional improvements #8

Open
slevithan opened this issue Jun 5, 2024 · 7 comments

Comments

@slevithan

slevithan commented Jun 5, 2024

The limitations with the x proposal (no multiline literals, and some special chars must be escaped even within comments) seem to meaningfully reduce the value of x. Several others have expressed similar concerns in previous issues. Additionally, x mode would multiply the four existing ways to interpret escaped characters and parse regex syntax (Unicode-unaware mode, named capture mode, u mode, and v mode). The value of x as a standalone feature is also moderated by the existing ability to use a poor-man's x via:

new RegExp(`
  # comment
  ...
  # comment
`.replace(/\s+|#.*/g, ''));

So, given both the limitations and the existence of multiline raw template literals, I'm questioning the value of a flag x when there might be a better path via a standardized template tag.
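A minimal runnable sketch of the workaround above, assuming `String.raw` is used so that regex backslashes need no doubling. The helper name `stripExtended` is hypothetical, and the stripping is deliberately naive: it also removes whitespace and `#` inside character classes or after a backslash.

```javascript
// Hypothetical helper: naive removal of whitespace and "#" line comments.
function stripExtended(pattern) {
  return pattern.replace(/\s+|#.*/g, '');
}

// String.raw keeps "\d" as two characters, so no "\\d" is needed.
const isoDate = new RegExp(stripExtended(String.raw`
  ^ (\d{4})  # year
  - (\d{2})  # month
  - (\d{2})  # day
  $
`));

console.log(isoDate.source);
console.log(isoDate.test('2024-06-05'));
```

Because the comments are removed before the pattern reaches `RegExp`, their text never has to be valid regex syntax.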

A tag has been repeatedly discussed in the context of proposal https://github.com/tc39/proposal-regex-escaping, but I think it's more relevant as an alternative to x.

A template tag, in addition to offering context-aware escaping, could potentially offer always-on x semantics (thereby encouraging best practices and not multiplying the number of modes to interpret escapes), without the limitations of x as a flag. It could also be a great opportunity to clean up and modernize some aspects of JS regexes, providing a clear new happy path free of JS regex footguns and legacy baggage, with no need to continually opt in to best practices (apart from using the tag). This could be considered similar in approach to flags u and v bundling multiple changes together to modernize and future-proof the syntax. And given that the stricter errors of those modes already offer fairly strong syntax future-proofing, the risk of needing another round of opt-in modernization after adding a tag seems relatively low, although I could expand on the potential future changes that might bring this into question.

Leaving that aside for now, here is what I would like to see in a standardized template tag:

  • Raw strings, for standardized dynamic regexes without unreadable \\\\.
  • Context-aware, sandboxed, and atomized interpolation of escaped strings.
  • Context-aware, sandboxed, and atomized interpolation of RegExp instances that preserve their local flags.
  • Always-on flag x, with or without flag x being available to the RegExp constructor and regex literals (not mutually exclusive).
  • Always-on flag v.
  • Always-on flag n ("no auto capture" or "explicit capture" mode from .NET, PCRE, Perl, C++, XRegExp, etc.), with or without flag n being available to RegExp and regex literals (not mutually exclusive).
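To make part of the list above concrete, here is a rough sketch of a tag covering only the raw-string, escaped-string interpolation, always-on x, and always-on v items. It is not the regex package's implementation or any proposed spec text. The names `rx`, `escapeLiteral`, `stripInsignificant`, and `FLAGS` are hypothetical, the whitespace and comment stripping is naive (it ignores character classes and escaped `#`), and interpolated RegExp instances and flag n are left out.

```javascript
// Use flag v where the engine supports it, otherwise fall back to u (an assumption
// made so the sketch runs on older engines).
const FLAGS = (() => {
  try { new RegExp('', 'v'); return 'v'; } catch { return 'u'; }
})();

// Escape regex syntax characters so an interpolated string matches literally.
function escapeLiteral(text) {
  return text.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
}

// Naive x-mode: drop whitespace and "#" comments from the template's own text.
function stripInsignificant(segment) {
  return segment.replace(/\s+|#.*/g, '');
}

// Hypothetical tag: raw segments, x-style stripping, escaped and grouped values.
function rx(strings, ...values) {
  let source = '';
  strings.raw.forEach((segment, i) => {
    source += stripInsignificant(segment);
    if (i < values.length) {
      // Wrapping in a non-capturing group keeps the value a single unit,
      // so a following quantifier applies to the whole interpolated string.
      source += `(?:${escapeLiteral(String(values[i]))})`;
    }
  });
  return new RegExp(source, FLAGS);
}

const version = rx`
  ^ \d+      # major
  ${'.'}     # literal dot
  \d+ $      # minor
`;

console.log(version.source);
console.log(version.test('1.2'), version.test('1x2'));
```

Stripping is applied only to the template's own segments, so whitespace or `#` inside an interpolated value is preserved.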

I recently created the regex package that does all of these things (among other features), partly to help start a discussion around this. I think the semantics of regex (minus the syntax extensions) could serve as a starting point for a tag proposal. If there is interest in this, as well as another champion, I would be happy to contribute/collaborate.

@erights

erights commented Jun 5, 2024

Hi @slevithan , I am interested in this, but I cannot take the time to lead, or to write the spec text.

I am also interested in your "atomic groups" (?>...) at https://github.com/slevithan/regex-make?tab=readme-ov-file#atomic-groups to help avoid ReDoS. Is this something that could be proposed as an extension to the builtin regexp syntax? What are the pros and cons?

Attn @waldemarhorwat

@rbuckton
Collaborator

rbuckton commented Jun 5, 2024

There is plenty of value in x as a flag and as a modifier, and there are plenty of languages that have an 'x' mode flag and modifier that don't have regular expression literals, so I think there is significant value in having this as more than just a template tag. That said, I would also like to support a template tag mechanism long term.

> Hi @slevithan , I am interested in this, but I cannot take the time to lead, or to write the spec text.
> I am also interested in your "atomic groups" (?>...) at https://github.com/slevithan/regex-make?tab=readme-ov-file#atomic-groups to help avoid ReDoS. Is this something that could be proposed as an extension to the builtin regexp syntax? What are pros and cons?

I am already championing a proposal for this that is at Stage 1: https://github.com/tc39/proposal-regexp-atomic-operators. I plan to invest more time in x-mode, atomic operators, and buffer boundaries once modifiers gets to Stage 4.
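For context on what atomic groups buy you, the behavior of (?>...) can be approximated in today's syntax with a lookahead plus a backreference, since JS lookaheads are not backtracked into once they succeed. This is only an illustration of the semantics, not necessarily how the library or the Stage 1 proposal handles it.

```javascript
// Ordinary greedy quantifier: a+ gives back one "a" so the trailing "a" can match.
const greedy = /^a+a/;

// Emulated atomic group: the lookahead captures all the "a"s, \1 consumes them,
// and the engine cannot backtrack into the lookahead to release one.
const atomicLike = /^(?=(a+))\1a/;

console.log(greedy.test('aaa'));      // true
console.log(atomicLike.test('aaa'));  // false

// With a different following character the atomic form still matches.
console.log(/^(?=(a+))\1b/.test('aaab')); // true
```

Cutting off that backtracking is what helps against ReDoS: the engine stops retrying shorter matches of the group when the rest of the pattern fails.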

@erights

erights commented Jun 5, 2024

> That said, I would also like to support a template tag mechanism long term.

Good to know. Thanks.

> I am already championing a proposal for this that is at Stage 1: https://github.com/tc39/proposal-regexp-atomic-operators. I plan to invest more time in x-mode, atomic operators, and buffer boundaries once modifiers gets to Stage 4.

Awesome! I look forward to this. Is it just coincidence that you both chose (?>...) as the syntax, or is there some precedent you both have in mind?

@rbuckton
Collaborator

rbuckton commented Jun 5, 2024

Long term, I would like to propose a template tag mechanism along with reintroducing prefix flags. Prefix flags were originally part of https://github.com/tc39/proposal-regexp-modifiers, but were cut.

Prefix flags ((?gimsxduv)) are useful for a mechanism that employs tagged templates, since you don't need to inject a call before or after the template to specify flags:

const re = RegExp`(?xu)
  # comment
  \p{LC}
`;

vs

const re = RegExp.tag("xu")`
  # comment
  \p{LC}
`;
// or
const re = RegExp`
  # comment
  \p{LC}
`("xu");

The version that the modifiers proposal originally had allowed prefix modifiers anywhere in the RegExp, though some implementations in other languages only allow them at the start of the RegExp.

There was a suggestion to use / in the tag:

const re = RegExp`/
  pattern
  /xu`;

But the upside of using a prefix (?xu) is that it is consistent with other engines and is often used in TextMate grammars, so consistency there would be very useful.
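As a rough illustration of the prefix-flag idea, a userland tag can read a leading (?flags) group itself and pass the flags on to the RegExp constructor. Everything here is an assumption for demonstration: the tag name `re` is hypothetical, only a prefix at the very start is recognized, interpolation is not handled, and x is emulated by naive stripping because RegExp has no x flag today.

```javascript
// Hypothetical tag: accepts a leading "(?flags)" prefix in the template text.
function re(strings, ...values) {
  if (values.length > 0) {
    throw new TypeError('interpolation is out of scope for this sketch');
  }
  let source = strings.raw[0];
  let flags = '';

  // Pull off a leading prefix such as "(?xu)".
  const prefix = /^\s*\(\?([a-z]+)\)/.exec(source);
  if (prefix) {
    flags = prefix[1];
    source = source.slice(prefix[0].length);
  }

  // Emulate x: strip whitespace and "#" comments, then drop the flag itself.
  if (flags.includes('x')) {
    flags = flags.replace('x', '');
    source = source.replace(/\s+|#.*/g, '');
  }
  return new RegExp(source, flags);
}

const letters = re`(?xu)
  # cased letters
  \p{LC}+
`;

console.log(letters.source, letters.flags);
console.log(letters.test('abc'), letters.test('123'));
```

No call before or after the template is needed to supply flags, which is the ergonomic point made above.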

@rbuckton
Collaborator

rbuckton commented Jun 5, 2024

> > That said, I would also like to support a template tag mechanism long term.
>
> Good to know. Thanks.
>
> > I am already championing a proposal for this that is at Stage 1: https://github.com/tc39/proposal-regexp-atomic-operators. I plan to invest more time in x-mode, atomic operators, and buffer boundaries once modifiers gets to Stage 4.
>
> Awesome! I look forward to this. Is it just coincidence that you both chose (?>...) as the syntax, or is there some precedent you both have in mind?

There is precedent in many RegExp grammars: https://rbuckton.github.io/regexp-features/features/non-backtracking-expressions.html

Maintaining this precedent helps developers port knowledge from other languages and reuse RegExp patterns defined in external files such as JSON or YAML. This matters especially for TextMate grammars, which are consumed by editors written in many different languages (Java, C#, C++, JS, etc.).

@rbuckton
Collaborator

rbuckton commented Jun 5, 2024

I've opted to spread out the RegExp proposals since a single large proposal was originally rejected. Since x mode has value independent of a tagged template mechanism, I chose to avoid tying the two proposals together. Instead, I'm waiting until I have cleared some of my current workload before bringing more RegExp proposals to committee.

@slevithan
Author

> There is plenty of value in x as a flag and as a modifier

I agree that there is value despite the limitations and the existence of multiline template strings. As I mentioned in my initial comment, a tag with always-on xv rules would not be mutually exclusive with x as a flag/modifier. My case though is that:

  1. A tag has additional advantages (which is not a new suggestion).
  2. It could potentially offer a path to x-mode insignificant whitespace and comments without or before the addition of a flag/modifier x.

That second point might be meaningful if flag x stalls for whatever reason, or if there is new energy behind a standardized tag that accelerates it.

> and there are plenty of languages that have an 'x' mode flag and modifier that don't have regular expression literals, so I think there is significant value in having this as more than just a template tag.

Yes, but no other flavors require escaping special characters even within comments. That's NOT fatal to x, but it WILL get surprised reactions from people who aren't aware of TC39's motivations behind it for years to come. Always-on x coming from a tag can potentially avoid this by stripping comments before passing to RegExp.

> That said, I would also like to support a template tag mechanism long term.

Awesome!

> [...] along with reintroducing prefix flags. Prefix flags was originally part of https://github.com/tc39/proposal-regexp-modifiers, but was cut.
>
> Prefix flags ((?gimsxduv)) are useful for a mechanism that employs tagged templates, since you don't need to inject a call before or after the template to specify flags:

Modifier flags have the downside that they probably shouldn't be overwritten when copying a regex with new flags, since they are part of the pattern (reinforced by how most regex flavors that allow non-enclosed mode modifiers allow them at any position). So while RegExp`(?gi)…` (as opposed to e.g. RegExp.make('gi')`…`) doesn't pose a problem on its own, new RegExp(RegExp`(?gi)…`, 'ms') might.
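The concern is easier to see next to how copying behaves today: when a regex is copied with a new flags argument, the constructor flags are replaced wholesale while the pattern source is carried over unchanged.

```javascript
const original = /abc/gi;

// Copying with new flags discards g and i entirely.
const copy = new RegExp(original, 'ms');

console.log(copy.source);     // "abc"
console.log(copy.flags);      // "ms"
console.log(copy.ignoreCase); // false
```

A prefix such as (?gi) would live in the source rather than in the flags, so it would survive this kind of copy alongside the new 'ms', which is the ambiguity described above.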

@erights

> Hi @slevithan , I am interested in this, but I cannot take the time to lead, or to write the spec text.

Glad to know nevertheless that you're interested in the ideas!

3 participants