This repository has been archived by the owner on Feb 16, 2024. It is now read-only.

Consideration for Perl-like (?[]) extended character classes instead of a flag #39

Closed
rbuckton opened this issue Aug 18, 2021 · 5 comments

Comments

@rbuckton

I've been researching regular expression syntax in various languages and engines to inform possible future proposals to expand the ECMAScript regular expression syntax. One of the features I've been reviewing is Perl's Extended Bracketed Character Classes, which support operations such as:

  • Intersection (&)
  • Union (+ or |)
  • Subtraction (-)
  • Symmetric Difference (^)
  • Complement (!)
  • Grouping ((, ))

Such a character class is delimited by the tokens (?[ and ]). The contents of the expression can contain the above operators, whitespace (which is ignored), character classes, metacharacters (such as \p{..}, \s, etc.), and certain escape sequences (such as \x0a, etc.). This allows you to write complex character classes like the following (based on the examples in the explainer):

# non-ASCII digits
(?[ \p{Decimal_Number} - [0-9] ])

# spans of word/identifier letters of specific scripts
(?[ \p{Script=Khmer} & [\p{Letter}\p{Mark}\p{Number}] ])

# breaking spaces
(?[ \p{White_Space} - \p{Line_Break=Glue} ])

# non-ASCII emoji
(?[ \p{Emoji} - \p{ASCII} ])

As well as classes like the following (from the perlre documentation):

# Matches digits in the Thai or Laotian scripts
(?[ ( \p{Thai} + \p{Lao} ) & \p{Digit} ])
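For comparison, a few of the classes above can be approximated in existing ECMAScript syntax by pairing a lookahead with a character class. This is only a rough sketch, not something taken from the explainer or from Perl: the variable names are made up for illustration, the u flag is needed for \p{...}, \p{Thai} and \p{Lao} are spelled as \p{Script=...}, and Perl's \p{Digit} is assumed to correspond to \p{Decimal_Number}.

```javascript
// Sketch only: emulating set operations with lookaheads in today's syntax.
// All names are illustrative; each pattern is anchored to test one character.

// Subtraction: \p{Decimal_Number} - [0-9]
const nonAsciiDigit = /^(?![0-9])\p{Decimal_Number}$/u;

// Intersection: \p{Script=Khmer} & [\p{Letter}\p{Mark}\p{Number}]
const khmerWordChar = /^(?=\p{Script=Khmer})[\p{Letter}\p{Mark}\p{Number}]$/u;

// Union, then intersection: ( \p{Thai} + \p{Lao} ) & \p{Digit}
// Assumption: Perl's \p{Digit} is treated as \p{Decimal_Number} here.
const thaiOrLaoDigit = /^(?=[\p{Script=Thai}\p{Script=Lao}])\p{Decimal_Number}$/u;

console.log(nonAsciiDigit.test("\u0663"));  // ARABIC-INDIC DIGIT THREE → true
console.log(nonAsciiDigit.test("3"));       // ASCII digit → false
console.log(khmerWordChar.test("\u1780"));  // KHMER LETTER KA → true
console.log(thaiOrLaoDigit.test("\u0E53")); // THAI DIGIT THREE → true
console.log(thaiOrLaoDigit.test("\u0ED3")); // LAO DIGIT THREE → true
```

The lookahead form works for single code points but gets harder to read as operands nest, which is the gap the set notation is meant to fill.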

Currently, (?[ is not valid RegExp syntax (with or without the u flag), so it provides an opportunity to add syntax to cover set notation functionality without needing to introduce a new flag.
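The claim that (?[ is rejected in both modes can be checked directly. The snippet below is a small sketch; isValidPattern is a hypothetical helper written for this illustration, not an existing API.

```javascript
// Hypothetical helper: reports whether the RegExp constructor accepts a pattern.
function isValidPattern(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch (e) {
    // An invalid pattern raises SyntaxError; rethrow anything unexpected.
    if (e instanceof SyntaxError) return false;
    throw e;
  }
}

// One of the Perl-style classes from above, kept as a raw string.
const perlStyle = String.raw`(?[ \p{Decimal_Number} - [0-9] ])`;

console.log(isValidPattern(perlStyle, ""));  // false (without the u flag)
console.log(isValidPattern(perlStyle, "u")); // false (with the u flag)
```

Because the pattern is a syntax error today, no existing valid regular expression would change meaning if (?[ were given one.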

@mathiasbynens
Member

Previous discussion: #2

@markusicu
Collaborator

Also, we are in the process of expanding the scope of our proposal slightly to fix problems with some existing syntax and semantics -- and that requires a new flag (which in turn gives us an opportunity to fix such problems).

We had looked at the experimental Perl syntax; it is syntactically a real outlier compared with how other regex engines have extended their syntax.

@sffc
Collaborator

sffc commented Aug 19, 2021

I do like the Perl syntax if we were to revisit the syntax-only route.

@RunDevelopment

@sffc I'm pretty sure the syntax-only route isn't possible because of proposed semantic changes like #16 and #37.
