JS: Add ECMAScript 2024 `v` Flag Operators for Regex Parsing #18899

Napalys · 2025-02-28T18:20:46Z

This pull request adds support for parsing ECMAScript 2024 v flag operators, including:

- Nested classes: enables nested character classes in regexes.
  Example: `/[[abc][cz]]/v`
- Intersection (`&&`): matches characters common to both sets.
  Example: `/[[abc]&&[cz]]/v`
- Subtraction (`--`): removes characters from a set.
  Example: `/[[abc]--[cz]]/v`
- Mixing operations at the same level is not allowed:
  - Invalid: `/[[abc]&&[cz]--[zz]]/v`
  - Valid: `/[[abc]&&[[cz]--[zz]]]/v`
- Union: combines multiple sets.
  Example: `/[[abc][cz]]/v`
- Quoted strings (`\q{}`): allows matching exact sequences.
  Example: `/[\q{ab|cb|db}]/v`
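To make the set semantics of the operators above concrete, here is a minimal sketch of an evaluator for a small subset of `v`-flag character classes. It is not the extractor's `RegExpParser`; the class name `VFlagClassSketch` and its methods are hypothetical, and it assumes only single literal characters and nested classes (no ranges, escapes, or `\q{}`).

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the extractor's RegExpParser: evaluates a tiny
// subset of `v`-flag character classes (single literal characters and nested
// classes only; no ranges, escapes, or \q{...}).
final class VFlagClassSketch {
    private final String src;
    private int pos = 0;

    private VFlagClassSketch(String src) {
        this.src = src;
    }

    /** Returns the set of characters matched by a class such as "[[abc]&&[cz]]". */
    static Set<Character> eval(String cls) {
        VFlagClassSketch p = new VFlagClassSketch(cls);
        Set<Character> result = p.parseClass();
        if (p.pos != cls.length()) throw p.error("unexpected trailing input");
        return result;
    }

    private Set<Character> parseClass() {
        expect('[');
        Set<Character> acc = new TreeSet<>();
        String mode = null; // operator used at this level: "&&", "--" or "" (union)
        boolean first = true;
        while (pos < src.length() && src.charAt(pos) != ']') {
            String op = "";
            if (src.startsWith("&&", pos) || src.startsWith("--", pos)) {
                op = src.substring(pos, pos + 2);
                pos += 2;
            }
            if (first) {
                if (!op.isEmpty()) throw error("operator without left operand");
            } else {
                // Every operand after the first must be joined the same way.
                if (mode != null && !mode.equals(op)) throw error("mixed operators at one level");
                mode = op;
            }
            Set<Character> operand = parseOperand();
            if (first || op.isEmpty()) acc.addAll(operand);   // first operand or union
            else if (op.equals("&&")) acc.retainAll(operand); // intersection
            else acc.removeAll(operand);                      // subtraction
            first = false;
        }
        expect(']');
        return acc;
    }

    private Set<Character> parseOperand() {
        if (pos >= src.length() || src.charAt(pos) == ']') throw error("missing operand");
        if (src.charAt(pos) == '[') return parseClass();
        Set<Character> single = new TreeSet<>();
        single.add(src.charAt(pos++));
        return single;
    }

    private void expect(char c) {
        if (pos >= src.length() || src.charAt(pos) != c) throw error("expected '" + c + "'");
        pos++;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException(msg + " at index " + pos);
    }
}
```

Under these assumptions, `VFlagClassSketch.eval("[[abc]&&[cz]]")` yields the set containing only `c`, `eval("[[abc]--[cz]]")` yields `a` and `b`, and `eval("[[abc]&&[cz]--[zz]]")` throws because two different operators appear at the same nesting level.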

Commit by commit review encouraged.

Useful links:

With correct parsing, this no longer produces a false positive. Closes #18854.

Copilot

PR Overview

This pull request introduces support for ECMAScript 2024 regex constructs under the new "v" flag. Key changes include:

- New AST node classes for character class operations (Subtraction, QuotedString, Intersection, Union)
- Enhancements to RegExpParser to conditionally enable nested character classes, new operators, and quoted string parsing, with a fallback mechanism when errors are encountered
- New test inputs covering quoted strings, unions, intersections, subtractions, and nested character classes

Reviewed Changes

| File | Description |
| --- | --- |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassSubtraction.java | New AST node for subtraction operator in character classes |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassQuotedString.java | New AST node for handling quoted string escapes |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassIntersection.java | New AST node for intersection operator in character classes |
| javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassUnion.java | New AST node for union operator in character classes |
| javascript/extractor/src/com/semmle/js/parser/RegExpParser.java | Extended parser functionality to support the new "v" flag and corresponding regex operations |
| javascript/extractor/src/com/semmle/js/extractor/ASTExtractor.java and RegExpExtractor.java | Updated extraction logic to accommodate new AST node types and conditional flag handling |

Copilot reviewed 31 out of 31 changed files in this pull request and generated 2 comments.


javascript/extractor/src/com/semmle/js/parser/RegExpParser.java

asgerf

Excellent work! I have a couple of comments to keep you busy during the week 😄

javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassIntersection.java

javascript/extractor/src/com/semmle/js/extractor/ASTExtractor.java

javascript/extractor/src/com/semmle/js/parser/RegExpParser.java

javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassUnion.java

javascript/extractor/src/com/semmle/js/parser/RegExpParser.java

javascript/ql/lib/semmle/javascript/Regexp.qll

Co-authored-by: Asgerf <asgerf@github.com>

erik-krogh

Nice work 👍

I didn't look through it thoroughly, I assume Asger did that.

Did you run database creation on the latest main of https://github.com/babel/babel and https://github.com/tc39/test262?
Those projects contain all kinds of valid and invalid syntax, so it's a nice test of whether something is horribly wrong.

javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassIntersection.java

javascript/extractor/src/com/semmle/js/parser/RegExpParser.java

javascript/ql/lib/change-notes/2025-03-03-regex-v.md

Napalys · 2025-03-11T08:24:57Z

> Nice work 👍
>
> I didn't look through it thoroughly, I assume Asger did that.
>
> Did you run database creation on the latest main of https://github.com/babel/babel and https://github.com/tc39/test262? Those projects contain all kinds of valid and invalid syntax, so it's a nice test of whether something is horribly wrong.

For Babel, the extraction failed catastrophically on this Invalid Syntax File. However, after deleting it, the database was successfully created. I assume this is expected since the file contains invalid syntax?

The test262 database was created successfully without any issues.

javascript/extractor/src/com/semmle/js/parser/RegExpParser.java

erik-krogh · 2025-03-11T08:39:03Z

> For Babel, the extraction failed catastrophically on this Invalid Syntax File. However, after deleting it, the database was successfully created. I assume this is expected since the file contains invalid syntax?

No, that is not expected.
Database creation should succeed, but with some extracted syntax errors.

However, that seems to be unrelated to this PR.
But maybe you could look into fixing that crash later? (And bump the SHA for babel/babel in DCA in the process).

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>

asgerf

🎉

javascript/ql/lib/change-notes/2025-03-03-regex-v.md

Co-authored-by: Asger F <asgerf@github.com>

github-actions bot added the JS label Feb 28, 2025

Napalys force-pushed the js/ecma-2024-regex branch from 84fddf1 to 94adaf8 Compare March 2, 2025 15:56

Napalys added 2 commits March 2, 2025 17:08

Exposed flags to the regex parser

cb448db

Added quoted string \q parser test cases

d162acf

Napalys force-pushed the js/ecma-2024-regex branch 2 times, most recently from 605456f to f93419e Compare March 2, 2025 18:24

github-actions bot added the documentation label Mar 3, 2025

Napalys changed the title ~~JS: WIP: Ecma 2024 regex~~ JS: Add ECMAScript 2024 v Flag Operators for Regex Parsing Mar 3, 2025

Napalys force-pushed the js/ecma-2024-regex branch from 6fe7753 to 430514b Compare March 3, 2025 12:00

Napalys marked this pull request as ready for review March 3, 2025 13:17

Copilot AI review requested due to automatic review settings March 3, 2025 13:17

Napalys requested a review from a team as a code owner March 3, 2025 13:17

Copilot AI reviewed Mar 3, 2025


javascript/extractor/src/com/semmle/js/parser/RegExpParser.java Outdated

javascript/extractor/src/com/semmle/js/parser/RegExpParser.java Outdated

Napalys added 11 commits March 3, 2025 14:37

Add support for '\q{}' escape sequence in regular expressions.

ed418be

Added test cases for nested character class.

ab7e08f

Add additional test cases.

de6f3b1

Added ability to parse nested character classes while using v flag.

2333c53

Added test cases for intersection

fa5093f

Added intersection support

381b5eb

Added test cases for subtraction --.

ee83c42

Added support for -- subtraction operator.

3664d50

Added test cases for union.

1e05f32

Added support for character class union in regex processing

fe6de2f

Updated dbscheme

c0202f6

Napalys force-pushed the js/ecma-2024-regex branch from 78aa5dc to 9e1f050 Compare March 3, 2025 13:38

asgerf reviewed Mar 4, 2025


Napalys force-pushed the js/ecma-2024-regex branch from d6df34e to 8558ead Compare March 5, 2025 08:33

Napalys added 3 commits March 5, 2025 09:34

Added change note

c7f03df

Added a test case from github#18854

9ea89cd

Renamed character class operators lists to elements.

8099423

Upgraded javascript database schema

d884e5f

Napalys requested a review from asgerf March 5, 2025 11:10

Napalys added 2 commits March 7, 2025 08:32

Add test cases for v flag operators in RegExp library-tests.

9cc2620

Add RegExpIntersection class to support intersection terms in regex

e0f20b2

Napalys force-pushed the js/ecma-2024-regex branch from 6380ec8 to d40ff96 Compare March 10, 2025 10:17

Napalys added 2 commits March 10, 2025 11:18

Add RegExpQuotedString class to support quoted string escapes in regex

8cbc0ae

Add RegExpSubtraction class to support subtraction terms in regex

f48eab9

Napalys force-pushed the js/ecma-2024-regex branch from d40ff96 to f48eab9 Compare March 10, 2025 10:18

asgerf reviewed Mar 10, 2025


Applied changes from comments.

9c8e0a5

Co-authored-by: Asgerf <asgerf@github.com>

Napalys force-pushed the js/ecma-2024-regex branch from a337863 to 9c8e0a5 Compare March 10, 2025 12:29

Napalys requested review from asgerf and erik-krogh March 10, 2025 12:58

erik-krogh reviewed Mar 10, 2025


Improved documentation, removed union from change note.

08c07f8

erik-krogh reviewed Mar 11, 2025


javascript/extractor/src/com/semmle/js/parser/RegExpParser.java Outdated

Update javascript/extractor/src/com/semmle/js/parser/RegExpParser.java

3191b2c

Co-authored-by: Erik Krogh Kristensen <erik-krogh@github.com>

erik-krogh previously approved these changes Mar 11, 2025


asgerf previously approved these changes Mar 11, 2025


javascript/ql/lib/change-notes/2025-03-03-regex-v.md Outdated

Update javascript/ql/lib/change-notes/2025-03-03-regex-v.md

a900f2c

Co-authored-by: Asger F <asgerf@github.com>

Napalys dismissed stale reviews from asgerf and erik-krogh via a900f2c March 11, 2025 10:57

asgerf approved these changes Mar 11, 2025


Napalys merged commit a4f2264 into github:main Mar 11, 2025
14 checks passed

This was referenced Mar 11, 2025

JavaScript: false positive with unicode sets for character classes that contain brackets #18854

Closed

JS: Update database.stats #18981

Merged

Napalys deleted the js/ecma-2024-regex branch March 12, 2025 16:09

Napalys mentioned this pull request Mar 14, 2025

JS: Extractor handle error instead of exiting. #18984

Merged
