JS: Add ECMAScript 2024 v Flag Operators for Regex Parsing #18899
Merged
27 commits
cb448db Exposed flags to the regex parser (Napalys)
d162acf Added quoted string \q parser test cases (Napalys)
ed418be Add support for '\q{}' escape sequence in regular expressions. (Napalys)
ab7e08f Added test cases for nested character class. (Napalys)
de6f3b1 Add additional test cases. (Napalys)
2333c53 Added ability to parse nested character classes while using `v` flag. (Napalys)
fa5093f Added test cases for intersection (Napalys)
381b5eb Added intersection support (Napalys)
ee83c42 Added test cases for subtraction `--`. (Napalys)
3664d50 Added support for `--` subtraction opetor. (Napalys)
1e05f32 Added test cases for union. (Napalys)
fe6de2f Added support for character class union in regex processing (Napalys)
c0202f6 Updated dbscheme (Napalys)
c7f03df Added change note (Napalys)
9ea89cd Added a test case from #18854 (Napalys)
8099423 Renamed character class operators lists to `elements`. (Napalys)
8086c25 Removed `Union` as standard character class is already an union. (Napalys)
95d05ce Now store `vFlagEnabled` instead of each time searching for it. (Napalys)
d884e5f Upgraded `javascrip` database schema (Napalys)
9cc2620 Add test cases for `v` flag operators in RegExp library-tests. (Napalys)
e0f20b2 Add RegExpIntersection class to support intersection terms in regex (Napalys)
8cbc0ae Add `RegExpQuotedString` class to support quoted string escapes in regex (Napalys)
f48eab9 Add `RegExpSubtraction` class to support subtraction terms in regex (Napalys)
9c8e0a5 Applied changes from comments. (Napalys)
08c07f8 Improved documentation, removed union fram change note. (Napalys)
3191b2c Update javascript/extractor/src/com/semmle/js/parser/RegExpParser.java (Napalys)
a900f2c Update javascript/ql/lib/change-notes/2025-03-03-regex-v.md (Napalys)
1,193 changes: 1,193 additions & 0 deletions
javascript/downgrades/5b5db607d20c7b449cef2d1c926b24d77c69bebb/old.dbscheme
Large diffs are not rendered by default.
1,190 changes: 1,190 additions & 0 deletions
...script/downgrades/5b5db607d20c7b449cef2d1c926b24d77c69bebb/semmlecode.javascript.dbscheme
Large diffs are not rendered by default.
2 changes: 2 additions & 0 deletions
javascript/downgrades/5b5db607d20c7b449cef2d1c926b24d77c69bebb/upgrade.properties
@@ -0,0 +1,2 @@
description: Add support for quoted string, intersection and subtraction
compatibility: backwards
26 changes: 26 additions & 0 deletions
javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassIntersection.java
@@ -0,0 +1,26 @@
package com.semmle.js.ast.regexp;

import com.semmle.js.ast.SourceLocation;
import java.util.List;

/**
 * A character class intersection in a regular expression available only with the `v` flag.
 * Example: [[abc]&&[ab]&&[b]] matches character `b` only.
 */
public class CharacterClassIntersection extends RegExpTerm {
  private final List<RegExpTerm> elements;

  public CharacterClassIntersection(SourceLocation loc, List<RegExpTerm> elements) {
    super(loc, "CharacterClassIntersection");
    this.elements = elements;
  }

  @Override
  public void accept(Visitor v) {
    v.visit(this);
  }

  public List<RegExpTerm> getElements() {
    return elements;
  }
}
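The Javadoc example above can be checked with a small set-based model. This is only a sketch of the matching semantics, not the extractor's code: `IntersectionSketch`, `charsOf` and `intersect` are hypothetical names, and each operand is assumed to be a plain list of characters with no ranges or escapes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of the extractor: models `&&` as set intersection. */
final class IntersectionSketch {
  /** Turns a plain operand such as "abc" into the set of its characters. */
  static Set<Character> charsOf(String operand) {
    Set<Character> chars = new LinkedHashSet<>();
    for (char c : operand.toCharArray()) {
      chars.add(c);
    }
    return chars;
  }

  /** Intersects all operands from left to right; an empty list yields an empty set. */
  static Set<Character> intersect(List<String> operands) {
    if (operands.isEmpty()) {
      return new LinkedHashSet<>();
    }
    Set<Character> result = charsOf(operands.get(0));
    for (String operand : operands.subList(1, operands.size())) {
      result.retainAll(charsOf(operand));
    }
    return result;
  }

  public static void main(String[] args) {
    // Mirrors [[abc]&&[ab]&&[b]] from the Javadoc.
    System.out.println(intersect(List.of("abc", "ab", "b"))); // prints [b]
  }
}
```

Because intersection is commutative and associative, the order of the entries in `elements` does not affect the matched set, which is consistent with storing the operands as a flat list.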
28 changes: 28 additions & 0 deletions
javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassQuotedString.java
@@ -0,0 +1,28 @@
package com.semmle.js.ast.regexp;

import com.semmle.js.ast.SourceLocation;

/**
 * A quoted string escape sequence '\q{}' in a regular expression.
 * This feature is part of ECMAScript 2024 and requires the 'v' flag.
 *
 * Example: [\q{abc|def}] creates a character class that matches either the string
 * "abc" or "def". Within the quoted string, only the alternation operator '|' is supported.
 */
public class CharacterClassQuotedString extends RegExpTerm {
  private final RegExpTerm term;

  public CharacterClassQuotedString(SourceLocation loc, RegExpTerm term) {
    super(loc, "CharacterClassQuotedString");
    this.term = term;
  }

  public RegExpTerm getTerm() {
    return term;
  }

  @Override
  public void accept(Visitor v) {
    v.visit(this);
  }
}
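To illustrate the alternation described in the Javadoc, the following sketch splits the text between the braces of `\q{...}` at each unescaped `|`. It is not the parser from this PR: `QuotedStringSketch` and `splitAlternatives` are hypothetical names, and the sketch treats every escaped character as itself, which ignores escapes with a special meaning such as `\n`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the extractor. */
final class QuotedStringSketch {
  /**
   * Splits the body of a \q{...} escape (the text between the braces) at each
   * unescaped '|'. A backslash makes the following character literal.
   */
  static List<String> splitAlternatives(String body) {
    List<String> alternatives = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 0; i < body.length(); i++) {
      char c = body.charAt(i);
      if (c == '\\' && i + 1 < body.length()) {
        current.append(body.charAt(++i)); // keep the escaped character literally
      } else if (c == '|') {
        alternatives.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    alternatives.add(current.toString());
    return alternatives;
  }

  public static void main(String[] args) {
    System.out.println(splitAlternatives("abc|def")); // prints [abc, def]
    // Java source for the body cc|\}a|cc
    System.out.println(splitAlternatives("cc|\\}a|cc")); // prints [cc, }a, cc]
  }
}
```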
26 changes: 26 additions & 0 deletions
javascript/extractor/src/com/semmle/js/ast/regexp/CharacterClassSubtraction.java
@@ -0,0 +1,26 @@
package com.semmle.js.ast.regexp;

import com.semmle.js.ast.SourceLocation;
import java.util.List;

/**
 * A character class subtraction in a regular expression available only with the `v` flag.
 * Example: [[abc]--[a]--[b]] matches character `c` only.
 */
public class CharacterClassSubtraction extends RegExpTerm {
  private final List<RegExpTerm> elements;

  public CharacterClassSubtraction(SourceLocation loc, List<RegExpTerm> elements) {
    super(loc, "CharacterClassSubtraction");
    this.elements = elements;
  }

  @Override
  public void accept(Visitor v) {
    v.visit(this);
  }

  public List<RegExpTerm> getElements() {
    return elements;
  }
}
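Unlike intersection, subtraction depends on operand order: the first element is the base set and every later element is removed from it. A minimal sketch of that reading, assuming plain character operands and using the hypothetical names `SubtractionSketch` and `subtract`:

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper, not part of the extractor: models `--` as set difference. */
final class SubtractionSketch {
  /** Starts from the first operand and removes the characters of every later operand. */
  static Set<Character> subtract(List<String> operands) {
    Set<Character> result = new LinkedHashSet<>();
    if (operands.isEmpty()) {
      return result;
    }
    for (char c : operands.get(0).toCharArray()) {
      result.add(c);
    }
    for (String operand : operands.subList(1, operands.size())) {
      for (char c : operand.toCharArray()) {
        result.remove(c);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Mirrors [[abc]--[a]--[b]] from the Javadoc.
    System.out.println(subtract(List.of("abc", "a", "b"))); // prints [c]
  }
}
```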
2 changes: 2 additions & 0 deletions
javascript/extractor/tests/es2024/input/additional_test_cases.js
@@ -0,0 +1,2 @@
/^p(ost)?[ |\.]*o(ffice)?[ |\.]*(box)?[ 0-9]*[^[a-z ]]*/g;
/([ ]*[a-z0-9&#*=?@\\><:,()$[\]_.{}!+%^-]+)+X/;
@@ -0,0 +1,7 @@
/[[abc]&&[bcd]]/v; // Valid use of intersection operator, matches b or c
/abc&&bcd/v; //Valid regex, but no intersection operation: Matches the literal string "abc&&bcd"
/[abc]&&[bcd]/v; // Valid regex, but incorrect intersection operation:
                 // - Matches a single character from [abc]
                 // - Then the literal "&&"
                 // - Then a single character from [bcd]
/[[abc]&&[bcd]&&[c]]/v; // Valid use of intersection operator, matches c
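The cases above hinge on where `&&` appears: it is an operator only directly inside a character class, and literal text elsewhere. One way to picture the operand boundaries is to split a class body at operators that sit at nesting depth zero. The sketch below does only that; `OperandSplitSketch` and `splitTopLevel` are hypothetical names, the input is assumed to be the text between the outermost brackets, and `\q{...}` contents are not treated specially.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of the extractor. */
final class OperandSplitSketch {
  /**
   * Splits the body of a character class (the text between the outermost
   * brackets) at every occurrence of the operator that is not inside a nested class.
   */
  static List<String> splitTopLevel(String classBody, String operator) {
    List<String> operands = new ArrayList<>();
    int depth = 0;
    int start = 0;
    int i = 0;
    while (i < classBody.length()) {
      char c = classBody.charAt(i);
      if (c == '\\') {
        i += 2; // skip the backslash and the escaped character
        continue;
      }
      if (c == '[') {
        depth++;
      } else if (c == ']') {
        depth--;
      } else if (depth == 0 && classBody.startsWith(operator, i)) {
        operands.add(classBody.substring(start, i));
        i += operator.length();
        start = i;
        continue;
      }
      i++;
    }
    operands.add(classBody.substring(start));
    return operands;
  }

  public static void main(String[] args) {
    // Body of /[[abc]&&[bcd]&&[c]]/v
    System.out.println(splitTopLevel("[abc]&&[bcd]&&[c]", "&&")); // prints [[abc], [bcd], [c]]
  }
}
```

For `/[abc]&&[bcd]/v` the `&&` lies outside any class, so a splitter like this would never see it and the pattern keeps its literal meaning.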
3 changes: 3 additions & 0 deletions
javascript/extractor/tests/es2024/input/regex_nested_character_class.js
@@ -0,0 +1,3 @@
/[[]]/v; //Previously not allowed to nest character classes now completely valid with v flag.
/[[a]]/v;
/[ [] [ [] [] ] ]/v;
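Nesting under the `v` flag means a parser has to track how many classes are currently open. As a rough illustration, the sketch below computes the deepest nesting level of a pattern and rejects unbalanced brackets. `NestingSketch` and `maxDepth` are hypothetical names, and the sketch only understands brackets and backslash escapes.

```java
/** Hypothetical helper, not part of the extractor. */
final class NestingSketch {
  /** Returns the deepest character class nesting level found in the pattern source. */
  static int maxDepth(String pattern) {
    int depth = 0;
    int max = 0;
    for (int i = 0; i < pattern.length(); i++) {
      char c = pattern.charAt(i);
      if (c == '\\') {
        i++; // skip the escaped character
        continue;
      }
      if (c == '[') {
        depth++;
        max = Math.max(max, depth);
      } else if (c == ']') {
        if (depth == 0) {
          throw new IllegalArgumentException("unbalanced ']' at index " + i);
        }
        depth--;
      }
    }
    if (depth != 0) {
      throw new IllegalArgumentException("unclosed '['");
    }
    return max;
  }

  public static void main(String[] args) {
    System.out.println(maxDepth("[[]]")); // prints 2
    System.out.println(maxDepth("[ [] [ [] [] ] ]")); // prints 3
  }
}
```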
5 changes: 5 additions & 0 deletions
javascript/extractor/tests/es2024/input/regex_quoted_string.js
@@ -0,0 +1,5 @@
/[\q{abc}]/v;
/[\q{abc|cbd|dcb}]/v;
/[\q{\}}]/v;
/[\q{\{}]/v;
/[\q{cc|\}a|cc}]/v;
@@ -0,0 +1,3 @@
/[\p{Script_Extensions=Greek}--\p{Letter}]/v;
/[[abc]--[cbd]]/v;
/[[abc]--[cbd]--[bde]]/v;
@@ -0,0 +1 @@
const regex = /\b(?:https?:\/\/|mailto:|www\.)(?:[\S--[\p{P}<>]]|\/|[\S--[\[\]]]+[\S--[\p{P}<>]])+|\b[\S--[@\p{Ps}\p{Pe}<>]]+@([\S--[\p{P}<>]]+(?:\.[\S--[\p{P}<>]]+)+)/gmv;
@@ -0,0 +1,6 @@
/[\p{Script_Extensions=Greek}\p{RGI_Emoji}]/v;
/[[abc][cbd]]/v;
/[\p{Emoji}\q{a&}byz]/v;
/[\q{\\\}a&}byz]/v;
/[\q{\\}]/v;
/[\q{abc|cbd|\}}]/v;
@@ -0,0 +1,3 @@
{
  "experimental": true
}