Fix Regex Character Class Escape Tests#4423

Merged
ptomato merged 2 commits into tc39:main from Aurele-Barriere:regex-character-class-escape
Mar 13, 2025
@Aurele-Barriere
Contributor

This PR is a follow-up to #4364 and #4195.

For each character class escape (\d, \D, \s, \S, \w, \W), check positive cases (the escape matches all characters it's supposed to match) and negative cases (the escape doesn't match any of the characters it should not match). Each of these checks is also done in Unicode mode and with the v flag.
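As a rough illustration of the positive and negative checks described above, here is a minimal sketch. It is not the generated test262 code: `checkEscape`, the sample strings, and the flag-support guard are all hypothetical and only show the shape of the check.

```javascript
// Hypothetical helper, not part of test262: checks one escape against a
// string whose code points must all match and one whose code points must not.
function checkEscape(source, flags, mustMatch, mustNotMatch) {
  // Anchored: test() succeeds only if every code point in the input matches.
  const all = new RegExp(`^${source}+$`, flags);
  // Unanchored: test() succeeds if any single code point in the input matches.
  const any = new RegExp(source, flags);
  return all.test(mustMatch) && !any.test(mustNotMatch);
}

// Assumption: skip flags the running engine does not support (e.g. v on older engines).
const flagSets = ["", "u", "v"].filter((flags) => {
  try {
    new RegExp("", flags);
    return true;
  } catch {
    return false;
  }
});

const digits = "0123456789";
// U+0660 ARABIC-INDIC DIGIT ZERO is a Unicode digit, but \d only matches 0-9.
const nonDigits = "abc _-\u0660";

const results = flagSets.map((flags) =>
  checkEscape("\\d", flags, digits, nonDigits)
);
console.log(results.every(Boolean));
```

The real tests cover the full set of code points for each escape rather than a handful of samples.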

This uses regenerate.js from the unicode-property-escapes-tests repo to generate strings that contain exactly the characters that are supposed to be matched or not matched for each escape.
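To make the idea of "strings that contain exactly the characters" concrete, the sketch below expands code point ranges into a string. It deliberately does not use regenerate.js: `stringFromRanges` and the hard-coded ranges are hypothetical stand-ins, and the complement is restricted to the Basic Latin block to keep the example small.

```javascript
// Hypothetical helper: expand inclusive [start, end] code point ranges
// into one string containing each code point exactly once.
function stringFromRanges(ranges) {
  let out = "";
  for (const [start, end] of ranges) {
    for (let cp = start; cp <= end; cp++) {
      out += String.fromCodePoint(cp);
    }
  }
  return out;
}

// \w matches [A-Za-z0-9_] when the i flag is not involved.
const wordRanges = [[0x30, 0x39], [0x41, 0x5a], [0x5f, 0x5f], [0x61, 0x7a]];
const wordChars = stringFromRanges(wordRanges);

// Complement of the ranges above, within U+0000..U+007F only.
const nonWordRanges = [
  [0x00, 0x2f], [0x3a, 0x40], [0x5b, 0x5e], [0x60, 0x60], [0x7b, 0x7f],
];
const nonWordChars = stringFromRanges(nonWordRanges);

console.log(/^\w+$/.test(wordChars), /\w/.test(nonWordChars));
```

A set library such as regenerate.js takes care of the range arithmetic (unions, complements) that is written out by hand here.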

Comparison is done with RegExp.prototype.test instead of String.prototype.replace, which makes the tests run faster.
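The difference between the two styles of check can be sketched as follows. This is an assumption about the general shape of each approach, not a copy of the old or new generated tests.

```javascript
const input = "0123456789";

// replace-based check: removes every match, builds a new string,
// then compares the remainder with the empty string.
const viaReplace = input.replace(/\d/g, "") === "";

// test-based check: a single anchored match that returns a boolean directly,
// without building an intermediate string.
const viaTest = /^\d+$/.test(input);

console.log(viaReplace, viaTest);
```

Both checks agree on the result. The second avoids allocating a result string, which matters when the input covers a large share of the Unicode code points.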

This is part of my work at the SYSTEMF lab at EPFL.

@Aurele-Barriere
Contributor Author

Replying to a comment from @ptomato in #4364 here:

> For future reference: when I cleaned up the original test generator, I removed this file that modified the regenerate library's object prototype and instead turned it into regular functions in index.mjs. I would like to minimize the diff in this PR so that it's easier to see what changed, so I've taken the liberty of editing these functions so they don't modify the regenerate prototype and moving them back to index.mjs.

I can't see the changes you've made but I've tried replicating them in the most recent commit, and removed the file regenerate.mjs. Let me know if this is what you had in mind.
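For readers unfamiliar with the distinction, here is a minimal sketch of the two styles. `CodePointSet` and `toTestString` are hypothetical stand-ins. They are not the regenerate API and not the functions in index.mjs.

```javascript
// Hypothetical stand-in for a library-provided set type.
class CodePointSet {
  constructor(codePoints) {
    this.codePoints = codePoints;
  }
}

// Style being avoided: patching the library's prototype, which changes
// every instance everywhere the library is used.
//   CodePointSet.prototype.toTestString = function () { ... };

// Style being preferred: a regular function that takes the set as an argument
// and leaves the library's objects untouched.
function toTestString(set) {
  return String.fromCodePoint(...set.codePoints);
}

const sample = new CodePointSet([0x61, 0x62, 0x63]);
console.log(toTestString(sample));
```

Keeping the helper as a plain function means the library's prototype is left exactly as shipped.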

Avoid modifying the regenerate library object prototype.
@ptomato force-pushed the regex-character-class-escape branch from 074b5fd to 2f8296e on March 13, 2025 18:38
@ptomato
Contributor

ptomato commented Mar 13, 2025

> I can't see the changes you've made but I've tried replicating them in the most recent commit, and removed the file regenerate.mjs. Let me know if this is what you had in mind.

Thanks, that's exactly what I had in mind.

I've pushed an update with some coding style fixes and split the commits into one that modifies the test generator script, and one with the resulting generated tests. I do have some comments/questions remaining so I'll reply inline.

Contributor

@ptomato ptomato left a comment

Actually never mind, I answered my own questions by reading the description of the original PR you opened (#4195). I think this is ready to merge. Thank you very much for sticking with this through the long process of fixing and moving over the test generator script.

@ptomato ptomato merged commit 3f10507 into tc39:main Mar 13, 2025
11 checks passed
@Aurele-Barriere
Contributor Author

Perfect, thanks for the review and edits! Happy to have helped!
