<regex>: Cache bitmasks of negated character classes during matching #5487

Merged

Conversation

muellerj2
Contributor

Follow-up to #5403: Now that the matcher class has been renamed, we can add member variables that cache the bitmasks for negated character classes whenever such classes appear in the regex. This avoids a class lookup each time the matcher encounters one of these character classes during matching.
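
To illustrate the caching idea, here is a minimal sketch built only on the public `std::regex_traits` interface. It is not the STL's internal matcher: the class name `NegatedClassCache`, its members, and the choice of the `w`, `d`, and `s` classes are assumptions made for this example. The masks are looked up once at construction, so each later check is a single `isctype` call.

```cpp
#include <cassert>
#include <regex>

// Hypothetical illustration: look up the masks behind \W, \D and \S once
// and reuse them, instead of calling lookup_classname on every match attempt.
class NegatedClassCache {
public:
    typedef std::regex_traits<char> Traits;
    typedef Traits::char_class_type Mask;

    explicit NegatedClassCache(const Traits& traits) : traits_(traits) {
        const char w[] = "w";
        const char d[] = "d";
        const char s[] = "s";
        word_mask_  = traits_.lookup_classname(w, w + 1);
        digit_mask_ = traits_.lookup_classname(d, d + 1);
        space_mask_ = traits_.lookup_classname(s, s + 1);
        // lookup_classname returns a value-initialized mask for unknown names.
        assert(word_mask_ != Mask());
    }

    // A negated class matches exactly when the cached mask does not.
    bool is_not_word(char ch) const  { return !traits_.isctype(ch, word_mask_); }
    bool is_not_digit(char ch) const { return !traits_.isctype(ch, digit_mask_); }
    bool is_not_space(char ch) const { return !traits_.isctype(ch, space_mask_); }

private:
    Traits traits_;     // copy of the traits object used for classification
    Mask word_mask_;    // cached mask for "w"
    Mask digit_mask_;   // cached mask for "d"
    Mask space_mask_;   // cached mask for "s"
};
```

A caller would construct the cache once per match operation and then call, for example, `cache.is_not_word(ch)` for each candidate character.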

I also removed the duplicated class-matching logic in _Skip: keeping _Skip and _Do_class in sync was annoying, and the logic in _Skip is not nearly as well tested as the logic in _Do_class.

To fuse the duplicated logic, I changed _Do_class so that it no longer modifies the matcher's current position. Instead, it now takes the position at which the class should start matching and returns the position just past the sequence the class actually matched. (Note that this sequence can be longer than one character because of collating elements.) If the character class doesn't match, _Do_class just returns the starting position.

The current matcher position is now updated in _Match_pat after calling _Do_class, while _Skip just checks whether _Do_class returns the starting position or not.
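
The contract described above can be sketched in a simplified form. This is not the STL's code: `CharClass`, `do_class`, `Matcher`, `match_class`, and `skip_to_class` are hypothetical stand-ins, positions are plain indices into a `std::string`, and negation is left out to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical character class: single characters plus multi-character
// collating elements such as "ch".
struct CharClass {
    std::string singles;
    std::vector<std::string> collating;
};

// Stateless analogue of the reworked _Do_class: returns the position just
// past the matched sequence, or `first` itself when the class does not match.
std::size_t do_class(const CharClass& cls, const std::string& text, std::size_t first) {
    assert(first <= text.size());
    if (first >= text.size()) {
        return first;
    }
    for (std::size_t i = 0; i < cls.collating.size(); ++i) {
        const std::string& elem = cls.collating[i];
        if (!elem.empty() && text.compare(first, elem.size(), elem) == 0) {
            return first + elem.size(); // may consume more than one character
        }
    }
    if (cls.singles.find(text[first]) != std::string::npos) {
        return first + 1;
    }
    return first;
}

struct Matcher {
    std::string text;
    std::size_t current;

    explicit Matcher(const std::string& t) : text(t), current(0) {}

    // Analogue of the _Match_pat side: the caller advances the position.
    bool match_class(const CharClass& cls) {
        const std::size_t next = do_class(cls, text, current);
        if (next == current) {
            return false;
        }
        current = next;
        return true;
    }

    // Analogue of the _Skip side: only asks whether do_class moved, and
    // leaves the matcher's current position untouched.
    std::size_t skip_to_class(const CharClass& cls, std::size_t from) const {
        for (; from < text.size(); ++from) {
            if (do_class(cls, text, from) != from) {
                return from;
            }
        }
        return text.size();
    }
};
```

Because `do_class` has no side effects, both callers share one implementation, and only `match_class` decides when the current position changes.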

@muellerj2 muellerj2 requested a review from a team as a code owner May 10, 2025 17:28
@github-project-automation github-project-automation bot moved this to Initial Review in STL Code Reviews May 10, 2025
@StephanTLavavej StephanTLavavej added the performance (Must go faster) and regex (meow is a substring of homeowner) labels May 11, 2025
@StephanTLavavej StephanTLavavej self-assigned this May 11, 2025
@StephanTLavavej
Member

This is great, thank you! 😻

@StephanTLavavej StephanTLavavej removed their assignment May 15, 2025
@StephanTLavavej StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews May 15, 2025
@StephanTLavavej StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews May 16, 2025
@StephanTLavavej
Member

I'm mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej merged commit e9912d8 into microsoft:main May 17, 2025
39 checks passed
@github-project-automation github-project-automation bot moved this from Merging to Done in STL Code Reviews May 17, 2025
@StephanTLavavej
Member

💵 0️⃣ 1️⃣ 🤿

Labels
performance (Must go faster), regex (meow is a substring of homeowner)