Handle all non-Unicode character class escapes natively #439
Before this change we did not resolve character class escapes (with the exception of `\d`) to their possible values. This meant that a pattern like `/\w+\W+/` would be considered vulnerable even though `/[a-zA-Z0-9_]+[^a-zA-Z0-9_]+/` would not, because the tool didn't know that there is no overlap in possible values between `\w` and `\W`. Note there was some logic that would cancel out exact inversions if the pattern was written like `/\w+[^\w]+/`.

All non-Unicode character class escapes are now resolved to their possible values.
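
To illustrate the idea of resolving escapes to their possible values, here is a minimal sketch, not the code from this change. It models a character set as sorted, inclusive ranges of UTF-16 code units and checks two sets for overlap. The names `Range`, `invert`, `overlaps` and `ESCAPES` are hypothetical, `\s` and `\S` are left out for brevity, and regex flags such as `i` are ignored.

```typescript
// A character set as sorted, non-overlapping, inclusive code unit ranges.
type Range = readonly [number, number];

// Largest UTF-16 code unit; this sketch does not handle astral code points.
const MAX_CODE_UNIT = 0xffff;

// Build a range from one or two single-character strings.
const r = (from: string, to: string = from): Range => [
  from.charCodeAt(0),
  to.charCodeAt(0),
];

// Complement of a sorted, non-overlapping list of ranges.
function invert(ranges: readonly Range[]): Range[] {
  const out: Range[] = [];
  let next = 0;
  for (const [lo, hi] of ranges) {
    if (lo > next) out.push([next, lo - 1]);
    next = hi + 1;
  }
  if (next <= MAX_CODE_UNIT) out.push([next, MAX_CODE_UNIT]);
  return out;
}

// True if any value is contained in both sets.
function overlaps(a: readonly Range[], b: readonly Range[]): boolean {
  return a.some(([aLo, aHi]) =>
    b.some(([bLo, bHi]) => aLo <= bHi && bLo <= aHi),
  );
}

const DIGIT: Range[] = [r("0", "9")];
const WORD: Range[] = [r("0", "9"), r("A", "Z"), r("_"), r("a", "z")];

// Each escape resolved to the values it can match.
const ESCAPES = {
  d: DIGIT,
  D: invert(DIGIT),
  w: WORD,
  W: invert(WORD),
};

console.log(overlaps(ESCAPES.w, ESCAPES.W)); // false: \w+\W+ cannot compete for the same character
console.log(overlaps(ESCAPES.w, ESCAPES.d)); // true: \w and \d share the digits
```

With the sets resolved like this, `\w` followed by `\W` can be treated the same way as `[a-zA-Z0-9_]` followed by `[^a-zA-Z0-9_]`.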
Unicode property escapes are not expanded. I'm not sure how to safely support those, given that the contents of Unicode properties can change over time when new Unicode versions are released. There is also currently no API to ask the browser which version of Unicode it is on or what the contents of a Unicode property are.
This means that `/^[\w+-]+(?:\.[\w+-]+)*@[\da-zA-Z]+(?:[.-][\da-zA-Z]+)*\.[a-zA-Z]{2,}$/u`, which is used in some places for validating email addresses, is now marked as safe, where it previously wasn't.
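
For reference, a small sketch of how that pattern behaves on a few inputs. The constant name `emailPattern` and the sample addresses are made up for illustration.

```typescript
// The email validation pattern quoted above, unchanged.
const emailPattern: RegExp =
  /^[\w+-]+(?:\.[\w+-]+)*@[\da-zA-Z]+(?:[.-][\da-zA-Z]+)*\.[a-zA-Z]{2,}$/u;

// Illustrative inputs only.
const samples: string[] = [
  "user.name+tag@example.com", // dotted local part with a plus tag
  "no-at-sign.example.com", // missing "@"
  "a@b.c", // top-level part shorter than two letters
];

for (const sample of samples) {
  console.log(sample, emailPattern.test(sample));
}
// user.name+tag@example.com true
// no-at-sign.example.com false
// a@b.c false
```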