Editorial: Fix incorrect use of UnicodeMatchPropertyValue #3587
…atabse file PropertyValueAliases.txt
I have another question. If you search for "scx" in https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt, you will find that there is no record for it. And according to the spec,
All RegExp that have the form of |
Nice observation, @Jack-Works! Unicode property Script_Extensions (scx) is unusual in being set-valued rather than scalar-valued, and as such needs special consideration in our spec. I have added editorial corrections to this PR, and opened #3590 for a potential followup.
1. If _p_ is `Script_Extensions`, then
  1. Assert: _vs_ is a property value or property value alias for property “Script” listed in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>.
  1. Let _v_ be the Set containing the “short name”, “long name”, and any other aliases corresponding with value _vs_ for property “Script” in <a href="https://unicode.org/Public/UCD/latest/ucd/PropertyValueAliases.txt"><code>PropertyValueAliases.txt</code></a>.
  1. Return the CharSet containing all Unicode code points whose character database definition includes the property “Script_Extensions” with value having a non-empty intersection with _v_.
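For readers who want to see the set-valued matching concretely, the steps above can be sketched in Python as follows. This is only an illustration, not the spec algorithm itself: `SCRIPT_ALIASES` and `TOY_SCX` are small hand-made stand-ins for the “Script” rows of `PropertyValueAliases.txt` and for the Script_Extensions data, and the function names are hypothetical.

```python
# Toy stand-in for the "Script" rows of PropertyValueAliases.txt.
# Each entry is the full set of names (short, long, other aliases) for one value.
SCRIPT_ALIASES = [
    frozenset({"Latn", "Latin"}),
    frozenset({"Grek", "Greek"}),
    frozenset({"Copt", "Coptic", "Qaac"}),
]

# Toy stand-in for Script_Extensions data: code point -> set of script names.
# The values are illustrative only and are NOT copied from the real database.
TOY_SCX = {
    0x0041: {"Latn"},
    0x03B1: {"Grek"},
    0x03E2: {"Copt"},
    0xE000: {"Copt", "Grek"},  # hypothetical code point shared by two scripts
}


def alias_set(vs: str) -> frozenset:
    """Return every name that denotes the same Script value as `vs`."""
    for names in SCRIPT_ALIASES:
        if vs in names:
            return names
    # Corresponds to the Assert step: `vs` must be a known Script value or alias.
    raise ValueError(f"{vs!r} is not a known Script value or alias")


def script_extensions_charset(vs: str) -> set:
    """Collect code points whose Script_Extensions set intersects the alias set."""
    v = alias_set(vs)
    return {cp for cp, scx in TOY_SCX.items() if scx & v}
```

Under these assumptions, `script_extensions_charset("Greek")` and `script_extensions_charset("Grek")` yield the same set, and a code point listed under several scripts is matched by each of them. That is the practical difference from a scalar-valued property, where a plain equality check would suffice.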
Do you need to call MaybeSimpleCaseFolding here?
Hmm, that would affect any code point that case-folds across scripts (or to/from Common). I don't know if there are any, but it's easy enough to accommodate. Done.
…cript_Extensions
Fixes #3586
Also includes commits with incidental fixes in nearby algorithms and steps.