Unicode 15 initial data files #171
Merged
I regenerated the UCD files. No changes; in particular, no changes in DerivedNumericTypes/Values.
TestInvariants fails. I sent an email discussing how to deal with incomplete data drops, such as Unihan data for new characters arriving before UnicodeData.txt even has entries for them.
Manishearth previously approved these changes on Dec 2, 2021
Tsengtsz previously approved these changes on Dec 2, 2021
Manishearth previously approved these changes on Dec 3, 2021
Manishearth approved these changes on Dec 4, 2021
Notes from Ken
The repertoire here covers ALL of the 15.0.0 additions, synched to the Pipeline page (and matching Michel's CDAM ballot draft as of 10/31, but not yet incorporating any CDAM ballot comment dispositions from later).
Note that there is one significant departure, to deal with the name collision for U+1DF27 LATIN SMALL LETTER N WITH LEFT HOOK. I've anticipated the most likely outcome and added "RAISED" into the names of 1DF25..1DF2A.
Notes from Ken about the initial drop for PropList.txt
This includes the non-automatic new property assignments:
Added Ideographic and Unified_Ideograph for Extension H (31350..323AF)
Added Other_Alphabetic for one Kannada mark (0CF3), one Khojki vowel
sign (11241), and various Kawi signs and vowel signs (11F00..11F01,
11F03, 11F34..11F3A, 11F3E..11F40).
Added Diacritic for 3 Arabic word signs (10EFD..10EFF) and for the
Cyrillic modifier letters (1E030..1E06C).
Also added Other_Lowercase for the Cyrillic modifier letters.
Added Terminal_Punctuation and Sentence_Terminal for the two Kawi
dandas (11F43..11F44), for general consistency with the way the danda
and double danda are treated in related scripts. The rest of the Kawi
punctuation is really murky, with no real analysis presented in the
proposal, so I didn't make any assumptions that it would play in
sentence break or even be terminal in position. (Kawi is one of the SE
Asian scripts with no word spaces, so it ends up as lb=SA and requires
special handling for paragraph formatting, anyway.)
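As a rough illustration of the range notation used in the notes above, the following Python sketch expands tokens such as 11F34..11F3A into inclusive code point ranges, counts the code points they cover, and prints them in a "range ; property" layout similar to PropList.txt. The helper names (parse_range, count_code_points, proplist_lines) and the column width are assumptions made for this example; this is not the tooling used to generate the actual data files.

```python
def parse_range(token):
    """Parse 'XXXX' or 'XXXX..YYYY' (hex) into an inclusive (start, end) pair."""
    first, sep, last = token.strip().partition("..")
    start = int(first, 16)
    end = int(last, 16) if sep else start
    if end < start:
        raise ValueError(f"range end precedes start: {token!r}")
    return start, end


def count_code_points(tokens):
    """Count the code points covered by a list of range tokens."""
    return sum(end - start + 1 for start, end in map(parse_range, tokens))


def proplist_lines(tokens, prop):
    """Format tokens as 'range ; property' lines (column width is arbitrary)."""
    return [f"{token:<14}; {prop}" for token in tokens]


# Ranges copied from the notes above (hypothetical variable names).
KAWI_OTHER_ALPHABETIC = ["11F00..11F01", "11F03", "11F34..11F3A", "11F3E..11F40"]
EXTENSION_H = ["31350..323AF"]

if __name__ == "__main__":
    for line in proplist_lines(KAWI_OTHER_ALPHABETIC, "Other_Alphabetic"):
        print(line)
    print(count_code_points(KAWI_OTHER_ALPHABETIC))  # 13
    print(count_code_points(EXTENSION_H))            # 4192
```

A count like this can serve as a quick sanity check that a hand-entered range covers the expected number of characters, which is also how a single-dot typo in a range would surface (the token would fail to parse as hex).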
Notes from Markus
2 new sets of decimal digits