Rare Han characters #86

Open
xfq opened this issue Oct 17, 2023 · 2 comments

@xfq
Member

xfq commented Oct 17, 2023

https://w3c.github.io/typography/#charset

Maybe we could mention rare Han characters here. In China, many people are unable to open a bank account online, buy train tickets online, or even buy cars and apartments because of uncommon characters in their names.

Problems include but are not limited to:

  • The rare character is not encoded in Unicode

  • The IMEs and fonts don’t support these rare characters

  • When GBK was defined, Unicode did not yet include some of these characters, so GBK assigned them codes in its user-defined area. After Unicode later encoded these characters, different input methods output different code points for them.

  • Different systems use different Private Use Area (PUA) code points for some rare characters, so one character ends up with multiple code points. Because these systems also use different input methods that output different Unicode code points for the same character, name comparison across systems fails.

  • Because the rare character is not encoded, people work around the problem by writing it in pinyin instead: all capitals, initial capital, all lowercase, and other variants. This solves the problem temporarily, but the name then fails cross-system comparison.
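
To make the comparison failures above concrete, here is a minimal Python sketch. It assumes a made-up legacy system that stores a rare character at the PUA code point U+E000 while another system uses an encoded ideograph (U+9FB4 is used purely as a stand-in). The mapping table and the function names are hypothetical; real PUA assignments are system-specific and would have to be collected per system.

```python
import unicodedata

# HYPOTHETICAL mapping from one legacy system's PUA code points to the
# standard Unicode code points. This entry is made up for illustration.
HYPOTHETICAL_PUA_TO_STANDARD = {"\uE000": "\u9FB4"}


def private_use_chars(name):
    """Return the characters in `name` whose general category is Co (Private Use)."""
    return [ch for ch in name if unicodedata.category(ch) == "Co"]


def canonical_name(name, pua_map=HYPOTHETICAL_PUA_TO_STANDARD):
    """Replace known PUA code points with standard ones, then normalize to NFC."""
    mapped = "".join(pua_map.get(ch, ch) for ch in name)
    return unicodedata.normalize("NFC", mapped)


def pinyin_key(name):
    """Fold case and drop whitespace so capitalization variants of pinyin compare equal."""
    return "".join(name.split()).casefold()


name_a = "王\uE000"  # system A: rare character stored in the PUA
name_b = "王\u9FB4"  # system B: same character at its encoded code point

print(name_a == name_b)                                   # naive comparison
print(private_use_chars(name_a))                          # flags the PUA character
print(canonical_name(name_a) == canonical_name(name_b))   # comparison after mapping
```

A check like `private_use_chars` can only flag names that need attention; it cannot say which character was meant, since PUA code points have no standard meaning. Likewise, `pinyin_key` only reconciles capitalization and spacing, not a pinyin spelling against the Han character it replaced.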

Things are steadily getting better, but there are still gaps in support for rare Han characters.

@xfq
Member Author

xfq commented Oct 30, 2023

We might want to consider writing a gap report for this. I'll record some relevant information in this issue.

@xfq
Member Author

xfq commented Oct 30, 2023

Mobile phone manufacturers have finally started to implement GB18030-2022 level 3. The MiSans font family added 60,340 new characters to comply with the latest GB18030-2022 national standard.

There needs to be a free and open source font that supports all characters currently used in personal names. The new MiSans L3 font is an improvement, but it's not enough.
