GB 18030 2000 vs 2005 #22
Comments
In terms of the standard, the proposal here is to replace (7533, 0xE7C7) in https://encoding.spec.whatwg.org/index-gb18030.txt with (7533, 0x1E3F). I would be okay with that. Paging @hsivonen and @travisleithead as a heads up.
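For readers wondering where pointer 7533 comes from, here is a minimal sketch of the two-byte pointer arithmetic as I understand it from the gb18030 decoder in the Encoding Standard. The function name `two_byte_pointer` is hypothetical and not taken from the spec.

```python
def two_byte_pointer(lead: int, trail: int) -> int:
    """Compute the index gb18030 pointer for a two-byte sequence (sketch)."""
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError("trail byte out of range")
    # Trail byte 0x7F is skipped, so bytes above it shift down by one.
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


# 0xA8BC is the byte sequence discussed in this issue.
print(two_byte_pointer(0xA8, 0xBC))  # 7533
```

So the entry at pointer 7533 is the one that decides whether 0xA8BC decodes to U+E7C7 or U+1E3F.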
@vyv03354 I don't understand #17 (comment) since it seems these code points round trip fine at the moment. Did you mean that if I make the change I suggested above we have a new problem unless I change something else too?
If we only changed the mapping for 0xA8BC, the mapping table would no longer have U+E7C7. We should also change the mapping for 0x8135F437.
You're right. And I think we cannot simply adjust the gb18030 ranges, so we would have to hard-code it. 0x8135F437 becomes pointer 7457, so we could special-case that in https://encoding.spec.whatwg.org/#index-gb18030-ranges-code-point (simply return U+E7C7 for that pointer). And then we would have to do the same in https://encoding.spec.whatwg.org/#index-gb18030-ranges-pointer if we wanted to keep round-tripping this code point (if the code point is U+E7C7, return 7457). So this would result in an uglier algorithm, but if you all think it's worth it, that's fine with me.
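A rough sketch of what that special-casing could look like, in Python. All function and constant names are hypothetical, `TOY_RANGES` is an illustrative stand-in rather than the real index gb18030 ranges table, and the spec's out-of-range checks are omitted. Only the four-byte pointer arithmetic and the two hard-coded values (7457 and U+E7C7) come from the discussion above.

```python
from bisect import bisect_right

SPECIAL_POINTER = 7457       # pointer for 0x8135F437
SPECIAL_CODE_POINT = 0xE7C7  # PUA code point that must keep round-tripping

# Illustrative (pointer, code point) pairs only, NOT the real ranges index.
TOY_RANGES = [(0, 0x0080), (100, 0x0200)]


def four_byte_pointer(b1: int, b2: int, b3: int, b4: int) -> int:
    """Pointer for a four-byte gb18030 sequence."""
    return ((b1 - 0x81) * 12600 + (b2 - 0x30) * 1260
            + (b3 - 0x81) * 10 + (b4 - 0x30))


def ranges_code_point(pointer, ranges):
    """Pointer -> code point, with the hard-coded exception checked first."""
    if pointer == SPECIAL_POINTER:
        return SPECIAL_CODE_POINT
    # Find the last entry whose pointer is <= the given pointer.
    i = bisect_right([p for p, _ in ranges], pointer) - 1
    if i < 0:
        return None
    pointer_offset, code_point_offset = ranges[i]
    return code_point_offset + pointer - pointer_offset


def ranges_pointer(code_point, ranges):
    """Code point -> pointer, mirroring the exception so it round-trips."""
    if code_point == SPECIAL_CODE_POINT:
        return SPECIAL_POINTER
    # Find the last entry whose code point is <= the given code point.
    i = bisect_right([c for _, c in ranges], code_point) - 1
    if i < 0:
        return None
    pointer_offset, code_point_offset = ranges[i]
    return pointer_offset + code_point - code_point_offset


print(four_byte_pointer(0x81, 0x35, 0xF4, 0x37))  # 7457
```

Checking the exception before the range search in both directions is what keeps U+E7C7 round-tripping through 0x8135F437 once pointer 7533 in the main index maps to U+1E3F.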
This changes a single mapping in index gb18030 and special cases a lookup in the “index gb18030 ranges code point” and “index gb18030 ranges pointer” algorithms.
I created a PR for my proposal in #26. I would appreciate review before landing this.
This is the continuation of https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c11
I forgot to reply to @annevk's question there:
My answer would be yes. Chrome, Safari and Opera do that. Firefox and IE do not.
My goal is to minimize the number of PUA code points after decoding, partly because there'll be NO font support for those PUA code points on platforms like Android and iOS (and even on Windows 10 when additional fonts are installed for legacy compatibility; that is, old fonts like SimSun support them, but newer fonts like Microsoft YaHei do not).
https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c1 lists them, and I had thought that a number of PUA code point mappings were dropped in GB 18030:2005 in favor of regular Unicode code points.
According to Masatoshi Kimura, only the mapping for 0xA8BC moved out of the PUA area in GB 18030:2005 (to U+1E3F), which is a big disappointment. (I wish GB 18030 had taken a step similar to the one HKSCS took when it comes to PUA.)
Anyway, at least one code point (0xA8BC <=> U+1E3F) should be mapped to a regular Unicode code point per GB18030:2005 instead of 2000.