GB 18030 2000 vs 2005 #22

Closed
jungshik opened this issue Dec 10, 2015 · 5 comments

@jungshik

This is the continuation of https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c11

I forgot to reply to @annevk's question there:

> Jungshik, do you mean you want to make the swap mentioned at the end of comment 5?

> GB 18030   -2005  -2000
> 0xA8BC     U+1E3F U+E7C7
> 0x8135F437 U+E7C7 U+1E3F

My answer would be yes. Chrome, Safari and Opera do that. Firefox and IE do not.

My goal is to minimize the number of PUA code points after decoding, partly because there will be no font support for those PUA code points on platforms like Android and iOS (and even on Windows 10 when additional fonts are installed for legacy compatibility). That is, old fonts like SimSun support them, but newer fonts like Microsoft YaHei do not.

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28740#c1 lists them, and I thought that there were a bunch of PUA code point mappings that were dropped in GB 18030:2005 in favor of the regular Unicode code points.

According to Masatoshi Kimura, it's only U+1E3F for 0xA8BC that moved out of the PUA area in GB 18030:2005, which is a big disappointment. (I wish GB 18030 had taken a step similar to the one HKSCS took when it comes to PUA.)

Anyway, at least one code point (0xA8BC <=> U+1E3F) should be mapped to a regular Unicode code point per GB18030:2005 instead of 2000.

@annevk
Member

annevk commented Dec 16, 2015

In terms of the standard, the proposal here is to replace (7533, 0xE7C7) in https://encoding.spec.whatwg.org/index-gb18030.txt with (7533, 0x1E3F). I would be okay with that. Paging @hsivonen and @travisleithead as a heads up.
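For anyone checking the numbers: pointer 7533 is what the two-byte branch of the gb18030 decoder computes for 0xA8BC. A minimal sketch of that arithmetic, assuming the pointer formula from the Encoding Standard's gb18030 decoder (the function name is made up for illustration):

```python
def gb18030_two_byte_pointer(lead: int, trail: int) -> int:
    """Compute the index gb18030 pointer for a two-byte sequence.

    Hypothetical helper; follows the formula
    (lead - 0x81) * 190 + (trail - offset) from the gb18030 decoder.
    """
    if not 0x81 <= lead <= 0xFE:
        raise ValueError("lead byte out of range")
    if not (0x40 <= trail <= 0x7E or 0x80 <= trail <= 0xFE):
        raise ValueError("trail byte out of range")
    # Trail bytes skip 0x7F, so the upper half uses a larger offset.
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


print(gb18030_two_byte_pointer(0xA8, 0xBC))  # 7533
```

So the proposed change touches exactly the index entry that 0xA8BC decodes through.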

@annevk
Member

annevk commented Jan 6, 2016

@vyv03354 I don't understand #17 (comment) since it seems these code points round trip fine at the moment. Did you mean that if I make the change I suggested above we have a new problem unless I change something else too?

@vyv03354
Collaborator

vyv03354 commented Jan 6, 2016

If we only changed the mapping for 0xA8BC, the mapping table would no longer have U+E7C7. We should also change the mapping for 0x8135F437.
That said, it may not be a big deal because we already do not have U+E5E5.
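To make the round-trip concern concrete, here is a toy sketch. The three dictionaries only model the two byte sequences discussed in this issue (they are not the real index), and `unencodable` is a made-up helper:

```python
# Toy decode tables: byte sequence (hex string) -> code point.
# Only the two entries under discussion are modelled.
current = {"A8BC": 0xE7C7, "8135F437": 0x1E3F}
only_a8bc_changed = {"A8BC": 0x1E3F, "8135F437": 0x1E3F}
both_swapped = {"A8BC": 0x1E3F, "8135F437": 0xE7C7}


def unencodable(decode_table: dict, code_points: list) -> list:
    """Return the code points that no byte sequence decodes to."""
    reachable = set(decode_table.values())
    return sorted(cp for cp in code_points if cp not in reachable)


interesting = [0x1E3F, 0xE7C7]
for name, table in [("current", current),
                    ("only 0xA8BC changed", only_a8bc_changed),
                    ("both swapped", both_swapped)]:
    missing = [f"U+{cp:04X}" for cp in unencodable(table, interesting)]
    print(name, "->", missing or "nothing lost")
```

With only 0xA8BC changed, U+E7C7 drops out of the table; swapping both entries keeps both code points reachable.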

@annevk
Member

annevk commented Jan 6, 2016

You're right. And I think we cannot simply adjust the gb18030 ranges, so we would have to hard-code it. 0x8135F437 becomes pointer 7457, so we could special-case that in https://encoding.spec.whatwg.org/#index-gb18030-ranges-code-point (simply return U+E7C7 for that pointer). And then we would have to do the same in https://encoding.spec.whatwg.org/#index-gb18030-ranges-pointer if we wanted to keep round-tripping this code point (if code point is U+E7C7, return 7457).

So this would result in an uglier algorithm, but if you all think it's worth it that's fine with me.
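A rough sketch of what the special casing could look like, assuming the four-byte pointer formula from the gb18030 decoder. The function names are hypothetical, and the generic range lookups are passed in as callables rather than reproduced here:

```python
from typing import Callable, Optional

SPECIAL_POINTER = 7457       # pointer for 0x81 0x35 0xF4 0x37
SPECIAL_CODE_POINT = 0xE7C7  # PUA code point to keep round-tripping


def four_byte_pointer(b1: int, b2: int, b3: int, b4: int) -> int:
    """Pointer for a four-byte gb18030 sequence (decoder formula)."""
    return ((b1 - 0x81) * (10 * 126 * 10)
            + (b2 - 0x30) * (10 * 126)
            + (b3 - 0x81) * 10
            + (b4 - 0x30))


def ranges_code_point(pointer: int,
                      generic: Callable[[int], Optional[int]]) -> Optional[int]:
    """Hard-coded case first, then the ordinary ranges lookup."""
    if pointer == SPECIAL_POINTER:
        return SPECIAL_CODE_POINT
    return generic(pointer)


def ranges_pointer(code_point: int,
                   generic: Callable[[int], int]) -> int:
    """Inverse direction, so that U+E7C7 still encodes to 0x8135F437."""
    if code_point == SPECIAL_CODE_POINT:
        return SPECIAL_POINTER
    return generic(code_point)


print(four_byte_pointer(0x81, 0x35, 0xF4, 0x37))  # 7457
```

Both directions need the exception, otherwise decoding 0x8135F437 and re-encoding the result would not give the same bytes back.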

annevk added a commit that referenced this issue Jan 6, 2016
This changes a single mapping in index gb18030 and special cases a
lookup in the “index gb18030 ranges code point” and “index gb18030
ranges pointer” algorithms.
@annevk
Member

annevk commented Jan 6, 2016

I created a PR for my proposal in #26. I would appreciate review before landing this.
