EUC-jp encoding/decoding support #59
Comments
As with EUC-KR (form; misc; #62), Chromium's failure in 'form (misc)' for the EUC-JP encoding is likely caused not by Chromium's encoder but by Blink's handling of Cf / Default_Ignorable characters.
One test showed that the code I had written differed from the spec. However, it seems the test expectation differs from the spec, too. Curiously, the spec makes EUC-JP differ from the other two-byte encodings in its handling of a non-ASCII bogus trail byte: 0xA0 and 0xFF get prepended in addition to ASCII. Why is EUC-JP different? Is it intentional?
More concretely, you're saying that
should be
which I think is reasonable, and this was probably just an oversight. Confirmation from @jungshik would be good, but I'm happy to fix this. If I were to fix this, should I make sure I land web-platform-tests at the same time, or are you sorting out web-platform-tests, @hsivonen?
I'm saying that that would be consistent with the other two-byte decoders. I haven't investigated legacy decoder behavior on this point, so I can't at this time say whether it should be consistent that way.
I don't plan to sort out Web Platform Tests. However, I might have to for tests that have already been imported to mozilla-central. (It seems that this one is still in the PR stage on the WPT side.)
Firefox (uconv) and Chrome think
Safari Technology Preview yields "X�X".
(Chrome and Safari don't seem to pick a different font by the way, but that's a different class of bugs.)
OK, in that case, I think the EUC-JP decoder should prepend ASCII bytes only.
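To make the two options discussed above concrete, here is a minimal sketch of a two-byte decoder's error path. It is not the spec algorithm: `TOY_INDEX`, `decode_two_byte`, and the `prepend_non_ascii_bogus_trail` flag are hypothetical names, the index holds a single illustrative entry, and the EUC-JP-specific 0x8E/0x8F lead bytes are left out. "Prepending" a byte is modelled as stepping the read position back so the byte is decoded again.

```python
REPLACEMENT = "\ufffd"

# Hypothetical one-entry index for illustration only (0xA4 0xA2 is HIRAGANA LETTER A).
TOY_INDEX = {(0xA4, 0xA2): "\u3042"}


def decode_two_byte(data: bytes, prepend_non_ascii_bogus_trail: bool) -> str:
    """Toy two-byte decoder that differs only in how a bogus trail byte is treated."""
    out = []
    lead = 0
    i = 0
    while i < len(data):
        b = data[i]
        i += 1
        if lead:
            first, lead = lead, 0
            if 0xA1 <= b <= 0xFE:
                # Well-formed pair; unmapped pairs become U+FFFD in this sketch.
                out.append(TOY_INDEX.get((first, b), REPLACEMENT))
                continue
            # Bogus trail byte: always emit an error for the lead byte.
            out.append(REPLACEMENT)
            if b < 0x80 or prepend_non_ascii_bogus_trail:
                i -= 1  # "prepend": the trail byte is decoded again on its own
            continue
        if b < 0x80:
            out.append(chr(b))
        elif 0xA1 <= b <= 0xFE:
            lead = b
        else:
            out.append(REPLACEMENT)
    if lead:
        out.append(REPLACEMENT)  # input ended in the middle of a pair
    return "".join(out)


sample = bytes([0x58, 0xA4, 0xA0, 0x58])  # hypothetical input: X, lead, bogus trail, X
print(ascii(decode_two_byte(sample, prepend_non_ascii_bogus_trail=False)))
print(ascii(decode_two_byte(sample, prepend_non_ascii_bogus_trail=True)))
```

With ASCII-only prepending, the bogus non-ASCII trail byte is swallowed together with its lead byte and produces one replacement character; with the flag set, the trail byte is decoded again and produces a second one. An ASCII trail byte survives under both policies.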
Prepared a PR.
Firefox Nightly 56 fixed the encoder, but one decoder test still fails.
Today and yesterday I updated the results at https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#eucjp for Firefox, Firefox Nightly, Chrome, and Canary. The latest summary is:
See discussion upthread. Firefox is correct per spec as amended in May.
See whatwg#59 (comment) for context.
Now that Firefox passes all these tests and a year has passed, I'm happy to consider this done. A new issue would also be less noisy at this point, were one warranted.
Results for a series of tests for EUC-JP encoding/decoding can be found at
https://www.w3.org/International/tests/repo/results/encoding-dbl-byte.en#eucjp
The tests can be run from that page (select the link in the left-most column) or obtained from the WPT repo. There is a PR at
web-platform-tests/wpt#3198
The tests check whether:
The following summarises the current situation according to my testing, for major desktop browsers. (I will be adding nightly results and perhaps other browsers in time.) The table lists the number of characters that were NOT successfully converted by the test.
Notes:
Can we please investigate the failures to ascertain whether:
The following tool may be helpful for investigating issues. It converts between byte sequences and characters for all encodings in the Encoding spec. http://r12a.github.io/apps/encodings/
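For quick local checks of the same kind of byte/character conversion, a small sketch using Python's built-in `euc_jp` codec is shown below. The helper names `to_hex_bytes` and `from_hex_bytes` are hypothetical, and Python's codec is an assumption of convenience: it is not guaranteed to match the Encoding spec's EUC-JP index for every character, so treat differences as something to investigate rather than as a verdict.

```python
def to_hex_bytes(text: str, encoding: str = "euc_jp") -> str:
    """Encode text and return the bytes as space-separated hex."""
    return text.encode(encoding).hex(" ")


def from_hex_bytes(hex_bytes: str, encoding: str = "euc_jp") -> str:
    """Decode space-separated hex bytes, replacing malformed sequences with U+FFFD."""
    return bytes.fromhex(hex_bytes).decode(encoding, errors="replace")


print(to_hex_bytes("\u3042"))          # HIRAGANA LETTER A
print(ascii(from_hex_bytes("a4 a2")))  # the same character, decoded from its bytes
```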