Suggest GBK instead of gb18030 for Simplified Chinese fallback #4557
Comments
What happens if the form submission contains a character that cannot be represented in GBK though, and requires GB 18030?
Characters that are not representable in GBK are converted to decimal numeric character references. Since failing to label the encoding is a legacy authoring error, it seems implausible that more of these erroneous sites would break because non-GBK characters are not submitted as gb18030 than would break because non-GBK gb18030 characters are submitted to a form handler that expects only GBK. We don't have actual data for this. The Firefox experience with problems caused by submitting 3-byte EUC-JP sequences was extrapolated to make Big5 encode asymmetric with decode, and to keep GBK as an encoding distinct from gb18030, with asymmetric encode and decode.
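To illustrate the numeric character reference fallback described above, here is a minimal Python sketch. It assumes Python's built-in `gbk` codec as a stand-in for the Encoding Standard's GBK encoder (the two are not identical in every detail), and `encode_form_value` is a hypothetical helper name, not something a browser exposes.

```python
def encode_form_value(value: str, encoding: str = "gbk") -> bytes:
    """Encode a form value, replacing unencodable characters with
    decimal numeric character references (e.g. &#131072;)."""
    # "xmlcharrefreplace" is Python's built-in error handler that emits
    # decimal references, similar in spirit to the form-submission
    # fallback described in the comment above.
    return value.encode(encoding, errors="xmlcharrefreplace")


# U+4E2D is representable in GBK; U+20000 (CJK Extension B) is not.
sample = "\u4e2d\U00020000"
print(encode_form_value(sample))  # b'\xd6\xd0&#131072;'
```

A handler that expects only GBK receives two ordinary GBK bytes followed by ASCII, rather than a 4-byte sequence it may not understand.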
Anyway, if we believe that the Encoding Standard made the right call on GBK, what the HTML Standard says makes no sense.
It's equivalent for decoding, but gives more conservative encoding that's likely to be more compatible. Fixes #4557.
In the table under step 8 of https://html.spec.whatwg.org/#determining-the-character-encoding , change gb18030 to GBK. Both decode the same way per the Encoding Standard, but GBK doesn't generate 4-byte sequences in form submission. Sites that are legacy enough to have unlabeled content might not be able to deal with the 4-byte sequences.
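The encode-side difference can be sketched with Python's built-in codecs. This is only an approximation: Python's `gbk` decoder is narrower than the Encoding Standard's GBK decoder (which decodes exactly like gb18030), so the sketch compares only a 2-byte character that both codecs share. `encoded_length` is a hypothetical helper for illustration.

```python
from typing import Optional


def encoded_length(ch: str, encoding: str) -> Optional[int]:
    """Return the number of bytes `ch` occupies in `encoding`,
    or None if the encoding cannot represent it."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


common = "\u4e2d"         # within the GBK repertoire
rare = "\U00020000"       # CJK Extension B, outside GBK

# Both encodings agree on the shared 2-byte range, in both directions.
assert common.encode("gbk") == common.encode("gb18030")
assert b"\xd6\xd0".decode("gbk") == b"\xd6\xd0".decode("gb18030")

print(encoded_length(common, "gbk"), encoded_length(common, "gb18030"))
print(encoded_length(rare, "gbk"), encoded_length(rare, "gb18030"))
```

Only gb18030 yields a 4-byte sequence for the rare character; the GBK codec reports it as unencodable, which is the case where form submission falls back to a numeric character reference.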
(Firefox uses GBK instead of gb18030.)