Fix #17: gb18030 cannot encode U+E5E5
Because of deployed content, index gb18030 maps 0xA3 0xA0 to U+3000
rather than U+E5E5 when decoding. Therefore encoding U+E5E5 cannot work
either.
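
The asymmetry described in the commit message can be illustrated with a small sketch. It assumes the two-byte pointer arithmetic of the Encoding Standard's gb18030 decoder and encoder, and it uses a hypothetical two-entry excerpt of index gb18030 whose pointer values were derived from that arithmetic rather than copied from index-gb18030.txt. All function names are made up for illustration.

```python
# Hypothetical two-entry excerpt of index gb18030 (pointer -> code point).
# The pointers are derived from the formula below and are assumptions,
# not values copied from index-gb18030.txt.
INDEX_EXCERPT = {
    6176: 0x3000,  # bytes 0xA1 0xA1
    6555: 0x3000,  # bytes 0xA3 0xA0 (U+E5E5 in GB18030-2000)
}


def pointer_from_bytes(lead, trail):
    """Two-byte pointer arithmetic as used when decoding."""
    offset = 0x40 if trail < 0x7F else 0x41
    return (lead - 0x81) * 190 + (trail - offset)


def bytes_from_pointer(pointer):
    """Inverse arithmetic as used when encoding."""
    lead = pointer // 190 + 0x81
    trail = pointer % 190
    offset = 0x40 if trail < 0x3F else 0x41
    return (lead, trail + offset)


def decode_two_bytes(lead, trail):
    """Look the byte pair up in the excerpt; None if it is not listed."""
    return INDEX_EXCERPT.get(pointer_from_bytes(lead, trail))


def first_pointer_for(code_point):
    """Encoding uses the first pointer that maps to the code point."""
    matches = [p for p, cp in INDEX_EXCERPT.items() if cp == code_point]
    return min(matches) if matches else None


decoded = decode_two_bytes(0xA3, 0xA0)
reencoded = bytes_from_pointer(first_pointer_for(decoded))
print(hex(decoded), [hex(b) for b in reencoded])
```

In this sketch 0xA3 0xA0 decodes to U+3000, but U+3000 encodes back through the earlier pointer to 0xA1 0xA1, and U+E5E5 has no pointer at all. That is the sense in which the mapping cannot roundtrip.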
annevk committed Jan 20, 2016
1 parent 929a3ff commit c798941
Showing 2 changed files with 21 additions and 5 deletions.
14 changes: 11 additions & 3 deletions Overview.html
@@ -7,7 +7,7 @@

<p><a class="logo" href="https://whatwg.org/"><img alt="WHATWG" height="100" src="https://resources.whatwg.org/logo-encoding.svg" width="100"></a></p>
<h1>Encoding</h1>
-<h2 class="no-num no-toc" id="living-standard-—-last-updated-16-december-2015">Living Standard — Last Updated 16 December 2015</h2>
+<h2 class="no-num no-toc" id="living-standard-—-last-updated-20-january-2016">Living Standard — Last Updated 20 January 2016</h2>

<dl>
<dt>Participate:
@@ -800,8 +800,10 @@ <h2 id="indexes"><span class="secno">6 </span>Indexes</h2>
<td><a href="index-gb18030.txt">index-gb18030.txt</a>
<td>This matches the GB18030-2000 standard for code points encoded as two bytes, except
for 0xA3 0xA0 which maps to U+3000 to be compatible with deployed content.
-<!-- https://bugs.webkit.org/show_bug.cgi?id=17014
-https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396 -->
+<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=131837
+https://bugs.webkit.org/show_bug.cgi?id=17014
+https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396
+https://github.com/whatwg/encoding/issues/17 -->
<tr>
<td><dfn id="index-gb18030-ranges">index gb18030 ranges</dfn>
<td><a href="index-gb18030-ranges.txt">index-gb18030-ranges.txt</a>
@@ -1766,6 +1768,12 @@ <h4 id="gb18030-encoder"><span class="secno">11.2.2 </span><dfn>gb18030 encoder<
<li><p>If <var>code point</var> is an <a href="#ascii-code-point">ASCII code point</a>, return
a byte whose value is <var>code point</var>.

+<li>
+<p>If <var>code point</var> is U+E5E5, return <a href="#error">error</a> with <var>code point</var>.
+
+<p class="note"><a href="#index-gb18030">Index gb18030</a> maps 0xA3 0xA0 to U+3000 rather than U+E5E5 for
+compatibility with deployed content. Therefore it cannot roundtrip.
+
<li><p>If the <a href="#gbk-flag">gbk flag</a> is set and <var>code point</var> is
U+20AC, return byte 0x80.
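
The encoder steps visible in this hunk could be sketched roughly as follows. This is only an illustration of the three steps shown here (ASCII, the new U+E5E5 error, and the gbk flag case); the names `EncoderError` and `gb18030_encode_prefix` are hypothetical, and every later encoder step is deliberately left out.

```python
class EncoderError(Exception):
    """Hypothetical stand-in for the spec's "error with code point"."""

    def __init__(self, code_point):
        super().__init__(f"cannot encode U+{code_point:04X}")
        self.code_point = code_point


def gb18030_encode_prefix(code_point, gbk_flag=False):
    """Apply only the encoder steps shown in this hunk.

    Returns bytes when one of those steps produces output, and None when
    the code point would fall through to later steps not modeled here.
    """
    # ASCII code points are returned as a single byte of the same value.
    if code_point <= 0x7F:
        return bytes([code_point])

    # Step added by this commit: index gb18030 maps 0xA3 0xA0 to U+3000
    # rather than U+E5E5, so U+E5E5 is reported as an error.
    if code_point == 0xE5E5:
        raise EncoderError(code_point)

    # With the gbk flag set, U+20AC is encoded as the single byte 0x80.
    if gbk_flag and code_point == 0x20AC:
        return b"\x80"

    return None  # remaining encoder steps are out of scope for this sketch


try:
    gb18030_encode_prefix(0xE5E5)
except EncoderError as exc:
    print(exc)
```

Placing the U+E5E5 check directly after the ASCII step mirrors the order in the diff, so the code point is rejected before any index lookup would be attempted.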

12 changes: 10 additions & 2 deletions Overview.src.html
@@ -714,8 +714,10 @@ <h2>Indexes</h2>
<td><a href=index-gb18030.txt>index-gb18030.txt</a>
<td>This matches the GB18030-2000 standard for code points encoded as two bytes, except
for 0xA3 0xA0 which maps to U+3000 to be compatible with deployed content.
-<!-- https://bugs.webkit.org/show_bug.cgi?id=17014
-https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396 -->
+<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=131837
+https://bugs.webkit.org/show_bug.cgi?id=17014
+https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396
+https://github.com/whatwg/encoding/issues/17 -->
<tr>
<td><dfn>index gb18030 ranges</dfn>
<td><a href=index-gb18030-ranges.txt>index-gb18030-ranges.txt</a>
@@ -1680,6 +1682,12 @@ <h4><dfn>gb18030 encoder</dfn></h4>
<li><p>If <var>code point</var> is an <span>ASCII code point</span>, return
a byte whose value is <var>code point</var>.

+<li>
+<p>If <var>code point</var> is U+E5E5, return <span>error</span> with <var>code point</var>.
+
+<p class="note"><span>Index gb18030</span> maps 0xA3 0xA0 to U+3000 rather than U+E5E5 for
+compatibility with deployed content. Therefore it cannot roundtrip.
+
<li><p>If the <span>gbk flag</span> is set and <var>code point</var> is
U+20AC, return byte 0x80.
