Commit

Fix #22: align with GB18030-2005
This changes a single mapping in index gb18030 and special cases a
lookup in the “index gb18030 ranges code point” and “index gb18030
ranges pointer” algorithms.
annevk committed Jan 20, 2016
1 parent c798941 commit e7b9ce0
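The special-cased lookup described in the commit message can be sketched roughly as follows. This is a minimal illustration, not the standard's reference implementation: the names `RANGES_EXCERPT`, `ranges_code_point`, `ranges_pointer`, and `four_byte_pointer` are hypothetical, the table holds only the two rows of index-gb18030-ranges.txt visible in this commit, the final "offset plus difference" step is not shown in the diff and is assumed here, and the four-byte pointer formula is assumed from the gb18030 decoder rather than taken from this commit.

```python
from bisect import bisect_right

# Truncated excerpt: only the first two rows of index-gb18030-ranges.txt
# that are visible in this commit. The full index has 207 rows, so results
# for pointers above 37 are NOT correct with this table.
RANGES_EXCERPT = [(0, 0x0080), (36, 0x00A5)]


def four_byte_pointer(b1, b2, b3, b4):
    # Assumed formula for turning a four-byte sequence into a pointer.
    return (b1 - 0x81) * 12600 + (b2 - 0x30) * 1260 + (b3 - 0x81) * 10 + (b4 - 0x30)


def ranges_code_point(pointer, ranges=RANGES_EXCERPT):
    # Limit checks from the diff: pointers in these gaps have no code point.
    if 39419 < pointer < 189000 or pointer > 1237575:
        return None
    # Special case added by this commit.
    if pointer == 7457:
        return 0xE7C7
    if pointer < ranges[0][0]:
        raise ValueError("pointer below the first range")
    # Last row whose pointer is equal to or less than the given pointer.
    i = bisect_right([p for p, _ in ranges], pointer) - 1
    offset, code_point_offset = ranges[i]
    # Assumed final step (not visible in the diff).
    return code_point_offset + pointer - offset


def ranges_pointer(code_point, ranges=RANGES_EXCERPT):
    # Special case added by this commit (inverse of the one above).
    if code_point == 0xE7C7:
        return 7457
    if code_point < ranges[0][1]:
        raise ValueError("code point below the first range")
    # Last row whose code point is equal to or less than the given code point.
    i = bisect_right([c for _, c in ranges], code_point) - 1
    pointer_offset, offset = ranges[i]
    # Assumed final step (not visible in the diff).
    return pointer_offset + code_point - offset


if __name__ == "__main__":
    print(four_byte_pointer(0x81, 0x35, 0xF4, 0x37))
    print(hex(ranges_code_point(7457)))
    print(ranges_pointer(0xE7C7))
```

Placing the special case before the range search mirrors the step order in the diff: the check runs after the limit checks in the code point lookup and as the very first step in the pointer lookup.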
Showing 36 changed files with 56 additions and 48 deletions.
16 changes: 10 additions & 6 deletions Overview.html
@@ -798,19 +798,18 @@ <h2 id="indexes"><span class="secno">6 </span>Indexes</h2>
<tr>
<td><dfn id="index-gb18030">index gb18030</dfn>
<td><a href="index-gb18030.txt">index-gb18030.txt</a>
-<td>This matches the GB18030-2000 standard for code points encoded as two bytes, except
-for 0xA3 0xA0 which maps to U+3000 to be compatible with deployed content.
+<td>This matches the GB18030-2005 standard for code points encoded as two bytes, except for
+0xA3 0xA0 which maps to U+3000 to be compatible with deployed content.
<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=131837
https://bugs.webkit.org/show_bug.cgi?id=17014
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396
https://github.com/whatwg/encoding/issues/17 -->
<tr>
<td><dfn id="index-gb18030-ranges">index gb18030 ranges</dfn>
<td><a href="index-gb18030-ranges.txt">index-gb18030-ranges.txt</a>
-<td>This <a href="#index">index</a> works different from all others. Listing all
-code points would result in over a million items whereas they can be
-represented neatly in 207 ranges combined with trivial limit checks. It
-therefore only superficially matches the GB18030 standard for code points
+<td>This <a href="#index">index</a> works different from all others. Listing all code points would result
+in over a million items whereas they can be represented neatly in 207 ranges combined with trivial
+limit checks. It therefore only superficially matches the GB18030-2005 standard for code points
encoded as four bytes. See also <a href="#index-gb18030-ranges-code-point">index gb18030 ranges code point</a> and
<a href="#index-gb18030-ranges-pointer">index gb18030 ranges pointer</a> below.
<tr>
@@ -841,6 +840,9 @@ <h2 id="indexes"><span class="secno">6 </span>Indexes</h2>
<li><p>If <var>pointer</var> is greater than 39419 and less than
189000, or <var>pointer</var> is greater than 1237575, return null.

+<li><p>If <var>pointer</var> is 7457, return code point U+E7C7.
+<!-- 7457 is 0x81 0x35 0xF4 0x37 -->
+
<li><p>Let <var>offset</var> be the last pointer in
<a href="#index-gb18030-ranges">index gb18030 ranges</a> that is equal to or less than
<var>pointer</var> and let <var>code point offset</var> be its
@@ -854,6 +856,8 @@ <h2 id="indexes"><span class="secno">6 </span>Indexes</h2>
the return value of these steps:

<ol>
+<li><p>If <var>code point</var> is U+E7C7, return pointer 7457.
+
<li><p>Let <var>offset</var> be the last code point in
<a href="#index-gb18030-ranges">index gb18030 ranges</a> that is equal to or less than
<var>code point</var> and let <var>pointer offset</var> be its
16 changes: 10 additions & 6 deletions Overview.src.html
@@ -712,19 +712,18 @@ <h2>Indexes</h2>
<tr>
<td><dfn>index gb18030</dfn>
<td><a href=index-gb18030.txt>index-gb18030.txt</a>
-<td>This matches the GB18030-2000 standard for code points encoded as two bytes, except
-for 0xA3 0xA0 which maps to U+3000 to be compatible with deployed content.
+<td>This matches the GB18030-2005 standard for code points encoded as two bytes, except for
+0xA3 0xA0 which maps to U+3000 to be compatible with deployed content.
<!-- https://bugzilla.mozilla.org/show_bug.cgi?id=131837
https://bugs.webkit.org/show_bug.cgi?id=17014
https://www.w3.org/Bugs/Public/show_bug.cgi?id=25396
https://github.com/whatwg/encoding/issues/17 -->
<tr>
<td><dfn>index gb18030 ranges</dfn>
<td><a href=index-gb18030-ranges.txt>index-gb18030-ranges.txt</a>
-<td>This <span>index</span> works different from all others. Listing all
-code points would result in over a million items whereas they can be
-represented neatly in 207 ranges combined with trivial limit checks. It
-therefore only superficially matches the GB18030 standard for code points
+<td>This <span>index</span> works different from all others. Listing all code points would result
+in over a million items whereas they can be represented neatly in 207 ranges combined with trivial
+limit checks. It therefore only superficially matches the GB18030-2005 standard for code points
encoded as four bytes. See also <span>index gb18030 ranges code point</span> and
<span>index gb18030 ranges pointer</span> below.
<tr>
@@ -755,6 +754,9 @@ <h2>Indexes</h2>
<li><p>If <var>pointer</var> is greater than 39419 and less than
189000, or <var>pointer</var> is greater than 1237575, return null.

+<li><p>If <var>pointer</var> is 7457, return code point U+E7C7.
+<!-- 7457 is 0x81 0x35 0xF4 0x37 -->
+
<li><p>Let <var>offset</var> be the last pointer in
<span>index gb18030 ranges</span> that is equal to or less than
<var>pointer</var> and let <var>code point offset</var> be its
@@ -768,6 +770,8 @@ <h2>Indexes</h2>
the return value of these steps:

<ol>
+<li><p>If <var>code point</var> is U+E7C7, return pointer 7457.
+
<li><p>Let <var>offset</var> be the last code point in
<span>index gb18030 ranges</span> that is equal to or less than
<var>code point</var> and let <var>pointer offset</var> be its
2 changes: 1 addition & 1 deletion index-big5.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 8dfc771062e7be0810919082c2c06baa2236147909e0ecc235b1cb9ad782ac82
-# Date: 2015-08-19
+# Date: 2016-01-20

942 0x43F0 䏰 (<CJK Ideograph Extension A>)
943 0x4C32 䰲 (<CJK Ideograph Extension A>)
2 changes: 1 addition & 1 deletion index-euc-kr.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 1d97134cbf187263585bc8f593ca4196654ed4c7a673f5672eaad4f5d9fdc4ba
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0xAC02 갂 (HANGUL SYLLABLE GAGG)
1 0xAC03 갃 (HANGUL SYLLABLE GAGS)
2 changes: 1 addition & 1 deletion index-gb18030-ranges.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f963aaa1653f630c523e7b04729fb4e4458f35806c45eb5c179445623138f0c0
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080
36 0x00A5
6 changes: 3 additions & 3 deletions index-gb18030.txt
@@ -4,8 +4,8 @@
# For details on index index-gb18030.txt see the Encoding Standard
# https://encoding.spec.whatwg.org/
#
-# Identifier: 7cc86532fd6516482e5b36c2aab29dcce5d67ebdef05b291f0ff52736172934b
-# Date: 2015-08-19
+# Identifier: 715f084846f5c6fc9dd31046d0a4d604bd2d88bfe3a22833cea048415e413c70
+# Date: 2016-01-20

0 0x4E02 丂 (<CJK Ideograph>)
1 0x4E04 丄 (<CJK Ideograph>)
@@ -7540,7 +7540,7 @@
7530 0x00FC ü (LATIN SMALL LETTER U WITH DIAERESIS)
7531 0x00EA ê (LATIN SMALL LETTER E WITH CIRCUMFLEX)
7532 0x0251 ɑ (LATIN SMALL LETTER ALPHA)
-7533 0xE7C7  (<Private Use>)
+7533 0x1E3F ḿ (LATIN SMALL LETTER M WITH ACUTE)
7534 0x0144 ń (LATIN SMALL LETTER N WITH ACUTE)
7535 0x0148 ň (LATIN SMALL LETTER N WITH CARON)
7536 0x01F9 ǹ (LATIN SMALL LETTER N WITH GRAVE)
2 changes: 1 addition & 1 deletion index-ibm866.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: db6fe14a559d1601a7667338d83704773d5708dbc641e1ad3c5e21405770f05e
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0410 А (CYRILLIC CAPITAL LETTER A)
1 0x0411 Б (CYRILLIC CAPITAL LETTER BE)
2 changes: 1 addition & 1 deletion index-iso-8859-10.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 02c2b5590d8ccda9931008c471f6ee2c590b2c8fe5e6ccb3b08638115d778507
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-13.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 40736338e964ab520407cebcb01329f8d450abf6ce12bf88b74b655b60e43300
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-14.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 2c8651cfc08b1f35b17919ee5379f2fa006af3ec809f11b3b7f470785580542b
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-15.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: a560aba47bccd7510a6ac77f671fe75dca3800f05cf6d676910c311a8f8ff079
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-16.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 55676320d2d1b6e6909f5b3d741a7cf0cefc84e920aa4474afc091459111c2e3
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-2.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 9569c67f22d0b57790e1c407c6eecf227e4562322dc296de43cdab7a0152ec73
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-3.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: af8f1e12df79b768322b5e83613698cdc619438270a2fc359554331c805054a3
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-4.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 72f29c92344d351fe9e74a946e7e0468d76d542c6894ff82982cb652ebe0feb7
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-5.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: fa9b1f3f5242df43e2e7bca80e9b6997c67944f20a4af91ee06bacc4e132d9c9
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-6.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 85bb7b5c2dc75975afebe5743935ba4ed5a09c1e9e34e9bfb2ff80293f5d8bbc
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-7.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f53d8aeba36314ef950eef02ffcf11dff540638ce27dfe7a86b6ccc6875afb24
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-iso-8859-8.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 7657a9ca3fa875990da960d3f812eea28dcd0ae6ed55a18d5394303c86f5484b
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0080 € (<control>)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-jis0208.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: cbaa91f3deb7d0841faf5c33041fc15a285da0e87e64ab802c4bf04b7c4da861
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x3000   (IDEOGRAPHIC SPACE)
1 0x3001 、 (IDEOGRAPHIC COMMA)
2 changes: 1 addition & 1 deletion index-jis0212.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 83bf90dd1c591a4355730d8c4567efc499d74da7490531019ef22a879991cfb7
-# Date: 2015-08-19
+# Date: 2016-01-20

108 0x02D8 ˘ (BREVE)
109 0x02C7 ˇ (CARON)
2 changes: 1 addition & 1 deletion index-koi8-r.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: c5497cd9071cb352c0e56b219154e539badf63de40b71578f09e2e11fe7d50ae
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x2500 ─ (BOX DRAWINGS LIGHT HORIZONTAL)
1 0x2502 │ (BOX DRAWINGS LIGHT VERTICAL)
2 changes: 1 addition & 1 deletion index-koi8-u.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 19a4da2c3f245118bbc8019326f45a07832949938ff903f03d62ac4da1f61f40
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x2500 ─ (BOX DRAWINGS LIGHT HORIZONTAL)
1 0x2502 │ (BOX DRAWINGS LIGHT VERTICAL)
2 changes: 1 addition & 1 deletion index-macintosh.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: f2c6a4f6406b3e86a50a5dba4d2b7dd48e2e33c0d82aefe764535c934ec11764
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x00C4 Ä (LATIN CAPITAL LETTER A WITH DIAERESIS)
1 0x00C5 Å (LATIN CAPITAL LETTER A WITH RING ABOVE)
2 changes: 1 addition & 1 deletion index-windows-1250.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 0669455a7a1c70ba6003ea737991e8ee9adc455125c13cfe6705a361358de5fa
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1251.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 7592ef921679ba168b00a9e9afa3b4eebd67bf13dc7e84c4b6e120de856826e0
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0402 Ђ (CYRILLIC CAPITAL LETTER DJE)
1 0x0403 Ѓ (CYRILLIC CAPITAL LETTER GJE)
2 changes: 1 addition & 1 deletion index-windows-1252.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: e56d49d9176e9a412283cf29ac9bd613f5620462f2a080a84eceaf974cfa18b7
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1253.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 49fdc881a3488904dd1e8dfba9aef3258454249958b611bcded1d4c981ab5561
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1254.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: e80a27adf377438be8ba5bd223875ea56d6a4d47f958cce1c957a2c446825caa
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1255.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 3b3ec872752f43c348a39b3fd2040202ccd95b935e56b2f92bb9e03e220ca02a
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1256.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 161bdb381f16408e8bebcc8f5310c4190af0e359de8d9bbaa3628ce2f0875509
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x067E پ (ARABIC LETTER PEH)
2 changes: 1 addition & 1 deletion index-windows-1257.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: cc7256bdd10a5b8dc7fb6f994659f307dfcae60def9aa6c29d811f85e2842c47
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-1258.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 198bacedfcf24390e219240a7b776b6cec34cff070330b08a601a69c67f7eb24
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-windows-874.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: b416583ce125e38474381b31b401a98b19ecf2e57e0998e78a1e18b14894905d
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x20AC € (EURO SIGN)
1 0x0081  (<control>)
2 changes: 1 addition & 1 deletion index-x-mac-cyrillic.txt
@@ -5,7 +5,7 @@
# https://encoding.spec.whatwg.org/
#
# Identifier: 73e8e7642c6fa9de29d42819b47fba55b58666fb1e339faeb4a89a0bd7c24d43
-# Date: 2015-08-19
+# Date: 2016-01-20

0 0x0410 А (CYRILLIC CAPITAL LETTER A)
1 0x0411 Б (CYRILLIC CAPITAL LETTER BE)
2 changes: 1 addition & 1 deletion indexes.json

Large diffs are not rendered by default.
