From 115d5ab5a340948ff3dcbb216b720c9aa6617fed Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Sun, 19 Mar 2017 09:53:20 +0100
Subject: [PATCH 1/2] Editorial: enable indexes and fix a couple minor nits

---
 encoding.bs | 108 ++++++++++++++++++++++++++--------------------------
 1 file changed, 54 insertions(+), 54 deletions(-)

diff --git a/encoding.bs b/encoding.bs
index 7ee2459..d1c5ae8 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -7,7 +7,7 @@ Status: LS
No Editor: true
Abstract: The Encoding Standard defines encodings and their JavaScript API.
Logo: https://resources.whatwg.org/logo-encoding.svg
-Boilerplate: omit feedback-header, omit conformance, omit index, omit idl-index
+Boilerplate: omit feedback-header, omit conformance, omit idl-index
Markup Shorthands: css off
!Participate: GitHub whatwg/encoding (file an issue, open issues)
!Participate: IRC: #whatwg on Freenode
@@ -124,7 +124,7 @@ or code point.
the first token in the stream must be returned and subsequently removed, and end-of-stream must be returned otherwise.
+ SimonSapin thinks this is fine, blame him if not -->

When one or more tokens are prepended to a
@@ -742,11 +742,11 @@ specification, excluding index single-byte, which have their own table:
This is the JIS X 0212 standard. It is only used by the EUC-JP decoder due to lack of widespread support elsewhere.
@@ -1526,7 +1526,7 @@ unique index.
distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and
-"iso-8859-6-i" as well, that is no longer true.
+"ISO-8859-6-i" as well, that is no longer true.

single-byte decoder

@@ -1982,17 +1982,17 @@ consumers of content generated with GBK's encoder.

ISO-2022-JP

ISO-2022-JP decoder

ISO-2022-JP's decoder has an associated ISO-2022-JP decoder state (initially -ASCII), +ASCII), ISO-2022-JP decoder output state (initially -ASCII), +ASCII), ISO-2022-JP lead (initially 0x00), and ISO-2022-JP output flag (initially unset). @@ -2001,14 +2001,14 @@ consumers of content generated with GBK's encoder. ISO-2022-JP decoder state:

-
ASCII +
ASCII

Based on byte:

0x1B

Set ISO-2022-JP decoder state to - escape start and return + escape start and return continue.

0x00 to 0x7F, excluding 0x0E, 0x0F, and 0x1B @@ -2022,14 +2022,14 @@ consumers of content generated with GBK's encoder.

Unset the ISO-2022-JP output flag and return error.

-
Roman +
Roman

Based on byte:

0x1B

Set ISO-2022-JP decoder state to - escape start and return + escape start and return continue.

0x5C @@ -2049,19 +2049,19 @@ consumers of content generated with GBK's encoder.

Unset the ISO-2022-JP output flag and return error.

-
Katakana +
Katakana

Based on byte:

0x1B

Set ISO-2022-JP decoder state to - escape start and return + escape start and return continue.

0x21 to 0x5F

Unset the ISO-2022-JP output flag and return a code point whose value is 0xFF61 − 0x21 + byte. - +

end-of-stream

Return finished. @@ -2070,20 +2070,20 @@ consumers of content generated with GBK's encoder.

Unset the ISO-2022-JP output flag and return error.

-
Lead byte +
Lead byte

Based on byte:

0x1B

Set ISO-2022-JP decoder state to - escape start and return + escape start and return continue.

0x21 to 0x7E

Unset the ISO-2022-JP output flag, set ISO-2022-JP lead to byte, ISO-2022-JP decoder state to - trail byte, and return + trail byte, and return continue.

end-of-stream @@ -2093,21 +2093,21 @@ consumers of content generated with GBK's encoder.

Unset the ISO-2022-JP output flag and return error.

-
Trail byte +
Trail byte

Based on byte:

0x1B

Set ISO-2022-JP decoder state to - escape start and return + escape start and return error. - +

0x21 to 0x7E
  1. Set the ISO-2022-JP decoder state to - lead byte. + lead byte.

  2. Let pointer be (ISO-2022-JP lead − 0x21) × 94 + byte − 0x21. @@ -2122,24 +2122,24 @@ consumers of content generated with GBK's encoder.

    end-of-stream

    Set the ISO-2022-JP decoder state to - lead byte, + lead byte, prepend byte to stream, and return error.

    Otherwise

    Set ISO-2022-JP decoder state to - lead byte and return + lead byte and return error. - +

-
Escape start +
Escape start
  1. If byte is either 0x24 or 0x28, set ISO-2022-JP lead to byte, ISO-2022-JP decoder state to - escape, and return + escape, and return continue.

  2. Prepend byte to @@ -2150,7 +2150,7 @@ consumers of content generated with GBK's encoder. ISO-2022-JP decoder output state, and return error.

-
Escape +
Escape
  1. Let lead be ISO-2022-JP lead and set @@ -2159,17 +2159,17 @@ consumers of content generated with GBK's encoder.

  2. Let state be null.

  3. If lead is 0x28 and byte is 0x42, set - state to ASCII. + state to ASCII.

  4. If lead is 0x28 and byte is 0x4A, set - state to Roman. + state to Roman.

  5. If lead is 0x28 and byte is 0x49, set - state to Katakana. + state to Katakana.

  6. If lead is 0x24 and byte is either 0x40 or 0x42, set state to - lead byte. + lead byte.

  7. If state is non-null, run these substeps: @@ -2199,10 +2199,10 @@ consumers of content generated with GBK's encoder.
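The byte arithmetic in the ISO-2022-JP decoder states above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the patch or the standard: the function and constant names are hypothetical, the jis0208 index lookup is not modeled, and the escape substeps elided from this hunk are left out. It covers the Katakana state mapping, the trail byte state's pointer formula, and escape state steps 3 to 6.

```python
from typing import Optional

# Hypothetical state labels mirroring the decoder states named above.
ASCII, ROMAN, KATAKANA, LEAD_BYTE = "ASCII", "Roman", "Katakana", "lead byte"


def katakana_code_point(byte: int) -> Optional[int]:
    """Katakana state: bytes 0x21..0x5F map to U+FF61..U+FF9F."""
    if 0x21 <= byte <= 0x5F:
        return 0xFF61 - 0x21 + byte
    return None  # other bytes take a different branch of the state


def jis0208_pointer(lead: int, byte: int) -> int:
    """Trail byte state: pointer for a lead/trail pair, both in 0x21..0x7E."""
    return (lead - 0x21) * 94 + byte - 0x21


def escape_state(lead: int, byte: int) -> Optional[str]:
    """Escape state steps 3-6: map the bytes after 0x1B to a new decoder state."""
    if lead == 0x28 and byte == 0x42:
        return ASCII       # ESC ( B
    if lead == 0x28 and byte == 0x4A:
        return ROMAN       # ESC ( J
    if lead == 0x28 and byte == 0x49:
        return KATAKANA    # ESC ( I
    if lead == 0x24 and byte in (0x40, 0x42):
        return LEAD_BYTE   # ESC $ @ or ESC $ B
    return None            # state stays null
```

For example, ESC ( I (0x1B 0x28 0x49) switches to Katakana, after which byte 0x21 decodes to U+FF61 HALFWIDTH IDEOGRAPHIC FULL STOP. Pointers from the trail byte state range from 0 for 0x21 0x21 to 8835 for 0x7E 0x7E.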

    ISO-2022-JP encoder

    ISO-2022-JP's encoder has an associated -ISO-2022-JP encoder state which is ASCII, -Roman, or -jis0208 (initially -ASCII). +ISO-2022-JP encoder state which is ASCII, +Roman, or +jis0208 (initially +ASCII).

    ISO-2022-JP's encoder's handler, given a stream and code point, runs these steps: @@ -2210,32 +2210,32 @@ consumers of content generated with GBK's encoder.

    1. If code point is end-of-stream and ISO-2022-JP encoder state is not - ASCII, + ASCII, prepend code point to stream, set ISO-2022-JP encoder state to - ASCII, and return three bytes + ASCII, and return three bytes 0x1B 0x28 0x42.

    2. If code point is end-of-stream and ISO-2022-JP encoder state is - ASCII, return finished. + ASCII, return finished.

    3. If ISO-2022-JP encoder state is - ASCII or - Roman, and code point is U+000E, U+000F, + ASCII or + Roman, and code point is U+000E, U+000F, or U+001B, return error with U+FFFD. -

      This returns U+FFFD rather than the code point to prevent attacks. +

      This returns U+FFFD rather than code point to prevent attacks.

    4. If ISO-2022-JP encoder state is - ASCII and code point is an + ASCII and code point is an ASCII code point, return a byte whose value is code point.

    5. If ISO-2022-JP encoder state is - Roman and code point is an + Roman and code point is an ASCII code point, excluding U+005C and U+007E, or is U+00A5 or U+203E, run these substeps: @@ -2250,18 +2250,18 @@ consumers of content generated with GBK's encoder.

    6. If code point is an ASCII code point, and ISO-2022-JP encoder state is not - ASCII, + ASCII, prepend code point to stream, set ISO-2022-JP encoder state to - ASCII, and return three bytes + ASCII, and return three bytes 0x1B 0x28 0x42.

    7. If code point is either U+00A5 or U+203E, and ISO-2022-JP encoder state is not - Roman, + Roman, prepend code point to stream, set ISO-2022-JP encoder state to - Roman, and return three bytes + Roman, and return three bytes 0x1B 0x28 0x4A.

    8. If code point is U+2212, set it to U+FF0D. @@ -2277,10 +2277,10 @@ consumers of content generated with GBK's encoder. code point.

    9. If ISO-2022-JP encoder state is not - jis0208, + jis0208, prepend code point to stream, set ISO-2022-JP encoder state to - jis0208, and return three bytes + jis0208, and return three bytes 0x1B 0x24 0x42.

    10. Let lead be floor(pointer / 94) + 0x21. @@ -2353,7 +2353,7 @@ consumers of content generated with GBK's encoder.
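The ISO-2022-JP encoder steps above switch state by emitting three-byte escape sequences and then split a jis0208 pointer into two bytes. A minimal Python sketch, with hypothetical names: the trail byte, pointer % 94 + 0x21, is not shown in this hunk and is assumed here as the inverse of the decoder's trail byte state formula, (lead − 0x21) × 94 + byte − 0x21.

```python
from typing import Tuple

# Escape sequences emitted when the encoder changes state (steps 1, 6, 7 and 9).
ESCAPE = {
    "ASCII": bytes([0x1B, 0x28, 0x42]),    # ESC ( B
    "Roman": bytes([0x1B, 0x28, 0x4A]),    # ESC ( J
    "jis0208": bytes([0x1B, 0x24, 0x42]),  # ESC $ B
}


def escape_for(current: str, target: str) -> bytes:
    """Escape bytes needed before a code point that requires `target` (empty if already there)."""
    return b"" if current == target else ESCAPE[target]


def pointer_to_bytes(pointer: int) -> Tuple[int, int]:
    """Step 10 (lead) plus an assumed trail step for a jis0208 pointer."""
    lead = pointer // 94 + 0x21   # floor(pointer / 94) + 0x21
    trail = pointer % 94 + 0x21   # assumption: inverse of the decoder's formula
    return lead, trail


def bytes_to_pointer(lead: int, trail: int) -> int:
    """The decoder's trail byte state formula, used here only to check round trips."""
    return (lead - 0x21) * 94 + trail - 0x21
```

Checking bytes_to_pointer(*pointer_to_bytes(p)) == p for every p in range(94 * 94) confirms that the assumed trail step and the decoder's formula are inverses.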

    11. If byte is in the range 0xA1 to 0xDF, inclusive, return a code point whose value is 0xFF61 − 0xA1 + byte. - +

    12. If byte is in the range 0x81 to 0x9F, inclusive, or 0xE0 to 0xFC, inclusive, set Shift_JIS lead to byte and return @@ -2586,7 +2586,7 @@ and byte, runs these steps:

    13. Prepend the bytes to stream and return error. - +

  8. If code unit is in the range U+D800 to U+DBFF, inclusive, set @@ -2595,7 +2595,7 @@ and byte, runs these steps:

  9. If code unit is in the range U+DC00 to U+DFFF, inclusive, return error. - +

  10. Return code point code unit.
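Steps 8 to 10 of the shared UTF-16 decoder handle a code unit read while no lead surrogate is pending. A minimal Python sketch with hypothetical names; the surrogate pair combination is the standard UTF-16 formula and belongs to an earlier step that this hunk does not show.

```python
from typing import Optional, Tuple


def classify_code_unit(code_unit: int) -> Tuple[str, Optional[int]]:
    """Steps 8-10: a code unit read while no lead surrogate is pending."""
    if 0xD800 <= code_unit <= 0xDBFF:
        return ("lead surrogate", code_unit)  # remember it and wait for the next unit
    if 0xDC00 <= code_unit <= 0xDFFF:
        return ("error", None)                # trail surrogate with no lead
    return ("code point", code_unit)          # any other unit is its own code point


def combine_surrogates(lead: int, trail: int) -> int:
    """Standard UTF-16 pairing of a lead (U+D800..U+DBFF) and trail (U+DC00..U+DFFF)."""
    return 0x10000 + ((lead - 0xD800) << 10) + (trail - 0xDC00)
```

For example, the pair U+D83D U+DE00 combines to U+1F600, while a lone U+DE00 is an error.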

From ec3829c86da0bfaff8a1e88ceb75a42606447763 Mon Sep 17 00:00:00 2001
From: Anne van Kesteren
Date: Sun, 19 Mar 2017 10:07:42 +0100
Subject: [PATCH 2/2] nit

---
 encoding.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/encoding.bs b/encoding.bs
index d1c5ae8..4535f3d 100644
--- a/encoding.bs
+++ b/encoding.bs
@@ -1526,7 +1526,7 @@ unique index.
distinct encoding names, because ISO-8859-8 has influence on the layout direction. And although historically this might have been the case for ISO-8859-6 and
-"ISO-8859-6-i" as well, that is no longer true.
+"ISO-8859-6-I" as well, that is no longer true.

single-byte decoder