Commit

Introduce a replacement encoding as a trial balloon to fixing various issues.

Part of https://www.w3.org/Bugs/Public/show_bug.cgi?id=21057 (Also
commit some scripts.)
annevk committed Feb 22, 2013
1 parent 6af03d0 commit 8329a2e
Showing 6 changed files with 109 additions and 25 deletions.
1 change: 0 additions & 1 deletion .gitignore

This file was deleted.

53 changes: 36 additions & 17 deletions Overview.html
@@ -7,7 +7,7 @@

<p><a class="logo" href="//www.whatwg.org/"><img alt="WHATWG" height="100" src="//resources.whatwg.org/logo-encoding.svg" width="100"></a></p>
<h1>Encoding</h1>
<h2 class="no-num no-toc" id="living-standard-—-last-updated-29-january-2013">Living Standard — Last Updated 29 January 2013</h2>
<h2 class="no-num no-toc" id="living-standard-—-last-updated-22-february-2013">Living Standard — Last Updated 22 February 2013</h2>

<dl>
<dt>This Version:
@@ -36,7 +36,7 @@ <h2 class="no-num no-toc" id="living-standard-—-last-updated-29-january-2013">
<p class="copyright"><a href="http://creativecommons.org/publicdomain/zero/1.0/" rel="license">CC0 1.0 Universal</a>.
To the extent possible under law, the editor has waived all copyright and
related or neighboring rights to this work. In addition, as of
29 January 2013, the editor has made this specification available
22 February 2013, the editor has made this specification available
under the
<a href="http://www.openwebfoundation.org/legal/the-owf-1-0-agreements/owfa-1-0" rel="license">Open Web Foundation Agreement Version 1.0</a>,
which is available at
@@ -84,10 +84,11 @@ <h2 class="no-num no-toc" id="table-of-contents">Table of Contents</h2>
<li><a href="#iso-2022-kr"><span class="secno">13.2 </span>iso-2022-kr</a></ol></li>
<li><a href="#legacy-miscellaneous-encodings"><span class="secno">14 </span>Legacy miscellaneous encodings</a>
<ol>
<li><a href="#common-infrastructure-for-utf-16be-and-utf-16le"><span class="secno">14.1 </span>Common infrastructure for <span>utf-16be</span> and <span>utf-16le</span></a></li>
<li><a href="#utf-16be"><span class="secno">14.2 </span>utf-16be</a></li>
<li><a href="#utf-16le"><span class="secno">14.3 </span>utf-16le</a></li>
<li><a href="#x-user-defined"><span class="secno">14.4 </span>x-user-defined</a></ol></li>
<li><a href="#replacement"><span class="secno">14.1 </span>replacement</a></li>
<li><a href="#common-infrastructure-for-utf-16be-and-utf-16le"><span class="secno">14.2 </span>Common infrastructure for <span>utf-16be</span> and <span>utf-16le</span></a></li>
<li><a href="#utf-16be"><span class="secno">14.3 </span>utf-16be</a></li>
<li><a href="#utf-16le"><span class="secno">14.4 </span>utf-16le</a></li>
<li><a href="#x-user-defined"><span class="secno">14.5 </span>x-user-defined</a></ol></li>
<li><a class="no-num" href="#references">References</a></li>
<li><a class="no-num" href="#acknowledgments">Acknowledgments</a></ol>
<!--end-toc-->
@@ -222,7 +223,7 @@ <h2 id="encodings"><span class="secno">4 </span>Encodings</h2>
<p>Authors must use the <a href="#utf-8">utf-8</a> <a href="#encoding">encoding</a> and must use the
"<code title="">utf-8</code>" <a href="#label">label</a> to identify it.

<p>New protocols and formats must use the <a href="#utf-8">utf-8</a> <a href="#encoding">encoding</a>
<p>New protocols and formats must use the <a href="#utf-8">utf-8</a> <a href="#encoding">encoding</a>
exclusively. If these protocols and formats need to expose the <a href="#encoding">encoding</a>'s
<a href="#label">label</a>, they must expose it as "<code title="">utf-8</code>".

@@ -549,6 +550,10 @@ <h2 id="encodings"><span class="secno">4 </span>Encodings</h2>
<tr><td>"<code title="">iso-2022-kr</code>"
<tbody>
<tr><th colspan="2"><a href="#legacy-miscellaneous-encodings">Legacy miscellaneous encodings</a>
<tr>
<td rowspan="2"><a href="#replacement">replacement</a>
<td>"<code title="">iso-2022-cn</code>"
<tr><td>"<code title="">iso-2022-cn-ext</code>"
<tr>
<td><a href="#utf-16be">utf-16be</a>
<td>"<code title="">utf-16be</code>"
@@ -839,7 +844,8 @@ <h3 id="interface-textdecoder"><span class="secno">7.1 </span>Interface <code ti
<dt><code><var title="">decoder</var> = new <a href="#dom-textdecoder" title="dom-TextDecoder">TextDecoder</a>([<var title="">label</var> = "utf-8" [, <var title="">options</var>]])</code>
<dd>
<p>Returns a new <a href="#textdecoder"><code>TextDecoder</code></a> object.
<p>If <var title="">label</var> is not a <a href="#label">label</a>,
<p>If <var title="">label</var> is either not a <a href="#label">label</a> or is a
<a href="#label">label</a> for <a href="#replacement">replacement</a>,
<a class="external" data-anolis-spec="webidl" href="http://dev.w3.org/2006/webapi/WebIDL/#dfn-throw" title="throw">throws</a> a
<code>TypeError</code>.

@@ -868,7 +874,7 @@ <h3 id="interface-textdecoder"><span class="secno">7.1 </span>Interface <code ti
<a href="#concept-encoding-get" title="concept-encoding-get">getting an encoding</a> from
<var title="">label</var>.

<li><p>If <var title="">encoding</var> is failure,
<li><p>If <var title="">encoding</var> is failure or <a href="#replacement">replacement</a>,
<a class="external" data-anolis-spec="webidl" href="http://dev.w3.org/2006/webapi/WebIDL/#dfn-throw">throw</a> a <code>TypeError</code>.

<li><p>Let <var title="">dec</var> be a new <a href="#textdecoder"><code>TextDecoder</code></a> object.
@@ -934,10 +940,10 @@ <h3 id="interface-textdecoder"><span class="secno">7.1 </span>Interface <code ti
<a href="#eof-byte">EOF byte</a> to the <b>stream</b>.

<li>
<p>Return the output of running <b>encoding</b>'s <a href="#decoder">decoder</a> on
the <b>stream</b>. If the <b>fatal flag</b> is set and
<b>encoding</b>'s <a href="#decoder">decoder</a> emits a <a href="#decoder-error">decoder error</a>,
<a class="external" data-anolis-spec="dom" href="http://dom.spec.whatwg.org/#concept-throw" title="concept-throw">throw</a> a
<p>Return the output of running <b>encoding</b>'s <a href="#decoder">decoder</a> on
the <b>stream</b>. If the <b>fatal flag</b> is set and
<b>encoding</b>'s <a href="#decoder">decoder</a> emits a <a href="#decoder-error">decoder error</a>,
<a class="external" data-anolis-spec="dom" href="http://dom.spec.whatwg.org/#concept-throw" title="concept-throw">throw</a> a
"<code title="">EncodingError</code>".

<p class="note">In addition to the reason given above with respect to the
@@ -2581,7 +2587,20 @@ <h3 id="iso-2022-kr"><span class="secno">13.2 </span><dfn>iso-2022-kr</dfn></h3>
<h2 id="legacy-miscellaneous-encodings"><span class="secno">14 </span>Legacy miscellaneous encodings</h2>


<h3 id="common-infrastructure-for-utf-16be-and-utf-16le"><span class="secno">14.1 </span>Common infrastructure for <a href="#utf-16be">utf-16be</a> and <a href="#utf-16le">utf-16le</a></h3>
<h3 id="replacement"><span class="secno">14.1 </span><dfn>replacement</dfn></h3>

<p class="note">The <a href="#replacement">replacement</a> <a href="#encoding">encoding</a> exists to prevent certain
attacks that abuse a mismatch between <a href="#encoding" title="encoding">encodings</a> supported on
the server and the client.

<p>The <dfn id="replacement-decoder">replacement decoder</dfn> (<a href="#decoder">decoder</a> for <a href="#replacement">replacement</a>)
is to emit a <a href="#decoder-error">decoder error</a> followed by an <a href="#eof-code-point">EOF code point</a>.

<p>The <dfn id="replacement-encoder">replacement encoder</dfn> (<a href="#encoder">encoder</a> for <a href="#replacement">replacement</a>)
is the <a href="#utf-8-encoder">utf-8 encoder</a>.


<h3 id="common-infrastructure-for-utf-16be-and-utf-16le"><span class="secno">14.2 </span>Common infrastructure for <a href="#utf-16be">utf-16be</a> and <a href="#utf-16le">utf-16le</a></h3>

<p class="note">In violation of the Unicode standard, checking for a
byte order mark happens before an encoding to decode a byte stream is
@@ -2698,7 +2717,7 @@ <h3 id="common-infrastructure-for-utf-16be-and-utf-16le"><span class="secno">14.
</ol>


<h3 id="utf-16be"><span class="secno">14.2 </span><dfn>utf-16be</dfn></h3>
<h3 id="utf-16be"><span class="secno">14.3 </span><dfn>utf-16be</dfn></h3>

<p>The <dfn id="utf-16be-decoder">utf-16be decoder</dfn> (<a href="#decoder">decoder</a> for <a href="#utf-16be">utf-16be</a>)
is the <a href="#utf-16-decoder">utf-16 decoder</a> with the <a href="#utf-16be-flag">utf-16be flag</a> set.
@@ -2707,7 +2726,7 @@ <h3 id="utf-16be"><span class="secno">14.2 </span><dfn>utf-16be</dfn></h3>
is the <a href="#utf-16-encoder">utf-16 encoder</a> with the <a href="#utf-16be-flag">utf-16be flag</a> set.


<h3 id="utf-16le"><span class="secno">14.3 </span><dfn>utf-16le</dfn></h3>
<h3 id="utf-16le"><span class="secno">14.4 </span><dfn>utf-16le</dfn></h3>

<p class="note">In violation of the Unicode standard,
"<code title="">utf-16</code>" is a <a href="#label">label</a> for
@@ -2720,7 +2739,7 @@ <h3 id="utf-16le"><span class="secno">14.3 </span><dfn>utf-16le</dfn></h3>
is the <a href="#utf-16-encoder">utf-16 encoder</a>.


<h3 id="x-user-defined"><span class="secno">14.4 </span><dfn>x-user-defined</dfn></h3>
<h3 id="x-user-defined"><span class="secno">14.5 </span><dfn>x-user-defined</dfn></h3>

<p class="note">While technically this is a <a href="#single-byte-encoding">single-byte encoding</a>,
it is defined separately as it can be implemented algorithmically.
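
As a rough illustration of the replacement encoding added in section 14.1 of this diff, the following Python sketch models a decoder that ignores its input and reports a single decoder error, and an encoder that defers to UTF-8. The names `replacement_decode`, `replacement_encode`, and `DecoderError` are hypothetical. Mapping a non-fatal decoder error to U+FFFD is an assumption based on the Encoding Standard's general error handling; this diff does not state it.

```python
REPLACEMENT_CHARACTER = "\ufffd"


class DecoderError(Exception):
    """Hypothetical stand-in for the spec's "decoder error" in fatal mode."""


def replacement_decode(data: bytes, fatal: bool = False) -> str:
    """Sketch of the replacement decoder as worded in this commit.

    The input bytes are never inspected: the decoder emits one decoder
    error and then ends the stream.
    """
    if fatal:
        # Assumption: with the fatal flag set, a decoder error aborts decoding.
        raise DecoderError("replacement decoder always emits a decoder error")
    # Assumption: without the fatal flag, a decoder error becomes U+FFFD.
    return REPLACEMENT_CHARACTER


def replacement_encode(text: str) -> bytes:
    """Sketch of the replacement encoder, which is the utf-8 encoder.

    Python's built-in UTF-8 codec stands in for the spec's utf-8 encoder.
    """
    return text.encode("utf-8")
```

Under these assumptions, `replacement_decode(b"\x1b$)A")` yields a single U+FFFD no matter what bytes are supplied. That is the point of the note in the diff: content labelled with an encoding the client does not implement is never interpreted as some other encoding.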
32 changes: 25 additions & 7 deletions Overview.src.html
@@ -183,7 +183,7 @@ <h2>Encodings</h2>
<p>Authors must use the <span>utf-8</span> <span>encoding</span> and must use the
"<code title>utf-8</code>" <span>label</span> to identify it.

<p>New protocols and formats must use the <span>utf-8</span> <span>encoding</span>
<p>New protocols and formats must use the <span>utf-8</span> <span>encoding</span>
exclusively. If these protocols and formats need to expose the <span>encoding</span>'s
<span>label</span>, they must expose it as "<code title>utf-8</code>".

@@ -510,6 +510,10 @@ <h2>Encodings</h2>
<tr><td>"<code title>iso-2022-kr</code>"
<tbody>
<tr><th colspan=2><a href=#legacy-miscellaneous-encodings>Legacy miscellaneous encodings</a>
<tr>
<td rowspan=2><span>replacement</span>
<td>"<code title>iso-2022-cn</code>"
<tr><td>"<code title>iso-2022-cn-ext</code>"
<tr>
<td><span>utf-16be</span>
<td>"<code title>utf-16be</code>"
@@ -800,7 +804,8 @@ <h3>Interface <code title>TextDecoder</code></h3>
<dt><code><var title>decoder</var> = new <span title=dom-TextDecoder>TextDecoder</span>([<var title>label</var> = "utf-8" [, <var title>options</var>]])</code>
<dd>
<p>Returns a new <code>TextDecoder</code> object.
<p>If <var title>label</var> is not a <span>label</span>,
<p>If <var title>label</var> is either not a <span>label</span> or is a
<span>label</span> for <span>replacement</span>,
<span data-anolis-spec=webidl title=throw>throws</span> a
<code>TypeError</code>.

@@ -829,7 +834,7 @@ <h3>Interface <code title>TextDecoder</code></h3>
<span title=concept-encoding-get>getting an encoding</span> from
<var title>label</var>.

<li><p>If <var title>encoding</var> is failure,
<li><p>If <var title>encoding</var> is failure or <span>replacement</span>,
<span data-anolis-spec=webidl>throw</span> a <code>TypeError</code>.

<li><p>Let <var title>dec</var> be a new <code>TextDecoder</code> object.
@@ -895,10 +900,10 @@ <h3>Interface <code title>TextDecoder</code></h3>
<span>EOF byte</span> to the <b>stream</b>.

<li>
<p>Return the output of running <b>encoding</b>'s <span>decoder</span> on
the <b>stream</b>. If the <b>fatal flag</b> is set and
<b>encoding</b>'s <span>decoder</span> emits a <span>decoder error</span>,
<span data-anolis-spec=dom title=concept-throw>throw</span> a
<p>Return the output of running <b>encoding</b>'s <span>decoder</span> on
the <b>stream</b>. If the <b>fatal flag</b> is set and
<b>encoding</b>'s <span>decoder</span> emits a <span>decoder error</span>,
<span data-anolis-spec=dom title=concept-throw>throw</span> a
"<code title>EncodingError</code>".

<p class=note>In addition to the reason given above with respect to the
@@ -2542,6 +2547,19 @@ <h3><dfn>iso-2022-kr</dfn></h3>
<h2>Legacy miscellaneous encodings</h2>


<h3><dfn>replacement</dfn></h3>

<p class=note>The <span>replacement</span> <span>encoding</span> exists to prevent certain
attacks that abuse a mismatch between <span title=encoding>encodings</span> supported on
the server and the client.

<p>The <dfn>replacement decoder</dfn> (<span>decoder</span> for <span>replacement</span>)
is to emit a <span>decoder error</span> followed by an <span>EOF code point</span>.

<p>The <dfn>replacement encoder</dfn> (<span>encoder</span> for <span>replacement</span>)
is the <span>utf-8 encoder</span>.


<h3>Common infrastructure for <span>utf-16be</span> and <span>utf-16le</span></h3>

<p class=note>In violation of the Unicode standard, checking for a
7 changes: 7 additions & 0 deletions encodings.json
@@ -431,6 +431,13 @@
},
{
"encodings": [
{
"labels": [
"iso-2022-cn",
"iso-2022-cn-ext"
],
"name": "replacement"
},
{
"labels": [
"utf-16be"
7 changes: 7 additions & 0 deletions tools-clean.py
@@ -0,0 +1,7 @@
import json
filename = "encodings.json"
data = json.loads(open(filename, "r").read())

handle = open(filename, "w")
handle.write(json.dumps(data, sort_keys=True, allow_nan=False, indent=2, separators=(',', ': ')))
handle.write("\n")
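
To show what this normalization produces, here is a small sketch that applies the same `json.dumps` arguments to an in-memory sample instead of rewriting `encodings.json` on disk. The `normalize` helper and the `sample` value are hypothetical and used only for illustration.

```python
import json


def normalize(data):
    """Serialize data with the same json.dumps arguments as tools-clean.py."""
    return json.dumps(data, sort_keys=True, allow_nan=False, indent=2,
                      separators=(',', ': ')) + "\n"


# Hypothetical sample shaped like one entry of encodings.json.
sample = {"name": "replacement", "labels": ["iso-2022-cn", "iso-2022-cn-ext"]}

# Keys come out sorted ("labels" before "name") with two-space indentation,
# matching the layout of the encodings.json hunk in this commit.
print(normalize(sample), end="")
```

The explicit `separators=(',', ': ')` avoids trailing spaces after commas on older Python versions, where the default item separator was `', '` even when `indent` was set. Because the output is deterministic, running the normalization twice gives the same bytes, which keeps later diffs of the JSON file small.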
34 changes: 34 additions & 0 deletions tools-label-table.py
@@ -0,0 +1,34 @@
import json

def get_data(filename):
    return json.loads(open(filename, "r").read())

def create_table():
    data = get_data("encodings.json")
    table = ""
    labelsseen = []
    for set in data:
        table += " <tbody>\n <tr><th colspan=2><a href=#" + set["heading"].lower().replace(" ", "-") + ">" + set["heading"] + "</a>\n"
        for encoding in set["encodings"]:
            rowspan = ""
            label_len = len(encoding["labels"])
            if label_len > 1:
                rowspan = " rowspan=" + str(label_len)

            table += " <tr>\n <td" + rowspan + "><span>" + encoding["name"] + "</span>"
            i = 0
            labels = encoding["labels"]
            labels.sort()
            for label in labels:
                if label in labelsseen:
                    raise NameError("Duplicate label: " + label)
                labelsseen.append(label)
                if i > 0:
                    table += " <tr>"
                else:
                    table += "\n "
                table += "<td>\"<code title>" + label + "</code>\"\n"
                i += 1
    print(table)

create_table()
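
The same `encodings.json` structure that drives the label table could also drive the label lookup that the `TextDecoder` constructor change in this commit relies on. The sketch below is a minimal illustration under stated assumptions: `SAMPLE` is a hypothetical excerpt rather than the real file, the function names are invented, and the whitespace stripping and ASCII lowercasing reflect how the Encoding Standard describes getting an encoding from a label, which this diff does not show.

```python
ASCII_WHITESPACE = "\t\n\x0c\r "

# Hypothetical excerpt in the shape of encodings.json (not the full file).
SAMPLE = [
    {
        "heading": "Legacy miscellaneous encodings",
        "encodings": [
            {"labels": ["iso-2022-cn", "iso-2022-cn-ext"], "name": "replacement"},
            {"labels": ["utf-16be"], "name": "utf-16be"},
        ],
    }
]


def build_label_map(data):
    """Flatten the grouped encodings into a label -> encoding name mapping."""
    label_map = {}
    for group in data:
        for encoding in group["encodings"]:
            for label in encoding["labels"]:
                label_map[label] = encoding["name"]
    return label_map


def get_encoding(label, label_map):
    """Return the encoding name for a label, or None on failure.

    Assumption: labels are matched after stripping ASCII whitespace and
    lowercasing (labels are ASCII, so str.lower() is sufficient here).
    """
    return label_map.get(label.strip(ASCII_WHITESPACE).lower())


def check_decoder_label(label, label_map):
    """Mirror the constructor step: failure or replacement raises TypeError."""
    encoding = get_encoding(label, label_map)
    if encoding is None or encoding == "replacement":
        raise TypeError("unsupported label for TextDecoder: " + repr(label))
    return encoding
```

With `label_map = build_label_map(SAMPLE)`, calling `check_decoder_label("UTF-16BE", label_map)` returns `"utf-16be"`, while `"iso-2022-cn"` and an unknown label such as `"bogus"` both raise `TypeError`. This matches the rule in the diff that `TextDecoder` cannot be constructed for the replacement encoding.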
