[giow] (3) Make a BOM override HTTP headers.

Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=17810
Affected topics: HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@7360 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 16, 2012
1 parent 50bbda4 commit 947be85f5d985de120a58c7832bf428cdf36e222
Showing with 93 additions and 45 deletions.
  1. +30 −14 complete.html
  2. +30 −14 index
  3. +33 −17 source

44 complete.html

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>The user agent may wait for more bytes of the resource to be

</li>

<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>
<li>

<!-- Doing this step before honouring HTTP is important for supporting
http://kb.dsqq.cn/html/2012-09/16/node_193.htm
which is encoded as UTF-8 but is incorrectly labeled as
Content-Type: text/html; charset=GB2312
-->

<p>For each of the rows in the following table, starting with the
first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table><thead><tr><th>Bytes in Hexadecimal
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</li>
(BOMs).</p>

<p class=note>That this step happens before the next one
honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
<a href=#willful-violation>willful violation</a> of the HTTP specification,
motivated by a desire to be maximally compatible with legacy
content. <a href=#refsHTTP>[HTTP]</a></p>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
determine its encoding">prescan the byte stream to determine its
<p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
encoding">prescan the byte stream to determine its
encoding</a>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
44 index

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>The user agent may wait for more bytes of the resource to be

</li>

<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>
<li>

<!-- Doing this step before honouring HTTP is important for supporting
http://kb.dsqq.cn/html/2012-09/16/node_193.htm
which is encoded as UTF-8 but is incorrectly labeled as
Content-Type: text/html; charset=GB2312
-->

<p>For each of the rows in the following table, starting with the
first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table><thead><tr><th>Bytes in Hexadecimal
<td>UTF-EBCDIC
-->
</table><p class=note>This step looks for Unicode Byte Order Marks
(BOMs).</li>
(BOMs).</p>

<p class=note>That this step happens before the next one
honoring the HTTP <code><a href=#content-type>Content-Type</a></code> header is a
<a href=#willful-violation>willful violation</a> of the HTTP specification,
motivated by a desire to be maximally compatible with legacy
content. <a href=#refsHTTP>[HTTP]</a></p>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <a href=#concept-encoding-confidence title=concept-encoding-confidence>confidence</a>
<i>certain</i>, and abort these steps.</li>

<li>

<p>Otherwise, optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to
determine its encoding">prescan the byte stream to determine its
<p>Optionally <a href=#prescan-a-byte-stream-to-determine-its-encoding title="prescan a byte stream to determine its
encoding">prescan the byte stream to determine its
encoding</a>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first
50 source

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps.</p></li>

<li>

<p>The user agent may wait for more bytes of the resource to be

</li>

<li><p>For each of the rows in the following table, starting with
the first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps:</p>
<li>

<!-- Doing this step before honouring HTTP is important for supporting
http://kb.dsqq.cn/html/2012-09/16/node_193.htm
which is encoded as UTF-8 but is incorrectly labeled as
Content-Type: text/html; charset=GB2312
-->

<p>For each of the rows in the following table, starting with the
first one and going down, if there are as many or more bytes
available than the number of bytes in the first column, and the
first bytes of the file match the bytes given in the first column,
then return the encoding given in the cell in the second column of
that row, with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps:</p>

<!-- this table is present in several forms in this file; keep them in sync -->
<table>
-->
</table>

<p class="note">This step looks for Unicode Byte Order Marks
(BOMs).</p></li>
<p class="note">This step looks for Unicode Byte Order Marks
(BOMs).</p>

<p class="note">That this step happens before the next one
honoring the HTTP <code>Content-Type</code> header is a
<span>willful violation</span> of the HTTP specification,
motivated by a desire to be maximally compatible with legacy
content. <a href="#refsHTTP">[HTTP]</a></p>

</li>

<li><p>If the transport layer specifies an encoding, and it is
supported, return that encoding with the <span
title="concept-encoding-confidence">confidence</span>
<i>certain</i>, and abort these steps.</p></li>

<li>

<p>Otherwise, optionally <span title="prescan a byte stream to
determine its encoding">prescan the byte stream to determine its
<p>Optionally <span title="prescan a byte stream to determine its
encoding">prescan the byte stream to determine its
encoding</span>. The <var title="">end condition</var> is that the
user agent decides that scanning further bytes would not be
efficient. User agents are encouraged to only prescan the first

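The reordering this commit makes (BOM check first, transport-layer encoding second) can be sketched roughly as below. This is only an illustration, not the spec's algorithm text: the function names `sniff_bom` and `determine_encoding` and the `supported` parameter are hypothetical, and the table rows are the standard Unicode BOMs, filled in as an assumption because the diff above elides the table body.

```python
from typing import Iterable, Optional, Tuple

# Assumed table rows (elided in the diff): standard Unicode byte order marks.
BOM_TABLE = [
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xef\xbb\xbf", "UTF-8"),
]


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding of the first table row whose bytes prefix `data`."""
    for prefix, encoding in BOM_TABLE:
        # Only match when at least as many bytes are available as the row needs.
        if len(data) >= len(prefix) and data.startswith(prefix):
            return encoding
    return None


def determine_encoding(
    data: bytes,
    transport_charset: Optional[str],
    supported: Iterable[str],
) -> Optional[Tuple[str, str]]:
    """Hypothetical sketch of the first two steps after this commit.

    Returns (encoding, confidence), or None when neither step decides;
    the later steps (prescan and so on) are not sketched here.
    """
    # Step moved up by this commit: a BOM wins over the HTTP charset.
    bom_encoding = sniff_bom(data)
    if bom_encoding is not None:
        return (bom_encoding, "certain")

    # Step moved down: honor the transport layer only when no BOM was found.
    supported_lower = {name.lower() for name in supported}
    if transport_charset is not None and transport_charset.lower() in supported_lower:
        return (transport_charset, "certain")

    return None
```

With this ordering, a UTF-8 page that starts with a BOM but is served as `Content-Type: text/html; charset=GB2312` (the case described in the comment added by this commit) is decoded as UTF-8, while a page without a BOM still gets the HTTP-declared encoding.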