[e] (0) Adjust notes on encoding detection

Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=25534
Affected topics: HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@8722 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Aug 27, 2014
1 parent 3c56312 commit 71ae100dd48e9ae072c903727d8674c4174d95a8
Showing with 36 additions and 14 deletions.
  1. +12 −5 complete.html
  2. +12 −5 index
  3. +12 −4 source
17 complete.html
</style><link rel=stylesheet href=status.css><body onload=init()>
<header id=head class="head with-buttons">
<p><a href=//www.whatwg.org/ class=logo><img src=/images/logo width=101 alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>26 August 2014</span></h2></hgroup>
<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>27 August 2014</span></h2></hgroup>

<nav>
<div>
encoding, then return that encoding, with the <a href=#concept-encoding-confidence id=determining-the-character-encoding:concept-encoding-confidence-8>confidence</a> <i>tentative</i>, and abort these steps.
<a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>

<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
while documents with byte sequences that do not match it are very likely not. User-agents are
therefore encouraged to search for this common encoding. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
<p class=note>User agents are generally discouraged from attempting to autodetect encodings
for resources obtained over the network, since doing so involves inherently non-interoperable
heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
with a lot of markup rather than with text content.</p>

<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Files from the local
file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
very likely to be UTF-8, while documents with byte sequences that do not match it are very
likely not. When a user agent can examine the whole file, rather than just the preamble,
detecting for UTF-8 specifically can be especially effective. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>

<li>

17 index
</style><link rel=stylesheet href=status.css><body onload=init()>
<header id=head class="head with-buttons">
<p><a href=//www.whatwg.org/ class=logo><img src=/images/logo width=101 alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>26 August 2014</span></h2></hgroup>
<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>27 August 2014</span></h2></hgroup>

<nav>
<div>
encoding, then return that encoding, with the <a href=#concept-encoding-confidence id=determining-the-character-encoding:concept-encoding-confidence-8>confidence</a> <i>tentative</i>, and abort these steps.
<a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>

<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
while documents with byte sequences that do not match it are very likely not. User-agents are
therefore encouraged to search for this common encoding. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
<p class=note>User agents are generally discouraged from attempting to autodetect encodings
for resources obtained over the network, since doing so involves inherently non-interoperable
heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
with a lot of markup rather than with text content.</p>

<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Files from the local
file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
very likely to be UTF-8, while documents with byte sequences that do not match it are very
likely not. When a user agent can examine the whole file, rather than just the preamble,
detecting for UTF-8 specifically can be especially effective. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>

<li>

16 source
data-x="concept-encoding-confidence">confidence</span> <i>tentative</i>, and abort these steps.
<ref spec=UNIVCHARDET></p>

<p class="note">The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
while documents with byte sequences that do not match it are very likely not. User-agents are
therefore encouraged to search for this common encoding. <ref spec=PPUTF8> <ref spec=UTF8DET></p>
<p class="note">User agents are generally discouraged from attempting to autodetect encodings
for resources obtained over the network, since doing so involves inherently non-interoperable
heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
with a lot of markup rather than with text content.</p>

<p class="note">The UTF-8 encoding has a highly detectable bit pattern. Files from the local
file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
very likely to be UTF-8, while documents with byte sequences that do not match it are very
likely not. When a user agent can examine the whole file, rather than just the preamble,
detecting for UTF-8 specifically can be especially effective. <ref spec=PPUTF8> <ref
spec=UTF8DET></p>

</li>
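
The revised note says that when a user agent can read a whole local file, checking specifically for UTF-8 can be effective. A minimal sketch of that check follows, in Python. It assumes the entire file is already in memory as bytes; the function name `looks_like_utf8` is hypothetical and does not come from the specification, and this is not the algorithm any particular browser uses.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic from the note: bytes above 0x7F that all fit the UTF-8 pattern.

    Hypothetical helper for illustration only, not part of the HTML spec.
    """
    # Pure ASCII input carries no evidence for or against UTF-8,
    # so the heuristic declines to claim a match.
    if all(b <= 0x7F for b in data):
        return False
    try:
        # Strict decoding rejects any byte sequence that breaks the
        # UTF-8 lead-byte / continuation-byte pattern.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

For a local file, the bytes could be obtained with `open(path, "rb").read()` before calling the function. Running the check over only a short prefix would weaken it, because HTML preambles are mostly ASCII markup, which is the concern the first new note raises.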
