
Commit

[e] (0) Adjust notes on encoding detection
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=25534
Affected topics: HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@8722 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Aug 27, 2014
1 parent 3c56312 commit 71ae100
Showing 3 changed files with 36 additions and 14 deletions.
17 changes: 12 additions & 5 deletions complete.html
@@ -291,7 +291,7 @@
</style><link rel=stylesheet href=status.css><body onload=init()>
<header id=head class="head with-buttons">
<p><a href=//www.whatwg.org/ class=logo><img src=/images/logo width=101 alt=WHATWG height=101></a></p>
-<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>26 August 2014</span></h2></hgroup>
+<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>27 August 2014</span></h2></hgroup>

<nav>
<div>
@@ -71079,10 +71079,17 @@ <h5 id=determining-the-character-encoding>12.2.2.2 Determining the character enc
encoding, then return that encoding, with the <a href=#concept-encoding-confidence id=determining-the-character-encoding:concept-encoding-confidence-8>confidence</a> <i>tentative</i>, and abort these steps.
<a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>

-<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
-bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
-while documents with byte sequences that do not match it are very likely not. User-agents are
-therefore encouraged to search for this common encoding. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
+<p class=note>User agents are generally discouraged from attempting to autodetect encodings
+for resources obtained over the network, since doing so involves inherently non-interoperable
+heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
+tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
+with a lot of markup rather than with text content.</p>
+
+<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Files from the local
+file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
+very likely to be UTF-8, while documents with byte sequences that do not match it are very
+likely not. When a user agent can examine the whole file, rather than just the preamble,
+detecting for UTF-8 specifically can be especially effective. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>

<li>

17 changes: 12 additions & 5 deletions index
@@ -291,7 +291,7 @@
</style><link rel=stylesheet href=status.css><body onload=init()>
<header id=head class="head with-buttons">
<p><a href=//www.whatwg.org/ class=logo><img src=/images/logo width=101 alt=WHATWG height=101></a></p>
-<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>26 August 2014</span></h2></hgroup>
+<hgroup><h1 class=allcaps>HTML</h1><h2 id=living-standard-—-last-updated-[date:-01-jan-1901] class="no-num no-toc">Living Standard — Last Updated <span class=pubdate>27 August 2014</span></h2></hgroup>

<nav>
<div>
@@ -71079,10 +71079,17 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit
encoding, then return that encoding, with the <a href=#concept-encoding-confidence id=determining-the-character-encoding:concept-encoding-confidence-8>confidence</a> <i>tentative</i>, and abort these steps.
<a href=#refsUNIVCHARDET>[UNIVCHARDET]</a></p>

-<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
-bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
-while documents with byte sequences that do not match it are very likely not. User-agents are
-therefore encouraged to search for this common encoding. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>
+<p class=note>User agents are generally discouraged from attempting to autodetect encodings
+for resources obtained over the network, since doing so involves inherently non-interoperable
+heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
+tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
+with a lot of markup rather than with text content.</p>
+
+<p class=note>The UTF-8 encoding has a highly detectable bit pattern. Files from the local
+file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
+very likely to be UTF-8, while documents with byte sequences that do not match it are very
+likely not. When a user agent can examine the whole file, rather than just the preamble,
+detecting for UTF-8 specifically can be especially effective. <a href=#refsPPUTF8>[PPUTF8]</a> <a href=#refsUTF8DET>[UTF8DET]</a></p>

<li>

16 changes: 12 additions & 4 deletions source
@@ -95700,10 +95700,18 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
data-x="concept-encoding-confidence">confidence</span> <i>tentative</i>, and abort these steps.
<ref spec=UNIVCHARDET></p>

-<p class="note">The UTF-8 encoding has a highly detectable bit pattern. Documents that contain
-bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,
-while documents with byte sequences that do not match it are very likely not. User-agents are
-therefore encouraged to search for this common encoding. <ref spec=PPUTF8> <ref spec=UTF8DET></p>
+<p class="note">User agents are generally discouraged from attempting to autodetect encodings
+for resources obtained over the network, since doing so involves inherently non-interoperable
+heuristics. Attempting to detect encodings based on an HTML document's preamble is especially
+tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin
+with a lot of markup rather than with text content.</p>
+
+<p class="note">The UTF-8 encoding has a highly detectable bit pattern. Files from the local
+file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are
+very likely to be UTF-8, while documents with byte sequences that do not match it are very
+likely not. When a user agent can examine the whole file, rather than just the preamble,
+detecting for UTF-8 specifically can be especially effective. <ref spec=PPUTF8> <ref
+spec=UTF8DET></p>

</li>

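The new note on network resources says that detecting an encoding from an HTML document's preamble is especially tricky because markup typically uses only ASCII characters. The following minimal Python sketch illustrates why, under stated assumptions: the sample document is made up, `first_non_ascii_offset` is a hypothetical helper, and the 1024-byte `PREAMBLE` window is an illustrative value, not one taken from this commit.

```python
from typing import Optional


def first_non_ascii_offset(data: bytes) -> Optional[int]:
    """Hypothetical helper: index of the first byte above 0x7F, or None if all bytes are ASCII."""
    for index, byte in enumerate(data):
        if byte > 0x7F:
            return index
    return None


# Made-up document: a head full of markup, with the only non-ASCII
# text ("é") appearing later, in the body content.
doc = (
    "<!DOCTYPE html>\n<html lang=en>\n<head>\n<title>Menu</title>\n"
    + "<link rel=preload href=/assets/app.js as=script>\n" * 30
    + "</head>\n<body>\n<p>Café au lait.</p>\n"
).encode("utf-8")

PREAMBLE = 1024  # illustrative sniffing window (an assumption, not from the spec)

offset = first_non_ascii_offset(doc)
print(doc[:PREAMBLE].isascii())                   # True: the preamble alone carries no encoding evidence
print(offset is not None and offset >= PREAMBLE)  # True: the first non-ASCII byte lies past the preamble
```

Here a detector limited to the first 1024 bytes sees nothing but ASCII and has no basis for a guess, while a scan of the whole file does reach the non-ASCII text.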

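The note on local files describes the UTF-8 heuristic: if a file contains bytes above 0x7F and every byte sequence matches the UTF-8 pattern, it is very likely UTF-8, and if any sequence fails to match, it very likely is not. The following minimal Python sketch of that check assumes whole-file access. `classify_utf8` and its three result labels are hypothetical, and Python's strict UTF-8 decoder stands in for the pattern match.

```python
def classify_utf8(data: bytes) -> str:
    """Hypothetical helper applying the UTF-8 bit-pattern heuristic to a whole file.

    "ascii"     -> no byte above 0x7F, so the pattern gives no evidence either way
    "utf-8"     -> non-ASCII bytes present and every sequence is well-formed UTF-8
    "not-utf-8" -> at least one byte sequence breaks the UTF-8 pattern
    """
    if data.isascii():
        return "ascii"
    try:
        # Strict decoding rejects stray continuation bytes, truncated
        # sequences, overlong forms, and encoded surrogates.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return "not-utf-8"
    return "utf-8"


text = "<p>café</p>"
print(classify_utf8(text.encode("utf-8")))   # utf-8
print(classify_utf8(text.encode("cp1252")))  # not-utf-8 (windows-1252 "é" is a lone 0xE9 byte)
print(classify_utf8(b"<p>cafe</p>"))         # ascii
```

The note's "very likely" matters. A short windows-1252 run such as the bytes 0xC3 0xA9 (displayed as "Ã©") is also valid UTF-8, so a positive result is evidence rather than proof. Longer files make accidental matches rarer, which is why examining the whole file helps.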
