[e] (0) Adjust notes on encoding detection
Fixing https://www.w3.org/Bugs/Public/show_bug.cgi?id=25534
Affected topics: HTML Syntax and Parsing
git-svn-id: http://svn.whatwg.org/webapps@8722 340c8d12-0b0e-0410-8428-c7bf67bfef74
Showing with 36 additions and 14 deletions.
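The notes touched by this commit rely on UTF-8 having a highly detectable bit pattern. As a rough illustration only, the check could be sketched in Python as below. The function name `looks_like_utf8` and its three-way return value are hypothetical and not part of the specification. The sketch assumes the whole file is available as bytes, and it uses Python's strict UTF-8 decoder as a stand-in for "matches the UTF-8 pattern".

```python
from typing import Optional


def looks_like_utf8(data: bytes) -> Optional[bool]:
    """Hypothetical helper: guess whether a whole file is UTF-8.

    Returns None when the data is pure ASCII (no bytes above 0x7F, so the
    check is inconclusive), True when every byte sequence is valid UTF-8,
    and False when at least one sequence does not match the UTF-8 pattern.
    """
    # Bytes in the range 0x00-0x7F are identical in ASCII and UTF-8,
    # so they carry no evidence either way.
    if data.isascii():
        return None
    try:
        # Strict decoding fails on any malformed multi-byte sequence.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(looks_like_utf8(b"plain ascii"))
    print(looks_like_utf8("café".encode("utf-8")))
    print(looks_like_utf8("café".encode("windows-1252")))
```

This sketch reads the complete input. If it were applied to only a prefix of a file, a multi-byte sequence cut off at the end of the prefix would be reported as a mismatch, which is one reason whole-file examination is the easier case.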
|data-x="concept-encoding-confidence">confidence</span> <i>tentative</i>, and abort these steps.|
|<p class="note">The UTF-8 encoding has a highly detectable bit pattern. Documents that contain|
|bytes with values greater than 0x7F which match the UTF-8 pattern are very likely to be UTF-8,|
|while documents with byte sequences that do not match it are very likely not. User-agents are|
|therefore encouraged to search for this common encoding. <ref spec=PPUTF8> <ref spec=UTF8DET></p>|
|<p class="note">User agents are generally discouraged from attempting to autodetect encodings|
|for resources obtained over the network, since doing so involves inherently non-interoperable|
|heuristics. Attempting to detect encodings based on an HTML document's preamble is especially|
|tricky since HTML markup typically uses only ASCII characters, and HTML documents tend to begin|
|with a lot of markup rather than with text content.</p>|
|<p class="note">The UTF-8 encoding has a highly detectable bit pattern. Files from the local|
|file system that contain bytes with values greater than 0x7F which match the UTF-8 pattern are|
|very likely to be UTF-8, while documents with byte sequences that do not match it are very|
|likely not. When a user agent can examine the whole file, rather than just the preamble,|
|detecting for UTF-8 specifically can be especially effective. <ref spec=PPUTF8> <ref|