[] (0) start to a definition of content-type sniffing (very WIP). add another example of what could end up not affecting the browsing context.

git-svn-id: http://svn.whatwg.org/webapps@503 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Jan 26, 2007
1 parent f2b895a commit 1bec77f
Showing 2 changed files with 158 additions and 38 deletions.
128 changes: 96 additions & 32 deletions index
@@ -22,7 +22,7 @@

<h1 id=web-applications>Web Applications 1.0</h1>

<h2 class="no-num no-toc" id=working>Working Draft &mdash; 25 January 2007</h2>
<h2 class="no-num no-toc" id=working>Working Draft &mdash; 26 January 2007</h2>

<p>You can take part in this work. <a
href="http://www.whatwg.org/mailing-list">Join the working group's
@@ -825,8 +825,8 @@
across documents</a>
<ul class=toc>
<li><a href="#content-type-sniffing"><span class=secno>4.1.1.
</span><dfn id=determining1 title="Content-Type sniffing">Determining
the type of a new resource in a browsing context</dfn></a>
</span>Determining the type of a new resource in a browsing
context</a>

<li><a href="#read-html"><span class=secno>4.1.2. </span><dfn
id=page-load title=navigate-html>Page load processing model for HTML
@@ -5422,7 +5422,7 @@ data:text/xml,<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[ alert('test
not both).

<h5 id=determining><span class=secno>3.3.3.4. </span><dfn
id=determining2>Determining if a particular element contains block-level
id=determining1>Determining if a particular element contains block-level
elements or inline-level content</dfn></h5>

<p>Some elements are defined to have content models that allow either <a
@@ -5550,7 +5550,7 @@ data:text/xml,<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[ alert('test
<code><a href="#footer0">footer</a></code>, <code><a
href="#li0">li</a></code>, and <code><a href="#dd0">dd</a></code> elements
represent paragraphs with various specific semantics when they are <a
href="#determining2" title="Determining if a particular element contains
href="#determining1" title="Determining if a particular element contains
block-level elements or inline-level content">used as inline-level content
containers</a>, the <code><a href="#figure0">figure</a></code> element
represents a paragraph in the form of <a href="#embedded0">embedded
@@ -6012,7 +6012,7 @@ data:text/xml,<script xmlns="http://www.w3.org/1999/xhtml"><![CDATA[ alert('test

<dd>Meaning any element described as being a <a href="#block-level1"
title="block-level elements">block-level element</a>, but only when
that element is actually <a href="#determining2" title="Determining if
that element is actually <a href="#determining1" title="Determining if
a particular element contains block-level elements or inline-level
content">being used</a> as a block-level element, and not, say, as a
structured inline-level element.
@@ -6672,7 +6672,7 @@ class="main"> or <div class="content">. Why do we also need a body?

<p>User agents must not consider the <code title=attr-link-type><a
href="#type">type</a></code> attribute authoritative &mdash; upon fetching
the resource, user agents must only use the <a href="#content-type1"
the resource, user agents must only use the <a href="#content-type4"
title=Content-Type>Content-Type information associated with the
resource</a> to determine its type, not metadata included in the link to
the resource.
@@ -7230,7 +7230,7 @@ class="main"> or <div class="content">. Why do we also need a body?
a page that links to other pages or to parts within the page: a section
with navigation links.

<p>When <a href="#determining2" title="Determining if a particular element
<p>When <a href="#determining1" title="Determining if a particular element
contains block-level elements or inline-level content">used as an
inline-level content</a> container, the element represents a <a
href="#paragraph">paragraph</a>.
@@ -7423,7 +7423,7 @@ XXX attributes to give the date authored, date published
and which could be considered separate from that content. Such sections
are often represented as sidebars in printed typography.

<p>When <a href="#determining2" title="Determining if a particular element
<p>When <a href="#determining1" title="Determining if a particular element
contains block-level elements or inline-level content">used as an
inline-level content</a> container, the element represents a <a
href="#paragraph">paragraph</a>.
@@ -7635,7 +7635,7 @@ XXX attributes to give the date authored, date published
elements (such as <code><a href="#section0">section</a></code>), as
descendants.

<p>When <a href="#determining2" title="Determining if a particular element
<p>When <a href="#determining1" title="Determining if a particular element
contains block-level elements or inline-level content">used as an
inline-level content</a> container, the element represents a <a
href="#paragraph">paragraph</a>.
@@ -8659,7 +8659,7 @@ Address: &lt;input name="address"&gt;&lt;/p&gt;</pre>

<dd>When the element is a child of an <code><a href="#ol0">ol</a></code>
or <code><a href="#ul0">ul</a></code> element and the grandchild of an
element that is <a href="#determining2" title="Determining if a
element that is <a href="#determining1" title="Determining if a
particular element contains block-level elements or inline-level
content">being used as an inline-level content container</a>, or, when
the element is a child of a <code><a href="#menu0">menu</a></code>
@@ -8702,7 +8702,7 @@ Address: &lt;input name="address"&gt;&lt;/p&gt;</pre>
or <code><a href="#ul0">ul</a></code> element, the content model of the
item depends on the way that parent element was used. If it was used as
structured inline content (i.e. if <em>that</em> element's parent was <a
href="#determining2" title="Determining if a particular element contains
href="#determining1" title="Determining if a particular element contains
block-level elements or inline-level content">used as an inline-level
content</a> container), then the <code><a href="#li0">li</a></code>
element must only contain <a href="#inline-level1">inline-level
@@ -8722,7 +8722,7 @@ Address: &lt;input name="address"&gt;&lt;/p&gt;</pre>
title="inline-level content">inline content</a> or <a
href="#block-level1">block-level elements</a>.

<p>When <a href="#determining2" title="Determining if a particular element
<p>When <a href="#determining1" title="Determining if a particular element
contains block-level elements or inline-level content">used as an
inline-level content</a> container, the list item represents a single <a
href="#paragraph">paragraph</a>.
@@ -8914,7 +8914,7 @@ Address: &lt;input name="address"&gt;&lt;/p&gt;</pre>
<dt>Content model:

<dd>When the element is a child of a <code><a href="#dl0">dl</a></code>
element and the grandchild of an element that is <a href="#determining2"
element and the grandchild of an element that is <a href="#determining1"
title="Determining if a particular element contains block-level elements
or inline-level content">being used as an inline-level content
container</a>: <a href="#inline-level1">inline-level content</a>.
@@ -8947,7 +8947,7 @@ Address: &lt;input name="address"&gt;&lt;/p&gt;</pre>
depends on the way its parent element is being used. If the parent element
is a <code><a href="#dl0">dl</a></code> element that is being used as
structured inline content (i.e. if the <code><a href="#dl0">dl</a></code>
element's parent element is being <a href="#determining2"
element's parent element is being <a href="#determining1"
title="Determining if a particular element contains block-level elements
or inline-level content">used as an inline-level content</a> container),
then the <code><a href="#dd0">dd</a></code> element must only contain <a
@@ -11189,7 +11189,7 @@ brighter. A &lt;b>rat&lt;/b> scurries past the corner wall.&lt;/p></pre>
href="#img0">img</a></code> element.

<p>The remote server's response metadata (e.g. an HTTP 404 status code, or
<a href="#content-type1" title=Content-Type>associated Content-Type
<a href="#content-type4" title=Content-Type>associated Content-Type
headers</a>) must be ignored when determining whether the resource
obtained is a valid image or not.

@@ -11465,7 +11465,7 @@ brighter. A &lt;b>rat&lt;/b> scurries past the corner wall.&lt;/p></pre>
title=attr-embed-type><a href="#type4">type</a></code> attribute is the
<span>content's type</span>.

<li>Otherwise, if the specified resource has <a href="#content-type1"
<li>Otherwise, if the specified resource has <a href="#content-type4"
title=Content-Type>explicit Content-Type metadata</a>, then that is the
<span>content's type</span>.

@@ -11638,10 +11638,10 @@ brighter. A &lt;b>rat&lt;/b> scurries past the corner wall.&lt;/p></pre>
<p>Determine the <em>resource type</em>, as follows:</p>

<dl class=switch>
<dt>If the resource has <a href="#content-type1"
<dt>If the resource has <a href="#content-type4"
title=Content-Type>associated Content-Type metadata</a>

<dd>The type is the type specified in <a href="#content-type1"
<dd>The type is the type specified in <a href="#content-type4"
title=Content-Type>the resource's Content-Type metadata</a>.

<dt>Otherwise, if the <code title=attr-object-type><a
@@ -18374,14 +18374,16 @@ XXX selection ranges -->
following:</p>

<ul class=brief>
<li>HTTP status codes (e.g. 204 No Content or 205 Reset Content)

<li>HTTP Content-Disposition headers

<li>Network errors
</ul>

<li>
<p>Let <var title="">type</var> be <a href="#determining3"
title="Content-Type sniffing">the sniffed type of the resource</a>.
<p>Let <var title="">type</var> be <a href="#sniffed" title="Content-Type
sniffing">the sniffed type of the resource</a>.

<li>
<p>If <var title="">type</var> is one of the following types, jump to the
@@ -18453,17 +18455,79 @@ XXX selection ranges -->
to be replaced by the new one).
</ol>

<h4 id=content-type-sniffing><span class=secno>4.1.1. </span><dfn
id=determining3 title="Content-Type sniffing">Determining the type of a
new resource in a browsing context</dfn></h4>
<h4 id=content-type-sniffing><span class=secno>4.1.1. </span>Determining
the type of a new resource in a browsing context</h4>

<p class=big-issue>...</p>
<p class=warning>It is imperative that the rules in this section be
followed exactly. When two user agents use different heuristics for
content type detection, security problems can occur. For example, if a
server believes a contributed file to be an image (and thus benign), but a
Web browser believes the content to be HTML (and thus capable of executing
script), the end user can be exposed to malicious content, making the user
vulnerable to cookie theft attacks and other cross-site scripting attacks.

<p>The <dfn id=sniffed title="Content-Type sniffing">sniffed type of a
resource</dfn> must be found as follows:

<ol>
<li>
<p>Let <var title="">official type</var> be the type given by the
<span>Content-Type metadata for the resource (in
lowercase<!-- XXX ASCII case folding -->, ignoring any parameters). If
there is no such type, jump to the <em title="content-type sniffing:
unknown type"><a href="#content-type1">unknown type</a></em> step
below.</span>

<li>
<p>If the <var title="">official type</var> ends in "+xml", or if it is
either "text/xml" or "application/xml", then jump to the <em
title="content-type sniffing: xml"><a href="#content-type2">XML</a></em>
section below.

<li>
<p>If the <var title="">official type</var> starts with "image/", then
jump to the <em title="content-type sniffing: image"><a
href="#content-type3">images</a></em> section below.

<li>
<p>If the resource was fetched over an HTTP protocol, and the HTTP
Content-Type header had a value whose bytes exactly match one of the
following three lines:</p>

<table>
<thead>
<tr>
<th>Bytes in Hexadecimal

<th>Textual representation

<tbody>
<tr>
<td>74 65 78 74 2f 70 6c 61 69 6e

<td><code title="">text/plain</code>

<tr>
<td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 49 53
4f 2d 38 38 35 39 2d 31

<td><code title="">text/plain;&nbsp;charset=ISO-8859-1</code>

<tr>
<td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 69 73
6f 2d 38 38 35 39 2d 31

<td><code title="">text/plain;&nbsp;charset=iso-8859-1</code>
<p class=big-issue>...then...</p>
</table>

<li>
<p class=big-issue><dfn id=content-type1>content-type sniffing: unknown
type</dfn>, <dfn id=content-type2>content-type sniffing: xml</dfn>, <dfn
id=content-type3>content-type sniffing: image</dfn>
</ol>
<!--
XXX HTTP Content-Type handling?
http://www.25hoursaday.com/weblog/
http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html
http://ln.hixie.ch/?start=1144794177&count=1
http://www.intertwingly.net/blog/2006/04/13/Dont-throw-charset-out-with-the-bathwater
http://blogs.msdn.com/rssteam/articles/PublishersGuide.aspx
http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#192
http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#127
@@ -18547,7 +18611,7 @@ XXX selection ranges -->
<p class=big-issue><dfn id=display>display a user agent page inline</dfn>,
update history object

<h4 id=content-type><span class=secno>4.1.7. </span><dfn id=content-type1
<h4 id=content-type><span class=secno>4.1.7. </span><dfn id=content-type4
title=Content-Type>Content-Type metadata</dfn></h4>
<!-- "explicit Content-Type metadata associated with the resource" -->
<!-- "the resource's type information" -->
@@ -19063,7 +19127,7 @@ XXX selection ranges -->
parameters. <a href="#refsRFC2046">[RFC2046]</a> User agents must not
consider the <code title=attr-hyperlink-type><a
href="#type14">type</a></code> attribute authoritative &mdash; upon
fetching the resource, user agents must only use <a href="#content-type1"
fetching the resource, user agents must only use <a href="#content-type4"
title=Content-Type>the Content-Type information associated with the
resource</a> to determine its type, not metadata included in the link to
the resource.
@@ -20170,7 +20234,7 @@ mpt says:
the <code title=attr-style-type><a href="#type1">type</a></code> content
attribute's value, or <code title="">text/css</code> if that is omitted.
For <code><a href="#link0">link</a></code> elements, this is the <a
href="#content-type1" title=Content-Type>Content-Type metadata of the
href="#content-type4" title=Content-Type>Content-Type metadata of the
specified resource</a>.

<dt>The location (<code title=dom-stylesheet-href>href</code> DOM
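The first three steps of the draft sniffing algorithm (derive the official type from the Content-Type metadata, then branch to the XML or images sections) can be sketched in Python as below. This is a minimal, non-authoritative sketch: the function names and the branch labels ("unknown type", "xml", "image", "continue") are hypothetical, and both the ASCII-only lowercasing and the trimming of surrounding whitespace are assumptions, since the draft itself marks case folding with "XXX ASCII case folding".

```python
import string
from typing import Optional

# ASCII-only lowercasing (assumption; the draft leaves the exact
# case-folding rule as an XXX note).
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def official_type(content_type: Optional[str]) -> Optional[str]:
    """Return the Content-Type in lowercase with any parameters removed.

    None means the resource has no Content-Type metadata at all.
    """
    if content_type is None:
        return None
    # Parameters such as "; charset=..." follow the first semicolon;
    # trimming surrounding whitespace is an assumption of this sketch.
    essence = content_type.split(";", 1)[0].strip()
    return essence.translate(_ASCII_LOWER)


def sniffing_branch(content_type: Optional[str]) -> str:
    """Pick the next step of the draft algorithm (hypothetical labels)."""
    t = official_type(content_type)
    if t is None:
        return "unknown type"
    # The "+xml" / XML test comes before the "image/" test.
    if t.endswith("+xml") or t in ("text/xml", "application/xml"):
        return "xml"
    if t.startswith("image/"):
        return "image"
    # Otherwise fall through to the HTTP Content-Type byte-comparison step.
    return "continue"
```

Because the XML test precedes the image test, a type such as image/svg+xml is routed to the XML section rather than the images section.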
68 changes: 62 additions & 6 deletions source
@@ -16373,6 +16373,7 @@ XXX selection ranges -->
<p>Such processing might be triggered by, amongst other things, the
following:</p>
<ul class="brief">
<li>HTTP status codes (e.g. 204 No Content or 205 Reset Content)</li>
<li>HTTP Content-Disposition headers</li>
<li>Network errors</li>
</ul>
@@ -16446,16 +16447,71 @@ XXX selection ranges -->
</ol>


<h4 id="content-type-sniffing"><dfn title="Content-Type sniffing">Determining the type of a new resource in a browsing context</dfn></h4>
<h4 id="content-type-sniffing">Determining the type of a new resource in a browsing context</h4>

<p class="big-issue">...</p>
<p class="warning">It is imperative that the rules in this section
be followed exactly. When two user agents use different heuristics
for content type detection, security problems can occur. For
example, if a server believes a contributed file to be an image (and
thus benign), but a Web browser believes the content to be HTML (and
thus capable of executing script), the end user can be exposed to
malicious content, making the user vulnerable to cookie theft
attacks and other cross-site scripting attacks.</p>

<p>The <dfn title="Content-Type sniffing">sniffed type of a
resource</dfn> must be found as follows:</p>

<ol>

<li><p>Let <var title="">official type</var> be the type given by
the <span>Content-Type metadata</code> for the resource (in
lowercase<!-- XXX ASCII case folding -->, ignoring any
parameters). If there is no such type, jump to the <em
title="content-type sniffing: unknown type">unknown type</em> step
below.</p></li>

<li><p>If the <var title="">official type</var> ends in "+xml", or
if it is either "text/xml" or "application/xml", then jump to the
<em title="content-type sniffing: xml">XML</em> section
below.</p></li>

<li><p>If the <var title="">official type</var> starts with
"image/", then jump to the <em title="content-type sniffing:
image">images</em> section below.</p></li>

<li><p>If the resource was fetched over an HTTP protocol, and the
HTTP Content-Type header had a value whose bytes exactly match one
of the following three lines:</p>

<table>
<thead>
<tr>
<th>Bytes in Hexadecimal
<th>Textual representation
<tbody>
<tr>
<td>74 65 78 74 2f 70 6c 61 69 6e
<td><code title="">text/plain</code>
<tr>
<td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 49 53 4f 2d 38 38 35 39 2d 31
<td><code title="">text/plain;&nbsp;charset=ISO-8859-1</code>
<tr>
<td>74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 69 73 6f 2d 38 38 35 39 2d 31
<td><code title="">text/plain;&nbsp;charset=iso-8859-1</code>
</ul>

<p class="big-issue">...then...</p>

</li>

<li><p class="big-issue"><dfn>content-type sniffing: unknown
type</dfn>, <dfn>content-type sniffing: xml</dfn>,
<dfn>content-type sniffing: image</dfn></p></li>

</ol>

<!--
XXX HTTP Content-Type handling?
http://www.25hoursaday.com/weblog/
http://www.alvestrand.no/pipermail/ietf-types/2006-April/001707.html
http://ln.hixie.ch/?start=1144794177&count=1
http://www.intertwingly.net/blog/2006/04/13/Dont-throw-charset-out-with-the-bathwater
http://blogs.msdn.com/rssteam/articles/PublishersGuide.aspx
http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#192
http://lxr.mozilla.org/seamonkey/source/browser/components/feeds/src/nsFeedSniffer.cpp#127
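The fourth step compares the raw HTTP Content-Type header, byte for byte, against the three values in the table. A minimal sketch in Python follows; the function name and its fetched_over_http flag are hypothetical. The draft leaves the consequence of a match unspecified ("...then..."), so the sketch only reports whether the header matches.

```python
from typing import Optional

# The three header values from the draft's table, built from the listed hex.
TEXT_PLAIN_HEADERS = (
    bytes.fromhex("74 65 78 74 2f 70 6c 61 69 6e"),
    bytes.fromhex("74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74"
                  " 3d 49 53 4f 2d 38 38 35 39 2d 31"),
    bytes.fromhex("74 65 78 74 2f 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74"
                  " 3d 69 73 6f 2d 38 38 35 39 2d 31"),
)


def header_is_listed_text_plain(fetched_over_http: bool,
                                raw_header: Optional[bytes]) -> bool:
    """True only for HTTP fetches whose raw header bytes equal a listed value.

    No lowercasing or whitespace normalization is applied: the draft
    requires an exact byte match.
    """
    if not fetched_over_http or raw_header is None:
        return False
    return raw_header in TEXT_PLAIN_HEADERS
```

Because the comparison is on exact bytes, near variants such as Text/Plain or text/plain;charset=ISO-8859-1 (no space after the semicolon) do not match.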
