[e] (0) Abstract out the innerHTML serialisation algorithm.
git-svn-id: http://svn.whatwg.org/webapps@931 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Jun 19, 2007
1 parent c59c372 commit 3858fd3
Showing 2 changed files with 356 additions and 337 deletions.
331 changes: 170 additions & 161 deletions index
@@ -1518,7 +1518,10 @@

<li><a href="#namespaces"><span class=secno>8.3. </span>Namespaces</a>

<li><a href="#entities"><span class=secno>8.4. </span>Entities</a>
<li><a href="#serialising"><span class=secno>8.4. </span>Serialising
HTML fragments</a>

<li><a href="#entities"><span class=secno>8.5. </span>Entities</a>
</ul>

<li><a href="#wysiwyg"><span class=secno>9. </span>WYSIWYG editors</a>
@@ -3710,165 +3713,8 @@ http://software.hixie.ch/utilities/js/live-dom-viewer/?%3C%21DOCTYPE%20html%3E..

<p>On getting, the <code title=dom-innerHTML-HTML><a
href="#innerhtml0">innerHTML</a></code> DOM attribute must return the
result of running the following algorithm:

<ol>
<li>
<p>Let <var title="">s</var> be a string, and initialise it to the empty
string.

<li>
<p>For each child node <var title="">child</var>, in <a
href="#tree-order">tree order</a>, append the appropriate string from
the following list to <var title="">s</var>:</p>

<dl class=switch>
<dt>If the child node is an <code title="">Element</code>

<dd>
<p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
character, followed by the element's tag name. (For nodes created by
the <a href="#html-0">HTML parser</a>, <code
title="">Document.createElement()</code>, or <code
title="">Document.renameNode()</code>, the tag name will be
lowercase.)</p>

<p>For each attribute that the element has, append a U+0020 SPACE
character, the attribute's name (which, for attributes set by the <a
href="#html-0">HTML parser</a> or by <code
title="">Element.setAttributeNode()</code> or <code
title="">Element.setAttribute()</code>, will be lowercase), a U+003D
EQUALS SIGN (<code title="">=</code>) character, a U+0022 QUOTATION
MARK (<code title="">&quot;</code>) character, the attribute's value,
<a href="#escapingString" title="escaping a string">escaped as
described below</a>, and a second U+0022 QUOTATION MARK (<code
title="">&quot;</code>) character.</p>

<p>While the exact order of attributes is UA-defined, and may depend on
factors such as the order that the attributes were given in the
original markup, the sort order must be stable, such that consecutive
calls to <code title=dom-innerHTML-HTML><a
href="#innerhtml0">innerHTML</a></code> serialise an element's
attributes in the same order.</p>

<p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
character.</p>

<p>If the child node is an <code><a href="#area">area</a></code>,
<code><a href="#base">base</a></code>, <code>basefont</code>,
<code>bgsound</code>, <code><a href="#br">br</a></code>, <code><a
href="#col">col</a></code>, <code><a href="#embed">embed</a></code>,
<code>frame</code>, <code><a href="#hr">hr</a></code>, <code><a
href="#img">img</a></code>, <code>input</code>, <code><a
href="#link">link</a></code>, <code><a href="#meta0">meta</a></code>,
<code><a href="#param">param</a></code>, <code>spacer</code>, or
<code>wbr</code> element, then continue on to the next child node at
this point.</p>
<!-- also, i guess:
image, isindex, and keygen, but we don't list those because we
don't consider those "elements", more "macros", and thus we
should never serialise them -->
<!-- XXX when we get around to
it, add event-source -->
<p>If the child node is a <code><a href="#pre">pre</a></code> or
<code>textarea</code> element, append a U+000A LINE FEED (LF)
character.</p>

<p>Append the value of the <var title="">child</var> element's <code
title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code>
DOM attribute (thus recursing into this algorithm for that element),
followed by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
character, a U+002F SOLIDUS (<code title="">/</code>) character, the
element's tag name again, and finally a U+003E GREATER-THAN SIGN
(<code title="">&gt;</code>) character.</p>

<dt>If the child node is a <code title="">Text</code> or <code
title="">CDATASection</code> node

<dd>
<p>If one of the ancestors of the child node is a <code><a
href="#style">style</a></code>, <code><a
href="#script0">script</a></code>, <code>xmp</code>, <code><a
href="#iframe">iframe</a></code>, <code>noembed</code>,
<code>noframes</code>, or <code><a
href="#noscript">noscript</a></code> element, then append the value of
the <var title="">child</var> node's <code title="">data</code> DOM
attribute literally.</p>
<!-- note about noscript: because this is defining an API, it
can assume that scripting is enabled, and that thus the
<noscript> element in the DOM will have been parsed in the
scripting-enabled mode, and that thus the text node is raw
markup -->

<p>Otherwise, append the value of the <var title="">child</var> node's
<code title="">data</code> DOM attribute, <a href="#escapingString"
title="escaping a string">escaped as described below</a>.</p>

<dt>If the child node is a <code title="">Comment</code>

<dd>
<p>Append the literal string <code>&lt;!--</code> (U+003C LESS-THAN
SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D
HYPHEN-MINUS), followed by the value of the <var title="">child</var>
node's <code title="">data</code> DOM attribute, followed by the
literal string <code>--&gt;</code> (U+002D HYPHEN-MINUS, U+002D
HYPHEN-MINUS, U+003E GREATER-THAN SIGN).</p>

<dt>If the child node is a <code title="">DocumentType</code>

<dd>
<p>Append the literal string <code>&lt;!DOCTYPE</code> (U+003C
LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER
D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C,
U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050
LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a
space (U+0020 SPACE), followed by the value of the <var
title="">child</var> node's <code title="">name</code> DOM attribute,
followed by the literal string <code>&gt;</code> (U+003E GREATER-THAN
SIGN).</p>
</dl>

<p>Other node types (e.g. <code title="">Attr</code>) cannot occur as
children of elements. If they do, the <code title=dom-innerHTML-HTML><a
href="#innerhtml0">innerHTML</a></code> attribute must raise an
<code>INVALID_STATE_ERR</code> exception.</p>

<li>
<p>The result of the algorithm is the string <var title="">s</var>.
</ol>

<p><dfn id=escapingString>Escaping a string</dfn> (for the purposes of the
algorithm above) consists of replacing any occurrences of the "<code
title="">&amp;</code>" character by the string "<code
title="">&amp;amp;</code>", any occurrences of the "<code
title="">&lt;</code>" character by the string "<code
title="">&amp;lt;</code>", any occurrences of the "<code
title="">&gt;</code>" character by the string "<code
title="">&amp;gt;</code>", and any occurrences of the "<code
title="">&quot;</code>" character by the string "<code
title="">&amp;quot;</code>".

<p class=note>Entity reference nodes are <a
href="#entity-references">assumed to be expanded</a> by the user agent,
and are therefore not covered in the algorithm above.

<p class=note>It is possible that the roundtripping through <code
title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code> will
not work. For instance, if the element is a <code>textarea</code> element
to which a <code title="">Comment</code> node has been appended, then
assigning <code title=dom-innerHTML-HTML><a
href="#innerhtml0">innerHTML</a></code> to itself will result in the
comment being displayed in the text field. Similarly, if, as a result of
DOM manipulation, the element contains a comment that contains the literal
string "<code title="">--&gt;</code>", then when the result of serialising
the element is parsed, the comment will be truncated at that point and the
rest of the comment will be interpreted as markup. More examples would be
making a <code><a href="#script0">script</a></code> element contain a text
node with the text string "<code>&lt;/script></code>", or having a
<code><a href="#p">p</a></code> element that contains a <code><a
href="#ul">ul</a></code> element (as the <code><a href="#ul">ul</a></code>
element's <span title=syntax-start-tag>start tag</span> would imply the
end tag for the <code><a href="#p">p</a></code>).
result of running the <a href="#html-fragment">HTML fragment serialisation
algorithm</a> on the node.

<p>On setting, if the node is a document, the <code
title=dom-innerHTML-HTML><a href="#innerhtml0">innerHTML</a></code> DOM
@@ -38631,7 +38477,170 @@ http://lxr.mozilla.org/seamonkey/search?string=nested
<p>The <dfn id=html-namespace0>HTML namespace</dfn> is:
<code>http://www.w3.org/1999/xhtml</code>

<h3 id=entities><span class=secno>8.4. </span><dfn
<h3 id=serialising><span class=secno>8.4. </span>Serialising HTML fragments</h3>

<p>The following steps form the <dfn id=html-fragment>HTML fragment
serialisation algorithm</dfn>. The algorithm takes as input a DOM
<code>Element</code> or <code>Document</code>, referred to as <var
title="">the node</var>, and either returns a string or raises an
exception.

<ol>
<li>
<p>Let <var title="">s</var> be a string, and initialise it to the empty
string.

<li>
<p>For each child node <var title="">child</var> of <var title="">the
node</var>, in <a href="#tree-order">tree order</a>, append the
appropriate string from the following list to <var title="">s</var>:</p>

<dl class=switch>
<dt>If the child node is an <code title="">Element</code>

<dd>
<p>Append a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
character, followed by the element's tag name. (For nodes created by
the <a href="#html-0">HTML parser</a>, <code
title="">Document.createElement()</code>, or <code
title="">Document.renameNode()</code>, the tag name will be
lowercase.)</p>

<p>For each attribute that the element has, append a U+0020 SPACE
character, the attribute's name (which, for attributes set by the <a
href="#html-0">HTML parser</a> or by <code
title="">Element.setAttributeNode()</code> or <code
title="">Element.setAttribute()</code>, will be lowercase), a U+003D
EQUALS SIGN (<code title="">=</code>) character, a U+0022 QUOTATION
MARK (<code title="">&quot;</code>) character, the attribute's value,
<a href="#escapingString" title="escaping a string">escaped as
described below</a>, and a second U+0022 QUOTATION MARK (<code
title="">&quot;</code>) character.</p>

<p>While the exact order of attributes is UA-defined, and may depend on
factors such as the order that the attributes were given in the
original markup, the sort order must be stable, such that consecutive
invocations of this algorithm serialise an element's attributes in the
same order.</p>

<p>Append a U+003E GREATER-THAN SIGN (<code title="">&gt;</code>)
character.</p>

<p>If the child node is an <code><a href="#area">area</a></code>,
<code><a href="#base">base</a></code>, <code>basefont</code>,
<code>bgsound</code>, <code><a href="#br">br</a></code>, <code><a
href="#col">col</a></code>, <code><a href="#embed">embed</a></code>,
<code>frame</code>, <code><a href="#hr">hr</a></code>, <code><a
href="#img">img</a></code>, <code>input</code>, <code><a
href="#link">link</a></code>, <code><a href="#meta0">meta</a></code>,
<code><a href="#param">param</a></code>, <code>spacer</code>, or
<code>wbr</code> element, then continue on to the next child node at
this point.</p>
<!-- also, i guess:
image, isindex, and keygen, but we don't list those because we
don't consider those "elements", more "macros", and thus we
should never serialise them -->
<!-- XXX when we get around to
it, add event-source -->
<p>If the child node is a <code><a href="#pre">pre</a></code> or
<code>textarea</code> element, append a U+000A LINE FEED (LF)
character.</p>

<p>Append the value of running the <a href="#html-fragment">HTML
fragment serialisation algorithm</a> on the <var title="">child</var>
element (thus recursing into this algorithm for that element),
followed by a U+003C LESS-THAN SIGN (<code title="">&lt;</code>)
character, a U+002F SOLIDUS (<code title="">/</code>) character, the
element's tag name again, and finally a U+003E GREATER-THAN SIGN
(<code title="">&gt;</code>) character.</p>

<dt>If the child node is a <code title="">Text</code> or <code
title="">CDATASection</code> node

<dd>
<p>If one of the ancestors of the child node is a <code><a
href="#style">style</a></code>, <code><a
href="#script0">script</a></code>, <code>xmp</code>, <code><a
href="#iframe">iframe</a></code>, <code>noembed</code>,
<code>noframes</code>, or <code><a
href="#noscript">noscript</a></code> element, then append the value of
the <var title="">child</var> node's <code title="">data</code> DOM
attribute literally.</p>
<!-- note about noscript: because this is defining an API, it
can assume that scripting is enabled, and that thus the
<noscript> element in the DOM will have been parsed in the
scripting-enabled mode, and that thus the text node is raw
markup -->

<p>Otherwise, append the value of the <var title="">child</var> node's
<code title="">data</code> DOM attribute, <a href="#escapingString"
title="escaping a string">escaped as described below</a>.</p>

<dt>If the child node is a <code title="">Comment</code>

<dd>
<p>Append the literal string <code>&lt;!--</code> (U+003C LESS-THAN
SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D
HYPHEN-MINUS), followed by the value of the <var title="">child</var>
node's <code title="">data</code> DOM attribute, followed by the
literal string <code>--&gt;</code> (U+002D HYPHEN-MINUS, U+002D
HYPHEN-MINUS, U+003E GREATER-THAN SIGN).</p>

<dt>If the child node is a <code title="">DocumentType</code>

<dd>
<p>Append the literal string <code>&lt;!DOCTYPE</code> (U+003C
LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER
D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C,
U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050
LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a
space (U+0020 SPACE), followed by the value of the <var
title="">child</var> node's <code title="">name</code> DOM attribute,
followed by the literal string <code>&gt;</code> (U+003E GREATER-THAN
SIGN).</p>
</dl>

<p>Other node types (e.g. <code title="">Attr</code>) cannot occur as
children of elements. If they do, this algorithm must raise an
<code>INVALID_STATE_ERR</code> exception.</p>

<li>
<p>The result of the algorithm is the string <var title="">s</var>.
</ol>
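
The steps above can be sketched in Python roughly as follows. This is an illustrative sketch, not part of the specification: the node classes (`Element`, `Text`, `Comment`, `DocumentType`), the `serialise` function, and the use of `ValueError` in place of `INVALID_STATE_ERR` are all hypothetical stand-ins for a real DOM, and the node passed in is assumed to have no relevant ancestors of its own.

```python
from dataclasses import dataclass, field

# Elements that have no end tag and no serialised children (list from the text above).
VOID = {"area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
        "hr", "img", "input", "link", "meta", "param", "spacer", "wbr"}
# Elements whose descendant text is appended literally, without escaping.
RAW_TEXT = {"style", "script", "xmp", "iframe", "noembed", "noframes", "noscript"}


def escape(s):
    # "&" must be replaced first so the other replacements are not re-escaped.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))


@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class DocumentType:
    name: str


@dataclass
class Element:  # hypothetical minimal element: tag name, (name, value) pairs, children
    tag: str
    attrs: list = field(default_factory=list)
    children: list = field(default_factory=list)


def serialise(node, in_raw=False):
    # in_raw is True when node or one of its ancestors is a raw-text element.
    in_raw = in_raw or node.tag in RAW_TEXT
    s = ""
    for child in node.children:
        if isinstance(child, Element):
            s += "<" + child.tag
            for name, value in child.attrs:  # list order keeps attribute order stable
                s += " " + name + '="' + escape(value) + '"'
            s += ">"
            if child.tag in VOID:
                continue  # no content and no end tag
            if child.tag in ("pre", "textarea"):
                s += "\n"
            s += serialise(child, in_raw) + "</" + child.tag + ">"
        elif isinstance(child, Text):
            s += child.data if in_raw else escape(child.data)
        elif isinstance(child, Comment):
            s += "<!--" + child.data + "-->"
        elif isinstance(child, DocumentType):
            s += "<!DOCTYPE " + child.name + ">"
        else:
            # Stands in for the INVALID_STATE_ERR exception.
            raise ValueError("unexpected child node type")
    return s
```

Note that the function serialises only the children of the node it is given, which matches the algorithm: the start and end tags of the node itself are never emitted.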

<p><dfn id=escapingString>Escaping a string</dfn> (for the purposes of the
algorithm above) consists of replacing any occurrences of the "<code
title="">&amp;</code>" character by the string "<code
title="">&amp;amp;</code>", any occurrences of the "<code
title="">&lt;</code>" character by the string "<code
title="">&amp;lt;</code>", any occurrences of the "<code
title="">&gt;</code>" character by the string "<code
title="">&amp;gt;</code>", and any occurrences of the "<code
title="">&quot;</code>" character by the string "<code
title="">&amp;quot;</code>".

<p class=note>Entity reference nodes are <a
href="#entity-references">assumed to be expanded</a> by the user agent,
and are therefore not covered in the algorithm above.

<p class=note>It is possible that the output of this algorithm, if parsed
with an <a href="#html-0">HTML parser</a>, will not return the original
tree structure. For instance, if a <code>textarea</code> element to which
a <code title="">Comment</code> node has been appended is serialised and
the output is then reparsed, the comment will end up being displayed in
the text field. Similarly, if, as a result of DOM manipulation, an element
contains a comment that contains the literal string "<code
title="">--&gt;</code>", then when the result of serialising the element
is parsed, the comment will be truncated at that point and the rest of the
comment will be interpreted as markup. More examples would be making a
<code><a href="#script0">script</a></code> element contain a text node
with the text string "<code>&lt;/script></code>", or having a <code><a
href="#p">p</a></code> element that contains a <code><a
href="#ul">ul</a></code> element (as the <code><a href="#ul">ul</a></code>
element's <span title=syntax-start-tag>start tag</span> would imply the
end tag for the <code><a href="#p">p</a></code>).
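
The comment case in the note above can be illustrated with a short Python sketch. It assumes Python's standard `html.parser` module as a stand-in for an HTML parser (it is not a conforming implementation of the parser this specification defines), and the `Collector` class and `reparse_comment` helper are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Records the comments and character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self.text = ""

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.text += data


def reparse_comment(data):
    # Serialise a Comment node as the algorithm does, then parse the result.
    markup = "<!--" + data + "-->"
    parser = Collector()
    parser.feed(markup)
    parser.close()
    return parser.comments, parser.text
```

A comment whose data is `a-->b` serialises to `<!--a-->b-->`. When reparsed, the comment ends at the first `-->`, so only `a` survives as comment data and the remainder `b-->` is treated as ordinary content.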

<h3 id=entities><span class=secno>8.5. </span><dfn
id=entities0>Entities</dfn></h3>

<p>This table lists the entity names that are supported by HTML, and the