Handle ambiguous ampersands of arbitrary length
Closes #1257.
inikulin authored and domenic committed Jun 23, 2017
1 parent dd0fb78 commit ee19894
Showing 1 changed file with 63 additions and 67 deletions.
130 changes: 63 additions & 67 deletions source
@@ -101822,6 +101822,19 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
of the last start tag to have been emitted from this tokenizer, if any. If no start tag has been
emitted from this tokenizer, then no end tag token is appropriate.</p>

<p>A <span data-x="syntax-charref">character reference</span> is said to be <dfn
data-x="charref-in-attribute">consumed as part of an attribute</dfn> if the <var data-x="return
state">return state</var> is either <span>attribute value (double-quoted) state</span>,
<span>attribute value (single-quoted) state</span> or <span>attribute value (unquoted)
state</span>.</p>

<p>When a state says to <dfn>flush code points consumed as a character reference</dfn>, it means
that for each <span>code point</span> in the <var data-x="temporary buffer">temporary
buffer</var> (in the order they were added to the buffer), the user agent must append the code
point from the buffer to the current attribute's value if the character reference was <span
data-x="charref-in-attribute">consumed as part of an attribute</span>, or emit the code point as a
character token otherwise.</p>
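
The two definitions above can be illustrated with a minimal Python sketch. The `Tokenizer` class, its field names, and the `emitted` list are hypothetical stand-ins invented for this illustration, not part of the specification or of any real parser API. Only the three attribute value state names and the flush behavior are taken from the text.

```python
from dataclasses import dataclass, field

# The three return states named in the definition of
# "consumed as part of an attribute".
ATTRIBUTE_VALUE_STATES = {
    "attribute value (double-quoted) state",
    "attribute value (single-quoted) state",
    "attribute value (unquoted) state",
}


@dataclass
class Tokenizer:
    # Hypothetical, minimal stand-in for tokenizer state (assumption).
    return_state: str = "data state"
    temporary_buffer: str = ""
    current_attribute_value: str = ""
    emitted: list = field(default_factory=list)  # emitted character tokens

    def consumed_as_part_of_attribute(self) -> bool:
        # True when the return state is one of the attribute value states.
        return self.return_state in ATTRIBUTE_VALUE_STATES

    def flush_code_points_consumed_as_character_reference(self) -> None:
        # Walk the buffer in insertion order and route each code point
        # either to the attribute value or to the character token stream.
        for code_point in self.temporary_buffer:
            if self.consumed_as_part_of_attribute():
                self.current_attribute_value += code_point
            else:
                self.emitted.append(code_point)
```

With this sketch, the same buffer contents end up in the attribute value or in the token stream depending only on the return state, which is what lets the individual states drop their own per-state routing logic.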

<p id="check-parser-pause-flag">Before each step of the tokenizer, the user agent must first check
the <span>parser pause flag</span>. If it is true, then the tokenizer must abort the processing of
any nested invocations of the tokenizer, yielding control back to the caller.</p>
@@ -103903,33 +103916,23 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

<h5><dfn>Character reference state</dfn></h5>

<p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string. Append a
U+0026 AMPERSAND (&amp;) character to the <var data-x="temporary buffer">temporary buffer</var>.

<p>Consume the <span>next input character</span>:</p>

<dl class="switch">

<dt>U+0009 CHARACTER TABULATION (tab)</dt>
<dt>U+000A LINE FEED (LF)</dt>
<dt>U+000C FORM FEED (FF)</dt>
<!--<dt>U+000D CARRIAGE RETURN (CR)</dt>-->
<dt>U+0020 SPACE</dt>
<dt>U+003C LESS-THAN SIGN</dt>
<dt>U+0026 AMPERSAND</dt>
<dt>EOF</dt>

<dd><span>Reconsume</span> in the <span>character reference end state</span>.</dd>
<dt><span data-x="ASCII alphanumeric">ASCII alphanumeric</span></dt>
<dd><p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string. Append
a U+0026 AMPERSAND (&amp;) character to the <var data-x="temporary buffer">temporary
buffer</var>. <span>Reconsume</span> in the <span>named character reference state</span>.</dd>

<dt>U+0023 NUMBER SIGN (#)</dt>

<dd>Append the <span>current input character</span> to the <var data-x="temporary
buffer">temporary buffer</var>. Switch to the <span>numeric character reference
<dd><p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string. Append
a U+0026 AMPERSAND (&amp;) character and the <span>current input character</span> to the <var
data-x="temporary buffer">temporary buffer</var>. Switch to the <span>numeric character reference
state</span>.</dd>

<dt>Anything else</dt>

<dd><span>Reconsume</span> in the <span>named character reference state</span>.</dd>
<dd><span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>

@@ -103946,13 +103949,12 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<dt>If there is a match</dt>

<dd>
<p>If the character reference was consumed as part of an attribute (<var data-x="return
state">return state</var> is either <span>attribute value (double-quoted) state</span>,
<span>attribute value (single-quoted) state</span> or <span>attribute value (unquoted)
state</span>), and the last character matched is not a U+003B SEMICOLON character (;), and the
<span>next input character</span> is either a U+003D EQUALS SIGN character (=) or an
<span>ASCII alphanumeric</span>, then, for historical reasons, switch to the <span>character
reference end state</span>.</p>
<p>If the character reference was <span data-x="charref-in-attribute">consumed as part of an
attribute</span>, and the last character matched is not a U+003B SEMICOLON character (;), and
the <span>next input character</span> is either a U+003D EQUALS SIGN character (=) or an
<span>ASCII alphanumeric</span>, then, for historical reasons, <span>flush code points consumed
as a character reference</span> and switch to the <var data-x="return state">return state</var>.
</p>
<!-- "=" added because of https://www.w3.org/Bugs/Public/show_bug.cgi?id=9207#c5 -->

<p>Otherwise:</p>
@@ -103967,21 +103969,19 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
Append one or two characters corresponding to the character reference name (as given by the
second column of the <span>named character references</span> table) to the <var
data-x="temporary buffer">temporary buffer</var>.</p></li>

<li><span>Flush code points consumed as a character reference</span>. Switch to the <var
data-x="return state">return state</var>.</li>
</ol>
</dd>

<dt>Otherwise</dt>

<dd>If the <var data-x="temporary buffer">temporary buffer</var> consists of a U+0026 AMPERSAND
character (&amp;) followed by a sequence of one or more <span data-x="ASCII alphanumeric">ASCII
alphanumerics</span> and a U+003B SEMICOLON character (;), then this is an <span
data-x="parse-error-unknown-named-character-reference">unknown-named-character-reference</span>
<span>parse error</span>.</dd>
<dd><span>Flush code points consumed as a character reference</span>. Switch to the
<span>ambiguous ampersand state</span>.</dd>

</dl>
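
The "for historical reasons" condition in the match case above can be sketched as a small Python predicate. The function name and parameters are hypothetical and chosen only for illustration. The three conditions themselves (in an attribute, no trailing semicolon, next character is `=` or an ASCII alphanumeric) come from the text.

```python
import string

ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)


def is_left_unexpanded_for_historical_reasons(in_attribute, matched_name, next_char):
    """Return True when a matched named reference must not be expanded.

    in_attribute: whether the reference was consumed as part of an attribute.
    matched_name: the matched characters after the ampersand, e.g. "copy" or "copy;".
    next_char: the next input character, or None at EOF (assumed convention).
    """
    return bool(
        in_attribute
        and not matched_name.endswith(";")
        and next_char is not None
        and (next_char == "=" or next_char in ASCII_ALPHANUMERIC)
    )
```

For example, in an attribute value such as `?a=1&copy=2`, the match `copy` is followed by `=`, so the predicate is true and the buffered code points are flushed unchanged instead of being replaced by the referenced character.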

<p>Switch to the <span>character reference end state</span>.</p>

<div class="example">

<p>If the markup contains (not in an attribute) the string <code data-x="">I'm &amp;notit; I
@@ -103997,6 +103997,29 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
</div>


<h5><dfn>Ambiguous ampersand state</dfn></h5>

<p>Consume the <span>next input character</span>:</p>

<dl class="switch">

<dt><span data-x="ASCII alphanumeric">ASCII alphanumeric</span></dt>
<dd>If the character reference was <span data-x="charref-in-attribute">consumed as part of an
attribute</span>, then append the <span>current input character</span> to the current
attribute's value. Otherwise, emit the <span>current input character</span> as a character
token.</dd>

<dt>U+003B SEMICOLON (;)</dt>
<dd>This is an <span
data-x="parse-error-unknown-named-character-reference">unknown-named-character-reference</span>
<span>parse error</span>. <span>Reconsume</span> in the <var data-x="return state">return
state</var>.</dd>

<dt>Anything else</dt>
<dd><span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>
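
As a rough illustration of the ambiguous ampersand state above, the following Python sketch runs the state over a string. The function name, the tuple it returns, and the use of a string index in place of a real input stream are assumptions made for this example. "Reconsume in the return state" is modeled by returning the position without advancing past the current character.

```python
import string

ASCII_ALPHANUMERIC = set(string.ascii_letters + string.digits)


def run_ambiguous_ampersand_state(text, pos, in_attribute):
    """Model the state starting at text[pos].

    Returns (destination, consumed, parse_errors, new_pos), where new_pos is
    the position at which the return state would resume (hypothetical shape).
    """
    destination = "attribute value" if in_attribute else "character tokens"
    consumed = []
    parse_errors = []
    # ASCII alphanumerics are passed through one at a time, of any length.
    while pos < len(text) and text[pos] in ASCII_ALPHANUMERIC:
        consumed.append(text[pos])
        pos += 1
    # A semicolon ends the run with a parse error; it is reconsumed by the
    # return state, so pos is not advanced past it here.
    if pos < len(text) and text[pos] == ";":
        parse_errors.append("unknown-named-character-reference")
    return destination, "".join(consumed), parse_errors, pos
```

Because the loop has no upper bound on the run of alphanumerics, the sketch reflects the point of the commit title: an ambiguous ampersand of arbitrary length is handled without buffering the whole name first.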

<h5><dfn>Numeric character reference state</dfn></h5>

<p>Set the <dfn><var data-x="character reference code">character reference code</var></dfn> to
@@ -104030,8 +104053,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<dt>Anything else</dt>
<dd>This is an <span
data-x="parse-error-absence-of-digits-in-numeric-character-reference">absence-of-digits-in-numeric-character-reference</span>
<span>parse error</span>. <span>Reconsume</span> in the <span>character reference end
state</span>.</dd>
<span>parse error</span>. <span>Flush code points consumed as a character reference</span>.
<span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>

@@ -104048,8 +104071,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<dt>Anything else</dt>
<dd>This is an <span
data-x="parse-error-absence-of-digits-in-numeric-character-reference">absence-of-digits-in-numeric-character-reference</span>
<span>parse error</span>. <span>Reconsume</span> in the <span>character reference end
state</span>.</dd>
<span>parse error</span>. <span>Flush code points consumed as a character reference</span>.
<span>Reconsume</span> in the <var data-x="return state">return state</var>.</dd>

</dl>

@@ -104141,10 +104164,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<li><p>If the number is 0x0D<!-- CR is not allowed --> or a
<span data-x="control">control</span>, but not <span>ASCII whitespace</span>, then this is a
<span data-x="parse-error-control-character-reference">control-character-reference</span>
<span>parse error</span>.</p>

<p>If the number is one of the numbers in the first column of the following table, then find the
row with that number in the first column, and set the <var
<span>parse error</span>. If the number is one of the numbers in the first column of the
following table, then find the row with that number in the first column, and set the <var
data-x="character reference code">character reference code</var> to the number in the second
column of that row.</p>
<!-- these are Unicode C1 control characters -->
@@ -104191,33 +104212,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

<p>Set the <var data-x="temporary buffer">temporary buffer</var> to the empty string. Append a
code point equal to the <var data-x="character reference code">character reference code</var> to
the <var data-x="temporary buffer">temporary buffer</var>. Switch to the <span>character reference
end state</span>.</p>


<h5><dfn>Character reference end state</dfn></h5>

<p>Consume the <span>next input character</span>.</p>

<p>Check the <var data-x="return state">return state</var>:</p>

<dl class="switch">

<dt><span>Attribute value (double-quoted) state</span></dt>
<dt><span>Attribute value (single-quoted) state</span></dt>
<dt><span>Attribute value (unquoted) state</span></dt>

<dd>Append each character in the <var data-x="temporary buffer">temporary buffer</var> (in the
order they were added to the buffer) to the current attribute's value.</dd>

<dt>Anything else</dt>

<dd>For each of the characters in the <var data-x="temporary buffer">temporary buffer</var> (in
the order they were added to the buffer), emit the character as a character token.</dd>

</dl>

<p><span>Reconsume</span> in the <var data-x="return state">return state</var>.</p>
the <var data-x="temporary buffer">temporary buffer</var>. <span>Flush code points consumed as a
character reference</span>. Switch to the <var data-x="return state">return state</var>.</p>

</div>
