Editorial: use noncharacter and control from Infra

See whatwg/infra#114 for the change to Infra.
annevk committed Apr 3, 2017
1 parent c32971d commit 70925237a88d9802bfe7224fe9c78b146af615be
Showing with 20 additions and 30 deletions.
  1. +20 −30 source
50 source
different than its previous value; setting an attribute to a value it already has does not change
it.</p>

-<p>The term <dfn data-x="">empty</dfn>, when used for an attribute value, <code>Text</code> node, or
-string, means that the length of the text is zero (i.e. not even containing spaces or <span>control
-characters</span>).</p>
+<p>The term <dfn data-x="">empty</dfn>, when used for an attribute value, <code>Text</code> node,
+or string, means that the length of the text is zero (i.e., not even containing <span
+data-x="control">controls</span> or U+0020 SPACE).</p>

<p>An element's <dfn data-export="">child text content</dfn> is the concatenation of the <span
data-x="concept-cd-data">data</span> of all the <code>Text</code> nodes that are children of the
<dfn data-x-href="https://infra.spec.whatwg.org/#code-point">character</dfn></li>
<li><dfn data-x-href="https://infra.spec.whatwg.org/#surrogate">surrogate</dfn></li>
<li><dfn data-x-href="https://infra.spec.whatwg.org/#scalar-value">scalar value</dfn></li>
<li><dfn data-x-href="https://infra.spec.whatwg.org/#noncharacter">noncharacter</dfn></li>
<li><dfn data-x-href="https://infra.spec.whatwg.org/#javascript-string-length">JavaScript string length</dfn></li>
<li><dfn data-x-href="https://infra.spec.whatwg.org/#string-length">string length</dfn></li>
<li><dfn id="space-characters" data-x-href="https://infra.spec.whatwg.org/#ascii-whitespace">ASCII whitespace</dfn></li>
<li><dfn data-x-href="https://infra.spec.whatwg.org/#control">control</dfn></li>
<li><dfn data-x="ASCII digits" data-x-href="https://infra.spec.whatwg.org/#ascii-digit">ASCII digit</dfn></li>
<li><dfn id="uppercase-ascii-hex-digits" data-x-href="https://infra.spec.whatwg.org/#ascii-upper-hex-digit">ASCII upper hex digit</dfn></li>
<li><dfn id="lowercase-ascii-hex-digits" data-x-href="https://infra.spec.whatwg.org/#ascii-lower-hex-digit">ASCII lower hex digit</dfn></li>
<p class="note">This is not to be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code data-x="">Unicode.txt</code> data file.</p>

-<p>The <dfn>control characters</dfn> are those whose Unicode "General_Category" property has the
-value "Cc" in the Unicode <code data-x="">UnicodeData.txt</code> data file. <ref spec=UNICODE></p>
-
<div w-nodev>

<p>Some of the micro-parsers described below follow the pattern of having an <var>input</var>
whitespace</span>).</p>

<p><code>Text</code> nodes and attribute values must consist of <span data-x="scalar value">scalar
-values</span>, must not contain U+0000 characters, must not contain permanently undefined
-characters (noncharacters), and must not contain <span>control characters</span> other than
-<span>ASCII whitespace</span>.
+values</span>, excluding <span data-x="noncharacter">noncharacters</span>, and <span
+data-x="control">controls</span> other than <span>ASCII whitespace</span>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
character</i>.--> <!-- commented out since nothing disallows it currently, so it's implicit;
element's start tag.</p>

<p>Attributes have a name and a value. <dfn data-x="syntax-attribute-name">Attribute names</dfn>
-must consist of one or more characters other than the <span>ASCII whitespace</span>, U+0000 NULL,
-U+0022 QUOTATION MARK (&#x22;), U+0027 APOSTROPHE (&#x27;), U+003E GREATER-THAN SIGN (&gt;),
-U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <span>control characters</span>,
-and any characters that are not defined by Unicode. In the HTML syntax, attribute names, even
-those for <span>foreign elements</span>, may be written with any mix of lower- and uppercase
-letters that are an <span>ASCII case-insensitive</span> match for the attribute's name.</p>
+must consist of one or more characters other than <span data-x="control">controls</span>,
+U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (&gt;), U+002F (/), U+003D (=), and <span
+data-x="noncharacter">noncharacters</span>. In the HTML syntax, attribute names, even those for
+<span>foreign elements</span>, may be written with any mix of <span data-x="ASCII lower
+alpha">ASCII lower</span> and <span data-x="ASCII upper alpha">ASCII upper alphas</span>.</p>
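ASCII lower</span>">
Reviewer's sketch (not part of the commit): the new attribute-name wording can be read as a simple per-code-point check. The Python below is a minimal illustration under stated assumptions: "control" is taken to be U+0000 to U+001F plus U+007F to U+009F, and "noncharacter" to be U+FDD0 to U+FDEF plus every code point ending in FFFE or FFFF. The function names are hypothetical and do not come from the spec or from any library.

```python
# Hypothetical helpers; the ranges are this sketch's reading of the Infra terms.

def is_control(cp: int) -> bool:
    """C0 controls (U+0000..U+001F) and U+007F..U+009F."""
    return 0x00 <= cp <= 0x1F or 0x7F <= cp <= 0x9F


def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF, plus U+xFFFE / U+xFFFF in every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFF) >= 0xFFFE


# U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>), U+002F (/), U+003D (=)
FORBIDDEN_IN_ATTRIBUTE_NAME = frozenset({0x20, 0x22, 0x27, 0x3E, 0x2F, 0x3D})


def is_valid_attribute_name(name: str) -> bool:
    """True if `name` is non-empty and contains none of the excluded code points."""
    if not name:
        return False  # "one or more characters"
    for ch in name:
        cp = ord(ch)
        if (
            is_control(cp)
            or is_noncharacter(cp)
            or cp in FORBIDDEN_IN_ATTRIBUTE_NAME
        ):
            return False
    return True
```

For example, `is_valid_attribute_name("data-x")` is accepted, while `"a b"`, `"a=b"`, and a name containing U+0009 TAB are rejected. TAB is rejected because it is a control, which is how the new wording covers what the old text expressed through "ASCII whitespace".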

<p><dfn data-x="syntax-attribute-value">Attribute values</dfn> are a mixture of <span
data-x="syntax-text">text</span> and <span data-x="syntax-charref">character references</span>,
</dl>

<p>The numeric character reference forms described above are allowed to reference any code point
-other than U+0000, U+000D, permanently undefined characters (noncharacters), <span
-data-x="surrogate">surrogates</span>, and <span>control characters</span> other than <span>ASCII
-whitespace</span>.</p>
+excluding U+000D CR, <span data-x="noncharacter">noncharacters</span>, and <span
+data-x="control">controls</span> other than <span>ASCII whitespace</span>.</p>

<p>An <dfn data-x="syntax-ambiguous-ampersand">ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <span data-x="ASCII alphanumeric">ASCII
<p>The <dfn>input stream</dfn> consists of the characters pushed into it as the <span>input byte
stream</span> is decoded or from the various APIs that directly manipulate the input stream.</p>

-<p>Any occurrences of any characters in the ranges U+0001 to U+0008, <!-- HT, LF allowed --> <!--
-U+000B is in the next list --> <!-- FF, CR allowed --> U+000E to U+001F, <!-- ASCII allowed -->
-U+007F <!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+FDD0 to U+FDEF, and
-characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE,
-U+4FFFF, U+5FFFE, U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE, U+9FFFF,
-U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE,
-U+FFFFF, U+10FFFE, and U+10FFFF are <span data-x="parse error">parse errors</span>. These are all
-<span>control characters</span> or permanently undefined characters (noncharacters).</p>
-
-<p>Any <span>character</span> that is a not a <span>scalar value</span>, i.e. any isolated
-surrogate, is a <span>parse error</span>. (These can only find their way into the input stream via
-script APIs such as <code data-x="dom-document-write">document.write()</code>.)</p>
+<p>Any occurrences of <span data-x="surrogate">surrogates</span>, <span
+data-x="noncharacter">noncharacters</span>, or <span data-x="control">controls</span> other than
+<span>ASCII whitespace</span> are <span data-x="parse error">parse errors</span>.</p>
+
+<p class="note">Isolated surrogates can only find their way into the input stream via script APIs
+such as <code data-x="dom-document-write">document.write()</code>.</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF) characters are treated
specially. Any LF character that immediately follows a CR character must be ignored, and all CR
