[c] (0) Define 'control characters' formally.
Affected topics: HTML, HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@8173 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 5, 2013
1 parent 3aad91e commit ee409213cad625133655dcf2607706040d5de9e0
Showing 3 changed files with 36 additions and 26 deletions.
@@ -256,7 +256,7 @@

<header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
-<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 4 September 2013</h2>
+<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 5 September 2013</h2>
</hgroup><dl><dt><strong>Web developer edition:</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
@@ -2968,8 +2968,8 @@ <h4 id=dom-trees><span class=secno>2.1.3 </span>DOM trees</h4>
it.</p>

<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
-string, means that the length of the text is zero (i.e. not even containing spaces or control
-characters).</p>
+string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
+characters</a>).</p>


<h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>
@@ -3051,7 +3051,7 @@ <h4 id=plugins><span class=secno>2.1.5 </span>Plugins</h4>


<h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>

xxxxx
<p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
@@ -4159,6 +4159,9 @@ <h4 id=common-parser-idioms><span class=secno>2.4.1 </span>Common parser idioms<
<p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>

+<p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
+value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>
+
<p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>

@@ -10676,7 +10679,7 @@ <h6 id=heading-content-0><span class=secno>3.2.5.1.4 </span>Heading content</h6>

<p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
character">Unicode characters</a>, must not contain U+0000 characters, must not contain
-permanently undefined Unicode characters (noncharacters), and must not contain control characters
+permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
other than <a href=#space-character title="space character">space characters</a>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -84882,8 +84885,8 @@ <h5 id=start-tags><span class=secno>12.1.2.1 </span>Start tags</h5>
<p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
must consist of one or more characters other than the <a href=#space-character title="space character">space
characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
-GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
-characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
+characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
name.</p>
@@ -85280,7 +85283,7 @@ <h4 id=character-references><span class=secno>12.1.4 </span>Character references

</dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
-control characters other than <a href=#space-character title="space character">space characters</a>.</p>
+<a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>

<p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
@@ -86373,7 +86376,7 @@ <h5 id=changing-the-encoding-while-parsing><span class=secno>12.2.2.4 </span>Cha
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
-errors</a>. These are all control characters or permanently
+errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
undefined Unicode characters (noncharacters).</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
21 index
@@ -256,7 +256,7 @@

<header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
-<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 4 September 2013</h2>
+<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 5 September 2013</h2>
</hgroup><dl><dt><strong>Web developer edition:</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
@@ -2968,8 +2968,8 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
it.</p>

<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
-string, means that the length of the text is zero (i.e. not even containing spaces or control
-characters).</p>
+string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
+characters</a>).</p>


<h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>
@@ -3051,7 +3051,7 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d


<h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>

xxxxx
<p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
@@ -4159,6 +4159,9 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
<p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>

+<p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
+value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>
+
<p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>

@@ -10676,7 +10679,7 @@ background: transparent"&gt;blue&lt;/span&gt;.&lt;/p&gt;</pre>

<p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
character">Unicode characters</a>, must not contain U+0000 characters, must not contain
-permanently undefined Unicode characters (noncharacters), and must not contain control characters
+permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
other than <a href=#space-character title="space character">space characters</a>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -84882,8 +84885,8 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>
<p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
must consist of one or more characters other than the <a href=#space-character title="space character">space
characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
-GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
-characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
+characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
name.</p>
@@ -85280,7 +85283,7 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>

</dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
-control characters other than <a href=#space-character title="space character">space characters</a>.</p>
+<a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>

<p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
@@ -86373,7 +86376,7 @@ dictionary <dfn id=storageeventinit>StorageEventInit</dfn> : <a href=#eventinit>
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
-errors</a>. These are all control characters or permanently
+errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
undefined Unicode characters (noncharacters).</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
20 source
@@ -1726,8 +1726,8 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
it.</p>

<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code>Text</code> node, or
-string, means that the length of the text is zero (i.e. not even containing spaces or control
-characters).</p>
+string, means that the length of the text is zero (i.e. not even containing spaces or <span>control
+characters</span>).</p>


<h4>Scripting</h4>
@@ -1811,7 +1811,7 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d


<h4 id="encoding-terminology">Character encodings</h4>

xxxxx
<p>A <dfn title="encoding">character encoding</dfn>, or just <i>encoding</i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <span>encoding</span> has an <dfn>encoding name</dfn> and one or more
@@ -3041,6 +3041,10 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d
<p class="note">This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>

+<p>The <dfn>control characters</dfn> are those whose Unicode "General_Category" property has the
+value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a
+href="#refsUNICODE">[UNICODE]</a></p>
+
<p>The <dfn>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>

@@ -10695,7 +10699,7 @@ background: transparent">blue&lt;/span>.&lt;/p></pre>

<p><code>Text</code> nodes and attribute values must consist of <span title="Unicode
character">Unicode characters</span>, must not contain U+0000 characters, must not contain
-permanently undefined Unicode characters (noncharacters), and must not contain control characters
+permanently undefined Unicode characters (noncharacters), and must not contain <span>control characters</span>
other than <span title="space character">space characters</span>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
@@ -94715,8 +94719,8 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<p>Attributes have a name and a value. <dfn title="syntax-attribute-name">Attribute names</dfn>
must consist of one or more characters other than the <span title="space character">space
characters</span>, U+0000 NULL, U+0022 QUOTATION MARK (&#x22;), U+0027 APOSTROPHE (&#x27;), U+003E
-GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
-characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
+GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <span>control
+characters</span>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <span>foreign elements</span>, may be written with any mix of lower- and
uppercase letters that are an <span>ASCII case-insensitive</span> match for the attribute's
name.</p>
@@ -95144,7 +95148,7 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {

<p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
-control characters other than <span title="space character">space characters</span>.</p>
+<span>control characters</span> other than <span title="space character">space characters</span>.</p>

<p>An <dfn title="syntax-ambiguous-ampersand">ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <span>alphanumeric ASCII characters</span>,
@@ -96389,7 +96393,7 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <span title="parse error">parse
-errors</span>. These are all control characters or permanently
+errors</span>. These are all <span>control characters</span> or permanently
undefined Unicode characters (noncharacters).</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
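
Illustrative note (not part of the commit): the definition added here, "General_Category" equal to "Cc", can be checked mechanically. The sketch below assumes Python's standard `unicodedata` module as the source of the General_Category data. The helper names (`is_control_character`, `is_noncharacter`, `first_disallowed`) are hypothetical, and the list of space characters is an assumption taken from the spec's separate "space character" definition, which this diff does not show. It approximates the rule for Text nodes and attribute values quoted in the hunks above and is not a conformance checker.

```python
import unicodedata

# Assumption: the HTML "space characters" (space, tab, LF, FF, CR).
# The diff only links to this definition; it does not list the characters.
SPACE_CHARACTERS = {"\u0020", "\u0009", "\u000A", "\u000C", "\u000D"}


def is_control_character(ch: str) -> bool:
    """True if the character's General_Category is "Cc"."""
    return unicodedata.category(ch) == "Cc"


def is_noncharacter(ch: str) -> bool:
    """True for U+FDD0..U+FDEF and any code point ending in FFFE or FFFF."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def first_disallowed(text: str):
    """Return the index of the first character the quoted rule forbids, or None.

    Forbidden here: U+0000, noncharacters, and control characters other
    than the assumed space characters.
    """
    for i, ch in enumerate(text):
        if ch == "\u0000" or is_noncharacter(ch):
            return i
        if is_control_character(ch) and ch not in SPACE_CHARACTERS:
            return i
    return None
```

For example, `first_disallowed("a\tb\n")` finds nothing to reject because tab and line feed are space characters, while `first_disallowed("ab\u0001c")` reports index 2. Format characters such as U+200B ZERO WIDTH SPACE have General_Category "Cf", so they are not control characters under this definition.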
