[c] (0) Define 'control characters' formally.

Affected topics: HTML, HTML Syntax and Parsing

git-svn-id: http://svn.whatwg.org/webapps@8173 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 5, 2013
1 parent 3aad91e commit ee409213cad625133655dcf2607706040d5de9e0
Showing with 36 additions and 26 deletions.
  1. +12 −9 complete.html
  2. +12 −9 index
  3. +12 −8 source

21 complete.html

<header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 4 September 2013</h2>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 5 September 2013</h2>
</hgroup><dl><dt><strong>Web developer edition:</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
it.</p>

<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
string, means that the length of the text is zero (i.e. not even containing spaces or control
characters).</p>
string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
characters</a>).</p>


<h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>


<h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>

xxxxx
<p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
<p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>

<p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>

<p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>


<p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
character">Unicode characters</a>, must not contain U+0000 characters, must not contain
permanently undefined Unicode characters (noncharacters), and must not contain control characters
permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
other than <a href=#space-character title="space character">space characters</a>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
<p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
must consist of one or more characters other than the <a href=#space-character title="space character">space
characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
name.</p>

</dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
control characters other than <a href=#space-character title="space character">space characters</a>.</p>
<a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>

<p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
errors</a>. These are all control characters or permanently
errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
undefined Unicode characters (noncharacters).</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
21 index

<header class=head id=head><p><a href=http://www.whatwg.org/ class=logo><img width=101 src=/images/logo alt=WHATWG height=101></a></p>
<hgroup><h1 class=allcaps>HTML</h1>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 4 September 2013</h2>
<h2 class="no-num no-toc">Living Standard &mdash; Last Updated 5 September 2013</h2>
</hgroup><dl><dt><strong>Web developer edition:</strong></dt>
<dd><strong><a href=http://developers.whatwg.org/>http://developers.whatwg.org/</a></strong></dd>
<dt>Multiple-page version:</dt>
it.</p>

<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code><a href=#text>Text</a></code> node, or
string, means that the length of the text is zero (i.e. not even containing spaces or control
characters).</p>
string, means that the length of the text is zero (i.e. not even containing spaces or <a href=#control-characters>control
characters</a>).</p>


<h4 id=scripting-0><span class=secno>2.1.4 </span>Scripting</h4>


<h4 id=encoding-terminology><span class=secno>2.1.6 </span>Character encodings</h4>

xxxxx
<p>A <dfn id=encoding title=encoding>character encoding</dfn>, or just <i><a href=#encoding>encoding</a></i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <a href=#encoding>encoding</a> has an <dfn id=encoding-name>encoding name</dfn> and one or more
<p class=note>This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>

<p>The <dfn id=control-characters>control characters</dfn> are those whose Unicode "General_Category" property has the
value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a href=#refsUNICODE>[UNICODE]</a></p>

<p>The <dfn id=uppercase-ascii-letters>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>


<p><code><a href=#text>Text</a></code> nodes and attribute values must consist of <a href=#unicode-character title="Unicode
character">Unicode characters</a>, must not contain U+0000 characters, must not contain
permanently undefined Unicode characters (noncharacters), and must not contain control characters
permanently undefined Unicode characters (noncharacters), and must not contain <a href=#control-characters>control characters</a>
other than <a href=#space-character title="space character">space characters</a>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
<p>Attributes have a name and a value. <dfn id=syntax-attribute-name title=syntax-attribute-name>Attribute names</dfn>
must consist of one or more characters other than the <a href=#space-character title="space character">space
characters</a>, U+0000 NULL, U+0022 QUOTATION MARK ("), U+0027 APOSTROPHE ('), U+003E
GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <a href=#control-characters>control
characters</a>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <a href=#foreign-elements>foreign elements</a>, may be written with any mix of lower- and
uppercase letters that are an <a href=#ascii-case-insensitive>ASCII case-insensitive</a> match for the attribute's
name.</p>

</dl><p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
control characters other than <a href=#space-character title="space character">space characters</a>.</p>
<a href=#control-characters>control characters</a> other than <a href=#space-character title="space character">space characters</a>.</p>

<p>An <dfn id=syntax-ambiguous-ampersand title=syntax-ambiguous-ampersand>ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <a href=#alphanumeric-ascii-characters>alphanumeric ASCII characters</a>,
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
errors</a>. These are all control characters or permanently
errors</a>. These are all <a href=#control-characters>control characters</a> or permanently
undefined Unicode characters (noncharacters).</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
20 source
it.</p>

<p>The term <dfn title="">empty</dfn>, when used of an attribute value, <code>Text</code> node, or
string, means that the length of the text is zero (i.e. not even containing spaces or control
characters).</p>
string, means that the length of the text is zero (i.e. not even containing spaces or <span>control
characters</span>).</p>


<h4>Scripting</h4>


<h4 id="encoding-terminology">Character encodings</h4>

xxxxx
<p>A <dfn title="encoding">character encoding</dfn>, or just <i>encoding</i> where that is not
ambiguous, is a defined way to convert between byte streams and Unicode strings, as defined in the
WHATWG Encoding standard. An <span>encoding</span> has an <dfn>encoding name</dfn> and one or more
<p class="note">This should not be confused with the "White_Space" value (abbreviated "WS") of the
"Bidi_Class" property in the <code title="">Unicode.txt</code> data file.</p>

<p>The <dfn>control characters</dfn> are those whose Unicode "General_Category" property has the
value "Cc" in the Unicode <code title="">UnicodeData.txt</code> data file. <a
href="#refsUNICODE">[UNICODE]</a></p>

<p>The <dfn>uppercase ASCII letters</dfn> are the characters in the range U+0041 LATIN CAPITAL
LETTER A to U+005A LATIN CAPITAL LETTER Z.</p>


<p><code>Text</code> nodes and attribute values must consist of <span title="Unicode
character">Unicode characters</span>, must not contain U+0000 characters, must not contain
permanently undefined Unicode characters (noncharacters), and must not contain control characters
permanently undefined Unicode characters (noncharacters), and must not contain <span>control characters</span>
other than <span title="space character">space characters</span>.

<!--<code>Text</code> nodes and attribute values may begin with an <i>isolated combining
<p>Attributes have a name and a value. <dfn title="syntax-attribute-name">Attribute names</dfn>
must consist of one or more characters other than the <span title="space character">space
characters</span>, U+0000 NULL, U+0022 QUOTATION MARK (&#x22;), U+0027 APOSTROPHE (&#x27;), U+003E
GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the control
characters, and any characters that are not defined by Unicode. In the HTML syntax, attribute
GREATER-THAN SIGN (&gt;), U+002F SOLIDUS (/), and U+003D EQUALS SIGN (=) characters, the <span>control
characters</span>, and any characters that are not defined by Unicode. In the HTML syntax, attribute
names, even those for <span>foreign elements</span>, may be written with any mix of lower- and
uppercase letters that are an <span>ASCII case-insensitive</span> match for the attribute's
name.</p>

<p>The numeric character reference forms described above are allowed to reference any Unicode code
point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), and
control characters other than <span title="space character">space characters</span>.</p>
<span>control characters</span> other than <span title="space character">space characters</span>.</p>

<p>An <dfn title="syntax-ambiguous-ampersand">ambiguous ampersand</dfn> is a U+0026 AMPERSAND
character (&amp;) that is followed by one or more <span>alphanumeric ASCII characters</span>,
U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
U+10FFFE, and U+10FFFF are <span title="parse error">parse
errors</span>. These are all control characters or permanently
errors</span>. These are all <span>control characters</span> or permanently
undefined Unicode characters (noncharacters).</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
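
For readers of this commit, the new definition can be tried out with a small script. This is only a sketch, not part of the change itself: it uses Python's standard `unicodedata` module to read the "General_Category" property, and the helper names (`is_control_character`, `is_noncharacter`, `is_valid_text`) are hypothetical. The set of "space characters" is an assumption taken from the HTML specification's own definition rather than from this diff, and the "must consist of Unicode characters" requirement (for example, rejecting lone surrogates) is not checked here.

```python
import unicodedata

# Assumption: the HTML "space characters" are U+0009 TAB, U+000A LF,
# U+000C FF, U+000D CR, and U+0020 SPACE (defined elsewhere in the spec,
# not in this diff).
SPACE_CHARACTERS = frozenset("\t\n\f\r ")


def is_control_character(ch: str) -> bool:
    """True if the character's General_Category is "Cc"."""
    return unicodedata.category(ch) == "Cc"


def is_noncharacter(ch: str) -> bool:
    """True for permanently undefined Unicode characters (noncharacters):
    U+FDD0..U+FDEF and every code point ending in FFFE or FFFF."""
    cp = ord(ch)
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_valid_text(s: str) -> bool:
    """Sketch of the constraint on Text nodes and attribute values:
    no U+0000, no noncharacters, and no control characters other than
    space characters."""
    for ch in s:
        if ch == "\x00":
            return False
        if is_noncharacter(ch):
            return False
        if is_control_character(ch) and ch not in SPACE_CHARACTERS:
            return False
    return True
```

With these helpers, `is_valid_text("a\tb\n")` is accepted because TAB and LF are space characters, while a string containing U+001B ESCAPE or U+FDD0 is rejected. Results depend on the Unicode data shipped with the Python build, although the "Cc" category has been stable across Unicode versions.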
