[ct] (2) Make surrogates in UTF-8 and character references turn into U+FFFD to prevent UTF-16 environments having hard-to-handle bugs.

git-svn-id: http://svn.whatwg.org/webapps@3871 340c8d12-0b0e-0410-8428-c7bf67bfef74
Hixie committed Sep 16, 2009
1 parent 18f4c73 commit 6db2194
Showing 2 changed files with 56 additions and 48 deletions.
52 changes: 28 additions & 24 deletions index
@@ -62159,23 +62159,25 @@ interface <dfn id=messageport>MessagePort</dfn> {
motivated by a desire to increase the resilience of user agents in
the face of na&iuml;ve transcoders.</p>

-<p>All U+0000 NULL characters in the input must be replaced by
-U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
-a <a href=#parse-error>parse error</a>.</p>
+<p>All U+0000 NULL characters and characters in the range U+D800 to
+U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
+them to suddenly turn into codepoints when they go through a UTF-16
+pipe --> in the input must be replaced by U+FFFD REPLACEMENT
+CHARACTERs. Any occurrences of such characters is a <a href=#parse-error>parse
+error</a>.</p>

<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
<!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
-<!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+D800
-to U+DFFF<!-- surrogates not allowed -->, U+FDD0 to U+FDEF, and
-characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE,
-U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF,
-U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE,
-U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF,
-U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and
-U+10FFFF are <a href=#parse-error title="parse error">parse errors</a>. (These
-are all control characters or permanently undefined Unicode
-characters.)</p>
+<!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+FDD0
+to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
+U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE,
+U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF,
+U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
+U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
+U+10FFFE, and U+10FFFF are <a href=#parse-error title="parse error">parse
+errors</a>. (These are all control characters or permanently
+undefined Unicode characters.)</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
characters are treated specially. Any CR characters that are
@@ -64016,9 +64018,11 @@ interface <dfn id=messageport>MessagePort</dfn> {
<tr><td>0x9D <td>U+009D <td>&lt;control&gt;
<tr><td>0x9E <td>U+017E <td>LATIN SMALL LETTER Z WITH CARON ('&#382;')
<tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('&Yuml;')
-</table><p>Otherwise, if the number is greater than 0x10FFFF, then this is
-a <a href=#parse-error>parse error</a>. Return a U+FFFD REPLACEMENT
-CHARACTER.</p>
+</table><p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
+surrogates not allowed; see the comment in the "preprocessing the
+input stream" section for details --> or is greater than 0x10FFFF,
+then this is a <a href=#parse-error>parse error</a>. Return a U+FFFD
+REPLACEMENT CHARACTER.</p>

<p>Otherwise, return a character token for the Unicode character
whose code point is that number.
@@ -64028,14 +64032,14 @@ interface <dfn id=messageport>MessagePort</dfn> {
If the number is in the range 0x0001 to 0x0008, <!-- HT, LF
allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
allowed --> 0x000E to 0x001F, <!-- ASCII allowed --> 0x007F <!--to
-0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
-0xDFFF<!-- surrogates not allowed -->, 0xFDD0 to 0xFDEF, or is one
-of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF,
-0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,
-0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF,
-0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
-0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
-0x10FFFF, then this is a <a href=#parse-error>parse error</a>.</p>
+0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xFDD0 to
+0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF,
+0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE,
+0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
+0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE,
+0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
+0x10FFFE, or 0x10FFFF, then this is a <a href=#parse-error>parse
+error</a>.</p>

</dd>
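
As an illustration only, the new input-stream preprocessing rule in this commit (U+0000 and U+D800 to U+DFFF become U+FFFD, each occurrence being a parse error) might be sketched in Python as follows. The function name and the choice to report errors as a list of offsets are assumptions made for this sketch, not part of the specification, and the CR/LF handling and the other parse-error ranges are deliberately left out.

```python
REPLACEMENT_CHARACTER = "\ufffd"


def replace_nulls_and_surrogates(text):
    """Hypothetical helper: return (cleaned_text, error_offsets).

    Every U+0000 NULL and every code point in U+D800..U+DFFF is
    replaced by U+FFFD, and its offset is recorded as a parse error.
    """
    cleaned = []
    error_offsets = []
    for offset, ch in enumerate(text):
        code_point = ord(ch)
        if code_point == 0x0000 or 0xD800 <= code_point <= 0xDFFF:
            error_offsets.append(offset)
            cleaned.append(REPLACEMENT_CHARACTER)
        else:
            cleaned.append(ch)
    return "".join(cleaned), error_offsets


if __name__ == "__main__":
    cleaned, errors = replace_nulls_and_surrogates("a\x00b\ud800c")
    print(ascii(cleaned), errors)
```

Replacing at this stage means later stages never see a lone surrogate, which is the concern the commit's inline comment raises about UTF-16 pipes.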

52 changes: 28 additions & 24 deletions source
@@ -76737,23 +76737,25 @@ interface <dfn>MessagePort</dfn> {
motivated by a desire to increase the resilience of user agents in
the face of na&iuml;ve transcoders.</p>

-<p>All U+0000 NULL characters in the input must be replaced by
-U+FFFD REPLACEMENT CHARACTERs. Any occurrences of such characters is
-a <span>parse error</span>.</p>
+<p>All U+0000 NULL characters and characters in the range U+D800 to
+U+DFFF<!-- surrogates not allowed e.g. in UTF-8, and we don't want
+them to suddenly turn into codepoints when they go through a UTF-16
+pipe --> in the input must be replaced by U+FFFD REPLACEMENT
+CHARACTERs. Any occurrences of such characters is a <span>parse
+error</span>.</p>

<p>Any occurrences of any characters in the ranges U+0001 to U+0008,
<!-- HT, LF allowed --> <!-- U+000B is in the next list --> <!-- FF,
CR allowed --> U+000E to U+001F, <!-- ASCII allowed --> U+007F
-<!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+D800
-to U+DFFF<!-- surrogates not allowed -->, U+FDD0 to U+FDEF, and
-characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, U+2FFFE,
-U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE, U+5FFFF,
-U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF, U+9FFFE,
-U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE, U+CFFFF,
-U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF, U+10FFFE, and
-U+10FFFF are <span title="parse error">parse errors</span>. (These
-are all control characters or permanently undefined Unicode
-characters.)</p>
+<!--to U+0084, (U+0085 NEL not allowed), U+0086--> to U+009F, U+FDD0
+to U+FDEF, and characters U+000B, U+FFFE, U+FFFF, U+1FFFE, U+1FFFF,
+U+2FFFE, U+2FFFF, U+3FFFE, U+3FFFF, U+4FFFE, U+4FFFF, U+5FFFE,
+U+5FFFF, U+6FFFE, U+6FFFF, U+7FFFE, U+7FFFF, U+8FFFE, U+8FFFF,
+U+9FFFE, U+9FFFF, U+AFFFE, U+AFFFF, U+BFFFE, U+BFFFF, U+CFFFE,
+U+CFFFF, U+DFFFE, U+DFFFF, U+EFFFE, U+EFFFF, U+FFFFE, U+FFFFF,
+U+10FFFE, and U+10FFFF are <span title="parse error">parse
+errors</span>. (These are all control characters or permanently
+undefined Unicode characters.)</p>

<p>U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF)
characters are treated specially. Any CR characters that are
@@ -78857,9 +78859,11 @@ interface <dfn>MessagePort</dfn> {
<tr><td>0x9F <td>U+0178 <td>LATIN CAPITAL LETTER Y WITH DIAERESIS ('&#x0178;')
</table>

-<p>Otherwise, if the number is greater than 0x10FFFF, then this is
-a <span>parse error</span>. Return a U+FFFD REPLACEMENT
-CHARACTER.</p>
+<p>Otherwise, if the number is in the range 0xD800 to 0xDFFF<!--
+surrogates not allowed; see the comment in the "preprocessing the
+input stream" section for details --> or is greater than 0x10FFFF,
+then this is a <span>parse error</span>. Return a U+FFFD
+REPLACEMENT CHARACTER.</p>

<p>Otherwise, return a character token for the Unicode character
whose code point is that number.
@@ -78869,14 +78873,14 @@ interface <dfn>MessagePort</dfn> {
If the number is in the range 0x0001 to 0x0008, <!-- HT, LF
allowed --> <!-- U+000B is in the next list --> <!-- FF, CR
allowed --> 0x000E to 0x001F, <!-- ASCII allowed --> 0x007F <!--to
-0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xD800 to
-0xDFFF<!-- surrogates not allowed -->, 0xFDD0 to 0xFDEF, or is one
-of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF, 0x2FFFE, 0x2FFFF,
-0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE, 0x5FFFF, 0x6FFFE,
-0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF, 0x9FFFE, 0x9FFFF,
-0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE, 0xCFFFF, 0xDFFFE,
-0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF, 0x10FFFE, or
-0x10FFFF, then this is a <span>parse error</span>.</p>
+0x0084, (0x0085 NEL not allowed), 0x0086--> to 0x009F, 0xFDD0 to
+0xFDEF, or is one of 0x000B, 0xFFFE, 0xFFFF, 0x1FFFE, 0x1FFFF,
+0x2FFFE, 0x2FFFF, 0x3FFFE, 0x3FFFF, 0x4FFFE, 0x4FFFF, 0x5FFFE,
+0x5FFFF, 0x6FFFE, 0x6FFFF, 0x7FFFE, 0x7FFFF, 0x8FFFE, 0x8FFFF,
+0x9FFFE, 0x9FFFF, 0xAFFFE, 0xAFFFF, 0xBFFFE, 0xBFFFF, 0xCFFFE,
+0xCFFFF, 0xDFFFE, 0xDFFFF, 0xEFFFE, 0xEFFFF, 0xFFFFE, 0xFFFFF,
+0x10FFFE, or 0x10FFFF, then this is a <span>parse
+error</span>.</p>

</dd>
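
For readers who want to see the revised numeric character reference steps in executable form, here is a rough Python sketch of the two "Otherwise" branches as they read after this commit. It assumes the number has already been checked against the replacement table that precedes these steps in the specification (that table is not reproduced here); the function names and the `(character, is_parse_error)` return shape are hypothetical.

```python
def is_flagged_code_point(number):
    """Hypothetical helper: True for the control and permanently undefined
    code points that the revised text still lists as parse errors."""
    return (
        0x0001 <= number <= 0x0008
        or number == 0x000B
        or 0x000E <= number <= 0x001F
        or 0x007F <= number <= 0x009F
        or 0xFDD0 <= number <= 0xFDEF
        # U+FFFE/U+FFFF and their counterparts in planes 1 to 16
        or (number <= 0x10FFFF and (number & 0xFFFE) == 0xFFFE)
    )


def resolve_numeric_reference(number):
    """Return (character, is_parse_error) for a number not in the table."""
    if 0xD800 <= number <= 0xDFFF or number > 0x10FFFF:
        # Surrogates and out-of-range numbers: parse error, U+FFFD returned.
        return "\ufffd", True
    # Otherwise the character itself is returned; listed code points are
    # still parse errors but are not replaced.
    return chr(number), is_flagged_code_point(number)


if __name__ == "__main__":
    for n in (0x41, 0xD800, 0xFFFE, 0x110000):
        print(hex(n), [ascii(v) if isinstance(v, str) else v
                       for v in resolve_numeric_reference(n)])
```

The sketch reflects the distinction the diff draws: surrogates moved from the "parse error but returned as-is" list to the "replaced by U+FFFD" branch.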

