From 0a42fa65a4a49cdbcebad2dac8c6f830d065aeba Mon Sep 17 00:00:00 2001
From: Ian Hickson If the document has an active parser that isn't a
script-created parser, and the insertion
- point associated with that parser's input
+ point associated with that parser's input
stream is not undefined (that is, it does point to
somewhere in the input stream), then the method does
nothing. Abort these steps and return the Finally, set the insertion point to point at
- just before the end of the input stream (which at this
+ just before the end of the input stream (which at this
point will be empty). Return the Insert an explicit "EOF" character at the end
- of the parser's input stream. If there is a pending parsing-blocking script,
then abort these steps.Living Standard — Last Updated 13 February 2012<
3.4.1 Opening the input
Document
@@ -13783,7 +13783,7 @@ 3.4.1 Opening the input
entry.
Document
on which the method was
@@ -13833,7 +13833,7 @@ 3.4.2 Closing the input
with the document, then abort these steps.
3.4.3
refused to allow the document to be
unloaded, then abort these steps. Otherwise, the
insertion point will point at just before the end of
- the (empty) input stream.
Insert the string consisting of the concatenation of all the - arguments to the method into the input stream just + arguments to the method into the input stream just before the insertion point.
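As an illustration of the step this hunk touches, here is a minimal sketch of inserting written text just before the insertion point. The `InputStream` class, its `chars` and `insertion_point` fields, and the choice to model the insertion point as a list index are all assumptions made for this example; they are not names from the specification.

```python
class InputStream:
    """Hypothetical model of a parser's input stream (not a spec-defined type)."""

    def __init__(self, text: str = "") -> None:
        self.chars = list(text)
        # None stands in for an "undefined" insertion point.
        self.insertion_point = None

    def write(self, *args: str) -> None:
        """Insert the concatenation of all arguments just before the insertion point."""
        if self.insertion_point is None:
            raise ValueError("insertion point is undefined")
        text = "".join(args)
        ip = self.insertion_point
        self.chars[ip:ip] = list(text)
        # The insertion point stays just before the same character as before,
        # so its index moves past the newly inserted text.
        self.insertion_point = ip + len(text)


stream = InputStream("ab")
stream.insertion_point = 1      # just before "b"
stream.write("X", "Y")          # arguments are concatenated first
result = "".join(stream.chars)
```

Keeping the insertion point anchored to the same character means that consecutive writes appear in the stream in call order.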
text/html
", create an HTML parser, and
associate it with the document. Each task that the networking task
source places on the task queue while the fetching algorithm runs must then fill the
- parser's input stream with the fetched bytes and cause
- the HTML parser to perform the appropriate processing
- of the input stream.
+ parser's input byte stream with the fetched bytes and
+ cause the HTML parser to perform the appropriate
+ processing of the input stream.
- The input stream converts bytes into - characters for use in the tokenizer. This process relies, in part, +
The input byte stream converts bytes + into characters for use in the tokenizer. This process relies, in part, on character encoding information found in the real Content-Type metadata of the resource; the "sniffed type" is not used for this purpose.
@@ -64377,9 +64377,9 @@The rules for how to convert the bytes of the plain text document into actual characters, and the rules for actually rendering the @@ -81111,13 +81111,13 @@
The input to the HTML parsing process consists of a stream of
- Unicode code points, which is passed through a
- tokenization stage followed by a tree
- construction stage. The output is a Document
- object.
Document
object.
Implementations that do not support scripting do not have to actually create a DOM @@ -81157,21 +81157,50 @@
The stream of Unicode code points that comprises the input to the tokenization stage will be initially seen by the user agent as a stream of bytes (typically coming over the network or from the local file system). The bytes encode the actual characters according to a - particular character encoding, which the user agent must - use to decode the bytes into characters.
+ particular character encoding, which the user agent must use + to decode the bytes into characters.For XML documents, the algorithm user agents must use to determine the character encoding is given by the XML specification. This section does not apply to XML documents. [XML]
+The encoding sniffing algorithm defined below is + used to determine the character encoding.
+ +Given an encoding, the bytes in the input byte + stream must be converted to Unicode code points for the + tokenizer's input stream, as described by the rules for + that encoding, except that the leading U+FEFF BYTE ORDER MARK + character, if any, must not be stripped by the encoding layer (it is + stripped by the rule below).
+ +Bytes or sequences of bytes in the original byte stream that + could not be converted to Unicode code points must be converted to + U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is + UTF-8, the bytes must be decoded with the error handling defined in this + specification.
+ +Bytes or sequences of bytes in the original byte + stream that did not conform to the encoding specification (e.g. + invalid UTF-8 byte sequences in a UTF-8 input byte stream) are + errors that conformance checkers are expected to report.
+ +Any byte or sequence of bytes in the original byte stream that is + misinterpreted for compatibility is a parse + error.
+The document's character encoding must immediately be set to the value returned from this algorithm, at the same time as the user agent uses the returned value to select the decoder to - use for the input stream.
+ use for the input byte stream.When an algorithm requires a user agent to prescan a byte stream to determine its encoding, given some defined end condition, then it must run the following steps. @@ -81438,7 +81467,7 @@
Let position be a pointer to a byte in the - input stream, initially pointing at the first byte. If at any + input byte stream, initially pointing at the first byte. If at any point during these steps the user agent either runs out of bytes or reaches its end condition, then abort the prescan a byte stream to determine its encoding @@ -81575,8 +81604,8 @@
When the prescan a byte stream to determine its encoding algorithm says to get an attribute, @@ -81851,32 +81880,12 @@
Given an encoding, the bytes in the input stream must be - converted to Unicode code points for the tokenizer, as described by - the rules for that encoding, except that the leading U+FEFF BYTE - ORDER MARK character, if any, must not be stripped by the encoding - layer (it is stripped by the rule below).
- -Bytes or sequences of bytes in the original byte stream that - could not be converted to Unicode code points must be converted to - U+FFFD REPLACEMENT CHARACTERs. Specifically, if the encoding is - UTF-8, the bytes must be decoded with the error handling defined in this - specification.
- -Bytes or sequences of bytes in the original byte - stream that did not conform to the encoding specification - (e.g. invalid UTF-8 byte sequences in a UTF-8 input stream) are - errors that conformance checkers are expected to report.
- -Any byte or sequence of bytes in the original byte stream that is - misinterpreted for compatibility is a parse - error.
+The input stream consists of the characters pushed + into it as the input byte stream is decoded or from the + various APIs that directly manipulate the input stream.
One leading U+FEFF BYTE ORDER MARK character must be ignored if - any are present.
+ any are present in the input stream.The requirement to strip a U+FEFF BYTE ORDER MARK character regardless of whether that character was used to determine @@ -81898,18 +81907,18 @@
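The split this hunk describes (decoding happens on the input byte stream, while the byte order mark is dropped from the input stream) can be sketched as two separate steps. This is a rough illustration for UTF-8 only: the function names are hypothetical, and Python's built-in `errors="replace"` handling is used as a stand-in that may not match the specification's UTF-8 error handling in every case.

```python
BOM = "\ufeff"  # U+FEFF BYTE ORDER MARK


def decode_utf8_byte_stream(data: bytes) -> str:
    """Decode the input byte stream into code points for the input stream.

    Undecodable bytes become U+FFFD REPLACEMENT CHARACTER. Python's "utf-8"
    codec (unlike "utf-8-sig") leaves a leading BOM in place, so the encoding
    layer does not strip it.
    """
    return data.decode("utf-8", errors="replace")


def ignore_leading_bom(input_stream: str) -> str:
    """Ignore one leading U+FEFF, if present, at the input stream level."""
    if input_stream.startswith(BOM):
        return input_stream[1:]
    return input_stream


decoded = decode_utf8_byte_stream(b"\xef\xbb\xbfhi")
cleaned = ignore_leading_bom(decoded)
```

Because the BOM is removed after decoding, the same rule also covers characters that reach the input stream through script APIs rather than through the decoder.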
U+000D CARRIAGE RETURN (CR) characters and U+000A LINE FEED (LF) - characters are treated specially. Any CR characters that are - followed by LF characters must be removed, and any CR characters not - followed by LF characters must be converted to LF characters. Thus, - newlines in HTML DOMs are represented by LF characters, and there - are never any CR characters in the input to the - tokenization stage.
+ characters are treated specially. All CR characters must be + converted to LF characters, and any LF characters that immediately + follow a CR character must be ignored. Thus, newlines in HTML DOMs + are represented by LF characters, and there are never any CR + characters in the input to the tokenization stage.The next input character is the first character in the - input stream that has not yet been consumed. Initially, - the next input character is the first character in the - input. The current input character is the last character - to have been consumed.
+ input stream that has not yet been consumed + or explicitly ignored by the requirements in this section. Initially, + the next input character is the first character in the input. + The current input character is the last character to have + been consumed.The insertion point is the position (just before a character or just before the end of the input stream) where content @@ -81920,9 +81929,9 @@
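The reworded newline rule in the hunk above (every CR becomes LF, and an LF immediately following a CR is ignored) can be sketched as a single pass over the characters. The function name is hypothetical, and the sketch treats the whole input as already available, which a streaming parser would not assume.

```python
def normalize_newlines(input_stream: str) -> str:
    """Convert all CR to LF and ignore any LF that immediately follows a CR."""
    out = []
    previous_was_cr = False
    for ch in input_stream:
        if ch == "\r":
            out.append("\n")          # every CR is converted to LF
            previous_was_cr = True
        elif ch == "\n" and previous_was_cr:
            previous_was_cr = False   # LF right after CR is ignored
        else:
            out.append(ch)
            previous_was_cr = False
    return "".join(out)


normalized = normalize_newlines("a\r\nb\rc\n\r\n")
```

Phrasing the rule per character, rather than as "remove CR before LF", lets it be applied as each character arrives, including when a CR and its LF are pushed into the stream separately.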
The "EOF" character in the tables below is a conceptual character
- representing the end of the input stream. If the parser
+ representing the end of the input stream. If the parser
is a script-created parser, then the end of the
- input stream is reached when an explicit "EOF"
+ input stream is reached when an explicit "EOF"
character (inserted by the document.close()
method) is
consumed. Otherwise, the "EOF" character is not a real character in
the stream, but rather the lack of any further characters.
When the user agent is to abort a parser, it must run the following steps:
-Throw away any pending content in the input
+ Throw away any pending content in the input
stream, and discard any future content that would have been
added to it. Place into the input stream for the HTML
+ Place into the input stream for the HTML
parser just created the input. The
encoding confidence is
irrelevant.
12.3 Serializing HTML