From 8a1a04678325f775f889f9d2166e2a2336c908c7 Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Tue, 5 Apr 2016 12:20:11 +0100
Subject: [PATCH 1/3] Avoid duplication of actions for reading tag names

Proof of work: https://github.com/RReverser/parse5/commit/2ece567
---
 source | 83 +++++++++++----------------------------------------------
 1 file changed, 16 insertions(+), 67 deletions(-)

diff --git a/source b/source
index 0b85ae04a04..32457587c06 100644
--- a/source
+++ b/source
@@ -99369,15 +99369,9 @@ dictionary StorageEventInit : EventInit {
  Switch to the end tag open state.
  Uppercase ASCII letter
- Create a new start tag token, set its tag name to the lowercase version of the current
- input character (add 0x0020 to the character's code point), then switch to the tag
- name state. (Don't emit the token yet; further details will be filled in before it is
- emitted.)
-
  Lowercase ASCII letter
- Create a new start tag token, set its tag name to the current input character,
- then switch to the tag name state. (Don't emit the token yet; further details will
- be filled in before it is emitted.)
+ Create a new start tag token, set its tag name to the empty string. Switch to the tag
+ name state. Reconsume the current input character.
  U+003F QUESTION MARK (?)
  Parse error. Create a comment token whose data is the empty string. Switch to
@@ -99396,15 +99390,9 @@ dictionary StorageEventInit : EventInit {
  Uppercase ASCII letter
- Create a new end tag token, set its tag name to the lowercase version of the current
- input character (add 0x0020 to the character's code point), then switch to the tag
- name state. (Don't emit the token yet; further details will be filled in before it is
- emitted.)
-
  Lowercase ASCII letter
- Create a new end tag token, set its tag name to the current input character,
- then switch to the tag name state. (Don't emit the token yet; further details will
- be filled in before it is emitted.)
+ Create a new end tag token, set its tag name to the empty string. Switch to the tag
+ name state. Reconsume the current input character.
  U+003E GREATER-THAN SIGN (>)
  Parse error. Switch to the data state.
@@ -99483,17 +99471,9 @@ dictionary StorageEventInit : EventInit {
  Uppercase ASCII letter
- Create a new end tag token, and set its tag name to the lowercase version of the
- current input character (add 0x0020 to the character's code point). Append the
- current input character to the temporary
- buffer. Finally, switch to the RCDATA end tag name state. (Don't emit the
- token yet; further details will be filled in before it is emitted.)
-
  Lowercase ASCII letter
- Create a new end tag token, and set its tag name to the current input character.
- Append the current input character to the temporary
- buffer. Finally, switch to the RCDATA end tag name state. (Don't emit the
- token yet; further details will be filled in before it is emitted.)
+ Create a new end tag token, set its tag name to the empty string. Switch to the RCDATA
+ end tag name state. Reconsume the current input character.
  Anything else
  Switch to the RCDATA state. Emit a U+003C LESS-THAN SIGN character token and a
@@ -99573,17 +99553,9 @@ dictionary StorageEventInit : EventInit {
  Uppercase ASCII letter
- Create a new end tag token, and set its tag name to the lowercase version of the
- current input character (add 0x0020 to the character's code point). Append the
- current input character to the temporary
- buffer. Finally, switch to the RAWTEXT end tag name state. (Don't emit the
- token yet; further details will be filled in before it is emitted.)
-
  Lowercase ASCII letter
- Create a new end tag token, and set its tag name to the current input character.
- Append the current input character to the temporary
- buffer. Finally, switch to the RAWTEXT end tag name state. (Don't emit the
- token yet; further details will be filled in before it is emitted.)
+ Create a new end tag token, set its tag name to the empty string. Switch to the
+ RAWTEXT end tag name state. Reconsume the current input character.
  Anything else
  Switch to the RAWTEXT state. Emit a U+003C LESS-THAN SIGN character token and a
@@ -99666,17 +99638,9 @@ dictionary StorageEventInit : EventInit {
  Uppercase ASCII letter
- Create a new end tag token, and set its tag name to the lowercase version of the
- current input character (add 0x0020 to the character's code point). Append the
- current input character to the temporary
- buffer. Finally, switch to the script data end tag name state. (Don't emit the
- token yet; further details will be filled in before it is emitted.)
-
  Lowercase ASCII letter
- Create a new end tag token, and set its tag name to the current input character.
- Append the current input character to the temporary
- buffer. Finally, switch to the script data end tag name state. (Don't emit the
- token yet; further details will be filled in before it is emitted.)
+ Create a new end tag token, set its tag name to the empty string. Switch to the script
+ data end tag name state. Reconsume the current input character.
  Anything else
  Switch to the script data state. Emit a U+003C LESS-THAN SIGN character token
@@ -99860,18 +99824,10 @@ dictionary StorageEventInit : EventInit {
  the script data escaped end tag open state.
  Uppercase ASCII letter
- Set the temporary buffer to the empty string. Append the
- lowercase version of the current input character (add 0x0020 to the character's code
- point) to the temporary buffer. Switch to the script
- data double escape start state. Emit a U+003C LESS-THAN SIGN character token and the
- current input character as a character token.
-
  Lowercase ASCII letter
- Set the temporary buffer to the empty string. Append the
- current input character to the temporary
- buffer. Switch to the script data double escape start state. Emit a U+003C
- LESS-THAN SIGN character token and the current input character as a character
- token.
+ Set the temporary buffer to the empty string. Switch to
+ the script data double escape start state. Reconsume the current input
+ character. Emit a U+003C LESS-THAN SIGN character token.
  Anything else
  Switch to the script data escaped state. Emit a U+003C LESS-THAN SIGN character
@@ -99887,17 +99843,10 @@ dictionary StorageEventInit : EventInit {
  Uppercase ASCII letter
- Create a new end tag token, and set its tag name to the lowercase version of the
- current input character (add 0x0020 to the character's code point). Append the
- current input character to the temporary
- buffer. Finally, switch to the script data escaped end tag name state. (Don't
- emit the token yet; further details will be filled in before it is emitted.)
-
  Lowercase ASCII letter
- Create a new end tag token, and set its tag name to the current input character.
- Append the current input character to the temporary
- buffer. Finally, switch to the script data escaped end tag name state. (Don't
- emit the token yet; further details will be filled in before it is emitted.)
+ Create a new end tag token. Switch to the script data escaped end tag name
+ state. Reconsume the current input character. (Don't emit the token yet;
+ further details will be filled in before it is emitted.)
  Anything else
  Switch to the script data escaped state. Emit a U+003C LESS-THAN SIGN character

From 2336118f330bccd448c7d9bea9bfbc43f389a1c7 Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Tue, 5 Apr 2016 12:41:45 +0100
Subject: [PATCH 2/3] Avoid duplication of actions for reading attributes

Proof of work: https://github.com/RReverser/parse5/commit/b159bb9
---
 source | 88 ++++++++++----------------------------------------------
 1 file changed, 17 insertions(+), 71 deletions(-)

diff --git a/source b/source
index 32457587c06..f5a06a1a2a2 100644
--- a/source
+++ b/source
@@ -100089,35 +100089,20 @@ dictionary StorageEventInit : EventInit {
  Ignore the character.
  U+002F SOLIDUS (/)
- Switch to the self-closing start tag state.
-
  U+003E GREATER-THAN SIGN (>)
- Switch to the data state. Emit the current tag token.
-
- Uppercase ASCII letter
- Start a new attribute in the current tag token. Set that attribute's name to the lowercase
- version of the current input character (add 0x0020 to the character's code point),
- and its value to the empty string. Switch to the attribute name state.
+ EOF
+ Switch to the after attribute name state. Reconsume the current input
+ character.
- U+0000 NULL
+ U+003D EQUALS SIGN (=)
  Parse error. Start a new attribute in the current tag token. Set that
- attribute's name to a U+FFFD REPLACEMENT CHARACTER character, and its value to the empty string.
+ attribute's name to the current input character, and its value to the empty string.
  Switch to the attribute name state.
- U+0022 QUOTATION MARK (")
- U+0027 APOSTROPHE (')
- U+003C LESS-THAN SIGN (<)
- U+003D EQUALS SIGN (=)
- Parse error. Treat it as per the "anything else" entry below.
-
- EOF
- Parse error. Switch to the data state. Reconsume the EOF
- character.
-
  Anything else
- Start a new attribute in the current tag token. Set that attribute's name to the
- current input character, and its value to the empty string. Switch to the
- attribute name state.
+ Start a new attribute in the current tag token. Set that attribute name and value to the
+ empty string. Switch to the attribute name state. Reconsume the current input
+ character.
@@ -100133,17 +100118,15 @@ dictionary StorageEventInit : EventInit {
  U+000C FORM FEED (FF)
  U+0020 SPACE
- Switch to the after attribute name state.
-
  U+002F SOLIDUS (/)
- Switch to the self-closing start tag state.
+ U+003E GREATER-THAN SIGN (>)
+ EOF
+ Switch to the after attribute name state. Reconsume the current input
+ character.
  U+003D EQUALS SIGN (=)
  Switch to the before attribute value state.
- U+003E GREATER-THAN SIGN (>)
- Switch to the data state. Emit the current tag token.
-
  Uppercase ASCII letter
  Append the lowercase version of the current input character (add 0x0020 to the character's code point) to the current attribute's name.
@@ -100157,10 +100140,6 @@ dictionary StorageEventInit : EventInit {
  U+003C LESS-THAN SIGN (<)
  Parse error. Treat it as per the "anything else" entry below.
- EOF
- Parse error. Switch to the data state. Reconsume the EOF
- character.
-
  Anything else
  Append the current input character to the current attribute's name.
@@ -100199,29 +100178,14 @@ dictionary StorageEventInit : EventInit {
  U+003E GREATER-THAN SIGN (>)
  Switch to the data state. Emit the current tag token.
- Uppercase ASCII letter
- Start a new attribute in the current tag token. Set that attribute's name to the lowercase
- version of the current input character (add 0x0020 to the character's code point),
- and its value to the empty string. Switch to the attribute name state.
-
- U+0000 NULL
- Parse error. Start a new attribute in the current tag token. Set that
- attribute's name to a U+FFFD REPLACEMENT CHARACTER character, and its value to the empty string.
- Switch to the attribute name state.
-
- U+0022 QUOTATION MARK (")
- U+0027 APOSTROPHE (')
- U+003C LESS-THAN SIGN (<)
- Parse error. Treat it as per the "anything else" entry below.
-
  EOF
  Parse error. Switch to the data state. Reconsume the EOF character.
  Anything else
- Start a new attribute in the current tag token. Set that attribute's name to the
- current input character, and its value to the empty string. Switch to the
- attribute name state.
+ Start a new attribute in the current tag token. Set that attribute name and value to the
+ empty string. Switch to the attribute name state. Reconsume the current input
+ character.
@@ -100242,33 +100206,15 @@ dictionary StorageEventInit : EventInit {
  U+0022 QUOTATION MARK (")
  Switch to the attribute value (double-quoted) state.
- U+0026 AMPERSAND (&)
- Switch to the attribute value (unquoted) state. Reconsume the current
- input character.
-
  U+0027 APOSTROPHE (')
  Switch to the attribute value (single-quoted) state.
- U+0000 NULL
- Parse error. Append a U+FFFD REPLACEMENT CHARACTER character to the current
- attribute's value. Switch to the attribute value (unquoted) state.
-
  U+003E GREATER-THAN SIGN (>)
- Parse error. Switch to the data state. Emit the current tag
- token.
-
- U+003C LESS-THAN SIGN (<)
- U+003D EQUALS SIGN (=)
- U+0060 GRAVE ACCENT (`)
  Parse error. Treat it as per the "anything else" entry below.
- EOF
- Parse error. Switch to the data state. Reconsume the EOF
- character.
-
  Anything else
- Append the current input character to the current attribute's value. Switch to
- the attribute value (unquoted) state.
+ Switch to the attribute value (unquoted) state. Reconsume the current
+ input character.

From 818a143b272f239d3e64bdfcb05bc2dcf46ccd60 Mon Sep 17 00:00:00 2001
From: Ingvar Stepanyan
Date: Tue, 5 Apr 2016 13:08:05 +0100
Subject: [PATCH 3/3] Define "ASCII letters"

---
 source | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/source b/source
index f5a06a1a2a2..7408cde5fec 100644
--- a/source
+++ b/source
@@ -4292,11 +4292,14 @@ a.setAttribute('href', 'http://example.com/'); // change the content attribute d

  The lowercase ASCII letters are the characters in the range U+0061 LATIN SMALL LETTER A to U+007A LATIN SMALL LETTER Z.

+ The ASCII letters are the characters that are either uppercase ASCII
+ letters or lowercase ASCII letters.
+
  The ASCII digits are the characters in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9).

- The alphanumeric ASCII characters are those that are either uppercase ASCII
- letters, lowercase ASCII letters, or ASCII digits.
+ The alphanumeric ASCII characters are those that are either ASCII
+ letters or ASCII digits.

  The ASCII hex digits are the characters in the ranges U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), U+0041 LATIN CAPITAL LETTER A to U+0046 LATIN CAPITAL LETTER F, and U+0061
@@ -99368,8 +99371,7 @@ dictionary StorageEventInit : EventInit {

  U+002F SOLIDUS (/)
  Switch to the end tag open state.
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Create a new start tag token, set its tag name to the empty string. Switch to the tag name state. Reconsume the current input character.
@@ -99389,8 +99391,7 @@ dictionary StorageEventInit : EventInit {
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Create a new end tag token, set its tag name to the empty string. Switch to the tag name state. Reconsume the current input character.
@@ -99470,8 +99471,7 @@ dictionary StorageEventInit : EventInit {
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Create a new end tag token, set its tag name to the empty string. Switch to the RCDATA end tag name state. Reconsume the current input character.
@@ -99552,8 +99552,7 @@ dictionary StorageEventInit : EventInit {
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Create a new end tag token, set its tag name to the empty string. Switch to the RAWTEXT end tag name state. Reconsume the current input character.
@@ -99637,8 +99636,7 @@ dictionary StorageEventInit : EventInit {
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Create a new end tag token, set its tag name to the empty string. Switch to the script data end tag name state. Reconsume the current input character.
@@ -99823,8 +99821,7 @@ dictionary StorageEventInit : EventInit {
  Set the temporary buffer to the empty string. Switch to the script data escaped end tag open state.
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Set the temporary buffer to the empty string. Switch to the script data double escape start state. Reconsume the current input character. Emit a U+003C LESS-THAN SIGN character token.
@@ -99842,8 +99839,7 @@ dictionary StorageEventInit : EventInit {
- Uppercase ASCII letter
- Lowercase ASCII letter
+ ASCII letter
  Create a new end tag token. Switch to the script data escaped end tag name state. Reconsume the current input character. (Don't emit the token yet; further details will be filled in before it is emitted.)
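
The pattern this series relies on is to create the token with an empty tag name and then reconsume the current input character, so that only the name state needs to know about case folding. It can be sketched roughly as below. This is a minimal TypeScript illustration under stated assumptions, not the parse5 change linked in the commit messages: `tokenizeStartTags`, the `State` union, and the `isAscii*` helpers are hypothetical names, and only three states are modelled (no attributes, end tags, EOF handling, or parse errors).

```typescript
// Hypothetical sketch of "create empty token, then reconsume".
// None of these names come from the spec or from parse5.
type State = "data" | "tagOpen" | "tagName";

interface StartTagToken {
  type: "startTag";
  tagName: string;
}

// "ASCII letters" as defined in patch 3/3: uppercase or lowercase ASCII letters.
function isAsciiUpper(c: string): boolean {
  return c >= "A" && c <= "Z";
}

function isAsciiLower(c: string): boolean {
  return c >= "a" && c <= "z";
}

function isAsciiLetter(c: string): boolean {
  return isAsciiUpper(c) || isAsciiLower(c);
}

function tokenizeStartTags(input: string): StartTagToken[] {
  const tokens: StartTagToken[] = [];
  let state: State = "data";
  let current: StartTagToken = { type: "startTag", tagName: "" };
  let pos = 0;

  while (pos < input.length) {
    const c = input.charAt(pos); // consume the next input character
    pos++;

    switch (state) {
      case "data":
        if (c === "<") state = "tagOpen";
        break;

      case "tagOpen":
        if (isAsciiLetter(c)) {
          // Single entry for all ASCII letters: start with an empty name...
          current = { type: "startTag", tagName: "" };
          state = "tagName";
          pos--; // ...and reconsume, so the tag name state sees this character.
        } else {
          state = "data"; // simplification: the other entries are not modelled
        }
        break;

      case "tagName":
        if (c === ">") {
          tokens.push(current);
          state = "data";
        } else if (isAsciiUpper(c)) {
          // Lowercasing (add 0x0020 to the code point) lives only here.
          current.tagName += String.fromCharCode(c.charCodeAt(0) + 0x20);
        } else {
          current.tagName += c; // simplification: whitespace, "/" etc. not handled
        }
        break;
    }
  }
  return tokens;
}
```

With this shape, the tag open state has one entry for letters where the old text needed separate uppercase and lowercase entries, which is the duplication the first patch removes.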