fix / Escape chars in identifiers #260

richard-jp-leguen · 2021-04-04T14:34:19Z

While working with emotion-js/styled, my team discovered that stylis is unable to parse CSS selectors which include escaped "special" characters. So a test like this:

  // add to test/Parser.js
  test.only('escaped chars in selector identifiers', () => {
    expect(
      stylis(`
        &.B\\&W{color:red;}
     `)
    ).to.equal(`.user.B\\&W{color:red;}`)
  });

Fails like this:

  1) Parser
       escaped chars in selector identifiers:

      AssertionError: expected '.user.B\\.userW{color:red;}' to equal '.user.B\\&W{color:red;}'
      + expected - actual

      -.user.B\.userW{color:red;}
      +.user.B\&W{color:red;}
      
      at Context.<anonymous> (test/Parser.js:140:10)
      at processImmediate (node:internal/timers:464:21)

Other special characters fail more spectacularly. Using @ for example:

  test.only('escaped chars in selector identifiers', () => {
    expect(
      stylis(`
        &.B\\@W{color:red;}
     `)
    ).to.equal(`.user.B\\@W{color:red;}`)
  });

... yields the following result, with form feeds in the output:

1) Parser
       escaped chars in selector identifiers:

      AssertionError: expected '&\f.B\\@W{color:red;}' to equal '.user.B\\@W{color:red;}'
      + expected - actual

      -&
        .B\@W{color:red;}
      +.user.B\@W{color:red;}
      
      at Context.<anonymous> (test/Parser.js:140:10)
      at processImmediate (node:internal/timers:464:21)

Various versions of the CSS recommendations mention that escape characters can be used.

From https://www.w3.org/TR/CSS2/syndata.html#characters

Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B&W?" or "B\26 W\3F".

From https://www.w3.org/TR/css-syntax-3/#escaping (non-normative)

Any Unicode code point can be included in an identifier or quoted string by escaping it. CSS escape sequences start with a backslash (), and continue with:

Any Unicode code point that is not a hex digits or a newline. The escape sequence is replaced by that code point.

Or one to six hex digits, followed by an optional whitespace. The escape sequence is replaced by the Unicode code point whose value is given by the hexadecimal digits. This optional whitespace allow hexadecimal escape sequences to be followed by "real" hex digits.

So in this PR I'm attempting to add some support for those behaviors to stylis.js

Just basic backslash escaping of the "next character" - this commit adds no support for hex digits terminated with a space

Wasn't sure what to do about a for loop because I didn't want to pollute the function with `var i` - given there's no use of `let` and all the variable declarations are explicitly hoisted.

coveralls · 2021-04-04T14:35:00Z

Pull Request Test Coverage Report for Build 45b668e77edce8a958f23a6ef3970f47001f033b-PR-260

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

For more information on this, see Tracking coverage changes with pull request builds.
To avoid this issue with future PRs, see these Recommended CI Configurations.
For a quick fix, rebase this PR at GitHub. Your next report should be accurate.

Details

7 of 7 (100.0%) changed or added relevant lines in 2 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.02%) to 99.438%

Totals
Change from base Build f1e01ca8493f71e06457fff9aaba8831dfbae0f1:	0.02%
Covered Lines:	258
Relevant Lines:	259

💛 - Coveralls

richard-jp-leguen · 2021-04-04T14:38:15Z

src/Parser.js

@@ -94,6 +94,25 @@ export function parse (value, root, parent, rule, rules, rulesets, pseudo, point

 				index = offset = property = 0, variable = ampersand = 1, type = characters = '', length = pseudo
 				break
+			// \
+			case 92:


I'm a little worried about the fact that there's nothing here to distinguish between an escape sequence in an "identifier" as opposed to somewhere else, but I can't think of a test case where that's going to be a problem.

richard-jp-leguen · 2021-04-04T14:38:56Z

src/Parser.js

+					characters += from(next())
+				} else {
+					characters += '\\';
+					[1,2,3,4,5,6].some(() => {


I hesitated to add a for(var i loop, as all the variables are hoisted and use the var keyword - I didn't want to pollute that list with a scarcely-used i.

richard-jp-leguen · 2021-04-04T14:41:58Z

test/Parser.js

+      '.user.B\\000026W{color:green;}',
+      '.user.B\\26 W{color:blue;}',
+      // next rules are important because the hex digits terminating space
+      // is not the same as a combinator space


Demonstration that the double-space is important: https://jsfiddle.net/6gsr7qv4/

Is \000026A valid unicode wouldn't it be \26A ?

It's valid. From the recommendation I linked to in the PR description, https://www.w3.org/TR/css-syntax-3/#escaping

EXAMPLE 3
An identifier with the value "&B" could be written as \26 B or \000026B.

... although the trailing A isn't part of the hex code, since the hex code is "one to six hex digits". In both what commented and what's in the recommendation's EXAMPLE 3, the hex code is \26 (or \000026) and the A or B which follows is not part of the hex code.

... as a matter of fact, because A is a hex digit, \000026A and \26A are different - one is &A and the other is ɪ, because with neither leading 0s nor a whitespace after the hex code, the A is treated as part of the hex code!

So:

Q\000026A -> Q&A

Q\000026 A -> Q&A

Q\26 A -> Q&A

Q\26A -> Qɪ

Demonstration: https://jsfiddle.net/cn1y6vpt/

So all unicode sequences need two spaces to separate them i.e Q \000026 A would need to be Q \000026 A with two spaces to yield Q & A?

I think github collapses spaces for inline code snippets, and I can't follow if you're trying to use a className of Q & A (which wouldn't work because classNames are delimited by spaces in the HTML) or trying to use space as a descendant combinator.

If you had something like id="Q & A" you'd need to escape the spaces just the same as the ampersand, and since the escaping slash is not a hex digit you wouldn't need a double space.

If you wanted to have a space which is a combinator after the hex digits, yes you'd need a double space.

Examples: https://jsfiddle.net/sz67trap/1/

No meant selecting \000026 A where a is another selector i.e \000026 header wouldn't that now treat it as unicode + header without a space between them?

Yes, this rule:

.Q\000026 header { color: red; }

Targets this element:

<div class="Q&header">

To target this:

<div class="Q&"> <header>

You would need a double space in the selector rule - one to terminate the hex code and one to use as a combinator:

.Q\000026 header { color: blue; }

See also https://jsfiddle.net/4heovdau/

This is accounted for in the tests:

a single space after 6-0r-fewer hex digits needs to be preserved: https://github.com/thysultan/stylis.js/pull/260/files#diff-4cafb040dcdb12664448ab294d801e96bb1378853e3a2a4337d280b1d3de7f89R158

a double space after 6-or-fewer hex digits needs to be preserved: https://github.com/thysultan/stylis.js/pull/260/files#diff-4cafb040dcdb12664448ab294d801e96bb1378853e3a2a4337d280b1d3de7f89R170-R171

a double space after a non-hex escape sequence is just a combinator and can be collapsed into a single whitespace char: https://github.com/thysultan/stylis.js/pull/260/files#diff-4cafb040dcdb12664448ab294d801e96bb1378853e3a2a4337d280b1d3de7f89R141

a double space after the 7th hex digit is no longer a hex code terminating space but just a run of the mill combinator space which can be collapsed: https://github.com/thysultan/stylis.js/pull/260/files#diff-4cafb040dcdb12664448ab294d801e96bb1378853e3a2a4337d280b1d3de7f89R172

thysultan · 2021-04-04T14:43:35Z

src/Parser.js

@@ -94,6 +94,25 @@ export function parse (value, root, parent, rule, rules, rulesets, pseudo, point

 				index = offset = property = 0, variable = ampersand = 1, type = characters = '', length = pseudo
 				break
+			// \
+			case 92:
+				if (from(peek()).match(/[^a-f0-9]/i)) {


This looks overly involved, why not just next() X2 then continue i.e

case 92: next() && next() continue

Because then the tests fail!

2 failing 1) Parser escaped chars in selector identifiers: AssertionError: expected '.user.B{color:red;}.user.xampleom{color:blue;}.user.ownerounder{color:green;}' to equal '.user.B\\&W{color:red;}.user.\\@example\\.com{color:blue;}.user.owner\\/founder{color:green;}' + expected - actual -.user.B{color:red;}.user.xampleom{color:blue;}.user.ownerounder{color:green;} +.user.B\&W{color:red;}.user.\@example\.com{color:blue;}.user.owner\/founder{color:green;} at Context.<anonymous> (test/Parser.js:142:10) at processImmediate (node:internal/timers:464:21) 2) Parser escaped hex digits in selector identifiers: AssertionError: expected '.user.BW{color:red;}.user.B0026W{color:green;}.user.B W{color:blue;}.user.endsWith00A9 a.childNode{color:green;}.user.endsWith a.childNode{color:yellow;}' to equal '.user.B\\26W{color:red;}.user.B\\000026W{color:green;}.user.B\\26 W{color:blue;}.user.endsWith\\0000A9 a.childNode{color:green;}.user.endsWith\\AE a.childNode{color:yellow;}' + expected - actual -.user.BW{color:red;}.user.B0026W{color:green;}.user.B W{color:blue;}.user.endsWith00A9 a.childNode{color:green;}.user.endsWith a.childNode{color:yellow;} +.user.B\26W{color:red;}.user.B\000026W{color:green;}.user.B\26 W{color:blue;}.user.endsWith\0000A9 a.childNode{color:green;}.user.endsWith\AE a.childNode{color:yellow;} at Context.<anonymous> (test/Parser.js:154:10) at processImmediate (node:internal/timers:464:21)

The output still needs the \ and the next char, so I assume we need a characters += to appear somewhere.

And we sometimes need to preserve double spaces when they follow hex code sequences, but not when they follow escaped non-hex code characters.

so I assume we need a characters += to appear somewhere.

Yes we can concat the two characters that follow:

case 92: characters += from(next()) + from(next()) continue

At a glance - this skips the slash? Would it be more appropriate to do:

case 92: characters += from(character) + from(next()) continue;

That still won't pass one of the tests - this solution does nothing for the escaped 6 character hex code followed by a double space? https://jsfiddle.net/8ouxLvdf/

Changing the code, the tests say:

1) Parser double spaces after escaped hex codes in selector identifiers: AssertionError: expected '.user.endsWith\\0000A9 a.childNode{color:green;}.user.endsWith\\AE a.childNode{color:yellow;}.user.Q\\000026A a.childNode{color:purple;}' to equal '.user.endsWith\\0000A9 a.childNode{color:green;}.user.endsWith\\AE a.childNode{color:yellow;}.user.Q\\000026A a.childNode{color:purple;}' + expected - actual -.user.endsWith\0000A9 a.childNode{color:green;}.user.endsWith\AE a.childNode{color:yellow;}.user.Q\000026A a.childNode{color:purple;} +.user.endsWith\0000A9 a.childNode{color:green;}.user.endsWith\AE a.childNode{color:yellow;}.user.Q\000026A a.childNode{color:purple;} at Context.<anonymous> (test/Parser.js:166:10) at processImmediate (node:internal/timers:464:21)

…ences

richard-jp-leguen · 2021-04-05T23:36:32Z

src/Parser.js

@@ -49,6 +49,10 @@ export function parse (value, root, parent, rule, rules, rulesets, pseudo, point
 			case 9: case 10: case 13: case 32:
 				characters += whitespace(previous)
 				break
+			// \
+			case 92:
+				characters += escaping(caret() - 1, 7)


Really like adding a function to Tokenizer.js for this.

Richard JP Le Guen added 2 commits April 4, 2021 09:22

Adding basic support for escaped characters

211af9a

Just basic backslash escaping of the "next character" - this commit adds no support for hex digits terminated with a space

Quick naive implementation

3937314

Wasn't sure what to do about a for loop because I didn't want to pollute the function with `var i` - given there's no use of `let` and all the variable declarations are explicitly hoisted.

richard-jp-leguen commented Apr 4, 2021

View reviewed changes

thysultan reviewed Apr 4, 2021

View reviewed changes

Richard JP Le Guen and others added 5 commits April 4, 2021 11:14

Adding test to enforce that 6 is the max length of the hex digit sequ…

6b8d2fa

…ences

Splitting tests to be smaller and more concise

1ce04ee

Adding a test

d28fdf5

Typo in comment

43053e8

improved implementation

4e83cfc

richard-jp-leguen commented Apr 5, 2021

View reviewed changes

thysultan requested a review from Andarist April 6, 2021 15:49

Andarist approved these changes Apr 6, 2021

View reviewed changes

thysultan merged commit a5acfdb into thysultan:master Apr 7, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix / Escape chars in identifiers #260

fix / Escape chars in identifiers #260

richard-jp-leguen commented Apr 4, 2021 •

edited

Loading

coveralls commented Apr 4, 2021 •

edited

Loading

richard-jp-leguen Apr 4, 2021

richard-jp-leguen Apr 4, 2021

richard-jp-leguen Apr 4, 2021

thysultan Apr 5, 2021

richard-jp-leguen Apr 5, 2021

richard-jp-leguen Apr 5, 2021

richard-jp-leguen Apr 5, 2021 •

edited

Loading

thysultan Apr 5, 2021 •

edited

Loading

richard-jp-leguen Apr 5, 2021 •

edited

Loading

thysultan Apr 5, 2021

richard-jp-leguen Apr 5, 2021

thysultan Apr 4, 2021 •

edited

Loading

richard-jp-leguen Apr 4, 2021

thysultan Apr 5, 2021

richard-jp-leguen Apr 5, 2021

richard-jp-leguen Apr 5, 2021

fix / Escape chars in identifiers #260

fix / Escape chars in identifiers #260

Conversation

richard-jp-leguen commented Apr 4, 2021 • edited Loading

coveralls commented Apr 4, 2021 • edited Loading

Pull Request Test Coverage Report for Build 45b668e77edce8a958f23a6ef3970f47001f033b-PR-260

Warning: This coverage report may be inaccurate.

Details

💛 - Coveralls

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richard-jp-leguen Apr 5, 2021 • edited Loading

Choose a reason for hiding this comment

thysultan Apr 5, 2021 • edited Loading

Choose a reason for hiding this comment

richard-jp-leguen Apr 5, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thysultan Apr 4, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

richard-jp-leguen commented Apr 4, 2021 •

edited

Loading

coveralls commented Apr 4, 2021 •

edited

Loading

richard-jp-leguen Apr 5, 2021 •

edited

Loading

thysultan Apr 5, 2021 •

edited

Loading

richard-jp-leguen Apr 5, 2021 •

edited

Loading

thysultan Apr 4, 2021 •

edited

Loading