Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix / Escape chars in identifiers #260

Merged
merged 7 commits into from
Apr 7, 2021
Merged

fix / Escape chars in identifiers #260

merged 7 commits into from
Apr 7, 2021

Conversation

richard-jp-leguen
Copy link

@richard-jp-leguen richard-jp-leguen commented Apr 4, 2021

While working with emotion-js/styled, my team discovered that stylis is unable to parse CSS selectors which include escaped "special" characters. So a test like this:

  // add to test/Parser.js
  test.only('escaped chars in selector identifiers', () => {
    expect(
      stylis(`
        &.B\\&W{color:red;}
     `)
    ).to.equal(`.user.B\\&W{color:red;}`)
  });

Fails like this:

  1) Parser
       escaped chars in selector identifiers:

      AssertionError: expected '.user.B\\.userW{color:red;}' to equal '.user.B\\&W{color:red;}'
      + expected - actual

      -.user.B\.userW{color:red;}
      +.user.B\&W{color:red;}
      
      at Context.<anonymous> (test/Parser.js:140:10)
      at processImmediate (node:internal/timers:464:21)

Other special characters fail more spectacularly. Using @ for example:

  test.only('escaped chars in selector identifiers', () => {
    expect(
      stylis(`
        &.B\\@W{color:red;}
     `)
    ).to.equal(`.user.B\\@W{color:red;}`)
  });

... yields the following result, with form feeds in the output:

1) Parser
       escaped chars in selector identifiers:

      AssertionError: expected '&\f.B\\@W{color:red;}' to equal '.user.B\\@W{color:red;}'
      + expected - actual

      -&
        .B\@W{color:red;}
      +.user.B\@W{color:red;}
      
      at Context.<anonymous> (test/Parser.js:140:10)
      at processImmediate (node:internal/timers:464:21)

Various versions of the CSS recommendations mention that escape characters can be used.

From https://www.w3.org/TR/CSS2/syndata.html#characters

Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B&W?" or "B\26 W\3F".

From https://www.w3.org/TR/css-syntax-3/#escaping (non-normative)

Any Unicode code point can be included in an identifier or quoted string by escaping it. CSS escape sequences start with a backslash (), and continue with:

  • Any Unicode code point that is not a hex digits or a newline. The escape sequence is replaced by that code point.
  • Or one to six hex digits, followed by an optional whitespace. The escape sequence is replaced by the Unicode code point whose value is given by the hexadecimal digits. This optional whitespace allow hexadecimal escape sequences to be followed by "real" hex digits.

So in this PR I'm attempting to add some support for those behaviors to stylis.js

Richard JP Le Guen added 2 commits April 4, 2021 09:22
Just basic backslash escaping of the "next character" - this commit adds no support for hex digits terminated with a space
Wasn't sure what to do about a for loop because I didn't want to pollute the function with `var i` - given there's no use of `let` and all the variable declarations are explicitly hoisted.
@coveralls
Copy link

coveralls commented Apr 4, 2021

Pull Request Test Coverage Report for Build 45b668e77edce8a958f23a6ef3970f47001f033b-PR-260

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 7 of 7 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.02%) to 99.438%

Totals Coverage Status
Change from base Build f1e01ca8493f71e06457fff9aaba8831dfbae0f1: 0.02%
Covered Lines: 258
Relevant Lines: 259

💛 - Coveralls

src/Parser.js Outdated
@@ -94,6 +94,25 @@ export function parse (value, root, parent, rule, rules, rulesets, pseudo, point

index = offset = property = 0, variable = ampersand = 1, type = characters = '', length = pseudo
break
// \
case 92:
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about the fact that there's nothing here to distinguish between an escape sequence in an "identifier" as opposed to somewhere else, but I can't think of a test case where that's going to be a problem.

src/Parser.js Outdated
characters += from(next())
} else {
characters += '\\';
[1,2,3,4,5,6].some(() => {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hesitated to add a for(var i loop, as all the variables are hoisted and use the var keyword - I didn't want to pollute that list with a scarcely-used i.

'.user.B\\000026W{color:green;}',
'.user.B\\26 W{color:blue;}',
// next rules are important because the hex digits terminating space
// is not the same as a combinator space
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Demonstration that the double-space is important: https://jsfiddle.net/6gsr7qv4/

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is \000026A valid unicode wouldn't it be \26A ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's valid. From the recommendation I linked to in the PR description, https://www.w3.org/TR/css-syntax-3/#escaping

EXAMPLE 3
An identifier with the value "&B" could be written as \26 B or \000026B.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... although the trailing A isn't part of the hex code, since the hex code is "one to six hex digits". In both what commented and what's in the recommendation's EXAMPLE 3, the hex code is \26 (or \000026) and the A or B which follows is not part of the hex code.

Copy link
Author

@richard-jp-leguen richard-jp-leguen Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... as a matter of fact, because A is a hex digit, \000026A and \26A are different - one is &A and the other is ɪ, because with neither leading 0s nor a whitespace after the hex code, the A is treated as part of the hex code!

So:

  • Q\000026A -> Q&A
  • Q\000026 A -> Q&A
  • Q\26 A -> Q&A
  • Q\26A ->

Demonstration: https://jsfiddle.net/cn1y6vpt/

Copy link
Owner

@thysultan thysultan Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So all unicode sequences need two spaces to separate them i.e Q \000026 A would need to be Q \000026 A with two spaces to yield Q & A?

Copy link
Author

@richard-jp-leguen richard-jp-leguen Apr 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think github collapses spaces for inline code snippets, and I can't follow if you're trying to use a className of Q & A (which wouldn't work because classNames are delimited by spaces in the HTML) or trying to use space as a descendant combinator.

If you had something like id="Q & A" you'd need to escape the spaces just the same as the ampersand, and since the escaping slash is not a hex digit you wouldn't need a double space.

If you wanted to have a space which is a combinator after the hex digits, yes you'd need a double space.

Examples: https://jsfiddle.net/sz67trap/1/

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No meant selecting \000026 A where a is another selector i.e \000026 header wouldn't that now treat it as unicode + header without a space between them?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this rule:

.Q\000026 header {
    color: red;
}

Targets this element:

<div class="Q&header">

To target this:

<div class="Q&">
    <header>

You would need a double space in the selector rule - one to terminate the hex code and one to use as a combinator:

.Q\000026  header {
    color: blue;
}

See also https://jsfiddle.net/4heovdau/


This is accounted for in the tests:

src/Parser.js Outdated
@@ -94,6 +94,25 @@ export function parse (value, root, parent, rule, rules, rulesets, pseudo, point

index = offset = property = 0, variable = ampersand = 1, type = characters = '', length = pseudo
break
// \
case 92:
if (from(peek()).match(/[^a-f0-9]/i)) {
Copy link
Owner

@thysultan thysultan Apr 4, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks overly involved, why not just next() X2 then continue i.e

case 92: next() && next()
    continue

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because then the tests fail!

 2 failing

  1) Parser
       escaped chars in selector identifiers:

      AssertionError: expected '.user.B{color:red;}.user.xampleom{color:blue;}.user.ownerounder{color:green;}' to equal '.user.B\\&W{color:red;}.user.\\@example\\.com{color:blue;}.user.owner\\/founder{color:green;}'
      + expected - actual

      -.user.B{color:red;}.user.xampleom{color:blue;}.user.ownerounder{color:green;}
      +.user.B\&W{color:red;}.user.\@example\.com{color:blue;}.user.owner\/founder{color:green;}
      
      at Context.<anonymous> (test/Parser.js:142:10)
      at processImmediate (node:internal/timers:464:21)

  2) Parser
       escaped hex digits in selector identifiers:

      AssertionError: expected '.user.BW{color:red;}.user.B0026W{color:green;}.user.B W{color:blue;}.user.endsWith00A9 a.childNode{color:green;}.user.endsWith a.childNode{color:yellow;}' to equal '.user.B\\26W{color:red;}.user.B\\000026W{color:green;}.user.B\\26 W{color:blue;}.user.endsWith\\0000A9  a.childNode{color:green;}.user.endsWith\\AE  a.childNode{color:yellow;}'
      + expected - actual

      -.user.BW{color:red;}.user.B0026W{color:green;}.user.B W{color:blue;}.user.endsWith00A9 a.childNode{color:green;}.user.endsWith a.childNode{color:yellow;}
      +.user.B\26W{color:red;}.user.B\000026W{color:green;}.user.B\26 W{color:blue;}.user.endsWith\0000A9  a.childNode{color:green;}.user.endsWith\AE  a.childNode{color:yellow;}
      
      at Context.<anonymous> (test/Parser.js:154:10)
      at processImmediate (node:internal/timers:464:21)

The output still needs the \ and the next char, so I assume we need a characters += to appear somewhere.

And we sometimes need to preserve double spaces when they follow hex code sequences, but not when they follow escaped non-hex code characters.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I assume we need a characters += to appear somewhere.

Yes we can concat the two characters that follow:

case 92: characters += from(next()) + from(next())
    continue

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance - this skips the slash? Would it be more appropriate to do:

case 92: characters += from(character) + from(next())
    continue;

That still won't pass one of the tests - this solution does nothing for the escaped 6 character hex code followed by a double space? https://jsfiddle.net/8ouxLvdf/

Changing the code, the tests say:

1) Parser
       double spaces after escaped hex codes in selector identifiers:

      AssertionError: expected '.user.endsWith\\0000A9 a.childNode{color:green;}.user.endsWith\\AE a.childNode{color:yellow;}.user.Q\\000026A a.childNode{color:purple;}' to equal '.user.endsWith\\0000A9  a.childNode{color:green;}.user.endsWith\\AE  a.childNode{color:yellow;}.user.Q\\000026A a.childNode{color:purple;}'
      + expected - actual

      -.user.endsWith\0000A9 a.childNode{color:green;}.user.endsWith\AE a.childNode{color:yellow;}.user.Q\000026A a.childNode{color:purple;}
      +.user.endsWith\0000A9  a.childNode{color:green;}.user.endsWith\AE  a.childNode{color:yellow;}.user.Q\000026A a.childNode{color:purple;}
      
      at Context.<anonymous> (test/Parser.js:166:10)
      at processImmediate (node:internal/timers:464:21)

@@ -49,6 +49,10 @@ export function parse (value, root, parent, rule, rules, rulesets, pseudo, point
case 9: case 10: case 13: case 32:
characters += whitespace(previous)
break
// \
case 92:
characters += escaping(caret() - 1, 7)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really like adding a function to Tokenizer.js for this.

@thysultan thysultan requested a review from Andarist April 6, 2021 15:49
@thysultan thysultan merged commit a5acfdb into thysultan:master Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants