[tokenization-css-question] Questions about tokenization of CSS #6944

HcySunYang · 2022-01-13T04:27:05Z

I read the specification related to CSS tokenization, and there is this sentence in the "4.3.4. Consume an ident-like token" section: "While the next two input code points are whitespace, consume the next input code point". What does it mean, and why is it phrased this way? Is there a historical reason for it? Could it be illustrated with a CSS code example?

I've asked a question on StackOverflow but got no answer: https://stackoverflow.com/questions/70681897/questions-about-tokenization-of-css

Thanks.

SelenIT · 2022-01-13T09:20:32Z

This wording seems to have the same meaning as "Consume as much whitespace as possible" later in "4.3.6. Consume a url token". The intent is clearly to search for the first non-whitespace character after `url(`: if it's a single or double quote, the whole previous sequence is interpreted as a function token, otherwise as a url token.

However, if I understand it correctly, point 2 in 4.3.6 seems redundant, since all possible whitespace should already have been consumed and included in the url token by its construction algorithm. Maybe it needs some clarification?

HcySunYang · 2022-01-13T09:25:29Z

@SelenIT Thank you for your reply, you cleared up my confusion very well, thanks again. Yes, point 2 in 4.3.6 seems redundant.

tabatkins · 2022-01-13T20:10:39Z

That sentence translates directly into code in C-like languages:

while(isWhitespace(input.get(0)) && isWhitespace(input.get(1))) {
  input.consumeNext();
}

If you had code like `url( "http://example.com")`, that's a valid url function and we need to match it. So I need to consume an arbitrary amount of whitespace between the `(` and the start of the url itself.

I don't use "consume as much whitespace as possible" because the parser always retains the existence of whitespace; in the above example it will output `FUNCTION-TOKEN("url") WS STRING-TOKEN("http://example.com") CLOSE-PAREN`.

I can't just preemptively consume all the whitespace and immediately output a WS token, either, because the "consume a token" algo only ever emits a single token at a time. So I have to leave at least one whitespace character behind, so the next call to the algorithm can find it and emit a WS token.
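To make the "leave one whitespace character behind" behavior concrete, here is a minimal Python sketch of just this step. It is an illustration, not the spec's algorithm in full: the function name `consume_after_url_paren`, the plain string-and-index input model, and the reduced whitespace set are all assumptions made for the example.

```python
# Hypothetical helper: models only the whitespace handling after "url(".
# Simplified whitespace set, assumed for this sketch.
WHITESPACE = {" ", "\t", "\n"}


def is_whitespace(ch):
    return ch in WHITESPACE


def consume_after_url_paren(text, pos):
    """`pos` is the index just after "url(".

    Returns (kind, new_pos), where kind is "function" or "url".
    """
    # "While the next two input code points are whitespace,
    # consume the next input code point."
    while (
        pos + 1 < len(text)
        and is_whitespace(text[pos])
        and is_whitespace(text[pos + 1])
    ):
        pos += 1

    # At most one whitespace character is now left unconsumed.
    first = text[pos] if pos < len(text) else ""
    second = text[pos + 1] if pos + 1 < len(text) else ""

    # A quote, or one whitespace followed by a quote, means url() is
    # tokenized as an ordinary function whose argument is a string.
    if first in ('"', "'") or (is_whitespace(first) and second in ('"', "'")):
        return ("function", pos)
    return ("url", pos)


kind, pos = consume_after_url_paren('url(   "x")', 4)
# The remaining input still starts with exactly one space, so the next
# call to the tokenizer can find it and emit a whitespace token.
remaining = 'url(   "x")'[pos:]
```

In the usage lines, three spaces follow `url(`; two are consumed and one is left in `remaining`, which is what lets a later call emit the WS token.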

> However, if I understand it correctly, point 2 in 4.3.6 seems redundant

It's not. As I said above, I purposely leave behind one space, but if I'm just emitting a url token, I don't need it. So I go ahead and consume it. Technically I could just have it check if the next character is whitespace and consume it, but "consume as much whitespace as possible" is clearer in its intent and prevents any bugs if I somehow call into this algorithm with more than one space character left behind. I never want a URL to actually start with whitespace.
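The role of that step can be sketched in Python as follows. This is a heavily simplified model under stated assumptions: the function name `consume_url_token` and the string-and-index input are invented for the example, and escapes, bad-url handling, and end-of-file edge cases from the real algorithm are deliberately left out.

```python
# Simplified whitespace set, assumed for this sketch.
WHITESPACE = {" ", "\t", "\n"}


def consume_url_token(text, pos):
    """Hypothetical, simplified "consume a url token".

    `pos` points just past whatever the ident-like step consumed, so the
    input may still begin with leftover whitespace.
    Returns (url_value, new_pos).
    """
    # Step 2: consume as much whitespace as possible, so the url value
    # never starts with whitespace, however much was left behind.
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1

    # Collect the unquoted url up to whitespace or the closing paren.
    start = pos
    while pos < len(text) and text[pos] != ")" and text[pos] not in WHITESPACE:
        pos += 1
    value = text[start:pos]

    # Skip trailing whitespace, then the closing paren if present.
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    if pos < len(text) and text[pos] == ")":
        pos += 1
    return value, pos
```

Because the first loop handles any amount of whitespace, the result is the same whether the caller left behind one space or several, which is the robustness argument made above.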

HcySunYang · 2022-01-14T14:42:32Z

@tabatkins clear enough, thanks.

tabatkins added the "Closed as Question Answered" and "css-syntax-3" labels Jan 13, 2022

HcySunYang closed this as completed Jan 14, 2022
