[tokenization-css-question] Questions about tokenization of CSS #6944

Closed
HcySunYang opened this issue Jan 13, 2022 · 4 comments
Labels: Closed as Question Answered (used when the issue is more of a question than a problem, and it's been answered), css-syntax-3

Comments

@HcySunYang

I read the specification on CSS tokenization, and the "4.3.4. Consume an ident-like token" section contains this sentence: "While the next two input code points are whitespace, consume the next input code point". What does it mean, and why is it worded that way? Is there a historical reason for it? Could it be illustrated with a CSS code example?

I've asked a question on StackOverflow but got no answer: https://stackoverflow.com/questions/70681897/questions-about-tokenization-of-css

Thanks.

@SelenIT
Collaborator

SelenIT commented Jan 13, 2022

This wording seems to have the same meaning as "Consume as much whitespace as possible" later in "4.3.6. Consume a url token". The intent is clearly to search for the first non-whitespace character after `url(`: if it's a single or double quote, the whole preceding sequence is interpreted as a function token, otherwise as a url token.
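For illustration, the decision described above can be sketched in Python. This is a minimal sketch rather than the spec's algorithm verbatim: `classify_after_url_paren` and `WHITESPACE` are hypothetical names, and the input is assumed to be the text that immediately follows `url(`.

```python
# CSS whitespace after input preprocessing: space, tab, newline.
WHITESPACE = {" ", "\t", "\n"}


def classify_after_url_paren(rest: str) -> str:
    """Decide which token `url(` starts, given the text that follows it.

    Hypothetical helper for illustration; it only looks at the first
    non-whitespace character and ignores everything after it.
    """
    i = 0
    # Skip forward to the first non-whitespace character.
    while i < len(rest) and rest[i] in WHITESPACE:
        i += 1
    # A quote means the argument is a string, so `url(` is a function token.
    if i < len(rest) and rest[i] in ('"', "'"):
        return "function-token"
    # Anything else (including end of input) is handled as a url token.
    return "url-token"
```

For example, `classify_after_url_paren('   "http://example.com")')` yields `"function-token"`, while `classify_after_url_paren("  foo.png)")` yields `"url-token"`.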

However, if I understand it correctly, point 2 in 4.3.6 seems redundant, since all the whitespace should already have been consumed by the time the url token is constructed. Maybe it needs some clarification?

@HcySunYang
Author

@SelenIT Thank you for your reply, it cleared up my confusion very well, thanks again. Yes, point 2 in 4.3.6 seems redundant.

@tabatkins
Member

tabatkins commented Jan 13, 2022

That sentence translates directly into code in C-like languages:

```
while(isWhitespace(input.get(0)) && isWhitespace(input.get(1))) {
  input.consumeNext();
}
```

If you had code like `url(     "http://example.com")`, that's a valid url function and we need to match it. So I need to consume an arbitrary amount of whitespace between the ( and the start of the url itself.

I don't use "consume as much whitespace as possible" because the parser always retains the existence of whitespace; in the above example it will output `FUNCTION-TOKEN("url") WS STRING-TOKEN("http://example.com") CLOSE-PAREN`.

I can't just preemptively consume all the whitespace and immediately output a WS token, either, because the "consume a token" algo only ever emits a single token at a time. So I have to leave at least one whitespace character behind, so the next call to the algorithm can find it and emit a WS token.
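To make the "leave one whitespace character behind" point concrete, here is a minimal Python sketch of a one-token-at-a-time tokenizer. It is an illustration under stated assumptions, not the spec's algorithm: `Stream`, `consume_token`, and `tokenize` are hypothetical names, only whitespace, quoted strings, `)`, and `url(` are recognised, and the unquoted url-token branch is deliberately left out.

```python
WHITESPACE = {" ", "\t", "\n"}


class Stream:
    """Hypothetical input stream: peek(n) looks ahead, consume() advances by one."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.pos = 0

    def peek(self, n: int = 0) -> str:
        i = self.pos + n
        return self.text[i] if i < len(self.text) else ""  # "" stands in for EOF

    def consume(self) -> str:
        ch = self.peek()
        self.pos += 1
        return ch


def consume_token(s: Stream):
    """Emit exactly one token per call; return None at end of input."""
    ch = s.peek()
    if ch == "":
        return None
    if ch in WHITESPACE:
        # A whole run of whitespace collapses into a single WS token.
        while s.peek() in WHITESPACE:
            s.consume()
        return ("WS",)
    if ch in ('"', "'"):
        quote = s.consume()
        value = ""
        while s.peek() not in ("", quote):
            value += s.consume()
        s.consume()  # closing quote
        return ("STRING", value)
    if ch == ")":
        s.consume()
        return ("CLOSE-PAREN",)
    if s.text[s.pos:s.pos + 4].lower() == "url(":
        for _ in range(4):
            s.consume()
        # The sentence from 4.3.4: stop while one whitespace character remains,
        # so the NEXT call can find it and emit the WS token.
        while s.peek(0) in WHITESPACE and s.peek(1) in WHITESPACE:
            s.consume()
        # Assumption: a quoted argument follows; url-token handling is omitted.
        return ("FUNCTION", "url")
    raise ValueError(f"character not handled by this sketch: {ch!r}")


def tokenize(text: str) -> list:
    s = Stream(text)
    tokens = []
    while (tok := consume_token(s)) is not None:
        tokens.append(tok)
    return tokens
```

Calling `tokenize('url(     "http://example.com")')` gives `[("FUNCTION", "url"), ("WS",), ("STRING", "http://example.com"), ("CLOSE-PAREN",)]`. If the loop consumed every whitespace character instead, the `("WS",)` token would disappear from the stream.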

> However, if I understand it correctly, point 2 in 4.3.6 seems redundant

It's not. As I said above, I purposely leave behind one space, but if I'm just emitting a url token, I don't need it. So I go ahead and consume it. Technically I could just have it check if the next character is whitespace and consume it, but "consume as much whitespace as possible" is clearer in its intent and prevents any bugs if I somehow call into this algorithm with more than one space character left behind. I never want a URL to actually start with whitespace.
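A small Python sketch of why that step matters for the unquoted case. It assumes the caller has already run the two-whitespace loop from 4.3.4, so the remaining text may still begin with whitespace; `consume_url_value` is a hypothetical name, and escapes, bad-url handling, and the checks on trailing whitespace are omitted.

```python
WHITESPACE = {" ", "\t", "\n"}


def consume_url_value(rest: str) -> str:
    """Return the value of an unquoted url token from the text after `url(`.

    Simplified illustration: reading stops at the first `)` or whitespace.
    """
    i = 0
    # Step 2 of 4.3.6: consume as much whitespace as possible, so the
    # leftover space(s) never end up at the start of the URL value.
    while i < len(rest) and rest[i] in WHITESPACE:
        i += 1
    value = ""
    while i < len(rest) and rest[i] != ")" and rest[i] not in WHITESPACE:
        value += rest[i]
        i += 1
    return value
```

With the leftover space from 4.3.4, `consume_url_value(" foo.png)")` returns `"foo.png"`; without the whitespace-skipping loop, reading would stop at that leading space and the value would come back empty.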

tabatkins added the "Closed as Question Answered" and "css-syntax-3" labels on Jan 13, 2022
@HcySunYang
Author

@tabatkins clear enough, thanks.
