[tokenization-css-question] Questions about tokenization of CSS #6944
Comments
This wording seems to have the same meaning as "Consume as much whitespace as possible" later in "4.3.6. Consume a url token". The intent is clearly to search for the first non-whitespace character after However, if I understand it correctly, point 2 in 4.3.6 seems redundant, since all possible whitespace should already have been consumed and included in the url token by its construction algorithm. Maybe it needs some clarification?
@SelenIT Thank you for your reply; it cleared up my confusion. Yes, point 2 in 4.3.6 seems redundant.
That sentence translates directly into code in C-like languages:

    while (isWhitespace(input.get(0)) && isWhitespace(input.get(1))) {
        input.consumeNext();
    }

If you had code like url( "http://example.com"), that's a valid url function and we need to match it. So I need to consume an arbitrary amount of whitespace between the I don't use "consume as much whitespace as possible" because the parser always retains the existence of whitespace; in the above example it will output `FUNCTION-TOKEN("url") WS STRING-TOKEN("http://example.com") CLOSE-PAREN`. I can't just preemptively consume all the whitespace and immediately output a WS token, either, because the "consume a token" algo only ever emits a single token at a time. So I have to leave at least one whitespace character behind, so the next call to the algorithm can find it and emit a WS token.
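To make the "leave one whitespace character behind" behaviour concrete, here is a minimal Python sketch. It is not the spec's algorithm verbatim: the function names, the plain-string input, and the reduced whitespace set are all assumptions made for illustration. It only models the decision made right after `url(` has been consumed.

```python
# Hypothetical helper names; a simplified model, not a full CSS tokenizer.
WHITESPACE = {" ", "\t", "\n"}
QUOTES = {'"', "'"}


def is_whitespace(ch):
    return ch in WHITESPACE


def skip_whitespace_leaving_one(text, pos):
    """Advance while the next TWO characters are whitespace.

    At most one whitespace character is left unconsumed, so a later
    call to the tokenizer can still find it and emit a WS token.
    """
    while (pos + 1 < len(text)
           and is_whitespace(text[pos])
           and is_whitespace(text[pos + 1])):
        pos += 1
    return pos


def starts_quoted_url_function(text, pos):
    """Given pos just after 'url(', report whether a quoted string follows.

    Returns (is_function, new_pos). When is_function is True, a function
    token would be emitted and the remaining input handled by later calls.
    """
    pos = skip_whitespace_leaving_one(text, pos)
    first = text[pos] if pos < len(text) else ""
    second = text[pos + 1] if pos + 1 < len(text) else ""
    quoted = first in QUOTES or (is_whitespace(first) and second in QUOTES)
    return quoted, pos


source = 'url(   "http://example.com")'
is_function, pos = starts_quoted_url_function(source, len("url("))
print(is_function, repr(source[pos:]))
```

With three spaces after `url(`, two are consumed and one remains in front of the quote, which is the character the next call can turn into a WS token.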
It's not. As I said above, I purposely leave behind one space, but if I'm just emitting a url token, I don't need it. So I go ahead and consume it. Technically I could just have it check if the next character is whitespace and consume it, but "consume as much whitespace as possible" is clearer in its intent and prevents any bugs if I somehow call into this algorithm with more than one space character left behind. I never want a URL to actually start with whitespace.
@tabatkins clear enough, thanks.
I read the specification related to CSS tokenization, and there is this sentence in the "4.3.4. Consume an ident-like token" section: "While the next two input code points are whitespace, consume the next input code point". What does it mean, and why is it there? Is there a historical reason for it? Could it be illustrated with a CSS code example?
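The unquoted case can be sketched the same way. The Python below is a deliberately simplified illustration with hypothetical function names: it ignores escapes, bad-url handling, and end-of-file errors, and only shows why leading whitespace is consumed in full before the URL value starts.

```python
# Hypothetical, simplified model of reading an unquoted url(...) body.
WS = " \t\n"


def consume_whitespace(text, pos):
    """Consume as much whitespace as possible starting at pos."""
    while pos < len(text) and text[pos] in WS:
        pos += 1
    return pos


def consume_unquoted_url(text, pos):
    """Read an unquoted URL value starting just after 'url('.

    Returns (value, new_pos). Any whitespace left behind by the earlier
    step is dropped first, so the value never starts with whitespace.
    """
    pos = consume_whitespace(text, pos)
    start = pos
    while pos < len(text) and text[pos] != ")" and text[pos] not in WS:
        pos += 1
    value = text[start:pos]
    pos = consume_whitespace(text, pos)  # trailing whitespace before ')'
    if pos < len(text) and text[pos] == ")":
        pos += 1
    return value, pos


print(consume_unquoted_url(" foo.png)", 0))
print(consume_unquoted_url("  foo.png )", 0))
```

Both calls yield the same value, whether one or several whitespace characters were left in front of the URL, which is the robustness argument made above.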
I've asked a question on StackOverflow but got no answer: https://stackoverflow.com/questions/70681897/questions-about-tokenization-of-css
Thanks.