[css-syntax] Review requested of new Parsing text #8834

tabatkins · 2023-05-12T22:13:20Z

Since we resolved on the new Nesting behavior (try to parse as a declaration, then parse as a rule if that's invalid), I had to do some decent rewriting of Syntax's algorithms, and went ahead and dove into a larger rewrite to clean it up in general. I've implemented the new text in my CSS parser library and the (fairly limited, admittedly) tests I've run look good, but I'd appreciate a larger review.

Significant changes from the previous version:

Algorithm structure generally changed; rather than consuming a token and often reconsuming for another algorithm to deal with, it just always relies on lookahead and doesn't consume tokens until they're actually going to be used for certain. This should better resemble how an actual parser works. (I haven't changed the tokenizer to this structure, but doing so is probably a good idea at some point.)
Previously, I had "consume a list of rules" for rule+at-rules and "consume a list of declarations" for declarations+at-rules. Stylesheets and things like @media used "list of rules"; style rules and things like @font-face used "list of declarations". I've shifted all blocks to just use the new "consume a block's contents", and since stylesheets are now the only user of "list of rules", renamed it to "consume a stylesheet's contents" and specialized it to always ignore the CDO/CDC tokens.

Aside from allowing the new nesting behavior, all of these changes should be only editorial, with one exception: blocks that previously only contained rules (@media, @keyframes, etc) previously used the "consume a list of rules", but now use the unified "consume a block's contents", which means their error-recovery in the face of semicolons changes.

For example, @media { garbage; bar {...} } previously would contain a style rule with a garbage; bar selector. (This is what happens at the top-level of a stylesheet, still.) Now the rule's selector will be just bar, since the garbage; part will get dropped as an invalid declaration. This means that rules which were accidentally invalid and dropped due to garbage might now be valid, if there's a semicolon separating them from preceding garbage.

I suspect this is fine, and I'd really like it to be, because it means the overall parsing behavior doesn't need to branch on grammar knowledge (and thus, whether a rule is known or unknown won't change its generic parsing). It used to be the case that parsing depended on this kind of knowledge, and it was super awkward to use in tooling. Also, it means that parsing doesn't change between a top-level @media and a nested one, except for declarations becoming valid instead of invalid; all the rules inside of the @media remain precisely the same.

But if necessary, we can hardcode some at-rules to trigger a different parsing behavior that preserves backwards compatibility more completely.

(Technically parsing in general depends on grammar knowledge anyway, since you need to know whether a declaration is valid in a given context to tell if you should try and redo parsing as a rule. But it turns out there's a simple and reliable rule you can use generically to get approximately the right behavior without having to know anything about grammars.)

The text was updated successfully, but these errors were encountered:

tabatkins · 2023-05-12T22:16:25Z

Agenda+ to get approval on this, and ask for a new publication of Syntax.

…ct some examples. #8834

tabatkins · 2023-05-18T19:33:11Z

@cdoublev Please don't delete your comments, it's very confusing reading an email, coming here to answer the questions, and finding there's no comment anymore.

Restoring the comment:

In examples 12 and 13, I do not understand why @keyframes is not defined with <qualified-rule-list>, and why @page is defined with <declaration-rule-list> (in which at-rules - like margin rules - are automatically invalid).

Ah, it's because I'm dumb. @keyframes definitely should be <qualified-rule-list>, and I misdefined <declaration-rule-list> - it should allow declarations and at-rules, and disallow qualified rules. Both fixed now.

I am not sure what type of item represents <block-contents> and its specific sub-productions: is it a <simple-block> (ie. a component) or an undefined structure (containing declarations and/or rules)? The return value of consume a block's contents is a bit handwavey. But this may be intentional...

The return value isn't at all handwavey - it's a list of declarations and a list of rules. <block-contents> represents the contents of a block; it is not the block itself (it doesn't represent the wrapping {}). These productions aren't substantially different from what was there before, I just reshuffled them a little bit.

cdoublev · 2023-05-19T12:12:49Z

Please don't delete your comments, it's very confusing reading an email, coming here to answer the questions, and finding there's no comment anymore.

Sorry about that. My intent was to avoid hijacking the thread with a minor mistake and something I was wrong about, indeed.

<declaration-at-rule-list> is perhaps clearer than <declaration-rule-list>:

<declaration-rule-list>: declarations and at-rules are allowed; qualified-rules are automatically invalid.

...and more consistent with <at-rule-list>.

A link to consume a block's contents in the description of <block-contents> and its subproductions might also be usefull, perhaps in this paragraph:

The CSS parser is agnostic as to the contents of blocks—they’re all parsed with the same algorithm, and differentiate themselves solely by what constructs are valid.

5.4.5. Parse a list of declarations

To parse a block’s contents from input:

I think it should be "To parse a list of declarations from input:".

tabatkins · 2023-05-19T18:14:45Z

My intent was to avoid hijacking the thread with a minor mistake and something I was wrong about, indeed.

If you'd been wrong about it I wouldn't have said anything, but I had to repost your comment because you did catch some mistakes.

A link to consume a block's contents

Done.

I think it should be "To parse a list of declarations from input:".

No, it's the heading that I didn't update. I fixed the algo name and the heading's own ID, but not the heading's text. ^_^ Fixed.

emilio · 2023-06-04T11:55:09Z

I think we should treat error recovery in nested and unnested rules the same (#7961 (comment)), but other than that the new text looks good to me.

I've enabled this in Firefox Nightly in bug 1835066, fwiw.

tabatkins · 2023-06-04T20:49:22Z

@emilio I'm a little confused - the error recovery is the same. I noted that as the only (non-Nesting related) normative change, because switching the algos around meant that I was changing the non-nested error-recovery (to match the nested error recovery).

tabatkins · 2023-06-04T20:51:15Z

Unless you're referring to wanting to change top-level error-recovery to be the same as what you get inside of rule bodies? (afaict this isn't what you were asking for in your linked comment, but I can't tell what you're asking for otherwise)

emilio · 2023-06-04T22:06:13Z

No, I meant that as long as we assure that all rule bodies recover errors the same regardless of whether we're nested on a style rule, we're good I believe.

Which is the case with the current spec text AFAICT, I just wanted to explicitly point out that, since that was one of the breaking changes I noticed while implementing this, sorry if I wasn't clear on that :)

emilio · 2023-06-04T22:06:34Z

Sorry, I guess I shouldn't have worded it with "other than that" :)

css-meeting-bot · 2023-07-21T20:55:51Z

The CSS Working Group just discussed [css-syntax] Review requested of new Parsing text, and agreed to the following:

RESOLVED: Accept parsing behavior change as described in issue
RESOLVED: Republish CSS Syntax as CRD

The full IRC log of that discussion

<florian> TabAtkins: as part of refactorings for nesting, I realized that there is some common structures we could use to make more sense
<florian> TabAtkins: this results into 1 non editorial change
<fantasai> scribe+
<fantasai> TabAtkins: Rules that previously only contained rules, like @media, previously used an lagorithm that only cared about rules, not declarations
<fantasai> TabAtkins: now that they do care about declarations, at least in nested form -- ideally top-level form parses the same way -- and that causes an error-recovery change
<TabAtkins> @media { garbage; bar {...} }
<fantasai> TabAtkins: specifically ...
<fantasai> TabAtkins: this ^ previously would have resulted in a single rule, "garbage; bar { ... }"
<fantasai> TabAtkins: now it results in two statements, a "garbage;" statement and a "bar { ... }" statement
<fantasai> TabAtkins: only affects when there's a semicolon
<florian> q+
<fantasai> TabAtkins: but changes parsing on existing stylesheets
<emilio> q+
<fantasai> TabAtkins: question is if that's OK
<emilio> ack hober
<Rossen_> ack florian
<fantasai> florian: So you say previously we wouldn't discard "garbage;" and keep "bar { ... }"
<fantasai> florian: that seems better, but is it Web-compatible?
<fantasai> TabAtkins: Suspect so, since only small cases
<fantasai> florian: Doesn't seem unliely ppl will put in a media query and forgot selector and then add full rules with a selector later, and the garbage which used to gobble up next rule and thereby disabled it
<fantasai> florian: nolonger disables it
<fantasai> florian: seems plausible that it could be written by accident
<fantasai> florian: Let's try, but it would be possible it causes problems
<Rossen_> ack emilio
<fantasai> emilio: Fact that nested and unnested parse differently seems super weird, so +1 to try fixing this
<fantasai> emilio: I *think* that's how I implemented in Firefox, but might have been conservative when implementing, don't remember
<fantasai> Rossen_: Hearing support, any reasons to not resolve?
<fantasai> RESOLVED: Accept parsing behavior change as described in issue
<fantasai> TabAtkins: can I republish Syntax?
<fantasai> TabAtkins: all the changes should be captured in Changes list; will double-check before publishing
<fantasai> TabAtkins: still need to do review of tests that might need changing, but that's separate
<fantasai> Rossen_: Any objections to new CR?
<fantasai> RESOLVED: Republish CSS Syntax as CRD

zstarvit · 2023-07-25T19:24:35Z

I don't know if this is particularly the place to make a public comment, but I wanted to make sure that y'all are aware for the spec that there is legacy CSS code floating around that leans on certain CSS property declarations being invalid as a load bearing compatibility """feature""".

For example, older versions of Bootstrap contain declarations like *width: 100%; with the full understanding that *width is an invalid property and won't do anything on most browsers, but does do something in some browsers, notably IE7.

tabatkins · 2023-07-26T00:09:32Z

Yes, that's very specifically still supported. It will fail to parse as both a property and a rule, and stop at the semicolon, exactly as it does today.

cdoublev · 2023-08-10T06:17:32Z

Can you please clarify (not necessarily in the spec) the meaning of valid in the current context in consume a declaration, consume an at-rule, consume a qualified rule.

I first thought it was about what rules/properties/descriptors are accepted in the style sheet or rule:

The CSS parser is agnostic as to the contents of blocks [...].

Accompanying prose must define what is valid and invalid in this context.

But then I wondered if their grammar must also be valid, and how these two validations would address the new needs for parsing nested style rules.

Does it correspond to the validations in isValidInContext() in your CSS parser?

tabatkins · 2023-08-11T22:19:34Z

I'm not sure how to clarify it further than what's already there. The current parsing context (some declaration or rule, possible with parent rules, etc) determines whether any given thing is valid, according to specific grammars and/or prose in the relevant specs.

The function in my CSS parser library is just a generic, relatively dumb version of this that will usually correctly tell apart a rule from a declaration, without knowing anything about what is actually allowed. You just need something that can spot that a:hover { color: blue; } is not a valid declaration, so it can get parsed properly as a rule.

cdoublev · 2023-08-12T12:08:26Z

You just need something that can spot that a:hover { color: blue; } is not a valid declaration, so it can get parsed properly as a rule.

Understood, thanks.

Assuming that consume an at-rule receives @import following a style rule, it would return nothing if valid in the context includes looking at the rules preceding @import. But CSSStyleSheet.insertRule() involves consume an at-rule and insert a CSS rule, which must throw HierarchyRequestError in this situation.

tabatkins · 2023-08-14T18:39:53Z

Yeah. It's purposely somewhat handwavy about what "current context" is, because it's obvious in context what the necessary behavior is in any given algo and there's no real need to lock it down more precisely. It just needs a way to fail on invalid stuff.

cdoublev · 2023-08-19T07:03:50Z

I am not sure what type of item represents <block-contents> and its specific sub-productions: is it a simple block (ie. a component) or an undefined structure (containing declarations and/or rules)? The return value of consume a block's contents is a bit handwavey. But this may be intentional...

The return value isn't at all handwavey - it's a list of declarations and a list of rules. <block-contents> represents the contents of a block; it is not the block itself (it doesn't represent the wrapping {}). These productions aren't substantially different from what was there before, I just reshuffled them a little bit.

I think my point was that I find a bit inconsistent that the algorithms for consuming rules no longer process one level at a time, leaving lower levels as a list of component values. I would expect to process the block with parse a block’s contents before checking its grammar:

The CSS parser is agnostic as to the contents of blocks—they’re all parsed with the same algorithm

But it expects a list of component values, as it is only intended for an independent block, eg. style.cssText. It cannot handle a block resulting from parse a rule.

When rule blocks were consumed with consume a simple block while consuming a rule, the resulting simple block value was a list of component values to consume with the appropriate algorithm before checking the grammar.

Anyway, it is probably not important.

cdoublev · 2023-08-20T12:04:36Z

You just need something that can spot that a:hover { color: blue; } is not a valid declaration, so it can get parsed properly as a rule.

Isn't it possible to implement something that can spot this in consume a declaration and consume a list of component values?

The former would run the latter with a flag when the declaration name is not a custom property. When this flag is set, consume a list of component values would return nothing when it processes <{-token>. In turn, consume a declaration would consume the remnants of a bad declaration and return nothing.

edit: 77e9601 now fulfills this requirement, so there may no longer be a need for consume a declaration, consume an at-rule, consume a qualified rule, to require that the declaration or rule be valid in the current context.

tabatkins · 2024-01-04T20:38:41Z

I think my point was that I find a bit inconsistent that the algorithms for consuming rules no longer process one level at a time, leaving lower levels as a list of component values.

The only reason the old spec did so was because I didn't have a unified parsing model for all possible block contents. You had to know whether to parse a block's contents as "at-rules and declarations" or "at-rules and qualified rules", and that depended on what type of rule the block was in; it was easier to write the spec to have it delay this decision until later, when doing grammar-checking.

The new rules don't actually require having any grammar knowledge, because we've baked some restrictions into the grammar itself, so I was able to unify the two algorithms into one. (Knowing the grammar just lets you parse more efficiently; it doesn't change the end result.) So there was no longer any need to delay the parsing, and we could just get everything done in one pass.

But it expects a list of component values, as it is only intended for an independent block, eg. style.cssText. It cannot handle a block resulting from parse a rule.

Correct, but I'm not sure why that's relevant. The "parse" algorithms are the high-level algos called by other specs, while the "consume" algorithms are the low-level things that actually do the parsing. "Parse" algorithms call into "consume" ones; not vice versa.

so there may no longer be a need for consume a declaration, consume an at-rule, consume a qualified rule, to require that the declaration or rule be valid in the current context.

In theory, you can indeed remove the check from all of these places. You could just check, in "consume a block's contents", whether the returned declaration contains an improperly-used {}-block, and then do all the rest of the validity checking after parsing. But that wouldn't be very efficient; every top-level rule that starts like div:hover would consume the entire rest of the stylesheet as a declaration, then the parser would realize it contains a {}-block in the wrong place, and re-parse a single rule from it. Then do the same thing again, and again, and again. The end result would be identical to one that checks validity as it goes, as the spec currently says, it would just be massively less efficient.

So, it's better to write the spec in a way that guides implementations to do the better thing. This also ensures that if we did accidentally make a change that would cause the "check validity now" and "check validity later" impls to diverge, the "check validity now" impls would stay correct; the spec wouldn't be accidentally requiring impls to change to the less efficient method.

cdoublev · 2024-01-05T13:00:33Z

[The "parse" algorithms do not expect high level objects as input]

Correct, but I'm not sure why that's relevant. [...]

I thought that the "parse [a CSS]" algos could be used recursively. This is no longer possible with the unified model, because nested contents are already consumed.

You also cannot use parse something according to a CSS grammar with a <block-contents> subproduction anymore, because this entry point consumes a list of component values, which consumes simple blocks, whereas consume a block's contents expects tokens and consumes lists of rules and declarations.

every top-level rule that starts like div:hover would consume the entire rest of the stylesheet as a declaration, then the parser would realize it contains a {}-block in the wrong place

In my comment, I suggested to stop consuming a list of component values (for a non-custom property value) when encoutering the { following hover in the declaration value. However, I was wrong about consuming the remnants of the (bad) declaration: the algo would just return nothing, and consume a block's contents would backtrack to div and parse a rule.

But for both of these concerns, the current spec is fine with me. I understand that this is not a detailed implementation guide. When you try to conform to it as closely as possible, it can just be tricky to put pieces together sometimes.

Thank you for your clarifications.

tabatkins added css-syntax-3 Agenda+ labels May 12, 2023

tabatkins mentioned this issue May 16, 2023

[css-nesting-1] Can we relax the syntax further? #7961

Closed

tabatkins added a commit that referenced this issue May 18, 2023

[css-syntax] Fix the definition of <declaration-rule-list>, and corre…

9b8ce4e

…ct some examples. #8834

astearns added this to Thursday in Cupertino F2F Agenda Jul 16, 2023

astearns moved this from Thursday to Friday in Cupertino F2F Agenda Jul 17, 2023

css-meeting-bot removed the Agenda+ label Jul 21, 2023

chromium-helper mentioned this issue Jul 21, 2023

[css-syntax] Review requested of new Parsing text chromium-helper/csswg-resolutions#193

Closed

mozilla-apprentice mentioned this issue Jul 21, 2023

[css-syntax] Review requested of new Parsing text mozilla/wg-decisions#1202

Open

evanw mentioned this issue Jul 25, 2023

CSS: Properties with leading asterisks are being greedily parsed as selectors instead of as properties. evanw/esbuild#3272

Closed

mdubet mentioned this issue Aug 19, 2023

upperbroughton.org.uk - Darker background webcompat/web-bugs#125688

Open

cdoublev mentioned this issue Sep 15, 2023

[css-animations-1] Undefined procedures to get/set CSSOM attributes #9364

Open

cdoublev mentioned this issue Jan 4, 2024

[css-syntax] Dashed-ident rules and error recovery #9336

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[css-syntax] Review requested of new Parsing text #8834

[css-syntax] Review requested of new Parsing text #8834

tabatkins commented May 12, 2023 •

edited

Loading

tabatkins commented May 12, 2023

tabatkins commented May 18, 2023

cdoublev commented May 19, 2023

tabatkins commented May 19, 2023

emilio commented Jun 4, 2023

tabatkins commented Jun 4, 2023

tabatkins commented Jun 4, 2023

emilio commented Jun 4, 2023

emilio commented Jun 4, 2023

css-meeting-bot commented Jul 21, 2023

zstarvit commented Jul 25, 2023

tabatkins commented Jul 26, 2023

cdoublev commented Aug 10, 2023 •

edited

Loading

tabatkins commented Aug 11, 2023

cdoublev commented Aug 12, 2023

tabatkins commented Aug 14, 2023

cdoublev commented Aug 19, 2023

cdoublev commented Aug 20, 2023 •

edited

Loading

tabatkins commented Jan 4, 2024

cdoublev commented Jan 5, 2024 •

edited

Loading

[css-syntax] Review requested of new Parsing text #8834

[css-syntax] Review requested of new Parsing text #8834

Comments

tabatkins commented May 12, 2023 • edited Loading

tabatkins commented May 12, 2023

tabatkins commented May 18, 2023

cdoublev commented May 19, 2023

tabatkins commented May 19, 2023

emilio commented Jun 4, 2023

tabatkins commented Jun 4, 2023

tabatkins commented Jun 4, 2023

emilio commented Jun 4, 2023

emilio commented Jun 4, 2023

css-meeting-bot commented Jul 21, 2023

zstarvit commented Jul 25, 2023

tabatkins commented Jul 26, 2023

cdoublev commented Aug 10, 2023 • edited Loading

tabatkins commented Aug 11, 2023

cdoublev commented Aug 12, 2023

tabatkins commented Aug 14, 2023

cdoublev commented Aug 19, 2023

cdoublev commented Aug 20, 2023 • edited Loading

tabatkins commented Jan 4, 2024

cdoublev commented Jan 5, 2024 • edited Loading

tabatkins commented May 12, 2023 •

edited

Loading

cdoublev commented Aug 10, 2023 •

edited

Loading

cdoublev commented Aug 20, 2023 •

edited

Loading

cdoublev commented Jan 5, 2024 •

edited

Loading