[Meta] Modular/distributed grammar definitions #6744

Open
LeaVerou opened this issue Oct 19, 2021 · 5 comments

Comments

@LeaVerou
Member

There are a lot of cases where a certain value type / token consists of a disjunction of potential tokens, each defined in a separate section of the spec. However, there are still sections with "main" grammar definitions, that need to be updated every time new tokens are added or removed.

E.g.: Images:

<image> = <url> | <image()> | <image-set()> | <cross-fade()> | <element()> | <gradient>

Color 4:

<color> = <absolute-color-base> | currentcolor | <system-color> | <device-cmyk()>

<absolute-color-base> = <hex-color> | <named-color> | transparent |
    <rgb()> | <rgba()> | <hsl()> | <hsla()> | <hwb()> |
    <lab()> | <lch()> |
    <oklab()> | <oklch()> |
    <color()>

Color 5:

<color-space> = srgb | hsl | hwb | xyz | lab | lch | oklab | oklch

This is a maintenance overhead for editors, and makes it more likely for these definitions to get out of sync as tokens are added or removed (case in point: I just added oklab() and oklch() to the <color> grammar because they had not been added there too).

It should be possible to define a subtype entirely by adding a section, without also having to update centralized grammars. This also makes it possible to extend types in separate specs or levels.

Perhaps in addition to =, we could define a |= operator for grammars that means "whatever this token can be from the rest of the spec(s), plus these tokens". E.g., right now, adding OKLab and OKLCH needed edits not just to add their section but also to extend <color> and in Color 5, to extend <color-space>.

If |= existed, the entirety of the grammar changes that these introduce could be self-contained in their section:

<absolute-color-base> |= <oklab()> | <oklch()>
<color-space> |= oklab | oklch

oklab() = oklab( [<percentage> | none] [<number> | none] [<number> | none] [ / [<alpha-value> | none] ]? )
oklch() = oklch( [<percentage> | none] [<number> | none] [<hue> | none] [ / [<alpha-value> | none] ]? )

One could make the point that these summary grammars are useful (though not sure how useful they can be if out of sync). However, Bikeshed could generate them, similarly to how we generated indices, and that way they are guaranteed to be in sync.
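
As a rough illustration of how a tool could generate such a summary grammar, here is a minimal Python sketch. It is not Bikeshed code: the `resolve_grammar` function and the `(name, operator, body)` tuple format are hypothetical, and the base `<color-space>` list is simply the Color 5 list above with the two OK spaces removed.

```python
from collections import defaultdict


def resolve_grammar(rules):
    """Merge each `=` definition with its `|=` extensions, in source order.

    `rules` is a list of (name, operator, body) tuples. This input format
    is an assumption made for illustration only.
    """
    base = {}
    extensions = defaultdict(list)
    for name, op, body in rules:
        if op == "=":
            if name in base:
                raise ValueError(f"{name} has more than one '=' definition")
            base[name] = body
        elif op == "|=":
            extensions[name].append(body)
        else:
            raise ValueError(f"unknown operator {op!r}")

    resolved = {}
    for name in base.keys() | extensions.keys():
        parts = ([base[name]] if name in base else []) + extensions[name]
        # `|` is the weakest combinator in CSS value definition syntax, so
        # appending " | x" always adds x as a top-level alternative.
        resolved[name] = " | ".join(parts)
    return resolved


rules = [
    ("<color-space>", "=", "srgb | hsl | hwb | xyz | lab | lch"),
    ("<color-space>", "|=", "oklab | oklch"),
]
summary = resolve_grammar(rules)
print(summary["<color-space>"])
# srgb | hsl | hwb | xyz | lab | lch | oklab | oklch
```

Rejecting a second `=` for the same name is a design choice in this sketch: it keeps exactly one canonical definition per production, with everything else arriving through `|=`.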

@LeaVerou
Member Author

LeaVerou commented Oct 19, 2021

cc @tabatkins since you're the grammar person

@tabatkins
Member

I actually think the centralization (and attendant maintenance burden) is a good thing, because it means you can find all the instances of a production in one place, rather than having to know all the places it's defined in. This is the same issue with partial in WebIDL, which has both good and bad parts to it.

For instance:

and makes it more likely for these definitions to get out of sync as tokens are added or removed (case in point: I just added oklab() and oklch() to the <color> grammar because they had not been added there too).

This proposal wouldn't actually help with this; remembering to write the <color> |= ... line is exactly equivalent to remembering to update the <color> = line instead. If you were going to forget one, you're going to forget the other.

We've seriously considered removing partial from WebIDL and instead requiring specs to just update the core definition, but it's always been rejected because there's just so many specs altering certain core interfaces like Document, so it would end up kinda unreadable. But that's not the case with CSS definitions; our stuff is pretty well centralized and we don't add new productions to things that often.

In the rare cases we have that are analogous, like defining all the things that are <length>s or whatever, we have a solution already - just say in prose that something is a <length>. It being a little clumsy is a bonus, because it encourages us to mostly just use the centralized grammar-based approach instead.

In theory, tooling can help with these issues. But there's no reason to rely on tooling when just authoring it directly is similarly easy.

@LeaVerou
Member Author

LeaVerou commented Oct 30, 2021

Your argument may work when the changes needed are within the same spec. However, this centralization can be a big problem when multiple specs with varying levels of maturity extend the same tokens. Consider the following:

Spec A, very mature:

<a> = a | b | c

Spec B, Early draft, extends with d token:

<a> = a | b | c | d

Spec C, Early draft, extends with e token:

<a> = a | b | c | e

None of these definitions is now correct. Should C take B into account? Should B take C into account? Should A take B and C into account? Depending on their maturity levels, that may or may not make sense. But the maturity level can change at any point, which imposes an undue burden on editors of all three specs because the definition in spec C may need to change even without any changes being made to spec C, just because spec B became more mature.

While this is a simplified example, I cannot count the times where definitions got out of sync because one thing was updated and another thing somewhere else (even within the same spec) was not. Specs are essentially coding in natural language, and the same reasons one modularizes code apply here too. When humans keep making the same mistake, it's not a good policy to argue that they should "just be more careful" when the problem can be solved with tooling.
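
To make the A/B/C scenario concrete, here is a minimal Python sketch of tooling that generates the definition of `<a>` for a chosen maturity threshold. Everything in it is hypothetical: the `generate` function, the `contributions` tuple format, and the use of ED/WD/CR/REC labels as stand-ins for "early draft" and "very mature".

```python
# Hypothetical ordering of maturity levels, lowest to highest.
MATURITY = {"ED": 0, "WD": 1, "CR": 2, "REC": 3}


def generate(name, contributions, min_maturity):
    """Union the alternatives for `name` from specs at or above a level."""
    threshold = MATURITY[min_maturity]
    seen = []
    for _spec, maturity, target, alternatives in contributions:
        if target != name or MATURITY[maturity] < threshold:
            continue
        for alt in alternatives:
            if alt not in seen:  # keep first occurrence, preserve order
                seen.append(alt)
    return f"{name} = " + " | ".join(seen)


contributions = [
    ("Spec A", "REC", "<a>", ["a", "b", "c"]),
    ("Spec B", "ED", "<a>", ["d"]),
    ("Spec C", "ED", "<a>", ["e"]),
]
print(generate("<a>", contributions, "REC"))  # <a> = a | b | c
print(generate("<a>", contributions, "ED"))   # <a> = a | b | c | d | e
```

In this sketch, spec B becoming more mature only changes its own maturity label; neither spec A nor spec C needs editing.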

@cdoublev
Collaborator

cdoublev commented Mar 4, 2022

I did not check whether this issue was discussed at your last meeting, but I think modular/distributed grammar (value) definitions already exist in the form of the New values field in some property definition tables in some delta specs, e.g. the contain property:

  • the "main" value definition in CSS Contain 2 is none | strict | content | [ size || layout || style || paint ]
  • the New values field in CSS Contain 3 is layout || style || paint || [ size | inline-size ]

Obviously, I picked this example on purpose because it seems ambiguous (at least to me). New values should replace what is inside [ size || layout || style || paint ], shouldn't they? To be fair, it may be an isolated case of a wrong (but unspecified, to my knowledge) usage of this New values field: all other New values may just work fine when joined using | to the corresponding "main" grammar definition.

But what if this "main" grammar definition is a || b and a delta spec wants to define c as its New values? It would be a hint that a "modular" grammar definition may also require ||=, noting that there would be nothing wrong with a || b | c as the result from extending a as the "main" grammar definition, with b | c as its New values. But how do you extend a "main" grammar definition containing whitespace (or &&) separated tokens/productions?

The following may be another isolated issue. CSS Sizing 4 extends inline-size and width (among other properties) with stretch | fit-content | contain as their New values, and CSS Logical 1 defines inline-size with <'width'>. When processing the grammar definitions extracted by @webref/css from the specifications, I concatenate New values to the corresponding "main" value definition using | as the "glue" (which may be a wrong assumption, as noted above). Therefore it can result in parsing an input for inline-size twice against stretch | fit-content | contain. The value definition of inline-size (and some other logical properties) should not be extended with stretch | fit-content | contain, imo.

Finally, I think a delta spec may need to "rewrite" a grammar definition instead of extending it (while still preserving back-compatibility) for other different reasons, making the usage of a modular grammar definition inconsistent.
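
The double-extension problem with inline-size can be reproduced with a small Python sketch. The `glue` helper is hypothetical, the `width` body is a simplified placeholder rather than the real definition, and splitting on `|` only works here because the example contains no `||` or brackets.

```python
def glue(main, new_values):
    """Naively append a delta spec's New values as extra alternatives."""
    return f"{main} | {new_values}" if new_values else main


# Simplified placeholder definitions (assumption: not the real spec text).
definitions = {
    "width": "auto | <length-percentage>",
    "inline-size": "<'width'>",
}
new_values = "stretch | fit-content | contain"

# Both properties are extended with the same New values...
extended = {name: glue(body, new_values) for name, body in definitions.items()}

# ...so expanding <'width'> inside inline-size pulls the new values in twice.
expanded = extended["inline-size"].replace("<'width'>", extended["width"])
alternatives = [alt.strip() for alt in expanded.split("|")]
duplicates = sorted({alt for alt in alternatives if alternatives.count(alt) > 1})
print(duplicates)  # ['contain', 'fit-content', 'stretch']
```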

@css-meeting-bot
Member

The CSS Working Group just discussed modular/distributed grammar definitions.

The full IRC log of that discussion:
<fantasai> topic: modular/distributed grammar definitions
<fantasai> github: https://github.com//issues/6744
<fantasai> lea: This is about cases where certain tokens are defined as union of other tokens
<fantasai> lea: in some cases we want to expand these tokens in different specs
<fantasai> lea: and right now it gets quite difficult to maintain
<fantasai> lea: happened multiple times in color-4 and color-5
<fantasai> lea: don't imagine we'll keep adding color spaces, but when lab and lch were added, were added in some places and not others
<fantasai> lea: some happened also in some of the color colorspaces
<fantasai> lea: basically every time we added colorspace, it was added in some places and not others
<fantasai> lea: because right now we don't have a DRY way to do it
<chris> q+
<fantasai> lea: multiple different places need to be updated to add a colorspace
<fantasai> lea: I was wondering if we could define some new grammar operators
<fantasai> lea: that extends a token with whatever it already has from other specs plus these tokens
<fantasai> lea: they could be in different specs, different modules
<fantasai> lea: argument against this is that it adds clarity to have a centralized definition
<fantasai> lea: in usability we say if humans keep making the same error with your interface, then it's a problem with your interface
<astearns> ack chris
<TabAtkins> q+
<fantasai> chris: I agree with lea that this problem does occur
<fantasai> chris: e.g. we added <number> | <percentage> | none
<fantasai> chris: and we have out of sync with prose
<fantasai> chris: which assumed percent
<fantasai> chris: I don't want to see too much indirection in the actual spec, though
<lea> q+ to say indirection is better resolved by machines rather than humans
<fantasai> chris: but we already have this indirection because we have bikeshed source and generated HTML
<fantasai> chris: so if editors have partial addition like this, Bikeshed could expand that out and have the complete list in each spec
<fantasai> chris: that would let us see everything in one place
<fantasai> chris: but then I don't know if that's complicated to implement
<astearns> ack dbaron
<fantasai> astearns: Bikeshed automatically expanding makes it *more* likely to have mismatch between grammar and prose
<fantasai> dbaron: I sympathize with the problem here
<fantasai> dbaron: there are multiple possible results depending on tooling to fix this
<fantasai> dbaron: one possible result that I would be very unhappy with is that you end up in situation where readers of the spec can't figure out what a production expands to
<fantasai> dbaron: because other specs are extending it from other places, and you can't figure out what the list of things that extend it *is*
<fantasai> dbaron: bzbarsky used to call those "come from" statements instead of 'goto' statements, and they're worse
<astearns> ack TabAtkins
<fantasai> dbaron: If we solve through better tooling, great, but I don't want to end up in a situation where you can't figure out what a production means by looking at a spec
<fantasai> TabAtkins: Few points
<fantasai> TabAtkins: Earlier, chris had given an example of prose falling out of sync with grammar
<fantasai> TabAtkins: Alan pointed out if distributed, easier for that to happen
<fantasai> TabAtkins: In general, prose falling out of sync has nothing to do with grammar definitions. Has to be manually maintained anyway.
<fantasai> TabAtkins: if you have to keep prose in sync, might as well keep grammar in sync
<fantasai> TabAtkins: I agree with Lea's usability principle, but we mostly get it right
<fantasai> TabAtkins: .... automatically tooled
<fantasai> TabAtkins: Chris asked if this was done in Bikeshed, would this be simple or complicated to implement
<fantasai> TabAtkins: Answer is, substantially more complicated
<lea> +q to say that at least if grammar and prose are inconsistent, it's more obvious there is a problem. If prose and grammar are in sync between them but out of sync with other parts of a spec or other specs, it's easier for an implementer to miss this
<fantasai> TabAtkins: you might have noticed that you can see what a production expands to by hovering
<fantasai> TabAtkins: Process is not very smart, it goes through the database and links things together with vars
<fantasai> TabAtkins: it doesn't read text, it reads the definitions
<fantasai> TabAtkins: to make it work smarter, it would be a brand-new project to parse our grammar definitions
<fantasai> TabAtkins: I do want to do that at some point to link our definitions, to help catch mistakes
<chris> Yeah I saw that give bad results (complete list of color keywords, for example)
<fantasai> TabAtkins: but it's not done now, and would be a major project
<fantasai> TabAtkins: so definitely not in the near future
<fantasai> TabAtkins: If having tooling is necessary, we can't do it now
<fantasai> TabAtkins: finally, just generally, I agree with dbaron and bzbarsky's older points, that having this sort of come-from definition where you can look at the canonical definition of something and not know that something else is modifying it
<fantasai> TabAtkins: we do do this sometimes, in WebIDL and partial propdefs, but we don't do it very often
<fantasai> TabAtkins: So absent tooling that can indicate to readers that there is more to this definition than listed here
<fantasai> TabAtkins: I'm opposed to this
<astearns> ack lea
<Zakim> lea, you wanted to say indirection is better resolved by machines rather than humans and to say that at least if grammar and prose are inconsistent, it's more obvious there is a
<Zakim> ... problem. If prose and grammar are in sync between them but out of sync with other parts of a spec or other specs, it's easier for an implementer to miss this
<fantasai> lea: Originally was going to respond to Chris, that better solved by tooling than humans for centralization
<dbaron> I have proposed (in the TAG) that `partial` should be considered a bad practice for mature specs, although I couldn't quite get consensus on the point.
<fantasai> lea: agree that without tooling, could introduce more problems than it solves
<fantasai> lea: In this discussion seems to be a confusion of extensibility vs monkey-patching
<fantasai> lea: That wouldn't fix prose that's inconsistent to grammar, but implementer can more easily spot that error
<fantasai> lea: whereas if the grammar and prose are consistent with each other, but not with another spec or another part of the spec
<fantasai> lea: that's more likely to create incompatibility
<astearns> q?
<astearns> ack fantasai
<dholbert> fantasai: I wanted to comment on the automated tooling aspect
<dholbert> fantasai: we have lots of specs in different states of being finished
<dholbert> fantasai: if we had a single spec and this was it, and we split it across multiple modules and automated tooling copied things from here to there, that'd be fine
<dholbert> fantasai: if we have things in rec that automatically expanded into people's editor's drafts [...]
<lea> q+ different specs at different maturity levels is actually part of the motivation for this :/ When you maintain centralized definitions it's hard to judge *what* to include in said centralized definition
<dholbert> fantasai: I think having an automated approach would modify things that we don't expect to modify
<lea> q?
<lea> q+
<dholbert> fantasai: e.g. if someone fixes a typo, it might cause changes to other specs
<astearns> ack lea
<dholbert> astearns: the automation would have to deal with that concern
<fantasai> lea: The fact that there are different specs at different maturity levels extending the token is actually a motivation for it
<fantasai> lea: do I include this early-stage draft, or not?
<fantasai> lea: The grammars that tooling generate, could have different levels/states
<dholbert> fantasai: let me put it this way; I'm fine with us having centralized definitions and needing to update them manually
<dholbert> fantasai: if we want to have a new or-equals operator that extends an existing token, that's also fine since it's relatively straightforward
<dholbert> fantasai: if you combine specs, you get the union of those tokens
<lea> +1 to fantasai
<dholbert> fantasai: I'm not fine with having an automated system that *decides for you* whether this thing gets extended or not
<dholbert> fantasai: If it's automated, it happens magically and you might not notice
<dholbert> fantasai: don't want the tooling to change the prose of the spec such that something gets magically expanded. that'll happen in unexpected ways
<fantasai> astearns: Anything more to discuss on this?
<tantek> +1 fantasai, keep things more manual until there’s experience with it, rather than "automate all the things"
<fantasai> astearns: I see 2 ways of going forward with this
<fantasai> astearns: that don't necessarily conflict
<TabAtkins> (Note that the "random ideas are automatically merged into the definition" happens today with the production title="" attribute, but it's also a less-obvious and less-official source of data.)
<fantasai> astearns: 1. Come up with a distributed scheme that we can agree is the right way to develop specs with these issues
<fantasai> astearns: 2. Work on tooling that isn't going to create more problems than it solves
<lea> That sounds like a huge undertaking for what was essentially a proposal for |=
<fantasai> astearns: Tab, you mentioned tooling from you would take a long time.
<fantasai> astearns: would you be OK with someone taking a branch of bikeshed source and fiddling around it?
<fantasai> TabAtkins: yeah, that's fine
<fantasai> chris: It also doesn't have to be bikeshed at all. We have a database of properties and values, potentially a separate tool could run over that and identify problems
<lea> q+
<fantasai> astearns: That tool could be something spec authors could use to figure out what they need to put in their source
<fantasai> astearns: instead of an automated expansion
<astearns> ack lea
<fantasai> lea: This makes it sound like a huge undertaking of coming up with an extension something
<fantasai> lea: whereas the main thing that's needed here is |= or &= or something
<fantasai> lea: I've seen this thing being done manually, but can't remember which specs
<fantasai> lea: but I've basically seen prose that's done this, so would be good to have a formalized way to do it
<fantasai> TabAtkins: we do do it sometimes, but I think it's good that it's clumsy and awkward, because encourages people to update the centralized definition
<fantasai> lea: if awkward, fix it
<fantasai> TabAtkins: sometimes it makes sense to make the bad thing hard, so that people avoid doing it
<fantasai> lea: I think calling it a bad thing is a value judgement
<fantasai> lea: it's a spectrum
<fantasai> lea: sometimes centralized definitions are a worse thing
<fantasai> astearns: I think we've spent enough time for today
<fantasai> astearns: we can go back to the issue and come up with specific proposals on grammar productions and what they would solve and possible tools for spec authors to make things easier
<lea> (+1 that we've spent too long on this today)
<astearns> ack fantasai
<fantasai> astearns: let's come up with things
<dholbert> fantasai: because we need a canonical order for things, the *only* operator that we can do in this manner is the "or" operator
<lea> fantasai: && doesn't impose a specific order either, does it?
<fantasai> astearns: ok, next issue
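
chris's suggestion in the log above, a separate tool that runs over a database of definitions and identifies problems instead of expanding anything automatically, might look roughly like the Python sketch below. The `unreferenced` function and the dictionary format are hypothetical, and the `<color>` and `<absolute-color-base>` bodies are shortened for illustration, with `<oklch()>` deliberately left out to seed the error.

```python
import re


def unreferenced(definitions, roots=()):
    """Return productions that no other definition refers to.

    `roots` lists top-level productions that are expected to be unreferenced.
    """
    referenced = set()
    for name, body in definitions.items():
        for ref in re.findall(r"<[^<>]+>", body):
            if ref != name:  # ignore self-references
                referenced.add(ref)
    return sorted(set(definitions) - referenced - set(roots))


definitions = {
    # Shortened bodies (assumption); <oklch()> is deliberately missing.
    "<color>": "<absolute-color-base> | currentcolor | <system-color>",
    "<absolute-color-base>": "<hex-color> | <named-color> | transparent | <oklab()>",
    "<oklab()>": "oklab( [<percentage> | none] [<number> | none] [<number> | none] [ / [<alpha-value> | none] ]? )",
    "<oklch()>": "oklch( [<percentage> | none] [<number> | none] [<hue> | none] [ / [<alpha-value> | none] ]? )",
}
print(unreferenced(definitions, roots={"<color>"}))  # ['<oklch()>']
```

A report like this leaves the decision with the spec editor, which is closer to what astearns describes: a tool that tells authors what they need to put in their source.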
