Group CSS features #1519

Open
nzakas opened this issue Apr 8, 2025 · 8 comments

Comments

@nzakas

nzakas commented Apr 8, 2025

One of the valuable parts of the mdn-data package is how it separates CSS features into different categories:

  • at-rules
  • functions
  • properties
  • selectors
  • syntaxes
  • types
  • units

In the current webref package, it's just a collection of objects that we then need to dig into to figure out what types are contained within. It would be helpful if the categories could be exposed at the top level of the package and list every entry for that category regardless of spec.
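
To make the request concrete, here is a minimal sketch of the kind of top-level grouping described above. It assumes a hypothetical per-spec layout: the spec names, the category keys, the entries, and the `group_by_category` helper are all illustrative and are not the actual webref format.

```python
from collections import defaultdict

# Hypothetical per-spec extracts (illustrative only; the real webref
# layout and key names may differ).
extracts = {
    "css-grid": {"properties": [{"name": "grid-area"}], "atrules": []},
    "css-color": {"properties": [{"name": "color"}], "values": [{"name": "<color>"}]},
    "css-conditional": {"atrules": [{"name": "@media"}], "properties": []},
}


def group_by_category(extracts):
    """Flatten per-spec extracts into one sorted list of names per category."""
    grouped = defaultdict(set)
    for categories in extracts.values():
        for category, entries in categories.items():
            for entry in entries:
                grouped[category].add(entry["name"])
    return {category: sorted(names) for category, names in grouped.items()}


grouped = group_by_category(extracts)
print(grouped["properties"])
```

With a structure like this, a consumer could read `grouped["properties"]` directly instead of walking every spec.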

@tidoust
Member

tidoust commented Apr 9, 2025

I note the current @webref/css package already separates at the root level between:

  • at-rules
  • properties
  • selectors
  • and "values", which is a mixed bag of things.

The mixed bag of things exists because CSS specs do not really distinguish between other types when they define concepts. There is a notion of function but the specs do not necessarily use that consistently. That ambiguity seems to appear in mdn-data too. For example, the abs() function appears both as a "function" and as a "syntax" in mdn-data.

CSS specs do use a type definition type too, which could perhaps be used to populate a related category. There seem to be many more type definitions in specs than in what mdn-data currently lists as types. For example, line-color-list, linear-color-stop, ident-token are all type definitions from a spec perspective. If they are not in the list on purpose, is there a way to distinguish between types?

CSS specs define units as value definitions that are for something. It may be relatively easy to assemble the list of units automatically with a short list of underlying types. For example, looking at all values defined for <angle>, <length> and a few others.
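
The unit idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the `value_definitions` records, their `for` key, the `UNIT_TYPES` list, and the `collect_units` helper are hypothetical and do not reflect the actual crawler output.

```python
# Hypothetical value definitions, each recording the type it is defined "for".
# The record shape is an assumption made for this sketch.
value_definitions = [
    {"name": "deg", "for": "<angle>"},
    {"name": "rad", "for": "<angle>"},
    {"name": "px", "for": "<length>"},
    {"name": "em", "for": "<length>"},
    {"name": "auto", "for": "width"},  # a keyword for a property, not a unit
]

# Short, manually maintained list of underlying types whose values are units.
UNIT_TYPES = {"<angle>", "<length>"}


def collect_units(definitions, unit_types=UNIT_TYPES):
    """Group value names by underlying type, keeping only the unit types."""
    units = {}
    for definition in definitions:
        if definition["for"] in unit_types:
            units.setdefault(definition["for"], []).append(definition["name"])
    return units


units = collect_units(value_definitions)
print(units)
```

The only manual data in this sketch is `UNIT_TYPES`; everything else would come from the specs.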

Essentially, the question is: can CSS features be categorized automatically? If not, what amount of manual data would need to be maintained?

@nzakas
Author

nzakas commented Apr 9, 2025

Thanks for the response. A follow-up question: assuming everyone wants webref packages to be as useful as possible, is there a reason the specs themselves can't be updated to encode this information where appropriate?

@tidoust
Member

tidoust commented Apr 10, 2025

No reason in theory and, on top of trying to reduce the amount of work needed to maintain Webref, we also restrict the amount of data that needs to be manually injected in Webref to a bare minimum as a way to push fixes and improvements back to the underlying specs.

In practice there are ~120 CSS specs at various levels of maturity and activity, with dozens of editors and >3800 open issues. We already maintain a few patches in Webref for things that need fixing in CSS specs to get consistent data (these patches link back to issues raised against the specs). If most CSS specs need to be updated to provide additional semantics, that's likely going to require elbow grease both to convince CSS WG participants that the effort is worth prioritizing and to help with the actual updates. That's also why I'm trying to assess whether missing categories can already be determined automatically from available information.

@nzakas
Author

nzakas commented Apr 10, 2025

Ah gotcha, thanks for explaining. 👍

@tidoust
Member

tidoust commented Apr 28, 2025

I explored the differences between MDN data and Webref a bit. See the underlying code in tidoust/mdn-webref, along with the results:

  1. The webref.json file, which could represent what we may want to end up with in Webref to ease consumption of data.
  2. The report, which highlights differences between the two projects.

As far as I can tell, missing data in Webref is mostly stuff that is non standard or that has been obsoleted, but that is still present in MDN data (and sometimes documented on MDN). I do not know to what extent that data is a must have in Webref. There's more data missing in MDN data, perhaps because the underlying features are more recent and not yet documented.

There may be a few cases where data needs to be slightly improved in specs so that it can start appearing in Webref. One example is <general-enclosed> which is currently defined in a <pre> tag without any class, skipped by the crawler as too generic. That seems easily fixable.

I still do not understand what syntaxes are meant to encompass. I managed to cover most of them by assembling functions and types, but that also creates hundreds of syntaxes that are not accounted for in MDN data. Are syntaxes used in practice? How?

(On top of the features themselves, I note that the grouping information in MDN data does not exist in Webref. That grouping seems more specific to MDN though. Same thing for links to MDN pages).

@nzakas
Author

nzakas commented Apr 28, 2025

Syntaxes are used in CSSTree to enable validation:
https://github.com/csstree/csstree/blob/9558ba790daeda2b24935838bf89990699ece66e/lib/data.js#L7

Basically, the parser creates an AST and the lexer validates the AST against these syntax definitions.

@tidoust
Member

tidoust commented May 1, 2025

Thanks @nzakas. I had not realized that entries in the "types" category in MDN data do not have a syntax key and that the "syntaxes" category collects that information. I'm not sure why functions are listed under the "syntaxes" category too, as that seems to duplicate the information already present in the functions.json file. All in all, I think the "syntaxes" category can be assembled by merging the "functions" and "types" categories, provided entries there do have a syntax key of course.
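
As a rough sketch of that merge, assuming each category is a mapping from name to an entry that may or may not carry a `syntax` key (the sample entries and the `build_syntaxes` helper are illustrative, not taken from either dataset):

```python
# Illustrative entries only; real data would come from the "functions"
# and "types" categories.
functions = {
    "abs()": {"syntax": "abs( <calc-sum> )"},
}
types = {
    "<angle-percentage>": {"syntax": "<angle> | <percentage>"},
    "<angle>": {},  # defined in prose, so there is no syntax key to merge
}


def build_syntaxes(functions, types):
    """Merge functions and types, keeping only entries that have a syntax."""
    syntaxes = {}
    for category in (functions, types):
        for name, entry in category.items():
            if "syntax" in entry:
                syntaxes[name] = entry["syntax"]
    return syntaxes


syntaxes = build_syntaxes(functions, types)
print(sorted(syntaxes))
```

Entries without a `syntax` key are skipped instead of being given an empty placeholder, which matches the "provided entries there do have a syntax key" condition.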

That initial exploration suggests that the categorization itself can be done automatically, with straightforward reasons that explain why some data is missing in Webref. That's a good first result!

I'll now look into actual syntax values to understand where and why Webref differs from MDN data. I somewhat expect to find more substantive differences: if I understand things correctly, MDN data syntaxes are manually curated to match reality in the main browsers, while Webref data is meant to be a view of what the latest spec drafts currently define, regardless of what browsers support. When specs lag behind implementations, they need fixing, and knowing about the problem creates a good feedback loop. When specs are more recent than implementations, it may be challenging to select the right syntax automatically. Anyway, let's find out ;)

@nzakas
Author

nzakas commented May 1, 2025

Thanks for the update and all of your work on this. 🙏
