Harvesting terms that aren't exported? #1072

Closed
mattgarrish opened this issue Nov 7, 2023 · 7 comments

Comments

@mattgarrish
Member

We had an issue filed against the Publication Manifest specification that we're exporting the term "URL", causing a conflict with the URL spec, but we aren't exporting any definitions from that specification that I can find (plus it's three years since we published it, so odd it would pop up now).

It wasn't our intention to export terms and we don't use the "export" class on any of the definitions. Anyone here able to decipher what is going on?

/cc @iherman

@tidoust
Member

tidoust commented Nov 8, 2023

Happy to decipher ;)

The Editor's Draft has no problem. Webref correctly extracts the definition of URL from the Editor's Draft as an internal one (that's what the "access": "private" property signals).

The problem is the published Recommendation. That version does not follow the usual definitions data model.

When it bumps into a spec that does not follow the definitions data model, the crawler extracts definitions as exported by default. This rule allows us to crawl old specs, published at a time when there was no real notion of internal/exported definitions. This is what happens here and the definition of URL for the /TR version has an "access": "public" property as a result.
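The fallback rule described above can be sketched roughly as follows. This is a minimal Python illustration, not Reffy's actual code; the class and function names are hypothetical. It assumes that the presence of any `data-dfn-type` attribute marks a spec as following the definitions data model, and that `data-export` flags an exported term.

```python
from html.parser import HTMLParser


class DfnCollector(HTMLParser):
    """Collects the attributes and text of every <dfn> element."""

    def __init__(self):
        super().__init__()
        self.dfns = []        # list of {"term": str, "attrs": dict}
        self._current = None  # the <dfn> currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        if tag == "dfn":
            self._current = {"term": "", "attrs": dict(attrs)}

    def handle_data(self, data):
        if self._current is not None:
            self._current["term"] += data

    def handle_endtag(self, tag):
        if tag == "dfn" and self._current is not None:
            self._current["term"] = self._current["term"].strip()
            self.dfns.append(self._current)
            self._current = None


def extract_definitions(html):
    """Returns [{"term": ..., "access": "public" | "private"}, ...]."""
    collector = DfnCollector()
    collector.feed(html)
    collector.close()

    # Assumption: one data-dfn-type attribute anywhere is enough to say
    # that the spec follows the definitions data model.
    follows_model = any("data-dfn-type" in d["attrs"] for d in collector.dfns)

    result = []
    for d in collector.dfns:
        if not follows_model:
            access = "public"   # old-spec fallback: export everything
        elif "data-export" in d["attrs"]:
            access = "public"
        else:
            access = "private"
        result.append({"term": d["term"], "access": access})
    return result
```

With this sketch, a bare `<dfn>URL</dfn>` in a document that has no `data-dfn-type` attribute at all comes out as `"access": "public"`, which matches the behaviour reported for the /TR version.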

I'm surprised that the Recommendation, published not so long ago (end of 2020), does not follow the definitions data model. I suppose that means that some ad-hoc generation tool got used to produce it?

I'll add the spec to a list of exceptions in the crawler to have all definitions extracted as private from now on. I will also add Audiobooks to that list. Let me know if there are other specs that follow the same pattern.
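An exception list of this kind could look something like the sketch below. The set name, the function, and the shortnames used as keys are assumptions made for illustration; the real list lives in the crawler and may be shaped quite differently.

```python
# Hypothetical exception list: specs whose definitions must always be
# extracted as private, whatever the markup looks like.
FORCE_PRIVATE_SPECS = {"pub-manifest", "audiobooks"}


def resolve_access(shortname, follows_model, has_export_flag):
    """Decides the "access" value for one definition.

    shortname       -- spec identifier (assumed naming)
    follows_model   -- True if the spec follows the definitions data model
    has_export_flag -- True if the <dfn> is explicitly flagged as exported
    """
    if shortname in FORCE_PRIVATE_SPECS:
        return "private"          # the exception wins over every other rule
    if not follows_model:
        return "public"           # old-spec fallback
    return "public" if has_export_flag else "private"
```

The exception is checked first so that it overrides the old-spec fallback, which is the rule that produced the unwanted public definitions.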

tidoust added a commit to w3c/reffy that referenced this issue Nov 8, 2023
The published versions of these specs do not follow the definitions data model.
Reffy thinks it's dealing with old specs as a result, and flags the definitions
as exported.

See w3c/webref#1072 for details.
@iherman
Member

iherman commented Nov 8, 2023

@tidoust, first, thanks for taking care of this.

I want to respond to one of your remarks, though, to avoid future issues (with other documents):

> The problem is the published Recommendation. That version does not follow the usual definitions data model.

Our specification does use, systematically, the <dfn> element to define terms. However, the "definitions data model" was, so far, unknown to us (at least to me); the reference you gave is indeed a reference to the bikeshed tooling, and we used respec.

Looking at some of the details, I presume you refer to the usage (or not) of the data-dfn-type attribute which, in the documentation you provided, has indeed a rich set of values. Well, though the attribute is mentioned in the respec documentation, it is only defined for two values, namely idl or dfn, and it is not mentioned as a "required" attribute. We indeed did not use this attribute in the spec; it was not clear what role it plays.

The only reason I mention this is to understand what the real problem was, to try to avoid similar clashes with other Recommendations in the making (that use respec). It may be necessary to sync up the documentation of bikeshed and respec on that topic.

@tidoust
Member

tidoust commented Nov 8, 2023

> Our specification does use, systematically, the <dfn> element to define terms. However, the "definitions data model" was, so far, unknown to us (at least to me); the reference you gave is indeed a reference to the bikeshed tooling, and we used respec.

The reason I'm surprised is precisely because you use ReSpec ;) The documentation is not explicit about that, but ReSpec and Bikeshed (Wattsi as well for the HTML spec) are aligned on the definitions data model.

> Looking at some of the details, I presume you refer to the usage (or not) of the data-dfn-type attribute which, in the documentation you provided, has indeed a rich set of values

Yes, and also the data-export attribute that signals when definitions are to be viewed as exported (and a couple of other attributes that you probably do not need in Publication Manifest and Audiobooks).

> Well, though the attribute is mentioned in the respec documentation, it is only defined for two values, namely idl or dfn, and it is not mentioned as a "required" attribute. We indeed did not use this attribute in the spec; it was not clear what role it plays.

ReSpec supports more than two values, I believe, but life would be too easy if documentation was up-to-date...

You indeed do not need to add data-dfn-type attributes yourself in the ReSpec source in most cases. ReSpec takes care of that for you when it generates the resulting HTML. That's also the reason why the definitions in the Editor's Draft are fine: ReSpec does the right magic there.

> The only reason I mention this is to understand what the real problem was, to try to avoid similar clashes with other Recommendations in the making (that use respec).

The alignment to the definitions data model allows us to use a single set of extraction rules to generate data in Webref, which is then fed back into Bikeshed and Respec to populate their cross-references databases.
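To illustrate why a wrongly exported term matters once the data is fed back into cross-reference databases, here is a rough Python sketch of an index built from per-spec extracts. The data shapes, function names, and spec shortnames are illustrative assumptions, not Webref's actual file format or the tools' real lookup logic.

```python
from collections import defaultdict


def build_xref_index(extracts):
    """Maps each exported term (lowercased) to the specs that define it."""
    index = defaultdict(list)
    for spec, dfns in extracts.items():
        for dfn in dfns:
            if dfn["access"] == "public":
                index[dfn["term"].lower()].append(spec)
    return dict(index)


def find_conflicts(index):
    """Returns the terms that more than one spec exports."""
    return {term: specs for term, specs in index.items() if len(specs) > 1}


# Hypothetical extracts: both specs appear to export "URL".
extracts = {
    "url": [{"term": "URL", "access": "public"}],
    "pub-manifest": [{"term": "URL", "access": "public"}],
}
conflicts = find_conflicts(build_xref_index(extracts))

# Once the second definition is extracted as private, the clash disappears.
extracts["pub-manifest"][0]["access"] = "private"
fixed = find_conflicts(build_xref_index(extracts))
```

In this toy setup `conflicts` reports two specs for the term `url`, while `fixed` is empty.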

A single set... or so we wish. As mentioned, we still need to deal with older specs that don't know anything about the definitions data model. And there remain a few specs that are generated by other tools, either because they are developed elsewhere (e.g., IETF RFCs, TC39 specs) or because the groups have specific requirements and prefer to maintain the source in another format, using dedicated tools to generate the HTML versions (e.g., WebAssembly specs).

There is no requirement to follow the definitions data model in any case. Again, the reason I'm surprised here is that this should come for free if you use ReSpec. How did you generate the TR version from the Editor's Draft?

> It may be necessary to sync up the documentation of bikeshed and respec on that topic.

Oh yes. For the definitions data model, we've tried to do that in https://github.com/speced/spec-dfn-contract. That work needs some love before it can replace the current description in Bikeshed though.

@mattgarrish
Member Author

> Again, the reason I'm surprised here is that this should come for free if you use ReSpec. How did you generate the TR version from the Editor's Draft?

This is what confuses me, unless there was some bug in the respec export back when we did pub-manifest and audiobooks.

So far as I remember, we've always used the built-in html export to get the static documents. For the EPUB spec, everything seems to be fine. The one term we didn't explicitly export ("value") isn't in the database while all the others are.

@tidoust
Member

tidoust commented Nov 8, 2023

OK, thanks for the confirmation. Then I think it's just me being lost in time and thinking that complete support for the definitions data model shipped earlier than it actually did in ReSpec!

And indeed, looking at history, I see that ReSpec only started adding data-dfn-type="dfn" attributes everywhere in July 2021: https://github.com/w3c/respec/pull/3667

In your case, since the spec only contains dfn types of definitions, that would indeed mean no data-dfn-type attribute in the 2020 Recommendation. I think I got confused because, when we started Webref, we focused on CSS/IDL specs, and these specs have had data-dfn-type attributes way before July 2021, precisely because they include definitions whose type is not dfn.

Presence of a data-dfn-type attribute is what we use as a marker to assert that a spec follows the definitions data model or not. I'll see how I can amend the rule in the crawler to consider that specs generated with ReSpec before July 2021 should still be seen as following the right model.
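One way such an amended rule might look is sketched below. This is a guess at the shape of the check, not the crawler's implementation. It assumes that ReSpec-generated documents can be recognized from a `<meta name="generator">` value starting with "ReSpec"; both that detection method and the function name are assumptions made for illustration.

```python
def follows_dfn_model(has_dfn_type_attr, generator):
    """Tells whether a spec should be treated as following the data model.

    has_dfn_type_attr -- True if any <dfn> carries a data-dfn-type attribute
    generator         -- content of <meta name="generator">, or None
                         (assumed way to detect ReSpec output)
    """
    if has_dfn_type_attr:
        return True  # the existing marker
    # Amendment: ReSpec output without data-dfn-type is taken to be a
    # pre-July 2021 document that only contains "dfn" type definitions.
    return generator is not None and generator.strip().lower().startswith("respec")
```

A spec produced by another tool and lacking `data-dfn-type` would still fall back to the old-spec behaviour, so the original purpose of that rule is preserved.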

In the meantime, note I added the specs as exceptions to the rules in the crawler, and the data in Webref is now correct for the TR version as well. That should resolve the initial conflict.

@mattgarrish
Member Author

> That should resolve the initial conflict.

Fantastic, thanks for all the help figuring this out!

@iherman
Member

iherman commented Nov 8, 2023

Thanks @tidoust.

I guess, at this point, we can close this issue... @mattgarrish I'll leave you the honour of doing so 😀
