[dfns] Add HTML prose definition when possible #1444

tidoust · 2023-12-12T16:08:32Z

Implements the logic discussed in https://github.com/w3c/respec/issues/4522

Ready for review, but discussion is still ongoing, so the code may still need to change. For instance, the `data-defines` attribute may end up taking another name. Also happy to change the name of the `prose` property in the `dfns` extract.

For each term defined in the specification being processed, the code now looks for an element flagged with a `data-defines="#term-id"` attribute. If such an element exists, a `prose` property gets added to the definition in the `dfns` extract with the HTML contents of that element.
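The lookup described above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper name `attachProse` and the assumption that each definition carries an `id` are hypothetical, and the property name `prose` follows the description above.

```javascript
// Hypothetical helper: names and data shapes are assumptions,
// not Reffy's actual implementation.
function attachProse(doc, dfn) {
  const target = `#${dfn.id}`;

  // Compare attribute values directly rather than building a selector
  // with the ID in it, so that IDs with special characters need no escaping.
  const container = Array.from(doc.querySelectorAll('[data-defines]'))
    .find(el => el.getAttribute('data-defines') === target);

  // Only add the property when a matching element exists.
  if (container) {
    dfn.prose = container.innerHTML.trim();
  }
  return dfn;
}
```

In a browser context, `doc` would be the `document` of the spec being processed. Definitions with no matching `data-defines` element are returned unchanged.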

The code applies some cleanup to the HTML markup it attaches to the `prose` property:

- All asides that authoring tools may add here and there get dropped
- Any element that is not a simple block or inline content element gets dropped
- All attributes are dropped
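To make the three cleanup rules concrete, here is a simplified model. It works on plain objects (`{ tag, attrs, children }`) rather than on real DOM nodes so that it can run outside a browser. The tag allowlist, the aside class names, and the choice to drop a disallowed element together with its contents are all illustrative assumptions, not what the PR necessarily does.

```javascript
// Illustrative allowlist of "simple" block and inline elements (assumption).
const ALLOWED_TAGS = new Set([
  'p', 'br', 'a', 'abbr', 'code', 'em', 'strong', 'pre', 'ul', 'ol', 'li'
]);

// Hypothetical class names marking asides injected by authoring tools.
const ASIDE_CLASSES = ['dfn-panel', 'annotation'];

function isAside(node) {
  const classes = ((node.attrs && node.attrs.class) || '').split(/\s+/);
  return ASIDE_CLASSES.some(name => classes.includes(name));
}

// A node is either a string (text) or an object { tag, attrs, children }.
// Returns a cleaned copy, or null when the node must be dropped.
function cleanNode(node) {
  if (typeof node === 'string') return node;

  // Rules 1 and 2: drop asides and anything outside the allowlist,
  // along with their contents (an assumption of this sketch).
  if (isAside(node) || !ALLOWED_TAGS.has(node.tag)) return null;

  return {
    tag: node.tag,
    attrs: {}, // Rule 3: all attributes are dropped
    children: (node.children || [])
      .map(cleanNode)
      .filter(child => child !== null)
  };
}
```

An allowlist keeps the output predictable: unknown or unexpected markup is excluded by default instead of having to be enumerated.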

The cleanup logic may need refinement over time once we gain experience with actual definitions. Open questions include:

- Should we be stricter, e.g., only allowing `<p>`, `<br>`, and very common inline elements?
- Should we keep `href` attributes (with an absolute URL) for `<a>` elements?
- Should we keep `title` attributes for `<abbr>` elements?
- Should we keep `class` attributes for `<pre>` elements to help with syntax highlighting?
- Should we keep tables? Images?

For the time being, Reffy has no good mechanism to report potential issues encountered during extraction. In the meantime, warnings get logged when the code bumps into elements that seem surprising in the context of a term definition.

src/browserlib/extract-dfns.mjs

schemas/browserlib/extract-dfns.json

New feature:
- [dfns] Add HTML prose definition when possible (#1444)

Dependencies bumped:
- Bump undici from 5.28.2 to 6.1.0 (#1449)
- Bump rollup from 4.6.1 to 4.9.1 (#1447)
- Bump puppeteer from 21.5.2 to 21.6.1 (#1446)

tidoust requested a review from dontcallmedom December 12, 2023 16:08

dontcallmedom reviewed Dec 13, 2023

tidoust mentioned this pull request Dec 13, 2023

Contextualizing definitions for export speced/respec#4522

Open

tidoust added 2 commits December 13, 2023 17:01

Rename prose to htmlProse

2c15ffa

Per #1444 (comment)

Keep dir, href, lang, title attributes

b8edf4f

Per comment for `dir`, `href`, and `lang`: #1444 (comment). The `title` attribute seems useful to keep as well for potential tooltips and expansion of abbreviations.
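A rough sketch of what keeping only these four attributes might look like. The function name, the plain-object representation of attributes, and the resolution of `href` against a base URL are assumptions made for illustration; the commit itself may handle these differently.

```javascript
// Attributes named in the commit above; everything else is dropped.
const KEPT_ATTRIBUTES = ['dir', 'href', 'lang', 'title'];

// Hypothetical helper: takes attributes as a plain { name: value } object
// and returns a new object with only the kept ones.
function filterAttributes(attrs, baseUrl) {
  const kept = {};
  for (const name of KEPT_ATTRIBUTES) {
    if (!Object.prototype.hasOwnProperty.call(attrs, name)) continue;
    // Assumption: links are made absolute so that they remain usable
    // outside of the spec they were extracted from.
    kept[name] = name === 'href'
      ? new URL(attrs[name], baseUrl).href
      : attrs[name];
  }
  return kept;
}
```

For instance, `filterAttributes({ href: '#foo', class: 'x' }, 'https://example.org/spec/')` keeps only an `href` that points into that (made-up) spec URL and discards `class`.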

dontcallmedom approved these changes Dec 18, 2023

tidoust merged commit 2ed2ac5 into main Dec 20, 2023
1 check passed

tidoust deleted the extract-dfn-prose branch December 20, 2023 08:52

dontcallmedom added a commit to dontcallmedom/respec that referenced this pull request Dec 21, 2023

Add data-defines on well-known patterns of term/definition association

82a3cf3

As discussed in speced#4522. See also w3c/reffy#1444.

dontcallmedom mentioned this pull request Dec 21, 2023

feat(core/dfn): add data-defines on some patterns of term/definition association speced/respec#4620

Merged
