
[dfns] Add HTML prose definition when possible #1444

Merged: 3 commits, merged Dec 20, 2023

@tidoust (Member) commented Dec 12, 2023

Implements the logic discussed in https://github.com/w3c/respec/issues/4522

Ready for review, but discussion is still ongoing, so the code may still need to change. For instance, the `data-defines` attribute may end up with a different name. I'm also happy to rename the `prose` property in the `dfns` extract.

For each term defined in the specification being processed, the code now looks for an element flagged with a `data-defines="#term-id"` attribute. If such an element exists, a `prose` property containing the HTML contents of that element gets added to the definition in the `dfns` extract.
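To make the lookup concrete, here is a minimal sketch in Python using only the standard library's `html.parser`. It is not Reffy's code: the actual logic lives in `src/browserlib/extract-dfns.mjs` and works on the DOM. The names `find_prose` and `add_prose`, and the shape of a dfn record as a dict with an `"id"` key, are hypothetical. The sketch assumes well-formed markup with explicit end tags.

```python
import html
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not change the depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ProseFinder(HTMLParser):
    """Collects the inner HTML of the first element whose data-defines
    attribute equals `target` (e.g. "#term-id")."""

    def __init__(self, target):
        super().__init__()  # convert_charrefs=True: text arrives unescaped
        self.target = target
        self.depth = 0       # > 0 while inside the matched element
        self.parts = []      # markup collected inside the matched element
        self.found = False
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())
            if tag not in VOID_ELEMENTS:
                self.depth += 1
        elif dict(attrs).get("data-defines") == self.target:
            self.found = True
            self.depth = 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: record it, no depth change.
        if self.depth > 0 and not self.done:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or self.done:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True  # end tag of the matched element itself
        else:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.depth > 0 and not self.done:
            # Re-escape text so the collected markup stays valid HTML.
            self.parts.append(html.escape(data, quote=False))


def find_prose(page_html, dfn_id):
    """Returns the inner HTML of the element flagged with
    data-defines="#<dfn_id>", or None when there is no such element."""
    finder = ProseFinder("#" + dfn_id)
    finder.feed(page_html)
    finder.close()
    return "".join(finder.parts).strip() if finder.found else None


def add_prose(dfn, page_html):
    """Adds a `prose` property to a dfn record when a flagged element exists."""
    prose = find_prose(page_html, dfn["id"])
    if prose is not None:
        dfn["prose"] = prose
    return dfn
```

For example, `add_prose({"id": "dfn-foo"}, page)` leaves the record untouched when no element carries `data-defines="#dfn-foo"`, which mirrors the "when possible" in the PR title.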

The code cleans up the HTML markup before attaching it to the `prose` property:

- All asides that authoring tools may add here and there get dropped
- Any element that is not a simple block or inline content element gets dropped
- All attributes are dropped
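A minimal sketch of these three rules, again in Python with `html.parser` and under hypothetical names. Several choices here are assumptions rather than what the PR does: the allow-list of "simple block or inline content" elements, treating only `<aside>` elements as asides, and unwrapping unexpected elements (dropping the tag but keeping its text) instead of removing them with their content. The sketch also logs a warning for each unexpected element, in the spirit of the warnings this PR logs for surprising elements.

```python
import html
import logging
from html.parser import HTMLParser

logger = logging.getLogger("prose-cleanup")

# Hypothetical allow-list of "simple block or inline content" elements.
ALLOWED = {"p", "br", "ul", "ol", "li", "dl", "dt", "dd", "blockquote", "pre",
           "a", "abbr", "b", "cite", "code", "dfn", "em", "i", "kbd", "q",
           "samp", "small", "span", "strong", "sub", "sup", "var"}
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class ProseCleaner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.out = []
        self.aside_depth = 0  # > 0 while inside an <aside> subtree

    def handle_starttag(self, tag, attrs):
        if self.aside_depth:
            # Track nesting so the matching </aside> ends the skipped subtree.
            if tag not in VOID_ELEMENTS:
                self.aside_depth += 1
        elif tag == "aside":
            self.aside_depth = 1  # rule 1: drop asides with their content
        elif tag in ALLOWED:
            self.out.append(f"<{tag}>")  # rule 3: all attributes dropped
        else:
            # Rule 2 (assumed unwrap): drop the tag, keep its text.
            logger.warning("Unexpected <%s> in prose definition", tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: only allowed void elements survive.
        if not self.aside_depth and tag in ALLOWED and tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if self.aside_depth:
            self.aside_depth -= 1
        elif tag in ALLOWED and tag not in VOID_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.aside_depth:
            self.out.append(html.escape(data, quote=False))


def clean_prose(fragment):
    """Applies the three cleanup rules to an HTML fragment."""
    cleaner = ProseCleaner()
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.out).strip()
```

Unwrapping keeps the definition readable when a spec wraps prose in a `<div>`; removing such elements outright would be stricter but could silently lose words from the definition.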

The cleanup logic may need refinement over time as we gain experience with actual definitions. Open questions include:

- Should we be stricter, e.g., only allowing `<p>`, `<br>`, and very common inline elements?
- Should we keep `href` attributes (with an absolute URL) for `<a>` elements?
- Should we keep `title` attributes for `<abbr>` elements?
- Should we keep `class` attributes for `<pre>` elements to help with syntax highlighting?
- Should we keep tables? Images?

Reffy does not yet have a good mechanism to report potential issues encountered during extraction. In the meantime, the code logs warnings when it runs into elements that seem surprising in the context of a term definition.

Review threads (all resolved) on `src/browserlib/extract-dfns.mjs` and `schemas/browserlib/extract-dfns.json`.

A later commit keeps the `dir`, `href`, and `lang` attributes, per #1444 (comment).

The `title` attribute also seems useful to keep, for potential tooltips and expansion of abbreviations.
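The resulting attribute rule can be sketched as a filter that keeps only `dir`, `href`, `lang`, and `title`. Resolving `href` against the spec's base URL with `urljoin` follows the "absolute URL" idea from the open questions and is an assumption, as are the function names and the `example.org` placeholder URL.

```python
import html
from urllib.parse import urljoin

# Attributes kept after this change; everything else is dropped.
KEPT_ATTRIBUTES = ("dir", "href", "lang", "title")


def filter_attributes(attrs, base_url):
    """Returns the (name, value) pairs to keep, in source order.

    `attrs` follows the html.parser convention: a list of (name, value)
    pairs, where value is None for attributes written without a value.
    """
    kept = []
    for name, value in attrs:
        if name not in KEPT_ATTRIBUTES or value is None:
            continue
        if name == "href":
            # Assumption: make links absolute so they work outside the spec.
            value = urljoin(base_url, value)
        kept.append((name, value))
    return kept


def render_start_tag(tag, attrs, base_url):
    """Serializes a start tag carrying only the kept attributes."""
    parts = [tag]
    for name, value in filter_attributes(attrs, base_url):
        parts.append(f'{name}="{html.escape(value, quote=True)}"')
    return "<" + " ".join(parts) + ">"
```

In a cleanup pass, a filter like this would replace the blanket "drop all attributes" rule, so that a fragment link such as `#dfn-foo` becomes `https://example.org/spec/#dfn-foo`.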

@tidoust merged commit 2ed2ac5 into `main` on Dec 20, 2023 (1 check passed).
@tidoust deleted the `extract-dfn-prose` branch on December 20, 2023 at 08:52.
@tidoust added a commit that referenced this pull request on Dec 20, 2023:
New feature:
- [dfns] Add HTML prose definition when possible (#1444)

Dependencies bumped:
- Bump undici from 5.28.2 to 6.1.0 (#1449)
- Bump rollup from 4.6.1 to 4.9.1 (#1447)
- Bump puppeteer from 21.5.2 to 21.6.1 (#1446)

@dontcallmedom added a commit to dontcallmedom/respec that referenced this pull request on Dec 21, 2023.