fix: extend list of HTML entities #489

Merged 2 commits on May 24, 2023

Contributor

@zorkow zorkow commented May 17, 2023

Adds the full list of 2125 named character references that can be used in HTML.
This is a simplified version of https://html.spec.whatwg.org/entities.json.

lib/entities.js
* They contain all entries from `XML_ENTITIES`.
*
* @see XML_ENTITIES
* @see DOMParser.parseFromString
* @see DOMImplementation.prototype.createHTMLDocument
* @see https://html.spec.whatwg.org/#named-character-references WHATWG HTML(5) Spec
* @see https://html.spec.whatwg.org/entities.json JSON
Member

Do you know how often this file changes?
Maybe it is in some version control where this information could be looked up?
(Not required for landing this PR, but I'm wondering whether it would make sense to run a scheduled action that checks whether our version is still in sync and maybe even creates a branch/PR with an update if there is one.)
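A sync check like the one suggested here could be sketched as a plain comparison of two name-to-characters maps. This is only an illustration: `diffEntityMaps` and its result shape are hypothetical names, not part of xmldom, and fetching the upstream JSON and opening a branch or PR are left out.

```javascript
// Hypothetical helper: compare the local entity map with one derived from upstream.
// Both arguments are plain objects mapping entity names to their characters.
function diffEntityMaps(local, upstream) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const missing = []; // present upstream, absent locally
  const changed = []; // present in both, but with different characters
  const extra = []; // present locally, absent upstream

  for (const name of Object.keys(upstream)) {
    if (!has(local, name)) missing.push(name);
    else if (local[name] !== upstream[name]) changed.push(name);
  }
  for (const name of Object.keys(local)) {
    if (!has(upstream, name)) extra.push(name);
  }

  const inSync = missing.length + changed.length + extra.length === 0;
  return { missing, changed, extra, inSync };
}

// Example with made-up data: the local copy lacks "gt" and carries an extra "foo".
const report = diffEntityMaps(
  { amp: '&', lt: '<', foo: '?' },
  { amp: '&', lt: '<', gt: '>' }
);
console.log(report.inSync, report.missing, report.extra);
```

A scheduled job could run such a check and only create an update branch when `inSync` is false.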

Member

karfau commented May 24, 2023

> This is a simplified version of

Can you provide a short overview of how it is simplified?
(Anything besides removing redundant keys and only having the characters as a value?)

I also spotted that some keys in the original are mapped with and without trailing ; and some are not. Do you know why?

Member

@karfau karfau left a comment

Thank you for investing the time to prepare this

@karfau karfau changed the title from "Updates the HTML entities to the full list of 2125 elements." to "fix: extend list of HTML entities" on May 24, 2023
@karfau karfau merged commit ddfa511 into xmldom:master May 24, 2023
21 checks passed
Contributor Author

zorkow commented May 25, 2023

Thanks for merging the PR. Do you already have an ETA for the 0.9.0 release?

Contributor Author

zorkow commented May 25, 2023

> Can you provide a short overview of how it is simplified? (Anything besides removing redundant keys and only having the characters as a value?)
>
> I also spotted that some keys in the original are mapped with and without trailing ; and some are not. Do you know why?

The simplification was really only removing duplicates. In particular, since the sax parser assumes that all entities are formatted as &xyz; (with the trailing semicolon), there was no point in keeping the variants without it around. As the note at https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references says, those variants are only there for legacy compatibility (IE?).
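As a rough illustration of the simplification described above, the following sketch reduces data in the shape of entities.json (keys such as `&amp;`, values with `codepoints` and `characters`) to a plain name-to-characters map. `simplifyEntities` is a hypothetical helper, not the script used for this PR, and it assumes the target map uses bare names without `&` and `;` as keys.

```javascript
// Hypothetical helper: reduce entities.json-style data to a name -> characters map.
function simplifyEntities(raw) {
  const simplified = {};
  for (const [key, value] of Object.entries(raw)) {
    // Skip the legacy duplicates that lack the trailing semicolon (for example "&amp").
    if (!key.startsWith('&') || !key.endsWith(';')) continue;
    // Strip the leading "&" and trailing ";": "&amp;" becomes "amp".
    simplified[key.slice(1, -1)] = value.characters;
  }
  return simplified;
}

// Tiny excerpt in the shape used by entities.json.
const sample = {
  '&amp;': { codepoints: [38], characters: '&' },
  '&amp': { codepoints: [38], characters: '&' },
  '&lt;': { codepoints: [60], characters: '<' },
};

console.log(simplifyEntities(sample)); // { amp: '&', lt: '<' }
```

Dropping the `codepoints` field loses no information here, because `characters` already holds the resolved string.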

Member

karfau commented May 25, 2023

> Thanks for merging the PR. Do you already have an ETA for the 0.9.0 release?

Sorry, no ETA for that yet.
But I promise a new pre-release within a month.

Member

karfau commented May 28, 2023

@zorkow what is your opinion on merging this change into the 0.8 and maybe even the 0.7 branch as a fix?

@zorkow zorkow deleted the all_xml_entities branch May 30, 2023 11:44
Contributor Author

zorkow commented May 30, 2023

@karfau I am all for that. That would allow us to release the next versions of the node library of SRE (v4.1) and MathJax (v4) based on @xmldom/xmldom, rather than further maintaining xmldom-sre.

karfau pushed a commit that referenced this pull request May 30, 2023
Adds the full list of 2125 named character references that can be used
in HTML.
This is a simplified version of
https://html.spec.whatwg.org/entities.json.

(cherry picked from commit ddfa511)
karfau pushed a commit that referenced this pull request May 30, 2023
Adds the full list of 2125 named character references that can be used
in HTML.
This is a simplified version of
https://html.spec.whatwg.org/entities.json.

(cherry picked from commit ddfa511)
Member

karfau commented May 30, 2023

This change is now part of the latest versions 0.8.8 and 0.7.11.

Contributor Author

zorkow commented May 31, 2023 via email

Member

karfau commented Jun 1, 2023

> Quick question: does that also include the compareDocumentPosition changes?

I'm currently not considering doing that for the feature you contributed in #488; it would only be part of 0.9.0. I hope that still works for the things you mentioned above.
