
Support character references for &, <, >, ' and " #5

Merged
mpdude merged 1 commit into master from character-references
Jun 15, 2022

mpdude commented Jun 15, 2022

Polyglot HTML5 markup (i.e. HTML5 written so that it is also valid XML) uses only a handful of named entity references. The W3C note puts it this way:

Polyglot markup uses only the following named entity references:
amp lt gt apos quot

https://www.w3.org/TR/html-polyglot/#named-entity-references

To support content that was created before HTML5 (that is, XHTML 1), we replace all named entity references and numeric character references with the literal characters they stand for, which should not pose a problem in UTF-8 content. Only &amp;, &lt;, &gt;, &quot; and &apos; are kept.

We missed, however, that these five characters can also be written as numeric character references: &amp;, for example, can appear as &#38;, and the same holds for the other four. This PR adds support for these cases as well.
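
The behaviour described above can be sketched roughly as follows. This is an illustrative Python example, not the code changed in this PR: the names `KEPT`, `REFERENCE` and `normalize_references` are hypothetical, and the handling of hexadecimal references such as &#x26; is an assumption that the description does not mention.

```python
import re
from html import unescape
from html.entities import html5

# The five characters whose references polyglot markup keeps, mapped to
# the named reference emitted for each of them (hypothetical helper table).
KEPT = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&apos;"}

# Matches named (&name;), decimal (&#38;) and hexadecimal (&#x26;) references.
# Supporting the hexadecimal form is an assumption made for this sketch.
REFERENCE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def normalize_references(text: str) -> str:
    """Replace references with literal characters, keeping the five in KEPT."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        body = match.group(1)
        if body.startswith("#"):
            # Numeric reference: let the standard library decode it.
            value = unescape(token)
        else:
            # Named reference: look it up in the HTML5 entity table.
            value = html5.get(body + ";")
            if value is None:
                return token  # unknown name, leave it untouched
        # &, <, >, " and ' go back out as named references, whichever
        # form they arrived in; everything else becomes a plain character.
        return KEPT.get(value, value)

    return REFERENCE.sub(replace, text)
```

With this sketch, `normalize_references("caf&eacute; &#38; bar")` yields `café &amp; bar`: the named reference for é is replaced by the literal character, while the numeric form of the ampersand is normalized to &amp; instead of being turned into a bare `&`.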

mpdude merged commit 296711b into master Jun 15, 2022
mpdude deleted the character-references branch June 15, 2022 13:05