HTML needs a mechanism for extending the parsing algorithm #8114

Open
plinss opened this issue Jul 18, 2022 · 4 comments
Labels
addition/proposal New features or enhancements topic: parser

Comments


plinss commented Jul 18, 2022

The HTML parsing algorithm is supposed to generate a consistent DOM from any given HTML input. However, as new elements are added, the parsing algorithm is changed from time to time, e.g. by adding new elements as flow content.

While this is ultimately convenient for authors, it results in a different DOM structure in older clients.
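
To make the divergence concrete, here is a minimal sketch in Python. It is a toy model, not the tree builder from the HTML Standard: the only rule it implements is "a start tag listed in `closes_p` implicitly closes a `<p>` that is the current node". The class name `ToyTreeBuilder`, the helper names, and the choice of `dialog` as the "newly added" element are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ToyTreeBuilder(HTMLParser):
    """Toy tree builder (hypothetical): models only implicit </p> insertion."""

    def __init__(self, closes_p):
        super().__init__()
        self.closes_p = set(closes_p)  # start tags that imply </p>
        self.root = ("#root", [])
        self.stack = [self.root]       # stack of open elements

    def handle_starttag(self, tag, attrs):
        # Simplification: only a <p> at the top of the stack is closed.
        if tag in self.closes_p and self.stack[-1][0] == "p":
            self.stack.pop()
        node = (tag, [])
        self.stack[-1][1].append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Close up to and including the nearest matching open element;
        # stray end tags are ignored.
        names = [node[0] for node in self.stack]
        if tag in names[1:]:
            idx = len(names) - 1 - names[::-1].index(tag)
            del self.stack[idx:]

    def handle_data(self, data):
        if data.strip():
            self.stack[-1][1].append(data.strip())


def outline(node):
    """Render a node as a nested, parenthesised string."""
    if isinstance(node, str):
        return repr(node)
    tag, children = node
    return "(" + " ".join([tag] + [outline(c) for c in children]) + ")"


def parse(markup, closes_p):
    builder = ToyTreeBuilder(closes_p)
    builder.feed(markup)
    builder.close()
    return outline(builder.root)


MARKUP = "<p>Intro<dialog>Hello</dialog>"

# "Older client": does not know that <dialog> should close an open <p>.
old_tree = parse(MARKUP, closes_p={"div", "p"})
# "Newer client": <dialog> has been added to the list.
new_tree = parse(MARKUP, closes_p={"div", "p", "dialog"})

print(old_tree)
print(new_tree)
```

Under these assumptions the older rule set nests the `dialog` inside the `p`, while the newer one makes them siblings, so script and CSS selectors written against one tree shape may not match the other.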

HTML should have a mechanism (ideally declarative) for expressing parsing behavior so that older clients can produce the correct DOM when handling new content. This would also allow web component authors to opt in to the same kinds of authoring improvements.

Member

domenic commented Jul 18, 2022

This is just XHTML, right?

Author

plinss commented Jul 18, 2022

In theory, linking to a formal schema document could satisfy this, but it needn't require such a heavyweight solution.

One possibility could be a meta tag that describes a single element's parsing behavior, another could be a micro-syntax within the element's open tag (like maybe a sigil just before the >).

While XHTML had its issues, it did offer some flexibility which we lost. We traded that flexibility for authoring simplicity and a parser algorithm that was supposed to be invariant. That invariance has been broken several times, and likely will be again. Let's try to find a better solution that allows the flexibility to innovate while not breaking code.

Member

annevk commented Aug 29, 2022

While in theory this seems interesting, in practice I haven't seen a proposal for this that maintains all the good qualities of HTML syntax. Meta-syntax is just not very ergonomic (or internally consistent, at this point) and also introduces its own set of risks.

annevk added the addition/proposal (New features or enhancements) label on Aug 29, 2022
@hsivonen
Member

Having site-supplied declarations that affect parsing would cause parsing actions at a distance that would be hard to connect in a sensible way to all entry points to parsing. (It seems unlikely that a meta would travel into fragment parsing invocations, for example.)

Moreover, a new solution introduced now would only work prospectively when we come across this problem the next time in the future. It wouldn't solve the issue at hand relative to implementations of the current parsing algorithm already out there.

However, if a site is willing to take extra steps to accommodate already-deployed implementations, we already have syntax for that: using explicit end tags, i.e. not omitting any end tags (</p> in particular) that the spec says are permissible to omit. This can even be automated on the server side by parsing (with an up-to-date implementation) and immediately reserializing.
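
The parse-and-reserialize step could look roughly like the following Python sketch. It is a toy under stated assumptions, not a conforming HTML parser: the only implied end tag it models is `</p>`, it only closes a `<p>` that is the current node, and it does not special-case `script` or `style` content. A real deployment would use an up-to-date, spec-conforming parser instead. The names `ExplicitEndTagWriter`, `reserialize`, and the `closes_p` parameter are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# Void elements never get an end tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class ExplicitEndTagWriter(HTMLParser):
    """Toy reserializer (hypothetical): writes out every end tag."""

    def __init__(self, closes_p):
        super().__init__()
        self.closes_p = set(closes_p)  # start tags that imply </p>
        self.open = []                 # names of currently open elements
        self.out = []                  # output fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.closes_p and self.open and self.open[-1] == "p":
            self.open.pop()
            self.out.append("</p>")    # make the implied end tag explicit
        attr_text = "".join(
            f" {name}" if value is None else f' {name}="{escape(value)}"'
            for name, value in attrs
        )
        self.out.append(f"<{tag}{attr_text}>")
        if tag not in VOID:
            self.open.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open:
            return                     # ignore stray end tags
        while self.open:
            name = self.open.pop()
            self.out.append(f"</{name}>")
            if name == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def reserialize(markup, closes_p):
    writer = ExplicitEndTagWriter(closes_p)
    writer.feed(markup)
    writer.close()
    while writer.open:                 # close anything left open at EOF
        writer.out.append(f"</{writer.open.pop()}>")
    return "".join(writer.out)


# Server side: apply the up-to-date rule set once.
explicit = reserialize("<p>Intro<dialog>Hello</dialog>",
                       closes_p={"div", "p", "dialog"})

# An older rule set reading the explicit markup has nothing left to infer.
legacy_view = reserialize(explicit, closes_p={"div", "p"})

print(explicit)
print(legacy_view)
```

In this sketch the older rule set round-trips the explicit markup unchanged, which is the point of the suggestion: once no end tag is omitted, clients that disagree about which start tags imply `</p>` no longer have anything to disagree about for that input.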
