Create an XML5 standard #4436

ExE-Boss · 2019-03-19T10:13:26Z

As has been mentioned in WICG/webcomponents#752 (comment), XML5 would give us the features of XML with the error recovery of the HTML5 parser.

Expected features:

XML namespaces
All tags can be self‑closing (no more need to do <script src="…"></script>, it can now be <script src="…"/>); this also future‑proofs us for whenever new void tags are added.
<![CDATA[…]]> sections
Processing instructions
No upper‑casing of tag names
HTML5 style error recovery
Nested <p> tags (a side effect of not having hard‑coded auto-closing behaviour in the parser)

This would give us the best of both worlds.


domenic · 2019-03-19T13:57:32Z

Historically this has had no implementer interest (maybe Servo?), but we can have a tracking issue I suppose.

SelenIT · 2019-03-26T07:16:14Z

> Nested <p> tags

What is the use case for this?

ExE-Boss · 2019-03-26T08:06:22Z

Nested <p> tags are in the expected features because XML5 wouldn’t have the HTML <p> tag auto‑close behaviour: the XML parser is unaffected by the namespace, so it stands to reason that you could nest <p> tags.

<div>
  <p>
  <p>
  <p>
</div>

Would result in:

<div>
 └ <p>
    └ <p>
       └ <p>
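
For illustration only (not part of the original comment), here is a minimal Python sketch using the standard library's strict `xml.etree.ElementTree` parser. It shows two things: an XML parser has no `<p>` auto-close rule, so explicitly closed `<p>` elements nest, and the unclosed markup above is rejected by a strict XML parser today, which is the gap XML5-style recovery would fill. The `depth` helper is a hypothetical name introduced here.

```python
import xml.etree.ElementTree as ET


def depth(elem):
    """Return the length of the longest element chain starting at elem."""
    return 1 + max((depth(child) for child in elem), default=0)


# Explicitly closed <p> elements nest: XML has no per-element auto-close rules.
nested = ET.fromstring("<div><p><p><p></p></p></p></div>")
nested_depth = depth(nested)  # chain is div > p > p > p

# The unclosed markup from the comment is not well-formed,
# so a strict XML parser refuses it instead of recovering.
try:
    ET.fromstring("<div><p><p><p></div>")
    rejected = False
except ET.ParseError:
    rejected = True
```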

domenic · 2019-03-26T11:40:58Z

That sounds like an argument against XML5, since it produces weird and un-semantic results. Whereas the OP's framing makes it sound like nested p tags are a desired feature.

annevk · 2019-03-26T13:26:28Z

I think the main case for XML5 is that we've had multiple good ideas for text/html dropped because modifying the HTML parser was too involved and too risky:

SVG support in template.
Using a custom element where parsing rules prevent that, e.g., as a table row.
Representing shadow roots.
Void custom elements.

All of these will likely continue to come up, however.

Another reason is that if we're going to have to maintain an XML parser forever anyway, we might as well make it do something useful.

Now changing parser behavior might become a risk for the XML parser too if its usage actually becomes more widespread due to these (and other) changes. We should somewhat carefully consider how to manage that.

(The other question is how likely it is that we'll end up with DOMChangeList or equivalent, as the logical conclusion of that is a byte-based node tree representation.)

kosek · 2019-03-26T17:39:42Z

Everything mentioned above except "HTML5 style recovery" can be done with normal XML. The question is whether this one additional feature is worth creating a syntax slightly different from XML. There were many other attempts to redefine or simplify XML, similar to XML5, but none succeeded because there are simply too many existing XML parsers around, many of them no longer maintained. So as long as this new XML is not a strict subset of XML, it will not work everywhere.

On the other hand, I don't think there is anything that stops browsers from applying some correction steps to XML documents that are not well-formed in order to parse and display them. I can imagine that if an XML document is not well-formed, the browser will emit a message to the console and then switch to "XML5 parsing" in order to fix issues like missing end tags, missing quotes around attributes, etc.

So rather than creating another markup language standard, I think it would be much better to just define a recovery parsing algorithm for non-well-formed XML documents that browsers invoke when parsing non-well-formed XML. Also, I think such a lenient parser should be used only for pages, not for content loaded through XHR. It would be too risky to automatically correct broken XML received from some API.
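
As a rough illustration of the fallback described in this comment, the following Python sketch tries a strict parse first and only switches to a recovery pass when the document is not well-formed. Everything here is an assumption made for illustration: `parse_lenient`, `recover`, the `allow_recovery` flag, and the recovery rules themselves (close any still-open elements at a matching end tag or at end of input, ignore stray end tags) are hypothetical and are not taken from the XML5 draft or any browser. The toy tokenizer handles only attribute-less tags and text.

```python
import logging
import re
import xml.etree.ElementTree as ET

log = logging.getLogger("lenient_xml")

# Toy tokenizer: an end tag, a start tag (optionally self-closing), or a text run.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>|([^<]+)")


def recover(markup):
    """Build a tree from non-well-formed markup using toy recovery rules."""
    root, stack = None, []
    for match in TOKEN.finditer(markup):
        closing, name, self_closing, text = match.groups()
        if text is not None:
            if stack:  # attach text to the innermost open element
                parent = stack[-1]
                if len(parent):
                    parent[-1].tail = (parent[-1].tail or "") + text
                else:
                    parent.text = (parent.text or "") + text
            continue
        if closing:
            # Close every open element up to the matching one; ignore stray end tags.
            if any(elem.tag == name for elem in stack):
                while stack.pop().tag != name:
                    pass
            continue
        elem = ET.SubElement(stack[-1], name) if stack else ET.Element(name)
        if root is None:
            root = elem
        if not self_closing:
            stack.append(elem)
    return root  # elements still open at end of input are implicitly closed


def parse_lenient(markup, allow_recovery=True):
    """Strict parse first; fall back to recovery only when allowed."""
    try:
        return ET.fromstring(markup)
    except ET.ParseError as err:
        if not allow_recovery:  # e.g. content fetched from an API
            raise
        log.warning("not well-formed (%s); switching to recovery parsing", err)
        return recover(markup)


tree = parse_lenient("<div><p><p><p></div>")
```

With these toy rules, a page-loading path could call `parse_lenient(markup)`, while a path handling API responses could pass `allow_recovery=False` so that broken XML still raises `ParseError`.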

ExE-Boss · 2019-03-28T16:52:21Z

> So rather than creating another markup language standard, I think it would be much better to just define a recovery parsing algorithm for non-well-formed XML documents that browsers invoke when parsing non-well-formed XML.

That’s kind‑of what I expect from XML5.

Maybe use that algorithm, with HTML5-style recovery, from the get‑go.

ExE-Boss · 2019-04-01T00:16:07Z

I’ve found this: https://ygg01.github.io/xml5_draft/

Davilink · 2020-10-13T19:28:56Z

The only reason I would want an XML5 standard is that we would be able to parse web pages using an XML parser and XML tools such as XPath. For now, HTML5 recommends (why ???!?!?) that void elements not have a self-closing tag: <br /> is not considered valid and should be <br>, but doing so breaks the use of an XML parser, so we need an HTML parser engine that supports all the HTML exceptions. In the early 2000s, many tutorials recommended the XHTML standard because the stricter nature of XML encouraged writing better HTML code.

https://crisp.tweakblogs.net/blog/321/html5-why-not-use-xml-syntax.html

src: https://google.github.io/styleguide/htmlcssguide.html#HTML_Validity
but then we have

src: https://google.github.io/styleguide/htmlcssguide.html#Optional_Tags

This is nonsense to me: just write the end tag. It is not difficult, and it is more coherent and readable.

SelenIT · 2020-10-19T02:46:44Z

> we will be able to parse web page by using an xml parser and xml tools XPath and other

All of this is already possible in the XML syntax, which gets enabled by serving the documents with the proper Content-Type HTTP header (e.g. application/xhtml+xml).
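
A small sketch of what this looks like in practice, assuming a document written in the XML syntax (as it would be served with `Content-Type: application/xhtml+xml`). It uses Python's standard `xml.etree.ElementTree`, which supports only a limited XPath subset; the sample document and variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# A tiny XHTML document: well-formed XML in the XHTML namespace.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>First<br />line</p>
    <p>Second</p>
  </body>
</html>"""

root = ET.fromstring(doc)
ns = {"h": XHTML_NS}

# Elements live in the XHTML namespace, so the path expressions must use a prefix.
paragraphs = root.findall(".//h:p", ns)
title = root.findtext("h:head/h:title", namespaces=ns)
```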

> like <br /> is not considered valid, it should be <br>

That's not true. Both <br /> and <br> are valid in HTML syntax of HTML5, same goes for other void elements. The only thing to remember is that, unlike XML, this slash doesn't have anything to do with "closing", it's just kind of syntactic sugar to make transitioning from XHTML1 easier. In HTML syntax, technically, both are considered a start tag, and auto-closing right after the start tag for the void elements is hard-coded in the parsing algorithm.
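
A loose illustration of the difference, with a caveat: Python's `html.parser` is a simple tokenizer and does not implement the HTML Standard's parsing algorithm, so this is only an approximation of the HTML side. The helper names `html_start_tags` and `is_well_formed_xml` are hypothetical. Both spellings are reported as a `br` start tag by the HTML tokenizer, while a strict XML parser accepts only the self-closed form.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class StartTags(HTMLParser):
    """Record the name of every start tag the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append(tag)


def html_start_tags(markup):
    parser = StartTags()
    parser.feed(markup)
    parser.close()
    return parser.seen


def is_well_formed_xml(markup):
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# HTML side: <br> and <br /> both show up as a "br" start tag.
html_plain = html_start_tags("<p>a<br>b</p>")
html_slash = html_start_tags("<p>a<br />b</p>")

# XML side: only the self-closed form is well-formed.
xml_plain = is_well_formed_xml("<p>a<br>b</p>")
xml_slash = is_well_formed_xml("<p>a<br />b</p>")
```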

domenic added the "addition/proposal" (New features or enhancements) and "needs implementer interest" (Moving the issue forward requires implementers to express interest) labels on Mar 19, 2019.
