Create an XML5 standard #4436

Open
ExE-Boss opened this issue Mar 19, 2019 · 10 comments
Labels
  • addition/proposal: New features or enhancements
  • needs implementer interest: Moving the issue forward requires implementers to express interest

Comments

@ExE-Boss

ExE-Boss commented Mar 19, 2019

As has been mentioned in WICG/webcomponents#752 (comment), XML5 would give us the features of XML with the error recovery of the HTML5 parser.

Expected features:

  • XML namespaces
  • All tags can be self‑closing: instead of <script src="…"></script>, you can write <script src="…"/>. This also future‑proofs us for whenever new void tags are added.
  • <![CDATA[…]]> sections
  • Processing instructions
  • No upper‑casing of tag names
  • HTML5 style error recovery
  • Nested <p> tags (a side effect of not having hard‑coded auto-closing behaviour in the parser)

This would give us the best of both worlds.
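For reference, several of the listed features can already be observed in an ordinary XML parser today. The sketch below uses Python's standard `xml.dom.minidom`; the sample document, the `demo-instruction` target, and the `summarize` helper are made up for illustration and are not part of any XML5 proposal.

```python
from xml.dom import minidom

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical sample exercising namespaces, a self-closing tag,
# a processing instruction, a CDATA section, and a mixed-case name.
SAMPLE = (
    '<root xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:svg="http://www.w3.org/2000/svg">'
    "<?demo-instruction some data?>"
    '<script src="app.js"/>'
    "<svg:linearGradient/>"
    "<pre><![CDATA[1 < 2 && 3 > 2]]></pre>"
    "</root>"
)


def summarize(xml_text):
    """Parse with a strict XML parser and report what it preserved."""
    root = minidom.parseString(xml_text).documentElement
    script = root.getElementsByTagName("script")[0]
    gradient = root.getElementsByTagNameNS(SVG_NS, "linearGradient")[0]
    pre = root.getElementsByTagName("pre")[0]
    pis = [
        node for node in root.childNodes
        if node.nodeType == minidom.Node.PROCESSING_INSTRUCTION_NODE
    ]
    return {
        "script_namespace": script.namespaceURI,
        "script_is_empty": not script.hasChildNodes(),
        "gradient_local_name": gradient.localName,  # case is preserved
        "pre_text": "".join(child.data for child in pre.childNodes),
        "pi": (pis[0].target, pis[0].data),
    }


summary = summarize(SAMPLE)
```

What a strict parser like this does not offer is the last item on the list: any input that is not well-formed is rejected outright instead of being recovered.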

@domenic domenic added addition/proposal New features or enhancements needs implementer interest Moving the issue forward requires implementers to express interest labels Mar 19, 2019
@domenic
Member

domenic commented Mar 19, 2019

Historically this has had no implementer interest (maybe Servo?), but we can have a tracking issue I suppose.

@SelenIT

SelenIT commented Mar 26, 2019

Nested <p> tags

What is the use case for this?

@ExE-Boss
Author

Nested <p> tags are among the expected features because XML5 wouldn’t have the HTML <p> tag auto‑close behaviour: the XML parser is unaffected by the namespace, so it stands to reason that you could nest <p> tags.

<div>
  <p>
  <p>
  <p>
</div>

Would result in:

<div>
 └ <p>
    └ <p>
       └ <p>
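A rough way to see why this tree falls out is to build a tree with no element-specific rules at all: start tags always nest under the current open element, and an end tag implicitly closes everything opened after its match. The sketch below is an assumption about how such recovery could behave, not the algorithm from any XML5 draft. It borrows the tokenizer from Python's `html.parser` (which lowercases names and treats `script`/`style` content specially, unlike what XML5 would do); `GenericTreeBuilder`, `parse_lenient`, and `outline` are hypothetical names.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class GenericTreeBuilder(HTMLParser):
    """Tree builder with no per-element rules (no <p> auto-closing)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.document = ET.Element("#document")  # synthetic container
        self.stack = [self.document]

    def handle_starttag(self, tag, attrs):
        # Every start tag nests under whatever element is currently open.
        elem = ET.SubElement(
            self.stack[-1], tag, {k: (v or "") for k, v in attrs}
        )
        self.stack.append(elem)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, implicitly
        # closing anything opened after it; ignore stray end tags.
        for index in range(len(self.stack) - 1, 0, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                return

    def handle_data(self, data):
        top = self.stack[-1]
        if len(top):
            top[-1].tail = (top[-1].tail or "") + data
        else:
            top.text = (top.text or "") + data


def parse_lenient(markup):
    builder = GenericTreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.document


def outline(elem, depth=0):
    """Return the element tree as a list of indented tag lines."""
    lines = ["  " * depth + f"<{elem.tag}>"]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


doc = parse_lenient("<div>\n  <p>\n  <p>\n  <p>\n</div>")
print("\n".join(outline(doc[0])))
```

With the input from the comment above, each <p> ends up as the child of the previous one, because nothing ever tells the builder that a new <p> should close an open one.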

@domenic
Member

domenic commented Mar 26, 2019

That sounds like an argument against XML5, since it produces weird and un-semantic results. Whereas the OP's framing makes it sound like nested p tags are a desired feature.

@annevk
Member

annevk commented Mar 26, 2019

I think the main case for XML5 is that we've had multiple good ideas for text/html dropped because modifying the HTML parser was too involved and too risky:

  • SVG support in template.
  • Using a custom element where parsing rules prevent that, e.g., as a table row.
  • Representing shadow roots.
  • Void custom elements.

All of these will likely continue to come up, however.

Another reason is that if we're going to have to maintain an XML parser forever anyway, we might as well make it do something useful.

Now changing parser behavior might become a risk for the XML parser too if its usage actually becomes more widespread due to these (and other) changes. We should somewhat carefully consider how to manage that.

(The other question is how likely it is that we'll end up with DOMChangeList or equivalent, as the logical conclusion of that is a byte-based node tree representation.)

@kosek

kosek commented Mar 26, 2019

Everything mentioned above except "HTML5 style recovery" can be done with normal XML. The question is whether this one additional feature is worth creating a syntax slightly different from XML. There were many other attempts to redefine/simplify XML similar to XML5, but none succeeded because there are simply too many existing XML parsers around, many of them no longer maintained. So as long as this new XML is not a strict subset of XML, it will not work everywhere.

On the other hand, I don't think there is anything that stops browsers from applying some correction steps to XML documents that are not well-formed in order to parse and display them. I can imagine that if an XML document is not well-formed, the browser will emit a message to the console and then switch to "XML5 parsing" in order to fix issues like missing end tags, missing quotes around attributes, etc.

So rather than creating another markup language standard, I think it would be much better to just define a recovery parsing algorithm that browsers invoke when parsing XML documents that are not well-formed. Also, I think that such a lenient parser should be used only for pages, not for content loaded through XHR. It would be too risky to automatically correct broken XML received from some API.
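The "strict first, recover only on failure" flow described in this comment might look roughly like the sketch below. Everything here is an assumption made for illustration: `parse_document`, the `allow_recovery` flag (standing in for the pages-versus-XHR distinction), and the toy `append_missing_end_tags` repair, which only handles end tags missing at the end of the input and is not a real recovery algorithm.

```python
import logging
import re
import xml.etree.ElementTree as ET

log = logging.getLogger("xml-recovery-sketch")

# Toy tag matcher: (closing slash, name, self-closing slash).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^<>]*?(/?)>")


def append_missing_end_tags(text):
    """Toy repair step: append end tags for elements still open at EOF."""
    open_names = []
    for closing, name, self_closing in TAG.findall(text):
        if self_closing:
            continue
        if closing:
            if open_names and open_names[-1] == name:
                open_names.pop()
        else:
            open_names.append(name)
    repaired = text + "".join(f"</{name}>" for name in reversed(open_names))
    return ET.fromstring(repaired)


def parse_document(text, recover=append_missing_end_tags, allow_recovery=True):
    """Return (root, recovered). Strict parsing is always tried first."""
    try:
        return ET.fromstring(text), False
    except ET.ParseError as err:
        if not allow_recovery:
            raise  # e.g. API responses: never silently "fix" broken XML
        log.warning("not well-formed (%s); switching to recovery", err)
        return recover(text), True


root, recovered = parse_document("<list><item>one</item><item>two")
```

Keeping the strict parser as the first step means well-formed documents are handled exactly as before, so existing XML tooling is unaffected; only documents that would previously have failed take the lenient path.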

@ExE-Boss
Author

ExE-Boss commented Mar 28, 2019

So better than creating another markup language standard I think it would be much better to just define a recovery parsing algorithm for non well-formed XML documents that browsers will invoke when parsing non well-formed XML.

That’s kind‑of what I expect from XML5.

Maybe use that HTML5 style recovery supporting algorithm from the get‑go.

@ExE-Boss
Author

ExE-Boss commented Apr 1, 2019

I’ve found this: https://ygg01.github.io/xml5_draft/

@Davilink

Davilink commented Oct 13, 2020

The only reason I would want an XML5 standard is that we would be able to parse web pages with an XML parser and XML tools such as XPath. For now, HTML5 recommends (why ???!?!?) that void elements not be self-closed: <br /> is not considered valid and it should be <br>, but that breaks the use of an XML parser, so we need an HTML parser engine that supports all the HTML exceptions. In the early 2000s, multiple tutorials recommended the XHTML standard because the stricter nature of XML encouraged writing better HTML code.

https://crisp.tweakblogs.net/blog/321/html5-why-not-use-xml-syntax.html

[image]
src: https://google.github.io/styleguide/htmlcssguide.html#HTML_Validity
but then we have
[image]
src: https://google.github.io/styleguide/htmlcssguide.html#Optional_Tags

This is nonsense to me: just put the end tag. It is not difficult, and it is more coherent and readable.

@SelenIT

SelenIT commented Oct 19, 2020

we will be able to parse web page by using an xml parser and xml tools XPath and other

All of this is already possible in the XML syntax, which gets enabled by serving the documents with the proper Content-Type HTTP header (e.g. application/xhtml+xml).

like <br /> is not considered valid, it should be <br>

That's not true. Both <br /> and <br> are valid in the HTML syntax of HTML5; the same goes for other void elements. The only thing to remember is that, unlike in XML, this slash doesn't have anything to do with "closing"; it's just a kind of syntactic sugar to make transitioning from XHTML1 easier. In the HTML syntax, technically, both are considered a start tag, and auto-closing right after the start tag for void elements is hard-coded in the parsing algorithm.
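As a small illustration of the XML-syntax route mentioned in this reply, the sketch below parses a page written in XHTML syntax with Python's standard `xml.etree.ElementTree` and queries it with the XPath subset that module supports. The `PAGE` document and helper names are made up for the example. It also shows the point of friction from the earlier comment: a bare <br>, which is fine in the HTML syntax, is simply not well-formed for an XML parser.

```python
import xml.etree.ElementTree as ET

# Prefix map for XPath-style queries; "h" is an arbitrary local prefix.
NS = {"h": "http://www.w3.org/1999/xhtml"}

# Hypothetical page written in XHTML (XML) syntax.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Demo</title></head>
  <body>
    <p class="intro">First line<br />second line</p>
    <p>Another paragraph</p>
  </body>
</html>"""


def intro_paragraphs(xhtml_text):
    """Find <p class="intro"> elements via an XPath-style expression."""
    root = ET.fromstring(xhtml_text)
    return root.findall(".//h:p[@class='intro']", NS)


def is_well_formed(text):
    """Report whether a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


intros = intro_paragraphs(PAGE)
line_breaks = ET.fromstring(PAGE).findall(".//h:br", NS)
```

`ElementTree` implements only part of XPath 1.0; a full XPath engine would need a third-party library, which is outside the scope of this sketch.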


6 participants