Make HTML4/XHTML1 Strict doctypes non-conforming #2048
@@ -98010,62 +98010,6 @@ dictionary <dfn>StorageEventInit</dfn> : <span>EventInit</span> {
<p>The <span>DOCTYPE legacy string</span> should not be used unless the document is generated from
a system that cannot output the shorter string.</p>

<hr>

<!-- see the parser section before changing this bit -->
What was meant by this?
Also, in XHTML you still need to use a DOCTYPE kinda like this for entities and we still don't have a replacement. But I guess we shouldn't really let that influence what is okay for text/html.
> <!-- see the parser section before changing this bit -->
> What was meant by this?
Dunno for sure. Maybe @zcorpan knows better. But anyway I took it as a statement about the effects of changing the contents of that section, not about dropping the whole thing entirely.
> Also, in XHTML you still need to use a DOCTYPE kinda like this for entities and we still don't have a replacement
Hmm, yeah, I had not thought about that, because the spec doesn’t give it as a reason.
The parser has parse errors for doctypes other than the permitted ones I believe. The spec has this:
Does that not address the issue for the checker? I'd like to check the reasons we permitted these doctypes in the first place, why they are no longer relevant. Or what the effects will be if we change this (and the behavior of the checker). Will people replace all instances of "new" elements with |
I think we basically should not allow that kind of behavior. There should be only one path for checking HTML. Not version-dependent paths.
No, because the checker does not (any longer) switch into any different modes based on the doctype. That is because I agree with what @annevk said:
OK, so then we should remove that paragraph as well. And change the HTML parser to emit more parse errors. Are you going to remove support for checking HTML4 from the checker completely?
Good point—made it so.
I’d prefer to do that in a separate follow-up PR, since changing the parsing algorithm potentially affects browsers and all other parser implementations, while this PR as currently scoped only affects document conformance/authors and conformance checkers.
Yes, from the HTML checker I’d like to remove any traces of HTML4-related checking that still remain. However, I guess the vnu source still needs to contain an HTML4-checking path as long as the https://validator.nu/ Web UI continues to offer an HTML4-checking option (which https://checker.html5.org/ and https://validator.w3.org/nu/ do not). The W3C will continue to offer HTML4 and XHTML1 checking using the legacy backend that https://validator.w3.org/ relies on. That is anyway what most people who want HTML4/XHTML1 checking actually use (not the https://validator.nu/ HTML4-checking option).
I think we should do the parse errors in this PR too? They won't affect browsers, just checkers, and it seems good for them to be consistent with the requirements changed here.
The Gecko HTML parser exposes parse errors in its View Source, but yeah, changes to parse errors otherwise don’t affect Gecko parsing behavior, or behavior in any other browsers. That said, we do have other parsers that do error reporting; I can think of at least two of them.
OK, I can add them here. (FWIW my thinking had been that it would not be ideal to conflate into one PR both (A) document-conformance changes that have no normative requirements for parser implementors and (B) parser changes that do have normative requirements for implementors who have implemented the error-reporting parts of the parsing algorithm).
I tend to agree that we want to land conformance changes on both sides. The parser and syntax section ought to be updated together since they rely on each other to some extent. The specification would be inconsistent otherwise.
See 34c4d1b and lemme know if anything more beyond that needs changing in the parsing algorithm.
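To make the parser side of this concrete, here is a minimal sketch of a token-level check, assuming the parse-error condition reduces to "the name is not `html`, or a public identifier is present, or the system identifier is something other than `about:legacy-compat`". It is not taken from 34c4d1b; `DoctypeToken` and `doctype_is_parse_error` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoctypeToken:
    """Hypothetical stand-in for a tokenizer's DOCTYPE token.

    None means the corresponding field is "missing".
    """
    name: Optional[str]
    public_id: Optional[str] = None
    system_id: Optional[str] = None


def doctype_is_parse_error(token: DoctypeToken) -> bool:
    """Return True if the token should be reported as a parse error.

    Assumption: only a bare "html" doctype, optionally carrying the
    "about:legacy-compat" system identifier, is error-free.
    """
    return (
        token.name != "html"
        or token.public_id is not None
        or token.system_id not in (None, "about:legacy-compat")
    )
```

Under that assumption an HTML 4.01 Strict doctype is reported like any other legacy doctype, because it carries a public identifier. Only error reporting changes; nothing here affects how a browser would build the tree.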
See #2056, which eliminates the need for authors to be forced to forever continue putting obsolete XHTML1 doctypes in HTML documents that are served with XML MIME types. Instead it just changes the spec to say:
Looks great, but can you or someone help work on a nice explanatory commit message for this? To avoid misunderstandings, I think we should stress exactly what this does and does not do, i.e. it removes the legacy XHTML and HTML 4 doctypes as conformant, so that only |
It was never intended that HTML4 Strict and XHTML1/1.1 Strict doctypes would remain conforming forever. Given that HTML4 is nearly 20 years old (and XHTML1 is just a reformulation of HTML4 in XML), it’s time to consider making the HTML4 Strict and XHTML1/1.1 Strict doctypes non-conforming, just as all other HTML4 and XHTML1/1.1 doctypes (and HTML 3.2, etc., doctypes) already are.
The spec currently defines the HTML4 Strict and XHTML1/1.1 Strict doctypes as obsolete but still conforming (“obsolete permitted DOCTYPEs”) and says that “Authors should not use obsolete permitted DOCTYPEs, as they are unnecessarily long”.
The reason the spec states for allowing them in conforming documents is in order to “help authors transition from HTML4 and XHTML1”.
But at this point continuing to allow HTML4 Strict and XHTML1/1.1 Strict doctypes as conforming isn’t helping authors transition; instead it seems to be prolonging the use of those doctypes long past what should have been their expiration date.
For the HTML checker I still get issue reports from authors requesting that if they put an HTML4 or XHTML1 doctype on a document, the checker should evaluate it using HTML4/XHTML1 requirements (as the SGML/DTD-based legacy W3C validator does and as validator.nu used to do) instead of requirements in the current HTML spec.
In other words, some authors are continuing to intentionally use the HTML4/XHTML1 doctypes so that their documents can be “valid” even though they contain markup that the current HTML spec defines as non-conforming.
So, it’d be helpful if we made the spec clearly disallow use of all legacy HTML doctypes, including the HTML4 Strict and XHTML1/1.1 Strict doctypes (the only remaining legacy doctypes still allowed).
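As a rough illustration of what the text/html syntax rule would look like once the legacy doctypes are gone, here is a minimal sketch of a string-level check. It assumes the only conforming forms left are `<!DOCTYPE html>` and the variant with the DOCTYPE legacy string (`SYSTEM "about:legacy-compat"`); `is_conforming_doctype` is a hypothetical helper, not code from the HTML checker.

```python
import re

# ASCII whitespace as used in HTML syntax: space, tab, LF, FF, CR.
WS = r"[ \t\n\f\r]"

# "<!DOCTYPE", "html" and "SYSTEM" match case-insensitively;
# "about:legacy-compat" is matched exactly, in matching quotes.
CONFORMING_DOCTYPE = re.compile(
    rf"(?i:<!DOCTYPE){WS}+(?i:html)"
    rf"(?:{WS}+(?i:SYSTEM){WS}+(?P<q>[\"'])about:legacy-compat(?P=q))?"
    rf"{WS}*>"
)


def is_conforming_doctype(doctype: str) -> bool:
    """Return True if the whole string is a conforming text/html doctype.

    Assumption: every doctype with a public identifier, including the
    HTML4 Strict and XHTML1 Strict ones, is non-conforming.
    """
    return CONFORMING_DOCTYPE.fullmatch(doctype) is not None
```

With this single rule there is no doctype-dependent mode: `<!doctype html>` passes, while an HTML 4.01 Strict or XHTML 1.0 Strict doctype fails the same way any other legacy doctype does.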