Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict the main element to be used once per document #3354

Merged
merged 12 commits into from Jan 23, 2018

Conversation

@annevk
Copy link
Member

annevk commented Jan 15, 2018

Multiple main elements are still allowed if all but one have the hidden attribute set.

This also slightly alters the definition of the body element as it effectively encompasses all content of a document, except for some metadata.

Fixes #100.


/acknowledgements.html ( diff )
/dom.html ( diff )
/grouping-content.html ( diff )
/indices.html ( diff )
/sections.html ( diff )

Multiple main elements are still allowed if all but one have the hidden attribute set.

This also slightly alters the definition of the body element as it effectively encompasses all content of a document, except for some metadata.

Fixes #100.
nit
@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 15, 2018

@tobie, I think there's a differ bug here, outside of the changes made. Note how https://html.spec.whatwg.org/multipage/dom.html#phrasing-content and https://whatpr.org/html/3354/dom.html#phrasing-content both say "Text, in the context of content models" but in https://whatpr.org/html/3354/97d8432...38465fa/dom.html it's "Text , in the context of content models" with more whitespace. Seems like this is the case in lots of other places too. Can you take a look?

@tobie

This comment has been minimized.

Copy link

tobie commented Jan 15, 2018

@foolip: yeak that's an unfortunate but known issue of the HTML diff service that pr-preview uses. It adds line breaks all over the place, which obviously creates issues with inline elements.

Copy link
Member

foolip left a comment

Looking at which elements still have just "flow content" as their content model, and which get "flow content excluding main", I'm not sure I understand the reasoning.

The places where main is already disallowed are nav, aside, header and footer. And then header and footer are disallowed in address, dt, th.

Also disallowing main in address, dt and th seems fine, if not particularly useful. I don't quite see why it should be disallowed in article, section, blockquote, td, figure or form. It doesn't particularly make sense, but neither does dd, dialog, figcaption or li where it is still allowed.

I'd like to err on the side of allowing, basically.

@@ -15172,7 +15175,7 @@ interface <dfn>HTMLBodyElement</dfn> : <span>HTMLElement</span> {};
<dd w-nohtml>Uses <code>HTMLBodyElement</code>.</dd>
</dl>

<p>The <code>body</code> element <span>represents</span> the main content of the document.</p>
<p>The <code>body</code> element <span>represents</span> the contents of the document.</p>

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

To be clear, this is a change that could have been made without the other changes in this PR, right? This was inaccurate before as well, since headers and footers could also be inside the body. If that's not the right way to think of it, I might be missing something that I should be paying attention to elsewhere in these changes.

This comment has been minimized.

Copy link
@annevk

annevk Jan 15, 2018

Author Member

Yeah, I haven't actually looked at when this was written, but that seems correct.

This comment has been minimized.

Copy link
@annevk

annevk Jan 15, 2018

Author Member

It seems this description of the body element goes back to 2006, before the main element was introduced.

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

Thanks for checking that.

source Outdated

<p class="note">While there is no restriction as to the number of <code>main</code> elements in a
document, web developers are encouraged to stick to a single element.</p>
<p>A document must not have more than one <code>main</code> element, unless all but one of the

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

I don't think the difference between should and must will matter much, but is there some logic to when we use which? Looks like we have 12 "element must not" and 3 "element should not", but I can't really tell why.

Is it intentional that having only <main hidden> isn't allowed? Seems like a harmless thing to allow, if, for example, a script populates it and then removes the hidden attribute.

This comment has been minimized.

Copy link
@annevk

annevk Jan 15, 2018

Author Member

I think must is preferable if you can, but sometimes the requirement might be hard to fulfill and then should can be reasonable.

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

"must" wins in number of occurrences, so OK.

data-x="attr-hidden">hidden</code> attribute on those that are not current:

<pre>&lt;!doctype html>
&lt;html lang=en-CA>

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

Good example!

source Outdated
@@ -16118,7 +16121,8 @@ Space is not the only void</pre>
<dt><span data-x="concept-element-contexts">Contexts in which this element can be used</span>:</dt>
<dd>Where <span>flow content</span> is expected.</dd>
<dt><span data-x="concept-element-content-model">Content model</span>:</dt>
<dd><span>Flow content</span>, but with no <code>header</code>, <code>footer</code>, or <code>main</code> element descendants.</dd>
<dd><span>Flow content excluding main</span>, but with no <code>header</code> or

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

I'd prefer "Flow content excluding main" to be expanded in place, treating main more like other elements. Here: "Flow content, but with no header, footer or main element descendants." (unchanged, I think)

This comment has been minimized.

Copy link
@annevk

annevk Jan 15, 2018

Author Member

I started with that and expected the review feedback to go the other way given that repeating yourself too much is never great. Happy to flip it back though.

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

I may well have asked for the opposite if you did, who knows :)

I think the changes will be fewer if we trim the cases where main is disallowed, but let's start with figuring that out, and then inline the changes if the changes are no more than currently.

source Outdated
@@ -42744,7 +42779,8 @@ interface <dfn>HTMLTableCellElement</dfn> : <span>HTMLElement</span> {
<dt><span data-x="concept-element-contexts">Contexts in which this element can be used</span>:</dt>
<dd>Where <span>flow content</span> is expected.</dd>
<dt><span data-x="concept-element-content-model">Content model</span>:</dt>
<dd><span>Flow content</span>, but with no <code>form</code> element descendants.</dd>
<dd><span>Flow content excluding main</span>, but with no <code>form</code> element

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

This is the form element, any reason to change this?

This comment has been minimized.

Copy link
@annevk

annevk Jan 15, 2018

Author Member

For the same reason I changed td and blockquote. If there's one main in a document it doesn't make sense as a descendant of them.

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 15, 2018

Given where main makes sense out of these elements it seems it basically has to be a child of either body or div. Given that it doesn't make sense as being "flow content". I've updated this PR as appropriate. While doing so I also found that the indexes needed updating too.

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 15, 2018

It'd be good to get feedback from @stevefaulkner on that last bit though as the W3C fork was in a somewhat inbetween state, where you could still nest main in td, blockquote, or section for example. I suspect that was an oversight, but would be good to be sure.

@stevefaulkner

This comment has been minimized.

Copy link
Contributor

stevefaulkner commented Jan 15, 2018

@annevk, section not an oversight, as there is no rule to stop use of a section which effectively contains everything in the document it seemed reasonable to allow in this case. Also given sections practical meaning as = div, it seemed OK to allow. As for blockquote and td yes they should be included in restrictions, it was an oversight.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 15, 2018

@tobie, seems like something's wrong with https://whatpr.org/html/3354/97d8432...8f1b75f/dom.html, commit 8f1b75f reverted "Flow content excluding main" but it's shown as an addition.

source Outdated
@@ -15142,7 +15140,8 @@ interface <dfn>HTMLStyleElement</dfn> : <span>HTMLElement</span> {
<dt><span data-x="concept-element-contexts">Contexts in which this element can be used</span>:</dt>
<dd>As the second element in an <code>html</code> element.</dd>
<dt><span data-x="concept-element-content-model">Content model</span>:</dt>
<dd><span>Flow content</span>.</dd>
<dd>Zero or one <code>main</code> element optionally intermixed with <span>flow

This comment has been minimized.

Copy link
@foolip

foolip Jan 15, 2018

Member

Should be zero or more to allow for <main hidden>, right?

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 15, 2018

Since we're down to debating what should happen with each case where flow content is allowed, here an exhaustive listing of the elements where "content model" has "flow content" in its description, and the validity of <main> within:

elementwas allowednow allowed
addressyesno (change)
articleyesno (change)
asidenono
blockquoteyesno (change)
bodyyesyes
captionyesno (change)
ddyesno (change)
detailsyesno (change)
dialogyesno (change)
divyesyes
dtyesno (change)
fieldsetyesno (change)
figcaptionyesno (change)
figureyesno (change)
footernono
formyesno (change)
headernono
liyesno (change)
mainyesno (change)
navnono
sectionyesno (change)
tdyesno (change)
thyesno (change)

This is somewhat simplified because the content model of some elements, like figure, is more complicated.

@tobie

This comment has been minimized.

Copy link

tobie commented Jan 15, 2018

@foolip: OK, seems there's an AWS issue that's not being properly handled by pr-preview. Have you observed this elsewhere?

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 15, 2018

@tobie, this is the first time I know that I've looked at a preview again after PR changes, so couldn't say if this has happened elsewhere.

annevk added 2 commits Jan 15, 2018
@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 15, 2018

@stevefaulkner I think I'd prefer trying it out with restricting it from within section too as I think it makes little sense, but that's subject to review from @sideshowbarker.

Speaking of which, it'd be good to have an HTML checker PR lined up as well so when this change is made folks can verify the impact on their own sites.

@stevefaulkner

This comment has been minimized.

Copy link
Contributor

stevefaulkner commented Jan 15, 2018

@annevk I am sanguine in regards to its restriction, as I recall it was restricted, but after discussion with Marcos Caceres I changed it to allow, but can't find the thread :-(

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 15, 2018

I'd also like to hear from @sideshowbarker if there are some principles about validator warnings/errors we should keep in mind, or indeed principles about conformance generally.

As it is, I'm inclined to err on the side of allowing even things that seemingly don't make much sense unless we think they are both plausible mistakes and would cause user confusion/harm. Going for just body and div as the allowed parents, I'd like to take a look at what proportion of documents out there would become invalid. I don't want to go overboard with measuring and statistics, but it would be very bad if we made lots of harmless things invalid because we hadn't thought them plausible.

Copy link
Member

marcoscaceres left a comment

Just noting that @foolip question, "Is it intentional that having only <main hidden> isn't allowed?" wasn't fully addressed yep. Might be worth clarifying that - so noting it in case it was forgotten.

Apart from that, LGTM!

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 16, 2018

I tried to think that through and I think requiring at least one without hidden is better, for progressive enhancement. Otherwise you'd end up with a mostly blank document if JavaScript is disabled, assuming <main> is used correctly to mark up the main contents.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 16, 2018

I don't think we should make any attempt to ensure that documents have some useful content with JavaScript disabled. Currently, <body></body> is valid, and so is <table></table>, both cases that don't make sense on their own.

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 16, 2018

That is incorrect, we have a should requirement on that being not the case. See palpable content.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 16, 2018

https://html.spec.whatwg.org/multipage/dom.html#palpable-content says:

As a general rule, elements whose content model allows any flow content or phrasing content should have at least one node in its contents that is palpable content and that does not have the hidden attribute specified.

Note: Palpable content makes an element non-empty by providing either some descendant non-empty text, or else something users can hear (audio elements) or view (video or img or canvas elements) or otherwise interact with (for example, interactive form controls).

This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.

Conformance checkers are encouraged to provide a mechanism for authors to find elements that fail to fulfill this requirement, as an authoring aid.

That's a lot of leeway for validators to point out things that might be wrong, and I'd really prefer to err on the side of allowing that which might be reasonable.

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 16, 2018

When a document has multiple main elements, at least all but one must have the hidden attribute specified. The remaining main element should not have the hidden attribute specified.

Note: this requirement is analogous to that for [palpable content].

@sideshowbarker

This comment has been minimized.

Copy link
Member

sideshowbarker commented Jan 16, 2018

FYI, I’m in the process of adding use counters to the HTML checker backend to get some relevant data back on main usage — for example, the percentage of documents checked that contain main at all, any main with the hidden attribute, multiple unhidden main, any main within section, etc.

I had hoped to get those use counters added today my time but it’s looking like I won’t until some time tomorrow (= 8+ hours from now).

sideshowbarker added a commit to validator/validator that referenced this pull request Jan 17, 2018
Relates to whatwg/html#3354
@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 19, 2018

@sideshowbarker

@foolip, given #3354 (comment), it seems maybe you already agree the cases other than main in section or form aren’t worth spending more time together on — so that is, for those other cases, the spec should define those as non-conforming (as @annevk wrote it already in this PR). Right?

Thanks for doing all that research! Are your tables for just the parent element so that <body><form><div><main> would could in the <div> category?

In any case, your numbers to confirm that it's really only section and form that are worth worrying about, as the rest makes less sense and is less common.

So I dunno maybe that httparchive data needs to doublechecked to see why it unexpectedly shows main in section being used only 1.3 times as often as main in form.

Bugs are certainly possible, but shouldn't we expect rather different results from these two ways of measuring, just like how Chrome's use counters can show high usage for something that's still very hard to find in httparchive? For the checker the bias is people who tried to get it right. Since the <form> bits seem to be from an ASP framework, maybe that's just used a lot and people can't control the all of the markup and thus don't bother to validate?

So, should we allow form ancestors? I think it will look a bit curios that we allowed it but not for example li. @stevefaulkner, would it make sense to allow it only if the form has no accessible name, and what does that map to? Having a name attribute?

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 19, 2018

@annevk another thing that kinda fell off the radar is <div hidden><main>. Should we bother with that case? @sideshowbarker, at what point does it just start to get annoying to explain in the error messages for the validator?

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 19, 2018

@foolip per feedback from @AliceWonderMiscreations I think we should allow all main elements to be hidden, if any are present, which would include that case.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 19, 2018

I certainly support that change, but I was wondering about the case where an ancestors has the hidden attribute. The spec as written requires it to be on the main elements.

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 19, 2018

Oh I see, I'd rather not go there and keep things as simple as we can.

@stevefaulkner

This comment has been minimized.

Copy link
Contributor

stevefaulkner commented Jan 19, 2018

@foolip

would it make sense to allow it only if the form has no accessible name, and what does that map to? Having a name attribute?

Makes sense. In this case accessible name is provided from aria-label/aria-labelledby with fallback to title attribute if present and neither of previous are present.

Note: as far as I am aware the name attribute is not used in accname calculation on any element.

@sideshowbarker

This comment has been minimized.

Copy link
Member

sideshowbarker commented Jan 19, 2018

I was wondering about the case where an ancestors has the hidden attribute.

Oh I see, I'd rather not go there and keep things as simple as we can.

Me too. At least unless/until we get feedback from users who are frustrated by it being that way (e.g., bug reports against the HTML checker).

@sideshowbarker, at what point does it just start to get annoying to explain in the error messages for the validator?

Well for the record here and since you asked, I don’t think it’d not be any more annoying to explain in a checker message than what the current message already says, which is just:

Error: A document must not include more than one visible main element.

Seems like I wouldn’t even need to change that message at all — since it already doesn’t what element the hidden attribute happens to be on.

And also as far as implementing “the case where an ancestors has the hidden attribute” on the checker backend, it wouldn’t be much additional work over what’s already implemented. (The existing check is already written in custom Java code rather than being expressed in the RelaxNG schema.)

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 19, 2018

@foolip @stevefaulkner I thought there was agreement that allowing <form> was fine because of the ASP.NET pattern? If we require something of that <form> that would basically not help existing content. I also don't see why we'd equate form with li.

@stevefaulkner

This comment has been minimized.

Copy link
Contributor

stevefaulkner commented Jan 19, 2018

@annevk @foolip

I thought there was agreement that allowing

was fine because of the ASP.NET pattern?

yes

If we require something of that that would basically not help existing content.

i thought @foolip meant disallowing only when the form element had an accessible name, which i would be surprised if it it effects any of the asp.net cases, but am fine with not restricting, as its likely that it will be caught by nesting restrictions anyway. Yeah I am unsure about the li connection as it is seriously unlikely that it would be used to wrap the main content of a document.

annevk added 4 commits Jan 19, 2018
@sideshowbarker

This comment has been minimized.

Copy link
Member

sideshowbarker commented Jan 19, 2018

Are your tables for just the parent element so that <body><form><div><main> would could in the <div> category?

Not just for the parent element but instead for any ancestor. So it records a count for <body><form><div><main> in three places: as part of the total 47684, for main in form as well as for main in div.

For the checker the bias is people who tried to get it right.

I kind of wish that were the case, but for better or worse I think the reality is that most documents which get fed to the checker are not necessarily from people trying to get it right. Whenever I grab a random URL from the checker logs, I find the document at the URL typically has many markup errors and many CSS errors — often literally dozens of both.

I think in part the circumstance behind that fact is, a lot of URLs are getting sent to the checker by some kind of automatic means not be the authors who wrote them but instead … in some other weird way for some odd reason(s).

And that’s the situation even after having set up a lot of filtering on the checker backend to block certain bad-behaving browser extensions and other things that are clearly junk requests.

So I think it’s not necessarily safe to assume much at all on scale about the average usage pattern for the checker. I think for every document it processes that’s submitted from an author/developer trying to do the right thing, there’s at least one document it processes that’s a unbelievably bad mess of really horrible markup and CSS.

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 19, 2018

I made it to allow only "form without accessible name" as that seems reasonable. I also updated how I defined the hierarchy restriction to make it more similar to what we do for area as that avoids a lot of duplication throughout the document. I added Alice Wonder to the acknowledgments section (@AliceWonderMiscreations let me know if you prefer a different name there).

I think this is ready for a fresh round of reviews as I've addressed all the nits I found myself.

@foolip

This comment has been minimized.

Copy link
Member

foolip commented Jan 19, 2018

OK, so let's leave the hidden attribute ancestor stuff alone, it'd be a pretty straightforward tweak later.

@foolip
foolip approved these changes Jan 19, 2018
Copy link
Member

foolip left a comment

This all looks good to me now!


<p class="note">While there is no restriction as to the number of <code>main</code> elements in a
document, web developers are encouraged to stick to a single element.</p>
<p>A <dfn>hierarchically correct <code>main</code> element</dfn> is one whose ancestor elements

This comment has been minimized.

Copy link
@foolip

foolip Jan 19, 2018

Member

This is great, simplified a lot of the other bits. Thanks!

@annevk

This comment has been minimized.

Copy link
Member Author

annevk commented Jan 19, 2018

I'll land this next Tuesday, barring any objections. It'd be great to get explicit confirmation from @AliceWonderMiscreations on #3354 (comment) before then though.

Also, thanks everyone for the help and my apologies about the delay in getting this done and all the angst it has caused over the intervening years. (To be clear, I realize more needs to be done and I will focus on that next.)

@aardrian

This comment has been minimized.

Copy link

aardrian commented Jan 19, 2018

@iandevlin:

@stevefaulkner Yes, ASP.NET requires everything to be wrapped in a <form> because the entire framework revolves around POST commands.

I thought that mostly disappeared in ASP.NET. If Microsoft is moving away / has moved away from it, that may not be sufficient justification.

@AliceWonderMiscreations

This comment has been minimized.

Copy link

AliceWonderMiscreations commented Jan 19, 2018

#3354 is fine way to reference me, thank you.

@annevk annevk merged commit 1dec930 into master Jan 23, 2018
2 checks passed
2 checks passed
Participation annevk participates on behalf of Mozilla Corporation
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@annevk annevk deleted the annevk/main-restriction branch Jan 23, 2018
@sideshowbarker

This comment has been minimized.

Copy link
Member

sideshowbarker commented Jan 31, 2018

The HTML checker now also implements the “hierarchically correct” restriction; validator/validator@8024cd8

@JaninaSajka

This comment has been minimized.

Copy link

JaninaSajka commented Feb 1, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
10 participants
You can’t perform that action at this time.