Proposal to Rephrase Success Criterion 4.1.1 #2525

cstrobbe · 2022-06-22T21:01:30Z

Since many accessibility testers and other accessibility experts erroneously interpret the phrase "elements are nested according to their specifications" as referring to content models instead of syntactical nesting, a rewording that avoids this misunderstanding is highly desirable. Below is a proposed rewording.

In content implemented using markup languages, the following are true, except where the specifications for the markup languages being used allow exceptions to these requirements:

elements have complete start and end tags,

elements are nested according to the syntactical rules of their specifications,

elements do not contain duplicate attributes, and

any IDs are unique.

Note: Start and end tags that are missing a critical character in their formation, such as a closing angle bracket or a mismatched attribute value quotation mark are not complete.

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

Note: When a scripting language is used to manipulate elements or attributes (or both) in the Document Object Model, the resulting in-memory representation is still regarded as "content implemented using markup languages".
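
To make the third and fourth conditions concrete, here is a minimal sketch of how duplicate attributes and duplicate IDs could be detected. It uses Python's standard `html.parser`; the names `AttributeAudit` and `audit` are hypothetical helpers invented for this illustration, not part of WCAG or of any existing checker, and the sketch does not test the first two conditions.

```python
from collections import Counter
from html.parser import HTMLParser


class AttributeAudit(HTMLParser):
    """Hypothetical helper: records duplicate attributes and repeated id values."""

    def __init__(self):
        super().__init__()
        self.duplicate_attributes = []  # (tag, attribute name) pairs
        self.id_counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes every attribute through, including repeats.
        names = Counter(name for name, _ in attrs)
        for name, count in names.items():
            if count > 1:
                self.duplicate_attributes.append((tag, name))
        # Count only the first id attribute of each element.
        for name, value in attrs:
            if name == "id":
                if value:
                    self.id_counts[value] += 1
                break

    def duplicate_ids(self):
        return sorted(v for v, c in self.id_counts.items() if c > 1)


def audit(markup):
    """Return (duplicate attributes, duplicate ids) found in the markup."""
    parser = AttributeAudit()
    parser.feed(markup)
    parser.close()
    return parser.duplicate_attributes, parser.duplicate_ids()


if __name__ == "__main__":
    print(audit('<p id="a" class="x" class="y">one</p><p id="a">two</p>'))
```

Neither check needs any knowledge of content models; both look only at the attributes found in start tags.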

Description of the changes:

The second condition has been reworded to highlight syntactical correctness (as opposed to validity of content models).
The second note, which is new, draws attention to this. The first note is identical to the note in the WCAG 2.1 recommendation from June 2018.
The numbering is new; the phrase "the following are true" is copied from other success criteria, such as SC 1.2.1 and SC 2.2.2.
The third note addresses an issue unrelated to the distinction between correct syntax and validity.

Compatibility with existing versions of WCAG 2 and EN 301 549: Since the proposed rewording results in a requirement that is less strict than the interpretation of most accessibility testers, all documents that pass the current version of SC 4.1.1 should also pass its proposed rewording. In this sense, the proposed rewording is compatible with the current version.

Clause 9.4.1.1 of EN 301 549 says, "Where ICT is a web page, it shall satisfy WCAG 2.1 Success Criterion 4.1.1 Parsing". Unless the editors of EN 301 549 want to retain the current version of the success criterion, no rewording of clause 9.4.1.1 is needed beyond, at some future point in time, an update of the referenced version of WCAG.

Not addressed by this proposal: The proposal does not address whether unbalanced attribute quoting counts as a failure of SC 4.1.1. (See the discussion on the failure examples in F70.) The first note mentions "a mismatched attribute value quotation mark" as an example of an incomplete start tag, but notes are non-normative and the SC does not say anything about attribute syntax. Adding a fifth condition, such as "attribute syntax is used according to specification", might therefore be interpreted as making the SC stricter than in WCAG 2.1.

The intent of this rephrasing is not to “defend” the many types of validation errors that accessibility testers flag using this success criterion. My intent is merely to eliminate a common misunderstanding about what the success criterion actually means. (See my comment on issue #978 for notes about how XML's concept of well-formedness informed the wording of the success criterion.) If non-syntactical validation errors which impact accessibility are found, these should be caught either by existing success criteria or by new ones that still need to be created.

Clarification in response to Alastair Campbell's comment on content models (18.07.2022).
"Content models" refers to the descriptions of what each element may contain. For example, in HTML 5 a div may not be nested inside a span. In SGML and in the early days of XML, content models were described in DTDs. For example,
<!ELEMENT chapter (chaptertitle, (para | heading)+)>
This line declares the element chapter and says it must contain a chaptertitle followed by at least one para or heading. In HTML 5, content models are not expressed in DTDs, XML Schemas or similar formal languages, but described in text. See, e.g. "content model" under The section element, which says that this element may only contain flow content.
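
As a rough illustration of the difference, the sketch below first checks well-formedness (syntax) and only then checks the content model from the DTD line above. It assumes Python's standard `xml.etree.ElementTree` and expresses the content model as a regular expression over the child element names; `matches_chapter_model` and `chapter_is_valid` are hypothetical names used only for this example.

```python
import re
import xml.etree.ElementTree as ET

# Content model from the DTD line above: chaptertitle, (para | heading)+
# expressed as a regular expression over space-separated child names.
CHAPTER_MODEL = re.compile(r"chaptertitle( (para|heading))+")


def matches_chapter_model(child_names):
    """True if the sequence of child element names fits the content model."""
    return CHAPTER_MODEL.fullmatch(" ".join(child_names)) is not None


def chapter_is_valid(xml_text):
    """Syntax first (ParseError if not well formed), then the content model."""
    root = ET.fromstring(xml_text)
    children = [child.tag for child in root]
    return root.tag == "chapter" and matches_chapter_model(children)


if __name__ == "__main__":
    # Well formed and valid.
    print(chapter_is_valid(
        "<chapter><chaptertitle>T</chaptertitle><para>p</para></chapter>"))
    # Well formed, but the content model is violated (wrong order).
    print(chapter_is_valid(
        "<chapter><para>p</para><chaptertitle>T</chaptertitle></chapter>"))
```

The second document parses without error, so it is syntactically fine; it only fails the separate content-model check.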

bruce-usab · 2022-06-27T12:44:53Z

@cstrobbe I am very grateful for your close analysis and explanation. That said, AGWG has had great difficulty advancing two considerably more trivial (and essentially editorial) changes to normative phrasing in 2.0.

My own preference would be to incorporate all of this into Understanding and other related supporting materials, as you are doing, for example with #2187.

GreggVan · 2022-06-27T21:28:54Z

I think this all goes back to the intent and understanding of the people who first created the SC. That would determine whether it was errata or a change in their understanding of what the SC was intended to mean. Unfortunately, (a) we don't have the ability to talk to all those on the working group, and (b) all public reviewers, both those who commented and those who were satisfied or relied on the wording and therefore did not comment, are involved in the process as well. Thus we cannot judge the intent or understanding of these people, so we cannot judge whether it is errata. We would need to treat it as a change and a loosening of the SC. Thus things that complied with 2.2 would fail earlier versions, which is something the group so far has been reluctant to do. (Errata change previous versions, so the problem does not arise for errata.)

Even changing the Understanding document to say something other than what the SC says would be a problem. The Understanding doc is just to explain the SC, not to say that something should be different. So the only way this can advance would be for the WG to decide to change current policy. We should gather these types of things up in one place and decide whether we want to or should do that.

g

cstrobbe · 2022-06-28T09:36:27Z

@GreggVan
My rewording is based on what I remember from the discussions in the WG at the time, the statement in Success Criterion 4.1.1: Parsing that XML's well-formedness is close to what the SC requires (in other words, requiring correct content models goes beyond well-formedness and beyond the SC) and what I could reconstruct from publicly available older discussions (see my comment on What does nested according to the specification mean in SC 4.1.1). The rationale for my proposal is that it represents what was always intended.

I understand that we can't contact everyone who was involved in those discussions, but a clarification on what "nested according to their specifications" means has been requested for years. If WCAG 2.2 adopted the proposed wording, non-compliance with older versions would be caused by auditors going beyond the intent of the current SC 4.1.1. If this is better handled by means of errata, then I'm perfectly happy with that.

JAWS-test · 2022-07-02T03:37:26Z

@zcorpan was initially of the same opinion as @cstrobbe, but then changed his mind

alastc · 2022-07-18T10:10:59Z

I think the test for an errata would be whether this would clarify the SC without changing the intended meaning:

"In content implemented using markup languages, elements have complete start and end tags, elements are nested according to the syntactical rules of their specifications, elements do not contain duplicate attributes, and any IDs are unique, except where the specifications allow these features."

The understanding document includes phrases like:

the content is created according to the rules defined in the formal grammar for that technology. In markup languages, errors in element and attribute syntax and failure to provide properly nested start/end tags lead to errors that prevent user agents from parsing the content reliably. Therefore, the Success Criterion requires that the content can be parsed using only the rules of the formal grammar.

The concept of "well formed" is close to what is required here.

I'm not entirely sure what @cstrobbe meant by "content models", but the above seems like a reasonable suggestion for an errata.

GreggVan · 2022-07-18T20:09:25Z

hmmmm I agree with

... the test for an errata would be whether this would clarify the SC without changing the intended meaning:

The concern here is that the edit narrows and rigidifies beyond the original SC. For example, if the spec says "there is no requirement to nest them strictly, but if you do then use this syntax", your edit would have the effect of deleting the first half of the sentence: it would use only the PART of the spec that is syntax and ignore the part of the spec that says there is no requirement to. So I think the edit can change it. What do you think?

gregg

cstrobbe · 2022-07-19T06:36:55Z

Gregg Vanderheiden wrote,

If the spec says "there is no requirement to nest them strictly, but if you do then use this syntax"
Your edit would have the effect of deleting the first half of the sentence - and only use PART of the spec that is syntax, and ignoring the part of the spec that says there is no requirement to.

I think the wording of my suggestion avoided that issue:

except where the specifications (...) allow exceptions to these requirements

GreggVan · 2022-07-19T07:34:04Z

It does indeed, but I think it still leaves the ambiguity, since people will have trouble determining what that means. Kind of like a double negative, but not exactly. Hmmm, how to fix? How about a slight tweak to your language?

My question is: DOES HTML5 strictly require "nested according to its syntactical rules"? I don't know the answer.

BY THE WAY - the INTENT was 1, 3 and 4 but NOT requiring strict nesting, because HTML then did not require it. So you could have H1 H3 H1 H2 nesting and be OK. There was a LOT of discussion and people who argued no H3 without H2, but in the end what was agreed on did not require this.

Best
Gregg

patrickhlauke · 2022-07-19T09:39:13Z

BY THE WAY - the INTENT was 1 3 and 4 but NOT requiring strict nesting because HTML then did not require it. So you could have H1 H3 H1 H2 nesting and be ok
There was a LOT of discussion and people who argued no H3 without H2, but in the end — what was agreed on did not require this.

but that's not nesting, that's heading levels ... or am i missing something here?

stevefaulkner · 2022-07-19T10:24:03Z

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

So the following incorrect content model nesting is OK as far as the Criterion is concerned?
<button><a href="#"></a></button>

or

<ul>
<div>
<li>
<li>
</div>
</ul>

alastc · 2022-07-19T14:20:06Z

@GreggVan - My understanding is that 4.1.1 requires correct nesting of tags and has nothing to do with the order of headings (which falls under 1.3.1).

Like Steve's button / list example above (which fails in the validator as "Element div not allowed as child of element ul in this context").

The proposed errata (adding "syntactical") could help to disambiguate the perception of it being more than what was intended.

I'm fairly sure the spec on headings doesn't require any particular ordering of heading tags.

However, it would catch things like <button><h2>Thing</h2></button>.

cstrobbe · 2022-07-19T14:56:38Z

Regarding the examples in Steve Faulkner's comment: these don't contain syntactic problems, so they meet the SC. (Obviously, their content models are wrong, which is why the validator will report errors for those two examples.)

As Alastair Campbell has pointed out, the hierarchy of headings is irrelevant to this SC. That is not a syntactical issue.

The validator catches <button><h2>Thing</h2></button> because the content model is wrong. However, from a purely syntactical point of view, the nesting is fine (so it wouldn't violate SC 4.1.1). Browsers can build up an unambiguous parse tree based on that code. However, there may be an issue with the role exposed to the accessibility API. If that is the case, the code snippet violates a different SC, namely SC 4.1.2.
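
A quick way to see the distinction is to run the snippets through a parser that knows nothing about HTML content models. The sketch below assumes Python's standard `xml.etree.ElementTree` as a stand-in for a purely syntactical check; `is_well_formed` is a hypothetical helper name. XML well-formedness is only an approximation here, because HTML syntax additionally allows void elements and omitted end tags.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """True if the fragment can be parsed into a tree using XML syntax rules only."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        "<button><h2>Thing</h2></button>",     # wrong content model, clean nesting
        '<button><a href="#"></a></button>',   # wrong content model, clean nesting
        "<strong><em></strong></em>",          # overlapping elements
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```

The two button examples parse into an unambiguous tree even though a validator rejects them; only the overlapping elements fail at the syntactical level.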

patrickhlauke · 2022-07-19T14:58:45Z

However, it would catch things like <button><h2>Thing</h2></button>

but that's still going beyond syntax (well-formedness) ... so which is it?

GreggVan · 2022-07-19T16:36:02Z

Ah, I see what you mean. OK. It would be good to have examples in the Understanding doc so people can distinguish between syntactic and content-model issues.

gregg

alastc · 2022-07-21T16:40:49Z

The validator catches <button><h2>Thing</h2></button> because the content model is wrong. However, from a purely syntactical point of view, the nesting is fine (so it wouldn't violate SC 4.1.1).

I guess there is a difference between syntactical and content model that I'm missing.

I thought that those types of errors were exactly what 4.1.1 were supposed to catch, i.e. nested according to the spec.

cstrobbe · 2022-07-21T16:55:11Z

I guess there is a difference between syntactical and content model that I'm missing.

Please see my update from 18.07.2022 at the top of the page.

I thought that those types of errors were exactly what 4.1.1 were supposed to catch, i.e. nested according to the spec.

That's the misunderstanding this issue attempts to address. This is about syntactical nesting so web content can be "accurately parsed into a data structure" (quoted from Understanding SC 4.1.1).

The understanding document also says,

The concept of "well formed" is close to what is required here. However, exact parsing requirements vary amongst markup languages, and most non XML-based languages do not explicitly define requirements for well formedness. Therefore, it was necessary to be more explicit in the success criterion in order to be generally applicable to markup languages.

Validating content models goes beyond what is intended here.

patrickhlauke · 2022-07-21T17:11:43Z

I guess there is a difference between syntactical and content model that I'm missing.

it's the difference between syntax and grammar - think of it in terms of a word document...spell checking checks the syntax (words are spelled right), but not grammar (that words are arranged in such a way that they make proper sentences)

cstrobbe · 2022-07-21T20:30:39Z

If you want to make comparisons with natural language, Chomsky's famous Colorless green ideas sleep furiously is a better analogy: it is syntactically correct and can be parsed into a tree structure. But it makes no sense semantically.

giacomo-petri · 2022-07-22T14:46:00Z

I had a parallel discussion (I was not yet aware of this proposal) in ACT-Rules about something similar (act-rules/act-rules.github.io#1893).

<label for="first-name">
    <span>First name</span>
    <input type="text" name="fn" value="">
</label>

and

<input type="file" id="test" />
<label for="test">Flash the screen 
    <select size="1">
        <option selected="selected">1</option>
        <option>2</option>
        <option>3</option>
    </select>
    times.
</label>

Initially, I was unconsciously supporting this new issue proposal, assuming that "elements are nested according to their specifications" and "H74: Ensuring that opening and closing tags are used according to specification (HTML)" referred to the syntactical rules of their specifications.

But @Jym77 pointed out that

H74 cover the bit about correct nesting (Step 3  in the test procedure). So, I would say that a label with a for pointing elsewhere than the nested labellable does not pass H74 (and certainly does not pass the "elements are nested according to their specifications" bit of 4.1.1).

where, essentially, code examples above are failing 4.1.1 in the first place.
In fact, the examples provided are something not allowed by the label content model specs.

But, per this new proposal,

elements are nested according to the syntactical rules of their specifications

...

Note: Syntactically correct nesting is distinct from nesting according to the content models specified in a technical specification. The second condition of the success criterion does not require correct content models; only correct syntax.

if the content model is no longer relevant in terms of 4.1.1 Parsing, they are no longer failing the 4.1.1 SC.

In addition, per the Input Accessible Name and Description Computation rules, it is still not clear how to calculate the label, as point 2 states

Otherwise use the associated label element(s) accessible name(s) - if more than one label is associated; concatenate by DOM order, delimited by spaces.

which is quite ambiguous: in the first code example, the for attribute refers to an ID that does not exist, while in the second code example we have a combination of for/id attributes and nested content, which is quite unpredictable.

Last, but not least, browsers behave inconsistently (more details in act-rules/act-rules.github.io#1893); for example Safari provides a label, while Chrome doesn't.

Do we expect this scenario to fail success criteria 1.1.1, 1.3.1, 2.5.3, 3.3.2 and 4.1.2 because of the discrepancy with the content model, while passing 4.1.1 thanks to the correct syntax?

patrickhlauke · 2022-07-22T14:58:42Z

it would likely fail 4.1.2 if the end result is a lack of accessible name, and probably 1.3.1 for lack of explicit association/relationship

cstrobbe · 2022-07-22T15:46:47Z

But @Jym77 pointed out that

H74 cover the bit about correct nesting (Step 3  in the test procedure). So, I would say that a label with a for pointing elsewhere than the nested labellable does not pass H74 (and certainly does not pass the "elements are nested according to their specifications" bit of 4.1.1).

That does not mean that the code fails SC 4.1.1; it means that the code is not using technique H74. Not using technique H74 does not automatically mean you fail the SC the technique addresses.

Neither of those code examples contains syntactical issues in the context of HTML syntax. In the context of XML syntax, the first example would not be well-formed. But in HTML syntax, the input element has no end tag (it's a void element). Whether the label element can contain an input element is not a syntactical question but a matter of content models, and the content model allows it (i.e. as a way of labelling that input).

The second code example does not exhibit any syntax issues but the label element seems to label two controls, i.e. both the one above it and the one inside it. The HTML specification does not seem to define which type of labelling takes precedence, the one defined by the for attribute or the one based on nesting. Hence, the relationship between the label and the control it labels visually (i.e. the control below it) cannot be determined programmatically in an unambiguous manner, so the code seems to violate SC 1.3.1.

giacomo-petri · 2022-07-22T17:22:11Z

But @Jym77 pointed out that

H74 cover the bit about correct nesting (Step 3  in the test procedure). So, I would say that a label with a for pointing elsewhere than the nested labellable does not pass H74 (and certainly does not pass the "elements are nested according to their specifications" bit of 4.1.1).

That does not mean that the code fails SC 4.1.1; it means that the code is not using technique H74. Not using technique H74 does not automatically mean you fail the SC the technique addresses.

That was not exactly the case; I was supporting the thesis that the label example did not fail 4.1.1 because of point 4 of the sufficient techniques for 4.1.1 (which comprises three techniques), as in my opinion all of them were passed.
@Jym77 just pointed out that the first technique in this group of three does not pass; he is not saying that it is a 4.1.1 failure for this reason.

JAWS-test · 2022-07-23T13:23:29Z

Hi @alastc,

I would like to ask you to bring the issue to a decision in a timely manner. Background: In the European Union thousands of web sites are checked according to EN 301 549. Most of the violations are found for SC 4.1.1. As an example I would like to mention the German monitoring report: https://www.bfit-bund.de/DE/Downloads/eu-bericht-pdf.pdf;jsessionid=7266E7F6DCC8058D664888E08830EC21?__blob=publicationFile&v=2, page 102. So it seems that the most important accessibility problem is 4.1.1, because it is violated the most. The problems found with 4.1.1 are largely due to incorrect nesting (which is what this issue is about). Only rarely does a duplicate ID show up as a problem. The other errors described in 4.1.1 do not occur in practice because they are automatically corrected by the browser. If 4.1.1 were reworded as suggested by @cstrobbe, 4.1.1 would finally regain the weight it deserves: namely, a low weight. And we could take care of the really important problems of accessibility!

GreggVan · 2022-07-23T14:48:41Z

+1

I sent some suggestions to @cstrobbe for wording to make it clearer. So I concur with the importance of making this clear and of separating semantic-model nesting issues from syntactic ones, which is what this is about. It is about breaking AT by giving it content it can't PARSE.

NOTE - the fact that browsers accommodate errors was pointed out when working on WCAG 2.0 - but that does not help AT that needs to parse the content. It is only if the browsers actually repair the content, and the AT can use that repaired content, that we can ignore the errors that browsers accommodate. AT developers don't have pockets as deep as browsers do to detect and repair bad content.

gregg

JAWS-test · 2022-07-23T16:28:45Z

@GreggVan

NOTE - the fact that browsers accommodate errors was pointed out when working on WCAG 2.0 - but that does not help AT that needs to parse the content. It is only if the browsers actually repair the content - and the AT can use that repaired content — that we can ignore the errors that browsers accommodate. AT developers don’t have as deep of pockets to detect and repair bad content as browsers do.

In the past there was AT that accessed the source code and not the DOM. That's why correct source code was important. As far as I know, there is no AT today that accesses the source code. If there were, it would be outdated and quite useless, since web content today is not primarily source code, but source code + CSS + JavaScript. The browsers create the (corrected) DOM from this and pass it on to the accessibility API. The AT uses either the API or the DOM. AT that used the source code would not be able to recognize the correct content on many pages, because the content is generated or changed dynamically and thus does not appear in the source code at all. That's why I think that for 4.1.1 we should only care about what the browsers generate as the DOM.

patrickhlauke · 2022-07-23T18:49:14Z

NOTE - the fact that browsers accommodate errors was pointed out when working on WCAG 2.0 - but that does not help AT that needs to parse the content. It is only if the browsers actually repair the content - and the AT can use that repaired content — that we can ignore the errors that browsers accommodate. AT developers don’t have as deep of pockets to detect and repair bad content as browsers do.

note that the error correction mechanisms are now a documented part of the HTML specification (while in the past, this was all undocumented and left up to mysterious black box browser heuristics, which is in part the reason for 4.1.1 because it was trying to avoid that devs just relied on testing in their favourite browser and missed how other browsers would parse broken content)

GreggVan · 2022-07-23T19:43:28Z

+1

gregg

cstrobbe · 2022-07-23T19:43:39Z

Double negative?

What follows is another attempt to address Gregg Vanderheiden's comment about something that may look like a double negative.

All SGML-like languages (HTML 4.x, XHTML, HTML 5, SVG, MathML, etc.) rely on hierarchy and a type of syntax tree (of which the DOM is the best-known example). Parsing a document into a syntax tree requires correct nesting at the syntactical level. Each element is always a child of another element, unless it is the document element or root (html in HTML formats). As a consequence, code such as <para>...<bold> ... </para> ... </bold> is prohibited in all SGML-like languages and leads to a parse error, regardless of whether the language defines (content models for) para and bold or not. So when we say, "elements are nested according to the syntactical rules of their specifications", we are actually paraphrasing something that SGML, XML and HTML 5 have in common. [1] The exception "except where the specifications allow these features" would not be relevant to these languages.
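
The overlap rule described above can be sketched with a simple stack of open elements. This is only an illustration under stated assumptions: it uses Python's standard `html.parser` as a tokenizer, `NestingChecker` and `find_nesting_problems` are hypothetical names, and it deliberately ignores HTML's optional end tags (elements left open at the end are not reported). It is not the tree-building algorithm of the HTML specification.

```python
from html.parser import HTMLParser

# HTML void elements never have an end tag, so they are never pushed.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class NestingChecker(HTMLParser):
    """Hypothetical helper: flags end tags that do not close the innermost element."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open element names
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        elif tag in self.open_elements:
            inner = self.open_elements[-1]
            self.problems.append(f"</{tag}> closes while <{inner}> is still open")
            # Recover by discarding everything up to and including the match.
            while self.open_elements.pop() != tag:
                pass
        else:
            self.problems.append(f"</{tag}> has no matching start tag")


def find_nesting_problems(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    print(find_nesting_problems("<para>...<bold> ... </para> ... </bold>"))
    print(find_nesting_problems("<span><div>text</div></span>"))
```

Note that the checker never consults a content model: it reports the overlapping para and bold elements, but has no objection to a div inside a span.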

The exception may be relevant to other types of markup languages, such as TeX and LaTeX, but these (1) are geared towards typesetting, (2) would present serious challenges to meet many of the other WCAG success criteria when read directly by user agents and (3) are usually rendered into another format, most frequently PDF. PDF is not a markup language, so SC 4.1.1 does not apply to it. (In practice, we would probably lose nothing by deleting "except where the specifications allow these features", but I don't want to increase resistance to my proposal by adding that change.)

One issue with the current wording that I haven't commented on yet is how "except where the specifications allow these features" seems to work. "These features" seems to refer to

"elements have complete start and end tags",
"elements are nested according to [the syntactical rules of] their specifications",
"elements do not contain duplicate attributes" and
"any IDs are unique".

SGML-like languages do not merely "allow" these features, they are rather basic requirements. The intended meaning is "except where the specifications allow exceptions to these requirements", but (1) the current wording says the opposite and (2) I don't know of any markup languages on the web that allow the intended exceptions.
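Two of the four conditions listed above, duplicate attributes and unique IDs, are easy to check mechanically. The following is a rough sketch, assuming Python's standard-library html.parser as a tokenizer; the names DuplicateFinder and find_duplicates are hypothetical, and the sample markup is invented for the example.

```python
from collections import Counter
from html.parser import HTMLParser


class DuplicateFinder(HTMLParser):
    """Collects duplicate attributes per start tag and repeated id values."""

    def __init__(self):
        super().__init__()
        self.duplicate_attributes = []  # list of (tag, attribute name)
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; duplicates are kept.
        names = Counter(name for name, _ in attrs)
        for name, count in names.items():
            if count > 1:
                self.duplicate_attributes.append((tag, name))
        for name, value in attrs:
            if name == "id" and value:
                self.ids[value] += 1

    def duplicate_ids(self):
        return sorted(value for value, count in self.ids.items() if count > 1)


def find_duplicates(markup):
    finder = DuplicateFinder()
    finder.feed(markup)
    finder.close()
    return finder.duplicate_attributes, finder.duplicate_ids()


sample = '<p id="a" class="x" class="y">one</p><p id="a">two</p>'
print(find_duplicates(sample))  # ([('p', 'class')], ['a'])
```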

One inconvenience with "elements are nested according to the syntactical rules of their specifications" is that HTML, unlike XML, does not cleanly separate syntax and validity, so a parser will always refer to content models when parsing an HTML document into a tree. For example, <p><h1>Content Model Validity</h1></p> can be perfectly parsed into a tree if syntax is the only thing you look at. But if you look at the content model for the p element in HTML 5, you'll notice that p cannot contain heading elements and that it is an element whose end tag can be omitted when it is followed by a heading element. So the browser turns the code into the following: <p></p><h1>Content Model Validity</h1></p>: it closes the first p element, and the end tag </p> is orphaned, which causes a parsing error. If we want to get around this interference between validity and syntax, it may be better to write something like "elements don't overlap at the syntactical level" or "elements don't overlap in the syntax tree". This would fail something like <strong><em></strong></em> but not <p></p><h1>Content Model Validity</h1></p>. If we also want to fail the latter code sample, we would need to add a condition such as "the syntax tree does not contain orphaned end tags". (I am avoiding the term "Document Object Model", because the DOM is not a data structure or a set of data structures.)
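The two candidate conditions ("elements don't overlap" and "no orphaned end tags") can be sketched as a purely syntactic check. This is only an illustration, assuming Python's standard-library html.parser as a tokenizer (it does not implement the HTML 5 tree construction algorithm and never looks at content models). The sets OPTIONAL_END_TAG and VOID are small, incomplete subsets chosen for the example, and the names NestingChecker and check are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: an illustrative subset of elements whose end tag may be
# omitted; the HTML specification's rules are more detailed than this.
OPTIONAL_END_TAG = {"p", "li", "dt", "dd", "option", "tr", "td", "th"}
# Assumption: a partial list of void elements, which never have an end tag.
VOID = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_elements = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_elements.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <input />: treated as void here.
        pass

    def handle_endtag(self, tag):
        if tag not in self.open_elements:
            self.problems.append(("orphaned end tag", tag))
            return
        # Close open elements until the matching start tag is reached.
        while True:
            top = self.open_elements.pop()
            if top == tag:
                break
            if top not in OPTIONAL_END_TAG:
                # An element without an optional end tag was still open.
                self.problems.append(("overlap", top))


def check(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check("<strong><em></strong></em>"))
print(check("<p></p><h1>Content Model Validity</h1></p>"))
print(check("<div><p>Elements have complete start and end tags.</div>"))
```

With these assumptions, the first sample reports both an overlap and an orphaned end tag, the second reports only the orphaned </p>, and the third reports nothing because the open p element has an optional end tag.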

Examples for the Understanding doc

I think it would be beneficial to add some examples to Understanding Success Criterion 4.1.1: Parsing, which currently doesn't have an examples section.

Example 1

<span><div>...</div></span> is not valid HTML because the content model for the span element does not allow div elements. However, the code is syntactically unambiguous and can be parsed into a data structure. Therefore, it does not fail SC 4.1.1.

Example 2

<a href="https://www.w3.org/"><a href="https://www.w3.org/WAI/">Web Accessibility Initiative</a></a> is not valid because the content model for the a element does not allow any other interactive content. However, the code is syntactically unambiguous and can be parsed into a data structure. Therefore, it does not fail SC 4.1.1.

Example 3

<p>The raven himself is hoarse
<p>That croaks the fatal entrance of Duncan
<p>Under my battlements. Come, you spirits

This code snippet meets the success criterion when used in an HTML document (but not in XHTML): it is both syntactically correct and valid. The p element is an element where the end tag is optional. A browser is able to parse it and implicitly add the end tags as follows:

<p>The raven himself is hoarse</p>
<p>That croaks the fatal entrance of Duncan</p>
<p>Under my battlements. Come, you spirits</p>
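How a parser implies the missing end tags can be illustrated with a very reduced sketch. It assumes Python's standard-library html.parser as a tokenizer and handles only one rule (a new p start tag, or the end of input, closes an open p); the real rules in the HTML specification cover many more cases. The names ParagraphCloser and with_implied_p_end_tags are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphCloser(HTMLParser):
    """Re-serializes markup, adding </p> where this one rule implies it."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.p_open = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.p_open:
                # A new <p> implicitly ends the previous paragraph.
                self.out.append("</p>")
            self.p_open = True
        # Keep the start tag exactly as written, attributes included.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            self.p_open = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def close(self):
        super().close()
        if self.p_open:
            # End of input also ends an open paragraph.
            self.out.append("</p>")
            self.p_open = False


def with_implied_p_end_tags(markup):
    parser = ParagraphCloser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(with_implied_p_end_tags("<p>one\n<p>two"))
```

In this sketch the line break after "one" ends up inside the first paragraph, so the output differs from the hand-written version above only in whitespace.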

Example 4

<ul>
  <div>
    <li>List item 1
    <li>List item 2
  </div>
</ul>

This code snippet is not valid because the content model of ul does not allow a div element as a child element. However, the code is syntactically unambiguous and can be parsed into a data structure. Therefore, it does not fail SC 4.1.1.

Example 5

<p><input id="username" name="username" type="text" /></p>
<p><label for="username">College:
  <select size="1">
    <option selected="selected">Foxe College</option>
    <option>Jordan College</option>
    <option>Wordsworth College</option>
  </select>
</label>

This code snippet is invalid because a select element that is a descendant of a label element with a for attribute must have an ID that matches the value of the for attribute. Since the code can be parsed unambiguously into a data structure, it does not fail SC 4.1.1. However, the HTML specification does not define which type of labelling takes precedence: the one defined by the for attribute or the one based on nesting. Hence, the relationship between the label and the control it labels visually (i.e. the control below it) cannot be determined programmatically in an unambiguous manner. If browsers determine that the first input's accessible name is the label element (including its descendant, the select element), the code violates SC 1.3.1.
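The conflict in Example 5 can be detected without building a full tree. Below is a small sketch, assuming Python's standard-library html.parser as a tokenizer; the class name LabelConflictFinder, the helper find_label_conflicts and the LABELABLE set (a partial list of labelable elements) are introduced only for illustration, and nested labels are not handled.

```python
from html.parser import HTMLParser

# Assumption: a partial list of labelable elements.
LABELABLE = {"input", "select", "textarea", "button", "meter", "output", "progress"}


class LabelConflictFinder(HTMLParser):
    """Finds controls nested in a label whose id does not match its for value."""

    def __init__(self):
        super().__init__()
        self.in_label = False
        self.current_for = None
        self.conflicts = []  # list of (for value, nested control tag)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "label":
            self.in_label = True
            self.current_for = attributes.get("for")
        elif self.in_label and self.current_for is not None and tag in LABELABLE:
            if attributes.get("id") != self.current_for:
                self.conflicts.append((self.current_for, tag))

    def handle_endtag(self, tag):
        if tag == "label":
            self.in_label = False
            self.current_for = None


def find_label_conflicts(markup):
    finder = LabelConflictFinder()
    finder.feed(markup)
    finder.close()
    return finder.conflicts


example_5 = """<p><input id="username" name="username" type="text" /></p>
<p><label for="username">College:
  <select size="1">
    <option selected="selected">Foxe College</option>
  </select>
</label>"""

print(find_label_conflicts(example_5))  # [('username', 'select')]
```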

[1] What is perhaps confusing is that non-XML languages such as HTML 4 and HTML 5 allow certain elements to omit the end tag. (See Optional tags in the HTML 5 spec.) For example, <div><p>Elements have complete start and end tags.</div> is perfectly valid in both HTML 4 and HTML 5 but not in XHTML. The p element is entirely contained in the div element because the browser "knows" that when it encounters the end tag </div>, it can close all child elements where end tags may be omitted, so it silently inserts a </p> just before the </div> and it is perfectly possible to create an unambiguous parse tree. For the purpose of SC 4.1.1, <div><p>Elements have complete start and end tags.</div> is perfectly fine (except in XHTML) since the elements are correctly nested. It is important to bear in mind that the SC requires elements to be properly nested without referring to the details of how optional tags work.

(For tree construction and syntactically correct nesting in HTML 5, I refer to the stack of open elements, the section Tree construction and the sections on misnested tags: Misnested tags: <b><i></b></i> and Misnested tags: <b><p></b></p>.)

JAWS-test · 2022-11-18T19:31:38Z

@bruce-usab

I do not agree with this characterization. The intention of 4.1.1, as with all SC, is to find actual errors

I would disagree with that. On the one hand it says differently in 4 and 4.1 and on the other hand 4.1.1 is about avoiding potential problems that could result from wrong source code / DOM, but don't have to:

Ensuring that Web pages have complete start and end tags and are nested according to specification helps ensure that assistive technologies can parse the content accurately and without crashing.

It's not like every violation of 4.1.1 will crash an AT, but it increases the likelihood that the AT won't work correctly - at least that's the Understanding's explanation, which thus supports my reading rather than yours.

JAWS-test · 2022-11-18T19:44:29Z

@bruce-usab

Citation please for "not always"?

According to the HTML specification, a ul element may contain only li elements. If a ul element contains a div element, this is not a violation of 4.1.2 because a ul is not a user interface component.

Whether it is a violation of 1.3.1 depends on the text content and visual presentation. Visually, it can be a list that contains list items and text that refers to all list items, i.e. is part of the list, but not a list item itself. This is not representable in HTML, but it is visually representable and structurally describable. Thus, a ul with div would not be a violation of 1.3.1 because it correctly reflects the visual information. But: An AT may have problems with div in ul leading to an incorrect structure in the Accessibility API. Another AT has no problems with this. So something like this is only objectionable with 4.1.1 (or 4.1.4).

stevefaulkner · 2022-11-21T12:22:18Z

If I encounter something like this: test page

<button onclick="alert('Goodbye Twitter')">
<a href="https://mastodon.social/@SteveFaulkner">SteveF on Mastodon</a>
</button>

What SC will it unambiguously fail (apart from 4.1.1)?

mraccess77 · 2022-11-21T13:02:49Z

@stevefaulkner that's the example I raised a year ago and it and other structures cause issues for screen readers and other assistive technology. At the time people were suggesting it could fall under SC 2.1.1 Keyboard but I believe the impact could be greater than that.

Jym77 · 2022-11-21T13:16:14Z

@stevefaulkner @mraccess77 From the discussion here, I understand that this is an example that does not fail 4.1.1 since it parses unambiguously. It would, however, fail the proposed 4.1.4 if it is created at some point.

OTOH, <button> <a href="…">Hello</button> </a> would fail 4.1.1 but is actually fixed by the HTML parsing algorithm, so it is never encountered (when using conforming UAs).

So, this example may be a case where there are accessibility issues that are not caught by any other SC. But these issues are also not caught by 4.1.1 (or at least should not be with the intended meaning of 4.1.1 when it was created) and removing 4.1.1 does not change the result of the audit.

alastc · 2022-11-21T14:39:39Z

What SC will it unambiguously fail (apart from 4.1.1)?

As @Jym77 mentioned, this thread highlights that it wouldn't be caught by (the original intent of) 4.1.1.

In that case, I can think of a few options, in order of personal preference:

Name role value, the button lacks a name (but has a separate function from the link).
Focus order: There is a lack of meaning when you have two tab-stops that appear to be in one place.

In general terms, if you have nested controls like that, the function of one of them will be compromised either in functionality or understanding. It might depend on the details of each, but in general I'd look at 4.1.2, 1.3.1, 2.1.1, focus order, and/or headings & labels. Link purpose might be possible for some, although I think "ambiguous to all" probably applies in this case.

aardrian · 2022-11-21T14:40:18Z

Since this is still an open discussion and lists are referenced again, I cite this comment from @giacomo-petri on 5 August:

For example, removing the list-style CSS property (list-style="none"), with Safari + VO elements are no longer announced as list and list items and correct list structure and incorrect list structure are both announced as regular text. […]

Even if I generally agree with your proposal @cstrobbe of clarifying what's impacted by 4.1.1, distinguishing content models and syntactical nesting, I'm a little worried about the possible interpretations and impact of these changes, especially for scenarios like the previous one.

The comment never got a response. I am still curious if every instance of a list with bullets styled away is a failure of 4.1.1 as @cstrobbe explained it. If so, only in Safari? Or if Safari's known removal of list semantics means using list style: none is an automatic 4.1.1 failure according to the original intent.

awkawk · 2022-11-21T15:01:55Z

The comment never got a response. I am still curious if every instance of a list with bullets styled away is a failure of 4.1.1 as @cstrobbe explained it. If so, only in Safari? Or if Safari's known removal of list semantics means using list style: none is an automatic 4.1.1 failure according to the original intent.

I would say no, it isn't a 4.1.1 failure. The browser has made a deliberate choice about how to render content for assistive technologies, and users can choose to use a different browser if they prefer the user experience. If basing a conformance claim on the way a site works with Safari, an author might opt for a different bullet treatment.

https://bugs.webkit.org/show_bug.cgi?id=170179#c1

If there was a failure here, it would be a 1.3.1 failure anyway due to the visual appearance of a list (assuming that the author finds some other, non-semantic, way to convey that there is a list) not being consistent with the information communicated to users, not a 4.1.1 issue.

giacomo-petri · 2022-11-21T15:05:55Z

@aardrian, it was a broader discussion and my feedback was specific to answer/discuss quoted items.

That said, considering the original intent of 4.1.1 SC, I'm in favour of removing it, especially to avoid the high numbers of false positives that do not impact users with disabilities; many times, the 4.1.1 issues are detected by automatic scans and focus the remediation phase on items that shouldn't be treated with this priority.

Moreover, 4.1.4 should help cover these scenarios.

Last, but not least, about the list example: this behaviour is recurrent with Safari (similar behaviour happens for example if you use display:flex on a table). Aware of that, usually if I want to remove the list bullets with "list-style-type:none" CSS property, I also set role="list" on the <ul> or <ol> element, ensuring it also works with Safari. That said, I think it is not an author responsibility.

cstrobbe · 2022-11-21T15:10:58Z

For example, removing the list-style CSS property (list-style="none"), with Safari + VO elements are no longer announced as list and list items and correct list structure and incorrect list structure are both announced as regular text. […]
Even if I generally agree with your proposal @cstrobbe of clarifying what's impacted by 4.1.1, distinguishing content models and syntactical nesting, I'm a little worried about the possible interpretations and impact of these changes, especially for scenarios like the previous one.

The comment never got a response. I am still curious if every instance of a list with bullets styled away is a failure of 4.1.1 as @cstrobbe explained it. If so, only in Safari? Or if Safari's known removal of list semantics means using list style: none is an automatic 4.1.1 failure according to the original intent.

I don't see anything in success criterion 4.1.1 that refers to styling, so I don't see why that would fail the SC. Even using the validation interpretation wouldn't have caused a failure of SC 4.1.1 because an HTML validator doesn't check CSS.

aardrian · 2022-11-21T15:12:15Z

@awkawk I agree.

@giacomo-petri I understand it was part of a larger discussion.

My question was for @cstrobbe, partly because of this comment:

However, for the list with the div inside it, the individual list items were not announced as list items in Firefox. This seems to mean that the list items are not programmatically determinable. This looks like a failure of SC 1.3.1 so there is no need leverage SC 4.1.1 to fail it. In Edge, the div makes no difference at all; each item in the list is announced as a list item (NVDA says "bullet").

Though I should have asked if it fails WCAG by his reading, no matter which SC.

aardrian · 2022-11-21T15:13:49Z

@cstrobbe

I don't see anything in success criterion 4.1.1 that refers to styling, so I don't see why that would fail the SC. Even using the validation interpretation wouldn't have caused a failure of SC 4.1.1 because an HTML validator doesn't check CSS.

Our comments passed one another in the night/morning. I meant to ask if it failed any interpretation of WCAG. I should have cited your original comment as well (which I did in the previous comment).

dd8 · 2022-11-29T11:57:22Z

Another argument in favour of deprecation.

Trying to redefine SC 4.1.1. to "nested according to the syntactical rules of their specifications" to align with the original intent won't work with the HTML Living Standard. That's because the LS includes content model restrictions as part of the HTML syntax:
https://html.spec.whatwg.org/multipage/syntax.html

This is reflected in the state machine for the HTML parser which builds the DOM. The state machine makes no distinction between handling mis-nested code like <b>1<p>2</b>3</p> and handling content model restrictions like <table><b>Bold</b><tr><td>aaa</td></tr>bbb</table> where <b> is not allowed at the top level of the table.

https://html.spec.whatwg.org/multipage/parsing.html#an-introduction-to-error-handling-and-strange-cases-in-the-parser
https://html.spec.whatwg.org/multipage/parsing.html#creating-and-inserting-nodes

This is theoretically impure, but allows the HTML parser to handle/repair many common HTML authoring errors.

This is very different to XML parsers where there's a core syntax that parses XML into a DOM, and there's a clear separation between the core syntax and the content model provided in a schema/DTD.
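The XML side of this contrast can be shown in a few lines. This is a sketch only, assuming Python's standard-library xml.etree.ElementTree as a non-validating XML parser: with no schema or DTD supplied, the table markup with the content model problem parses without complaint, while the mis-nested markup is rejected by the core syntax alone.

```python
import xml.etree.ElementTree as ET

# The content model case: <b> and stray text directly inside <table>.
# No schema or DTD is supplied, so only well-formedness is checked.
root = ET.fromstring("<table><b>Bold</b><tr><td>aaa</td></tr>bbb</table>")
child_tags = [child.tag for child in root]
print(child_tags)  # ['b', 'tr']

# The mis-nested case is not well-formed and cannot be parsed at all.
try:
    ET.fromstring("<b>1<p>2</b>3</p>")
    misnested_parsed = True
except ET.ParseError:
    misnested_parsed = False
print(misnested_parsed)  # False
```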

Edit: you can see an HTML parser implementation here - this is used by validator.w3.org/nu and by Firefox:
https://github.com/validator/htmlparser/blob/master/src/nu/validator/htmlparser/impl/TreeBuilder.java

cstrobbe · 2022-12-04T22:33:04Z

The outcome from the discussion was that there was good support for removing 4.1.1 from WCAG 2.2. We can circle back later to decide whether to add an errata to 2.0/2.1 for the syntactical aspect.

Since SC 4.1.1 was the main reason Principle 4 is called "Robust", shouldn't the principle be renamed as well?
SC 4.1.2 and 4.1.3 aren't really about robustness but about certain types of information being programmatically determinable. Most other SC that require things to be programmatically determinable are under Guideline 1.3
Perhaps SC 4.1.2 and SC 4.1.3 can be moved elsewhere so Principle 4 can be removed entirely.

zcorpan · 2022-12-05T21:50:12Z

For correctness...

Trying to redefine SC 4.1.1. to "nested according to the syntactical rules of their specifications" to align with the original intent won't work with the HTML Living Standard. That's because the LS includes content model restrictions as part of the HTML syntax: https://html.spec.whatwg.org/multipage/syntax.html

Content model restrictions are actually generally not part of the HTML syntax.

There's a section called "Restrictions on content models", where the content model for table and pre is special because HTML parsing is special, but it doesn't follow that all content models are also part of the syntax or the parser.

This is reflected in the state machine for the HTML parser which builds the DOM. The state machine makes no distinction between handling mis-nested code like <b>1<p>2</b>3</p> and handling content model restrictions like <table><b>Bold</b><tr><td>aaa</td></tr>bbb</table>

Yes it does, the first case is handled by the Adoption Agency Algorithm and the second is handled by Foster Parenting. The goal of these algorithms is to be web compatible, not to match content model restrictions. (However, the behavior of old browsers' HTML parsers can maybe be traced to an HTML DTD, so there is still some correlation.)

Example: this is not a parse error: <b>1<p>2</p>3</b> (despite not being allowed by the content model)

GreggVan · 2022-12-05T23:01:24Z

Programmatic determination is definitely something to make the guidelines more robust. 4.1.2 and 4.1.3 are more than just making things Perceivable, which is why they are in 4.

alastc · 2022-12-05T23:46:56Z

Since SC 4.1.1 was the main reason Principle 4 is called "Robust", shouldn't the principle be renamed as well?...
Perhaps SC 4.1.2 and SC 4.1.3 can be moved elsewhere so Principle 4 can be removed entirely.

Chair hat on, I don't think that's something that the group would be interested in doing. (Consider that it's not related to a core accessibility requirement, so a couple of objections would prevent it happening as i.)

Even if it were just 4.1.2 left, I don't think we'd update POUR in WCAG 2.x.

alastc · 2023-01-13T10:12:07Z

Noting that this has been removed from WCAG 2.2, but leaving open for potential updates to 2.1/2.0

cstrobbe mentioned this issue Jun 25, 2022

Bewertung von 9.4.1.1 BIK-BITV/BIK-Web-Test#269

Open

giacomo-petri mentioned this issue Jul 22, 2022

"Image button has non-empty accessible name" [59796f]: HTML spec require alt on image button, making several examples fail 4.1.1 act-rules/act-rules.github.io#1895

Closed

alastc added Survey - Added and removed Surveyed - Left Open labels Nov 18, 2022

This was referenced Nov 29, 2022

[73f2c2] Autocomplete invalid - Pass example 8 is a fail, and justification is inconsistent act-rules/act-rules.github.io#1967

Open

Success Criterion 4.1.1 is removed in WCAG 2.2 act-rules/act-rules.github.io#1980

Open

dd8 mentioned this issue Nov 29, 2022

Issue2525 parsing syntactical #2793

Open

aardrian mentioned this issue Dec 2, 2022

Duplicate IDs for link targets (duplicate-id-active) dequelabs/axe-core#3809

Closed

1 task

cstrobbe mentioned this issue Dec 7, 2022

Do not remove 4.1.1 Parsing from WCAG 2.2 #2820

Closed

alastc added WCAG 2.1 WCAG 2.0 and removed WCAG 2.2 Survey - Added labels Jan 13, 2023

mitchellevan mentioned this issue Feb 4, 2023

Programmatically Determined example describes outdated direct parsing of markup #3001

Open
