
Working mode updates for test plan versions and scope #914

Open
mcking65 opened this issue Mar 30, 2023 · 1 comment
Assignees
Labels
documentation Related to documentation about the ARIA-AT project or its deliverables

Comments

@mcking65
Contributor

mcking65 commented Mar 30, 2023

Update working mode to address answers to the following:

  1. Should we scope test plans to a group of AT with identical testing requirements?
  2. Can a test plan return to an earlier phase of the working mode?
  3. Are there scenarios where we need to simultaneously manage multiple versions of a test plan?
@mcking65 mcking65 added the documentation Related to documentation about the ARIA-AT project or its deliverables label Mar 30, 2023
@mcking65 mcking65 self-assigned this Mar 30, 2023
@jugglinmike
Contributor

The ARIA and Assistive Technologies Community Group just discussed this issue. Meeting minutes are available on w3.org and also included below.

The full IRC log of that discussion

<jugglinmike> Matt_King: Some background: we're working on some analysis of the current app functionality and comparing it to the working mode
<jugglinmike> Matt_King: I'm building a [GitHub] Project to map out exactly what requirements of the working mode are not supported (either correctly or at all) that are necessary to delivering "recommended" reports
<jugglinmike> Matt_King: I didn't reference that [GitHub] Project here, yet. We'll talk more about that later
<jugglinmike> Matt_King: As I'm doing that, I'm going through the working mode and looking at various scenarios for how we use it
<jugglinmike> Matt_King: The first scenario -- the "happy path" or "scenario 0"
<jugglinmike> Matt_King: A perfect draft goes into the working mode and goes straight to community feedback. Everyone runs it with no feedback and there's no conflict. It goes to the "candidate" phase, where the implementers look at it and approve it without comment. So it reaches the "recommended" phase
<jugglinmike> Matt_King: When reviewing with "scenario 0" in mind, I came up with three questions
<jugglinmike> Matt_King: Those are listed in the GitHub Issue we're discussing, now
<jugglinmike> Matt_King: First question: "Should we scope test plans to a group of AT with identical testing requirements?"
<jugglinmike> Matt_King: Right now, the scope of all of our test plans is JAWS, NVDA, and VoiceOver for macOS
<jugglinmike> Matt_King: There are two reasons why scope is super-important
<jugglinmike> Matt_King: One is that we seek consensus from major stakeholders which include developers of the ATs
<jugglinmike> Matt_King: Two is that it determines which ATs we consider when we're trying to prove whether the test is good
<jugglinmike> Matt_King: At some point in the future, we will be testing VoiceOver for iOS and TalkBack for Android. We'll also be testing Narrator and maybe ChromeVox. Beyond that, we'll hopefully be testing voice recognition and eye gaze (way down the road)
<jugglinmike> Matt_King: What should we do when we add additional ATs? Should they be new test plans? Or should they get added to an existing test plan?
<jugglinmike> James_Scholes: Second question: do all future ATs have the same testing requirements?
<jugglinmike> James_Scholes: I ask because when you create a test plan, it's possible to have only a subset of tests that apply to a given AT. For instance, the "mode switching" tests apply to NVDA and JAWS, but they do not apply to VoiceOver
<jugglinmike> James_Scholes: If we were to update an existing test plan to add voice recognition commands (for example), we could do so by extending all of the existing tests to support speech recognition commands, and if we decided that a particular test did not apply, we could simply omit it.
<jugglinmike> James_Scholes: So I'm inclined to do that rather than create a whole new test plan
<jugglinmike> michael_fairchild: My process is similar to what James_Scholes has outlined
<jugglinmike> Matt_King: Let's talk about different possible approaches before discussing pros and cons of particular approaches
<jugglinmike> Matt_King: We could look at AT that have essentially the same functionality--desktop screen readers as a category. They largely perform the same functions in very similar ways. But they're quite different from mobile screen readers in fundamental ways. And very different from eye gaze, voice control, and magnification
<jugglinmike> Matt_King: We could have a test plan scoped to just a specific type of AT where they essentially mirror one another. Where we have the need to support similar tests. Maybe not identical tests, but where we only have occasional need for minor differences
<jugglinmike> Matt_King: Or we could group them in broad categories: "all screen readers" or "all eye gaze ATs"
<jugglinmike> michael_fairchild: what if we limited each test plan to a single AT?
<jugglinmike> Matt_King: If we did that, we'd have to determine which test plans require agreement with one another in order to establish interoperability
<jugglinmike> Matt_King: If I compare ARIA-AT to wpt.fyi... In wpt.fyi, we have a spec like the one for CSS flexbox. It contains normative requirements, and those requirements are translated to tests
<jugglinmike> Matt_King: I kind of look at the set of tests in a test plan as equivalent to the tests in wpt.fyi
<jugglinmike> Matt_King: For everyone who makes "widget X", the test plan is a way of saying, "here is the set of tests to verify that you have created an interoperable implementation of 'widget X'"
<jugglinmike> michael_fairchild: So a test plan is a way to verify that several ATs are interoperable. Is that the only way to verify interoperability?
<jugglinmike> Matt_King: For sure not--keep thinking outside the box!
<jugglinmike> James_Scholes: If we just limit ourselves to the screen reader and browser combinations that we have now, we are basically right now saying that it's acceptable to compare across all of those
<jugglinmike> James_Scholes: Is it reasonable to make the same assertion after adding additional screen readers? Do we expect to hold iOS VoiceOver to the same standards as the macOS version?
<jugglinmike> James_Scholes: Would it be reasonable to compare the set of results between a screen reader and a voice recognition tool (given that the tests could be significantly different)?
<jugglinmike> Matt_King: Right now, we list the test plans along the left-hand column. But actually, each of those test plans is synonymous with a test case.
<jugglinmike> Matt_King: Let's say that we're adding support for Combobox with Eye Gaze tools... The tests are completely different, but we can still give a report about how well a particular eye gaze tool satisfies the expectations
<jugglinmike> James_Scholes: It doesn't make sense to compare the support of JAWS and Dragon NaturallySpeaking for a given pattern
<jugglinmike> James_Scholes: It makes sense mathematically, but users may be using both of those ATs
<jugglinmike> James_Scholes: It also makes me think that the table would grow much too large
<jugglinmike> Matt_King: The presentation doesn't concern me so much. We could aggregate the data in many ways
<jugglinmike> James_Scholes: I still think that they would be better-served by separate categories
<jugglinmike> James_Scholes: e.g. one for screen readers and one for magnifiers
<jugglinmike> James_Scholes: as opposed to having them all mixed: "here are the results for five screen readers and four magnifiers" etc.
<jugglinmike> Matt_King: I can imagine for some patterns, the expectations for all desktop screen readers are the same
<jugglinmike> Matt_King: But when it comes to desktop screen readers versus mobile screen readers, we may end up with dedicated tests that are quite different
<jugglinmike> Matt_King: We have to consider when/why we are asking AT developers to revisit test plans. If we change an existing test plan by adding VoiceOver for iOS, does it make sense to be asking Vispero to review the new version of the test plan?
<jugglinmike> Matt_King: Do we have to "re-do" the transition from Candidate whenever we add new ATs to a Recommended test plan?
<jugglinmike> Matt_King: We might say that two products are different enough that they need separate test plans for the same pattern
<jugglinmike> Matt_King: But if we add Narrator to the test plan that JAWS, NVDA, and VoiceOver already went through, I would expect that those three already agree.
<jugglinmike> jugglinmike: Doesn't that give undue preference to the ATs which happen to participate earlier?
<jugglinmike> Matt_King: Yes
<jugglinmike> James_Scholes: It seems undesirable to have to revisit consensus that we've already obtained whenever adding a new AT
<jugglinmike> James_Scholes: I'd like to explore a concrete scenario in which adding a new AT would require the tests in a recommended test plan to be changed
<jugglinmike> Matt_King: We're out of time. We will continue this discussion. We'll get answers to these questions and make whatever changes to the working mode they imply. Thanks, all!
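
For illustration only, here is a minimal TypeScript sketch (not the actual ARIA-AT data model; all names and values are hypothetical) contrasting the two scoping options discussed above: scoping a whole test plan to a category of AT, and letting individual tests within a plan apply to only a subset of that category.

```typescript
// Hypothetical sketch: how a test plan might declare an AT scope, with
// individual tests further narrowed to the ATs they apply to (e.g.,
// mode-switching tests that apply to JAWS and NVDA but not VoiceOver).

type AtId = "jaws" | "nvda" | "voiceover_macos" | "voiceover_ios" | "narrator";

// Option (b): each test lists the subset of ATs it applies to.
interface Test {
  title: string;
  appliesTo: AtId[];
}

// Option (a): the plan itself is scoped to a category of AT with
// similar testing requirements.
interface TestPlan {
  pattern: string; // e.g. "combobox"
  atScope: "desktop-screen-readers" | "mobile-screen-readers" | "voice-control";
  tests: Test[];
}

const comboboxDesktop: TestPlan = {
  pattern: "combobox",
  atScope: "desktop-screen-readers",
  tests: [
    { title: "Navigate to the combobox in reading mode", appliesTo: ["jaws", "nvda"] },
    { title: "Open the combobox", appliesTo: ["jaws", "nvda", "voiceover_macos"] },
  ],
};

// Adding Narrator under option (b) means extending appliesTo on existing
// tests (and potentially revisiting prior consensus); adding VoiceOver for
// iOS under option (a) could instead mean a separate plan scoped to
// "mobile-screen-readers", so earlier agreement is not reopened.
function testsFor(plan: TestPlan, at: AtId): Test[] {
  return plan.tests.filter((t) => t.appliesTo.includes(at));
}

console.log(testsFor(comboboxDesktop, "jaws").map((t) => t.title));
```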
