
How to summarize and display test results? #19

Closed
spectranaut opened this issue Nov 22, 2019 · 8 comments

Comments

@spectranaut
Contributor

Hey all! This is a somewhat urgent issue because I need to get something in by Wednesday the 27th, so I'm hoping to spend all of Tuesday the 26th, at the latest, implementing a summary page. Each test records a lot of information, and I am not sure which information is the most relevant to show. I'm hoping to get feedback in this issue so I can make a first draft of a test result summary.

Here is an example HTML page with the summary after you complete the "read-checkbox.html" test, which I used because it is the longest test:

[Image: read checkbox results]

That is the result for just one test. What we need to discuss here is:

  1. Is this a reasonable way to show a summary of one test?
  2. How should we ultimately show results for "read checkbox" along with results from "operate checkbox" and "read checkbox grouping"?
  3. What does it mean for a test to pass?

Current implementation of algorithm for test passing or failing:

  1. The test passes if:
    1. All assertions pass for every AT command.
    2. There are no unexpected bad behaviors (such as irrelevant extra information or the AT crashing) after any AT command.

Therefore, any failing assertion or undesirable behavior after any key command will result in the test failing. We could have a different state for a test result if all assertions pass but some additional undesirable behavior occurs.
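For concreteness, here is a minimal TypeScript sketch of that rule; the type and field names below are hypothetical, not the actual test harness data model:

```ts
interface CommandResult {
  assertionsPassed: boolean[];   // one entry per assertion for this AT command
  unexpectedBehaviors: string[]; // e.g. "irrelevant extra information", "AT crashed"
}

function testPasses(commands: CommandResult[]): boolean {
  // The test passes only if every assertion passes for every AT command
  // and no command produced an unexpected behavior.
  return commands.every(
    (cmd) =>
      cmd.assertionsPassed.every((passed) => passed) &&
      cmd.unexpectedBehaviors.length === 0
  );
}
```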

What will cause an assertion to pass or fail?

An assertion will fail if any screen reader command results in incorrect information (in this case, the tester will have marked "incorrect output"). In the checkbox case, if the accessible name, role, or state is actually wrong for any tested key command, then the assertion fails.

Additionally, an assertion will fail if the test author includes an additional assertion that is not related to the output of the screen reader and that assertion fails. So far we only have one example of this kind of assertion: the assertion about JAWS and NVDA changing modes when you use TAB to read a checkbox.

An assertion will be considered to pass if there is only "missing" information (for example, if a tester marks "no output" because the role "checkbox" is not announced).
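As a sketch of that rule, assuming a tester's marking is reduced to one of three values (the names here are illustrative, not the real test format):

```ts
type OutputMark = "correct output" | "no output" | "incorrect output";

function assertionPasses(mark: OutputMark, extraAssertionPassed: boolean = true): boolean {
  // "Incorrect output" always fails the assertion.
  if (mark === "incorrect output") return false;
  // Merely missing information ("no output") still counts as a pass,
  // but a separate non-output assertion (e.g. mode switching on TAB) can still fail it.
  return extraAssertionPassed;
}
```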

Not all assertions are equal

The test design so far does not have a way to record the necessity of an assertion (for example, "mandatory" or "nice to have"). This will take some thinking to fit into the test design (as the tests are already quite complicated), so I do not think that the ability to mark some assertions as necessary for passing and others as optional will make it into the prototype for this phase of the project.

@spectranaut
Contributor Author

@mcking65 and @jfhector and @mfairchild365 and @Yohta89, can you take a look at this issue? :)

@ghost

ghost commented Nov 22, 2019

I liked the overall structure of the result page! I'm assuming the primary purpose of the result page is to help stakeholders understand the deficiencies of each screen reader based on the results, i.e. to get them on the same page for a conversation about what's wrong with current experiences. With that in mind, here are some thoughts.

  • Is this a reasonable way to show a summary of one test?

- For a more complicated and longer test, add a summary score after the test results, e.g. "test result: 1 of 5 failed."
- To help implementers skim the results quickly, mark failures in a different color or in italics.

  • How should we ultimately show results for "read checkbox" along with results from "operate checkbox" and "read checkbox grouping"?

- The only structure I can think of for now is a tree structure. Under the root umbrella of a checkbox, have the three kinds of test results. The root page would hold the summary of the test results and could potentially record scoring (though I think scoring would be the next phase).

  • What does it mean for a test to pass?

- The current definition you've shared makes sense to me. I'll get back to this once I come up with other thoughts.

And this relates to the testing page, but since we don't have free-form text sections to capture other details, I was wondering how we could see the detail of why the tester marked something as a fail in the current result page.

@mfairchild365
Contributor

  1. Is this a reasonable way to show a summary of one test?

I think it is certainly a good start, and I agree with @Yohta89's comments. Additionally, it would be good to give the tables row and column headers.

  2. How should we ultimately show results for "read checkbox" along with results from "operate checkbox" and "read checkbox grouping"?

If I'm understanding this correctly, we should have a summary of support at the test suite level, where the user could dive into the summaries for each test. Side note, what are we calling a group of related tests? A test plan? A test suite?

  3. What does it mean for a test to pass?

I can't help but wonder if we should mark tests as 'partial' when some assertions pass and others fail. That could help stakeholders quickly determine 'oh, it looks like this one is completely failing but that one at least has some support'.

@isaacdurazo
Member

@spectranaut I was taking a look at the summary page and was wondering how we could simplify it.

I was thinking that, ultimately, what we want to get out of it is a picture of what's failing, so why display what is passing?

Showing only what's failing, in addition to the summary score that @Yohta89 is suggesting, could make parsing the summary page information easier.

What do you all think?

@mcking65
Contributor

What is passing is as important with these tests as what is failing.

@mcking65
Contributor

I have a hard time parsing this because of all the words, headings, tables ... it feels difficult to get a picture of what happened during the testing.

Here is my suggestion.

Put the entire report into a single table with the following columns:

  1. Task
  2. Must-Have Pass/Fail
  3. Should-Have Pass/Fail
  4. Nice-to-Have Pass/Fail
  5. Unexpected Behavior Count

At the bottom there are two summary rows:

  1. Totals: column 1 has the task count, e.g., 5 tasks. The pass/fail columns total all the passes and all the fails in the column. The unexpected column totals the number of unexpected behaviors.
  2. Percentages: Column 1 has "Percent supported" and the pass/fail columns have (passes/(passes+fails))*100

To calculate pass/fail counts, let's consider a command/assertion pair as a single expected behavior.
That is, down arrow conveying role is 1 expected behavior; down arrow announcing name is another.
If there was no output, or if the output was incorrect, that command/assertion pair is counted as a fail.
So, if there were 20 expected must-have behaviors and 18 passed, put 18/2 in that column.

For unexpected behaviors, such as excess verbosity, just count them for column 5.
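A minimal TypeScript sketch of that summary-row math, assuming each task row already carries per-priority pass/fail counts (all names below are illustrative, not a real report schema; only the must-have column is shown, the should-have and nice-to-have columns would follow the same pattern):

```ts
interface TaskRow {
  task: string;
  mustPass: number;
  mustFail: number;
  unexpected: number;
}

function percentSupported(pass: number, fail: number): number {
  // (passes / (passes + fails)) * 100, as proposed above.
  const total = pass + fail;
  return total === 0 ? 100 : (pass / total) * 100;
}

function summaryRows(rows: TaskRow[]) {
  const sum = (pick: (r: TaskRow) => number) =>
    rows.reduce((acc, r) => acc + pick(r), 0);
  const mustPass = sum((r) => r.mustPass);
  const mustFail = sum((r) => r.mustFail);
  return {
    taskCount: rows.length,               // e.g. "5 tasks"
    mustTotal: `${mustPass}/${mustFail}`, // e.g. "18/2"
    mustPercent: percentSupported(mustPass, mustFail),
    unexpectedTotal: sum((r) => r.unexpected),
  };
}
```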

The task column is a link that opens details for that task in a new tab. The title is "Results for task TASK_NAME"

When you get down to the task detail page, the command is typically going to be the primary element of concern, e.g., how well a specific command is supported. Screen reader bugs will typically be associated with either a command or an unexpected behavior.

This page would have a single table with columns:

  1. command
  2. Support level
  3. Details

Note: The command will serve as a row header for each row.

The support level column will have one of the following values:

  1. Full: All assertions had expected behavior and there were no unexpected behaviors. This value is only possible if there are should-have or nice-to-have assertions.
  2. All Required: All Must-Have assertions had expected behavior and there were no unexpected behaviors.
  3. Failing: There was some kind of failure or unexpected behavior.
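A small TypeScript sketch of how that support level could be derived, under the assumption that each command's results are already reduced to a few counts (the names are hypothetical):

```ts
type SupportLevel = "Full" | "All Required" | "Failing";

interface CommandSummary {
  mustFailures: number;           // failing must-have command/assertion pairs
  optionalFailures: number;       // failing should-have or nice-to-have pairs
  hasOptionalAssertions: boolean; // "Full" is only possible when these exist
  unexpectedBehaviors: number;
}

function supportLevel(c: CommandSummary): SupportLevel {
  if (c.mustFailures > 0 || c.unexpectedBehaviors > 0) return "Failing";
  if (c.optionalFailures === 0 && c.hasOptionalAssertions) return "Full";
  // All must-have assertions passed and nothing unexpected happened.
  return "All Required";
}
```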

Later, I'd like to refine the above to include a partial support option that is distinct from failing support, but we would first need to make some adjustments to the way we categorize unexpected behaviors. For instance, some excess speech does not introduce errors, while other excess speech could actually be incorrect and is worth noting as a worse kind of failure.

The details column shows the output and then lists assertions, grouped by pass/fail.

Output = ...

Passing assertions:

  • Must: The role 'checkbox' is conveyed
  • Must: The name 'Lettuce' is spoken
  • ...

Failing assertions:

  • Must: assertion that failed...
  • ...

Unexpected behaviors: none

@mfairchild365
Contributor

@mcking65 I like the direction this is going. Clarification: should "Must-Have Pass/Fail" be two columns? 1 for pass and 1 for fail? I'm struggling to visualize what cell data under that single column would look like, maybe "x/y" where x is the number of passing commands and y is the number of failing commands?

@mcking65
Contributor

@mfairchild365 commented:

@mcking65 I like the direction this is going. Clarification: should "Must-Have Pass/Fail" be two columns? 1 for pass and 1 for fail? I'm struggling to visualize what cell data under that single column would look like, maybe "x/y" where x is the number of passing commands and y is the number of failing commands?

Yes, 18/2 would mean 18 pass and 2 fail. This is a way to 1) give more space for column 1 and 2) make it easier to get more info quickly by reading down a single column. As a screen reader user, I can get a lot more info with fewer keystrokes this way. The number of passes and the number of fails are numbers the user will often want to consume simultaneously when skimming. The only disadvantage is if you are purely focused on the failures; in that case, you would have to listen to extra info. Given the nature of the data, that seems like a good trade-off to me.
