How to summarize and display test results? #19
@mcking65, @jfhector, @mfairchild365, and @Yohta89, can you take a look at this issue? :)
I liked the overall structure of the result page! I'm assuming the primary purpose of the result page is to help stakeholders understand the deficiencies of each screen reader based on the results, i.e., to get them on the same page for a conversation about what's wrong with current experiences. With that in mind, here are some thoughts.
- In the case of a more complicated, longer test, add a summary score after the test results, e.g., "1 of 5 test results failed."
- The only structure I can think of for now is a tree structure: under the root umbrella of a checkbox, there would be three sorts of test results. The root page would hold the summary of the test results and could potentially record scoring (though I think scoring would be the next phase).
- The current definition you've shared makes sense to me. I'll get back to this once I come up with other thoughts. This relates to the testing page, but since we don't have free-form text sections to capture other detail, I was wondering how we could see why the tester marked something as a fail on the current result page.
I think it is certainly a good start, and I agree with @Yohta89's comments. Additionally, it would be good to give the tables row and column headers.
If I'm understanding this correctly, we should have a summary of support at the test suite level, where the user could dive into the summaries for each test. Side note, what are we calling a group of related tests? A test plan? A test suite?
I can't help but wonder if we should mark tests as 'partial' when some assertions pass and others fail. That could help stakeholders quickly determine 'oh, it looks like this one is completely failing but that one at least has some support'.
@spectranaut I was taking a look at the summary page and was wondering how we could simplify it. I was thinking that, ultimately, what we want to get out of it is a picture of what's failing, so why display what is passing? Showing only what's failing, in addition to the summary score that @Yohta89 is suggesting, could make the summary page easier to parse. What do you all think?
What is passing is just as important with these tests as what is failing.
I have a hard time parsing this because of all the words, headings, and tables; it feels difficult to get a picture of what happened during the testing. Here is my suggestion: put the entire report into a single table with the following columns:
At the bottom there are two summary rows:
To calculate pass/fail counts, let's consider a command/assertion pair as a single expected behavior. For unexpected behaviors, such as excess verbosity, just count them for column 5. The task column is a link that opens details for that task in a new tab; the title is "Results for task TASK_NAME". When you get down to the task detail page, the command is typically going to be the primary element of concern, e.g., how well a specific command is supported. Screen reader bugs will typically be associated with either a command or an unexpected behavior. This page would have a single table with columns:
Note: The command will serve as a row header for each row. The support level column will have one of the following values:
Later, I'd like to refine the above to include a partial support option that is distinct from failing support, but we would first need to make some adjustments to the way we categorize unexpected behaviors. For instance, some excess speech does not introduce errors, while other excess speech could actually be incorrect and is worth noting as a worse kind of failure. The details column shows the output and then lists assertions, grouped according to pass/fail.
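As a rough illustration of this counting rule, here is a minimal sketch of how the summary-row numbers could be computed. The types and field names are hypothetical, not taken from the actual prototype:

```typescript
// Hypothetical shape for what a tester records against a single command.
interface CommandResult {
  command: string;                                  // e.g. "TAB"
  assertions: { text: string; passed: boolean }[];  // command/assertion pairs
  unexpectedBehaviors: string[];                    // e.g. ["excess verbosity"]
}

// Each command/assertion pair counts as one expected behavior;
// unexpected behaviors are tallied separately for their own column.
function summarizeTask(results: CommandResult[]) {
  let pass = 0;
  let fail = 0;
  let unexpected = 0;
  for (const result of results) {
    for (const assertion of result.assertions) {
      if (assertion.passed) {
        pass++;
      } else {
        fail++;
      }
    }
    unexpected += result.unexpectedBehaviors.length;
  }
  return { pass, fail, unexpected };
}
```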
@mcking65 I like the direction this is going. Clarification: should "Must-Have Pass/Fail" be two columns? 1 for pass and 1 for fail? I'm struggling to visualize what cell data under that single column would look like, maybe "x/y" where x is the number of passing commands and y is the number of failing commands?
@mfairchild365 commented:
Yes, 18/2 would mean 18 pass and 2 fail. This is a way to 1) give more space for column 1 and 2) make it easier to get more info quickly by reading down a single column. As a screen reader user, I can get a lot more info with fewer keystrokes this way. The number of passes and the number of fails are numbers the user will often want to consume simultaneously when skimming. The only disadvantage is if you are purely focused on the failures; in that case, you would have to listen to extra info. Given the nature of the data, that seems like a good trade-off to me.
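For what it's worth, producing that combined cell is trivial; a purely illustrative formatter (hypothetical name) might look like:

```typescript
// Render must-have pass/fail counts as one "pass/fail" cell, e.g. "18/2".
function formatMustHaveCell(pass: number, fail: number): string {
  return `${pass}/${fail}`;
}
```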
Hey all! This is a somewhat urgent issue because I need to get something in by Wednesday the 27th, so I'm hoping to spend all of Tuesday the 26th, at the latest, implementing a summary page. Each test has a lot of information recorded in it, and I am not sure which information is the most relevant to show. I'm hoping to get feedback in this issue simply to make a first draft of a test result summary.
Here is an example HTML page with the summary after you complete the "read-checkbox.html" test, which I used because it is the longest test:
read checkbox results
That is the result for just one test. What we need to discuss here is:
Current implementation of algorithm for test passing or failing:
Therefore, any failing assertion or undesirable behavior after any key command will result in the test failing. We could have a different state for a test result if all assertions pass but some additional undesirable behavior occurs.
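To make that rule concrete, here is a small self-contained sketch; all of these names are hypothetical and none come from the real implementation:

```typescript
// Hypothetical shape for what a tester records against a single command.
interface CommandResult {
  command: string;
  assertions: { text: string; passed: boolean }[];
  unexpectedBehaviors: string[];
}

type TestOutcome = "pass" | "fail";

// Current rule: any failing assertion, or any undesirable behavior after
// any key command, fails the whole test.
function overallOutcome(results: CommandResult[]): TestOutcome {
  const anyAssertionFailed = results.some((result) =>
    result.assertions.some((assertion) => !assertion.passed)
  );
  const anyUndesirableBehavior = results.some(
    (result) => result.unexpectedBehaviors.length > 0
  );
  // A possible refinement: when no assertion failed but undesirable behavior
  // was recorded, report a separate state instead of "fail".
  if (anyAssertionFailed || anyUndesirableBehavior) {
    return "fail";
  }
  return "pass";
}
```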
What will cause an assertion to pass or fail?
An assertion will fail if any screen reader command results in incorrect information (in this case, the tester will have marked "incorrect output"). In the checkbox case, if the accessible name, role, or state is actually wrong for any tested key command, then the assertion fails.
Additionally, an assertion will fail if the test author includes an additional assertion that is not related to the output of the screen reader and that assertion fails. So far we only have one example of this kind of assertion: the assertion about JAWS and NVDA changing modes when you use TAB to read a checkbox.
An assertion will be considered to pass if there is only "missing" information (for example, if a tester marks "no output" because the role "checkbox" is not announced).
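A hedged sketch of how those rules might translate to code, assuming the tester's per-command judgement is recorded as one of a few string values; the value names below are guesses based on the wording above, not the prototype's actual data model:

```typescript
// Hypothetical per-command judgements a tester might record for an assertion.
type OutputJudgement = "correct output" | "no output" | "incorrect output";

// An assertion fails only if some command produced incorrect information;
// "no output" (missing information) still counts as a pass. Non-output
// assertions added by a test author (e.g. mode changes) would be checked
// separately.
function assertionPasses(judgementsPerCommand: OutputJudgement[]): boolean {
  return judgementsPerCommand.every(
    (judgement) => judgement !== "incorrect output"
  );
}
```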
Not all assertions are equal
The test design so far does not have a way to record the necessity of an assertion (for example, "mandatory" or "nice to have"). This will take some thinking to fit into the test design (the tests are already quite complicated), so I do not think the ability to mark some assertions as necessary for passing and others as optional will make it into the prototype for this phase of the project.