Showing percentage (not test/subtest) summary numbers #2825

foolip · 2022-03-25T09:46:42Z

@mfreed7 had this suggestion/request in email:

> I wanted to quickly revive this part of the discussion on Interop2022 scoring. I wanted to point out that it would be so nice to have a view like this one on WPT.fyi, where the numbers in the table are the actual scores used for Interop2022. So for example, in this list, the css/ folder currently shows "412 / 476", meaning Chrome passes 412 sub-tests out of 476 sub-tests. It would be great if instead it showed "43.95 / 90", which is my best estimate of the actual Interop2022 weight for the css/ folder. There are 90 individual test files, of which we fail 46, and one in which we get 325/343 subtests right.
>
> I ask because as we're looking for ways to improve our score, the above is what matters.
>
> Do you think this change is feasible/easy? I was initially thinking that this could be a separate display "mode", but the more I think about it, the more this scoring system seems generally better at conveying the pass/fail state of things.

I 100% agree that this would be valuable, and had this in mind when writing the scoring code for Interop 2022. I believe the rules are simple enough to implement on the frontend. I didn't file an issue though...
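The arithmetic behind the "43.95 / 90" estimate can be sketched as below. This is only an illustration, assuming each test file contributes the fraction of its subtests that pass and that the 46 failing files score zero; the names `file_score` and `folder_score` and the exact breakdown are hypothetical, not the actual Interop 2022 code.

```python
def file_score(passes: int, total: int) -> float:
    """Fraction of subtests passed by one test file (0.0 if it has none)."""
    return passes / total if total else 0.0


def folder_score(results: list[tuple[int, int]]) -> float:
    """Sum of per-file fractions, so every file weighs the same."""
    return sum(file_score(p, t) for p, t in results)


# Assumed breakdown of the 90 files from the email:
# 43 fully passing, 46 scoring zero, one passing 325 of 343 subtests.
results = [(1, 1)] * 43 + [(0, 1)] * 46 + [(325, 343)]
score = folder_score(results)
print(f"{score:.2f} / {len(results)}")  # 43.95 / 90
```

Under this scheme a file with hundreds of subtests counts no more than a file with one, which is why the number differs so much from the raw "412 / 476" subtest count.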

This is very similar to #2290, where the request is to show test counts instead. Whether we end up with 2 or 3 different ways of summarizing scores, I'd like a configuration ⚙️ for that, as well as the ability to sort, tracked by #2289.

@KyleJu @past let's discuss this for Q2 planning. Getting the request from @mfreed7 makes me even more certain it's important :)


foolip · 2022-03-25T10:04:11Z

We will have to revisit #62 when deciding on how to compute a percentage for testharness.js tests. Our options will be:

1. Ignore harness status entirely, like Interop 2022
2. Score as 0 if there's a harness error
3. Count harness status and subtests, scoring as (subtestPasses + (harnessError ? 0 : 1)) / (subtestTotal + 1)

Only option 3 is possible to compute from the summary numbers that the frontend currently uses by default, but although it's simple, I think it's a very bad approach. It would score many failing testharness.js tests with a single failing subtest as 50%, just because the harness status is OK.
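To make the comparison concrete, here is a rough sketch of the three options listed above, in the order they are given. The function names are made up for illustration, and treating a test with zero subtests as 0.0 is an assumption rather than something the options specify.

```python
def subtest_fraction(passes: int, total: int) -> float:
    # Assumption: a test with no subtests scores 0.0.
    return passes / total if total else 0.0


def ignore_harness(passes: int, total: int, harness_ok: bool) -> float:
    """Ignore harness status entirely."""
    return subtest_fraction(passes, total)


def zero_on_harness_error(passes: int, total: int, harness_ok: bool) -> float:
    """Score as 0 if the harness status is not OK."""
    return subtest_fraction(passes, total) if harness_ok else 0.0


def count_harness(passes: int, total: int, harness_ok: bool) -> float:
    """Count the harness status as one extra subtest."""
    return (passes + (1 if harness_ok else 0)) / (total + 1)


single_failure = (0, 1, True)   # one failing subtest, harness OK
late_error = (9, 10, False)     # 9 of 10 subtests pass, then a harness error

for fn in (ignore_harness, zero_on_harness_error, count_harness):
    print(fn.__name__, fn(*single_failure), round(fn(*late_error), 3))
```

The `single_failure` case is the one described above: counting the harness status gives 0.5, while the other two give 0.0.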

Between options 1 and 2, I think we should try both and see where the difference between the two methods is the biggest. I'm partial to "Score as 0 if there's a harness error" since it never hides a harness error, but for harness timeouts in particular it can be a bit harsh.

mfreed7 · 2022-03-25T23:03:24Z

@foolip Thanks for opening this issue! I'm excited that it might be a possibility.

DanielRyanSmith · 2022-04-21T20:38:57Z

I'm looking into this, and I'm curious if it's feasible to use this logic:

- If a test has no subtest results and only a harness status, count the harness status as the pass/fail of the test.
- If any subtests exist and are run, do not count the harness status toward the pass %.

That way, a test like this would count the harness status toward the overall pass/fail of the test, and a test with subtest results would NOT take the harness status into account.
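A minimal sketch of the logic proposed in this comment, assuming harness statuses are plain strings such as "OK", "ERROR" and "TIMEOUT"; the function name `score_with_fallback` is hypothetical and this is not the wpt.fyi implementation.

```python
def score_with_fallback(passes: int, total: int, harness_status: str) -> float:
    """Use the harness status only when there are no subtest results."""
    if total == 0:
        # No subtests: the harness status decides pass/fail for the whole test.
        return 1.0 if harness_status == "OK" else 0.0
    # Subtests exist: the harness status is not counted at all.
    return passes / total


print(score_with_fallback(0, 0, "ERROR"))     # 0.0
print(score_with_fallback(9, 10, "TIMEOUT"))  # 0.9
```

Note that the second call still reports 0.9 even though the harness timed out, which is the trade-off this rule accepts.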

foolip · 2022-04-22T09:32:42Z

> If a test has no subtest results and only a harness status, count the harness status as the pass/fail of the test.

This sounds good, but it should be impossible for the harness status to be OK in this situation, so what it actually means is "if there are no subtests, score as 0%", which I think is right.

> If any subtests exist and are run, do not count the harness status toward the pass %.

Yeah, this is one of the options. The downside is that it can hide a harness error or timeout, because it is possible for there to be a number of passing subtests and then a harness error. Scoring that as 100% hides the problem. A similar but less serious problem is that the harness status can be TIMEOUT while some tests have not finished running yet, so you might get 9/10 passing even though there should be 20 tests in total.

The rule I would try first is that if the harness status is not OK, score as 0%. It's actually an inconclusive result and no percentage is "correct", so NaN% would be the honest score locally, but one has to pick some number to aggregate scores.
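The point about needing a number in order to aggregate can be illustrated with a small sketch. It assumes a directory score is a plain mean of per-test scores and that harness statuses are strings like "OK" and "TIMEOUT"; `strict_score` and its `inconclusive` parameter are hypothetical names.

```python
import math


def strict_score(passes: int, total: int, harness_status: str,
                 inconclusive: float = 0.0) -> float:
    """Return `inconclusive` unless the harness status is OK and subtests exist."""
    if harness_status != "OK" or total == 0:
        return inconclusive
    return passes / total


tests = [(10, 10, "OK"), (9, 10, "TIMEOUT"), (5, 10, "OK")]

with_zero = [strict_score(*t) for t in tests]
with_nan = [strict_score(*t, inconclusive=math.nan) for t in tests]

mean_zero = sum(with_zero) / len(with_zero)
mean_nan = sum(with_nan) / len(with_nan)

print(mean_zero)  # 0.5
print(mean_nan)   # nan
```

A single NaN makes the whole mean NaN, so the "honest" local score cannot be carried up to a directory total without substituting some number for it.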

foolip · 2022-04-22T11:22:10Z

@DanielRyanSmith if you haven't seen it, the scoring for Interop 2022 is highly relevant here:

https://github.com/web-platform-tests/results-analysis/blob/92b8b8a7237f5c5cc878e647ff61429f74576e6a/interop-2022/main.js#L202-L220

You'll notice the harness status is ignored, so it's not doing what I'm proposing we do on wpt.fyi. I think ultimately it's going to be a source of confusion if the percentage scoring computations for wpt.fyi and Interop 2022 don't match, so we should try to come up with something that'll work in both contexts.

davidsgrogan · 2022-05-20T23:23:43Z

I'm also very interested in this feature. Awesome that it's being worked on.

Question: if the tests on the screen weren't included in interop-2022, what would the new score UI show?

Would also be great if the interop2022 score could show up in diff runs like this one

foolip · 2022-05-24T15:20:25Z

@mfreed7 @davidsgrogan, there's an RFC up for this now at web-platform-tests/rfcs#114, with a demo, please take a look! And thanks @DanielRyanSmith for working on it, I'm very happy to see it take shape!

foolip · 2022-08-01T09:40:00Z

This was resolved by web-platform-tests/rfcs#114 but isn't deployed to production yet.

DanielRyanSmith · 2022-08-02T20:54:01Z

Now merged to production 🙂

foolip added the "interop" label (Issues with the Interop dashboards) Mar 25, 2022

KyleJu mentioned this issue Apr 14, 2022

Display raw failure number #2793

Open

DanielRyanSmith self-assigned this Apr 25, 2022

KyleJu mentioned this issue May 6, 2022

Display test totals information on subtest and directory views #2841

Merged

This was referenced May 16, 2022

Remove harness status aggregation and display percentages on Interop-202X scores #2858

Merged

Use percentages when displaying results #2861

Closed

foolip mentioned this issue May 30, 2022

RFC 114: Change results aggregation on wpt.fyi and visualization on Interop-20** views web-platform-tests/rfcs#114

Merged

davidsgrogan mentioned this issue May 31, 2022

Provide option to show Interop2022 score changes in diff view #2868

Closed

DanielRyanSmith closed this as completed Aug 2, 2022
