Showing percentage (not test/subtest) summary numbers #2825
We will have to revisit #62 when deciding on how to compute a percentage for testharness.js tests. Our options will be:
Only option 3 can be computed from the summary numbers that the frontend currently uses by default, but although it's simple, I think it's a very bad approach. It would score many failing testharness.js tests that have a single failing subtest as 50%, just because the harness status is OK. Between options 1 and 2, I think we should try both and see where the two methods differ the most. I'm partial to "Score as 0 if there's a harness error" since it never hides a harness error, but for harness timeouts in particular it can be a bit harsh.
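To make the 50% problem concrete, here is a minimal sketch of option 3, assuming it means counting the harness status as one more result alongside the subtests. The function name and the plain-string statuses ("OK", "PASS", "FAIL", ...) are hypothetical simplifications for illustration, not wpt.fyi's actual code or data model.

```python
from __future__ import annotations


def score_harness_as_subtest(harness_status: str, subtest_statuses: list[str]) -> float:
    """Hypothetical option 3: the harness status counts as one extra result."""
    # One entry for the harness (OK counts as passing), then one per subtest.
    results = [harness_status == "OK"] + [s == "PASS" for s in subtest_statuses]
    return sum(results) / len(results)


# A test whose only subtest fails still scores 50%, because the harness is OK.
print(score_harness_as_subtest("OK", ["FAIL"]))  # 0.5
```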
@foolip Thanks for opening this issue! I'm excited that it might be a possibility.
I'm looking into this, and I'm curious if it's feasible to use this logic:
So that a test like this would count the harness status toward the overall pass/fail of the test, and a test with subtest results would NOT take the harness status into account.
This sounds good, but it should be impossible for the harness status to be OK in this situation, so what it actually means is "if there are no subtests, score as 0%", which I think is right.
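A minimal sketch of the logic discussed above, under stated assumptions: subtests alone decide the score when there are any, and a test without subtests falls back to its top-level status. Treating "PASS" as the top-level status of tests without a harness (such as reftests) is an assumption, as is the hypothetical function name; for testharness.js tests with no subtests the harness status is never OK, so in practice they score 0%.

```python
from __future__ import annotations


def score_subtests_only(harness_status: str, subtest_statuses: list[str]) -> float:
    """Hypothetical: ignore the harness status whenever subtests exist."""
    if not subtest_statuses:
        # No subtests: the top-level status decides. A testharness.js test
        # with zero subtests should never be OK, so this is 0.0 in practice.
        return 1.0 if harness_status in ("OK", "PASS") else 0.0
    passing = sum(1 for s in subtest_statuses if s == "PASS")
    return passing / len(subtest_statuses)


# The downside: a harness error after 5 passing subtests still scores 100%.
print(score_subtests_only("ERROR", ["PASS"] * 5))  # 1.0
```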
Yeah, this is one of the options. The downside is that it can hide a harness error or timeout, because there can be a number of passing subtests followed by a harness error. Scoring that as 100% hides the problem. Similarly, but less seriously, the harness status can be TIMEOUT with some subtests not yet finished, so you might get 9/10 passing even though there should be 20 subtests in total. The rule I would try first is that if the harness status is not OK, the test scores 0%. It's actually an inconclusive result and no percentage is "correct", so NaN% would be the honest score for that test on its own, but one has to pick some number to aggregate scores.
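The proposed rule can be sketched as follows, with per-test scores then combined into one number. Using a plain mean over tests for the aggregation is an assumption made for illustration, and the function names are hypothetical; 0.0 stands in for the inconclusive NaN so that the mean stays defined.

```python
from __future__ import annotations

from statistics import mean


def score_zero_on_harness_error(harness_status: str, subtest_statuses: list[str]) -> float:
    """Hypothetical rule: a testharness.js test scores 0% unless the harness is OK."""
    # ERROR, TIMEOUT, etc. are inconclusive; 0.0 replaces NaN so that
    # per-test scores can still be aggregated.
    if harness_status != "OK" or not subtest_statuses:
        return 0.0
    passing = sum(1 for s in subtest_statuses if s == "PASS")
    return passing / len(subtest_statuses)


def aggregate(tests: list[tuple[str, list[str]]]) -> float:
    """Combine per-test scores with a plain mean (an assumed aggregation)."""
    return mean(score_zero_on_harness_error(h, subs) for h, subs in tests)


tests = [
    ("OK", ["PASS", "PASS"]),   # 100%
    ("TIMEOUT", ["PASS"] * 9),  # 0%, even though 9 subtests passed
    ("OK", ["PASS", "FAIL"]),   # 50%
]
print(aggregate(tests))  # 0.5
```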
@DanielRyanSmith if you haven't seen it, the scoring for Interop 2022 is highly relevant here: You'll notice the harness status is ignored, so it's not doing what I'm proposing we do on wpt.fyi. I think it's ultimately going to be a source of confusion if the percentage scoring computations for wpt.fyi and Interop 2022 don't match, so we should try to come up with something that works in both contexts.
@mfreed7 @davidsgrogan, there's now an RFC up for this at web-platform-tests/rfcs#114, with a demo. Please take a look! And thanks @DanielRyanSmith for working on it; I'm very happy to see it take shape!
This was resolved by web-platform-tests/rfcs#114 but isn't deployed to production yet.
Now merged to production 🙂
@mfreed7 had this suggestion/request in email:
I 100% agree that this would be valuable, and had this in mind when writing the scoring code for Interop 2022. I believe the rules are simple enough to implement on the frontend. I didn't file an issue though...
This is very similar to #2290, where the request is to show test counts instead. Whether we end up with 2 or 3 different ways of summarizing scores, I'd like a configuration ⚙️ setting for that, as well as the ability to sort, which is tracked by #2289.
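To show why the choice of summary matters, here is a minimal sketch of three possible summary modes computed over the same results. The `SummaryMode` setting and `summarize` function are hypothetical, not an existing wpt.fyi option or API; the harness status is omitted for brevity, and treating a test as passing only when every subtest passes is an assumption.

```python
from __future__ import annotations

from enum import Enum
from statistics import mean


class SummaryMode(Enum):  # hypothetical setting, not an actual wpt.fyi option
    SUBTEST_COUNTS = "subtests"
    TEST_COUNTS = "tests"
    PERCENTAGE = "percentage"


def summarize(results: dict[str, list[str]], mode: SummaryMode) -> str:
    """Summarize {test path: subtest statuses} according to `mode`."""
    if mode is SummaryMode.SUBTEST_COUNTS:
        passing = sum(s == "PASS" for subs in results.values() for s in subs)
        total = sum(len(subs) for subs in results.values())
        return f"{passing} / {total}"
    if mode is SummaryMode.TEST_COUNTS:
        # Assumption: a test passes only if all of its subtests pass.
        passing = sum(all(s == "PASS" for s in subs) for subs in results.values())
        return f"{passing} / {len(results)}"
    # Percentage: mean of per-test pass fractions (no subtests scores 0).
    scores = [
        sum(s == "PASS" for s in subs) / len(subs) if subs else 0.0
        for subs in results.values()
    ]
    return f"{mean(scores):.0%}"


results = {"a.html": ["PASS"] * 100, "b.html": ["FAIL"]}
for mode in SummaryMode:
    print(mode.value, summarize(results, mode))
# subtests 100 / 101
# tests 1 / 2
# percentage 50%
```

One large, fully passing test dominates the subtest counts, while the percentage view weights each test equally.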
@KyleJu @past let's discuss this for Q2 planning, getting the request from @mfreed7 makes me even more certain it's important :)