Showing percentage (not test/subtest) summary numbers #2825

Closed
foolip opened this issue Mar 25, 2022 · 9 comments
Assignees
Labels
interop Issues with the Interop dashboards

Comments

@foolip
Member

foolip commented Mar 25, 2022

@mfreed7 had this suggestion/request in email:

I wanted to quickly revive this part of the discussion on Interop2022 scoring. I wanted to point out that it would be so nice to have a view like this one on WPT.fyi, where the numbers in the table are the actual scores used for Interop2022. So for example, in this list, the css/ folder currently shows "412 / 476", meaning Chrome passes 412 sub-tests out of 476 sub-tests. It would be great if instead it showed "43.95 / 90", which is my best estimate of the actual Interop2022 weight for the css/ folder. There are 90 individual test files, of which we fail 46, and one in which we get 325/343 subtests right.

I ask because as we're looking for ways to improve our score, the above is what matters.

Do you think this change is feasible/easy? I was initially thinking that this could be a separate display "mode", but the more I think about it, the more this scoring system seems generally better at conveying the pass/fail state of things.
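The arithmetic behind the "43.95 / 90" estimate can be sketched as follows. This is only an illustration, not the actual Interop 2022 scoring code: the `folder_score` helper is hypothetical, and it assumes each test file contributes its fraction of passing subtests (at most 1) and that the 46 failing files each score 0.

```python
from fractions import Fraction


def folder_score(tests):
    """Sum per-test pass fractions, so each test file contributes at most 1.

    `tests` is a list of (subtest_passes, subtest_total) pairs.
    Hypothetical helper for illustration only.
    """
    return sum(Fraction(p, t) if t else Fraction(0) for p, t in tests)


# Assumed breakdown matching the numbers quoted above:
# 43 fully passing files, 46 files scored as 0, and one file at 325/343.
tests = [(1, 1)] * 43 + [(0, 1)] * 46 + [(325, 343)]
score = folder_score(tests)

print(f"{float(score):.2f} / {len(tests)}")  # prints "43.95 / 90"
```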

I 100% agree that this would be valuable, and had this in mind when writing the scoring code for Interop 2022. I believe the rules are simple enough to implement on the frontend. I didn't file an issue though...

This is very similar to #2290, where the request is to show test counts instead. Whether we end up with 2 or 3 different ways of summarizing scores, I'd like a configuration ⚙️ for that, as well as the ability to sort, tracked by #2289.

@KyleJu @past let's discuss this for Q2 planning, getting the request from @mfreed7 makes me even more certain it's important :)

@foolip foolip added the interop Issues with the Interop dashboards label Mar 25, 2022
@foolip
Member Author

foolip commented Mar 25, 2022

We will have to revisit #62 when deciding on how to compute a percentage for testharness.js tests. Our options will be:

  1. Ignore harness status entirely, like Interop 2022
  2. Score as 0 if there's a harness error
  3. Count harness status and subtests, scoring as (subtestPasses + (harnessError ? 0 : 1)) / (subtestTotal + 1)

Only option 3 can be computed from the summary numbers that the frontend currently uses by default, but even though it's simple, I think it's a very bad approach. It would score many failing testharness.js tests with a single failing subtest as 50%, just because the harness status is OK.

Between options 1 and 2, I think we should try both and see where the difference between the two methods is the biggest. I'm partial to "Score as 0 if there's a harness error" since it never hides a harness error, but in particular for harness timeouts it can be a bit harsh.
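To make the trade-offs concrete, here is a minimal sketch of the three options as standalone functions. The function names and the `(passes, total, harness_ok)` signature are assumptions made for illustration; none of this is taken from the wpt.fyi or results-analysis code.

```python
def score_ignore_harness(passes, total, harness_ok):
    """Option 1: ignore harness status, score subtests only."""
    return passes / total if total else 0.0


def score_zero_on_harness_error(passes, total, harness_ok):
    """Option 2: any harness error scores the whole test as 0."""
    if not harness_ok:
        return 0.0
    return passes / total if total else 0.0


def score_count_harness(passes, total, harness_ok):
    """Option 3: treat the harness status as one extra subtest."""
    return (passes + (1 if harness_ok else 0)) / (total + 1)


# A test with a single failing subtest and an OK harness status.
single_failure = (0, 1, True)
# A test with three passing subtests followed by a harness error.
late_error = (3, 3, False)

for name, fn in [
    ("option 1", score_ignore_harness),
    ("option 2", score_zero_on_harness_error),
    ("option 3", score_count_harness),
]:
    print(name, fn(*single_failure), fn(*late_error))
# option 1 0.0 1.0
# option 2 0.0 0.0
# option 3 0.5 0.75
```

Option 3 gives the single-failure test 50%, and option 1 gives the late harness error 100%, which are the two weaknesses discussed in this thread.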

@mfreed7

mfreed7 commented Mar 25, 2022

@foolip Thanks for opening this issue! I'm excited that it might be a possibility.

@DanielRyanSmith
Contributor

DanielRyanSmith commented Apr 21, 2022

I'm looking into this, and I'm curious if it's feasible to use this logic:

  1. If a test has no subtest results and only a harness status, count the harness status as the pass/fail of the test.
  2. If any subtests exist and are run, do not count the harness status toward the pass %.

So that a test like this would count the harness status toward the overall pass/fail of the test, and a test with subtest results would NOT take the harness status into account.
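The two rules above could look roughly like this. It is a sketch under assumptions: `proposed_score` is a hypothetical name, the harness status is reduced to a boolean, and subtest results are modeled as a list of booleans.

```python
def proposed_score(harness_ok, subtests):
    """Score one test under the two rules proposed above.

    `subtests` is a list of booleans (True means the subtest passed).
    Hypothetical helper for illustration only.
    """
    if not subtests:
        # Rule 1: no subtest results, so the harness status is the result.
        return 1.0 if harness_ok else 0.0
    # Rule 2: subtests exist, so the harness status is not counted.
    return sum(subtests) / len(subtests)


print(proposed_score(False, []))                 # 0.0
print(proposed_score(True, [True, True, False, False]))  # 0.5
print(proposed_score(False, [True, True]))       # 1.0, harness error not counted
```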

@foolip
Member Author

foolip commented Apr 22, 2022

If a test has no subtest results and only a harness status, count the harness status as the pass/fail of the test.

This sounds good, but it should be impossible for the harness status to be OK in this situation, so what it actually means is "if there are no subtests, score as 0%", which I think is right.

If any subtests exist and are run, do not count the harness status toward the pass %.

Yeah, this is one of the options. The downside is that it can hide a harness error or timeout, because there can be a number of passing subtests followed by a harness error. Scoring that as 100% hides the problem. Similarly, but less seriously, the harness status can be TIMEOUT with some subtests not having finished running, so you might get 9/10 passing even though there should be 20 subtests in total.

The rule I would try first is that if the harness status is not OK, score as 0%. It's actually an inconclusive result and no percentage is "correct", so NaN% would be the honest score locally, but one has to pick some number to aggregate scores.
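The point about NaN and aggregation can be illustrated with a short sketch. The `strict_score` helper and the status strings "OK", "ERROR" and "TIMEOUT" are assumptions for this example, and the three results are made-up data.

```python
import math


def strict_score(harness_status, passes, total):
    """Score as 0 unless the harness status is OK and subtests exist.

    Hypothetical helper; status strings are assumed for illustration.
    """
    if harness_status != "OK" or total == 0:
        return 0.0
    return passes / total


# Made-up results: (harness status, subtest passes, subtest total).
results = [("OK", 10, 10), ("ERROR", 5, 5), ("TIMEOUT", 9, 10)]

# Picking 0 for inconclusive results keeps the aggregate well defined.
scores = [strict_score(*r) for r in results]
mean = sum(scores) / len(scores)

# Using NaN for inconclusive results makes the whole aggregate NaN.
nan_scores = [s if r[0] == "OK" else math.nan for r, s in zip(results, scores)]
nan_mean = sum(nan_scores) / len(nan_scores)

print(scores)                 # [1.0, 0.0, 0.0]
print(math.isnan(nan_mean))   # True
```

A single NaN propagates through the sum, which is why some concrete number has to be chosen before scores are aggregated across tests.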

@foolip
Member Author

foolip commented Apr 22, 2022

@DanielRyanSmith if you haven't seen it, the scoring for Interop 2022 is highly relevant here:

https://github.com/web-platform-tests/results-analysis/blob/92b8b8a7237f5c5cc878e647ff61429f74576e6a/interop-2022/main.js#L202-L220

You'll notice the harness status is ignored, so it's not doing what I'm proposing we do on wpt.fyi. I think ultimately it's going to be a source of confusion if the percentage scoring computations for wpt.fyi and Interop 2022 don't match, so we should try to come up with something that'll work in both contexts.

@davidsgrogan
Member

davidsgrogan commented May 20, 2022

I'm also very interested in this feature. Awesome that it's being worked on.

Question: if the tests on the screen weren't included in interop-2022, what would the new score UI show?

It would also be great if the Interop 2022 score could show up in diff runs like this one:

https://wpt.fyi/results/?diff&filter=ADC&q=css%2Fcss-grid%2Fabspos%2Fgrid-abspos-staticpos-align-self-vertwm-last-baseline-003.html%20or%20css%2Fcss-grid%2Fabspos%2Fgrid-abspos-staticpos-align-self-vertwm-last-baseline-004.html%20or%20css%2Fcss-grid%2Falignment%2Fgrid-content-alignment-with-abspos-001.html%20or%20css%2Fcss-grid%2Flayout-algorithm%2Fflex-tracks-with-fractional-size.html%20or%20css%2Fcss-grid%2Fparsing%2Fgrid-area-computed.html%20or%20css%2Fcss-grid%2Fparsing%2Fgrid-shorthand.html%20or%20css%2Fcss-grid%2Fparsing%2Fgrid-shorthand-valid.html%20or%20css%2Fcss-ui%2Fappearance-button-001.html%20or%20css%2Fcss-ui%2Fappearance-checkbox-001.html%20or%20css%2Fcss-ui%2Fappearance-cssom-001.html%20or%20css%2Fcss-ui%2Fappearance-listbox-001.html%20or%20css%2Fcss-ui%2Fappearance-menulist-001.html%20or%20css%2Fcss-ui%2Fappearance-menulist-button-001.html%20or%20css%2Fcss-ui%2Fappearance-meter-001.html%20or%20css%2Fcss-ui%2Fappearance-progress-bar-001.html%20or%20css%2Fcss-ui%2Fappearance-push-button-001.html%20or%20css%2Fcss-ui%2Fappearance-radio-001.html%20or%20css%2Fcss-ui%2Fappearance-searchfield-001.html%20or%20css%2Fcss-ui%2Fappearance-slider-horizontal-001.html%20or%20css%2Fcss-ui%2Fappearance-square-button-001.html%20or%20css%2Fcss-ui%2Fappearance-textarea-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-button-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-checkbox-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-listbox-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-menulist-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-menulist-button-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-meter-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-progress-bar-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-push-button-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-radio-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-searchfield-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-slider-horizontal-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-square-button-001.html%20or%20css%2Fcss-ui%2Fwebkit-appearance-textarea-001.html%20or%20html%2Fdom%2Fidlharness.https.html%20or%20html%2Frendering%2Fnon-replaced-elements%2Fform-controls%2Finput-placeholder-line-height.html%20or%20html%2Frendering%2Fnon-replaced-elements%2Fform-controls%2Fresets.html%20or%20html%2Frendering%2Fnon-replaced-elements%2Ftables%2Ftable-ua-stylesheet.html%20or%20html%2Frendering%2Fwidgets%2Fappearance%2Fdefault-styles.html%20or%20html%2Fsemantics%2Fforms%2Fbeforeinput.tentative.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-checkvalidity.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-reportvalidity.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-validity-badinput.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-validity-rangeoverflow.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-validity-rangeunderflow.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-validity-stepmismatch.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-validity-valid.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-validity-valuemissing.html%20or%20html%2Fsemantics%2Fforms%2Fconstraints%2Fform-validation-willvalidate.html%20or%20html%2Fsemantics%2Fforms%2Fform-submission-0%2Fconstructing-form-data-set.html%20or%20html%2Fsemantics%2Fforms%2Fform-submission-0%2Fgetactionurl.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fdefaultselection.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselect-event.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselection.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselection-not-application.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselection-not-application-textarea.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselection-start-end-extra.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselection-start-end.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Fselection-value-interactions.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Ftextarea-selection-while-parsing.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Ftextfieldselection-setrangetext.html%20or%20html%2Fsemantics%2Fforms%2Ftextfieldselection%2Ftextfieldselection-setselectionrange.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fanchor-active-contenteditable.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fcloning-steps.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fdatetime-local.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fdatetime-local-trailing-zeros.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fimage-click-form-data.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Finput-stepdown.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Finput-stepup.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Finput-untrusted-key-event.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Finput-whitespace.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fnumber.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Frange.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fselection.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fshow-picker-cross-origin-iframe.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fshow-picker-disabled-readonly.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fshow-picker.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Fshow-picker-user-gesture.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Ftime.html%20or%20html%2Fsemantics%2Fforms%2Fthe-input-element%2Ftype-change-state.html%20or%20html%2Fsemantics%2Fforms%2Fthe-selectmenu-element%2Fselectmenu-keyboard.tentative.html%20or%20html%2Fsemantics%2Fforms%2Fthe-selectmenu-element%2Fselectmenu-many-options.tentative.html%20or%20html%2Fsemantics%2Fforms%2Fthe-selectmenu-element%2Fselectmenu-parts-structure.tentative.html&run_id=5635446534045696&run_id=5728037285920768

@foolip
Member Author

foolip commented May 24, 2022

@mfreed7 @davidsgrogan, there's an RFC up for this now at web-platform-tests/rfcs#114, with a demo, please take a look! And thanks @DanielRyanSmith for working on it, I'm very happy to see it take shape!

@foolip
Member Author

foolip commented Aug 1, 2022

This was resolved by web-platform-tests/rfcs#114 but isn't deployed to production yet.

@DanielRyanSmith
Contributor

Now merged to production 🙂
