Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show diff passes/regression numbers separately #609

Merged
merged 15 commits into from Oct 5, 2018

Conversation

Projects
None yet
4 participants
@lukebjerring
Copy link
Collaborator

commented Oct 1, 2018

Description

See #411

Separates the delta number into 2 numbers (pass increases, pass decreases), such that a new pass and a new fail don't cancel each other out.

Review Information

Visit a diff view, e.g. https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/?label=taskcluster&product=chrome[stable]&product=chrome[beta]&diff=true

@lukebjerring lukebjerring requested a review from foolip Oct 1, 2018

@wpt-pr-bot

This comment has been minimized.

Copy link

commented Oct 1, 2018

Staging instance deployed by Travis CI!
Running at https://diff-3-numbers-dot-wptdashboard-staging.appspot.com

@foolip

foolip approved these changes Oct 2, 2018

Copy link
Contributor

left a comment

Some nits, but very very nice, thank you!

@@ -104,22 +104,28 @@ func GetResultsDiff(before map[string][]int, after map[string][]int, filter Diff
if !filter.Deleted {
continue
}
diff[test] = []int{resultsBefore[1], resultsBefore[1]}
diff[test] = []int{resultsBefore[0], resultsBefore[1] - resultsBefore[0], -resultsBefore[1]}
} else {
if !filter.Changed && !filter.Unchanged {

This comment has been minimized.

Copy link
@foolip

foolip Oct 2, 2018

Contributor

Ah right, this API can filter! Can this be exposed in the URL somehow to show only regressions in follow-up PR or issue?

This comment has been minimized.

Copy link
@lukebjerring

lukebjerring Oct 2, 2018

Author Collaborator

Yeah I can do that in a follow-up.

} else {
if !filter.Changed && !filter.Unchanged {
continue
}
passDiff := abs(resultsBefore[0] - resultsAfter[0])
improved, regressed, delta := 0, 0, resultsBefore[0] - resultsAfter[0]
if delta < 0 {

This comment has been minimized.

Copy link
@foolip

foolip Oct 2, 2018

Contributor

Ah, right, this if-else branch is where the remaining limitation shows up. It's impossible for us to report both an improvement and a regression for an individual test. Maybe a comment with TODO here, link to some issue?

This comment has been minimized.

Copy link
@lukebjerring

lukebjerring Oct 2, 2018

Author Collaborator

Well, actually this issue causes the aggregation issues we're seeing (many of) below. Ultimately, we can't compute these numbers on the client (because by that stage the summary numbers have already been squashed). We'll need to append the diff arrays to the search api.

This comment has been minimized.

Copy link
@foolip

foolip Oct 2, 2018

Contributor

Now I'm a bit confused about where this data is coming from. /api/diff doesn't show up in devtools at all, how does this fit together?

In any event, it doesn't sound quite right that the aggregation bugs are due to this if-else branch at bottom. If we have the [newly-passing, newly-failing, total-delta] array of numbers for each test, then the aggregate array for some directory is just the "vector sum" of all of the test-level arrays.

What is the place where numbers are squashed in a way that isn't the "vector sum"?

This comment has been minimized.

Copy link
@lukebjerring

lukebjerring Oct 2, 2018

Author Collaborator

The diffs are being computed on the client, and I updated both the api endpoint and the client.

I'll add a commit to move to the endpoint as part of this PR soon.

countDiff := abs(resultsBefore[1] - resultsAfter[1])
changed := passDiff != 0 || countDiff != 0
changed := delta != 0 || countDiff != 0

This comment has been minimized.

Copy link
@foolip

foolip Oct 2, 2018

Contributor

Given that countDiff now isn't used for anything else, can't this be simplified to just changed := resultsAfter != resultsBefore? This does end up collapsing to just that, but it took me a while to tell.

This comment has been minimized.

Copy link
@lukebjerring

lukebjerring Oct 2, 2018

Author Collaborator

Done.

Show resolved Hide resolved webapp/components/wpt-results.html
Show resolved Hide resolved webapp/components/wpt-results.html
Show resolved Hide resolved webapp/components/wpt-results.html
@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

Here's a great view:
https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/?product=chrome%5Bstable%5D&product=chrome%5Bbeta%5D&aligned=true&diff=true

Compare that to staging:
https://staging.wpt.fyi/results/?product=chrome%5Bstable%5D&product=chrome%5Bbeta%5D&aligned=true&diff=true

The difference in how WebCryptoAPI is presented is interesting. Because those subtests were simply missing in dev, it doesn't show up as a regression. However, bug alert, the sign of the difference seems wrong, it says +2 but should be -2. And I thought I reviewed the signs of each number very carefully :/

Show resolved Hide resolved shared/run_diff.go
@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

Hmm, something seems to be wrong with how the regression numbers are aggregated too. Check this view:
https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/service-workers/service-worker?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1&diff=true

I see -3 and -1, but stepping out the aggregated number is -2.

@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/css?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1&diff=true is another case, where there's a -33 for css/vendor-imports/ which doesn't propagate up.

@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 2, 2018

I've also found a view where the results are perhaps a bit confusing:
https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/web-animations/animation-model/animation-types?sha=ee2e69bfb1&product=safari-12.0&product=safari-12.1&diff=true

Subtests that were previously missing are now running and failing. We can't fix this with the current data model, but it's something to include in a bug about subtest-level diffing.

@lukebjerring

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 2, 2018

I wrote d7d5a0f to fetch the API endpoint asynchronously. PTAL when it's deployed.

@lukebjerring lukebjerring force-pushed the diff-3-numbers branch from ea1d7f9 to cc3e0f8 Oct 2, 2018

@lukebjerring

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 2, 2018

So, I've noticed a bug with the diff endpoint that it doesn't take into account the aligned param, which causes the runs being compared to differ from those shown in the client. For this reason, I've thrown the /api/diff fetch behind a flag (see /flags to enable it).

EDIT: Now taken care of with #617, but I still see value in having the API fetch happen conditional on a flag.

@wpt-pr-bot

This comment has been minimized.

Copy link

commented Oct 2, 2018

Staging instance deployed by Travis CI!
Running at https://diff-3-numbers-dot-announcer-dot-wptdashboard-staging.appspot.com

@lukebjerring lukebjerring force-pushed the diff-3-numbers branch from cbc49ed to 5f64e5d Oct 3, 2018

@lukebjerring lukebjerring force-pushed the diff-3-numbers branch from 5f64e5d to ba10765 Oct 3, 2018

lukebjerring added some commits Oct 4, 2018

@lukebjerring

This comment has been minimized.

Copy link
Collaborator Author

commented Oct 4, 2018

I've added a filter icon in 0ad3456 if you visit /flags and enable it. I feel like this, if we're happy with it, would cover #411 ?

@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2018

https://diff-3-numbers-dot-announcer-dot-wptdashboard-staging.appspot.com/ is 404 right now, was it garbage collected too soon, or never deployed?

@Hexcles

This comment has been minimized.

Copy link
Member

commented Oct 4, 2018

@foolip you clicked the wrong link. That's the announcer.

@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2018

@Hexcles it's the link in #609 (comment). Is that itself a bug? I've found the right URL, can now review :)

@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2018

Oh, wait, I guess this is staging instances of the different services. That was not easy to notice :)

@Hexcles

This comment has been minimized.

Copy link
Member

commented Oct 4, 2018

@foolip there should be two comments with two links to the webapp and announcer respectively. They are usually next to each other, but somehow it wasn't the case in this PR.

@foolip

This comment has been minimized.

Copy link
Contributor

commented Oct 4, 2018

Cool, aggregation issues seem to be fixed now.

https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/?sha=ee2e69bfb1&product=safari-11.1&product=safari-12.0&diff is a pretty complex diff (the one sent to webkit-dev) which is now more fun to look at. There are a lot of red numbers because more tests are running and failing. I think this is a good default mode, but perhaps some filtering make it only count PASS→not-PASS later.

BUG: with /api/diff enabled, https://diff-3-numbers-dot-wptdashboard-staging.appspot.com/results/css/css-text/i18n/zh?label=taskcluster&product=chrome[stable]&product=chrome[beta]&diff=true has very wrong numbers, and this aggregates into a seeming huge regression in css/css-text/ which actually has no new failures.

@foolip

foolip approved these changes Oct 4, 2018

Copy link
Contributor

left a comment

Just some nits, and /api/diff still seems to have bugs, but that's behind a flag so that's OK

</paper-tooltip>
</template>
</template>
<template is="dom-if" if="{{ isDiff(testRun.browser_name }}">

This comment has been minimized.

Copy link
@foolip

foolip Oct 4, 2018

Contributor

This tooltip shows up as part of the browser tooltips too, which I think isn't intended. I don't even see how it's possible given that the conditions are negated. Found Polymer/polymer#2165 while looking for a construct that would make the bug impossible.

This comment has been minimized.

Copy link
@lukebjerring

lukebjerring Oct 5, 2018

Author Collaborator

Ah; missing a trailing )

</template>
<template is="dom-if" if="{{ isDiff(testRun.browser_name }}">
diff numbers are for:<br>
[Newly passing] / [Newly failing] / [Total count delta]

This comment has been minimized.

Copy link
@foolip

foolip Oct 4, 2018

Contributor

Mind if I lowercase there? Looks a bit odd, to my eyes.

<th>
<test-run test-run="[[diffRun]]"></test-run>
<template is="dom-if" if="[[diffFilter]]">
<paper-icon-button icon="filter-list" onclick="[[toggleDiffFilter]]"></paper-icon-button>

This comment has been minimized.

Copy link
@foolip

foolip Oct 4, 2018

Contributor

This is great! One improvement would be to give this some state or button feel, so that one can tell that clicking it does something.

@@ -353,6 +364,7 @@ <h3>
type: Boolean,
value: false,
},
onlyShowDifferences: Boolean,

This comment has been minimized.

Copy link
@foolip

foolip Oct 4, 2018

Contributor

Future feature request: only show regressions (not sure about UI)

@@ -482,26 +511,45 @@ <h3>
revision: 'diff',
browser_name: 'diff',
};
this.fetchDiff();
}
})
);
}
async fetchResults() {

This comment has been minimized.

Copy link
@foolip

foolip Oct 4, 2018

Contributor

Using async here still looks like a bug. Even if this.load return a promise (not checked) it's not returned, so fetchResults() would look like Promise.resolve(undefined) with side-effects happening after the promise is settled.

);
}
async fetchDiff() {

This comment has been minimized.

Copy link
@foolip

foolip Oct 4, 2018

Contributor

async oddity here too.

foolip and others added some commits Oct 4, 2018

@lukebjerring lukebjerring merged commit 2ed638d into master Oct 5, 2018

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details

@lukebjerring lukebjerring deleted the diff-3-numbers branch Oct 5, 2018

@lukebjerring lukebjerring referenced this pull request Oct 9, 2018

Merged

Fix single-item diffs #629

mdittmer added a commit that referenced this pull request Oct 9, 2018

Fix single-item diffs (#629)
Fixes single-product views, broken by #609
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.