[Azure Pipelines] reduce Edge parallel jobs from 20 to 10 #18448

Merged
foolip merged 1 commit into master from foolip/azure-edge-parallelism on Aug 20, 2019


@foolip
Member

foolip commented Aug 15, 2019

20 was chosen to make each job of a full EdgeHTML run fast enough, but
with Chromium-based Edge each job now finishes in <1h. Each job has some
overhead, so decrease the number of jobs to 10.
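The trade-off in the description can be put into rough numbers. The sketch below assumes a simple model in which every job pays a fixed overhead and the test work is split evenly across jobs; the 600 minutes of total work and 10 minutes of overhead are hypothetical values chosen for illustration, not measurements from this PR.

```python
def per_job_minutes(total_work_min: float, overhead_min: float, jobs: int) -> float:
    """Wall-clock minutes for one job when the work is split evenly across `jobs`."""
    return overhead_min + total_work_min / jobs


def total_agent_minutes(total_work_min: float, overhead_min: float, jobs: int) -> float:
    """Agent minutes consumed by all jobs together, overhead included."""
    return jobs * per_job_minutes(total_work_min, overhead_min, jobs)


# Hypothetical inputs: 600 minutes of test work, 10 minutes of overhead per job.
WORK_MIN, OVERHEAD_MIN = 600.0, 10.0

for jobs in (20, 10):
    print(
        f"{jobs} jobs: {per_job_minutes(WORK_MIN, OVERHEAD_MIN, jobs):.0f} min per job, "
        f"{total_agent_minutes(WORK_MIN, OVERHEAD_MIN, jobs):.0f} agent minutes in total"
    )
```

Under this model, halving the job count makes each job longer but spends less agent time overall, because the fixed overhead is paid fewer times.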

@wpt-pr-bot wpt-pr-bot requested a review from jgraham Aug 15, 2019
@foolip foolip requested review from mustjab and thejohnjansen and removed request for jgraham Aug 15, 2019
@foolip
Member Author

foolip commented Aug 15, 2019

Sent this PR because I spotted the outdated comment while doing other things today, but I think it's probably best to resolve #18397 before landing this, as it might introduce new ways of failing.

@wpt-pr-bot wpt-pr-bot added the infra label Aug 15, 2019
@wpt-pr-bot wpt-pr-bot requested a review from jgraham Aug 15, 2019
@foolip foolip force-pushed the foolip/azure-edge-parallelism branch from 9ad77ff to e31aa28 Aug 16, 2019
@foolip
Member Author

foolip commented Aug 16, 2019

I've started a full run of Edge Dev and Canary in https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27491 to see how long it takes and if the results are affected.

@foolip
Member Author

foolip commented Aug 16, 2019

Diff between the runs with 20 and 10 jobs:

There are some shared regressions there that probably aren't because of flakiness, but rather because of test order dependence. There are tests that are fixed by fewer shards, but not as many as the regressions. This makes sense as more jobs means more isolation and less chance for state interference.
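To illustrate why the number of shards can change which tests interfere with each other, here is a minimal sketch. It assumes tests are split into contiguous, nearly equal chunks, which is a simplification and not necessarily how the wpt runner assigns tests to jobs; the test paths and the "polluter"/"victim" pair are hypothetical.

```python
def contiguous_chunks(tests, total_chunks):
    """Split an ordered list into `total_chunks` nearly equal contiguous slices."""
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        # The first `extra` chunks get one additional test each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


def share_chunk(tests, total_chunks, a, b):
    """True if tests `a` and `b` land in the same chunk (and so the same job)."""
    return any(a in chunk and b in chunk for chunk in contiguous_chunks(tests, total_chunks))


# Hypothetical test list; one test leaves state behind that a later test trips over.
tests = [f"/hypothetical/test-{i:03d}.html" for i in range(100)]
polluter, victim = tests[42], tests[47]

print("same job with 10 chunks:", share_chunk(tests, 10, polluter, victim))
print("same job with 20 chunks:", share_chunk(tests, 20, polluter, victim))
```

In this toy setup the two tests run in the same job with 10 chunks but in different jobs with 20, which is the kind of change that can surface order-dependent failures.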

More interestingly, the overall run time is slower now. It looks like we actually have 15 agents now, so more capacity than I thought. @mustjab what amount of parallelism do you think we should use? Just leave it at 20 and update the comment?

@foolip foolip force-pushed the foolip/azure-edge-parallelism branch from e31aa28 to 79c01fd Aug 16, 2019
@mustjab
Contributor

mustjab commented Aug 19, 2019

@foolip we have now allocated 20 VMs for the Windows pipeline and can increase the number of jobs to 20, if you think it will improve job stability.

@foolip
Member Author

foolip commented Aug 20, 2019

@mustjab the number of jobs is already 20. In this PR I tried to reduce it, since 20 seemed unnecessary and was based on the needs of EdgeHTML.

Since we're now running both Edge Dev and Canary, the number of VMs to run them all at once would be 40. I checked https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=27966 and it looks like what happened is that first all the Canary jobs ran, and then the Dev jobs started as the Canary jobs finished and made VMs available. The net effect is that Dev started and ended about an hour later than Canary.

Reducing the number to 10 would mean that they all start at the same time, but take about twice as long to finish.

In the end I don't think it matters all that much, but decreasing the number of jobs to match the available VMs probably makes more sense. So, consider this open for review.
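The scheduling effect described in this comment can be sketched with a small simulation. It assumes a greedy first-come, first-served scheduler, a flat 1 hour per job with 20 jobs, and exactly 2 hours per job with 10 jobs; these are simplifications for illustration, not measured values or a description of how Azure Pipelines actually assigns agents.

```python
import heapq


def finish_times(agents, groups):
    """Simulate FIFO scheduling of job groups onto a fixed pool of agents.

    `groups` is a list of (name, job_count, job_hours) tuples, queued in order.
    Returns a dict mapping each group name to the hour its last job finishes.
    """
    free_at = [0.0] * agents  # min-heap of times at which each agent becomes free
    heapq.heapify(free_at)
    done = {}
    for name, count, hours in groups:
        for _ in range(count):
            start = heapq.heappop(free_at)  # earliest available agent
            end = start + hours
            heapq.heappush(free_at, end)
            done[name] = max(done.get(name, 0.0), end)
    return done


# 20 VMs, 20 jobs per browser: Canary takes every VM, Dev waits for them to free up.
print(finish_times(20, [("Canary", 20, 1.0), ("Dev", 20, 1.0)]))

# 20 VMs, 10 jobs per browser: both browsers start at once, each job runs longer.
print(finish_times(20, [("Canary", 10, 2.0), ("Dev", 10, 2.0)]))
```

In both configurations the last job ends at roughly the same time under these assumptions; what changes is whether the two browsers finish together or one trails the other.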

@foolip foolip force-pushed the foolip/azure-edge-parallelism branch from 79c01fd to 9674814 Aug 20, 2019
@foolip
Member Author

foolip commented Aug 20, 2019

Started https://dev.azure.com/web-platform-tests/wpt/_build/results?buildId=28010 to see how this will look now with 20 VMs.

@foolip
Member Author

foolip commented Aug 20, 2019

These are the runs: https://wpt.fyi/results/?run_id=306270015&run_id=273340005

They took 1.7 and 1.8 hours to run, so looking good. I'll merge this and check if the scheduled runs then also look good.

@foolip foolip merged commit 4b0d632 into master Aug 20, 2019
34 checks passed
website-build-and-publish
manifest-build-and-tag
update-pr-preview
Azure Pipelines Build #20190820.72 succeeded
Azure Pipelines (./wpt test-jobs) succeeded
Azure Pipelines (affected tests without changes: Safari Technology Preview) succeeded
Azure Pipelines (affected tests: Safari Technology Preview) succeeded
Azure Pipelines (all tests: Edge Canary 1) succeeded
Azure Pipelines (all tests: Edge Canary 2) succeeded
Azure Pipelines (all tests: Edge Canary 3) succeeded
Azure Pipelines (all tests: Edge Canary 4) succeeded
Azure Pipelines (all tests: Edge Canary 5) succeeded
Azure Pipelines (all tests: Edge Canary 6) succeeded
Azure Pipelines (all tests: Edge Canary 7) succeeded
Azure Pipelines (all tests: Edge Canary 8) succeeded
Azure Pipelines (all tests: Edge Canary 9) succeeded
Azure Pipelines (all tests: Edge Canary 10) succeeded
Azure Pipelines (all tests: Edge Dev 1) succeeded
Azure Pipelines (all tests: Edge Dev 2) succeeded
Azure Pipelines (all tests: Edge Dev 3) succeeded
Azure Pipelines (all tests: Edge Dev 4) succeeded
Azure Pipelines (all tests: Edge Dev 5) succeeded
Azure Pipelines (all tests: Edge Dev 6) succeeded
Azure Pipelines (all tests: Edge Dev 7) succeeded
Azure Pipelines (all tests: Edge Dev 8) succeeded
Azure Pipelines (all tests: Edge Dev 9) succeeded
Azure Pipelines (all tests: Edge Dev 10) succeeded
Azure Pipelines (wpt.fyi hook: edge-canary-results) succeeded
Azure Pipelines (wpt.fyi hook: edge-dev-results) succeeded
Azure Pipelines (wpt.fyi hook: safari-preview-affected-tests) succeeded
Azure Pipelines (wpt.fyi hook: safari-preview-affected-tests-without-changes) succeeded
Taskcluster (pull_request) TaskGroup: success
staging.wpt.fyi - safari[experimental] Safari results
wpt.fyi - safari[experimental] Safari results
@foolip foolip deleted the foolip/azure-edge-parallelism branch Aug 20, 2019
@foolip
Member Author

foolip commented Aug 21, 2019

The first set of aligned runs after this change:
https://wpt.fyi/results/?run_id=300520016&run_id=290970009&run_id=296660009&run_id=285010009&run_id=290980002&run_id=283450002&run_id=276780005&run_id=283400010

Edge took 1.7/1.8h, which is also the time Safari took, so this seems pretty good.

@foolip
Member Author

foolip commented Aug 21, 2019

@foolip
Member Author

foolip commented Aug 21, 2019

However, in the second scheduled run after this landed, Edge Canary failed. I've filed #18583 and suggest reverting this if it happens again.

@thejohnjansen
Contributor

thejohnjansen commented Aug 21, 2019

@foolip thanks for the heads-up. @mustjab is on vacation right now, but he'll take a look when he gets back if this continues to fail.

natechapin added a commit to natechapin/wpt that referenced this pull request Aug 23, 2019
…rm-tests#18448)

20 was chosen to make each job of a full EdgeHTML run fast enough, but
with Chromium-based Edge each job now finishes in <1h. Each job has some
overhead, so decrease the number of jobs to 10.