Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sanitycheck doesn't keep my cores busy #24652

Closed
andrewboie opened this issue Apr 23, 2020 · 7 comments · Fixed by #28372
Closed

sanitycheck doesn't keep my cores busy #24652

andrewboie opened this issue Apr 23, 2020 · 7 comments · Fixed by #28372
Assignees
Labels
area: Sanitycheck Sanitycheck has been renamed to Twister area: Test Framework Issues related not to a particular test, but to the framework instead bug The issue is a bug, or the PR is fixing a bug In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on priority: low Low impact/importance bug

Comments

@andrewboie
Copy link
Contributor

andrewboie commented Apr 23, 2020

Describe the bug
There was a recent change to sanitycheck to no longer use a GNU Make jobserver to parcel out tasks, in favor of scheduling jobs directly in the Python code. Unfortunately, this has introduced a performance regression on my build machine with 16 cores/32 threads.

What I am seeing is that sanitycheck will parcel out jobs to the CPUs to build/run, but seems to be waiting for all of them to complete for sending new work, rather than scheduling new work on demand as individual jobs get done.

This specifically seems to break down for test cases that time out. What I am seeing is that my CPU usage will spike up, then drop down to 0 for quite a while, then spike up again.

To Reproduce

  1. Modify a test case, or a few test cases, such that they will hang forever instead of completing
  2. Run sanitycheck on a machine with lots of cores. On my 32 thread machine, I ran sanitycheck -j48 -p qemu_x86_64 -T tests/kernel
  3. Monitor CPU usage during the run. Note periods of extremely low CPU activity while sanitycheck sits there waiting for a test to time out

Expected behavior
CPU cores at or near full utilization during the sanitycheck run, tapering off only at the end when work to do runs out.

Impact
Wasted developer time.
Prolonged CI jobs.

Additional context
I can try to prepare a branch which demonstrates this, if desired.

@andrewboie andrewboie added the bug The issue is a bug, or the PR is fixing a bug label Apr 23, 2020
@carlescufi
Copy link
Member

Could this be the fault of the GIL?

@andrewboie
Copy link
Contributor Author

Could this be the fault of the GIL?

Don't think so, unless the GIL routinely gets held for many seconds at a time.
There seems to be custom job scheduling code, there may be an opportunity to simplify it using ProcessPoolExecutor (see https://docs.python.org/3/library/concurrent.futures.html)

@stephanosio
Copy link
Member

Had the same problem when running sanitycheck on my 224-core machines.

I see short bursts of 100% CPU usage; but, for most of the time, it stays at not even 10% -- which is an enormous waste of the expensive CPU time.

At least, it is much better for now to have 14x 16-core machines than single 224-core super machine.

@nashif nashif added area: Sanitycheck Sanitycheck has been renamed to Twister priority: low Low impact/importance bug labels Apr 27, 2020
@nashif
Copy link
Member

nashif commented Apr 27, 2020

I think I found the issue, looking into it.

@carlescufi carlescufi added the area: Test Framework Issues related not to a particular test, but to the framework instead label Apr 30, 2020
@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions github-actions bot added the Stale label Jun 30, 2020
@andrewboie andrewboie reopened this Jul 15, 2020
@andrewboie
Copy link
Contributor Author

sigh. this is not stale.

@github-actions github-actions bot removed the Stale label Jul 16, 2020
@nashif nashif added the In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on label Sep 13, 2020
@nashif
Copy link
Member

nashif commented Sep 13, 2020

I think I finally got something here, work in progress.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Sanitycheck Sanitycheck has been renamed to Twister area: Test Framework Issues related not to a particular test, but to the framework instead bug The issue is a bug, or the PR is fixing a bug In progress For PRs: is work in progress and should not be merged yet. For issues: Is being worked on priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants