sanitycheck doesn't keep my cores busy #24652

andrewboie · 2020-04-23T19:42:26Z

Describe the bug
There was a recent change to sanitycheck to no longer use a GNU Make jobserver to parcel out tasks, in favor of scheduling jobs directly in the Python code. Unfortunately, this has introduced a performance regression on my build machine with 16 cores/32 threads.

What I am seeing is that sanitycheck parcels out jobs to the CPUs to build/run, but then seems to wait for all of them to complete before sending new work, rather than scheduling new work on demand as individual jobs finish.

This seems to break down specifically for test cases that time out. My CPU usage spikes, then drops to 0 for quite a while, then spikes again.
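To make the symptom concrete, here is a toy model comparing two dispatch policies. It is not sanitycheck's actual code, which isn't shown in this thread. The first policy hands out a batch of jobs and waits for the whole batch. The second starts the next job whenever any worker frees up. The job durations and the 60 s timeout are made-up numbers, and `batch_makespan` / `on_demand_makespan` are hypothetical helpers.

```python
import heapq
from typing import List


def batch_makespan(durations: List[float], workers: int) -> float:
    """Hand out `workers` jobs at a time and wait for the whole batch to
    finish before dispatching more (the behavior suspected above)."""
    total = 0.0
    for start in range(0, len(durations), workers):
        batch = durations[start:start + workers]
        # The batch ends only when its slowest job (e.g. a timeout) ends;
        # every other worker in the batch sits idle until then.
        total += max(batch)
    return total


def on_demand_makespan(durations: List[float], workers: int) -> float:
    """Dispatch the next job as soon as any worker becomes free."""
    # Min-heap holding the time at which each worker next becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for d in durations:
        t = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, t + d)  # it is busy until t + d
    return max(free_at)


if __name__ == "__main__":
    # Hypothetical workload: 4 workers, two tests that hang until a 60 s
    # timeout, and fourteen tests that take 5 s each.
    jobs = [60.0] + [5.0] * 7 + [60.0] + [5.0] * 7
    print(batch_makespan(jobs, 4))      # 130.0
    print(on_demand_makespan(jobs, 4))  # 70.0
```

With batch dispatch, each hung test keeps three workers idle for 55 s, which is the same spike-then-idle pattern described above. On-demand dispatch keeps the other workers busy with short jobs in the meantime.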

To Reproduce

1. Modify one or a few test cases so that they hang forever instead of completing.
2. Run sanitycheck on a machine with lots of cores. On my 32-thread machine, I ran: sanitycheck -j48 -p qemu_x86_64 -T tests/kernel
3. Monitor CPU usage during the run. Note the periods of extremely low CPU activity while sanitycheck sits waiting for a test to time out.

Expected behavior
CPU cores at or near full utilization throughout the sanitycheck run, tapering off only at the end when the work runs out.

Impact
Wasted developer time.
Prolonged CI jobs.

Additional context
I can try to prepare a branch which demonstrates this, if desired.


carlescufi · 2020-04-23T20:03:30Z

Could this be the fault of the GIL?

andrewboie · 2020-04-23T22:47:23Z

> Could this be the fault of the GIL?

Don't think so, unless the GIL routinely gets held for many seconds at a time.
There seems to be custom job scheduling code; there may be an opportunity to simplify it using ProcessPoolExecutor (see https://docs.python.org/3/library/concurrent.futures.html).
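A minimal sketch of the on-demand approach suggested above, using Python's `concurrent.futures`. It is not sanitycheck's code. `run_one`, `run_all`, the per-job timeout, and the `python -c` commands standing in for builds and QEMU runs are all hypothetical. The sketch uses `ThreadPoolExecutor` rather than `ProcessPoolExecutor`. Both expose the same `submit`/`as_completed` interface, and because each job is an external process, the worker threads spend almost all their time blocked waiting on children, during which the GIL is released.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple


def run_one(cmd: List[str], timeout: float) -> str:
    """Run one (hypothetical) build/test command. A hang costs at most
    `timeout` seconds and only ties up the worker slot running it."""
    try:
        subprocess.run(cmd, timeout=timeout, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return "passed"
    except subprocess.TimeoutExpired:  # subprocess.run kills the child first
        return "timeout"
    except subprocess.CalledProcessError:  # non-zero exit status
        return "failed"


def run_all(cmds: List[List[str]], jobs: int,
            timeout: float) -> List[Tuple[int, str]]:
    """Return (index, status) pairs in the order the jobs finished."""
    finished = []
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # Queue everything up front; the pool starts the next queued job
        # as soon as any worker thread becomes free.
        futures = {pool.submit(run_one, cmd, timeout): i
                   for i, cmd in enumerate(cmds)}
        # Collect results as they complete, not in submission order.
        for fut in as_completed(futures):
            finished.append((futures[fut], fut.result()))
    return finished


if __name__ == "__main__":
    py = sys.executable
    cmds = [
        [py, "-c", "import time; time.sleep(60)"],  # stands in for a hung test
        [py, "-c", "pass"],
        [py, "-c", "raise SystemExit(1)"],
        [py, "-c", "pass"],
    ]
    for index, status in run_all(cmds, jobs=2, timeout=3.0):
        print(index, status)
```

With `jobs=2`, the three short jobs should all finish on one worker while the other is stuck on the hung job. Job 0 should then be reported last, as a timeout, instead of holding back the rest of the queue.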

stephanosio · 2020-04-24T00:08:23Z

I had the same problem when running sanitycheck on my 224-core machines.

I see short bursts of 100% CPU usage, but most of the time it stays below 10%, which is an enormous waste of expensive CPU time.

For now, at least, it is much better to have 14 16-core machines than a single 224-core super machine.

nashif · 2020-04-27T02:52:25Z

I think I found the issue, looking into it.

github-actions · 2020-06-30T00:32:51Z

This issue has been marked as stale because it has been open for more than 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed; otherwise this issue will automatically be closed in 14 days. Note that you can always re-open a closed issue at any time.

andrewboie · 2020-07-15T23:11:44Z

Sigh. This is not stale.

nashif · 2020-09-13T23:44:01Z

I think I finally got something here, work in progress.

andrewboie added the bug label Apr 23, 2020

andrewboie assigned nashif Apr 23, 2020

nashif added the area: Sanitycheck (Sanitycheck has been renamed to Twister) and priority: low labels Apr 27, 2020

carlescufi added the area: Test Framework label Apr 30, 2020

github-actions bot added the Stale label Jun 30, 2020

github-actions bot closed this as completed Jul 14, 2020

andrewboie reopened this Jul 15, 2020

github-actions bot removed the Stale label Jul 16, 2020

nashif added the In progress label Sep 13, 2020

nashif mentioned this issue Sep 14, 2020 in Various sanitycheck fixes and improvements.. #28372 (merged)

nashif closed this as completed in #28372 Dec 1, 2020
