sanitycheck doesn't keep my cores busy #24652
Comments
Could this be the fault of the GIL?
Don't think so, unless the GIL routinely gets held for many seconds at a time.
Had the same problem when running. I see short bursts of 100% CPU usage, but most of the time it stays below 10%, which is an enormous waste of expensive CPU time. For now, at least, it is much better to have 14x 16-core machines than a single 224-core super machine.
I think I found the issue, looking into it.
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.
sigh. this is not stale.
I think I finally got something here, work in progress.
Describe the bug
There was a recent change to sanitycheck to no longer use a GNU Make jobserver to parcel out tasks, in favor of scheduling jobs directly in the Python code. Unfortunately, this has introduced a performance regression on my build machine with 16 cores/32 threads.
What I am seeing is that sanitycheck will parcel out jobs to the CPUs to build/run, but seems to wait for all of them to complete before sending new work, rather than scheduling new work on demand as individual jobs finish.
This specifically seems to break down for test cases that time out. What I am seeing is that my CPU usage will spike up, then drop down to 0 for quite a while, then spike up again.
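To illustrate why this pattern hurts so much when a few test cases time out, here is a toy model comparing the two scheduling strategies. It is only a sketch and not sanitycheck's actual code: the function names, the 60-second "timeout" duration, and the job mix are all made up for illustration.

```python
import heapq


def makespan_batched(durations, workers):
    """Hand out jobs in groups of `workers` and wait for the whole
    group to finish before sending any new work."""
    total = 0.0
    for i in range(0, len(durations), workers):
        # The group is only as fast as its slowest job.
        total += max(durations[i:i + workers])
    return total


def makespan_on_demand(durations, workers):
    """Give the next job to whichever worker becomes free first."""
    free_at = [0.0] * workers  # min-heap: time at which each worker is idle
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)      # earliest available worker
        heapq.heappush(free_at, start + d)  # busy until start + d
    return max(free_at)


# Hypothetical workload: 1-second tests, with one 60-second timeout
# landing in each group of four.
durations = [60, 1, 1, 1] * 4
workers = 4

print("batched:  ", makespan_batched(durations, workers))
print("on demand:", makespan_on_demand(durations, workers))
```

In this model the batched run takes 240 seconds against 66 seconds for on-demand scheduling, because every group stalls on its one slow job while the other workers sit idle. That matches the spike-then-idle CPU pattern described above.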
To Reproduce
sanitycheck -j48 -p qemu_x86_64 -T tests/kernel
Expected behavior
CPU cores at or near full utilization during the sanitycheck run, tapering off only at the end when work to do runs out.
Impact
Wasted developer time.
Prolonged CI jobs.
Additional context
I can try to prepare a branch which demonstrates this, if desired.