Twister jobserver support eliminates parallel build for me #54289
Comments
I think this is a misunderstanding. Before the patch, one set of compilation tasks would spawn as many compilation processes as you have cores on the system, so building one application would use 16 threads if you had 16 cores, but combined with twister, if you specified Perhaps what would be good is a way to control how many compilation tasks can run in parallel for each concurrent build, so |
I don't think I misunderstood the code. I used
We used to have this option by using In a perfect world, twister would coordinate with ninja to keep the maximum number of parallel processes as close as practical to the specified value, but (as has been noticed by @nashif already), ninja hasn't merged that yet (ninja-build/ninja#1139). |
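For readers unfamiliar with the coordination described above, here is a minimal sketch of the token-passing idea behind a GNU make style jobserver: a pipe is preloaded with one byte per job slot, a client takes a byte before starting a job, and writes it back when the job finishes. The `TokenPool` class and its method names are hypothetical and are not twister's or ninja's actual code. The non-blocking read is only there to keep the demo from hanging; real clients normally block, and each one also holds one implicit slot of its own.

```python
import os


class TokenPool:
    """Toy jobserver: a pipe preloaded with one byte per available job slot."""

    def __init__(self, slots: int) -> None:
        self.read_fd, self.write_fd = os.pipe()
        # Non-blocking reads keep this demo from hanging when the pool is empty.
        os.set_blocking(self.read_fd, False)
        os.write(self.write_fd, b"+" * slots)

    def try_acquire(self) -> bool:
        """Take one token without blocking; False means every slot is busy."""
        try:
            return len(os.read(self.read_fd, 1)) == 1
        except BlockingIOError:
            return False

    def release(self) -> None:
        """Return a token so another job may start."""
        os.write(self.write_fd, b"+")

    def close(self) -> None:
        """Close both ends of the pipe."""
        os.close(self.read_fd)
        os.close(self.write_fd)
```

Because every cooperating process shares the same pipe, the total number of running jobs stays bounded by the number of tokens, no matter how many builds are active at once.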
Are you using the patched version of ninja, not the default/normal/repo one? Because this sounds like you're not using the patched version |
ah, there's a magic version of ninja somewhere? Doesn't seem to be in the SDK version I'm running. I figured it was something simple like that, but couldn't figure out where to look. |
Yes, I made the same mistake of trying this and wondering why everything looked broken, until someone pointed out that you need the Kitware version of ninja: https://github.com/Kitware/ninja. @jeremybettis it would probably be a good idea to update the getting started guide, twister documentation and release notes with links on how to get this special version, because right now it's not documented anywhere (ideally for the 3.3 release). |
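One way to tell which ninja is on the PATH is to look at its `--version` output. The sketch below assumes that jobserver-enabled builds include the word `jobserver` in their version string (for example something like `1.11.1.git.kitware.jobserver-1`). That naming is an assumption, not something stated in this thread, so check it against your own install. Both function names are hypothetical.

```python
import subprocess


def looks_like_jobserver_ninja(version_text: str) -> bool:
    """Assumption: jobserver-enabled builds put 'jobserver' in their version string."""
    return "jobserver" in version_text.lower()


def installed_ninja_version(ninja: str = "ninja") -> str:
    """Return the output of `ninja --version`, or '' if ninja cannot be run."""
    try:
        result = subprocess.run(
            [ninja, "--version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ""
    return result.stdout.strip()
```

Calling `looks_like_jobserver_ninja(installed_ninja_version())` gives a quick yes/no hint before starting a long twister run.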
yeah, seems like a pretty important note. would be nice if this custom version of ninja was supplied with the SDK. |
@stephanosio for info too |
Hrm. Even with the custom version of ninja, my machine is completely underwhelmed when running twister with the jobserver support enabled. It never seems to run multiple cmake instances in parallel anymore, and as cmake is the most expensive part of the build, that means the machine never gets up a full head of steam. |
I'll need to check again tomorrow but pretty sure when I checked earlier, I saw as many python processes (1 for each build) as the |
Works for me, using command
Checked that it's different projects:
|
Yup, I get the correct number of python processes, but only ever one cmake. I must have some local configuration broken here. I'm running directly from main:
I just built ninja from the kitware sources:
A mystery. |
ah ha! I had |
It should be possible to make the job server optional or at least configurable; right now we just enable it for Linux. If this is an option that would help, please shout. I spent some time benchmarking this, and the main reason it was introduced is ninja. If you build with make, you will not run into the original problem the job server was trying to solve. Also, you get the patched ninja with pip, no need to build it yourself :) |
jobserver is a huge improvement now that I've figured out what the core issue was, but for people without the interest/ability to get a fixed ninja, it might be nice to have it optional. The other issue was the stale (ah, legacy) MAKEFLAGS environment variable that messed up my fine ninja build by disabling the parallel cmake invocations. It might be nice to figure out how to detect/fix that issue, but it's definitely not a blocker. So, it seems like there are two (minor) issues here:
I'm running a twister build and things seem to be going well -- it's holding a load average of around 64, which is optimal for this machine. |
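As a rough idea of how the stale `MAKEFLAGS` case could be detected, the sketch below reports any `MAKEFLAGS` value inherited from the environment before a run starts. The thread does not show what the stale value was, so this only flags that the variable is already set; it does not try to judge whether the value is harmful. The function names and the warning text are hypothetical, not part of twister.

```python
import os


def inherited_makeflags(environ=None):
    """Return the pre-existing MAKEFLAGS value, or None if unset or empty."""
    env = os.environ if environ is None else environ
    value = env.get("MAKEFLAGS", "").strip()
    return value or None


def makeflags_warning(environ=None):
    """Build a warning message when MAKEFLAGS was inherited, else return None."""
    value = inherited_makeflags(environ)
    if value is None:
        return None
    return f"MAKEFLAGS is already set to {value!r}; it may interfere with jobserver setup"
```

A caller could print the message and continue, or clear the variable for child processes, depending on how strict the check should be.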
Anas commented:
(shouts very loudly)
Keith commented:
That would require users to manually update their PATH to point to the Zephyr SDK ninja bin dir, which will not fit nicely into the current CMake package-based automagical discovery of Zephyr SDK.
I think this is a more practical solution given that the Kitware version of ninja-build can be installed through pip. |
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time. |
Describe the bug
The new Linux jobserver support in twister looks like it should have improved my twister runtimes by avoiding huge load spikes, but instead it has eliminated almost all parallelism during the run. Only one test appears to run at a time.
Please also mention any information which could help others to understand
the problem you're facing:
specific commit? Yes, as above, reverting the three jobserver-related patches restores the previous behavior and performance.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I expect 25 cmake processes followed by hundreds of compilation processes to keep the build machine busy for about three hours for the complete twister test.
Impact
I am unable to complete twister test runs in a reasonable amount of time; I abandoned the test run after eight hours.
Environment (please complete the following information):
Additional context
I'm running tests on a 64-thread system, so I would expect a load average of around 200 for much of this test.