Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sanitycheck: don't multiply CPU count #17276

Conversation

andrewboie
Copy link
Contributor

We have a number of timing sensitive tests which run
correctly on a much more frequent basis if the system
is not so heavily loaded. Instead of squeezing a few
more crumbs of performance by doubling the CPU count,
just use the number of CPUs reported by the system.

Signed-off-by: Andrew Boie andrew.p.boie@intel.com

@andrewboie andrewboie requested a review from nashif as a code owner July 3, 2019 00:05
@zephyrbot zephyrbot added the area: Sanitycheck Sanitycheck has been renamed to Twister label Jul 3, 2019
@@ -226,7 +226,7 @@ LAST_SANITY_XUNIT = os.path.join(ZEPHYR_BASE, "scripts", "sanity_chk",
"last_sanity.xml")
RELEASE_DATA = os.path.join(ZEPHYR_BASE, "scripts", "sanity_chk",
"sanity_last_release.csv")
JOBS = multiprocessing.cpu_count() * 2
JOBS = multiprocessing.cpu_count()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the *2 in -b mode?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

We have a number of timing sensitive tests which run
correctly on a much more frequent basis if the system
is not so heavily loaded. Instead of squeezing a few
more crumbs of performance by doubling the CPU count,
just use the number of CPUs reported by the system.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
@andrewboie andrewboie force-pushed the sanitycheck-dont-multiply-cpu-count branch from 7051224 to d5651fd Compare July 3, 2019 00:43
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 3, 2019

I'm lucky enough to have access to a system with about ~100 cores (hi @andyross). I did some testing on it and noticed sanitycheck performance caps after JOBS=+-50. That's what I've been using on this system to minimize timeouts and upset coworkers. Pretty sure it's because storage becomes the bottleneck past that point. The number of cores is very crude because it obviously ignores storage and no I don't have anything better to suggest, just making sure everyone's expectations stay very low.

cc @dcpleung who's been playing with RAM disks and/or tmpfs (on my list)

Can we keep the *2 in -b mode?

Do you have some evidence it helps? Keep in mind that sanitycheck starts many more threads than JOBS, 2000+ Python threads for a start in the default settings. See discussion in #17239.

If nothing else the v2 of this patch makes it more obvious that there is a --jobs option!

Even though they sometimes vary, CI timings will / would have been interesting.

@andrewboie
Copy link
Contributor Author

andrewboie commented Jul 3, 2019

Can we keep the *2 in -b mode?

Do you have some evidence it helps? Keep in mind that sanitycheck starts many more threads than JOBS, 2000+ Python threads for a start in the default settings. See discussion in #17239.

-b mode is --build-only, no emulators are spawned, we don't get spurious test execution failures which this patch is trying to help mitigate. So keeping the double amount for this scenario is reasonable to me.

This patch is not a fix for #17239. That fails for me even with -j12. It appears to be a function of the number of test cases selected.

What this patch is doing is adding some default stability until #14173 is resolved (which may take some time, we've been fighting it for years)

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-b mode is --build-only, no emulators are spawned, we don't get spurious test execution failures which this patch is trying to help mitigate. So keeping the double amount for this scenario is reasonable to me.

Even if *2 is not great for --build-only either, you're right better change/address one thing at a time.

(I'm still interested in @galak's numbers if any)

This patch is not a fix for #17239.

Not a fix but maybe it will mitigate a bit. That was just some additional background anyway.

That fails for me even with -j12. It appears to be a function of the number of test cases selected.

I'm looking right now at yet another run with JOBS=30 and 2190 tests without any issue. I never hit #17239 yet, strange.

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 3, 2019

Just spotted this nit sorry for the extra email:

- sanitycheck: don't multiply CPU count 
+ sanitycheck: lower JOBS default _when running tests_

@nashif
Copy link
Member

nashif commented Jul 3, 2019

This patch is not a fix for #17239. That fails for me even with -j12. It appears to be a function of the number of test cases selected.

The main reason for the issue in #17239 is that we have added yet another qemu platform recently and the number of tests has also increased basically hitting the maximum open files (each qemu thread open three files at least, multiplied by the number of platforms multiplied by the number of cases that need to run...)

Removing one of the default qemu platforms resolves the issue... or you increase the max. number of open files to 4096 :)

@nashif nashif merged commit 9f4f57e into zephyrproject-rtos:master Jul 3, 2019
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 3, 2019

The main reason for the issue in #17239 is that we have added yet another qemu platform recently

I don't know if you were just referring to qemu_x86_coverage but in any case I think you just nailed it: I carry a private hack to exclude qemu_x86_coverage because it's not actually a platform and breaks stuff I'm doing, long story in #15886.

That's very likely why I haven't hit this, thanks.

@andrewboie andrewboie deleted the sanitycheck-dont-multiply-cpu-count branch October 2, 2019 18:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Sanitycheck Sanitycheck has been renamed to Twister
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants