sanitycheck: don't multiply CPU count #17276
Conversation
scripts/sanitycheck (outdated)

@@ -226,7 +226,7 @@ LAST_SANITY_XUNIT = os.path.join(ZEPHYR_BASE, "scripts", "sanity_chk",
                                  "last_sanity.xml")
 RELEASE_DATA = os.path.join(ZEPHYR_BASE, "scripts", "sanity_chk",
                             "sanity_last_release.csv")
-JOBS = multiprocessing.cpu_count() * 2
+JOBS = multiprocessing.cpu_count()
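For context, the new default still leaves room for a per-run override via the --jobs option mentioned later in the thread. A minimal, hypothetical sketch of that pattern (not the actual sanitycheck option parsing):

```python
import argparse
import multiprocessing

# Default to one job per CPU instead of cpu_count() * 2; users who
# want more parallelism can still pass -j/--jobs explicitly.
parser = argparse.ArgumentParser()
parser.add_argument("-j", "--jobs", type=int,
                    default=multiprocessing.cpu_count(),
                    help="number of concurrent build/test jobs")

args = parser.parse_args([])            # no flag: one job per CPU
assert args.jobs == multiprocessing.cpu_count()

args = parser.parse_args(["-j", "12"])  # explicit override
print(args.jobs)
```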
Can we keep the *2 in -b mode?
OK
We have a number of timing sensitive tests which run correctly on a much more frequent basis if the system is not so heavily loaded. Instead of squeezing a few more crumbs of performance by doubling the CPU count, just use the number of CPUs reported by the system. Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
Force-pushed from 7051224 to d5651fd.
I'm lucky enough to have access to a system with ~100 cores (hi @andyross). I did some testing on it and noticed that sanitycheck performance caps out past JOBS≈50, which is what I've been using on this system to minimize timeouts and upset coworkers. I'm pretty sure it's because storage becomes the bottleneck past that point. The number of cores is a very crude heuristic because it obviously ignores storage, and no, I don't have anything better to suggest; I'm just making sure everyone's expectations stay very low. cc @dcpleung, who's been playing with RAM disks and/or tmpfs (on my list).
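The ~50-job plateau above suggests cpu_count() alone over-provisions very wide machines. A hypothetical site-local clamp could look like this (the cap of 50 is an observation from this one machine, not a general rule, and not something sanitycheck itself does):

```python
import multiprocessing

# cpu_count() ignores storage bandwidth; on a ~100-core box the observed
# sweet spot was around 50 jobs before I/O became the bottleneck, so a
# local wrapper might clamp the default there (machine-specific assumption).
JOBS_CAP = 50
jobs = min(multiprocessing.cpu_count(), JOBS_CAP)
```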
Do you have any evidence it helps? Keep in mind that sanitycheck starts many more threads than JOBS: 2000+ Python threads for a start with the default settings. See the discussion in #17239. If nothing else, the v2 of this patch makes it more obvious that there is a --jobs option! Even though they sometimes vary, CI timings would have been interesting.
-b mode is --build-only: no emulators are spawned, so we don't get the spurious test execution failures this patch is trying to mitigate. Keeping the doubled amount for that scenario is reasonable to me. This patch is not a fix for #17239; that fails for me even with -j12 and appears to be a function of the number of test cases selected. What this patch does is add some default stability until #14173 is resolved (which may take some time; we've been fighting it for years).
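The compromise being discussed, keeping the *2 factor only when -b/--build-only is set, could be sketched roughly like this (hypothetical helper name; the real patch would wire this into the option defaults instead):

```python
import multiprocessing

def default_jobs(build_only: bool) -> int:
    # Build-only runs spawn no emulators, so oversubscribing CPUs is
    # harmless and keeps compilers busy during I/O waits. When tests
    # actually execute, stick to one job per CPU to reduce the load
    # that makes timing-sensitive tests flaky.
    n = multiprocessing.cpu_count()
    return n * 2 if build_only else n

print(default_jobs(True) == 2 * default_jobs(False))
```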
> -b mode is --build-only, no emulators are spawned, we don't get spurious test execution failures which this patch is trying to help mitigate. So keeping the double amount for this scenario is reasonable to me.
Even if *2 is not great for --build-only either, you're right: better to change one thing at a time.
(I'm still interested in @galak's numbers if any)
> This patch is not a fix for #17239.
Not a fix but maybe it will mitigate a bit. That was just some additional background anyway.
> That fails for me even with -j12. It appears to be a function of the number of test cases selected.
I'm looking right now at yet another run with JOBS=30 and 2190 tests without any issue. I've never hit #17239 yet, strange.
Just spotted this nit, sorry for the extra email:

- sanitycheck: don't multiply CPU count
+ sanitycheck: lower JOBS default _when running tests_
The main reason for the issue in #17239 is that we recently added yet another qemu platform and the number of tests has also increased, basically hitting the maximum number of open files (each qemu thread opens at least three files, multiplied by the number of platforms, multiplied by the number of cases that need to run...). Removing one of the default qemu platforms resolves the issue... or you can increase the maximum number of open files to 4096 :)
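The file-descriptor arithmetic above can be checked, and the soft limit raised (the equivalent of `ulimit -n 4096`), from Python itself. The per-qemu and per-run counts below are illustrative assumptions, not measured values:

```python
import resource

soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

# Illustrative numbers only: ~3 fds per qemu instance, a handful of
# default platforms, and many cases in flight at once.
fds_per_qemu = 3
platforms = 5
cases_in_flight = 100
needed = fds_per_qemu * platforms * cases_in_flight

if needed > soft:
    # Raise the soft limit toward the hard limit; the soft limit may be
    # raised up to the hard limit without special privileges.
    resource.setrlimit(resource.RLIMIT_NOFILE, (min(needed, hard), hard))
```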
I don't know if you were just referring to qemu_x86_coverage, but in any case I think you just nailed it: I carry a private hack to exclude it. That's very likely why I haven't hit this, thanks.