Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible race condition with -j and lots of quick tests #98

Closed
vpax opened this issue Sep 6, 2023 · 7 comments
Closed

Possible race condition with -j and lots of quick tests #98

vpax opened this issue Sep 6, 2023 · 7 comments

Comments

@vpax
Copy link

vpax commented Sep 6, 2023

I'm using btest 1.1 (on macOS Ventura) in a directory with a lot (~50) of quick-executing tests, each reading a PCAP. Most of the time they succeed, but every few runs there's a failure like this:

mytests.FOO ... failed
  % 'zeek -r $TRACES/myanonpcap.pcap $PACKAGE /myfilesystem/testing/.tmp/mytests.FOO failed unexpectedly (exit code -6)
  % cat .stderr
  libc++abi: terminating due to uncaught exception of type ghc::filesystem::filesystem_error: No such file or directory: '/myfilesystem/testing/.tmp/mytests.BAR'

Note that here test FOO is failing, but the actual .stderr complaint is about a different test BAR. If I use -t to keep all the temporary directories, the BAR directory is in fact present, at least when btest finishes.

Across repeated runs, both FOO and BAR vary, so this doesn't appear to be an issue with the actual tests. In addition, if I run serially, or with a low -j 5 setting, I'm unable to make the problem occur. Both of these point to some sort of race condition when creating the .tmp/ subdirectories.

Let me know if there's more I can provide to help diagnose this.

@vpax vpax added the Type: Bug label Sep 6, 2023
@awelzel
Copy link
Contributor

awelzel commented Sep 7, 2023

libc++abi: terminating due to uncaught exception of type ghc::filesystem::filesystem_error: No such file or directory: '/myfilesystem/testing/.tmp/mytests.BAR'

Are you able to get a core-dump / backtrace for the crashing zeek process? What Zeek version is it? Is this a plugin or package being tested? Could you share the values of ZEEK_PLUGIN_PATH / ZEEKPATH used during testing and possibly the concrete directory layout when running the tests.

The reason is the following (pure) speculation:

If you have a liberal/open ZEEKPATH or possibly ZEEK_PLUGIN_PATH set in btest.cfg that isn't pointing into a confined build/ or scripts/, but instead into the top-level plugin/package directory, Zeek during startup may traverse recursively into the ephemeral directories located in .tmp/ in order to find other Zeek plugins (and nowadays also .hlto files). If any of those directories are deleted while that is going on (a test that succeeded), I'm sure there might be corner cases that aren't considered resulting in crashes.

This doesn't fully align with the output you show, but I'd not read too much into the progress output when possibly intermingled.

EDIT: May also need to consider existence of symlinks.

@vpax
Copy link
Author

vpax commented Sep 7, 2023

Your speculation is on target. It's due to the default btest.cfg generated by zkg including

ZEEK_PLUGIN_PATH=`%(testbase)s/Scripts/get-zeek-env zeek_plugin_path`

If I change that to instead hardwire ZEEK_PLUGIN_PATH to only /usr/local/zeek/lib/zeek/plugins, rather than also the package directory, then the problem goes away.

I really have to wonder about the wisdom of the unlimited directory traversal for locating plugins, this has bitten me in the past, too (when I had saved a previous build directory as a backup and the old version of the plugin wound up being fetched from there).

I can easily fix this point-wise since this particular package doesn't even have any plugins, so no need for the fancier ZEEK_PLUGIN_PATH. What would you suggest, though, as a general way of avoiding this issue?

@ckreibich
Copy link
Member

This is at least partially my fault :-) so let me chime in here. I feel bad because get-zeek-env has long needed an overhaul, and I think much of this can be fixed in package templating. A few thoughts:

  • Indeed a package that has no plugin indeed has no business messing with ZEEK_PLUGIN_PATH. That's easy to fix in the template, though existing packages will need to adopt that change.
  • For packages that do have a plugin, there's no need for get-zeek-env to add the package's toplevel directory to ZEEK_PLUGIN_PATH — the package knows its build directory so can be specific. This too is an easy fix but needs adopting in existing packages/plugins.
  • As for limitless recursion, I'm torn. I think (@rsmmr might know better) the original motivator for this was to abstract away "a few directories", for example by allowing arbitrary subdirectories below zeek-config --plugin_dir to contain plugins, and that strikes me as useful. One idea could be to make ZEEK_PLUGIN_PATH support patterns: /foo/bar could mean "look for a plugin exactly in that folder", /foo/bar/*/*/ could mean "look for plugins exactly two subdirectories deep", and /foo/bar/** (a notation commonly used in CI systems) could capture "anything below". I'm not fully convinced by this, in part because I can't tell whether this would actually exclude surprises like your backup directories, Vern?

@ckreibich
Copy link
Member

I forgot to add: I see no reason for Zeek to traverse into dot directories, though. This would naturally exclude things like .tmp and .git.

@vpax
Copy link
Author

vpax commented Sep 8, 2023

Cool, yes, let's get the template fixed for not generating that btest.cfg line at all if not needed, and making it specific if needed.

I like the idea of changing ZEEK_PLUGIN_PATH to have varying control over just how to search a particular directory. I get it that the flexibility of deeper search can be handy (though your suggestion of skip-dot-directories seems great regardless), but the violation of Principle Of Least Surprise is pretty strong when it goes wrong. It's now twice cost me a whole bunch of time - would be great to harden against that.

@rsmmr
Copy link
Member

rsmmr commented Sep 8, 2023

  • As for limitless recursion, I'm torn. I think (@rsmmr might know better) the original motivator for this was to abstract away "a few directories", for example by allowing arbitrary subdirectories below zeek-config --plugin_dir

I don't really remember much about that choice, but yeah, somewhere in that ballpark; along probably with an early (naive) thinking of "how deep can directories holding plugins really get ...". At this point, I'd be mostly worried about breaking setups if we change how this works (but I agree on the dot directories; and in princinple also on the globs, though not sure if that's really worth the effort in the end).

bbannier added a commit to zeek/zeek that referenced this issue Sep 25, 2023
bbannier added a commit to zeek/zeek that referenced this issue Sep 25, 2023
Dot directories rarely contain anything we would want to load as a
dynamic plugin. Even worse, they likely contain files with externally
controlled lifetimes which might be removed while we are using them
(see e.g., zeek/btest#98).

With this patch we do not search _discovered_ dot directories anymore.
We continue to load from a user-specified `ZEEK_PLUGIN_PATH`, even if
its name starts with a dot.

Since this patch changes previous behavior it is a **BREAKING CHANGE**.

Closes zeek/btest#98.
bbannier added a commit to zeek/zeek that referenced this issue Sep 26, 2023
Dot directories rarely contain anything we would want to load as a
dynamic plugin. Even worse, they likely contain files with externally
controlled lifetimes which might be removed while we are using them
(see e.g., zeek/btest#98).

With this patch we do not search _discovered_ dot directories anymore.
We continue to load from a user-specified `ZEEK_PLUGIN_PATH`, even if
its name starts with a dot.

Since this patch changes previous behavior it is a **BREAKING CHANGE**.
@awelzel
Copy link
Contributor

awelzel commented Sep 27, 2023

Closing after merging @bbannier PR to exclude dot directories from the recursive search.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants