
--follow-exec causes difference in execution #966

Open
boustrophedon opened this issue Feb 28, 2022 · 28 comments
Labels: bug, Instrumentation (Issues relating to ptrace and parsing of DWARF tables)

Comments

@boustrophedon

Describe the bug
In some cases, using --follow-exec appears to stop test code from running.

To Reproduce
https://github.com/boustrophedon/tarpaulin_missing_coverage/commits/master

The original problem I'm trying to solve is the following:
I have tests in tests/ and examples in examples/ and I want to get combined coverage for all of them. If I just run cargo tarpaulin --tests --examples, the examples don't execute because the test runner is used to look for tests rather than execute the examples (related?). I've worked around this by adding a small test in each example that just calls main.

However, one of my example programs calls itself (which, as above, is actually the test runner process) as a subprocess. Although the processes do appear to execute (e.g. if I intentionally throw a panic in one, the panic shows up), tarpaulin isn't capturing their coverage.

While investigating this, I found a minimal example showing, just by adding or removing the --follow-exec flag, that the subprocess code somehow isn't being called.

Expected behavior
The subprocess code should run, the file should be written to, and the CI step "Check file was actually created" should pass.

fails:
boustrophedon/tarpaulin_missing_coverage@9d91fb2
succeeds:
boustrophedon/tarpaulin_missing_coverage@4e92bfe

@xd009642
Owner

So, for the first part: cargo tarpaulin --examples is equivalent to cargo test --examples, which is why you're hitting the issue you linked. If you want to avoid creating example tests, you can run the examples directly with cargo tarpaulin --command Build --examples and use a config file to combine that with other test types. Something like the following should work:

```toml
[examples]
command = "Build"
run-types = ["Examples"]

[tests]
run-types = ["Tests"] # List the other run types here; an empty section may also work, but I haven't tried it
```

I have a feeling this may be related to some things I've been seeing with #953, so I'll try the examples and see if I can puzzle it out.

@boustrophedon
Author

Thanks for the tip on using run-types = ["Examples"] in a config file!

Let me know if there's anything I can do to help with understanding the weird behavior of --follow-exec.

@xd009642
Owner

xd009642 commented Mar 2, 2022

So I tried this out, and #962 fixes your example, if you want to try it on your own project 👀

@boustrophedon
Author

https://github.com/boustrophedon/tarpaulin_missing_coverage/runs/5400464385

Here's what I'm getting when using the issue/process-kill-953 branch:


```
running 1 test
test call_main ... 
running 1 test
test call_main ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Error: "Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response"
Mar 03 00:43:07.529 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error: Timed out waiting for test response
Error: Process completed with exit code 1.
```

@boustrophedon
Author

Oh, I should change the code to remove the test harness and see if that does anything though.

@boustrophedon
Author

@xd009642
Owner

xd009642 commented Mar 3, 2022

Interesting, I tried it on 4e92bfe238d5391 and got 100% coverage 🤔

@xd009642
Owner

xd009642 commented Mar 3, 2022

Oh, it only times out for me when using the config, so there must be another issue in the config handling 👀 fun

@xd009642
Owner

xd009642 commented Mar 3, 2022

It was me being a dummy: follow-exec wasn't aliased in serde, so the config expected follow_exec. I've added the alias now. It does time out without follow-exec, though...
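The mismatch is easy to reproduce in miniature: a key spelled `follow-exec` in the TOML file is not found when the loader only knows `follow_exec`, so the option silently keeps its default. The sketch below is a std-only stand-in for what a serde field alias (`#[serde(alias = "...")]`) achieves; `canonical_key` and `parse_flags` are hypothetical helpers, not tarpaulin code.

```rust
use std::collections::HashMap;

// Hypothetical helper: map the kebab-case spelling used in the TOML file
// (`follow-exec`) onto the snake_case field name (`follow_exec`).
fn canonical_key(key: &str) -> String {
    key.replace('-', "_")
}

// Hypothetical loader for boolean options given as (key, value) pairs.
fn parse_flags(pairs: &[(&str, bool)]) -> HashMap<String, bool> {
    pairs.iter().map(|&(k, v)| (canonical_key(k), v)).collect()
}

fn main() {
    let from_file = [("follow-exec", true)];

    // Without normalisation the snake_case lookup misses and falls back to the default.
    let raw: HashMap<&str, bool> = from_file.iter().copied().collect();
    let unaliased = raw.get("follow_exec").copied().unwrap_or(false);

    // With normalisation (the effect of an alias) the value is found.
    let aliased = parse_flags(&from_file)
        .get("follow_exec")
        .copied()
        .unwrap_or(false);

    println!("without alias: {}, with alias: {}", unaliased, aliased);
}
```

Running it prints `without alias: false, with alias: true`.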

@xd009642
Owner

xd009642 commented Mar 3, 2022

Okay, if you try now, it works with and without the config file! Finally satisfied I've laid this one to rest. And I think this may have done enough to unblock me on #953, so thanks 😁

@boustrophedon
Author

@xd009642
Owner

xd009642 commented Mar 3, 2022

Brilliant. I'll be tweaking the branch a bit more to try to fix the issue it was originally there to fix, but once it's merged I'll cut a new release 👍

@boustrophedon
Author

Awesome. Should I close this issue or do you want to do it when you do the release?

@xd009642
Owner

xd009642 commented Mar 3, 2022

I'll do it when the PR is merged; that way you'll know the release is coming imminently and can switch your CI to use the latest release 👍

@xd009642
Owner

xd009642 commented Mar 6, 2022

Just a reminder, in case I don't solve this change in behaviour: it may become necessary to add an explicit wait() to that example test case, to stop tarpaulin from carrying on after the exec, killing the test process, and then not capturing the exec'd process's coverage. But I should be able to solve that issue as well.

It's just that the recent follow-exec issues are churning up a lot of behaviours 😅. Hopefully it will be a lot simpler (and faster to execute) after this, though.
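A minimal sketch of the explicit wait() suggested above, assuming the example re-executes its own binary (as the example in this issue does via /proc/self/exe, which is what `std::env::current_exe` reads on Linux). The `EXAMPLE_CHILD` environment variable used to tell parent and child apart is hypothetical. Blocking on `Child::wait` keeps the test process alive until the exec'd child has exited, instead of letting the test finish while the child is still running.

```rust
use std::env;
use std::io;
use std::process::{Command, ExitStatus};

// Hypothetical marker telling the re-executed binary it is the child.
const CHILD_ENV: &str = "EXAMPLE_CHILD";

// Spawn the command and block until the child exits.
fn spawn_and_wait(cmd: &mut Command) -> io::Result<ExitStatus> {
    let mut child = cmd.spawn()?;
    // Without this wait the parent may return (ending the test) while the
    // child is still running.
    child.wait()
}

fn main() -> io::Result<()> {
    if env::var_os(CHILD_ENV).is_some() {
        // Child branch: the code whose coverage we want to collect.
        println!("child: doing the subprocess work");
        return Ok(());
    }
    // Parent branch: re-run this same executable as the child.
    let exe = env::current_exe()?;
    let status = spawn_and_wait(Command::new(exe).env(CHILD_ENV, "1"))?;
    println!("child exited with {}", status);
    Ok(())
}
```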

@xd009642
Owner

Release 0.20.0 will be out once CI finishes with the fix for this issue 👍

@boustrophedon
Author

@xd009642
Owner

A quick check: are you setting --test-threads in any way? I was seeing a weird spurious failure until I set it to 1, and then it disappeared completely.

@boustrophedon
Author

It's just calling cargo tarpaulin: https://github.com/boustrophedon/extrasafe/blob/ipc_coverage-rebase/.github/workflows/build-test.yaml#L60

I wonder if the examples are also being run with the equivalent of test-threads 1? https://github.com/boustrophedon/extrasafe/blob/ipc_coverage-rebase/.tarpaulin.toml#L3

@boustrophedon
Author

Actually it looks like this failure is probably on my side. Out of curiosity, is tarpaulin using signals in some way to catch syscalls or something?

@xd009642
Owner

Not explicitly. ptrace does catch all the signals, but tarpaulin forwards the ones it has no use for back to the test, e.g. SIGCHLD, which identifies that a spawned process has finished so that spawned commands can be waited on.
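To make the forwarding behaviour concrete, here is a deliberately simplified model of what a ptrace-based tracer does at a signal-delivery stop. It is an assumption-laden sketch, not tarpaulin's code: the tracer keeps the SIGTRAPs produced by its own breakpoints and hands other signals (such as SIGCHLD) back to the tracee, which with real ptrace means passing the signal number as the `data` argument of `PTRACE_CONT`. The signal numbers are the x86-64 Linux values.

```rust
// Signal numbers as defined on x86-64 Linux.
const SIGTRAP: i32 = 5;
const SIGCHLD: i32 = 17;

// What the tracer does with a signal it intercepted from the tracee.
#[derive(Debug, PartialEq)]
enum Action {
    // Handle it internally and resume the tracee with no signal.
    Consume,
    // Resume the tracee and deliver this signal to it.
    Forward(i32),
}

// Hypothetical policy: breakpoint traps belong to the tracer; everything
// else is passed through so the test sees the signals it expects.
fn on_signal_stop(sig: i32) -> Action {
    if sig == SIGTRAP {
        Action::Consume
    } else {
        Action::Forward(sig)
    }
}

fn main() {
    for sig in [SIGTRAP, SIGCHLD] {
        println!("signal {}: {:?}", sig, on_signal_stop(sig));
    }
}
```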

@xd009642 xd009642 reopened this Mar 22, 2022
@boustrophedon
Author

So unfortunately, what's happening is that test-threads=1 actually just breaks all my tests. This is because my library is a wrapper around seccomp, which lets you tell the kernel to deny the use of syscalls of your choosing, for security reasons. Seccomp filters are applied to the current thread (and are inherited by child threads, which isn't relevant here), so when two tests with different filters run on the same thread, the effective filter is the intersection of the two.

In particular, what's happening is that one test is only allowing filesystem operations, another one is allowing only network operations, and so when they run sequentially on the same thread, the end result is that neither is allowed, and the test fails.
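The intersection effect can be modelled with plain sets. This is a toy model: each filter is treated as an allow-list of syscall names (the particular names are illustrative), and real seccomp details such as argument checks and per-rule actions are ignored. When allow-list filters are stacked, a syscall succeeds only if every filter allows it.

```rust
use std::collections::BTreeSet;

// Toy filter: the set of syscalls it allows; anything else is denied.
fn allow(names: &[&'static str]) -> BTreeSet<&'static str> {
    names.iter().copied().collect()
}

// Stacking two allow-list filters: a syscall must pass both.
fn stacked(a: &BTreeSet<&'static str>, b: &BTreeSet<&'static str>) -> BTreeSet<&'static str> {
    a.intersection(b).copied().collect()
}

fn main() {
    let fs_only = allow(&["openat", "read", "write", "close"]);
    let net_only = allow(&["socket", "connect", "read", "write", "close"]);
    let both = stacked(&fs_only, &net_only);
    // Neither opening files nor creating sockets survives the intersection.
    println!("effective allow-list: {:?}", both);
}
```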

When test-threads=1, the Rust test runner says "we don't need to spawn new threads, just use the current one" (see here and run_test/run_test_inner later in the same file).
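The effect of test-threads=1 can be sketched with a thread-local standing in for per-thread seccomp state. This is a toy model under stated assumptions: `install`, `fs_test` and `net_test` are hypothetical, and it relies on fresh test threads being spawned from a thread with no filter installed (real seccomp filters are inherited by threads created after installation).

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;
use std::thread;

thread_local! {
    // Toy stand-in for per-thread seccomp state; None means unrestricted.
    static ALLOWED: RefCell<Option<BTreeSet<&'static str>>> = RefCell::new(None);
}

// Install a filter on the current thread, narrowing any filter already there.
fn install(filter: &[&'static str]) {
    let new: BTreeSet<&'static str> = filter.iter().copied().collect();
    ALLOWED.with(|cell| {
        let mut state = cell.borrow_mut();
        let narrowed: BTreeSet<&'static str> = match state.take() {
            Some(old) => old.intersection(&new).copied().collect(),
            None => new,
        };
        *state = Some(narrowed);
    });
}

fn is_allowed(syscall: &str) -> bool {
    ALLOWED.with(|cell| cell.borrow().as_ref().map_or(true, |set| set.contains(syscall)))
}

// Hypothetical tests: each installs its own filter, then needs one syscall.
fn fs_test() -> bool {
    install(&["openat", "read", "write", "close"]);
    is_allowed("openat")
}

fn net_test() -> bool {
    install(&["socket", "connect", "read", "write", "close"]);
    is_allowed("socket")
}

fn main() {
    // test-threads=1: both tests run one after another on the same thread.
    let same_thread = thread::spawn(|| (fs_test(), net_test())).join().unwrap();
    // test-threads>1: each test gets a fresh, unfiltered thread.
    let fs = thread::spawn(fs_test).join().unwrap();
    let net = thread::spawn(net_test).join().unwrap();
    println!("same thread: {:?}, separate threads: {:?}", same_thread, (fs, net));
}
```

On the shared thread the second test fails because only `read`, `write` and `close` survive both filters; on separate threads both tests pass.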

@boustrophedon
Author

With RUST_TEST_THREADS=2 and just running the tests, it doesn't have the issue.

However with RUST_TEST_THREADS=2 and running the examples as well, it segfaults on the subprocess example.

@xd009642
Owner

Okay, looks like for your tests I have to fix that test-threads > 1 segfault, which happens about 4% of the time on my machine and >99% of the time on CI 😢.

I did find the segfault only happened on nightly, not stable, so maybe running coverage on stable could be a stopgap solution?

@boustrophedon
Author

I'm not selecting nightly anywhere - does tarpaulin select nightly internally somewhere? Per the github CI documentation it's running rust 1.59.

I'll try adding an explicit .wait() at the subprocess call and see if that fixes it.

@xd009642
Owner

No, it doesn't. I just saw the segfault in CI only on nightly for my own tests; maybe yours just exercises the issue more strongly, so it appears on stable 🤔

@boustrophedon
Author

Actually I just checked and I have explicit kill calls.

What's happening is that I have:

  • The main process
  • A "db server" process
  • A "web server" process
  • Two client processes that make network calls to the web server

First I start the db server in a subprocess (without calling wait on it, since it's a server), and per the CI log the db process gets to the point where it's waiting.

Then we sleep for 100 ms (which could be what's making the issue occur every time) so that we're sure the server's ready.

Then we try to start the web server, but we never actually get there: its first line is a println that never shows up.

So either the sleep is causing the issue or the second subprocess call itself. In particular, the subprocess call is calling the same executable as the first, /proc/self/exe, just with different arguments.
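For reference, the process layout described above (a long-running server child, a 100 ms sleep, a second child, then explicit kills) can be sketched with std::process. Assumptions: `sh -c "exec sleep 30"` stands in for re-executing /proc/self/exe with server arguments, and `start_server`/`stop` are hypothetical helpers. Calling `wait` after `kill` reaps each child so it does not linger as a zombie.

```rust
use std::io;
use std::process::{Child, Command, ExitStatus};
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for spawning a long-running server subprocess.
fn start_server() -> io::Result<Child> {
    Command::new("sh").args(["-c", "exec sleep 30"]).spawn()
}

// Kill a child and reap it so no zombie is left behind.
fn stop(mut child: Child) -> io::Result<ExitStatus> {
    child.kill()?;
    child.wait()
}

fn main() -> io::Result<()> {
    let db = start_server()?;
    // Crude readiness wait, as described in the thread: sleep for 100 ms.
    thread::sleep(Duration::from_millis(100));
    let web = start_server()?;
    // ... client processes would talk to the web server here ...
    for (name, child) in [("web", web), ("db", db)] {
        let status = stop(child)?;
        println!("{} exited: {}", name, status);
    }
    Ok(())
}
```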

@orium

orium commented Jun 14, 2022

I have the same problem in orium/cargo-rdme:

```
$ cargo tarpaulin --follow-exec
⋮
Jun 14 10:57:52.890  INFO cargo_tarpaulin::process_handling::linux: Launching test
Jun 14 10:57:52.890  INFO cargo_tarpaulin::process_handling: running /home/orium/programming/projects/cargo-rdme/target/debug/deps/tests-e4918e5f44a98741

running 44 tests
test system_test_avoid_overwrite_uncommitted_readme ... ok
test system_test_custom_lib_path ... Jun 14 10:57:57.976 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests"
```

Setting the number of threads to 1 doesn't seem to make a difference.

This happens with tarpaulin 0.20.1.

@orium orium mentioned this issue Oct 11, 2022
@xd009642 xd009642 added bug Instrumentation Issues relating to ptrace and parsing of DWARF tables labels Jan 26, 2023