Tarpaulin v0.17.0 sometimes errors with a message about a segfault when trying to build against musl #618

Closed
cptpcrd opened this issue Nov 11, 2020 · 18 comments

Comments

@cptpcrd

cptpcrd commented Nov 11, 2020

Steps to reproduce

I've managed to work my code down to this minimal example:

(Clarification edit: As the title indicates, I have only encountered this when trying to build/test against musl libc; the same programs build fine when targeting glibc.)

rustup target install x86_64-unknown-linux-musl  # Or however you need to do it for your setup

cd /tmp
cargo new test-tarpaulin
cd test-tarpaulin

cat <<EOF >src/main.rs
fn a()  {
    if unsafe { libc::getpid() } < 0 {
        panic!("{}", std::io::Error::last_os_error());
    }
}
EOF
echo 'libc = "0.2"' >>Cargo.toml

cargo tarpaulin --target=x86_64-unknown-linux-musl --verbose

What should happen

tarpaulin builds the target and runs successfully. (It worked with tarpaulin v0.16.0.)

What actually happens

tarpaulin fails with a message about a segfault.

Nov 11 16:45:29.104 DEBUG cargo_tarpaulin: set up logging
Nov 11 16:45:29.104  INFO cargo_tarpaulin::config: Creating config
Nov 11 16:45:29.118  INFO cargo_tarpaulin: Running Tarpaulin
Nov 11 16:45:29.118  INFO cargo_tarpaulin: Building project
   Compiling bitflags v1.2.1
   Compiling libc v0.2.80
   Compiling test-tarpaulin v0.1.0 (/tmp/test-tarpaulin)
    Finished test [unoptimized + debuginfo] target(s) in 1.08s
Nov 11 16:45:30.257  INFO cargo_tarpaulin: Launching test
Nov 11 16:45:30.257  INFO cargo_tarpaulin: running /tmp/test-tarpaulin/target/x86_64-unknown-linux-musl/debug/deps/test_tarpaulin-86c594765747da0f
Nov 11 16:45:30.534 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests

I'm not familiar with tarpaulin's internals (not by a long shot :-), so I can't debug this any further.

Environment

I've been able to reproduce this on Arch Linux, Void Linux, and Ubuntu (all three had rustc 1.47.0 and were running either the latest kernel from the repos or only a few revisions behind).

Potential cause

It looks like this regression was caused (at least, most directly) by #613. RUSTFLAGS='-C relocation-model=dynamic-no-pic' cargo tarpaulin --target=x86_64-unknown-linux-musl --verbose builds with no problems.

@cptpcrd
Author

cptpcrd commented Nov 12, 2020

Note: The bug referenced in ZcashFoundation/zebra#1283 is slightly different (but possibly related). It occurs when building against glibc, and the error message is different.

dconnolly added a commit to ZcashFoundation/zebra that referenced this issue Nov 12, 2020
@xiye520

xiye520 commented Nov 12, 2020

I have the same problem. With the previous version (0.16.0), I used the following command to successfully run the tests and generate unit test coverage:

 cargo tarpaulin -v

but after upgrading to v0.17.0, the following errors began to appear:

Nov 12 16:41:55.984  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 16957
Nov 12 16:41:55.984  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 16954
Nov 12 16:41:55.984  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 16954
Nov 12 16:41:55.984 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests"

Is there a way to install a specific version of cargo-tarpaulin?

@xd009642
Owner

It's interesting that the relocation model needs to be set explicitly. I'll try some stuff with musl and see if I can get to the bottom of it... Generally, tarpaulin adding linker flags tends to cause issues for some projects, so it's interesting to see the inverse 😬

@cptpcrd and @xiye518 are your tests launching external processes? I added functionality to follow exec events down if the binary was part of the project so it's likely a problem with that which wasn't caught in testing. I'll have a look at these issues tonight!

@xiye518 cargo install --version 0.16.0 cargo-tarpaulin

@MitMaro
Sponsor

MitMaro commented Nov 12, 2020

I've also been seeing similar issues for a couple of days now and have been trying to reduce the segfault to a minimal reproducible example but the segfault seems to "move" when tests are recompiled. I am using glibc.

$ cargo +nightly tarpaulin --verbose --output-dir coverage -- config
Nov 12 09:28:09.195 DEBUG cargo_tarpaulin: set up logging
Nov 12 09:28:09.195  INFO cargo_tarpaulin::config: Creating config
Nov 12 09:28:09.247  INFO cargo_tarpaulin: Running Tarpaulin
Nov 12 09:28:09.247  INFO cargo_tarpaulin: Building project
   Compiling git-interactive-rebase-tool v1.2.1 (/home/mitmaro/code/git-interactive-tool)
    Finished test [unoptimized + debuginfo] target(s) in 6.01s
Nov 12 09:28:15.546  INFO cargo_tarpaulin: Launching test
Nov 12 09:28:15.546  INFO cargo_tarpaulin: running /home/mitmaro/code/git-interactive-tool/target/debug/deps/interactive_rebase_tool-17c651823e96ab64
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x1879d0
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187a10
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b10
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b50
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b90
Nov 12 09:28:21.346 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187bd0
Nov 12 09:28:21.351 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x28a730
Nov 12 09:28:21.363 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x27b4c0

running 167 tests
test config::tests::config_diff_tab_invalid_range ... ok
test config::tests::config_diff_tab_invalid ... ok
test config::tests::config_diff_tab_symbol_invalid_utf8 ... ok
test config::tests::config_diff_space_symbol_invalid_utf8 ... ok
Nov 12 09:28:21.552  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994321
Nov 12 09:28:21.552  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994321
Nov 12 09:28:21.554  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994321
Nov 12 09:28:21.554  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994321
Nov 12 09:28:21.558 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests
Error: "Failed to get test coverage! Error: Failed to run tests: A segfault occurred while executing tests"

I've also seen a SIGILL:

cargo +nightly tarpaulin --verbose --output-dir coverage -- config
Nov 12 09:28:33.491 DEBUG cargo_tarpaulin: set up logging
Nov 12 09:28:33.491  INFO cargo_tarpaulin::config: Creating config
Nov 12 09:28:33.532  INFO cargo_tarpaulin: Running Tarpaulin
Nov 12 09:28:33.532  INFO cargo_tarpaulin: Building project
    Finished test [unoptimized + debuginfo] target(s) in 0.03s
Nov 12 09:28:33.822  INFO cargo_tarpaulin: Launching test
Nov 12 09:28:33.822  INFO cargo_tarpaulin: running /home/mitmaro/code/git-interactive-tool/target/debug/deps/interactive_rebase_tool-17c651823e96ab64
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x1879d0
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187a10
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b10
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b50
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187b90
Nov 12 09:28:39.723 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x187bd0
Nov 12 09:28:39.728 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x28a730
Nov 12 09:28:39.744 DEBUG cargo_tarpaulin::statemachine::linux: Instrumentation address clash, ignoring 0x27b4c0

running 167 tests
test config::tests::config_diff_tab_invalid ... ok
test config::tests::config_diff_tab_invalid_range ... ok
test config::tests::config_diff_tab_symbol_invalid_utf8 ... ok
test config::tests::config_diff_space_symbol_invalid_utf8 ... ok
Nov 12 09:28:39.933  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994486
Nov 12 09:28:39.933  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994486
Nov 12 09:28:39.935  WARN cargo_tarpaulin::statemachine::linux: Failed to find process for pid: 1994486
Nov 12 09:28:39.935  WARN cargo_tarpaulin::statemachine::linux: Failed to find traces for pid: 1994486
Nov 12 09:28:39.940 ERROR cargo_tarpaulin: Failed to get test coverage! Error: Failed to run tests: Error running test - SIGILL raised in 1994486
Error: "Failed to get test coverage! Error: Failed to run tests: Error running test - SIGILL raised in 1994486"

Project: https://github.com/MitMaro/git-interactive-rebase-tool

Failing coverage run in GitHub Actions: https://github.com/MitMaro/git-interactive-rebase-tool/runs/1390537880?check_suite_focus=true

@cptpcrd
Author

cptpcrd commented Nov 12, 2020

> @cptpcrd and @xiye518 are your tests launching external processes? I added functionality to follow exec events down if the binary was part of the project so it's likely a problem with that which wasn't caught in testing. I'll have a look at these issues tonight!

The reproducible example I added doesn't have any tests. When linking against musl instead of glibc, the mere presence of a function that does particular things is enough to make it segfault.

@MitMaro and @xiye518 seem to have encountered slightly different versions of this bug.

@xd009642
Owner

I think there are two bugs here. musl seems to cause very large address offsets to be reported, which I think is leading to mis-instrumentation.

For @MitMaro and @xiye518, I can't map the thread ID to a parent PID with the new exec-aware tracing. If I stub it so that a failed lookup always falls back to the root test ID, it seems to always work... But that may then break tests that trace execs. If I can't find a better solution by the end of the week, I'll release a patch doing that; after all, having a new feature break some projects that use it is better than breaking existing projects.
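
As a rough illustration of the fallback described above, here is a minimal sketch. It is not tarpaulin's actual code: `TraceMap`, `record_child`, and `parent_or_root` are made-up names, and the real state machine tracks far more than a single map.

```rust
use std::collections::HashMap;

/// Hypothetical bookkeeping for traced threads and processes.
/// These names are illustrative only, not tarpaulin's real types.
struct TraceMap {
    /// PID of the root test process that was launched first.
    root: i32,
    /// Maps a child thread/process ID to the ID that spawned it.
    parents: HashMap<i32, i32>,
}

impl TraceMap {
    fn new(root: i32) -> Self {
        TraceMap {
            root,
            parents: HashMap::new(),
        }
    }

    /// Remember which parent a newly seen child ID belongs to.
    fn record_child(&mut self, child: i32, parent: i32) {
        self.parents.insert(child, parent);
    }

    /// Look up the parent of `id`. If the ID was never recorded,
    /// fall back to the root test PID instead of failing.
    fn parent_or_root(&self, id: i32) -> i32 {
        self.parents.get(&id).copied().unwrap_or(self.root)
    }
}

fn main() {
    let mut map = TraceMap::new(100);
    map.record_child(101, 100);
    map.record_child(102, 101);

    // Known IDs resolve to the parent that was recorded for them.
    assert_eq!(map.parent_or_root(102), 101);
    // An ID that was never recorded falls back to the root test PID.
    assert_eq!(map.parent_or_root(999), 100);
    println!("unknown id 999 is attributed to {}", map.parent_or_root(999));
}
```

The trade-off is the one mentioned above: an unknown ID that really belongs to an exec'd child would be attributed to the root test process.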

@MitMaro
Sponsor

MitMaro commented Nov 13, 2020

@xd009642, I wish I knew more about how this project works so that I could help out. If there is anything I can do to help, please don't hesitate to ping me.

Also, thanks for the amazing project!

@xd009642
Owner

Okay, I've figured out the musl bug; I just need to implement the fix. In the process memory map, vvar and vDSO are listed before the executable, which I guess is the memory map info for musl itself. With glibc, the executable is always first, which I thought would hold for everything.
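
To illustrate the idea, here is a minimal sketch that finds the executable's base address by matching the pathname column of a `/proc/<pid>/maps` dump instead of assuming the first entry belongs to the executable. It is not tarpaulin's actual code: the function name `base_address`, the sample addresses, and the path `/tmp/demo/test_bin` are all made up for the example.

```rust
use std::path::Path;

/// Returns the start address of the first mapping whose pathname equals
/// `exe`, or `None` if no line matches. `maps` is text in the format of
/// `/proc/<pid>/maps`. Pathnames containing spaces are not handled.
fn base_address(maps: &str, exe: &Path) -> Option<u64> {
    maps.lines().find_map(|line| {
        // Columns: address-range perms offset dev inode [pathname]
        let mut fields = line.split_whitespace();
        let range = fields.next()?;
        // Skip perms, offset, dev and inode; anonymous mappings have no
        // pathname column, so `?` skips them.
        let pathname = fields.nth(4)?;
        if Path::new(pathname) != exe {
            return None;
        }
        let start = range.split('-').next()?;
        u64::from_str_radix(start, 16).ok()
    })
}

fn main() {
    // Made-up sample where [vvar] and [vdso] precede the executable.
    let maps = "\
7ffe1000-7ffe3000 r--p 00000000 00:00 0 [vvar]
7ffe3000-7ffe5000 r-xp 00000000 00:00 0 [vdso]
7ffe5000-7ffe9000 r-xp 00000000 08:01 42 /tmp/demo/test_bin
";
    let base = base_address(maps, Path::new("/tmp/demo/test_bin"));
    // Taking the first line blindly would have given the [vvar] address.
    assert_eq!(base, Some(0x7ffe5000));
    println!("base address: {:x?}", base);
}
```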

I'll fix this and also add a test for musl binaries to avoid regressions.

After sleeping on it, I think I have a solution for the other problem; I just need to do some experimentation 😄

This was referenced Nov 13, 2020
@xd009642
Owner

@cptpcrd your issue should be fixed in develop if you want to try that.

@MitMaro @xiye518 I have a possible solution in the branch fix_unhandled_tids. It works for git-interactive-rebase, but it still seems to time out in zebra (although it performs better than before), so I'm still investigating.

@MitMaro
Sponsor

MitMaro commented Nov 14, 2020

@xd009642, I just gave the branch a try and can confirm that it works. Thanks for the quick update! Hopefully, the timeout issue isn't too difficult. :)

@xiye520

xiye520 commented Nov 14, 2020

tarpaulin

cargo install --version 0.16.0 cargo-tarpaulin

The command you gave worked. I have now downgraded my test environment to 0.16.0 and can successfully generate coverage information again; thank you for this amazing project!

I will keep following this project, and when this bug is fixed, I will try the new version as soon as possible.

@djeedai

djeedai commented Nov 14, 2020

Possibly related to #463 which also appears to be a regression in 0.17.0, and also exhibits similar symptoms.

@xd009642
Owner

I've just yanked 0.17.0. I figured I'd spend a bit longer on the issues and didn't want to keep people's CI broken in the meantime.

@xd009642
Owner

So I have a potential fix for everything on the fix_unhandled_tids branch. I'll test it on zcash tomorrow; it only started passing my minimal repo at midnight here, so I didn't want to start another test that takes longer to run.

@Ch00k
Contributor

Ch00k commented Dec 1, 2020

I really hate to be annoying, but is there any progress on this issue? Is there any help needed (testing etc.)? I would really like to be able to see coverage for a project with integration tests only, run against the binary (which was implemented in the yanked 0.17.0, #107).

@xd009642
Owner

xd009642 commented Dec 1, 2020

So I've merged a partial fix to the develop branch, which adds a --follow-exec flag to the CLI options, as I couldn't get it fully working with zebra. I am currently abroad on a business trip, so progress has stalled slightly as a result. One way to help is to use tarpaulin from develop with that option and see whether it all works for you.

I've got one bug to fix that's also in 0.16.0, which I just need to write a test for, and then if follow-exec works for enough people I'll release a new version. I just need to find the time alongside my day job and a bunch of end-of-year deadlines...

@Ch00k
Contributor

Ch00k commented Dec 2, 2020

The --follow-exec flag seems to work for me. I'll stick to installing from develop for the time being. Thanks so much! 👍

@xd009642
Owner

Closing this as only the zebra issue remains, which seems to be separate.
