Instrumentation address clash errors and segfaults #190

Closed
thedavidmeister opened this issue Jan 4, 2019 · 83 comments
Labels
bug Instrumentation Issues relating to ptrace and parsing of DWARF tables


@thedavidmeister

thedavidmeister commented Jan 4, 2019

as seen in failing builds https://circleci.com/gh/holochain/holochain-rust/1514

using 0.6.11 and nightly-2018-12-26

i've tried various flags passed to tarpaulin, but builds always seem to fail with this error

also seeing a segfault error ("a segfault occurred when executing test")

potentially related to #35, as we are using threads

@xd009642
Owner

xd009642 commented Jan 4, 2019

So the instrumentation clash warnings may not be errors as such: multiple lines of code may map to the same address, and tarpaulin just warns when that happens because it may affect the accuracy of the results. And #35 has been partially mitigated by making --no-count the default, which removes the breakpoint on a line after that line is hit.
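To make the clash warning concrete, here is a sketch, assuming a simplified line table of `(line, address)` pairs, of how source lines that share one instruction address can be detected. The `LineEntry` type and `find_address_clashes` function are hypothetical illustrations, not tarpaulin's code; its real DWARF handling is more involved.

```rust
use std::collections::BTreeMap;

/// One row of a simplified DWARF line table: a source line and the
/// instruction address the compiler attributed to it.
/// Hypothetical type for illustration, not tarpaulin's internal representation.
#[derive(Debug, Clone, Copy)]
struct LineEntry {
    line: u64,
    address: u64,
}

/// Groups source lines by instruction address and returns every address
/// that two or more distinct lines map to. A breakpoint placed at such an
/// address cannot tell which of those lines actually ran, which is why a
/// clash can reduce the accuracy of the coverage numbers.
fn find_address_clashes(entries: &[LineEntry]) -> BTreeMap<u64, Vec<u64>> {
    let mut by_address: BTreeMap<u64, Vec<u64>> = BTreeMap::new();
    for entry in entries {
        let lines = by_address.entry(entry.address).or_default();
        if !lines.contains(&entry.line) {
            lines.push(entry.line);
        }
    }
    // Keep only the addresses shared by more than one line.
    by_address.retain(|_, lines| lines.len() > 1);
    by_address
}

fn main() {
    let table = [
        LineEntry { line: 10, address: 0x1000 },
        LineEntry { line: 11, address: 0x1008 },
        LineEntry { line: 12, address: 0x1008 }, // same address as line 11
        LineEntry { line: 13, address: 0x1010 },
    ];
    for (address, lines) in find_address_clashes(&table) {
        println!("warning: lines {:?} share address {:#x}", lines, address);
    }
}
```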

I'm cloning the repo and having a look into what's happening and will report back if I find anything/solve it.

@Cogitri
Copy link

Cogitri commented Jan 4, 2019

The segfault also happens to me, and here I thought my CI was going insane. See https://drone.exqa.de/Cogitri/tmplgen/89/3/3

@xd009642
Owner

xd009642 commented Jan 4, 2019

@thedavidmeister I haven't managed to run your tests yet; I'm getting a compile failure I need to look into.
@Cogitri there appears to be a timeout on your test test_bad_env; I wonder if this might be leaving things in an odd state that causes the segfault in test tests::test_crate_or_licensing. I'll need to look into it more.

@thedavidmeister
Author

hi, a bit more info based on my testing

there are 3 issues that started popping up:

  • the instrumentation address clash complaints
  • segfaults
  • threaded tests failing in tarpaulin that pass in regular cargo test

instrumentation clash could be fixed by disabling thinlto

segfaults could be fixed by disabling incremental compilation

still not sure about the different threading behaviour yet

@thedavidmeister
Author

ah no, i'm wrong, segfaults still happening :(

@thedavidmeister
Author

looks like the threading issues are partly our fault

it's trying to unwrap a None of something stateful that needs to be initialised in our setup

i have no idea why that would work fine in regular cargo test but fail with tarpaulin

oh, and it worked with tarpaulin about a month ago because we ran all our CI through it

so something changed, but also we're handling that change ungracefully at our end
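One way to handle that kind of setup dependency more gracefully, sketched under assumptions: initialise the shared state lazily on first access instead of unwrapping an `Option` that a setup step may not have filled yet. The `SharedState` type and `state()` accessor are hypothetical, and `std::sync::OnceLock` was only stabilised in Rust 1.70, well after this thread (`lazy_static` or `std::sync::Once` filled that role at the time).

```rust
use std::sync::{Mutex, OnceLock};
use std::thread;

/// Hypothetical shared state that tests used to expect a setup step to have
/// initialised, so reading it before setup ran meant unwrapping a `None`.
#[derive(Debug, Default)]
struct SharedState {
    counter: u64,
}

static STATE: OnceLock<Mutex<SharedState>> = OnceLock::new();

/// Returns the shared state, initialising it on first use. The first caller,
/// from whichever thread or test, runs the initialiser; concurrent callers
/// block until it finishes and then see the same value, so test order and
/// thread scheduling no longer decide whether the state exists.
fn state() -> &'static Mutex<SharedState> {
    STATE.get_or_init(|| Mutex::new(SharedState::default()))
}

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                let mut guard = state().lock().unwrap();
                guard.counter += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("counter = {}", state().lock().unwrap().counter);
}
```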

@thedavidmeister
Author

thedavidmeister commented Jan 5, 2019

ok, i have a plan to fix our side of things

just need a hand with the segfaults... i'm not even sure where/how to debug those, they appear in different places on CI vs. local for example

thedavidmeister changed the title from "Instrumentation address clash errors" to "Instrumentation address clash errors and segfaults" on Jan 5, 2019
@euclio

euclio commented Jan 12, 2019

I'm trying to add cargo-tarpaulin to the xi-editor project and I also received a segfault: https://travis-ci.com/xi-editor/xi-editor/jobs/169995236.

PR here: xi-editor/xi-editor#1086

@xd009642
Owner

So I looked at tmplgen, being the smaller and simpler project, and the issue goes away when I replace rayon's par_iter with iter, which makes me think this might be another threading-based issue. This may be a fiddly one that needs a fair bit of investigation, so I'll keep you all posted.

Also, knowing these sorts of issues, there is always the chance it may disappear after another nightly upgrade...

xd009642 added the bug and Instrumentation labels on Jan 14, 2019
@Cogitri

Cogitri commented Jan 14, 2019

I replace rayon's par_iter with iter making me think this might be another threading based issue

Ah, yes, makes sense timing wise. Thanks for figuring that out!

@brunocodutra

Thanks for looking into this @xd009642! In case you're looking for a minimal repro, the snippet below seems to be enough to cause tarpaulin to segfault.

#![feature(async_await, await_macro, futures_api)]

#[test]
pub fn test() {
    futures::executor::ThreadPool::new();
}

@xd009642
Owner

@brunocodutra is that using futures-preview 0.3.0-alpha.12? Because I tried that minimal example and experienced no segfault on my system with the latest nightly.

@brunocodutra

0.3.0-alpha.12 yes.

Now that's interesting, this definitely triggers a segfault if I add it to reducer, in fact I reduced it from a segfault I first observed in brunocodutra/reducer#11. However I can indeed confirm it doesn't appear to cause any issues on an otherwise empty project.

I'll try a ground up approach this time, pulling in bits from reducer until I can reproduce it.

@brunocodutra

brunocodutra commented Jan 17, 2019

@xd009642 I think I got it: the issue only seems to manifest itself if there are at least two separate test cases spawning threads, so the snippet below triggers the segfault for me on an empty project, not on every execution, but very frequently.

#![feature(async_await, await_macro, futures_api)]

#[test]
pub fn a() {
    futures::executor::ThreadPool::new();
}

#[test]
pub fn b() {
    futures::executor::ThreadPool::new();
}

EDIT: the issue also manifests itself when spawning std::thread::Threads directly, albeit less frequently, so I assume the number of threads spawned is an important factor.
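A std-only variant of the repro above, as a sketch: per the EDIT, it swaps futures-preview's `ThreadPool` for plain `std::thread::spawn` calls. The helper name `spawn_and_join` and the count of 8 threads are assumptions, and this variant has not been confirmed to reproduce the segfault; it only shows the shape of the test file (two separate test cases that each spawn threads).

```rust
use std::thread;

/// Spawns `n` short-lived threads and waits for all of them, returning how
/// many were joined. The value 8 used below is an arbitrary assumption; the
/// report only says that spawning more threads made the crash more frequent.
fn spawn_and_join(n: usize) -> usize {
    let handles: Vec<_> = (0..n).map(|i| thread::spawn(move || i * 2)).collect();
    let mut joined = 0;
    for handle in handles {
        handle.join().expect("worker thread panicked");
        joined += 1;
    }
    joined
}

// Two separate test cases that each spawn threads, mirroring the
// `ThreadPool::new()` repro without the futures-preview dependency.
#[test]
fn a() {
    assert_eq!(spawn_and_join(8), 8);
}

#[test]
fn b() {
    assert_eq!(spawn_and_join(8), 8);
}

fn main() {
    println!("joined {} threads", spawn_and_join(8));
}
```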

@elinorbgr
Contributor

elinorbgr commented Jan 18, 2019

For some more context: using the current 0.7.0 release, part of my test suite segfaults when run on travis, but not on my local computer.

Both travis and my computer are running rustc 1.33.0-nightly (daa53a52a 2019-01-17).

The test file causing segfaults on travis spawns a thread with std::thread::spawn().

@euclio

euclio commented Jan 18, 2019

Same here. I only see the segfaults on travis.

elinorbgr added a commit to Smithay/wayland-rs that referenced this issue Jan 28, 2019
Spawning threads in tests hits a tarpaulin bug preventing collection
of coverage, so disable them for now.

cf xd009642/tarpaulin#190
@elinorbgr
Contributor

I tried to do some bisecting on tarpaulin to see if this was a recently introduced issue; all I can say is that version 0.6.8 already triggers a segfault on @brunocodutra's example. (I couldn't try earlier versions because I can't simply upgrade them to cargo 0.32, which is required to compile the example using futures-preview.)

All this was done on rustc 1.33.0-nightly (01f8e25b1 2019-01-24), but tarpaulin 0.7.0 also displays the issue when compiled on stable rustc 1.32.0 (9fda7c223 2019-01-16).

Cogitri pushed a commit to Cogitri/tmplgen that referenced this issue Feb 2, 2019
@xd009642
Owner

xd009642 commented Oct 3, 2019

New version 0.9.0 is being released as we speak, and the docker image will be updated as part of that process once the really long travis build finishes!

It's been "fun" but it's now time to close this issue for good!

xd009642 closed this as completed on Oct 3, 2019
@elinorbgr
Contributor

I have some bad news: tarpaulin 0.9 segfaults on several of the wayland-rs tests, including several test files that do not do any multithreading. I'll try to make a reasonably small example.

@elinorbgr
Contributor

elinorbgr commented Oct 4, 2019

Here is a repro:

use std::sync::Arc;

fn do_it() {
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
    let _a = Arc::new(0);
}

#[test]
fn test_1() {
    do_it()
}

#[test]
fn test_2() {
    do_it()
}

Running tarpaulin 0.9 on this regularly (30-40% of the time) results in:

Error: "Failed to get test coverage!
Error: Failed to run tests:
Error running test - SIGILL raised in 40531"

It sometimes also results in what appears to be a deadlock, or even the usual:

Error: "Failed to get test coverage!
Error: Failed to run tests: A segfault occurred while executing tests"

Increasing the number of Arc::new(...) in the body of the function seems to increase the chances of a segfault or SIGILL occurring.
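To scale the number of allocations without editing the function body, the repro can be parameterised. This is a sketch, not the reporter's code: the `n` parameter is hypothetical, and collecting into a `Vec` adds one allocation that the original does not have.

```rust
use std::sync::Arc;

/// Allocates `n` `Arc`s and keeps them all alive until the function returns,
/// like the repeated `let _a = Arc::new(0);` bindings above: shadowing a
/// named binding does not drop the earlier value early. `n` is a hypothetical
/// knob for scaling the repro.
fn do_it(n: usize) -> usize {
    let arcs: Vec<Arc<i32>> = (0..n).map(|_| Arc::new(0)).collect();
    arcs.len()
}

#[test]
fn test_1() {
    do_it(13);
}

#[test]
fn test_2() {
    do_it(13);
}

fn main() {
    println!("allocated {} Arcs", do_it(13));
}
```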

@clux

clux commented Oct 4, 2019

While there might be more issues with this ^, I just wanted to say thanks a lot for all the hard work on this release. This fixed all the issues I had with tarpaulin on a mid-sized project that had heavy use of rayon. It still crashes with --test-threads 1, but removing that flag actually worked 👍

xd009642 reopened this on Oct 5, 2019
@hntd187
Copy link

hntd187 commented Oct 5, 2019

Although this is being reopened, my build now works beautifully with the new change and my project makes some pretty extensive use of the futures ecosystem. Amazing work

https://dev.azure.com/toshi-search/toshi-search/_build/results?buildId=354

@xd009642
Owner

xd009642 commented Oct 5, 2019

@vberger I'm getting much lower incidence rates for the failure (2/1000 attempts failed) and I've got 25 calls to Arc::new and 4 tests calling do_it. There's a chance this one may be trickier

@elinorbgr
Contributor

Indeed, the reproducibility seems to depend on the computer running the test, on my other computer it is much harder to reproduce.

Still, I managed to reproduce it a few times with the do_it() function being simply

fn do_it() {
    for _ in 0..10000 {
        let _a = vec![0];
    }
}

and running it 8 times in parallel.
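A sketch of this second workload, assuming "8 times in parallel" means eight test cases running concurrently (the report does not say exactly how it was run). By default libtest runs tests on multiple threads, so eight `#[test]` functions calling `do_it` is one way to get there. The `parallel_tests!` macro and `run_in_parallel` helper are hypothetical.

```rust
use std::thread;

/// The allocation-only workload from the report above.
fn do_it() {
    for _ in 0..10000 {
        let _a = vec![0];
    }
}

/// Generates one `#[test]` per name, each calling `do_it`.
macro_rules! parallel_tests {
    ($($name:ident),*) => {
        $(
            #[test]
            fn $name() {
                do_it();
            }
        )*
    };
}

parallel_tests!(t1, t2, t3, t4, t5, t6, t7, t8);

/// Runs `copies` instances of `do_it` on scoped threads and returns how many
/// finished without panicking, for trying the workload outside the harness.
fn run_in_parallel(copies: usize) -> usize {
    thread::scope(|s| {
        let handles: Vec<_> = (0..copies).map(|_| s.spawn(do_it)).collect();
        handles
            .into_iter()
            .map(|handle| handle.join())
            .filter(Result::is_ok)
            .count()
    })
}

fn main() {
    println!("{} of 8 copies finished", run_in_parallel(8));
}
```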

Given that there is now nothing especially multithreaded in these tests, I strongly suspect that tarpaulin somehow interferes with the behavior of the allocator. Possibly this has the same root cause as #264?

@jonhoo
Sponsor

jonhoo commented Oct 14, 2019

Hmm, this is interesting. In evmap, I now get:

cargo tarpaulin --features "" --out Xml
========================== Starting Command Output ===========================
/bin/bash --noprofile --norc /__w/_temp/57325ed4-7e46-4e9b-b76a-e249415e0eef.sh
[INFO tarpaulin] Running Tarpaulin
[INFO tarpaulin] Building project
[INFO tarpaulin] Launching test
[INFO tarpaulin] running /__w/1/s/target/debug/deps/lib-94fc5ea0bf28285e

running 23 tests
Error: "Failed to get test coverage! Error: Failed to run tests: Error running test - SIGILL raised in 369"

So no segfault any more, but an illegal instruction. That seems even weirder!

@jonhoo
Sponsor

jonhoo commented Oct 14, 2019

Inferno still seems to hit the segfault though. This is all running with the latest Docker image.

@xd009642
Copy link
Owner

So there's a chance the SIGILL and SIGSEGV might be the same issue just manifesting in slightly different ways. I've got an experiment currently running to see if I've made any progress or not. If I have I'll push the branch and ask people here to test it out on their own projects 👀
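When counting failures it helps to record which signal killed the test binary, so SIGILL and SIGSEGV runs can be told apart. A minimal sketch using `std::os::unix::process::ExitStatusExt::signal()`; the `Outcome` enum and `classify` function are hypothetical, and the signal numbers are the usual Linux values, hard-coded for illustration.

```rust
/// How one run of a test binary ended, as a hypothetical coverage harness
/// might record it.
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Passed,
    Failed(i32),
    Sigill,
    Sigsegv,
    OtherSignal(i32),
    Unknown,
}

// Usual Linux signal numbers; real code would rather use the `libc` crate.
const SIGILL: i32 = 4;
const SIGSEGV: i32 = 11;

/// Classifies a run from its exit code (set on a normal exit) or the signal
/// that terminated it (set when the process was killed by a signal).
fn classify(code: Option<i32>, signal: Option<i32>) -> Outcome {
    match (code, signal) {
        (Some(0), _) => Outcome::Passed,
        (Some(c), _) => Outcome::Failed(c),
        (None, Some(SIGILL)) => Outcome::Sigill,
        (None, Some(SIGSEGV)) => Outcome::Sigsegv,
        (None, Some(s)) => Outcome::OtherSignal(s),
        (None, None) => Outcome::Unknown,
    }
}

#[cfg(unix)]
fn main() {
    use std::os::unix::process::ExitStatusExt;
    use std::process::Command;

    // A shell that sends SIGSEGV to itself stands in for a crashing test binary.
    match Command::new("sh").args(["-c", "kill -SEGV $$"]).status() {
        Ok(status) => println!("{:?}", classify(status.code(), status.signal())),
        Err(err) => eprintln!("could not run sh: {}", err),
    }
}

#[cfg(not(unix))]
fn main() {}
```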

@xd009642
Owner

xd009642 commented Oct 28, 2019

Ok, so I ran it on rust-evmap, as @jonhoo mentioned it previously. With the change in the affinity branch I did 100 runs with no SIGILL or SIGSEGV. Running the latest tarpaulin release on crates.io, I can already see some failures before the 100 runs finish.

So yeah if anyone wants to try it out and report back that would be appreciated!

Edit: results for the latest release: failed 11/100 times!
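The script used for these runs is not shown in the thread; a minimal sketch of a failure-rate loop in the same spirit might look like this. The `count_failures` and `failure_percent` helpers and the `failure-rate` binary name are hypothetical; the command under test is taken from the command line.

```rust
use std::process::{Command, Stdio};

/// Runs `attempt` `runs` times and returns how many attempts failed
/// (returned `false`). Taking a closure keeps the counting logic separate
/// from how each attempt is launched.
fn count_failures<F: FnMut(usize) -> bool>(runs: usize, mut attempt: F) -> usize {
    (0..runs).filter(|&i| !attempt(i)).count()
}

fn failure_percent(failed: usize, runs: usize) -> f64 {
    100.0 * failed as f64 / runs as f64
}

fn main() {
    // Usage (hypothetical binary name): failure-rate cargo tarpaulin
    let args: Vec<String> = std::env::args().skip(1).collect();
    if args.is_empty() {
        eprintln!("usage: failure-rate <command> [args...]");
        return;
    }
    let runs = 100;
    let failed = count_failures(runs, |_| {
        Command::new(&args[0])
            .args(&args[1..])
            .stdout(Stdio::null())
            .stderr(Stdio::null())
            .status()
            .map(|status| status.success())
            // Failing to launch the command at all also counts as a failure.
            .unwrap_or(false)
    });
    println!("failed {}/{} ({:.1}%)", failed, runs, failure_percent(failed, runs));
}
```

For example, `failure-rate cargo tarpaulin` would run tarpaulin 100 times, and 11 failures would print `failed 11/100 (11.0%)`.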

@xd009642
Owner

xd009642 commented Oct 29, 2019

Another 100 runs on evmap and no failures. Now trying inferno (will edit with some results however far it gets before I shutdown).

EDIT: There appear to be failing tests on inferno. Maybe I'm missing a dependency? This kinda messed up my failure-rate script, but when I did some runs without suppressing the output I didn't see any segfaults... @jonhoo do you mind trying it out and letting me know? Otherwise, when the weekend comes I'll merge, as it doesn't have a detrimental effect.

@jonhoo
Sponsor

jonhoo commented Oct 30, 2019

@xd009642 I'm away at a conference, so it may be hard for me to find the time, but my guess is that the test failures are due to git submodules. Just run git submodule update --init and the tests should work! Also, awesome work on this. I'm so excited to see tarpaulin working again :D

@xd009642
Copy link
Owner

I'll try that tonight. And have fun at the conference 👍

@xd009642
Owner

xd009642 commented Oct 30, 2019

Yeah that was it. And I'm doing a run of 100. With latest master tarpaulin it segfaulted every time I tried. So far this fix has worked every time (but that's just 3 times). I'll let it get through all 100 runs but if it passes that I'm merging and might do a release tonight 👀

Edit: release tonight is coming: >1000 runs and not a single segfault or SIGILL!

@jonhoo
Sponsor

jonhoo commented Oct 31, 2019

It appears to be working! Thank you!

@xd009642
Owner

xd009642 commented Nov 1, 2019

Nice! So as this issue ended up resolving two different issues I'm closing it and if there's another segfault or sigill a new issue should be opened 🎉
