Set up an automated valgrind test #69

adamcrume · 2017-03-10T02:31:53Z

Not usually necessary for Rust, but since we're doing a lot of unsafe stuff with pointers, we should have a valgrind test. Ideally, Travis would run it and fail the build if there are issues.

nbigaouette-eai · 2017-03-14T00:17:47Z

It is always a good idea to run code/tests through valgrind. One could simply run the current tests (without docs) through valgrind, and add more that use yet uncovered code path.

Something interesting too would be to add code coverage to make sure as much as possible is covered.

And then even fuzzing; I've heard good things of cargo fuzz.

nbigaouette-eai · 2017-03-14T00:28:01Z

Note that jemalloc removed valgrind support (see jemalloc/jemalloc@9a8add1) but this does not seems to be included in rust's version of a jemalloc. Rust uses jemalloc @ 33184bf69813087bf1885b0993685f9d03320c69 plus a couple of patches, which is jemalloc 4.1.0 plus four patches.

Enet4 · 2017-03-14T22:19:00Z

Well, I read this recent blog post on using Valgrind for this very purpose, and we are still advised to change the memory allocation system (using the nightly alloc_system feature). Other than that, this is a useful resource to whoever is attempting to resolve this issue.

nbigaouette-eai · 2017-03-15T00:59:19Z

Great read, thanks.

Maybe an optional combination alloc_system+nightly+valgrind could be added. As @adamcrume mentioned in #66, it might not be a good idea to depend on alloc_system and nightly for the crate, but an option to increase the testing is definitely a good idea.

adamcrume · 2017-03-15T01:05:48Z

We might copy the suggestion at rust-lang/rust#28224 (comment), which is to add a nightly feature and add this to the code:

#![cfg_attr(feature="nightly", feature(alloc_system))]
#[cfg(feature="nightly")]
extern crate alloc_system;

Then we can build on the nightly channel with --features=nightly and run valgrind against that.

nbigaouette-eai · 2017-03-20T15:46:36Z

So I've experimented running the tests through valgrind, but it's not as trivial as it looks.

First, as mentioned above, using jemalloc (the default memory allocator in rust) will confuse valgrind. So one has to build the tests using the system allocator, which is only available when using a nightly build of rust. This means Travis would need to be setup to run nightly, possibly in addition to stable rust.

Second (and most importantly), we don't want to run cargo through valgrind, but the tests binaries. This means the command used to run the tests should not be something like valgrind cargo test because valgrind will catch cargo errors, not the bindings' ones. Suppressions could be written but this is a tedious task.

"Normally" people would run valgrind directly on the test binaries, for example valgrind tensorflow_rust.git/target/debug/deps/lib-aab328fe9f627d08. This has two problems:

How to programatically extract the filename? There is a couple of these binaries in the target directory. We'd need to find them and execute them. This complexifies the .travis.yml file.
The binaries are linked dynamically to libtensorflow.so and so must find them at run time. Simply running the binaries will fail with error while loading shared libraries: libtensorflow.so: cannot open shared object file: No such file or directory if LD_LIBRARY_PATH is not set properly.

Cargo will set a working environment for the binaries so it can run them. It also knows exactly what to run. So it's sad that can't directly run cargo valgrind test or something like that.

Instead of using valgrind, we could try out the different sanitizers as provided by llvm. See here for more information: https://github.com/japaric/rust-san
With a nightly rust newer than a month old, the sanitizers are included. Using them is quite easy and should work I think in Travis. On my machine for example:

$ cd tensorflow-sys
$ RUSTFLAGS="-Z sanitizer=leak" cargo +nightly test -vv -j 1 --target x86_64-unknown-linux-gnu -- --nocapture
[...]
=================================================================
==28527==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 4 byte(s) in 1 object(s) allocated from:
    #0 0x55620849f0ea  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x400ea)
    #1 0x55620846d0ac  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe0ac)
    #2 0x55620846d020  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe020)
    #3 0x55620846d1bd  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0xe1bd)
    #4 0x55620847b5be  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x1c5be)
    #5 0x556208470040  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x11040)
    #6 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #7 0x55620846f680  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x10680)
    #8 0x5562084d8aaa  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x79aaa)
    #9 0x5562084762f6  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x172f6)
    #10 0x5562084d0924  (tensorflow_rust.git/target/x86_64-unknown-linux-gnu/debug/deps/lib-b2190e536168b674+0x71924)
    #11 0x7ffbbbee02e6  (/usr/lib/libpthread.so.0+0x72e6)

SUMMARY: LeakSanitizer: 4 byte(s) leaked in 1 allocation(s).
error: test failed

when adding a simple std::mem::forget(Box::new(42)); to a test to force a memory leak.

This is not perfect; the implementation is not bug free nor finished. I'm also not seeing where the allocation was performed (as shown in the link example) but at least it's catching it.

What I think should be done is to first extend the Travis configuration to allow stable/beta/nightly rust. This would allow adding more testing pieces, like sanitizers or valgrind. I'll open a PR about this.

adamcrume · 2017-03-21T02:32:51Z

I'd like to see how a compiler-based sanitizer works when interacting with code it didn't compile, i.e. what happens when we allocate something through libtensorflow.so and then don't release that.

I assume that the offsets in that stack trace could be traced back to a line of code via nm voodoo. Maybe they'll build that into the sanitizer at some point.

This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.

* As part of tensorflow#69 refactor to break views from core dataset manipulation * tensorflow#69 Allow for navigatable data sets that are wide

adamcrume added the enhancement label Mar 10, 2017

adamcrume mentioned this issue Mar 10, 2017

Tests for tensorflow-sys fail #66

Closed

nbigaouette-eai mentioned this issue Mar 15, 2017

WIP: Run tensorflow-sys' test under valgrind #70

Closed

nbigaouette-eai mentioned this issue Mar 20, 2017

Increase Travis coverage #78

Merged

adamcrume mentioned this issue Jun 4, 2017

Session runs leak memory #86

Merged

oxinabox mentioned this issue Jun 13, 2017

Memory Leak malmaud/TensorFlow.jl#255

Closed

adamcrume added a commit to adamcrume/tensorflow-rust that referenced this issue Jun 17, 2017

Add run-valgrind

22fd01c

This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.

adamcrume added a commit to adamcrume/tensorflow-rust that referenced this issue Jun 17, 2017

Add run-valgrind

ddb684e

This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.

adamcrume mentioned this issue Jun 17, 2017

Add run-valgrind #89

Merged

adamcrume added a commit to adamcrume/tensorflow-rust that referenced this issue Jul 29, 2017

Add run-valgrind

030860a

This partially addresses tensorflow#69, but it requires us to build TensorFlow from source, which is too slow to do on Travis.

nixpulvis mentioned this issue Nov 22, 2019

High amount of small allocations during alt-screen-randow-write compared to the older version alacritty/alacritty#3002

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set up an automated valgrind test #69

Set up an automated valgrind test #69

adamcrume commented Mar 10, 2017

nbigaouette-eai commented Mar 14, 2017

nbigaouette-eai commented Mar 14, 2017

Enet4 commented Mar 14, 2017

nbigaouette-eai commented Mar 15, 2017

adamcrume commented Mar 15, 2017

nbigaouette-eai commented Mar 20, 2017

adamcrume commented Mar 21, 2017

Set up an automated valgrind test #69

Set up an automated valgrind test #69

Comments

adamcrume commented Mar 10, 2017

nbigaouette-eai commented Mar 14, 2017

nbigaouette-eai commented Mar 14, 2017

Enet4 commented Mar 14, 2017

nbigaouette-eai commented Mar 15, 2017

adamcrume commented Mar 15, 2017

nbigaouette-eai commented Mar 20, 2017

adamcrume commented Mar 21, 2017