Using tensorflow.contrib with cv_bridge causes tcmalloc error #8146

ethanabrooks · 2017-03-06T23:43:14Z

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

None.

Environment info

Operating System:

❯ uname -a 
Linux dos 3.13.0-76-generic #120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Installed version of CUDA and cuDNN:
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

❯ ls -l /path/to/cuda/lib/libcud*
ls: cannot access /path/to/cuda/lib/libcud*: No such file or directory

If installed from binary pip package, provide:

A link to the pip package you installed:
The output from python -c "import tensorflow; print(tensorflow.__version__)".

❯ python -c "import tensorflow; print(tensorflow.__version__)"
1.0.0

If installed from source, provide

The commit hash (git rev-parse HEAD)
The output of bazel version

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

import tensorflow.contrib
import cv_bridge

import rospy
rospy.init_node('node')

This throws the following error:

/usr/bin/python2.7 /home/ethan/.PyCharmCE2016.3/config/scratches/scratch_4.py
src/tcmalloc.cc:277] Attempt to free invalid pointer 0xa2e78616d5f7475 

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)

I'll also post to stackoverflow and to the cv_bridge page (ros-perception/vision_opencv#161).

What other attempted solutions have you tried?

I tried reinstalling ROS and TensorFlow, with no change. I also ran print(cv_bridge.__file__) to make sure I was importing cv_bridge from the right directory.

Logs or other output that would be helpful

(If logs are large, please upload as attachment or provide link).

prb12 · 2017-03-07T19:23:24Z

@jhseu Could you please comment on whether recent jemalloc/tcmalloc changes might affect this?

jhseu · 2017-03-07T19:32:56Z

It's unrelated to jemalloc. My guess is it's an issue with your usage of tcmalloc.

That error would happen if you allocate with tcmalloc's malloc() and then free with glibc's free(), or vice versa. Disable tcmalloc?

ethanabrooks · 2017-03-07T19:44:56Z

@jhseu , I'm not exactly sure how to disable tcmalloc. I assume it's getting called either from tensorflow or cv_bridge, so would the best way be to find the actual tcmalloc function call and change it to malloc?

jhseu · 2017-03-07T19:51:55Z

It's definitely not in TensorFlow. We don't use tcmalloc anywhere.

So it's either coming from your environment or being used by cv_bridge. You can track it down by launching gdb python and then entering run /path/to/your/script.py at the gdb prompt.
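
One way to narrow this down without gdb, sketched here on the assumption of a Linux system where /proc/self/maps is readable, is to list the shared objects mapped into the Python process and look for allocator libraries. The helper name find_allocator_libs and the list of name fragments are illustrative only and not part of any library.

```python
import os

# Name fragments to look for; this list is an assumption, extend it as needed.
ALLOCATOR_HINTS = ("tcmalloc", "jemalloc")


def find_allocator_libs(maps_text, hints=ALLOCATOR_HINTS):
    """Return the sorted, de-duplicated paths in a /proc/<pid>/maps dump
    whose file name contains one of the hint substrings."""
    found = set()
    for line in maps_text.splitlines():
        fields = line.split(None, 5)  # the pathname, if any, is the 6th field
        if len(fields) < 6:
            continue  # anonymous mapping, no backing file
        path = fields[5].strip()
        if any(hint in os.path.basename(path) for hint in hints):
            found.add(path)
    return sorted(found)


if __name__ == "__main__" and os.path.exists("/proc/self/maps"):
    # Import the suspect modules above this point, then inspect the process.
    with open("/proc/self/maps") as maps_file:
        for lib in find_allocator_libs(maps_file.read()):
            print(lib)
```

If an allocator library shows up even before any of the suspect modules are imported, it is probably being injected by the environment rather than by one of the packages.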

ethanabrooks · 2017-03-07T20:34:17Z

This was the output:

(gdb) run test.py
Starting program: /usr/bin/python test.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff2eda700 (LWP 5777)]
[New Thread 0x7ffff26d9700 (LWP 5778)]
[New Thread 0x7fffefed8700 (LWP 5779)]
[New Thread 0x7fffed6d7700 (LWP 5780)]
[New Thread 0x7fffeaed6700 (LWP 5781)]
[New Thread 0x7fffe86d5700 (LWP 5782)]
[New Thread 0x7fffe5ed4700 (LWP 5783)]
[Thread 0x7fffe86d5700 (LWP 5782) exited]
[Thread 0x7fffe5ed4700 (LWP 5783) exited]
[Thread 0x7fffed6d7700 (LWP 5780) exited]
[Thread 0x7fffeaed6700 (LWP 5781) exited]
[Thread 0x7ffff2eda700 (LWP 5777) exited]
[Thread 0x7fffefed8700 (LWP 5779) exited]
[Thread 0x7ffff26d9700 (LWP 5778) exited]
[New Thread 0x7fffe5ed4700 (LWP 5788)]
src/tcmalloc.cc:277] Attempt to free invalid pointer 0xa2e78616d5f7475 

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe5ed4700 (LWP 5788)]
0x00007ffff75e2cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56	../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

I wasn't really able to make sense of it. I also searched through all of /opt/ros/indigo/ for tcmalloc with no results.

In a debugger, I stepped through the program until it threw the error. The offending line was /opt/ros/indigo/lib/python2.7/dist-packages/rosgraph/xmlrpc.py:199:

    def start(self):
        """
        Initiate a thread to run the XML RPC server. Uses thread.start_new_thread.
        """
        _thread.start_new_thread(self.run, ())

Is it possible that cv_bridge is using a version of OpenCV that is not compatible with the recent TensorFlow update?

❯ pkg-config --modversion opencv
2.4.13

jhseu · 2017-03-07T20:54:46Z

We don't depend on opencv in TensorFlow, so I'm not sure. Closing out, though, because this bug is unlikely to be an issue in TensorFlow.

ethanabrooks · 2017-03-07T21:01:35Z

That may be, but the script does not throw the error without the import tensorflow.contrib line.

jhseu · 2017-03-07T21:07:32Z

It's still unlikely to be in TensorFlow. My best guess without trying it out is that there's a shared module dependency somewhere, TF is using glibc malloc upon module import, and somewhere along the line someone is using tcmalloc and freeing.

Libraries shouldn't be switching out malloc implementations unless its usage is completely self-contained.

ethanabrooks · 2017-03-07T21:25:43Z

I (sort of) fixed it:

import cv_bridge  # <-- note: switched with
import tensorflow.contrib  # this
import rospy
rospy.init_node('node')

This does not throw an error. Why the order of imports matters is beyond me. These kinds of things seem to crop up often when working with ROS.

jhseu · 2017-03-07T21:29:52Z

Yeah, import order affects symbol resolution order. My explanation before is most likely right, and it's a bug in cv_bridge.

prb12 · 2017-03-07T21:29:57Z

@lobachevzky Thanks for following up with the workaround. It does indeed look like cv_bridge is doing something bad with tcmalloc.

ethanabrooks · 2017-03-09T16:20:44Z

Ok. I did a little more digging and I found this line in my .zshrc:

export LD_PRELOAD="/usr/lib/libtcmalloc_minimal.so.4"

Commenting this out solved the problem. I'm not really sure which of the three libraries that were involved would be responsible, but it might be good to include a more informative error message.
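
For anyone hitting the same abort, a small guard at the very top of the script can make the cause visible before the crash. This is only a sketch: the helper name preloaded_allocators is made up for illustration, and it assumes the preload arrives via the LD_PRELOAD environment variable (entries separated by colons or spaces) rather than through /etc/ld.so.preload.

```python
import os
import re
import sys


def preloaded_allocators(environ=os.environ, hints=("tcmalloc", "jemalloc")):
    """Return the LD_PRELOAD entries whose file name mentions an allocator."""
    raw = environ.get("LD_PRELOAD", "")
    # The dynamic linker accepts both colons and spaces as separators.
    entries = [e for e in re.split(r"[:\s]+", raw) if e]
    return [e for e in entries
            if any(h in os.path.basename(e) for h in hints)]


if __name__ == "__main__":
    # Warn on stderr so the message survives even if a later import aborts.
    for lib in preloaded_allocators():
        sys.stderr.write(
            "warning: allocator preloaded via LD_PRELOAD: %s\n" % lib)
```

With the .zshrc line above still active, this would be expected to report /usr/lib/libtcmalloc_minimal.so.4 before any of the imports run.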

ethanabrooks mentioned this issue Mar 6, 2017

Using tensorflow.contrib with cv_bridge causes tcmalloc error ros-perception/vision_opencv#161

Closed

prb12 assigned jhseu Mar 7, 2017

jhseu closed this as completed Mar 7, 2017

f11r mentioned this issue Mar 31, 2017

spaCy and tcmalloc compiled module leads to "Attempt to free invalid pointer" explosion/spaCy#938

Closed
