C++ compilation of rule '@jemalloc//:jemalloc' failed: #7268
Comments
Ultimately I disabled jemalloc when prompted during configuration by the tensorflow/configure file. TensorFlow seems to have compiled fine from there on.
cc @jhseu who added jemalloc. @Montmorency can you specify your Linux and bazel versions?
@Montmorency Possibly unrelated, but I've run into random problems running bazel build from an NFS-hosted home directory. This might help:
export TEST_TMPDIR="/tmp" # or some local directory that is not hosted on NFS
@jhseu Can you comment?
@Montmorency I tested against gcc 4.8.4 and it works for me. Can you copy the compilation error that you get before that error message? Also, can you try removing NFS as a potential issue by doing what @byronyi mentioned?
Thanks for the response!
Distro
Update: I removed the second flag when the build failed at the linking stage. I think this is related to an older version of openssl on the cluster, which configure didn't seem to detect (getenv rather than secure_getenv). Training the models is fine with this build, but I still get a strange glibc error when running model.predict_proba():
*** glibc detected *** python: double free or corruption (!prev): 0x0000000001375da0 ***
Ah yeah, that kernel is really old (from 2009!). I'm not sure we should try to support jemalloc with that kernel when there's a reasonable workaround. The double free issue is unrelated because it happens even when disabling jemalloc (from your other comment on #6968). Also, it's in code that's unaffected by jemalloc. Looking at the stack trace, it's crashing in pthread_join while deallocating thread-local storage. Seems unlikely to be a TensorFlow issue (possibly also related to the old Linux kernel?). Closing the issue out as intended behavior.
I haven't tested, but some searching indicates that jemalloc should build as long as the Linux kernel version is >= 2.6.38; otherwise it needs to be disabled.
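For anyone who wants to check their machine against that threshold before running configure, here is a rough Python sketch. The 2.6.38 minimum is taken from the comment above and was reported as untested, and the helper names `parse_kernel_release` and `kernel_meets_minimum` are made up for illustration; they are not part of TensorFlow or jemalloc.

```python
import re

# Minimum kernel version quoted in the comment above (reported as untested).
MIN_KERNEL = (2, 6, 38)


def parse_kernel_release(release):
    """Return the leading numeric (major, minor, patch) of a release string.

    Distro suffixes such as "-431.el6.x86_64" are ignored, and a missing
    patch level is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def kernel_meets_minimum(release, minimum=MIN_KERNEL):
    """True if the release string is at or above the assumed minimum."""
    return parse_kernel_release(release) >= minimum
```

On a Linux box the release string can be obtained with `platform.release()` (or `uname -r`), so `kernel_meets_minimum(platform.release())` returning False would suggest answering "no" to the jemalloc prompt.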
I removed the bazel cache, did bazel clean, disabled jemalloc via configure and still get the following error (and yes my kernel is 2.6.32):
ERROR: /home/ebice/.cache/bazel/bazel_ebice/975e0509e630426b34ea61d02aa8b898/external/jemalloc/BUILD:10:1: C++ compilation of rule '@jemalloc//:jemalloc' failed: gcc failed: error executing command /opt/rh/devtoolset-6/root/usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/opt/rh/devtoolset-6/root/usr/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 38 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
@edi-bice Unfortunately, I don't have access to any machine with such old Linux kernels to test. That shouldn't happen if jemalloc is really disabled as far as I can tell.
"really disabled" is the keyword. Apparently bazel clean did not really clean everything. In addition to .bazelrc, the file .tf_configure.bazelrc remained even after a bazel clean, and inside it jemalloc=true was still set despite the configure output stating "jemalloc disabled".
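A quick way to see whether a stale setting like this is still lying around is to scan the rc files for anything mentioning jemalloc. The sketch below is only an illustration: the two file names come from the comment above, the exact option text that configure writes is not shown in this thread, so the code just matches the word "jemalloc", and the function names are hypothetical.

```python
import os


def find_jemalloc_lines(rc_text):
    """Return (line_number, line) pairs for non-comment lines mentioning jemalloc."""
    hits = []
    for number, line in enumerate(rc_text.splitlines(), start=1):
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if "jemalloc" in stripped.lower():
            hits.append((number, stripped))
    return hits


def scan_rc_files(workspace_dir, names=(".bazelrc", ".tf_configure.bazelrc")):
    """Map each rc file that exists in workspace_dir to its jemalloc lines."""
    results = {}
    for name in names:
        path = os.path.join(workspace_dir, name)
        if os.path.isfile(path):
            with open(path) as handle:
                results[name] = find_jemalloc_lines(handle.read())
    return results
```

Running `scan_rc_files(".")` from the TensorFlow checkout after ./configure would list any remaining jemalloc lines, which can then be compared with what configure printed.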
Did you rerun ./configure? That file is deleted and updated again here:
Trying to install TensorFlow on a cluster from source. I have installed bazel [bazel release 0.4.4- (@non-git)], and I am using Python 2.7.13 with pyenv. Upon trying to build the TensorFlow pip wheel I am getting a compilation error:
ERROR: ~/.cache/bazel/_bazel/e924d9c3ba75314415252c6f4f93bb86/external/jemalloc/BUILD:10:1: C++ compilation of rule '@jemalloc//:jemalloc' failed: gcc failed: error executing command /opt/apps/compilers/gcc/4.8.2/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -B/opt/apps/compilers/gcc/4.8.2/bin -B/usr/bin -Wunused-but-set-parameter -Wno-free-nonheap-object ... (remaining 38 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
Has anyone experienced this? Is this a consequence of the warning I get from bazel for being on an NFS:
WARNING: Output base '~/.cache/bazel/_bazel/e924d9c3ba75314415252c6f4f93bb86' is on NFS. This may lead to surprising failures and undetermined behaviour.
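To confirm whether a given directory really sits on NFS (rather than relying on the warning alone), one option is to look the path up in the mount table. The following is a minimal Python sketch assuming a Linux system with a /proc/mounts-style table; `filesystem_type` and `is_on_nfs` are illustrative names, symlinks are not resolved, and escaped characters in mount points (such as \040 for a space) are not decoded.

```python
import os


def filesystem_type(path, mounts_text):
    """Return the fstype of the mount containing `path`, or None if no match.

    `mounts_text` is text in /proc/mounts format:
    device, mount point, fstype, options, dump, pass.
    """
    path = os.path.abspath(path)
    best_point, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        # Compare with a trailing slash so "/home" does not match "/homework".
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # The longest mount point wins; on ties, later entries shadow earlier ones.
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fstype
    return best_type


def is_on_nfs(path, mounts_text):
    """True if the containing mount has an fstype starting with "nfs" (nfs, nfs4)."""
    fstype = filesystem_type(path, mounts_text)
    return fstype is not None and fstype.startswith("nfs")
```

For example, `is_on_nfs(os.path.expanduser("~/.cache/bazel"), open("/proc/mounts").read())` would indicate whether the default output base location is NFS-hosted, in which case pointing bazel at a local directory as suggested in the comments is worth trying.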
My gcc version is:
gcc (GCC) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.