TensorFlow compile from source with GPU support error #16988

Closed
ciprianflow opened this issue Feb 13, 2018 · 7 comments
Labels
stat:awaiting response Status - Awaiting response from author

Comments

@ciprianflow

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): master version or 1.6.0
  • Python version: 3.5
  • Bazel version (if compiling from source): 0.7.0
  • GCC/Compiler version (if compiling from source): gcc (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609
  • CUDA/cuDNN version: 9.0 / 7
  • GPU model and memory: NVIDIA Tesla K80
  • Exact command to reproduce: bazel build --config=opt --config=cuda tensorflow/tools/pip_package:build_pip_package --action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}"

Describe the problem

I tried to compile the master version from source. I've set all the environment variables and gone through the related Stack Overflow and GitHub issues, but nothing works. I think it's a bug.

Source code / logs

ERROR: /home/ubuntu/work/master/tensorflow/tensorflow/contrib/periodic_resample/BUILD:40:1: Linking of rule '//tensorflow/contrib/periodic_resample:gen_gen_periodic_resample_op_py_py_wrappers_cc' failed (Exit 1)
/usr/bin/ld: warning: libcublas.so.9.0, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scontrib_Speriodic_Uresample_Cgen_Ugen_Uperiodic_Uresample_Uop_Upy_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libcudnn.so.7, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scontrib_Speriodic_Uresample_Cgen_Ugen_Uperiodic_Uresample_Uop_Upy_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libcurand.so.9.0, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scontrib_Speriodic_Uresample_Cgen_Ugen_Uperiodic_Uresample_Uop_Upy_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scontrib_Speriodic_Uresample_Cgen_Ugen_Uperiodic_Uresample_Uop_Upy_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cublasGemmEx@libcublas.so.9.0'

There are many more errors like these.
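
When the log runs to many such errors, it can help to reduce it to the set of libraries the linker could not find. The following is a minimal sketch, assuming the warnings keep the `warning: <lib>, needed by <path>, not found` shape shown above. The function name `missing_libraries` is hypothetical, and the paths in `sample_log` are shortened to `x` for readability.

```python
import re

# Matches GNU ld warnings of the form seen in the log above:
#   /usr/bin/ld: warning: libfoo.so.1, needed by <path>, not found (...)
MISSING_LIB = re.compile(r"ld: warning: (\S+), needed by (\S+), not found")


def missing_libraries(log_text):
    """Return the sorted, de-duplicated library names ld reported as not found."""
    return sorted({m.group(1) for m in MISSING_LIB.finditer(log_text)})


# Abbreviated sample: the real _solib_local path is replaced by "x".
sample_log = """\
/usr/bin/ld: warning: libcublas.so.9.0, needed by bazel-out/host/bin/_solib_local/x/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libcudnn.so.7, needed by bazel-out/host/bin/_solib_local/x/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libcurand.so.9.0, needed by bazel-out/host/bin/_solib_local/x/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
"""

print(missing_libraries(sample_log))
```

In practice the captured Bazel output would be passed in place of `sample_log`.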

@cy89

cy89 commented Feb 14, 2018

@ciprianflow have you gotten other TF builds to work from src on this machine, and it's just this particular version that isn't building for you?

@cy89 cy89 added the stat:awaiting response Status - Awaiting response from author label Feb 14, 2018
@ciprianflow
Author

Yes, I have. I actually downgraded to CUDA 8.0 and managed to build successfully. It seems to be a problem with CUDA 9.0 only.

@pwaller

pwaller commented May 9, 2018

I'm experiencing the same issue in very similar circumstances. I've read all the related issues and Stack Overflow posts I can find, and tried the workaround to no avail.

I'm trying to build on a machine running Deep Learning AMI (Ubuntu) Version 8.0 (ami-5d7c5024) in eu-west-1 on AWS. I'm using Python 3.6 and TensorFlow 1.8.0. The machine has CUDA 9 and cuDNN 7, both of which are on LD_LIBRARY_PATH at configure time and at build time.

The shared objects which ld claims are missing are definitely present, and I have run with and without the workaround (--action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}") suggested in this issue and elsewhere.

@ciprianflow and @cy89 could we reopen?
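
One way to double-check the claim that the shared objects are present is to look for each file name in the directories listed in `LD_LIBRARY_PATH`. The following is a minimal sketch, not a diagnosis: the helper name `find_in_library_path` is hypothetical, and the library names are taken from the linker warnings earlier in this thread. It only inspects the environment of the shell that runs it, so it says nothing about what the linker sees inside a Bazel action.

```python
import os


def find_in_library_path(names, library_path=None):
    """Map each library file name to the first directory that contains it, or None."""
    if library_path is None:
        library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # Skip empty entries produced by leading, trailing, or doubled separators.
    dirs = [d for d in library_path.split(os.pathsep) if d]
    found = {}
    for name in names:
        found[name] = next(
            (d for d in dirs if os.path.exists(os.path.join(d, name))), None
        )
    return found


if __name__ == "__main__":
    wanted = ["libcublas.so.9.0", "libcudnn.so.7", "libcurand.so.9.0"]
    for name, where in find_in_library_path(wanted).items():
        print(name, "->", where or "NOT FOUND on LD_LIBRARY_PATH")
```

If every library resolves here but the link step still fails, the search path is probably not reaching the linker, which is consistent with the reports in this thread.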

@dan-1d

dan-1d commented May 9, 2018

UPDATE: Workaround posted below.

I am also experiencing this issue. I've tried TF v.1.8 and the latest Github master. My system:
Linux Redhat 7.5
CUDA 9.1 -- installed as root user
cuDNN 7.1.3 -- located in my user directory
appropriately set paths in ./configure script, and confirmed by looking at tensorflow-master/.tf_configure.bazelrc

I have tried with/without the --action_env="LD_LIBRARY_PATH=${LD_LIBRARY_PATH}" as suggested. Failure occurs in various places (this is a 32-core machine, so there's some parallelism-related non-determinism with which files cause a fault-out first). It gets to about 11,000 of 12,462 tasks.

""""" log with verbose errors """"
ERROR: $HOME/tensorflow_source/tensorflow-master/tensorflow/python/BUILD:1568:1: Linking of rule '//tensorflow/python:gen_set_ops_py_wrappers_cc' failed (Exit 1): crosstool_wrapper_driver_is_not_gcc failed: error executing command
(cd $HOME/.cache/bazel/bazel$USER/0adab69598361a8f35bbb0e3835cbfb8/execroot/org_tensorflow &&
exec env -
PWD=/proc/self/cwd
external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/host/bin/tensorflow/python/gen_set_ops_py_wrappers_cc '-Wl,-rpath,$ORIGIN/../../_solib_local/_U_S_Stensorflow_Spython_Cgen_Uset_Uops_Upy_Uwrappers_Ucc___Utensorflow' '-Wl,-rpath,$ORIGIN/../../_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib' '-Wl,-rpath,$ORIGIN/../../_solib_local/_U@mkl_Ulinux_S_S_Cmkl_Ulibs_Ulinux___Uexternal_Smkl_Ulinux_Slib' -Lbazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Uset_Uops_Upy_Uwrappers_Ucc___Utensorflow -Lbazel-out/host/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccudart___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib -Lbazel-out/host/bin/_solib_local/_U@mkl_Ulinux_S_S_Cmkl_Ulibs_Ulinux___Uexternal_Smkl_Ulinux_Slib '-Wl,-rpath,$ORIGIN/,-rpath,$ORIGIN/..' -Wl,-z,notext -Wl,-z,notext -Wl,-rpath,../local_config_cuda/cuda/lib64 -Wl,-rpath,../local_config_cuda/cuda/extras/CUPTI/lib64 -pthread -Wl,-no-as-needed -B/usr/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,--gc-sections -Wl,-S -Wl,@bazel-out/host/bin/tensorflow/python/gen_set_ops_py_wrappers_cc-2.params)
/usr/bin/ld: warning: libcudnn.so.7, needed by bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Uset_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so, not found (try using -rpath or -rpath-link)
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Spython_Cgen_Uset_Uops_Upy_Uwrappers_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `cudnnCreate@libcudnn.so.7'

"""""

@dan-1d

dan-1d commented May 10, 2018

I was able to work around this issue by modifying:
CROSSTOOL_nvcc.tpl

adding this under toolchain "local_linux" (I used my full pathnames.. but I put in $HOME for reference here)

linker_flag: "-Wl,-rpath=$HOME/.local/cuda/lib64"

and below that under %host_compiler_includes:
cxx_builtin_include_directory: "$HOME/.local/cuda/include"
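
Since the workaround above used full pathnames and `$HOME` is only a placeholder, a small helper can print the two lines with an absolute path filled in. This is a sketch under assumptions: the function name `crosstool_lines` is hypothetical, the `lib64` and `include` subdirectories mirror the layout in the comment above, and it assumes the CROSSTOOL file does not expand shell variables on its own.

```python
import os


def crosstool_lines(cuda_root):
    """Return the two CROSSTOOL lines from the workaround with cuda_root made absolute."""
    root = os.path.abspath(os.path.expanduser(cuda_root))
    return [
        'linker_flag: "-Wl,-rpath={}"'.format(os.path.join(root, "lib64")),
        'cxx_builtin_include_directory: "{}"'.format(os.path.join(root, "include")),
    ]


# "~/.local/cuda" mirrors the $HOME/.local/cuda placeholder used above;
# substitute the directory where CUDA and cuDNN are actually installed.
for line in crosstool_lines("~/.local/cuda"):
    print(line)
```

The printed lines still have to be pasted by hand into CROSSTOOL_nvcc.tpl at the two places described above.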

@wincle

wincle commented May 10, 2018

@dan-1d Thanks!

@petacube

@dan-1d I modified CROSSTOOL_nvcc.tpl but it did not help. Can you post the complete modified CROSSTOOL file, please?


6 participants