ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path. #4105

Closed
trevorwelch opened this issue Aug 30, 2016 · 74 comments
trevorwelch commented Aug 30, 2016

Environment info

Operating System:
OS 10.10.5

Installed version of CUDA and cuDNN:

$ ls -l /usr/local/cuda/lib/libcud*
-rwxr-xr-x  1 root           wheel      8280 Apr 13 01:02 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x  1 root           wheel        45 Apr 13 01:03 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudadevrt.a
lrwxr-xr-x  1 root           wheel        50 Apr 13 01:03 /usr/local/cuda/lib/libcudart.7.5.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.7.5.dylib
lrwxr-xr-x  1 root           wheel        46 Apr 13 01:03 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.dylib
lrwxr-xr-x  1 root           wheel        49 Apr 13 01:03 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart_static.a
-rwxr-xr-x@ 1 production204  staff  60108616 Feb  8  2016 /usr/local/cuda/lib/libcudnn.4.dylib
lrwxr-xr-x  1 root           admin        47 Aug 29 18:08 /usr/local/cuda/lib/libcudnn.5.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudnn.5.dylib
lrwxr-xr-x  1 root           admin        45 Aug 29 18:08 /usr/local/cuda/lib/libcudnn.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudnn.dylib
-rw-r--r--@ 1 production204  staff  59311504 Feb  8  2016 /usr/local/cuda/lib/libcudnn_static.a
  1. The output from python -c "import tensorflow; print(tensorflow.__version__)".
    (can't get that far, but I'm using 0.10)
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
Segmentation fault: 11

If installed from source, provide

  1. The commit hash (git rev-parse HEAD)
4c49dbebef05442c7e72d6129a30574fcd13f0e1
  2. The output of bazel version
$ bazel version
Build label: 0.3.1-homebrew
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Aug 4 09:59:58 2016 (1470304798)
Build timestamp: 1470304798
Build timestamp as int: 1470304798

If possible, provide a minimal reproducible example (We usually don't have time to read hundreds of lines of your code)

$ bazel build -c opt --config=cuda //tensorflow/cc:tutorials_example_trainer
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
INFO: Elapsed time: 0.076s

What other solutions have you tried?

  • Downgrading to cuDNN4, switching between 4 and 5
  • Re-installing bazel
  • Modifying CROSSTOOL file according to various threads
  • Manually linking CUDA libraries during ./configure to not use symlinked libraries
  • Various other hacks over the last week 😭
trevorwelch changed the title to include the full Bazel error output Aug 30, 2016

vrv commented Aug 30, 2016

I got this recently too, I was somehow successful by just re-running ./configure and then immediately running bazel build, but I'm not sure what's going on.


trevorwelch commented Aug 30, 2016

Following that, @vrv, I just tried uninstalling TF completely and then re-running with the same ./configure settings:

$ ./configure
Please specify the location of python. [Default is /Library/Frameworks/Python.framework/Versions/2.7/bin/python]: 
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] y
Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
  /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
  /Library/Python/2.7/site-packages
Please input the desired Python library path to use.  Default is [/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages]

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 7.5
Please specify the location where CUDA 7.5 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 5
Please specify the location where cuDNN 5 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
.
INFO: Waiting for response from Bazel server (pid 63884)...
WARNING: /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/boringssl_git/WORKSPACE:1: Workspace name in /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/boringssl_git/WORKSPACE (@boringssl) does not match the name given in the repository's definition (@boringssl_git); this will cause a build error in future versions.
INFO: All external dependencies fetched successfully.
Configuration finished

Then a simplified build command:

$ bazel build -c opt --config=cuda tensorflow/...

After a very verbose and lengthy compile attempt, I received this error message (it also caused the OS X Terminal app to hang permanently, so I couldn't copy-paste and had to take a screenshot):

https://www.dropbox.com/s/riu5f4n5aj1opmk/Screenshot%202016-08-30%2016.33.48.png?dl=0

Yet again, some sort of a CROSSTOOL issue. I've made sure to pull and update my local TF often over the week that I've been trying to build, as I've seen lots of activity related to this component of TF.

Mistobaan (Contributor) commented:

Yes, if you are developing TF it happens quite often. I think it might be related to Bazel's caching system. After a few hours it resets and I have to re-run ./configure again.


davidzchen commented Aug 30, 2016

+cc @damienmg

This seems to be a bug in Bazel. (Edit: to clarify: I meant the occasional no such package '@local_config_cuda//crosstool': BUILD file not found on package path. error).

Next time this happens, can you take a look at the directory bazel-tensorflow/external/local_config_cuda/crosstool and let me know which files are there?


trevorwelch commented Aug 30, 2016

@davidzchen thanks for your reply.

This error or some version of it is consistent:

tensorflow$ cd bazel-tensorflow/external/local_config_cuda/crosstool

production204@Trevors-MacBook-Pro crosstool$ ls -l
total 32
-rwxr-xr-x  1 production204  wheel   925 Aug 30 14:45 BUILD
-rwxr-xr-x  1 production204  wheel  9068 Aug 30 14:45 CROSSTOOL
drwxr-xr-x  3 production204  wheel   102 Aug 30 14:45 clang

production204@Trevors-MacBook-Pro crosstool$ 

davidzchen (Contributor) commented:

I just saw your screenshot, and that appears to be a different problem than the no such package '@local_config_cuda//crosstool': BUILD file not found on package path. error, which seems to be a caching issue.

Do you mean that the crosstool_wrapper_driver_is_not_gcc error occurs consistently? Is it causing your Terminal.app to hang every time? If it reproduces consistently, can you re-run the command with --verbose_failures?

trevorwelch (Author) commented:

I meant that my TF builds consistently seem to fail with crosstool-related errors; it was probably my naivete about the specifics that led me to think crosstool_wrapper_driver_is_not_gcc and CROSSTOOL were the same problem!


davidzchen commented Aug 31, 2016

No problem. The naming can be a bit confusing. The @local_config_cuda//crosstool error may be an issue with Bazel's caching; I have run into it a couple of times myself, and it usually goes away after I re-run ./configure.

Were you able to reproduce the crosstool_wrapper_driver_is_not_gcc error again? Looking at your screenshot, it looks like the headerpad_max_install_names flag is misspelled somewhere, since the error complains about eaderpad_max_install_names. Did you change this flag in your CROSSTOOL file?
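One way to hunt down where a truncated flag like this comes from is to grep recursively for the mangled text while excluding hits that are actually the correctly spelled flag. A minimal sketch; the scratch directory and CROSSTOOL.* file names below are illustrative stand-ins, not real TF files:

```shell
# Scratch tree with one file containing the correct flag and one with the
# truncated variant (stand-ins for CROSSTOOL files; not real TF paths).
workdir=$(mktemp -d)
printf 'linkopt: "-Wl,-headerpad_max_install_names"\n' > "$workdir/CROSSTOOL.ok"
printf 'linkopt: "eaderpad_max_install_names"\n' > "$workdir/CROSSTOOL.bad"

# Recursively find the truncated text, filtering out occurrences that are
# just substrings of the correctly spelled flag.
grep -rn "eaderpad_max_install_names" "$workdir" | grep -v "headerpad_max_install_names"
```

Run from the TensorFlow source root (and against the Bazel output base), the same pipeline would show only files where the leading "h" is genuinely missing.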


Dapid commented Aug 31, 2016

I am getting the same missing crosstool error on Linux. The strange thing is that there isn't even a bazel-tensorflow directory:

[david@SQUIDS tensorflow]$ ls
ACKNOWLEDGMENTS  avro.BUILD   boringssl.BUILD  bzip2.BUILD  CONTRIBUTING.md  farmhash.BUILD  gmock.BUILD  ISSUE_TEMPLATE.md  jsoncpp.BUILD  nanopb.BUILD  png.BUILD      README.md   six.BUILD   third_party  util       zlib.BUILD
AUTHORS          boost.BUILD  bower.BUILD      configure    eigen.BUILD      gif.BUILD       grpc.BUILD   jpeg.BUILD         LICENSE        navbar.md     _python_build  RELEASE.md  tensorflow  tools        WORKSPACE

bazel is 0.3.1, and I have run ./configure four times now.


trevorwelch commented Aug 31, 2016

I do see the eaderpad_max_install_names reference in the screenshot above; however, after searching through all the files and folders in my TF directory, I don't see that text anywhere, only headerpad_max_install_names. I don't know why the h would be getting clipped off.

Update: Same error message related to crosstool_wrapper_driver_is_not_gcc. The '@local_config_cuda//crosstool': BUILD file not found on package path. error seems to have gone away after doing a pip uninstall on TF and then re-installing.

INFO: From Linking tensorflow/cc/ops/io_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/random_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/parsing_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/sparse_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/logging_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/string_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/user_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/candidate_sampling_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/control_flow_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/image_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/array_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/linalg_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/no_op_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
INFO: From Linking tensorflow/cc/ops/training_ops_gen_cc [for host]:
clang: warning: argument unused during compilation: '-pthread'
ERROR: /Users/production204/Github/tensorflow/tensorflow/cc/BUILD:179:1: Executing genrule //tensorflow/cc:io_ops_genrule failed: bash failed: error executing command /bin/bash -c ... (remaining 1 argument(s) skipped): com.google.devtools.build.lib.shell.AbnormalTerminationException: Process terminated by signal 5.
dyld: Library not loaded: @rpath/libcudart.7.5.dylib
  Referenced from: /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/execroot/tensorflow/bazel-out/host/bin/tensorflow/cc/ops/io_ops_gen_cc
  Reason: image not found
/bin/bash: line 1: 44071 Trace/BPT trap: 5       bazel-out/host/bin/tensorflow/cc/ops/io_ops_gen_cc bazel-out/local_darwin-opt/genfiles/tensorflow/cc/ops/io_ops.h bazel-out/local_darwin-opt/genfiles/tensorflow/cc/ops/io_ops.cc 0
Target //tensorflow/cc:tutorials_example_trainer failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 3015.469s, Critical Path: 3002.51s

The complete log was too big for Pastebin; here it is on Dropbox: https://www.dropbox.com/home/Documents%20Dropbox?preview=TW-TF-error-log-083116.txt

Then I ran the same commands as above, but with --verbose_failures (hard to imagine it being more verbose than the previous log, which was almost 15,000 lines!); the final error message was:

ERROR: /Users/production204/Github/tensorflow/tensorflow/cc/BUILD:179:1: Executing genrule //tensorflow/cc:training_ops_genrule failed: bash failed: error executing command 
  (cd /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/execroot/tensorflow && \
  exec env - \
    PATH=/usr/local/cuda/bin:/Library/Frameworks/Python.framework/Versions/2.7/bin:/usr/local/bin:usr/local/sbin:/usr/local/mysql/bin:/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin \
    TMPDIR=/var/folders/h3/pn9k79xn6qd9jgksqbkpn3l80000gn/T/ \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/host/bin/tensorflow/cc/ops/training_ops_gen_cc bazel-out/local_darwin-opt/genfiles/tensorflow/cc/ops/training_ops.h bazel-out/local_darwin-opt/genfiles/tensorflow/cc/ops/training_ops.cc 0'): com.google.devtools.build.lib.shell.AbnormalTerminationException: Process terminated by signal 5.
dyld: Library not loaded: @rpath/libcudart.7.5.dylib
  Referenced from: /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/execroot/tensorflow/bazel-out/host/bin/tensorflow/cc/ops/training_ops_gen_cc
  Reason: image not found
/bin/bash: line 1: 74845 Trace/BPT trap: 5       bazel-out/host/bin/tensorflow/cc/ops/training_ops_gen_cc bazel-out/local_darwin-opt/genfiles/tensorflow/cc/ops/training_ops.h bazel-out/local_darwin-opt/genfiles/tensorflow/cc/ops/training_ops.cc 0
Target //tensorflow/cc:tutorials_example_trainer failed to build
INFO: Elapsed time: 3111.405s, Critical Path: 3097.65s

production204@Trevors-MacBook-Pro tensorflow $ 

Here's the complete log: https://www.dropbox.com/s/nozqcscnc9ho5uz/TW-TF-error-log-083116--verbose_failures.txt?dl=0


@Dapid That is interesting. How did you run the ./configure script? Can you paste the output?

@damienmg Is there currently a way to inspect the contents of /external after running bazel fetch but before the Bazel output directories get symlinked?

@trevorwelch FWIW, most of the noise in the output are compiler warnings. The dyld: Library not loaded: @rpath/libcudart.7.5.dylib error is interesting. Can you verify whether the file bazel-bin/tensorflow/cc/tutorials_example_trainer.runfiles/local_config_cuda/cuda/lib/libcudart.7.5.dylib exists? If not, what files are under the bazel-bin/tensorflow/cc/tutorials_example_trainer.runfiles/local_config_cuda/cuda/lib directory?


Dapid commented Sep 1, 2016

@davidzchen

./configure 
Please specify the location of python. [Default is /home/david/.virtualenvs/py35/bin/python]: 
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N] n
No Google Cloud Platform support will be enabled for TensorFlow
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /bin/gcc]: /usr/local/cuda/bin/gcc
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
libcudnn.so resolves to libcudnn.4
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 5.0
INFO: Reading 'startup' options from /home/david/.bazelrc: --batch
Warning: ignoring LD_PRELOAD in environment.
INFO: Starting clean (this may take a while). Consider using --expunge_async if the clean takes more than several minutes.
INFO: Reading 'startup' options from /home/david/.bazelrc: --batch
Warning: ignoring LD_PRELOAD in environment.
WARNING: /home/david/.cache/bazel/_bazel_david/47d00ffdd2fc0515138a34f138cebd63/external/boringssl_git/WORKSPACE:1: Workspace name in /home/david/.cache/bazel/_bazel_david/47d00ffdd2fc0515138a34f138cebd63/external/boringssl_git/WORKSPACE (@boringssl) does not match the name given in the repository's definition (@boringssl_git); this will cause a build error in future versions.
INFO: All external dependencies fetched successfully.
Configuration finished
git log
commit 6ce5b5c8298273e3861a75fb6ccde63b9dd157c5
Author: Sanders Kleinfeld <sandersk@users.noreply.github.com>
Date:   Sun Aug 28 01:00:52 2016 -0400

On branch r0.10.

If I leave the default gcc, the directory is created, but the build fails because that gcc is incompatible with CUDA.

trevorwelch (Author) commented:

@davidzchen
It does exist:

tensorflow$ cd bazel-bin/tensorflow/cc/tutorials_example_trainer.runfiles/local_config_cuda/cuda/lib/

lib$ ls -l
total 40
lrwxr-xr-x  1 production204  wheel  126 Aug 31 11:03 libcublas.7.5.dylib -> /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/local_config_cuda/cuda/lib/libcublas.7.5.dylib
lrwxr-xr-x  1 production204  wheel  126 Aug 31 11:03 libcudart.7.5.dylib -> /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/local_config_cuda/cuda/lib/libcudart.7.5.dylib
lrwxr-xr-x  1 production204  wheel  123 Aug 31 11:03 libcudnn.5.dylib -> /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/local_config_cuda/cuda/lib/libcudnn.5.dylib
lrwxr-xr-x  1 production204  wheel  125 Aug 31 11:03 libcufft.7.5.dylib -> /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/local_config_cuda/cuda/lib/libcufft.7.5.dylib
lrwxr-xr-x  1 production204  wheel  126 Aug 31 11:03 libcurand.7.5.dylib -> /private/var/tmp/_bazel_production204/ed2bbf43bcd665c40f1e3ebaa04f68f6/external/local_config_cuda/cuda/lib/libcurand.7.5.dylib

lib$ 


jmhodges commented Sep 1, 2016

Whoa, I'm running into this, too, but on master with OS X 10.11.6.

./configure:

/usr/local/Cellar/python/2.7.12/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]: 
Please specify the location where CUDA  toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify the Cudnn version you want to use. [Leave empty to use system default]: 
Please specify the location where cuDNN  library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.
[Default is: "3.5,5.2"]: 3.0

Here's my file listings. All of the symlinks work and the files are all Mach-O so no weird accidental ELF or something: https://gist.github.com/jmhodges/a5de9cc5760333f5b57040d1947ec190

This was after going to sleep and coming back just now. Last night I was debugging a different error condition, and I came back to find my builds no longer working. I thought it was caused by my hand-hacking extra linkopts (-L/usr/local/cuda/lib, specifically) into various BUILD files while trying to fix the dyld error.


lissyx commented Sep 1, 2016

I can confirm this also happens building on a Debian (sid, up to date as of today) system.


jmhodges commented Sep 1, 2016

I've found I can induce this by Ctrl-C'ing in the middle of a fresh bazel build.


FlorinAndrei commented Sep 3, 2016

Ubuntu-16.04, CUDA 8, java 1.8.0_101, bazel 0.3.1

Building from master today

Started ./configure in a virtual instance in VirtualBox, did a CTRL-C because it was taking too long. Went home, fired up the instance again, deleted tensorflow repo, cloned it again.

Did ./configure again with the same options as before; it worked well except for one warning at the beginning:

Found stale PID file (pid=20777). Server probably died abruptly, continuing...

Ignored it, and did the command to build for GPU:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

And it failed immediately:

INFO: Waiting for response from Bazel server (pid 9635)...
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path.
INFO: Elapsed time: 4.667s

EDIT:

Tried bazel clean and tried again; then bazel clean --expunge, then ./configure and bazel build again. Nothing helps. It fails the same way every time. :(


davidzchen commented Sep 3, 2016

There are two issues being discussed in this thread. @trevorwelch, let's move the Library not loaded: @rpath/libcudart.7.5.dylib discussion over to #4187.

For those experiencing the '@local_config_cuda//crosstool': BUILD file not found issue:

  • If the Bazel output directories (i.e. bazel-tensorflow, etc.) exist, what is the output of ls -l bazel-tensorflow/external/local_config_cuda/crosstool?
  • If not, what is the output of ls -l $(bazel info output_base)/external/local_config_cuda/crosstool?

In the meantime, I am still trying to reproduce this.


asimonov commented Sep 3, 2016

I experience the same issue: TF 0.10, macOS El Capitan, Bazel 0.3.0.

davidzchen (Contributor) commented:

@asimonov - Can you print the contents of the local_config_cuda/crosstool directory as I mentioned in my comment above?


FlorinAndrei commented Sep 3, 2016

root@machine-learning:/vagrant/packages/tensorflow# ls *bazel*
ls: cannot access '*bazel*': No such file or directory
root@machine-learning:/vagrant/packages/tensorflow# ls -l $(bazel info output_base)/external/local_config_cuda/crosstool
ls: cannot access '/root/.cache/bazel/_bazel_root/b0bb79a433b74dfa52314ef9af1d2ddd/external/local_config_cuda/crosstool': No such file or directory

contents of bazel cache after BUILD file not found

Here's how to reproduce it:

Clone this repo: https://github.com/FlorinAndrei/ml-setup

Checkout the ubuntu1604 branch, then launch the virtual machine and run the ansible installer, then compile TF by hand:

git clone https://github.com/FlorinAndrei/ml-setup
cd ml-setup
git checkout ubuntu1604
vagrant up
vagrant ssh

sudo su -
cp /vagrant/bash_profile_example /root/.bash_profile
exit
sudo su -

cd /vagrant
# this will take a long time
ansible-playbook -i inventory main.yml
exit
sudo su -

cd /vagrant/packages/tensorflow
./configure

# Hit ENTER on every question except:
# Do you wish to build TensorFlow with GPU support? (answer: y)
# Please specify a list of comma-separated Cuda compute capabilities you want to build with. (answer: 6.1)

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

However, if you delete the tensorflow repo, re-clone and try again, it starts compiling:

cd
rm -rf /vagrant/packages/tensorflow
cd /vagrant
ansible-playbook -i inventory 40-tensorflow.yml

cd /vagrant/packages/tensorflow
./configure

# Hit ENTER on every question except:
# Do you wish to build TensorFlow with GPU support? (answer: y)
# Please specify a list of comma-separated Cuda compute capabilities you want to build with. (answer: 6.1)

contents of bazel cache after ./configure

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

And now it starts compiling.

EDIT: Even on second try, it still fails to compile all the way to the end, but that seems like a different issue, which I've opened here:

#4190


asimonov commented Sep 4, 2016

David,

I cannot find local_config_cuda/crosstool directory anywhere in tensorflow directory.

Kind Regards,
Alexey



rasmi commented Sep 5, 2016

@davidzchen At bazel-tensorflow/external/local_config_cuda/crosstool, I'm getting a broken symlink to

~/.cache/bazel/_bazel_user/d217f35631206796f447d50c6f1d6243/external/local_config_cuda/crosstool

Maybe worth noting that there does exist a cuda directory at

~/.cache/bazel/_bazel_/d217f35631206796f447d50c6f1d6243/external/local_config_cuda/cuda

And so the symlink to it at bazel-tensorflow/external/local_config_cuda/cuda is valid.
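A dangling symlink like that can be detected mechanically: with -L, find follows links, so anything it still reports as type l is a link whose target cannot be resolved. A minimal sketch using a scratch directory as a stand-in for external/local_config_cuda (all names below are illustrative, not the real Bazel cache layout):

```shell
# Scratch directory with one valid and one dangling symlink, mimicking the
# situation where cuda resolves but crosstool does not.
workdir=$(mktemp -d)
mkdir "$workdir/cuda"
ln -s "$workdir/cuda" "$workdir/cuda_link"            # resolves fine
ln -s "$workdir/no_such_target" "$workdir/crosstool"  # dangling

# With -L, find follows symlinks, so an entry still reported as type l
# is a broken symlink; only the dangling one is printed.
find -L "$workdir" -type l
```

Running `find -L bazel-tensorflow/external -type l` from the source root would list every broken link in the external tree the same way.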


gopi77 commented Sep 17, 2016

I got the ERROR: no such package '@local_config_cuda//crosstool' problem again today.
After various attempts, the steps below worked:

  1. sudo apt-get upgrade bazel

  2. ./configure (I didn't repeat the git clone step: $ git clone https://github.com/tensorflow/tensorflow)

  3. To build with GPU support:

    bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

  4. mkdir _python_build
    cd _python_build
    ln -s ../bazel-bin/tensorflow/tools/pip_package/build_pip_package.runfiles/org_tensorflow/* .
    ln -s ../tensorflow/tools/pip_package/* .
    python setup.py develop

  5. cd tensorflow/models/image/mnist
    python convolutional.py

suiyuan2009 (Contributor) commented:

Met the same issue.

davidzchen added a commit to davidzchen/tensorflow that referenced this issue Sep 19, 2016
* Run bazel clean and bazel fetch in the configure script even when building
  without GPU support to force clean+fetch if the user re-runs ./configure
  with a different setting.
* Print a more actionable error message if the user attempts to build with
  --config=cuda but did not configure TensorFlow to build with GPU support.
* Update the BUILD file in @local_config_cuda to use repository-local labels.

Fixes tensorflow#4105

obo commented Sep 19, 2016

Hi. I'm getting "ERROR: no such package '@local_config_cuda//crosstool': BUILD file not found on package path." as well.

The issue happens deterministically for me if I run TensorFlow's ./configure while trying to avoid the interactive questions:

Set vars to avoid the interactive questions:

export PYTHON_BIN_PATH=/usr/bin/python

There is no way to confirm the following default value for ./util/python/python_config.sh without actually hitting the Return key :-((

/usr/lib/python3/dist-packages

export TF_NEED_GCP=n
export TF_NEED_CUDA=y
export GCC_HOST_COMPILER_PATH=/usr/bin/gcc
export TF_CUDA_VERSION=7.5
export CUDA_TOOLKIT_PATH=$CUDA_HOME
export TF_CUDNN_VERSION=4
export CUDNN_INSTALL_PATH=$CUDA_HOME
export TF_CUDA_COMPUTE_CAPABILITIES=3.0

If I run ./configure without setting the above variables, i.e.:

ubuntu@aws17:~/tensorflow$ ./configure
~/tensorflow ~/tensorflow
Please specify the location of python. [Default is /usr/bin/python]:
Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]
No Google Cloud Platform support will be enabled for TensorFlow
Found possible Python library paths:
/usr/local/lib/python3.5/dist-packages
/usr/lib/python3/dist-packages
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]

/usr/local/lib/python3.5/dist-packages
Do you wish to build TensorFlow with GPU support? [y/N] y
GPU support will be enabled for TensorFlow
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Please specify the Cuda SDK version you want to use, e.g. 7.0. [Leave empty to use system default]:
Please specify the location where CUDA toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the Cudnn version you want to use. [Leave empty to use system default]:
Please specify the location where cuDNN library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
libcudnn.so resolves to libcudnn.4
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size.

then the compilation works.

Note that there is probably no way to pass blank values (indicating "use the default", as opposed to unset values indicating "I have not answered yet") for several of the variables. So the interactive ./configure ends up with various values blank, while the less interactive ./configure has them filled in.
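One workaround for the "no way to confirm the default" problem is to pipe empty lines into the script, so every read falls back to its default: yes "" | ./configure after exporting the TF_* variables. This is untested against this exact configure version, so treat it as a sketch; the stand-in prompt script below only illustrates the mechanism (the script and default path are hypothetical, not part of TF):

```shell
# Hypothetical stand-in for an interactive prompt like the one in
# ./util/python/python_config.sh: read an answer, fall back to a default.
script=$(mktemp)
cat > "$script" <<'EOF'
read -r path
echo "using: ${path:-/usr/lib/python3/dist-packages}"
EOF

# Piping empty lines answers every read with "", selecting each default.
yes "" | sh "$script"
# prints: using: /usr/lib/python3/dist-packages
```

Whether this interacts cleanly with every prompt in the real ./configure is exactly what's in question here, but it avoids leaving any answer "unset" rather than "blank".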

davidzchen added a commit to davidzchen/tensorflow that referenced this issue Sep 19, 2016 (same commit message as above; fixes tensorflow#4105)
davidzchen added a commit to davidzchen/tensorflow that referenced this issue Sep 21, 2016 (same commit message as above; fixes tensorflow#4105)
martinwicke pushed a commit that referenced this issue Sep 21, 2016
@kamal94

kamal94 commented Sep 24, 2016

@martinwicke A similar error now appears during the building process:

/home/kamal/.cache/bazel/_bazel_kamal/f9ae4eca457b390bb2ebe780caca64e0/external/protobuf/BUILD:333:1: Linking of rule '@protobuf//:protoc' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command 
  (cd /home/kamal/.cache/bazel/_bazel_kamal/f9ae4eca457b390bb2ebe780caca64e0/execroot/tensorflow && \
  exec env - \
  external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -o bazel-out/host/bin/external/protobuf/protoc bazel-out/host/bin/external/protobuf/_objs/protoc/external/protobuf/src/google/protobuf/compiler/main.o bazel-out/host/bin/external/protobuf/libprotoc_lib.a bazel-out/host/bin/external/protobuf/libprotobuf.a bazel-out/host/bin/external/protobuf/libprotobuf_lite.a -lpthread -lstdc++ -B/usr/bin/ -pie -Wl,-z,relro,-z,now -no-canonical-prefixes -pass-exit-codes '-Wl,--build-id=md5' '-Wl,--hash-style=gnu' -Wl,-S -Wl,--gc-sections): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
bazel-out/host/bin/external/protobuf/_objs/protoc/external/protobuf/src/google/protobuf/compiler/main.o: In function `main':
main.cc:(.text.startup.main+0x2ad): undefined reference to `vtable for google::protobuf::compiler::php::Generator'
main.cc:(.text.startup.main+0x5fc): undefined reference to `vtable for google::protobuf::compiler::php::Generator'
main.cc:(.text.startup.main+0x707): undefined reference to `vtable for google::protobuf::compiler::php::Generator'
collect2: error: ld returned 1 exit status
Target //tensorflow/cc:tutorials_example_trainer failed to build

I think this might be related to 4316aeb
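One generic recovery path for a stale or partially fetched external repository is to wipe Bazel's cached externals and re-fetch, so @protobuf is downloaded in full again. A hedged sketch, not confirmed as the fix for this particular linker error, guarded so it only acts inside a Bazel workspace:

```shell
# Sketch: force a clean re-fetch of external repositories such as @protobuf.
# Only runs when bazel is on PATH and we are inside a workspace root.
if command -v bazel >/dev/null 2>&1 && [ -f WORKSPACE ]; then
  bazel clean --expunge
  bazel fetch //tensorflow/tools/pip_package:build_pip_package
  RESULT="re-fetched external repositories"
else
  RESULT="skipped (no bazel workspace here)"
fi
echo "protobuf recovery: $RESULT"
```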

@sskgit

sskgit commented Oct 2, 2016

Hi,

Facing similar issues with the TensorFlow build.
I am building TensorFlow from source and receive build errors for
a) C++ compilation of rule '@grpc//:gpr' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc

b) ERROR: I/O error while writing action log: No space left on device.

Environment:

Cuda 8.0
CuDNN 5
Ubuntu 16.04
bazel 0.3.1
Nvidia K80
Azure VM N6 (56 GB Memory)

Tried ./configure and build several times. Configure succeeds but the build fails.

Also tried bazel clean, bazel clean --expunge, and ran the build with a reduced number of jobs:
bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

but the error continues.

Looked at this thread and also

#190

Here is the full error message:

52adb8ea4f53b1b72067611e8a7eb020/external/grpc/BUILD:69:1: C++ compilation of rule '@grpc//:gpr' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 38 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
ERROR: I/O error while writing action log: No space left on device.
java.util.logging.ErrorManager: 2
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
at java.util.logging.FileHandler$MeteredStream.flush(FileHandler.java:196)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:297)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.util.logging.StreamHandler.flush(StreamHandler.java:259)
at java.util.logging.FileHandler.publish(FileHandler.java:683)
at java.util.logging.Logger.log(Logger.java:738)
at java.util.logging.Logger.doLog(Logger.java:765)
at java.util.logging.Logger.log(Logger.java:788)
at java.util.logging.Logger.info(Logger.java:1489)
at com.google.devtools.build.lib.profiler.AutoProfiler$LoggingElapsedTimeReceiver.accept(AutoProfiler.java:315)
at com.google.devtools.build.lib.profiler.AutoProfiler$SequencedElapsedTimeReceiver.accept(AutoProfiler.java:262)
at com.google.devtools.build.lib.profiler.AutoProfiler.completeAndGetElapsedTimeNanos(AutoProfiler.java:226)
at com.google.devtools.build.lib.buildtool.ExecutionTool.saveCaches(ExecutionTool.java:725)
at com.google.devtools.build.lib.buildtool.ExecutionTool.executeBuild(ExecutionTool.java:470)
at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:201)
at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:333)
at com.google.devtools.build.lib.runtime.commands.BuildCommand.exec(BuildCommand.java:69)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:488)
at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:324)
at com.google.devtools.build.lib.runtime.CommandExecutor.exec(CommandExecutor.java:49)
at com.google.devtools.build.lib.server.RPCService.executeRequest(RPCService.java:70)

@darrengarvey
Contributor

darrengarvey commented Oct 2, 2016

@sskgit - It looks like (b), no space left on device, is the reason that (a), the compilation of a file, fails. You'll notice the bazel-* symlinks in the tensorflow directory point to $HOME/.cache/bazel/..., so check that you've got enough space on the partition $HOME is mounted on. You'll need about 10GB, possibly more.
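A quick way to run that check, sketched below (`df -P` keeps each filesystem on one line; column 4 is the available space in 1K blocks):

```shell
# Check free space on the filesystem that actually holds Bazel's cache
# (default location: $HOME/.cache/bazel; fall back to $HOME if absent).
CACHE_DIR="$HOME/.cache/bazel"
[ -d "$CACHE_DIR" ] || CACHE_DIR="$HOME"
AVAIL_KB=$(df -P "$CACHE_DIR" | awk 'NR==2 {print $4}')
echo "Available on the partition holding $CACHE_DIR: ${AVAIL_KB} KB"
```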

@sskgit

sskgit commented Oct 3, 2016

@darrengarvey Thanks for your response.

I tried sudo df and here is the output

df
Filesystem 1K-blocks Used Available Use% Mounted on
udev 28837212 0 28837212 0% /dev
tmpfs 5771196 9244 5761952 1% /run
/dev/sda1 29711408 29329736 365288 99% /
tmpfs 28855968 0 28855968 0% /dev/shm
tmpfs 5120 0 5120 0% /run/lock
tmpfs 28855968 0 28855968 0% /sys/fs/cgroup
none 64 0 64 0% /etc/network/interfaces.dynamic.d
/dev/sdb1 356513788 135524 356378264 1% /mnt
tmpfs 5771196 0 5771196 0% /run/user/1000

It shows that the filesystem mounted on / has almost no space, and I think $HOME is on that same / mount, since no separate $HOME mount appears in the output. Is this the right command?

At / (used space)

du -sch
2.3G .
2.3G total

At $HOME (used space)

du -sch
4.4G .
4.4G total

So I am not sure what is occupying the remaining space. Total storage on this VM is 380 GB.

Is there any way to get rid of the bazel logs? Are they causing the space issue? If so, where are they,
and how do I free up some space?

Thanks
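Since /mnt has roughly 340 GB free while / is full, one possible workaround (a sketch; the path is only an example) is to point Bazel's output root at the larger disk via the --output_user_root startup option, which must come before the command verb:

```shell
# Sketch: relocate Bazel's output root to a larger partition.
# On this VM the real target would be something like /mnt/bazel;
# a /tmp path is used here only as a safe default for demonstration.
BAZEL_ROOT="${BAZEL_ROOT:-/tmp/bazel-output-root}"
mkdir -p "$BAZEL_ROOT"
# bazel --output_user_root="$BAZEL_ROOT" build -c opt --config=cuda \
#   //tensorflow/tools/pip_package:build_pip_package
echo "Bazel output root: $BAZEL_ROOT"
```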

@davidzchen
Contributor

@sskgit Running bazel clean --expunge should remove all of the generated files for the workspace. How much free space do you have after running that command?

@martinwicke Sorry for the late reply. I have been on call for a good part of the past week. That looks like a linker error in protobuf and does not seem related to this particular change. crosstool_wrapper_driver_is_not_gcc is a wrapper script that calls gcc. Is this on a newer version of protobuf?

@sskgit

sskgit commented Oct 3, 2016

@davidzchen Thanks for your response.

I ran bazel clean --expunge (this freed 500 MB of space) and re-configured TensorFlow using ./configure.

Then ran the bazel build again:

bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

ERROR: /$HOME/Downloads/tensorflow/tensorflow/core/kernels/BUILD:1710:1: error while parsing .d file: /$HOME/.cache/bazel/_bazel_gpuadmin/52adb8ea4f53b1b72067611e8a7eb020/execroot/tensorflow/bazel-out/local_linux-opt/bin/tensorflow/core/kernels/_objs/depth_space_ops_gpu/tensorflow/core/kernels/depthtospace_op_gpu.cu.pic.d (No such file or directory).
nvcc warning : option '--relaxed-constexpr' has been deprecated and replaced by option '--expt-relaxed-constexpr'.
: fatal error: when writing output to : No space left on device
compilation terminated.
Target //tensorflow/tools/pip_package:build_pip_package failed to build

At $HOME:
du -sch
4.6G .
4.6G total

At /:
du -sch
2.3G .
2.3G total

df -h
Filesystem Size Used Avail Use% Mounted on
udev 28G 0 28G 0% /dev
tmpfs 5.6G 9.1M 5.5G 1% /run
/dev/sda1 29G 29G 9.8M 100% /
tmpfs 28G 0 28G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 28G 0 28G 0% /sys/fs/cgroup
none 64K 0 64K 0% /etc/network/interfaces.dynamic.d
/dev/sdb1 340G 133M 340G 1% /mnt
tmpfs 5.6G 0 5.6G 0% /run/user/1000

This is a fairly new machine with little installed software (<200 MB).

@davidzchen
Contributor

davidzchen commented Oct 3, 2016

@sskgit Interesting. Can you open a bug at https://github.com/bazelbuild/bazel for this issue? Thanks.

@sskgit

sskgit commented Oct 3, 2016

@davidzchen Opened an issue with bazel as well. I have tried the build almost 10 times in the last couple of days. Still trying to figure out why the build fails and how to complete it successfully.

npanpaliya pushed a commit to ibmsoe/tensorflow that referenced this issue Oct 12, 2016
@sskgit

sskgit commented Oct 14, 2016

The space issue caused my TensorFlow build to fail. Clearing some space on the / mount made the build succeed, and TensorFlow now works as expected.

Thanks everyone for your help!

@bitnom

bitnom commented Jul 29, 2017

This is happening for me too. I already have TF installed and working for GPU via the runfile but I wanted to compile it for optimizations. I get:

ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 1039
                _create_local_cuda_repository(repository_ctx)
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 976, in _create_local_cuda_repository
                _host_compiler_includes(repository_ctx, cc)
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 145, in _host_compiler_includes
                get_cxx_inc_directories(repository_ctx, cc)
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 120, in get_cxx_inc_directories
                set(includes_cpp)
depsets cannot contain mutable items
WARNING: Target pattern parsing failed.
ERROR: no such package '@local_config_cuda//crosstool': Traceback (most recent call last):
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 1039
                _create_local_cuda_repository(repository_ctx)
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 976, in _create_local_cuda_repository
                _host_compiler_includes(repository_ctx, cc)
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 145, in _host_compiler_includes
                get_cxx_inc_directories(repository_ctx, cc)
        File "/home/user/bin/tensorflow/third_party/gpus/cuda_configure.bzl", line 120, in get_cxx_inc_directories
                set(includes_cpp)
depsets cannot contain mutable items
INFO: Elapsed time: 4.869s
FAILED: Build did NOT complete successfully (3 packages loaded)
    currently loading: tensorflow/tools/pip_package

Steps to reproduce:

git clone --recurse-submodules https://github.com/tensorflow/tensorflow
cd tensorflow
$ ./configure
WARNING: Running Bazel server needs to be killed, because the startup options are different.
Please specify the location of python. [Default is /home/user/.pyenv/versions/3.6.2/bin/python]: 
Found possible Python library paths:
/home/user/.pyenv/versions/3.6.2/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is /home/user/.pyenv/versions/3.6.2/lib/python3.6/site-packages
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]: n
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [y/N]: n
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: y
XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL support? [y/N]: n
No OpenCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-8.0
"Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-8.0]:/usr/lib/x86_64-linux-gnu 
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
$ bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package

If you notice that I've done something wrong, please let me know. I saw someone earlier mention LD_LIBRARY_PATH, which I have set to /usr/local/cuda-8.0/lib64. I figured that was fine since, as I said, I already have TF installed and running from the runfile download. I would like to be able to compile from source, though.
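For anyone double-checking the same setup, here is a hedged sketch of the pre-build sanity checks implied above (CUDA_HOME and the lib64 path are the common defaults for this configuration, not confirmed for every install):

```shell
# Verify the CUDA/cuDNN libraries are where the build expects them,
# and make sure the runtime linker can see them.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-8.0}"
if [ -d "$CUDA_HOME/lib64" ]; then
  ls "$CUDA_HOME"/lib64/libcudart* "$CUDA_HOME"/lib64/libcudnn* 2>/dev/null
else
  echo "No CUDA toolkit found at $CUDA_HOME"
fi
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
echo "LD_LIBRARY_PATH=$LD_LIBRARY_PATH"
```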

@itssujeeth

I'm also facing the same challenge - unable to build TensorFlow on a GPU server. Details are given below. The OS is Ubuntu 16.04 LTS.

user@gpu-devbox:~/Workouts/tensorflow$ python --version
Python 2.7.12
user@gpu-devbox:~/Workouts/tensorflow$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
user@gpu-devbox:~/Workouts/tensorflow$ bazel version
Build label: 0.5.3
Build target: bazel-out/local-fastbuild/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Jul 28 08:34:59 2017 (1501230899)
Build timestamp: 1501230899
Build timestamp as int: 1501230899
user@gpu-devbox:~/Workouts/tensorflow$ ./configure 
WARNING: Running Bazel server needs to be killed, because the startup options are different.
Please specify the location of python. [Default is /usr/bin/python]: 
Found possible Python library paths:
/usr/local/lib/python2.7/dist-packages
/usr/lib/python2.7/dist-packages
Please input the desired Python library path to use.  Default is /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: 
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]: 
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [y/N]: 
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: 
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: 
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL support? [y/N]: 
No OpenCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: Y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 
Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: 
"Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 
Please specify the location where cuDNN 6 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1,6.1,6.1,6.1]6.1
Do you want to use clang as CUDA compiler? [y/N]: 
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]: 
Do you wish to build TensorFlow with MPI support? [y/N]: 
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]: 
Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
user@gpu-devbox:~/Workouts/tensorflow$ echo $CUDA_HOME
/usr/local/cuda-8.0

user@gpu-devbox:~/Workouts/tensorflow$ echo $LD_LIBRARY_PATH
/usr/local/cuda-8.0/lib64
user@gpu-devbox:~/Workouts/tensorflow$ bazel build --config=opt --config=cuda --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0" ./tensorflow/tools/pip_package:build_pip_package 
.......
ERROR: no such package '@local_config_cuda//crosstool': Traceback (most recent call last):
    File "/home/u19061/Workouts/tensorflow/third_party/gpus/cuda_configure.bzl", line 1039
        _create_local_cuda_repository(repository_ctx)
    File "/home/user/Workouts/tensorflow/third_party/gpus/cuda_configure.bzl", line 976, in _create_local_cuda_repository
        _host_compiler_includes(repository_ctx, cc)
    File "/home/user/Workouts/tensorflow/third_party/gpus/cuda_configure.bzl", line 145, in _host_compiler_includes
        get_cxx_inc_directories(repository_ctx, cc)
    File "/home/user/Workouts/tensorflow/third_party/gpus/cuda_configure.bzl", line 120, in get_cxx_inc_directories
        set(includes_cpp)
depsets cannot contain mutable items
INFO: Elapsed time: 5.488s
FAILED: Build did NOT complete successfully (3 packages loaded)

@itssujeeth

itssujeeth commented Aug 1, 2017

Also, when I checked, the cache doesn't contain the crosstool files - am I missing something here?

user@devbox:~/Workouts/tensorflow$ ls -l $(bazel info output_base)/external/local_config_cuda/crosstool
total 4
-rwxrwxr-x 1 user user 1267 Aug  1 16:32 BUILD
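For comparison, a healthy generated crosstool package should contain more than just BUILD: roughly, a CROSSTOOL file plus the clang/bin/crosstool_wrapper_driver_is_not_gcc wrapper. A guarded sketch for listing what was actually generated (directory layout assumed from the error messages above; degrades gracefully outside a configured tree):

```shell
# List every file Bazel generated under @local_config_cuda//crosstool.
# Only meaningful when run from a configured TensorFlow workspace root.
if command -v bazel >/dev/null 2>&1 && [ -f WORKSPACE ]; then
  CROSSTOOL_DIR="$(bazel info output_base)/external/local_config_cuda/crosstool"
  FOUND="$(find "$CROSSTOOL_DIR" -type f 2>/dev/null)"
else
  FOUND="(not in a bazel workspace)"
fi
echo "$FOUND"
```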

@sh1r0

sh1r0 commented Aug 3, 2017

@itssujeeth #11949 fixes the issue when building tensorflow with gpu support using bazel 0.5.3.
