Undeclared inclusions and missing dependencies in building #2109
Following the build-from-source instructions, I also received the following error when trying to build:
|
Happens to me too when I compile from master for both bazel and tensorflow. I am guessing there is some inconsistency between the latest bazel and the latest tensorflow?
Update: correcting tensorflow/tensorflow/tensorflow.bzl line 15 so that it parses the version number correctly fixes the build for me. @Dapid: I think you are experiencing some misalignment between bazel and tensorflow on top of this bug.
More update: @Dapid: you are right, there is still something wrong with the version when you fix it... I think the only hack that will work is to set the version manually to one that will work :) It does compile when I put "2" for every number in line 15: int(number) => 2
Err, I think it was compiling, but now it breaks with: ERROR: error loading package 'external': Package 'external' contains errors. Not sure what is going on.
Update: OK, I think I know what fixes it. You need to clone the repository from scratch (force) before you attempt to compile with the line 15 hack, like this: number = "".join([c for c in number if c.isdigit()]) |
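For anyone wondering why the line 15 hack helps, here is a minimal sketch in plain Python. It is not the actual tensorflow.bzl code: the function names are hypothetical and the version string "0.2.2b" is only an illustrative value. The point is that `int()` rejects a component with a letter suffix, while stripping non-digit characters first lets it parse.

```python
def parse_version_component(component):
    """Keep only the digits of one dotted version component, then convert."""
    digits = "".join(c for c in component if c.isdigit())
    # Assumption: a component with no digits at all is treated as 0.
    return int(digits) if digits else 0


def parse_bazel_version(version):
    """Turn a dotted version string into a tuple of ints for comparison."""
    return tuple(parse_version_component(part) for part in version.split("."))


if __name__ == "__main__":
    # A plain int("2b") would raise ValueError; the digit filter avoids that.
    print(parse_bazel_version("0.2.2b"))
    # Tuples compare element by element, so a minimum-version check is simple.
    print(parse_bazel_version("0.2.2b") >= parse_bazel_version("0.1.4"))
```

Note that this silently drops the suffix, so a pre-release such as "0.2.2b" compares equal to "0.2.2". That is fine for a minimum-version check but would not distinguish the two.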
Hi. This is my story of how I got TensorFlow to compile on my Fedora 23 machine. Maybe it will be useful for you:
How to compile TensorFlow from source on Fedora 23 with a custom compiler
Compiling TensorFlow with GPU support is possible, but a bit tricky on Fedora 23 and up.
Compiling GCC
For CUDA version 7.5, you need to obtain the source code for GCC version 4.9. You can obtain it from here. Next, you need to install GCC compile-time dependencies:
Now you have to configure the GCC build. For details, check out the GCC configuration page. I suggest installing into a custom prefix, such as
When this step is done, you can compile GCC with the following command:
This assumes you want to use 4 processing cores. You can use more or less, or omit the -j option entirely. Finally, run as root:
Compiling bazel
Obtain the bazel source code. You need the current master branch, NOT any of the recent releases.
To compile bazel, you need to specify
This will produce the bazel binary in
Compiling TensorFlow
Obtain the TensorFlow source code
Modify the file
Replace the following lines: cxx_builtin_include_directory: "/usr/lib/gcc/"
with the following lines:
Next, run the
To compile the source, use the following command line:
Explanations:
I sincerely hope that this guide will be obsolete very soon, and you can just get cracking without all these workarounds. But for now, this will probably be useful. |
It would be great to get these instructions into the documentation somewhere. @martinwicke : is there a place for build help specific to linux distributions? |
Honestly, I think a better approach would be actually fixing the bugs in the TensorFlow build system and bazel itself, for example
This guide was born out of my desperation, but new users probably shouldn't have to jump through all the hoops. |
@akors: I see this issue happening not only on Fedora but on Ubuntu as well, with just a plain development setup cloning bazel and tensorflow master from source. |
For the first four issues:
Explanations extended: build: what bazel should do
|
@akors I followed your instructions, but unfortunately I keep getting the same failures. No idea what the difference between your system and mine may be. I'll try again; maybe some bugs have been squashed in Bazel in the meantime. |
@Dapid did you use Bazel HEAD and Tensorflow HEAD? And you are still getting "missing dependency declaration" errors? |
@fat-lobyte Yes. I just retried with latest master for both TF and Bazel:
|
@Dapid weird. If you have modified your third_party/gpus/crosstool/CROSSTOOL file to point to your private install, then I am out of ideas. What happens if you add another line
? |
@akors well spotted. This is what the relevant section of CROSSTOOL looks like now, and the build finishes:
|
I was stuck on this for a long time until I found this post. Maybe we should rewrite CROSSTOOL so that the gcc path is not static. |
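One way to avoid hard-coding the path would be to ask the compiler itself for its include search path and generate the `cxx_builtin_include_directory` lines from that. Below is a minimal sketch, not what TensorFlow or Bazel actually does: the function names are hypothetical, and it assumes a GCC-style compiler that prints its search list on stderr when run with `-E -xc++ - -v`.

```python
import subprocess


def parse_include_dirs(verbose_output):
    """Extract the directories listed between GCC's search-list markers."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            in_list = False
        elif in_list:
            dirs.append(line.strip())
    return dirs


def gcc_builtin_include_dirs(gcc="gcc"):
    """Run the given compiler on empty input and read its C++ search path."""
    result = subprocess.run(
        [gcc, "-E", "-xc++", "-", "-v"],
        input="",
        capture_output=True,
        text=True,
        check=True,
    )
    # GCC prints the verbose search list on stderr, not stdout.
    return parse_include_dirs(result.stderr)


def crosstool_lines(dirs):
    """Format directories as CROSSTOOL cxx_builtin_include_directory entries."""
    return ['cxx_builtin_include_directory: "%s"' % d for d in dirs]


if __name__ == "__main__":
    # Pass the path to a custom compiler instead of "gcc" if needed.
    for line in crosstool_lines(gcc_builtin_include_dirs("gcc")):
        print(line)
```

The printed lines could then be pasted into the toolchain section of CROSSTOOL by hand. Pointing the `gcc` argument at a privately installed compiler should list that compiler's own directories rather than the system ones.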
I found a solution that worked very well for me.
The idea is simple: use the package installation to detect the dependency problems, fix them, and then follow the usual guide. |
Yes, it would be good to file separate bugs for the individual issues, since catch-all bugs like these tend to get filled with lots of hard-to-parse info. |
@girving OK, so I redid all the testing with the most recent HEADs of TensorFlow and bazel. I will try to summarize comprehensively:
Custom GCC, Bazel 0.2.3 (Release), TensorFlow head (b3621c9): compilation fails with the undeclared inclusion messages above.
Custom GCC, Bazel head (e7e2301) compiled with CC and CXX set to the custom GCC, TensorFlow head (b3621c9): compilation fails with the undeclared inclusion messages above.
Custom GCC, Bazel head (e7e2301) compiled with CC and CXX set to the custom GCC, TensorFlow head (b3621c9) with CROSSTOOL
Custom GCC, Bazel 0.2.3 (Release), TensorFlow head (b3621c9) with CROSSTOOL
That's what's up. I believe the first two methods should also work, but you can divide this issue up as you wish.
ps.: Might I quote from the CROSSTOOL file, right above the
I would agree to that. This will need to be fixed ;) |
@martinwicke @damienmg Can you advise on how to split this bug across Bazel and Github? |
@girving: the TODO is copied from the legacy CROSSTOOL of Bazel. In between, we did C++ autoconfiguration. If I understand correctly, there is only one issue: cxx_builtin_include_directory? In which case it should be covered by the work that @davidzchen has started on auto-detecting nvcc config. |
This is the tracking bug for cuda autoconf: #2873 |
Following @akors's guidance, I finally succeeded on CentOS 6.5 with CUDA 7.5 and cuDNN v5.0.5. I set the compute capability to 3.0 even though my GPU is actually 2.0 (Quadro 4000).
cuda path
in ~/.bash_profile
build gcc
Follow https://gcc.gnu.org/wiki/InstallingGCC; I used gcc 4.9.2.
build tensorflow
If you get an error like a genrule missing dependency declarations for a list of header files, just add the paths of those header files (like /path/to/include) to the
about gpu
I always failed when I set the compute capability to 2.0 when configuring the tensorflow build; the error is like |
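To see which directories still need to be added, a small check like the following may help. It is only a sketch under my understanding that Bazel complains about a header when it is neither a declared dependency nor located under one of the `cxx_builtin_include_directory` entries. The function names and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def is_covered(header, builtin_dirs):
    """Return True if the header lies somewhere below one of the directories."""
    path = PurePosixPath(header)
    return any(PurePosixPath(d) in path.parents for d in builtin_dirs)


def uncovered_headers(headers, builtin_dirs):
    """List the headers that none of the declared directories account for."""
    return [h for h in headers if not is_covered(h, builtin_dirs)]


if __name__ == "__main__":
    # Hypothetical values: headers named in an error message, and the
    # directories currently declared in the CROSSTOOL file.
    declared = ["/usr/lib/gcc/"]
    reported = [
        "/usr/lib/gcc/x86_64-redhat-linux/5.3.1/include/stddef.h",
        "/opt/gcc-4.9/include/c++/4.9.3/vector",
    ]
    for header in uncovered_headers(reported, declared):
        print("not covered:", header)
```

Any header this prints sits outside the declared directories, so its include root is a candidate for another `cxx_builtin_include_directory` line.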
@davidzchen is this still being worked on? Or is the procedure by @suiyuan2009 a general fix? |
@alextp Apologies for the wait. I just returned from vacation. We are ironing out the remaining CI build failure for the PR for CUDA autodetection (#3269). I am hoping to have the PR submitted soon, but feel free to use @suiyuan2009's fix as a workaround in the meantime. |
@alextp The cuda autoconf change has been merged. Can you try building with the latest tensorflow HEAD and see if you still run into any issues? |
@davidzchen
Where
This means I still have to hack the CROSSTOOL.tpl file to get TF to compile. I believe it's very important to include those directories (depending on the nvcc host compiler configuration choice) in the configuration as well. |
@akors That's a good point. It should not be a difficult fix, and we should be able to reuse |
Closing due to inactivity. |
The following commands were run on 64-bit Fedora 23 with TensorFlow master, but I get similar results on the 0.8 branch.
Using Bazel 0.2.1 on a laptop without graphics card:
On a machine with Cuda 7.5 and cudnn 5 installed, GCC 4.9 built locally (but I get the same results with system's GCC 5.3.1):
With bazel 0.1.1:
With Bazel 0.2.1:
With Bazel 0.1.5:
Bazel 2.2b fails when tensorflow tries to parse the version:
After correcting that mistake, I get a build error again:
How can I get it to work? I am interested in building Tensorflow with vector instructions for my laptop and linking against the latest cudnn for my workstation.