Failed to build from source due to missing libcudart.so.7.5: #2053
Comments
Fun fact: Symlinking all CUDA shared libraries into the custom GCC library directory doesn't work either. |
I have been having this issue, too. For now I resorted to using the nightly builds, which work just fine, and hope that I won't have to touch the C++ portions of tensorflow or contribute a patch that requires rebuilding... |
@black-puppydog Thanks for your information. Another fun fact: I'm out of ideas over here. I will go back to the CPU-based build, which seems to work and come back when some dev has shed some light on this issue. |
gcc 4.9.2 is what I currently use and it has worked for me. @keveman in case he has any ideas, or maybe we have to rope in bazel devs. |
@vrv Thanks for the reply. And you are able to build the GPU-version of TensorFlow from source? |
I also just tried from another machine and it still works for me (just synced to HEAD today). On this machine, I'm using ubuntu 14.04, gcc 4.8.4 provided by distro. If you manually run bazel-out/host/bin/tensorflow/cc/ops/random_ops_gen_cc, do you still see the failure? Do you see a bazel-out/host/bin/tensorflow/cc/ops/random_ops_gen_cc.runfiles/third_party/gpus/cuda/lib64/libcudart.so symlink? |
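The two checks asked about above can be sketched as a small shell helper. The paths are the ones quoted in the comment; the function name `check_cudart` is hypothetical, and the exact `ldd` output depends on the system.

```shell
#!/bin/sh
# Hypothetical helper: show how a binary resolves libcudart and whether
# the runfiles symlink that bazel creates next to it is present.
check_cudart() {
    bin="$1"
    link="$bin.runfiles/third_party/gpus/cuda/lib64/libcudart.so"

    # ldd marks libraries the dynamic loader cannot resolve as "not found".
    if [ -x "$bin" ]; then
        ldd "$bin" | grep libcudart || echo "libcudart not referenced by $bin"
    else
        echo "binary missing: $bin"
    fi

    # -L is true for a symlink even when its target does not exist.
    if [ -L "$link" ]; then
        echo "symlink present: $link -> $(readlink "$link")"
    else
        echo "symlink missing: $link"
    fi
}

# Usage, from the TensorFlow source root:
# check_cudart bazel-out/host/bin/tensorflow/cc/ops/random_ops_gen_cc
```

If the binary resolves the library when run by hand but the build still fails, the difference is likely in the environment bazel runs it in rather than in the binary itself.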
Btw the reason we don't use gcc 5+ is that nvcc currently isn't compatible with it, so we're all stuck on 4.8 or 4.9 :( |
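A minimal sketch of checking that constraint before starting a build. The helper name `gcc_ok_for_nvcc` is hypothetical, and the "older than 5" threshold is taken from the comment above, not from an nvcc query.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given GCC version is older than 5.
gcc_ok_for_nvcc() {
    major=${1%%.*}    # keep the part before the first dot: "4.9.2" -> "4"
    [ "$major" -lt 5 ]
}

# Check the compiler found on PATH, if there is one.
if command -v gcc >/dev/null 2>&1; then
    if gcc_ok_for_nvcc "$(gcc -dumpversion)"; then
        echo "gcc $(gcc -dumpversion) is older than 5"
    else
        echo "gcc $(gcc -dumpversion) is 5 or newer; a custom 4.8/4.9 build is needed"
    fi
fi
```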
No, then the library can be found.
Not
I understand. Did you also have to patch your CROSSTOOL file?
Thanks. Does this mean you are using CUDA 7.0, not 7.5? |
Hmm, this is probably a bazel-related problem, since the binary itself does seem to have the right linkages. @damienmg for some help, if he has ideas. (No, it works for me at 7.5 too, I did not have to patch my CROSSTOOL file). To be completely honest, I'm not sure why some configurations work and others don't. |
It is running with sandboxing so cannot find the sjared library |
Sorry shared. I should not use my phone configured in French ever again... Try to add |
Thanks a lot! Adding
For the curious, the full command line is now:
I will now try to minimize all the changes that I did to get it to run on my system, and then write them up in a comprehensible way. As for this bug: compiling in standalone mode is not very obvious, so at the very least I think it should go in the documentation. I do believe that some fixing is required (either for bazel or for the tensorflow build rules), but you can close this issue at your discretion. |
Actually tensorflow's configure should add that flag, what is in tools/bazel.rc for you? |
This is my
Is it possible that the |
/cc @aehlig who knows exactly how this file is parsed. Yes it is definitely possible but should not |
My understanding is that Note, however, that on the command-line you specified |
Hi, this is for the people coming from Google, trying to get TensorFlow to compile on their machines:
How to compile TensorFlow from source on Fedora 23 with a custom compiler
Compiling TensorFlow with GPU support is possible, but a bit tricky on Fedora 23 and up.
Compiling GCC
For CUDA version 7.5, you need to obtain the source code for GCC version 4.9. You can obtain it from here. Next, you need to install GCC compile-time dependencies:
Now you have to configure the GCC build. For details, check out the GCC configuration page. I suggest installing into a custom prefix, such as
When this step is done, you can compile GCC with the following command:
This assumes you want to use 4 processing cores. You can use more or less, or omit the -j option entirely. Finally, run as root:
Compiling bazel
Obtain the bazel source code. You need the current master branch, NOT any of the recent releases.
To compile bazel, you need to specify
This will produce the bazel binary in
Compiling TensorFlow
Obtain the TensorFlow source code
Modify the file
Replace the following lines:
cxx_builtin_include_directory: "/usr/lib/gcc/"
with the following lines:
Next, run the
To compile the source, use the following command line:
Explanations:
I sincerely hope that this guide will be obsolete very soon, and you can just get cracking without all these workarounds. But for now, this will probably be useful. |
Here's the output with
Here's the output with both
I really don't know how |
As discussed offline with @aehlig, --spawn_strategy and --genrule_strategy On Tue, Apr 26, 2016 at 11:35 PM Alexander Korsunsky <
|
--genrule_strategy=standalone also helped me build on Fedora 23. Now I can run the example trainer. |
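For readers looking for a concrete invocation, here is a sketch of a build command with the two strategy flags named in this thread. The `-c opt --config=cuda` part and the helper name `standalone_build_cmd` are assumptions, not the exact command line used above, and the thread does not settle whether both strategy flags are required.

```shell
#!/bin/sh
# Hypothetical helper: print a bazel command that disables sandboxed
# execution for the given target, using the flags discussed in this thread.
standalone_build_cmd() {
    echo bazel build -c opt --config=cuda \
        --genrule_strategy=standalone \
        --spawn_strategy=standalone \
        "$1"
}

# Print the command for the target from the original report.
standalone_build_cmd //tensorflow/cc:tutorials_example_trainer
```

The helper only prints the command, so it can be inspected first and then pasted into a shell at the TensorFlow source root.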
Hi, @itsmeolivia , is there any particular reason why you closed this issue? Because I just checked again with the latest head (5681406), and I still have the same error as described in the original message. I believe that compilation should succeed without any magical options that are not described in the tutorial. And my setup really isn't that exotic ;) |
@akors Thanks a lot for pointing this out - it took me quite some time until I found this thread... |
Hi! I tried to compile the tutorials_example_trainer file, and I have quite a journey behind me. I recompiled GCC several times, I recompiled bazel dozens of times and did a fair share of CROSSTOOLS editing.
At this point, I am stuck. The compilation fails with the message:
bazel-out/host/bin/tensorflow/cc/ops/random_ops_gen_cc: error while loading shared libraries: libcudart.so.7.5: cannot open shared object file: No such file or directory
Target //tensorflow/cc:tutorials_example_trainer failed to build
I am using Tensorflow HEAD (currently 7b536cd), I have CUDA 7.5 installed in /usr/local/cuda-7.5 .
My LD_LIBRARY_PATH is set to :/usr/local/cuda/lib64:/usr/local/cuda/lib64 , and the files exist there.
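A quick way to confirm that the library is visible through `LD_LIBRARY_PATH` is sketched below. The helper name `find_lib_in_path` is hypothetical; it only checks that the file exists in each listed directory and says nothing about what a sandboxed build step can see.

```shell
#!/bin/sh
# Hypothetical helper: print every match for a library name inside a
# colon-separated list of directories.
find_lib_in_path() {
    lib="$1"
    path="$2"
    old_ifs=$IFS
    IFS=:
    for dir in $path; do
        # Empty entries (from a leading or doubled colon) are skipped.
        [ -n "$dir" ] && [ -e "$dir/$lib" ] && echo "$dir/$lib"
    done
    IFS=$old_ifs
    return 0
}

# Look for the library named in the error message.
find_lib_in_path libcudart.so.7.5 "${LD_LIBRARY_PATH:-}"
```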
I tried to point bazel to the library directory by adding linker_flag: "-L/usr/local/cuda/lib64".
Environment info
Operating System: Fedora 23
Installed version of CUDA and cuDNN: 7.5 and 4.0.7
If installed from sources, provide the commit hash: 7b536cd
Steps to reproduce
What have you tried?
linker_flag: "-L/usr/local/cuda/lib64" to third_party/gpus/crosstool/CROSSTOOL
linker_flag: "-Wl,-R/usr/local/cuda/lib64" to third_party/gpus/crosstool/CROSSTOOL
linker_flag: "-Wl,-R/usr/local/cuda/lib64" in tools/cpp/CROSSTOOL
/etc/ld.so.conf.d/LOCAL_cuda-lib64.conf with the contents /usr/local/cuda/lib64 and ran ldconfig
Logs or other output that would be helpful
Here's the full output of the last operation:
P.S.: Out of curiosity, what are you TensorFlow devs using for a development machine? Have any of you actually tried using a modern Linux distribution that comes with GCC newer than 4.9 for compilation? You really should. It's quite the experience.