Compatibility with Cuda 10.1? #26289

Closed
jiapei100 opened this issue Mar 3, 2019 · 87 comments
Labels
subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues type:build/install Build and install issues

Comments

@jiapei100

jiapei100 commented Mar 3, 2019

Is it possible to compile TensorFlow 1.12 against CUDA 10.1?

➜  tensorflow git:(master) bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
DEBUG: Rule 'build_bazel_rules_swift' modified arguments {"commit": "001736d056d7eae20f1f4da41bc9e6f036857296", "shallow_since": "1547844730 -0800"} and dropped ["tag"]
DEBUG: ~/.cache/bazel/_bazel_user/15086820fc7a6f1383d8c38c62220208/external/build_bazel_rules_apple/apple/repositories.bzl:35:5: 
WARNING: `build_bazel_rules_apple` depends on `bazel_skylib` loaded from https://github.com/bazelbuild/bazel-skylib.git (tag 0.6.0), but we have detected it already loaded into your workspace from None (tag None). You may run into compatibility issues. To silence this warning, pass `ignore_version_differences = True` to `apple_rules_dependencies()`.

ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': in ....../tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "....../third_party/gpus/cuda_configure.bzl", line 1501
                _create_local_cuda_repository(repository_ctx)
        File "....../third_party/gpus/cuda_configure.bzl", line 1266, in _create_local_cuda_repository
                _find_libs(repository_ctx, cuda_config)
        File "....../third_party/gpus/cuda_configure.bzl", line 859, in _find_libs
                _find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
        File "....../third_party/gpus/cuda_configure.bzl", line 773, in _find_cuda_lib
                find_lib(repository_ctx, [("%s/%s%s" % (bas...], ...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 750, in find_lib
                auto_configure_fail(("No library found under: " + ",...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 341, in auto_configure_fail
                fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: No library found under: /usr/local/cuda-10.1/lib64/libcublas.so.10.1, /usr/local/cuda-10.1/lib64/stubs/libcublas.so.10.1, /usr/local/cuda-10.1/lib/powerpc64le-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x86_64-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x64/libcublas.so.10.1, /usr/local/cuda-10.1/lib/libcublas.so.10.1, /usr/local/cuda-10.1/libcublas.so.10.1
WARNING: Target pattern parsing failed.
ERROR: error loading package 'tensorflow/tools/pip_package': in ....../tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
        File "....../third_party/gpus/cuda_configure.bzl", line 1501
                _create_local_cuda_repository(repository_ctx)
        File "....../third_party/gpus/cuda_configure.bzl", line 1266, in _create_local_cuda_repository
                _find_libs(repository_ctx, cuda_config)
        File "....../third_party/gpus/cuda_configure.bzl", line 859, in _find_libs
                _find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
        File "....../third_party/gpus/cuda_configure.bzl", line 773, in _find_cuda_lib
                find_lib(repository_ctx, [("%s/%s%s" % (bas...], ...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 750, in find_lib
                auto_configure_fail(("No library found under: " + ",...)))
        File "....../third_party/gpus/cuda_configure.bzl", line 341, in auto_configure_fail
                fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: No library found under: /usr/local/cuda-10.1/lib64/libcublas.so.10.1, /usr/local/cuda-10.1/lib64/stubs/libcublas.so.10.1, /usr/local/cuda-10.1/lib/powerpc64le-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x86_64-linux-gnu/libcublas.so.10.1, /usr/local/cuda-10.1/lib/x64/libcublas.so.10.1, /usr/local/cuda-10.1/lib/libcublas.so.10.1, /usr/local/cuda-10.1/libcublas.so.10.1
INFO: Elapsed time: 10.828s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
    currently loading: tensorflow/tools/pip_package
    Fetching @local_config_cuda; fetching

Any suggestions?

@jvishnuvardhan
Contributor

@jiapei100 For faster resolution, please provide the details requested in the issue template.

Please check the tested build configurations. TensorFlow 1.12 was built with CUDA 9.0, so some of the paths in its modules look for CUDA 9.0. If you want to use CUDA 10 or CUDA 10.1, use the latest version of TF and check that CUDA and cuDNN reference the correct paths. First, uninstall Python and TensorFlow, then reinstall following the installation instructions, using the new versions of CUDA and cuDNN. Please let me know how it progresses. Thanks!

@jvishnuvardhan jvishnuvardhan added type:build/install Build and install issues stat:awaiting response Status - Awaiting response from author labels Mar 3, 2019
@wjcskqygj2015

wjcskqygj2015 commented Mar 5, 2019

Because of cuBLAS issues and new cuBLAS features, the cuBLAS library now lives in a separate directory, so you may be able to use a symlink as a temporary fix.
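
Before creating any symlink it helps to know where the library actually ended up. The following is a minimal sketch, not part of TensorFlow: it globs for a library in the directories named by the "No library found under" error, plus a few extra locations. The helper name `find_library` is hypothetical, `/usr/lib/x86_64-linux-gnu` is an assumption about where a CUDA 10.1 package may place cuBLAS, and the `targets/x86_64-linux/lib` directory is taken from a symlink command posted later in this thread.

```python
import glob
import os

# Directories from the "No library found under" error, relative to the CUDA root.
CUDA_RELATIVE_DIRS = [
    "lib64",
    "lib64/stubs",
    "lib/powerpc64le-linux-gnu",
    "lib/x86_64-linux-gnu",
    "lib/x64",
    "lib",
    "",
]

# Extra locations to try. The first is an assumption, the second appears in a
# later comment of this thread.
EXTRA_DIRS = [
    "/usr/lib/x86_64-linux-gnu",
    "/usr/local/cuda-10.1/targets/x86_64-linux/lib",
]


def find_library(name, cuda_root, extra_dirs=()):
    """Return sorted paths of files matching lib<name>.so* in the candidate dirs."""
    dirs = [os.path.join(cuda_root, d) for d in CUDA_RELATIVE_DIRS]
    dirs += list(extra_dirs)
    hits = []
    for directory in dirs:
        hits.extend(glob.glob(os.path.join(directory, "lib%s.so*" % name)))
    return sorted(set(hits))


if __name__ == "__main__":
    for path in find_library("cublas", "/usr/local/cuda-10.1", EXTRA_DIRS):
        print(path)
```

If the script prints nothing, the library is in none of the listed directories and a wider search (for example with `find`) is needed.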

@devillove084

I hit the same problem with cuBLAS. The path /usr/local/cuda/lib64 does not contain the cuBLAS library.

@devillove084

So how do I modify the code and set the path to compile against CUDA 10.1?

@beew

beew commented Mar 17, 2019

@wjcskqygj2015

Symlinking doesn't work. I tried building TensorFlow master:

ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': in /home/bernard/opt/cuda_test/tensorflow/tensorflow/tensorflow.bzl: Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda': Traceback (most recent call last):
	File "/home/bernard/opt/cuda_test/tensorflow/third_party/gpus/cuda_configure.bzl", line 1502
		_create_local_cuda_repository(repository_ctx)
	File "/home/bernard/opt/cuda_test/tensorflow/third_party/gpus/cuda_configure.bzl", line 1266, in _create_local_cuda_repository
		_find_libs(repository_ctx, cuda_config)
	File "/home/bernard/opt/cuda_test/tensorflow/third_party/gpus/cuda_configure.bzl", line 859, in _find_libs
		_find_cuda_lib("cublas", repository_ctx, cpu_value, c..., ...)
	File "/home/bernard/opt/cuda_test/tensorflow/third_party/gpus/cuda_configure.bzl", line 773, in _find_cuda_lib
		find_lib(repository_ctx, [("%s/%s%s" % (bas...], ...)))
	File "/home/bernard/opt/cuda_test/tensorflow/third_party/gpus/cuda_configure.bzl", line 747, in find_lib
		auto_configure_fail(("None of the libraries match th...)))
	File "/home/bernard/opt/cuda_test/tensorflow/third_party/gpus/cuda_configure.bzl", line 341, in auto_configure_fail
		fail(("\n%sCuda Configuration Error:%...)))

Cuda Configuration Error: None of the libraries match their SONAME: /home/bernard/opt/cuda_test/cuda/lib64/libcublas.so.10.1

Besides, all libcu***.so.10.1 files are missing, not just libcublas. It would be a pain to create symlinks for all of them even if symlinking worked.
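
The "None of the libraries match their SONAME" message suggests the configure step compares a library's file name with the SONAME recorded inside it. Below is a minimal sketch of that comparison, assuming the SONAME is read from `readelf -d` output. The helper names and the sample line are hypothetical, and the SONAME value `libcublas.so.10` is an assumption for illustration, not something confirmed in this thread.

```python
import os
import re

# Matches the SONAME entry printed by `readelf -d <library>`.
_SONAME_RE = re.compile(r"\(SONAME\)\s+Library soname: \[([^\]]+)\]")


def parse_soname(readelf_output):
    """Return the SONAME found in `readelf -d` output, or None if absent."""
    match = _SONAME_RE.search(readelf_output)
    return match.group(1) if match else None


def matches_soname(path, readelf_output):
    """True when the file name of `path` equals the library's SONAME."""
    return os.path.basename(path) == parse_soname(readelf_output)


# Hypothetical readelf output; the SONAME value is assumed for illustration.
SAMPLE = " 0x000000000000000e (SONAME)             Library soname: [libcublas.so.10]"

print(parse_soname(SAMPLE))
print(matches_soname("/opt/cuda/lib64/libcublas.so.10.1", SAMPLE))
print(matches_soname("/opt/cuda/lib64/libcublas.so.10", SAMPLE))
```

Under that assumption, a symlink named `libcublas.so.10.1` would point at a file whose SONAME says something else, which would explain why renaming alone does not satisfy the check.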

@madeye-matt

madeye-matt commented Mar 18, 2019

I have the same problem using TensorFlow 1.13.1 and CUDA 10.1. Symlinks 'worked' for me insofar as there were no path errors. However, running the TensorFlow sanity check results in a core dump, presumably because there are genuine incompatibilities between those versions of TensorFlow and CUDA.

(venv) ➜  tensorflow python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
python: Relink `/lib64/libmount.so.1' with `/lib64/librt.so.1' for IFUNC symbol `clock_gettime'
python: Relink `/lib64/libsystemd.so.0' with `/lib64/librt.so.1' for IFUNC symbol `clock_gettime'
[1]    24472 segmentation fault (core dumped)  python -c 

EDIT: the following link commands fixed the path errors

sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcublas.so.10.1.0.105 /usr/lib64/libcublas.so.10.0
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcusolver.so.10.1.105 /usr/lib64/libcusolver.so.10.0 
sudo ln -s /usr/local/cuda-10.1/targets/x86_64-linux/lib/libcudart.so.10.1.105 /usr/lib64/libcudart.so.10.0 
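
The three commands above can be generalised into a small dry-run script. This is only a sketch under stated assumptions: the helper names `plan_symlinks` and `apply_plan` are hypothetical, and the choice of "the longest file name is the fully versioned library" is a heuristic. As the segmentation fault above shows, pointing a 10.0 name at a 10.1 library can silence the path error without making the versions compatible.

```python
import os


def plan_symlinks(src_dir, dst_dir, names, target_suffix):
    """Map each real library in src_dir to the link name a binary asks for.

    Returns (source, link) pairs; nothing is created on disk.
    """
    plan = []
    for name in names:
        prefix = "lib%s.so." % name
        # Heuristic: the longest matching file name is the fully versioned library.
        candidates = sorted(
            (f for f in os.listdir(src_dir) if f.startswith(prefix)),
            key=len,
        )
        if candidates:
            plan.append((
                os.path.join(src_dir, candidates[-1]),
                os.path.join(dst_dir, prefix + target_suffix),
            ))
    return plan


def apply_plan(plan):
    """Create the symlinks, skipping link names that already exist."""
    for source, link in plan:
        if not os.path.lexists(link):
            os.symlink(source, link)


if __name__ == "__main__":
    src = "/usr/local/cuda-10.1/targets/x86_64-linux/lib"
    if os.path.isdir(src):
        # Dry run: only print what would be linked.
        for source, link in plan_symlinks(
                src, "/usr/lib64", ["cublas", "cusolver", "cudart"], "10.0"):
            print("%s -> %s" % (link, source))
```

Printing the plan first makes it easy to review the links before running `apply_plan` with the required privileges.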

@jvishnuvardhan jvishnuvardhan added stat:awaiting tensorflower Status - Awaiting response from tensorflower subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues and removed stat:awaiting response Status - Awaiting response from author labels Mar 20, 2019
@wjcskqygj2015

@beew
This seems to happen mainly because the SONAME of some CUDA libraries is incompatible.
I modified third_party/gpus/cuda_configure.bzl at lines 871, 878, 885, 892 and 907,
replacing cuda_config.cuda_version with "10".
With symlinks in place, it then works. I hope this helps.

@plopresti
Contributor

If you are heading down this particular rabbit hole, you will find #26155 relevant.

@prassanna-ravishankar

Interestingly, I got it working on Windows with an unsupported compiler: #28086

@kgizdov

kgizdov commented Apr 23, 2019

This is due to conflicting behaviour at configure/build time and run time. Have a look at the patch in #28093 for a temporary workaround.

@gunan
Contributor

gunan commented May 11, 2019

@chsigg @annarev has done a lot on this. I think we are compatible now.

@tensorflowbutler tensorflowbutler removed the stat:awaiting tensorflower Status - Awaiting response from tensorflower label May 11, 2019
@lfaino

lfaino commented May 13, 2019

Dear all,
I'm trying to install TensorFlow with an NVIDIA 1070, CUDA 10.1 and cuDNN 7.5.1.10 on Ubuntu 18.04.
nvidia-smi reports: NVIDIA-SMI 418.40.04, Driver Version: 418.40.04, CUDA Version: 10.1.

I installed tensorflow-gpu using pip3.

When I run:
python3 -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
I get the following error:

  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "/usr/lib/python3.6/imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "/usr/lib/python3.6/imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions. Include the entire stack trace
above this error message when asking for help.

can you please help me?

Cheers
Luigi

@chsigg
Contributor

chsigg commented May 13, 2019

If you want to use TensorFlow with CUDA 10.1, you currently need to build it from source. The binaries we ship are built for CUDA 10.0.
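
A quick way to see this mismatch on a given machine is to ask the dynamic loader directly which library names it can open. This is a minimal sketch using only the standard `ctypes` module; the helper name `can_load` is hypothetical. The 10.0 names are the ones the shipped binaries request (see the ImportError above), while the other names are assumptions about what a CUDA 10.1 install provides.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


# The *.so.10.0 names come from the ImportError above; the rest are assumed.
CANDIDATES = [
    "libcublas.so.10.0",
    "libcublas.so.10",
    "libcudart.so.10.0",
    "libcudart.so.10.1",
]

for name in CANDIDATES:
    print(name, "loadable" if can_load(name) else "missing")
```

If the 10.0 names are reported as missing while the 10.1 names load, the installed CUDA toolkit does not match what the prebuilt TensorFlow binary was linked against.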

@ymodak
Contributor

ymodak commented May 15, 2019

If you want to use TensorFlow with CUDA 10.1, you currently need to build it from source. The binaries we ship are built for CUDA 10.0.

Closing this issue since chsigg's explanation addresses it. Feel free to reopen if you have any further questions. Thanks!

@ymodak ymodak closed this as completed May 15, 2019
@leimao

leimao commented Jun 11, 2019

chsigg's suggestion is not helpful. Most of us are building from source.

@leimao

leimao commented Jun 11, 2019

Please re-open this issue.

@ymodak ymodak reopened this Jun 12, 2019
@leimao

leimao commented Jun 12, 2019

tensorflow/lite/delegates/nnapi/nnapi_delegate.cc: In static member function 'static void tflite::StatefulNnApiDelegate::DoFreeBufferHandle(TfLiteContext*, TfLiteDelegate*, TfLiteBufferHandle*)':
tensorflow/lite/delegates/nnapi/nnapi_delegate.cc:2136:31: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
   if (*handle >= 0 && *handle < delegate_data->tensor_memory_map.size()) {
                       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ERROR: /home/leimao/.cache/bazel/_bazel_leimao/b5222491e5f5e6954d481a492cdf3412/external/nccl_archive/BUILD.bazel:54:1: undeclared inclusion(s) in rule '@nccl_archive//:device_lib':
this rule is missing dependency declarations for the following files included by 'external/nccl_archive/src/collectives/device/functions.cu.cc':
  'external/nccl_archive/src/collectives/device/common.h'
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 101.561s, Critical Path: 19.52s
INFO: 513 processes: 513 local.
FAILED: Build did NOT complete successfully

@chsigg
Contributor

chsigg commented Jun 12, 2019 via email

@AngledLuffa

AngledLuffa commented Oct 28, 2019 via email

@sytelus

sytelus commented Oct 28, 2019

I would also like to point out (in addition to everything already said) that PyTorch officially lists only CUDA 10.1 support. For someone like me who often ends up with TF and PyTorch in the same environment, this makes it very difficult to install them side by side. Fortunately, PyTorch builds have been superbly flexible with CUDA, and I eventually figured out that merely including cudatoolkit=10.0 in the conda install makes it work with CUDA 10.0 as well! I hope something like this might be possible for TF.

@JeffreyWardman

In addition to what @sytelus said, the exact statement is:
conda install pytorch torchvision cudatoolkit=10.0 python==X -c pytorch
where X is your current python version if you don't want to upgrade it.

@AngledLuffa

AngledLuffa commented Nov 7, 2019 via email

@chsigg
Contributor

chsigg commented Nov 7, 2019

We are working on updating TF builds to CUDA 10.1.

@dexception

Release a timeline. So that we can adjust accordingly. Simply staying mum for 6 months is unacceptable. We also have jobs.

@AndreMaz

Looking at #34327, it seems the TF team has already solved the compatibility issue. Hopefully the 2.1 release happens soon.

@phoerious

As an intermediary workaround, I had no issues symlinking the affected libraries to *.so.10.0.

@dexception

Very unprofessional attitude towards managing a software.

@chsigg
Contributor

chsigg commented Nov 21, 2019

Nightly release builds and r2.1 release are built against CUDA 10.1. I'm going to close this issue.

@dexception, I agree I could have handled this better. Sorry.

@chsigg chsigg closed this as completed Nov 21, 2019

@rjurney

rjurney commented Nov 21, 2019

2.1? Try 2.0.1. 2.1 will be forever, right?

@chsigg
Contributor

chsigg commented Nov 21, 2019

TF 2.0 release will probably stay on CUDA 10.0. You can build TF with CUDA 10.1 from any branch, or use nightly releases before TF 2.1 is released.

@megazone87

megazone87 commented Dec 6, 2019

You can build TF with CUDA 10.1 from any branch

Are you sure about that? I built from source on branch r1.13 with CUDA 10.1, and it does not work.
Errors:

  • Library not found: libcublas.so.10.1 etc. Solved with a symlink.
  • Cannot find cublas_v2.h. Solved with a symlink.
  • tensorflow/tensorflow/compiler/xla/service/gpu/BUILD:661:1: C++ compilation of rule '//tensorflow/compiler/xla/service/gpu:gpu_compiler' failed. I have no clue about this one.

Could you help me verify whether TF 1.* works with CUDA 10.1?


To be precise, I compiled with XLA on (TensorRT off). With XLA turned off, everything builds fine, so I think the XLA-related problem above is minor and solvable. Please check it.

@chsigg
Contributor

chsigg commented Dec 6, 2019

TF r1.13 does not build with CUDA 10.1. Please use r1.15 or r2.x.
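
The version statements scattered through this thread can be gathered into a small lookup. This is a sketch that records only what commenters here said, not an official support matrix; the dictionary and function names are hypothetical, and the TF 2.0 entry reflects a "probably" from the thread.

```python
# CUDA version of the prebuilt binaries, as stated by commenters in this thread.
PREBUILT_CUDA = {
    "1.12": "9.0",   # "Tensorflow 1.12 was built with CUDA9.0"
    "2.0": "10.0",   # "TF 2.0 release will probably stay on CUDA 10.0"
    "2.1": "10.1",   # "r2.1 release [is] built against CUDA 10.1"
}

# Whether a branch is reported to build from source against CUDA 10.1.
SOURCE_BUILD_CUDA_10_1 = {
    "r1.13": False,  # "TF r1.13 does not build with CUDA 10.1"
    "r1.15": True,   # "Please use r1.15 or r2.x"
}


def cuda_for_prebuilt(tf_version):
    """Return the CUDA version reported for a release, or None if not mentioned."""
    return PREBUILT_CUDA.get(tf_version)


def builds_with_cuda_10_1(branch):
    """Return True/False as reported in the thread, or None if not mentioned."""
    return SOURCE_BUILD_CUDA_10_1.get(branch)


print(cuda_for_prebuilt("2.1"))
print(builds_with_cuda_10_1("r1.13"))
```

Versions the thread does not mention return None, which keeps the table from implying anything beyond what was actually reported.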

@deimsdeutsch

9 months and Tensorflow is still catching up. Oh boy ! Is this library too complicated to work with ?

@dishkakrauch

Version 2.1 was uploaded yesterday.
It supports CUDA 10.1, enjoy :)

@SalahAdDin

@dishkakrauch What about CUDA 10.2?

@mihaimaruseac
Collaborator

You can always compile from source to get compatibility with other CUDA versions.

We only provide a subset of the countless combinations that you can build with (Python version, operating system version, CUDA version, compiler version, library/dependency versions, etc.). For all other cases, all we can do is provide instructions on how to build from source and let the community do the builds.

@SalahAdDin

I just downgraded to CUDA 10.1, thanks.

@uranusx86

I have made a script to install TF 1.12 with CUDA 10.1 on Ubuntu.
Check my repo: https://github.com/uranusx86/Tensorflow1.12-CUDA10.1-Build

@youweiliang

youweiliang commented Sep 3, 2020

For folks who want to use TensorFlow 1.xx with CUDA 10.1: I found that conda provides TensorFlow builds compiled against CUDA 10.1. Just install TensorFlow 1.xx with conda inside your conda environment. E.g., I installed it with the following command:
conda install tensorflow=1.14

@woodcockjosh

I "fixed" it by changing this function to always return False:

def _should_check_soname(version, static):
    return False

The original was:

def _should_check_soname(version, static):
    return version and not static
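
To see what the patch changes, both variants can be run as plain Python, since the function bodies quoted above are also valid Python. This is only an illustrative sketch: the two function names below are hypothetical renames so that both versions can coexist, and the argument values are made-up examples.

```python
def should_check_soname_original(version, static):
    # Body as quoted above: truthy only for a versioned, non-static library.
    return version and not static


def should_check_soname_patched(version, static):
    # Patched variant: the SONAME check is never requested.
    return False


# Example inputs (illustrative values only).
CASES = [("10.1", False), ("10.1", True), ("", False)]

for version, static in CASES:
    original = bool(should_check_soname_original(version, static))
    patched = should_check_soname_patched(version, static)
    print("version=%r static=%r original=%r patched=%r"
          % (version, static, original, patched))
```

The only case where the two differ is a versioned, non-static library, which is exactly the case where the original would request the SONAME check. Disabling it hides the mismatch at configure time but does not make the libraries compatible.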
