Will TensorFlow 2.2.0 support CUDA 10.2? #38194

Closed
Farxial opened this issue Apr 3, 2020 · 21 comments
Labels
stat:awaiting response Status - Awaiting response from author TF 2.1 for tracking issues in 2.1 release type:build/install Build and install issues

Comments

@Farxial

Farxial commented Apr 3, 2020

Hi :) I am going to use neural networking and TensorFlow.
I have tried installing different versions of tensorflow and tensorflow-gpu with pip (for example, 2.1.0 and 2.2.0-rc0 of both packages), and in Python 3.7 I get an error about loading cudart64_101.dll, like this:
>>> import tensorflow as tf
2020-03-31 03:30:42.120394: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found
2020-03-31 03:30:42.134395: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
I copied the cuDNN files and set the CUDA_HOME environment variable to the value of CUDA_PATH. My hardware meets the requirements.
As far as I understand, TensorFlow 2.1.0 should work fine with CUDA 10.1. But I would rather not use CUDA 10.1 unless I have to: I have just installed 10.2 and don't want to downgrade now only to reinstall 10.2 again later.
I am ready to wait for the 2.2.0 release, if that makes sense in my case. So my question is: Will TensorFlow 2.2.0 support CUDA 10.2?
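
The warning quoted above means the dynamic loader could not find a file with that exact name. A minimal sketch for checking the same thing outside TensorFlow, using only the standard `ctypes` module; the helper name `can_load` is made up for illustration, and Python's own library search rules can differ from those of the TensorFlow loader, so treat the result as a hint rather than a verdict:

```python
import ctypes


def can_load(library_name):
    """Return True if ctypes can find and load library_name."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or is not a loadable library.
        return False
    return True


# The official pip wheels discussed here look for the CUDA 10.1 runtime by name.
for name in ("cudart64_101.dll", "cudart64_102.dll"):
    print(name, "loadable" if can_load(name) else "not found")
```

On Linux the corresponding names would be `libcudart.so.10.1` and `libcudart.so.10.2`.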

@Farxial Farxial added the type:others issues not falling in bug, perfromance, support, build and install or feature label Apr 3, 2020
@gadagashwini-zz gadagashwini-zz added type:build/install Build and install issues and removed type:others issues not falling in bug, perfromance, support, build and install or feature labels Apr 6, 2020
@gadagashwini-zz
Contributor

@Farxial, to use CUDA 10.2 with TensorFlow 2.2, please build TensorFlow from source.
Follow the instructions mentioned here. Thanks

@gadagashwini-zz gadagashwini-zz added the stat:awaiting response Status - Awaiting response from author label Apr 6, 2020
@mihaimaruseac
Collaborator

CUDA 10.2 should be compatible with CUDA 10.1. We are building the official pips with CUDA 10.1 as we already changed infrastructure a lot to enable Python3.8 pips. Next release will have infrastructure changed for newer CUDA versions.

Until then, you can try compiling from source, or symlinking the libraries.

@Farxial
Author

Farxial commented Apr 8, 2020

Symlinking works.
Nice :)
Thanks for answers :)

@gadagashwini-zz
Contributor

@Farxial, Closing since the issue is resolved. Thanks!


@petervandenabeele

petervandenabeele commented May 17, 2020

UPDATE: WARNING in #34759 (comment)

The symlink works for me too, details below (installed on Ubuntu 20.04):

  • actual 10.2 libcudart code is in /usr/local/cuda-10.2/
  • the tensorflow 2.2 code looks in a number of places (and fails to find it in all of them)
strace -o test1.log /usr/bin/python .../quick_tour.py
...
openat(AT_FDCWD, "/home/peter_v/.local/lib/python3.8/site-packages/tensorflow/python/../libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/peter_v/.local/lib/python3.8/site-packages/tensorflow/python/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/home/peter_v/.local/lib/python3.8/site-packages/tensorflow/python/../libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 20
fstat(20, {st_mode=S_IFREG|0644, st_size=83403, ...}) = 0
mmap(NULL, 83403, PROT_READ, MAP_PRIVATE, 20, 0) = 0x7fb8ad602000
close(20)                               = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/tls/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/tls/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
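
A trace like the one above gets long quickly. As a rough sketch, the failed lookups for one library can be pulled out of a saved strace log with a few lines of Python. The function name `missing_paths` and the regular expression are illustrative assumptions tuned to the `openat(...) = -1 ENOENT` lines shown here, not a general strace parser:

```python
import re

# Matches failed openat() calls such as:
# openat(AT_FDCWD, "/lib/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (...)
FAILED_OPEN = re.compile(r'^openat\(AT_FDCWD, "([^"]+)", [^)]*\)\s+= -1 ENOENT')


def missing_paths(strace_text, needle):
    """Return the paths containing needle that the process failed to open."""
    paths = []
    for line in strace_text.splitlines():
        match = FAILED_OPEN.match(line.strip())
        if match and needle in match.group(1):
            paths.append(match.group(1))
    return paths


sample = '''\
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 20
openat(AT_FDCWD, "/lib/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/libcudart.so.10.1", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
'''
print(missing_paths(sample, "libcudart.so.10.1"))
```

Any directory in the resulting list is a place where a correctly named file or link would have been found.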

Somewhat at random, I decided to symlink from /usr/lib/x86_64-linux-gnu/ to the libcudart.so.10.2 file.

sudo ln -s /usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2 /usr/lib/x86_64-linux-gnu/libcudart.so.10.1
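
For anyone scripting this, the same aliasing can be sketched in Python with `os.symlink`. The helper name `add_soname_alias` is hypothetical, and the demonstration below runs in a scratch directory with an empty stand-in file, so it does not touch /usr/lib; pointing it at the real paths would need root, exactly like the `ln -s` command:

```python
import os
import tempfile


def add_soname_alias(real_library, alias_path):
    """Create alias_path as a symlink to real_library unless it already exists."""
    if os.path.lexists(alias_path):
        return False  # never overwrite an existing file or link
    os.symlink(real_library, alias_path)
    return True


# Demonstration in a scratch directory, so nothing under /usr is touched.
with tempfile.TemporaryDirectory() as scratch:
    real = os.path.join(scratch, "libcudart.so.10.2")
    alias = os.path.join(scratch, "libcudart.so.10.1")
    open(real, "wb").close()  # empty stand-in for the real library
    created = add_soname_alias(real, alias)
    print(created, os.path.islink(alias))
```

Refusing to overwrite an existing name is a deliberate choice here: if a genuine 10.1 library is already installed, replacing it with a link to 10.2 would hide the real file.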

I am actually using mostly the CPU (my 8 core CPU seems faster than a smallish laptop GPU and also the GPU runs easily into OOM for real work-loads).

@bbqf

bbqf commented May 25, 2020

Just to confirm, the symlink idea works on Windows too. I symlinked C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudart64_102.dll as cudart64_101.dll in the same folder.

@palisadoes

In Ubuntu 20.04 you don't have to symlink, nor build from source. You just need to modify the installation steps in the TensorFlow documentation at https://www.tensorflow.org/install/gpu to match the new Cuda 10-2 package names.

Here are the modifications to the https://www.tensorflow.org/install/gpu instructions that worked for me:

# Download the 10-2 packages
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.2.89-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install the ubuntu drivers, if not done so already
sudo ubuntu-drivers autoinstall

# Install the 10-2 versions of packages
sudo apt-get install -y --no-install-recommends \
cuda-10-2 \
libcudnn7=7.6.5.32-1+cuda10.2 \
libcudnn7-dev=7.6.5.32-1+cuda10.2 \
libnvinfer7=7.0.0-1+cuda10.2 \
libnvinfer-dev=7.0.0-1+cuda10.2 \
libnvinfer-plugin7=7.0.0-1+cuda10.2 \
cuda-cudart-10-1

This works for a clean install.

For pre-existing configurations you may need to uninstall previous Cuda 10-1 packages beforehand.

@legel

legel commented May 31, 2020

Just to confirm, the solution by @palisadoes works.
Make sure you have installed everything (the developer versions), and then you can run:
sudo apt-get install cuda-cudart-10-1

@saket424

saket424 commented Jun 7, 2020

I had to install libnvinfer-plugin-dev to fix /usr/include/x86_64-linux-gnu/NvInferPlugin.h file not found

dpkg -l | grep libnvinfer
ii  libnvinfer-dev         7.0.0-1+cuda10.2  amd64  TensorRT development libraries and headers
ii  libnvinfer-plugin-dev  7.0.0-1+cuda10.2  amd64  TensorRT plugin libraries
ii  libnvinfer-plugin7     7.0.0-1+cuda10.2  amd64  TensorRT plugin libraries
ii  libnvinfer7            7.0.0-1+cuda10.2  amd64  TensorRT runtime libraries
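
A quick way to confirm that every TensorRT package was built against the same CUDA version is to parse that listing. The sketch below is an illustration only: `parse_dpkg_list` is a made-up helper, and it assumes the simple whitespace-separated layout of the `dpkg -l` rows shown here:

```python
def parse_dpkg_list(text):
    """Map package name to version for installed ('ii') rows of `dpkg -l` output."""
    versions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "ii":
            versions[fields[1]] = fields[2]
    return versions


sample = """\
ii libnvinfer-dev 7.0.0-1+cuda10.2 amd64 TensorRT development libraries and headers
ii libnvinfer-plugin-dev 7.0.0-1+cuda10.2 amd64 TensorRT plugin libraries
ii libnvinfer-plugin7 7.0.0-1+cuda10.2 amd64 TensorRT plugin libraries
ii libnvinfer7 7.0.0-1+cuda10.2 amd64 TensorRT runtime libraries
"""
installed = parse_dpkg_list(sample)
# Any package whose version does not carry the +cuda10.2 suffix is a mismatch.
mismatched = [name for name, version in installed.items() if not version.endswith("+cuda10.2")]
print(mismatched)
```

An empty list means all listed packages carry the `+cuda10.2` suffix.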

@thomasaarholt

Expanding on the Windows fix for people who aren't familiar (like myself) with symlinks and just want it to work.
As admin, in cmd, paste:

mklink /H "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudart64_101.dll" "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\cudart64_102.dll"

Alternatively and more clearly written, navigate to the directory and do the same thing:

cd "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin"
mklink /H cudart64_101.dll cudart64_102.dll
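
Note that `mklink /H` creates a hard link rather than a symbolic link: both names refer to the same file contents on disk. As a hedged sketch, the same effect can be had from Python with `os.link`. The helper `hard_link_alias` is hypothetical, and the demonstration uses a scratch directory with placeholder bytes rather than the real CUDA folder:

```python
import os
import tempfile


def hard_link_alias(existing_file, alias_name):
    """Create alias_name next to existing_file as a hard link to it."""
    alias_path = os.path.join(os.path.dirname(existing_file), alias_name)
    if not os.path.exists(alias_path):
        os.link(existing_file, alias_path)
    return alias_path


with tempfile.TemporaryDirectory() as scratch:
    dll = os.path.join(scratch, "cudart64_102.dll")
    with open(dll, "wb") as handle:
        handle.write(b"stand-in")  # placeholder bytes, not a real DLL
    alias = hard_link_alias(dll, "cudart64_101.dll")
    # samefile() is True when both names point at the same underlying file.
    print(os.path.samefile(dll, alias))
```

Running this against the actual `bin` folder would need an elevated prompt, as with the `mklink` commands.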

I'm quite surprised that there doesn't exist out-of-the-box support for CUDA 10.2 yet. I mean, CUDA 11 is out.

@jjl-jjl

jjl-jjl commented Jul 21, 2020

After trying almost every solution I could find for Windows, including @thomasaarholt's fix, it turned out that TensorFlow could not find any of the DLLs, even with the system path set to
"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin".
The solution that worked for me with Python 3.8 was:

import os
os.add_dll_directory("C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/bin")

With that, all the TensorFlow DLLs could be loaded and everything works.
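
Since `os.add_dll_directory` only exists on Windows with Python 3.8 or newer, a script shared across machines may want to guard the call. A minimal sketch, with `register_dll_directory` being a hypothetical wrapper name and the path taken from the comment above:

```python
import os


def register_dll_directory(path):
    """Add path to the DLL search path on Windows with Python 3.8+.

    Returns the handle from os.add_dll_directory, or None when the call is
    unavailable (non-Windows, older Python) or the directory is missing.
    """
    if not hasattr(os, "add_dll_directory") or not os.path.isdir(path):
        return None
    return os.add_dll_directory(path)


cuda_bin = "C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v10.2/bin"
handle = register_dll_directory(cuda_bin)
print("registered" if handle is not None else "skipped")
# Import tensorflow only after this point, so the directory is already registered.
```

The ordering matters: the directory has to be registered before the first `import tensorflow`, because that import is what triggers the DLL loads.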

I got the hint to try this here: https://stackoverflow.com/questions/59330863/cant-import-dll-module-in-python

@GoingMyWay

After many months, CUDA 10.2 still cannot work with TF 2.2?

@mihaimaruseac
Collaborator

TF2.2 won't be patched to support newer CUDA versions. We can only bring new versions of CUDA with newer versions of TF (likely 2.4)

@oliharvey

I am desperate for 10.2 support! My company has bought me a graphics card and I can't get it to work with CUDA despite all the above suggestions. I have tried the nightly build of TensorFlow (which is 2.4), but it seems it still looks for 10.1.

Has anybody produced a build that supports 10.2?

@GoingMyWay

Please try conda install -c anaconda tensorflow-gpu=1.15.0. Anaconda built TF under CUDA 10.2

@oliharvey

> Please try conda install -c anaconda tensorflow-gpu=1.15.0. Anaconda built TF under CUDA 10.2

Thanks very much.
I did give this a shot, but no luck so far. For one thing, it seems the newest conda version of TensorFlow is 2.1, and I need at least 2.2 for what I'm doing (Tensorflow.Net). I am also seeing the following error, which I can't make much sense of:

The following specifications were found to be incompatible with your CUDA driver:

  - feature:/win-64::__cuda==10.2=0
  - feature:|@/win-64::__cuda==10.2=0

Your installed CUDA driver is: 10.2

@mihaimaruseac
Collaborator

I think that in this case the best solution is to try building on the target machine with the CUDA 10.2 headers. NVIDIA claims compatibility between 10.1 and 10.2, so it should be possible to compile from source and have something working.

@ZhihuaLiuEd

On my Windows machine with an RTX 2060, the same link trick also works for cuDNN:

cd "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin"
mklink /H cudnn64_7.dll cudnn64_8.dll

@alexshvid

Compiled v2.3.1 for Cuda 10.2 in my fork:
v2.3.1-cuda10.2

@jung-youjin

Regarding the Ubuntu 20.04 steps posted by @palisadoes above (add the CUDA 10.2 repositories, then install the 10-2 packages plus cuda-cudart-10-1): is this valid only on Ubuntu 20.04? I'm curious whether it works on Ubuntu 18.04 as well.
