Tensorflow 2.2 and 2.3 not detecting GPU with CUDA 10.1 #43236

javedsha · 2020-09-15T09:54:59Z

System information

OS Platform and Distribution (e.g., Linux Ubuntu 18.04):
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.3 and 2.2
Python version: 3.6
Installed using virtualenv? pip? conda?: venv and pip
GCC/Compiler version (if compiling from source): 7.5
CUDA/cuDNN version: 10.1
GPU model and memory: K80

Describe the problem
After installing TensorFlow, the GPU is not detected and I get the error: 'Cannot open dynamic library libcublas.so.10'.

Provide the exact sequence of commands / steps that you executed before running into the problem

I followed all the steps on the official TensorFlow pages exactly as written: https://www.tensorflow.org/install/gpu and https://www.tensorflow.org/install/pip.
I also had to install cuda-toolkit separately.
Finally, I added the CUDA 10.1 path to my .bashrc file.

How I fixed the problem:

I started with a clean VM on Azure with nothing installed. I then followed the TensorFlow guides (above) to install the NVIDIA driver, CUDA 10.1, cuDNN, cuda-toolkit and TensorFlow.

After all these steps, /usr/local contained two CUDA folders (I don't know why):
/usr/local/cuda-10.1/lib64/
/usr/local/cuda-10.2/lib64/

The error I was getting was for the dynamic library 'libcublas.so.10'. This file was not present in the 'cuda-10.1' folder; instead, it was in 'cuda-10.2' (note that I installed everything in a venv).

I had to manually copy all the files (including the files inside the 'stubs' folder) from 'cuda-10.2' to 'cuda-10.1', and then it worked.
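Before copying files between trees, it can help to confirm which of the side-by-side CUDA installs actually ships the missing library. A minimal Python sketch, assuming the /usr/local/cuda-*/lib64 layout described above; the helper name find_cuda_trees_with is hypothetical and it only checks that the file exists, not that it is loadable or compatible:

```python
import glob
import os


def find_cuda_trees_with(libname, pattern="/usr/local/cuda-*/lib64"):
    """Return every CUDA lib64 directory matching `pattern` that contains `libname`.

    Hypothetical helper: it only checks that the file (or a symlink to an
    existing file) is present; it does not try to load the library.
    """
    hits = []
    for libdir in sorted(glob.glob(pattern)):
        if os.path.exists(os.path.join(libdir, libname)):
            hits.append(libdir)
    return hits


if __name__ == "__main__":
    # With the layout described above, only the cuda-10.2 tree would be listed.
    print(find_cuda_trees_with("libcublas.so.10"))
```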

This NVIDIA forum thread also mentions the issue, saying that with CUDA 10.1 some of the libraries are installed differently: https://forums.developer.nvidia.com/t/cublas-for-10-1-is-missing/71015/4 (the steps there apply when you install the libraries at the system level, not in a venv).

Expected Behaviour:
Either TensorFlow should automatically locate the missing dynamic libraries, or the installation guide should explain how to fix this.

Note: the errors are similar when you install CUDA 10.2; only the dynamic library versions differ.
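Since the failure is TensorFlow being unable to dlopen a specific library name, one way to see which libraries are missing, independently of TensorFlow, is to try loading each one with ctypes, which goes through the same system dynamic loader. A rough sketch, assuming the library names TensorFlow 2.3 tries to open in the logs posted in this thread (for CUDA 10.2 or 11.0 the version suffixes would differ, as noted above). The helper check_loadable and the list TF23_CUDA101_LIBS are hypothetical, and this is not TensorFlow's own loader:

```python
import ctypes

# Library names taken from the TensorFlow 2.3 / CUDA 10.1 log lines in this thread.
TF23_CUDA101_LIBS = [
    "libcudart.so.10.1",
    "libcublas.so.10",
    "libcufft.so.10",
    "libcurand.so.10",
    "libcusolver.so.10",
    "libcusparse.so.10",
    "libcudnn.so.7",
]


def check_loadable(sonames):
    """Map each library name to True if dlopen() succeeds, else to the error text."""
    results = {}
    for name in sonames:
        try:
            ctypes.CDLL(name)  # uses the system dynamic loader, like dlopen()
            results[name] = True
        except OSError as err:
            results[name] = str(err)
    return results


if __name__ == "__main__":
    for name, status in check_loadable(TF23_CUDA101_LIBS).items():
        print(f"{name:22} {'OK' if status is True else status}")
```

Run it with the same environment (LD_LIBRARY_PATH, venv) you use for TensorFlow; the loader reads LD_LIBRARY_PATH at process start, so changes made later inside the process have no effect.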

ravikyram · 2020-09-15T11:00:37Z

@javedsha

TensorFlow v2.3 is compatible with CUDA 10.1 and cuDNN 7.6. For more information regarding this please take a look at the tested build configurations.

The CUDA version mismatch question is explained in this StackOverflow comment.

Can you paste the output of nvidia-smi?
Thanks!

javedsha · 2020-09-15T20:12:43Z

@ravikyram

Output of NVIDIA-SMI:

Driver Version: 450.1.06
CUDA Version: 11.0

nvcc --version
10.1

I followed all the steps in the TensorFlow GPU guide; the only extra thing I did was install the CUDA toolkit with 'sudo apt-get install cuda-toolkit'.

What am I doing wrong?

ahtik · 2020-09-16T08:03:16Z

@javedsha The following is a procedure I use for Ubuntu 18.04, confirmed to work with the Ubuntu-shipped python 3.6. Hope it helps to pinpoint your issue.

In your case, the trouble possibly started with 'sudo apt-get install cuda-toolkit', as that package is not pinned to 10.1. Having 10.1 installed in parallel with 10.2 and 11.0 is not advisable, nor practically feasible because of the environment variables.

Btw, the CUDA version reported by nvidia-smi is not necessarily the CUDA version that TensorFlow picks up (longer story), but with my installation procedure it should report 10.1.
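To make that distinction concrete: the "CUDA Version" in the nvidia-smi header is the newest CUDA version the installed driver supports, whereas TensorFlow 2.3 loads the CUDA runtime it was built against (libcudart.so.10.1 in the logs in this thread). A small sketch that extracts the two header fields; the regexes assume the usual "Driver Version: … CUDA Version: …" header and the function name parse_nvidia_smi_header is hypothetical:

```python
import re
import subprocess


def parse_nvidia_smi_header(text):
    """Extract (driver version, driver-supported CUDA version) from nvidia-smi output."""
    driver = re.search(r"Driver Version:\s*([\d.]+)", text)
    cuda = re.search(r"CUDA Version:\s*([\d.]+)", text)
    return (driver.group(1) if driver else None,
            cuda.group(1) if cuda else None)


if __name__ == "__main__":
    try:
        out = subprocess.run(["nvidia-smi"], stdout=subprocess.PIPE,
                             universal_newlines=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError) as err:
        print(f"nvidia-smi failed: {err}")
    else:
        driver, cuda = parse_nvidia_smi_header(out)
        # An upper bound set by the driver, not the runtime TensorFlow will load.
        print(f"driver {driver}, supports CUDA up to {cuda}")
```

So an output of "CUDA Version: 11.0" next to nvcc 10.1 is not by itself a conflict.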

# To start fresh, clean up all the nvidia-related packages. Be careful when using the same system as a desktop!
sudo apt-get --purge remove 'cuda*'
sudo apt-get --purge remove 'nvidia*'
sudo apt-get --purge remove 'libnvidia*'

# Check that everything is clean
sudo find /usr/local/cuda/ -name '*blas*'
sudo find /usr/lib/ -name '*blas*'

# CUDA 10.1 instructions for creating a locally available repo and installing from it
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update

# Make sure the driver version suits your GPU. -440 would most likely work too.
sudo apt install nvidia-driver-418
sudo apt install cuda-10.1

# Make sure the libs are now in place
sudo find /usr/local/cuda/ -name '*blas*'
sudo find /usr/lib/ -name '*blas*'

# Run nvidia-smi for sanity check
nvidia-smi

python3 -m venv ~/.venv-tf2.3-sanity
. ~/.venv-tf2.3-sanity/bin/activate
pip install -U pip
pip install tensorflow==2.3
python -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([10000, 10000])))"

javedsha · 2020-09-16T13:57:31Z

@ahtik is the local installer the same as the stable version? I will give this a try today and post here. Thank you.

ahtik · 2020-09-16T15:11:01Z

Yes, it's the latest 10.1 cuda, just makes the deployment a bit easier for my use case.

Zapunidi · 2020-09-17T09:36:50Z

I have the same problem with libcublas.so.10. Same OS, same Python version, TF 2.3, etc. The only differences are that I didn't use a venv and I have a different GPU.
I followed the Ubuntu 18.04 instructions from the official guide: https://www.tensorflow.org/install/gpu
I also found a cuda-10.2 folder next to the cuda-10.1 folder, with the former containing libcublas.so.10 and the latter containing all the other libs.

My solution was to install CUDA 10.2, even though it contradicts the guide. The GPU now works in TensorFlow. I took CUDA 10.2 from the NVIDIA website as a deb package.

Also, the guide itself (https://www.tensorflow.org/install/gpu) does not seem to be well written. It tells you to install CUPTI when there is no way to install it separately. It reads: "Install CUPTI which ships with the CUDA® Toolkit. Append its installation directory to the $LD_LIBRARY_PATH environmental variable:" when IMHO it should read: "Install CUDA Toolkit. You will have CUPTI library installed. Append its installation directory to the $LD_LIBRARY_PATH environmental variable:". I also still don't understand why the CUPTI section comes before the CUDA installation section for Ubuntu. I hope my feedback is useful.

ahtik · 2020-09-17T11:39:42Z

@Zapunidi Indeed, the official guide for Ubuntu doesn't seem to work for me either (all other libs load fine, but I get one warning):

2020-09-17 12:50:29.307000: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-09-17 12:50:29.307313: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
2020-09-17 12:50:29.334711: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-09-17 12:50:29.340930: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-09-17 12:50:29.391160: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-09-17 12:50:29.400149: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-09-17 12:50:29.507706: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7

It's almost as if something in the NVIDIA machine-learning repo still manages to force an upgrade from 10.1.

I do not have this warning when using the local installation method I posted previously. For cudnn and tensorrt/libnvinfer I have a separate tensorrt-cuda10.1 setup.
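In a long log it is easy to miss the single W line among the successes. A small sketch that splits TensorFlow's dso_loader messages into opened and missing libraries; the two regexes only assume the message formats visible in the log above (they may differ in other TensorFlow versions), and summarize_dso_log is a hypothetical name:

```python
import re

# Message formats as they appear in the TensorFlow 2.3 log above.
_OK = re.compile(r"Successfully opened dynamic library (\S+)")
_FAIL = re.compile(r"Could not load dynamic library '([^']+)'")


def summarize_dso_log(log_text):
    """Return (loaded, missing) library names from TensorFlow dso_loader log text."""
    loaded = _OK.findall(log_text)
    missing = _FAIL.findall(log_text)
    return loaded, missing
```

For example, redirect TensorFlow's stderr to a file while reproducing the problem and pass the file's contents to summarize_dso_log; the missing list is what needs to be found on the loader path.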

javedsha · 2020-09-22T07:41:53Z

So the issue affects everyone. It would make sense to update the TensorFlow documentation, as the instructions do not work.
@Zapunidi did you follow the same steps as in https://www.tensorflow.org/install/gpu, except that you installed CUDA 10.2 from the official package? Did you also install the CUDA toolkit? Could you please post all the steps in detail, as it could help everyone.

Zapunidi · 2020-09-23T19:01:49Z

I can't see the difference between CUDA and CUDA Toolkit. Even https://www.tensorflow.org/install/gpu juggles these two terms: "The following NVIDIA® software must be installed on your system: ... CUDA® Toolkit —TensorFlow supports CUDA® 10.1 (TensorFlow >= 2.1.0)..." Then, for the Linux setup, the guide just mentions "CUDA" without "Toolkit".
I didn't write down the exact steps, so my report is not reliable, sorry. As far as I remember, I:

Installed CUDA Toolkit 10.1 via the link in the guide: https://developer.nvidia.com/cuda-toolkit-archive
Installed cuDNN SDK 7.6.5 from the NVIDIA website.
Rebooted.
Executed every command from the Ubuntu 18.04 console commands block at https://www.tensorflow.org/install/gpu. I didn't reboot in the middle of the block as it tells you to, because I already had the required drivers and kernel module. A violation, yes.

So it was not a clean install. I do not have a spare machine with a supported GPU to do a clean test for you. I also don't think GPU virtualization is mature enough to run a virtual machine on my primary PC.

ahtik · 2020-09-23T19:26:30Z

@Zapunidi Yes, CUDA and CUDA Toolkit are 100% the same. Skipping the reboot in the middle does not matter, as long as you still reboot after the last step.

One thing that might work is to run the "local" installation method on top of everything, like this, and see what happens after a reboot (taken from my comment above):

# CUDA 10.1 instructions for creating a locally available repo and installing from it
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update

sudo apt install cuda-10.1

# Check which libs are now where, just for your own sanity; env vars should be set after reboot by themselves.
sudo find /usr/local/cuda/ -name '*blas*'
sudo find /usr/lib/ -name '*blas*'

If this fails and you are still curious, you can try my full instructions in #43236 (comment), which use the "local" repo installation method; this way the CUDA version remains 10.1 and TF works fine. Just make sure not to reboot before the end, and make sure the most recent nvidia-driver-418 suitable for your GPU is used (the same one you currently have). This does involve some risk on a primary PC; I just don't know a better way to quickly clean up everything CUDA-related without removing the NVIDIA driver at the same time.

bnsblue · 2020-10-01T18:57:09Z

The problem is that libcublas seems to be missing when installing cuda-10.1 via apt: you won't find libcublas.so.10 under /usr/local/cuda-10.1/lib64/ (the default installation path).

A workaround seems to be installing cuda-10.1 via the runfile. I ran into another error during the installation, but maybe it works for you.

Check this thread for more details: https://forums.developer.nvidia.com/t/cublas-for-10-1-is-missing/71015/18

ahtik · 2020-10-01T20:38:17Z

@bnsblue If you use the local installer method that I detailed in my comment above, libcublas.so.10 is installed into /usr/lib/x86_64-linux-gnu/libcublas.so.10 and everything works fine without additional tweaks [1]. This works for both Ubuntu 18.04 and 20.04. Indeed, the official TensorFlow GPU installation method does not work for me either. Btw, for Ubuntu 20.04 one should still use the 1804 repo in order to get access to cuda-10.1 (the 2004 apt repo only seems to have cuda-11).

[1] It has evolved a bit for our use and does not include the libcudnn7 and tensorrt bits, but this should still work as well. For the NVIDIA driver I'm using v455. If you're interested, I can provide the full instructions that I'm using.

javedsha · 2020-10-04T10:57:54Z

@ravikyram why is this waiting for an author response? The steps in the documentation don't work.

bnsblue · 2020-10-06T00:56:38Z

@ahtik Thanks for the response! It would be awesome if you could share the full instructions :)

johntmyers · 2020-10-10T21:00:02Z

@ahtik I also tried your "local install" and libcublas.so.10 does not get installed into that location. Any ideas?

johntmyers · 2020-10-11T16:06:02Z

I must have gotten something wrong the first time; the libraries show up now. But nvidia-smi now fails.

sudo find /usr/lib/ -name '*blas*'
/usr/lib/x86_64-linux-gnu/libnvblas.so.10.2.1.243
/usr/lib/x86_64-linux-gnu/libcublas_static.a
/usr/lib/x86_64-linux-gnu/libcublasLt_static.a
/usr/lib/x86_64-linux-gnu/libcublas.so.10
/usr/lib/x86_64-linux-gnu/libnvblas.so.10
/usr/lib/x86_64-linux-gnu/libcublasLt.so.10.2.1.243
/usr/lib/x86_64-linux-gnu/libcublas.so
/usr/lib/x86_64-linux-gnu/libnvblas.so
/usr/lib/x86_64-linux-gnu/stubs/libcublas.so
/usr/lib/x86_64-linux-gnu/stubs/libcublasLt.so
/usr/lib/x86_64-linux-gnu/libcublasLt.so
/usr/lib/x86_64-linux-gnu/libcublasLt.so.10
/usr/lib/x86_64-linux-gnu/libcublas.so.10.2.1.243
/usr/lib/pkgconfig/cublas-10.pc

Then nvidia-smi yields: "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

I will continue to try and get this working.

ahtik · 2020-10-11T17:22:24Z

@johntmyers Did you make sure to restart the machine after all the driver and CUDA installation steps? This error is usually from not restarting.

johntmyers · 2020-10-11T17:32:22Z

@ahtik Yes, same issue. FWIW I'm using 18.04 on Google Compute Engine, so I'm not sure if something there is not working properly.

ravikyram · 2020-10-19T10:00:20Z

@javedsha

Any updates on the issue please. Thanks!

google-ml-butler · 2020-10-26T10:49:19Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

jchwenger · 2020-10-31T18:13:47Z

@ahtik thanks for your setup above, I would also be grateful if you shared your TensorRT instructions!

google-ml-butler · 2020-11-07T18:34:08Z

Closing as stale. Please reopen if you'd like to work on this further.

guruvishnuvardan · 2020-12-25T16:27:00Z

@ahtik

System information

OS Platform and Distribution (e.g., Linux Ubuntu 18.04): Ubuntu 18.04
TensorFlow installed from (source or binary): binary
TensorFlow version: 2.3
Python version: 3.6
Installed using virtualenv? pip? conda?: pip3
GCC/Compiler version (if compiling from source): 7.5
CUDA/cuDNN version: 10.1
GPU model and memory: 1080 TI

I have followed exactly the instructions dated Sep 16. I am encountering the following error:

ioz@ioz-B250M-DS3H:~$ python3.6 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([10000, 10000])))"
2020-12-25 21:48:47.947244: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python import data
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/__init__.py", line 25, in <module>
    from tensorflow.python.data import experimental
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/__init__.py", line 125, in <module>
    from tensorflow.python.data.experimental.ops.parsing_ops import parse_example_dataset
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/ops/parsing_ops.py", line 26, in <module>
    from tensorflow.python.ops import parsing_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_ops.py", line 27, in <module>
    from tensorflow.python.ops import parsing_config
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_config.py", line 31, in <module>
    from tensorflow.python.ops import sparse_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py", line 42, in <module>
    from tensorflow.python.ops import special_math_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/special_math_ops.py", line 30, in <module>
    import opt_einsum
ModuleNotFoundError: No module named 'opt_einsum'
ioz@ioz-B250M-DS3H:~$ python3.6 -c "import tensorflow as tf;

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
bash: unexpected EOF while looking for matching `"'
bash: syntax error: unexpected end of file
ioz@ioz-B250M-DS3H:~$ python3.6
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf
2020-12-25 21:50:17.792384: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python import data
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/__init__.py", line 25, in <module>
    from tensorflow.python.data import experimental
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/__init__.py", line 125, in <module>
    from tensorflow.python.data.experimental.ops.parsing_ops import parse_example_dataset
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/ops/parsing_ops.py", line 26, in <module>
    from tensorflow.python.ops import parsing_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_ops.py", line 27, in <module>
    from tensorflow.python.ops import parsing_config
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_config.py", line 31, in <module>
    from tensorflow.python.ops import sparse_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py", line 42, in <module>
    from tensorflow.python.ops import special_math_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/special_math_ops.py", line 30, in <module>
    import opt_einsum
ModuleNotFoundError: No module named 'opt_einsum'
>>> import tensorflow as tf
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python import data
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/__init__.py", line 25, in <module>
    from tensorflow.python.data import experimental
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/__init__.py", line 125, in <module>
    from tensorflow.python.data.experimental.ops.parsing_ops import parse_example_dataset
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/ops/parsing_ops.py", line 26, in <module>
    from tensorflow.python.ops import parsing_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_ops.py", line 27, in <module>
    from tensorflow.python.ops import parsing_config
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_config.py", line 31, in <module>
    from tensorflow.python.ops import sparse_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py", line 42, in <module>
    from tensorflow.python.ops import special_math_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/special_math_ops.py", line 30, in <module>
    import opt_einsum
ModuleNotFoundError: No module named 'opt_einsum'

ioz@ioz-B250M-DS3H:~$ python3.6
Python 3.6.9 (default, Oct 8 2020, 12:12:24)
[GCC 8.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import tensorflow as tf
2020-12-25 21:50:37.806079: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py", line 41, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python import data
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/__init__.py", line 25, in <module>
    from tensorflow.python.data import experimental
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/__init__.py", line 125, in <module>
    from tensorflow.python.data.experimental.ops.parsing_ops import parse_example_dataset
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/data/experimental/ops/parsing_ops.py", line 26, in <module>
    from tensorflow.python.ops import parsing_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_ops.py", line 27, in <module>
    from tensorflow.python.ops import parsing_config
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/parsing_config.py", line 31, in <module>
    from tensorflow.python.ops import sparse_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/sparse_ops.py", line 42, in <module>
    from tensorflow.python.ops import special_math_ops
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/special_math_ops.py", line 30, in <module>
    import opt_einsum
ModuleNotFoundError: No module named 'opt_einsum'

Could you please share your thoughts?

Thanks
Guru
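The tracebacks above all end in "ModuleNotFoundError: No module named 'opt_einsum'", which is a missing Python dependency rather than a CUDA problem: the import fails before any GPU library matters. opt_einsum is one of the packages pip normally installs alongside tensorflow, so reinstalling it for the same interpreter (e.g. python3.6 -m pip install opt_einsum) is the likely fix. A minimal sketch for checking which Python dependencies that interpreter can see; the module list is an illustrative, hand-picked subset, not TensorFlow's full requirement list, and missing_modules is a hypothetical name:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that the current interpreter cannot import."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:  # e.g. the parent package of a dotted name is absent
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Illustrative subset of TensorFlow 2.3's pip dependencies.
    print(missing_modules(["numpy", "absl", "opt_einsum", "wrapt", "google.protobuf"]))
```

Running it with python3.6 rather than python3 makes sure the check targets the same dist-packages directory that appears in the traceback.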

LucaUrbinati44 · 2024-04-03T15:46:43Z

Thanks to @ahtik for his post. I would like to contribute my own recipe, derived from @ahtik's. Some comments are below.

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/nvidia-driver-440_440.33.01-0ubuntu1_amd64.deb
sudo apt install -y ./nvidia-driver-440_440.33.01-0ubuntu1_amd64.deb 

sudo apt-mark hold libnvidia-cfg1-440 libnvidia-compute-440 libnvidia-decode-440 libnvidia-encode-440 libnvidia-fbc1-440 libnvidia-gl-440 libnvidia-ifr1-440 nvidia-compute-utils-440 nvidia-dkms-440 nvidia-driver-440 nvidia-kernel-common-440 nvidia-kernel-source-440 nvidia-utils-440 xserver-xorg-video-nvidia-440

sudo apt install cuda-drivers=440.33.01-1 cuda-runtime-10-1 cuda-demo-suite-10-1 cuda-10.1

wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7_7.6.0.64-1+cuda10.1_amd64.deb
wget https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/libcudnn7-dev_7.6.0.64-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7_7.6.0.64-1+cuda10.1_amd64.deb libcudnn7-dev_7.6.0.64-1+cuda10.1_amd64.deb
sudo dpkg -l | grep cudnn

cat /usr/include/cudnn.h | grep CUDNN_MAJOR -A 2
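The grep above prints the CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL defines, which for cuDNN 7 live in cudnn.h (later cuDNN releases moved them to cudnn_version.h). A small sketch that turns those lines into a single version string, assuming the "#define NAME value" layout; cudnn_version_from_header is a hypothetical name:

```python
import re


def cudnn_version_from_header(header_text):
    """Return 'MAJOR.MINOR.PATCHLEVEL' from cudnn.h #define lines, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"#define\s+%s\s+(\d+)" % key, header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    try:
        with open("/usr/include/cudnn.h") as fh:
            print(cudnn_version_from_header(fh.read()))
    except OSError as err:
        print(f"could not read cudnn.h: {err}")
```

This makes it easy to confirm that the installed headers are 7.6.0 rather than 7.6.5 after pinning the packages.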

My server GPU model: GeForce GTX 1070
Driver version (from cat /proc/driver/nvidia/version): 440.33.01

I had to force cuda-drivers to version 440.33.01-1 to make nvidia-smi work after the installation.
I had to apt-mark hold some libraries to prevent their automatic upgrade during the installation of CUDA 10.1.
I installed only Python 3.8 and TensorFlow 2.3.0 in my conda environment, to meet TensorFlow's requirements for CUDA 10.1: https://www.tensorflow.org/install/source#gpu.
I found that on my system I had to use cuDNN 7.6.0 instead of 7.6.5 to make the error "E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error" disappear.
I did NOT install cudnn and cudatoolkit in my conda environment.
I exported the following environment variables at login in my .bashrc (see https://stackoverflow.com/a/64472380/11644517 for setting LD_LIBRARY_PATH correctly):

export CUDA_HOME=/usr/local/cuda
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:/usr/local/cuda-10.2/lib64
export XLA_FLAGS=--xla_gpu_cuda_data_dir=$CUDA_HOME
export TF_XLA_FLAGS="--tf_xla_enable_xla_devices"

Check if the GPU is working with:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
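With two CUDA trees on LD_LIBRARY_PATH, the order of the entries decides which copy of a library is found first. A simplified sketch of that lookup: it walks the LD_LIBRARY_PATH entries in order and ignores RPATH/RUNPATH, /etc/ld.so.cache and the default system directories, all of which the real glibc loader also consults. The function name first_match_on_ld_library_path is hypothetical:

```python
import os


def first_match_on_ld_library_path(libname, ld_library_path=None):
    """Return the first LD_LIBRARY_PATH directory that contains `libname`, or None.

    Simplified model of the loader: only LD_LIBRARY_PATH is searched, and
    empty entries are skipped.
    """
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    for entry in filter(None, ld_library_path.split(":")):
        if os.path.exists(os.path.join(entry, libname)):
            return entry
    return None


if __name__ == "__main__":
    print(first_match_on_ld_library_path("libcublas.so.10"))
```

With the .bashrc above, /usr/local/cuda/lib64 is searched before /usr/local/cuda-10.2/lib64. Note that that export line replaces any previous LD_LIBRARY_PATH value rather than appending to it.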

javedsha added the type:build/install Build and install issues label Sep 15, 2020

google-ml-butler bot assigned ravikyram Sep 15, 2020

ravikyram added the TF 2.3 Issues related to TF 2.3 label Sep 15, 2020

ravikyram added the stat:awaiting response Status - Awaiting response from author label Sep 15, 2020

Saduf2019 mentioned this issue Sep 18, 2020: Using Tensorflow-2.3.0 with GPU #43337 (closed)

tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Sep 19, 2020

ravikyram added the stat:awaiting response Status - Awaiting response from author label Sep 25, 2020

tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Oct 6, 2020

ravikyram added the stat:awaiting response Status - Awaiting response from author label Oct 19, 2020

google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Oct 26, 2020

CarolinaFurtado mentioned this issue Nov 2, 2020: upgrade software versions mit-quest/necstlab-damage-segmentation#75 (closed)

google-ml-butler bot closed this as completed Nov 7, 2020

ravikyram added the subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues label Nov 10, 2020

LucaUrbinati44 mentioned this issue Apr 3, 2024: CUDA_ERROR_NOT_INITIALIZED: initialization error zhangruochi/Mol-HGT#1 (open)
