ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory #26182

Closed
gian1312 opened this issue Feb 27, 2019 · 88 comments
Labels: stat:awaiting response, subtype: ubuntu/linux, type:build/install

Comments

@gian1312


System information

  • OS: Linux Mint
  • TensorFlow installed from: Anaconda, pip install tensorflow-gpu
  • CUDA/cuDNN version: 9.0/7.5
  • GPU: 1080 Ti

I was using TensorFlow GPU last year and wanted to set it up again. I got it running on my Windows 10 partition. Now I have tried to set it up on my Mint partition, but I always get the following error:
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory.
I thought TF needs cuda 9.0 and not 10.0?

The error occurs if I execute the following code.

import tensorflow as tf
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

@ppwwyyxx
Contributor

  • The latest TensorFlow supports CUDA 8-10 and cuDNN 6-7.
  • Each TensorFlow binary has to work with the version of CUDA and cuDNN it was built with. If they don't match, you have to change either the TensorFlow binary or the NVIDIA software.
  • Official tensorflow-gpu binaries (the ones downloaded by pip or conda) are built with CUDA 9.0 and cuDNN 7 since TF 1.5, and with CUDA 10.0 and cuDNN 7 since TF 1.13. This is written in the release notes. You have to use the matching version of CUDA if you use the official binaries.
  • If you don't want to change your NVIDIA software, you can:
    (1) Use a different version of TensorFlow
    (2) Use non-official binaries built by others. e.g.: https://github.com/mind/wheels/releases, https://github.com/hadim/docker-tensorflow-builder#builds,
    https://github.com/inoryy/tensorflow-optimized-wheels
    (3) Build the binaries by yourself from source with your version of Nvidia software.
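
The version pairing described in this comment can be written down as a small lookup. This is only a sketch limited to the two ranges stated above (official binaries up to TF 1.13); the names `parse_version` and `expected_cuda` are made up for illustration, and for any other release the release notes remain the reference.

```python
def parse_version(text):
    """Turn '1.13.1' into (1, 13, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split(".")[:3] if part.isdigit())


def expected_cuda(tf_version):
    """Return the CUDA version the official tensorflow-gpu binary was built with.

    Only covers what the comment above states: CUDA 9.0 from TF 1.5 and
    CUDA 10.0 from TF 1.13. Anything else returns None (check the release notes).
    """
    version = parse_version(tf_version)
    if (1, 5) <= version < (1, 13):
        return "9.0"
    if (1, 13) <= version < (1, 14):
        return "10.0"
    return None


if __name__ == "__main__":
    for tf_version in ("1.12.0", "1.13.1"):
        print(tf_version, "->", expected_cuda(tf_version))
```

Returning None for anything outside the stated ranges is deliberate: guessing a CUDA version for an unknown release would hide exactly the kind of mismatch this issue is about.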

@jvishnuvardhan jvishnuvardhan self-assigned this Feb 27, 2019
@jvishnuvardhan jvishnuvardhan added type:build/install Build and install issues subtype: ubuntu/linux Ubuntu/Linux Build/Installation Issues labels Feb 27, 2019
@jvishnuvardhan
Contributor

@gian1312 I think it is looking for a CUDA 10 file. The error is due to a mismatch in CUDA version. The best approach is to install TF from a clean state. Please follow @ppwwyyxx's suggestion to select the best versions (TF 1.12 with CUDA 9.0, or TF 1.13 with CUDA 10.0) for your needs. Please uninstall Python and TensorFlow and then follow the instructions to install TF fresh. Please let me know how it progresses. Thanks!
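
One way to see which CUDA library version the system can actually provide, without importing TensorFlow, is to ask the dynamic loader directly. The sketch below uses only the standard `ctypes` module; the helper name `can_load` and the list of library names tried are illustrative assumptions, not something TensorFlow itself provides.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the file is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    for name in ("libcublas.so.9.0", "libcublas.so.10.0", "libcublas.so.10"):
        status = "loadable" if can_load(name) else "NOT found by the loader"
        print(name, status)
```

If `libcublas.so.9.0` loads but `libcublas.so.10.0` does not, the installed TensorFlow binary was most likely built against a newer CUDA than the one on the machine.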

@jvishnuvardhan jvishnuvardhan added the stat:awaiting response Status - Awaiting response from author label Feb 27, 2019
@rhinsall

identical problem here.

clean installation of Nvidia drivers, CUDA 10.1 and TF

libcublas.so.10.0 error as soon as TF is called.

Ubuntu 18.04.2 LTS; Also Anaconda install of Python 3.7 (is the anaconda install relevant?); 2070

@jvishnuvardhan
Contributor

@rhinsall Which TF version are you trying to install? Could you install CUDA 10.0 or correctly reference the CUDA 10.1 path in cuDNN? Thanks

@OmnipotentEntity

OmnipotentEntity commented Feb 28, 2019

It does not seem possible to install Tensorflow with default packaging on Ubuntu 18.04. You have to either build TF from scratch, which requires sourcing an older version of bazel than is available through the default repositories, or manually install specific versions of nvidia drivers and libraries.

None of the linked wheels from upthread are yet built against CUDA 10.1.

@gian1312
Author

Thanks a lot. I relied on the website and didn't realise that a new version came out a few days ago. I am sorry. I downgraded to 1.12. Now my graphics card is found with the code mentioned above.

Sadly, the code (an example from a lecture I attend) which runs perfectly fine on my Windows installation (30 s) takes 6 min on my Linux installation and puts the CPU under load. Is there a workaround to force TensorFlow to use the GPU?

@rhinsall

@rhinsall Which TF version you are trying to install? Could you install CUDA10 or correctly reference the CUDA10.1 path in cuDNN. Thanks

I'll come home much later and report the exact numbers and paths - but it's a fresh install, downloaded yesterday, CUDA 10.1 per Nvidia's instructions and TF clean install using PIP & Python 3.7

@tensorflowbutler tensorflowbutler removed the stat:awaiting response Status - Awaiting response from author label Mar 1, 2019
@ghost

ghost commented Mar 2, 2019

@rhinsall
I just found this out myself, not sure if it's common knowledge, but got around this by doing

conda install cudatoolkit
conda install cudnn

I have cuda-10.1 installed on my box; this installed a local, conda-only cuda-10.0. Obviously this is just to keep TensorFlow working while waiting for better support.

@rhinsall

rhinsall commented Mar 2, 2019

Excellent advice. Immediate rescue. Thank you very much fabricatedmath.

@jvishnuvardhan
Contributor

@gian1312 That is strange. There is a guide on using gpu here. Using those instructions you can force TF to use a GPU. Sometimes it is better to uninstall and reinstall TF. Please let me know how it progresses. If the issue is resolved, please close the ticket. Thanks!

@ivineetm007

Hi,
I am having a similar problem, so I created a new conda environment and installed tensorflow-gpu as follows:
`
conda install tensorflow-gpu
Collecting package metadata: done
Solving environment: done

Package Plan

environment location: /home/lasii/anaconda3/envs/drunk2

added / updated specs:
- tensorflow-gpu

The following packages will be downloaded:

package                    |            build
---------------------------|-----------------
_tflow_select-2.1.0        |              gpu           2 KB  defaults
absl-py-0.4.1              |           py35_0         144 KB  defaults
astor-0.7.1                |           py35_0          43 KB  defaults
cupti-9.2.148              |                0         1.7 MB  defaults
gast-0.2.0                 |           py35_0          15 KB  defaults
grpcio-1.12.1              |   py35hdbcaa40_0         1.7 MB  defaults
libprotobuf-3.6.0          |       hdbcaa40_0         4.1 MB  defaults
markdown-2.6.11            |           py35_0         104 KB  defaults
mkl_fft-1.0.6              |   py35h7dd41cf_0         149 KB  defaults
mkl_random-1.0.1           |   py35h4414c95_1         362 KB  defaults
numpy-1.15.2               |   py35h1d66e8a_0          47 KB  defaults
numpy-base-1.15.2          |   py35h81de0dd_0         4.2 MB  defaults
protobuf-3.6.0             |   py35hf484d3e_0         615 KB  defaults
six-1.11.0                 |           py35_1          21 KB  defaults
tensorboard-1.10.0         |   py35hf484d3e_0         3.3 MB  defaults
tensorflow-1.10.0          |gpu_py35hd9c640d_0           3 KB  defaults
tensorflow-base-1.10.0     |gpu_py35had579c0_0       190.6 MB  defaults
tensorflow-gpu-1.10.0      |       hf154084_0           2 KB  defaults
termcolor-1.1.0            |           py35_1           7 KB  defaults
------------------------------------------------------------
                                       Total:       207.1 MB

The following NEW packages will be INSTALLED:

_tflow_select pkgs/main/linux-64::_tflow_select-2.1.0-gpu
absl-py pkgs/main/linux-64::absl-py-0.4.1-py35_0
astor pkgs/main/linux-64::astor-0.7.1-py35_0
blas pkgs/main/linux-64::blas-1.0-mkl
cudatoolkit pkgs/main/linux-64::cudatoolkit-9.2-0
cudnn pkgs/main/linux-64::cudnn-7.3.1-cuda9.2_0
cupti pkgs/main/linux-64::cupti-9.2.148-0
gast pkgs/main/linux-64::gast-0.2.0-py35_0
grpcio pkgs/main/linux-64::grpcio-1.12.1-py35hdbcaa40_0
intel-openmp pkgs/main/linux-64::intel-openmp-2019.1-144
libgfortran-ng pkgs/main/linux-64::libgfortran-ng-7.3.0-hdf63c60_0
libprotobuf pkgs/main/linux-64::libprotobuf-3.6.0-hdbcaa40_0
markdown pkgs/main/linux-64::markdown-2.6.11-py35_0
mkl pkgs/main/linux-64::mkl-2018.0.3-1
mkl_fft pkgs/main/linux-64::mkl_fft-1.0.6-py35h7dd41cf_0
mkl_random pkgs/main/linux-64::mkl_random-1.0.1-py35h4414c95_1
numpy pkgs/main/linux-64::numpy-1.15.2-py35h1d66e8a_0
numpy-base pkgs/main/linux-64::numpy-base-1.15.2-py35h81de0dd_0
protobuf pkgs/main/linux-64::protobuf-3.6.0-py35hf484d3e_0
six pkgs/main/linux-64::six-1.11.0-py35_1
tensorboard pkgs/main/linux-64::tensorboard-1.10.0-py35hf484d3e_0
tensorflow pkgs/main/linux-64::tensorflow-1.10.0-gpu_py35hd9c640d_0
tensorflow-base pkgs/main/linux-64::tensorflow-base-1.10.0-gpu_py35had579c0_0
tensorflow-gpu pkgs/main/linux-64::tensorflow-gpu-1.10.0-hf154084_0
termcolor pkgs/main/linux-64::termcolor-1.1.0-py35_1
werkzeug pkgs/main/linux-64::werkzeug-0.14.1-py35_0
`
After installation, I just imported tensorflow and got the error.

`Traceback (most recent call last):
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "", line 1, in
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/__init__.py", line 24, in
from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/__init__.py", line 49, in
from tensorflow.python import pywrap_tensorflow
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 74, in
raise ImportError(msg)
ImportError: Traceback (most recent call last):
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow.py", line 58, in
from tensorflow.python.pywrap_tensorflow_internal import *
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 28, in
_pywrap_tensorflow_internal = swig_import_helper()
File "/home/lasii/.local/lib/python3.5/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 243, in load_module
return load_dynamic(name, filename, file)
File "/home/lasii/anaconda3/envs/drunk2/lib/python3.5/imp.py", line 343, in load_dynamic
return _load(spec)
ImportError: libcublas.so.10.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors
`

I just started using github. Guide me if I am posting improperly.

@codexponent

codexponent commented Mar 9, 2019

@ivineetm007, can you check the CUDA version?

@ivineetm007

@codexponent
It's 9.2.
Conda installed it automatically while installing tensorflow-gpu.

@codexponent

I think you should update your CUDA version to 10 as well.
This link will help you:
Link: https://www.nvidia.com/Download/index.aspx?lang=en-us

@ivineetm007

@codexponent
I installed cuda 10.0 in conda by
conda install -c fragcolor cuda10.0

Now there are two CUDA packages in the conda environment's package list:
cudatoolkit 9.2
cuda 10.0

But the same error occurs on importing tensorflow.

@codexponent

@ivineetm007, can you run nvidia-smi and check the header of the table? I am sure that you need to update CUDA by downloading the NVIDIA driver from their website.

@ivineetm007

ivineetm007 commented Mar 9, 2019

@codexponent
header
NVIDIA-SMI 396.54 Driver Version: 396.54

I am working on a PC in college which is allotted to two or three students. I am not sure whether installing CUDA from a download would affect the other conda environments.
A little history...
I am using the code in this link:
(https://github.com/DevendraPratapYadav/gsoc18_RedHenLab/tree/master/video_processing_pipeline)
In that repository, the setup is done with conda. Two weeks ago, TensorFlow was running perfectly with the above code.
But someone updated conda on the PC. Now I am getting the libcublas.so.10.0 error.
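
The driver version in the nvidia-smi header quoted above (396.54) can be compared against the minimum driver each CUDA release needs. The sketch below does that comparison; the minimum versions in `MIN_DRIVER` are recalled from NVIDIA's CUDA release notes for Linux and should be treated as an assumption to verify against the official table, and the function names are made up for illustration.

```python
import re

# Minimum Linux driver per CUDA release, recalled from NVIDIA's CUDA release
# notes (assumption: verify against the official compatibility table).
MIN_DRIVER = {
    "9.0": (384, 81),
    "9.2": (396, 26),
    "10.0": (410, 48),
    "10.1": (418, 39),
}


def driver_version(smi_header):
    """Extract the driver version from an nvidia-smi header line."""
    match = re.search(r"Driver Version:\s*(\d+)\.(\d+)", smi_header)
    if match is None:
        raise ValueError("no driver version found in header")
    return int(match.group(1)), int(match.group(2))


def supported_cuda(smi_header):
    """List the CUDA versions in MIN_DRIVER that this driver is new enough for."""
    driver = driver_version(smi_header)
    return [cuda for cuda, minimum in MIN_DRIVER.items() if driver >= minimum]


if __name__ == "__main__":
    header = "NVIDIA-SMI 396.54 Driver Version: 396.54"
    print(supported_cuda(header))
```

Under these assumed minimums, driver 396.54 is new enough for CUDA 9.0 and 9.2 but not for 10.0, which would explain why adding a conda CUDA 10.0 package alone did not help on that machine.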

@codexponent

@ivineetm007, if this is not your PC, I suggest you don't update it, as that might break other environments that rely on CUDA 9. Instead, create a new environment, install TensorFlow with a specific version number,
pip install tensorflow==1.10.0, and then test some very simple code such as adding two numbers (tf.add). See whether this runs.

@ivineetm007

ivineetm007 commented Mar 9, 2019

@codexponent
I tried your suggestion and it worked fine. Then I tried to install tensorflow-gpu and keras with:
conda install -y -c anaconda tensorflow-gpu==1.7.0
conda install -y keras
Now I am getting this error:
AttributeError: module 'tensorflow.python.training.checkpointable' has no attribute 'CheckpointableBase'
I followed the solution for this error in the link
(https://github.com/tensorflow/tensorflow/issues/20499l)
which suggested reinstalling.
I think some other version of tensorflow-gpu will work.

@codexponent

@ivineetm007, try doing the same thing while opening a TF session on the GPU. This link may help:
Link: https://www.tensorflow.org/guide/using_gpu

Another solution: Don't install anything from conda, just install from pip
Steps:

  1. Create a fresh environment
  2. pip install tensorflow==1.12.0
  3. pip install tensorflow-gpu==1.12.0
  4. pip install keras==2.1.3
    If there is anything you want to install from conda, check whether it is available from pip. If it is not, then:
    Let's say your env name is my_env_1.
    After activating that environment, type which conda.
    If this gives a path inside your created environment (...\my_env_1...), you can install the other essential packages. If it gives (..\...), type pip install conda, then install the other essential packages. (Be sure to check again by typing which conda.)

@lipingbj

Same problem. My CUDA version is 10.1, but the libcublas.so.10.0 file is not in the lib64 directory. I am installing tensorflow-gpu with the command 'pip install tensorflow-gpu'.

@lipingbj

It seems that this libcublas version has been removed in CUDA 10.

@codexponent

@lipingbj, did you update the CUDA version with a conda command or through the official NVIDIA site? I think installing from the actual site might help to get those .so files.
Link: https://www.nvidia.com/Download/index.aspx?lang=en-us

@RenShuhuai-Andy

RenShuhuai-Andy commented Oct 30, 2019

Look for compatible TensorFlow and CUDA versions:
https://www.tensorflow.org/install/source#tested_build_configurations
Look for compatible TensorFlow and Keras versions:
https://docs.floydhub.com/guides/environments/

@jtk1919

jtk1919 commented Dec 26, 2019

Install CUDA 10.0 into /usr/local/cuda-10.0/, where your program will find the libraries.

If you already have CUDA 10.1 installed into /usr/local/cuda-10.1/ along with the NVIDIA drivers, skip installing the drivers and only install the CUDA 10.0 libraries. Don't link /usr/local/cuda to 10.0; leave it linked to 10.1. We just need the libraries in the location where TensorFlow looks for them. CUDA 10.0 successfully shares the drivers that 10.1 installed.

@marcfielding1

marcfielding1 commented Jan 4, 2020

On a fresh Ubuntu 18.04. Note that this purges nvidia* to try and get a clean install.

# Build tools, Python headers and kernel headers
sudo apt-get install build-essential -y
sudo apt-get install cmake git unzip zip -y
sudo apt-get install python-dev python3-dev python-pip python3-pip -y
sudo apt-get install linux-headers-$(uname -r) -y
# Remove existing NVIDIA packages and CUDA directories (destructive).
# The pattern is quoted so the shell does not expand it against local files.
sudo apt-get purge 'nvidia*' -y
sudo apt-get autoremove -y
sudo apt-get autoclean -y
sudo rm -rf /usr/local/cuda*
# Add NVIDIA's CUDA repository for Ubuntu 18.04, then install CUDA 10.0 and the drivers
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-0 cuda-drivers -y

Reboot, no really.

echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig

CUDNN

cuDNN can be downloaded from here. Note: if you're redirected to the NVIDIA home page, you need to either sign in or create an account; then come back here and click that link again, and it'll take you to the download page.

Select the one that says 'Download cuDNN v7.something, for CUDA 10.0'; make sure it's for version 10.0.

tar -xzvf cudnn-10.0-linux-x64-v7.6.*.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Reboot, welcome to the party. This is the most reliable, fiddle-free way I can find; it always seems to work, whereas conda etc. seems to be flaky sometimes.
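
The `${PATH:+:${PATH}}` form used in the two export lines above appends a colon and the old value only when the variable is already set and non-empty, so no empty entry is left dangling in the path. A minimal Python sketch of the same rule follows; the function name `prepend_path` is made up for illustration.

```python
import os


def prepend_path(new_dir, current):
    """Mimic NEW${VAR:+:${VAR}}: add ':' + current only if current is non-empty."""
    if current:
        return new_dir + ":" + current
    return new_dir


if __name__ == "__main__":
    current = os.environ.get("LD_LIBRARY_PATH", "")
    print(prepend_path("/usr/local/cuda-10.0/lib64", current))
```

The point of the conditional is the empty case: a plain `NEW:$VAR` with an unset variable would leave a trailing colon, and an empty entry in a library search path is commonly treated as the current directory.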

@aaron-michaux

Why is this closed? It is definitely an issue, where tensorflow doesn't seem to be observing the naming conventions of unix shared libraries. So, for tensorflow-gpu==2.1.0, it's trying to load libcublas.so.10.0, but it should be trying to load libcublas.so.10 for cuda10.

Either that, or there's more stringent requirements on the individual cuda versions. (Is Cuda really that unstable from version to version?)

Whatever the case, tensorflow 2.1.0 is supposed to be compatible with cuda 10.1 -- the installation instructions show installing cuda 10.1. But it's then trying to load a 10.0 shared library. That is a bug in tensorflow, or in the instructions.
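
The mismatch described here comes down to the loader looking for a file with exactly the requested name, so `libcublas.so.10` on disk does not satisfy a request for `libcublas.so.10.0`. The sketch below illustrates that with plain string handling; the function names and the example file names are hypothetical, chosen only to mirror the names in this thread.

```python
def same_major(requested, available):
    """Return file names sharing the library name and major version with `requested`.

    `requested` looks like 'libcublas.so.10.0'; `available` is a list of file names.
    """
    base, _, version = requested.partition(".so.")
    major = version.split(".")[0]
    prefix = base + ".so." + major
    return [name for name in available
            if name == prefix or name.startswith(prefix + ".")]


def diagnose(requested, available):
    """Explain whether a lookup by exact file name would succeed."""
    if requested in available:
        return "exact match"
    near = same_major(requested, available)
    if near:
        return "no exact match, but same major version present: " + ", ".join(near)
    return "not present"


if __name__ == "__main__":
    files = ["libcublas.so.10", "libcublas.so.10.2.1.243", "libcudart.so.10.1"]
    print(diagnose("libcublas.so.10.0", files))
```

A "same major version present" result is the situation several commenters in this thread ran into: the library exists, but not under the name the TensorFlow binary asks for.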

@zhuangz-ma

pip3 install tensorflow-gpu==2.0.0b1
2.0.0b1 this version works for me.

@annabechang

@mostafaelhoushi
When I run the command find / -name "libcublas.so.10.0",
the output is:

/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/tf_serving/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/tf_serving/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/tensorflow_serving/model_servers/tensorflow_model_server.runfiles/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/bin/_solib_local/_U@local_Uconfig_Ucuda_S_Scuda_Ccublas___Uexternal_Slocal_Uconfig_Ucuda_Scuda_Scuda_Slib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/k8-opt/genfiles/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/33ff618e94595ffbdc09016439dc6a469fa8adc3ec3b5231f776d6065aab7968/diff/root/.cache/bazel/_bazel_root/e53bbb0b0da4e26d24b415310219b953/execroot/tf_serving/bazel-out/host/genfiles/external/local_config_cuda/cuda/cuda/lib/libcublas.so.10.0
/var/lib/docker/overlay2/97cb0c942535cde4622f53bf094251cd1aef1cfc744e8ddda1472ee691f87618/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/var/lib/docker/overlay2/2fb234250d278545f55a004fcd436b4cba5e847c40503b990ffe800f3b440cb5/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/var/lib/docker/overlay2/c704b6be3bc1a5d25119fa46216a4e64f872d8001d8bed6d40930f6420ffb091/diff/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcublas.so.10.0
/usr/local/cuda-10.0/lib64/libcublas.so.10.0

OK. I see libcublas.so.10.0 is found in /usr/local/cuda-10.0/lib64/.
Try running this command:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.0/lib64/

and try again.
NOTE: I see the library is also found in your Docker system. I am not familiar with Docker, so maybe someone else could help here. But try the above command and see.

@mostafaelhoushi has given the best solution. Anyone who is confused, see this answer. :)

this works for me! thanks!
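
Before and after changing LD_LIBRARY_PATH as suggested above, it can help to check which of its directories actually contain the file. The following is a minimal sketch using only the standard library; `find_in_library_path` is a made-up helper, and it only inspects LD_LIBRARY_PATH, not the system's default library directories or linker cache.

```python
import os


def find_in_library_path(library_name, library_path):
    """Return every directory in a colon-separated path that contains library_name."""
    hits = []
    for directory in library_path.split(":"):
        if directory and os.path.exists(os.path.join(directory, library_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = find_in_library_path("libcublas.so.10.0", path)
    print(hits if hits else "libcublas.so.10.0 is not on LD_LIBRARY_PATH")
```

An empty result after the export usually means the export was done in a different shell session than the one running Python.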

@TekayaNidham

In case anyone is still facing this:
I have CUDA 10.2 and just ran into this problem; here's how I solved it:

cd ~
gedit .bashrc
# add this at the end:
export PATH=/usr/local/cuda-10.2/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/targets/x86_64-linux${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

@pratikadarsh

In my case, the issue was that the location of libcublas changed with cuda 10.1 and needed me to update my LD_LIBRARY_PATH

Exactly.

  1. find your CUDA install path, in my case it is /usr/local/cuda
  2. export LD_LIBRARY_PATH=/usr/local/cuda/lib64

TF then follows LD_LIBRARY_PATH to locate libcublas.so.10.0.

This worked out for me. I had tensorflow-gpu 1.13.1 installed inside a Python virtual environment, with CUDA 10.0 installed. I tried symlinks, but they didn't work. Setting LD_LIBRARY_PATH did the job.

@DuaneNielsen

DuaneNielsen commented Aug 16, 2020

@TekayaNidham

Thanks, I was looking for a solution that would allow me to run various versions of tensorflow and pytorch on the same machine. This worked out great!

@matteotosi

matteotosi commented Aug 27, 2020

@TekayaNidham

Thanks! It worked for me!
The only problem was that the nvcc command was not found anymore.

As a workaround, I exported both cuda and cuda-10.2 in the LD_LIBRARY_PATH:

export PATH=$PATH:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64:/usr/local/cuda-10.2/extras/CUPTI/lib64

In this way tensorflow is working, alongside nvcc

@maxicus

maxicus commented Sep 1, 2020

I found this thread while trying to resolve a similar problem with a newer version, so I want to add this for those who end up here too.

Right now, with Python 3.8.5, pip installs TensorFlow 2.3.0.
It requires CUDA 10.1 by default.

The easiest way to install it together with the NVIDIA drivers (thanks @marcfielding1) is:

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-1 cuda-drivers -y

Once done, it causes the same kind of error:

 Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory

That happens because libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 (a surprise from NVIDIA: installing 10.1 also installs some 10.2 files), but the library path only contains /usr/local/cuda, which points to /usr/local/cuda-10.1.

Adding it to the library path helps, and everything seems to work:

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
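
To find out which CUDA directory actually holds libcublas on a given machine, the install directories can be scanned directly. This is a rough sketch that assumes the usual /usr/local/cuda* layout with a lib64 subdirectory, as described in this comment; the function name `locate_cublas` is hypothetical.

```python
import glob
import os


def locate_cublas(root="/usr/local"):
    """Map each cuda* directory under `root` to the libcublas files in its lib64."""
    found = {}
    for cuda_dir in sorted(glob.glob(os.path.join(root, "cuda*"))):
        pattern = os.path.join(cuda_dir, "lib64", "libcublas.so*")
        libs = sorted(glob.glob(pattern))
        if libs:
            found[cuda_dir] = [os.path.basename(lib) for lib in libs]
    return found


if __name__ == "__main__":
    for cuda_dir, libs in locate_cublas().items():
        print(cuda_dir, libs)
```

If /usr/local/cuda is a symlink, it shows up as its own entry alongside the versioned directory it points to, which makes it easy to see whether the default path covers the directory that contains libcublas.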

@nityansuman

@maxicus

Adding

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

to your .bashrc or .zshrc would do the trick.
Before that make sure you have both cuda-10.1 and cuda-10.2 available at /usr/local/

@bulatnv

bulatnv commented Sep 2, 2020

@matteotosi

This worked perfectly for me, plus I added export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/targets/x86_64-linux/lib/.

After that, TensorFlow found the physical GPU.
Good luck.

@mkuchnik
Contributor

mkuchnik commented Sep 8, 2020

@maxicus

Specifically, the 10-2 packages come from libcublas10 (which may get installed automatically with the others, but not if apt thinks it's already installed e.g., when I deleted the "surprise" files). A possible corner case for some.

sudo apt-get -o Dpkg::Options::="--force-overwrite" install --reinstall cuda-10-1 cuda-drivers libcublas10 -y

I then symlinked it into cuda.

sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda
sudo ln -s /usr/local/cuda-10.2/lib64/* /usr/local/cuda/lib64/

@rfryeSigma

SUCCESS!

$ export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64

rfrye@SunWukong:~/Code/General-Development/Roger/c2111_whole_anomaly_ML/experiments$ python3.8 img_stack.py CHANNELS Machine_Learning_Recoat_Interaction_2_Stable_Indra_FullScans 2 -f ted,tep -l 130,131 --template 704,608 -vt 0b010 -A VH
2020-10-13 15:58:35.873728: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
Initial config length 2 0: Build Machine_Learning_Recoat_Interaction_2_Stable_Indra_FullScans config {5: [126, 127, 128, 129, 130, 131, 132, 133, 134, 135], 2: [128, 131, 132, 133, 134]}
2020-10-13 15:58:36.762308: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-13 15:58:36.807565: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: Quadro P6000 computeCapability: 6.1
coreClock: 1.645GHz coreCount: 30 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 403.49GiB/s
2020-10-13 15:58:36.807615: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-13 15:58:36.810036: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-13 15:58:36.812030: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-13 15:58:36.812371: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-13 15:58:36.814467: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-13 15:58:36.815718: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-13 15:58:36.820158: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-13 15:58:36.822504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-13 15:58:36.822858: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-10-13 15:58:36.839247: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2400000000 Hz
2020-10-13 15:58:36.842438: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5e531c0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-13 15:58:36.842512: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-10-13 15:58:36.967553: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5670d20 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-10-13 15:58:36.967624: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Quadro P6000, Compute Capability 6.1
2020-10-13 15:58:36.970244: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:3b:00.0 name: Quadro P6000 computeCapability: 6.1
coreClock: 1.645GHz coreCount: 30 deviceMemorySize: 23.88GiB deviceMemoryBandwidth: 403.49GiB/s
2020-10-13 15:58:36.970326: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-13 15:58:36.970366: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.10
2020-10-13 15:58:36.970393: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcufft.so.10
2020-10-13 15:58:36.970419: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcurand.so.10
2020-10-13 15:58:36.970467: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusolver.so.10
2020-10-13 15:58:36.970494: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcusparse.so.10
2020-10-13 15:58:36.970521: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.7
2020-10-13 15:58:36.974574: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1858] Adding visible gpu devices: 0
2020-10-13 15:58:36.974639: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-13 15:58:37.468505: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-10-13 15:58:37.468542: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263] 0
2020-10-13 15:58:37.468548: I
tensorflow/core/common_runtime/gpu/gpu_device.cc:1276] 0: N 2020-10-13 15:58:37.470189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1402] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 22737 MB memory) -> physical GPU (device: 0, name: Quadro P6000, pci bus id: 0000:3b:00.0, compute capability: 6.1)

and it ran about 5x faster on this very simple problem.

@MinaGabriel

Check which CUDA versions are installed: type
cd /usr/local/cuda
and press Tab to list the completions.

If, for example, you get

cuda/ cuda-10.1/ cuda-10.2

you need to add the lib64 directories of both cuda-10.1 and cuda-10.2 to LD_LIBRARY_PATH as follows:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.1/lib64/

and

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-10.2/lib64/

This is what worked for me.
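
The same check can be sketched in Python. This is only an illustration: it assumes the toolkits are installed under /usr/local as in the listing above, and the helper names `cuda_lib_dirs` and `export_line` are made up for this example. It prints an export line and does not change the environment itself.

```python
import os
from pathlib import Path


def cuda_lib_dirs(root="/usr/local"):
    """Return the lib64 directories of versioned CUDA installs (cuda-X.Y) under root."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(
        str(p / "lib64")
        for p in root_path.glob("cuda-*")
        if (p / "lib64").is_dir()
    )


def export_line(dirs, current=""):
    """Build an export statement that appends only the dirs not already listed."""
    parts = [p for p in current.split(":") if p]
    for d in dirs:
        if d not in parts:
            parts.append(d)
    return "export LD_LIBRARY_PATH=" + ":".join(parts)


if __name__ == "__main__":
    # Print the line to paste into the shell or ~/.bashrc.
    print(export_line(cuda_lib_dirs(), os.environ.get("LD_LIBRARY_PATH", "")))
```

Skipping directories that are already on the path keeps LD_LIBRARY_PATH from growing every time the line is regenerated.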

@marcfielding1

I found this thread while trying to resolve a similar problem, but with a newer version, so I want to add this for those who end up here too.
Right now, the current Python version (3.8.5) installs TensorFlow 2.3.0.
It requires CUDA 10.1 by default.
The easiest way to install it together with the NVIDIA drivers (thanks @marcfielding1) is:

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-1 cuda-drivers -y

Once done, it causes the same error:

 Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory

That happens because libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 (a surprise from NVIDIA: installing 10.1 also installs some 10.2 files), but the library search path only contains /usr/local/cuda, which points to /usr/local/cuda-10.1.
Adding the 10.2 directory to the library search path (LD_LIBRARY_PATH) helps, and everything seems to work:

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
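
Before editing the path, it can help to confirm where the file actually lives. A minimal Python sketch, assuming the toolkits sit under /usr/local/cuda*/lib64 as described above (the function name `find_shared_library` is hypothetical):

```python
from pathlib import Path


def find_shared_library(soname, root="/usr/local"):
    """Return every cuda*/lib64 directory under root that contains soname."""
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    hits = []
    for lib_dir in sorted(root_path.glob("cuda*/lib64")):
        if (lib_dir / soname).exists():
            hits.append(str(lib_dir))
    return hits


if __name__ == "__main__":
    # An empty result means the file is not in any of the toolkit directories.
    print(find_shared_library("libcublas.so.10"))
```

On a machine in the state described here, the result would be expected to include the cuda-10.2 directory rather than cuda-10.1.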

Specifically, the 10-2 packages come from libcublas10 (which may get installed automatically with the others, but not if apt thinks it's already installed e.g., when I deleted the "surprise" files). A possible corner case for some.

sudo apt-get -o Dpkg::Options::="--force-overwrite" install --reinstall cuda-10-1 cuda-drivers libcublas10 -y

I then symlinked it into cuda:

sudo ln -s /usr/local/cuda-10.1 /usr/local/cuda
sudo ln -s /usr/local/cuda-10.2/lib64/* /usr/local/cuda/lib64/

Doesn't this section fix that? Just asking as I was about to do a clean install :-)

echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig

@alexkyllo

I just ran into this issue installing tensorflow 2.3.1 with the current instructions at https://www.tensorflow.org/install/gpu which are currently:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-450
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-1 \
    libcudnn7=7.6.5.32-1+cuda10.1  \
    libcudnn7-dev=7.6.5.32-1+cuda10.1


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
    libnvinfer-dev=6.0.1-1+cuda10.1 \
    libnvinfer-plugin6=6.0.1-1+cuda10.1

I fixed it by just adding these steps at the end:

sudo apt install --reinstall libcublas10

Add to ~/.bashrc:

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-10.2/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
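
To verify the fix without starting TensorFlow, one option is to ask the dynamic loader directly. The sketch below uses Python's standard `ctypes` module; the helper name `can_load` is made up, and the library names are the ones TensorFlow reported opening in the log earlier in this thread. Run it from a new shell, since LD_LIBRARY_PATH is read when the process starts.

```python
import ctypes


def can_load(soname):
    """Return (True, None) if the loader can open soname, else (False, reason)."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    for name in ("libcudart.so.10.1", "libcublas.so.10", "libcudnn.so.7"):
        ok, reason = can_load(name)
        print(name, "OK" if ok else "MISSING: " + reason)
```

A library reported as missing here will also fail inside TensorFlow, so this narrows the problem to the loader path rather than the Python package.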

@HardRockDude

Massive kudos to the scholar and gentleman @alexkyllo. His two simple additions to the official guide saved my life.
Running a GeForce 1660 Ti on Ubuntu 20.10, Python 3.7, Tensorflow 2.3.1.

To reiterate once more:

  1. Purge everything NVIDIA and CUDA related. Yes, some of these lines are redundant. Just make sure everything gets deleted and you start off on a clean machine.
sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"
sudo apt-get --purge remove "*nvidia*"
sudo rm -rf /usr/local/cuda*
sudo apt-get purge nvidia*

  2. Follow the official guide precisely and exactly (https://www.tensorflow.org/install/gpu#ubuntu_1804_cuda_101). (Yes, even if you're running Ubuntu 20.10, because there is nothing newer posted in this guide.)

  3. Then, when trying to use TF, you get: "Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory"

  4. Use @alexkyllo's fixes:
    4.1 sudo apt install --reinstall libcublas10
    4.2 add this to ~/.bashrc:
    export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda-10.2/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH

  5. Reboot one last time. Just do it.

  6. ENJOY A WORKING TENSORFLOW.

What a mess the tensorflow installation process is. Hours wasted for nothing. This thread alone goes back 1.5 years.
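
A quick way to confirm the end result of the steps above is to ask TensorFlow which GPUs it sees. This is a sketch that assumes TensorFlow 2.1 or newer, where `tf.config.list_physical_devices` is available; the wrapper name `visible_gpus` is made up for this example.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow cannot be imported."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed, or one of its own imports failed.
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow could not be imported")
    elif not gpus:
        print("TensorFlow imported, but no GPU is visible")
    else:
        print("Visible GPUs:", gpus)
```

An empty list usually comes with "Could not load dynamic library" warnings on stderr, which name the library that is still missing.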

@fucker007

Maybe the Anaconda environment is wrong; try uninstalling cuDNN in Anaconda.

@Akhtar303

Akhtar303 commented Jan 3, 2021

I found this thread while trying to resolve a similar problem, but with a newer version, so I want to add this for those who end up here too.

Right now, the current Python version (3.8.5) installs TensorFlow 2.3.0.
It requires CUDA 10.1 by default.

The easiest way to install it together with the NVIDIA drivers (thanks @marcfielding1) is:

sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-1 cuda-drivers -y

Once done, it causes the same error:

 Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory

That happens because libcublas.so.10 sits in /usr/local/cuda-10.2/lib64 (a surprise from NVIDIA: installing 10.1 also installs some 10.2 files), but the library search path only contains /usr/local/cuda, which points to /usr/local/cuda-10.1.

Adding the 10.2 directory to the library search path (LD_LIBRARY_PATH) helps, and everything seems to work:

export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

This worked for me:
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64

  • Ubuntu 18.04
  • Cuda 10.1
  • cudnn 7.6.5
  • Tensorflow 2.1.0
  • RTX 3090

Thanks

@plopresti
Contributor

In case anyone finds this during a search...

I just had this problem with a TensorFlow build under CUDA 11.7. The error was coming from the call to cudnnCreate() in stream_executor/cuda/cuda_dnn.cc.

The problem was a mismatch between the versions of CUDA (specifically libcublas) and libcudnn8. In my case, I accidentally installed the cuda10.2 variant of the libcudnn8 RPM. Installing the correct cuda11.7 version resolved it.
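
This kind of mismatch can be spotted from the package version string. The sketch below assumes the version embeds a `cudaX.Y` tag, as in the `7.6.5.32-1+cuda10.1` strings quoted earlier in this thread; whether a given RPM uses the same tag format is an assumption, and the function names are made up.

```python
import re


def cuda_variant(package_version):
    """Extract 'X.Y' from a version string containing a cudaX.Y tag, or None."""
    match = re.search(r"cuda(\d+\.\d+)", package_version)
    return match.group(1) if match else None


def matches_toolkit(package_version, toolkit_version):
    """True when the package's CUDA tag equals the installed toolkit's major.minor."""
    return cuda_variant(package_version) == toolkit_version


if __name__ == "__main__":
    version = "7.6.5.32-1+cuda10.1"
    print(cuda_variant(version), matches_toolkit(version, "10.2"))
```

The installed version string itself would come from the package manager (for example `dpkg -l` or `rpm -q`), which this sketch leaves out.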
