Using tensorflow gpu 2.1 with Cuda 10.2 #34759

zaccharieramzi · 2019-12-02T14:26:12Z

OS Platform and Distribution: Linux Ubuntu 16.04
TensorFlow installed from: pip
TensorFlow version: 2.1.0rc0
Python version: 3.6.8
Installed using virtualenv? pip? conda?: pip
CUDA/cuDNN version: 10.2
GPU model and memory: Quadro P5000, 16GB

Describe the problem

I want to use tensorflow-gpu==2.1.0rc0 with CUDA 10.2, but it doesn't seem to work right now.
When I use tensorflow-gpu==2.0.0, it works perfectly fine.

Provide the exact sequence of commands / steps that you executed before running into the problem

mkdir tests2 &&\
cd tests2 &&\
virtualenv -p /usr/bin/python3.6 venv &&\
source venv/bin/activate &&\
pip install tensorflow-gpu==2.1.0rc0 &&\
python -c 'import tensorflow'

Which gives the following warnings:

2019-12-02 15:23:46.869198: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2019-12-02 15:23:46.869227: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2019-12-02 15:23:47.516321: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2019-12-02 15:23:47.516433: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: :/usr/local/cuda/extras/CUPTI/lib64
2019-12-02 15:23:47.516449: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.

Any other info / logs
When I do locate libcudart.so, I get the following:

/usr/local/cuda-10.0/doc/man/man7/libcudart.so.7
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart.so.10.0.130
/usr/local/cuda-10.2/doc/man/man7/libcudart.so.7
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2
/usr/local/cuda-10.2/targets/x86_64-linux/lib/libcudart.so.10.2.89

locate libnvinfer_plugin.so is empty.


ymodak · 2019-12-02T18:50:13Z

As the log shows (Could not load dynamic library 'libcudart.so.10.1'), the TF 2.1 binary looks for the CUDA 10.1 runtime. You have to roll back to CUDA 10.1 in order to use the TF 2.1 binary. To use CUDA 10.2, you have to build TF from source.
Also, tensorflow pip package (TF 2.1) now includes GPU support by default (same as tensorflow-gpu) for both Linux and Windows.
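
A quick way to see which CUDA runtime sonames the dynamic loader can resolve on a given machine is sketched below. This is only an approximation using ctypes, not how TensorFlow performs its own lookup, and `can_load` is a hypothetical helper name. The sonames are the ones that appear in the log and the locate output above.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open the given shared library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    # Sonames taken from the log and the locate output in this thread.
    for soname in ("libcudart.so.10.0", "libcudart.so.10.1", "libcudart.so.10.2"):
        status = "found" if can_load(soname) else "NOT found"
        print(f"{soname}: {status}")
```

If libcudart.so.10.1 is reported as not found, the prebuilt TF 2.1 binary is likely to print the same warning as above. Note that the result depends on LD_LIBRARY_PATH and the loader cache of the environment the script runs in.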

zaccharieramzi · 2019-12-02T19:57:34Z

Thanks for your answer.
I am a bit surprised though because I am able to use tf 2.0.0 just fine (with cuda 10.2). How is it possible?

ymodak · 2019-12-02T20:27:25Z

In TF 2.0 you are able to import the CPU version.
The reason is that when you installed TF 2.0.0 without specifying the accelerator (GPU), it installed both CPU and GPU support.
To check this, you can try printing:

import tensorflow as tf
tf.test.is_gpu_available()
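
The same check can be sketched with tf.config.list_physical_devices('GPU'), which is the replacement named in the deprecation notice that appears in a log later in this thread. The snippet assumes TF 2.1 or newer; `list_gpus` is a hypothetical helper, and it returns None when TensorFlow is not installed at all.

```python
def list_gpus():
    """Return the names of GPUs visible to TensorFlow, or None if TF is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return None
    # An empty list means TensorFlow imported fine but registered no GPU.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    print(list_gpus())
```

An empty list together with a "Could not load dynamic library" warning suggests the install is falling back to the CPU.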

zaccharieramzi · 2019-12-02T20:29:40Z

No, I specifically installed tensorflow-gpu and didn't get the warning at import. Beyond simply using this function to test, I monitored the GPU usage during training (and my training was way faster than when I was masking the GPU).

zaccharieramzi · 2019-12-02T22:04:50Z

When I look at the logs for tf 2.0.0, I see that it's loading libraries from the 10.0 version of CUDA. These are certainly legacy libraries that I still have, and that's why it works even though my main CUDA is 10.2.
I'll roll back to 10.1 to use tf 2.1.0
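
Reading the logs by hand can be replaced with a small parser. The sketch below is stdlib-only and relies on the two dso_loader message formats quoted in this thread; `summarize_dso_log` is a hypothetical helper, and other TensorFlow versions may word these messages differently.

```python
import re

# Patterns match the dso_loader lines quoted in this thread.
OPENED = re.compile(r"Successfully opened dynamic library (\S+)")
FAILED = re.compile(r"Could not load dynamic library '([^']+)'")


def summarize_dso_log(log_text):
    """Split a TensorFlow log into libraries that loaded and those that did not."""
    opened, failed = [], []
    for line in log_text.splitlines():
        match = OPENED.search(line)
        if match:
            opened.append(match.group(1))
            continue
        match = FAILED.search(line)
        if match:
            failed.append(match.group(1))
    return opened, failed


if __name__ == "__main__":
    import sys

    # Usage: python your_script.py 2>&1 | python this_file.py
    opened, failed = summarize_dso_log(sys.stdin.read())
    print("opened:", opened)
    print("failed:", failed)
```

The version suffix of each opened library shows which CUDA install the binary actually picked up.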

gadagashwini-zz · 2019-12-05T10:24:56Z

@zaccharieramzi, were you able to install tensorflow-gpu 2.1 with CUDA 10.1?

zaccharieramzi · 2019-12-05T10:33:54Z

Hi @gadagashwini , I didn't have time to try. I can't really try right now, cos I am too afraid to mess up my conda environment atm(haha). But I can let you know in a few days.

gadagashwini-zz · 2019-12-06T09:07:31Z

Thanks @zaccharieramzi. Please try and let us know how it progresses. Thanks!

EwoutH · 2019-12-09T17:08:30Z

@zaccharieramzi They got Cuda 10.1 working in another thread: #34429 (comment). Could you verify?

pisiiki · 2019-12-12T12:50:02Z

I managed to build master with --config=v1 and cuda 10.2(gcc), numpy 1.17.4, however I get reproducible segfaults on code that runs fine with tf 1.x/cuda 10.1. OS is a fresh new Ubuntu 18.04, cuda/cudnn from debs. Hardware is a xeon 2695v2, 128GB RAM, 1060ti:

2019-12-12 09:53:39.102734: F ./tensorflow/core/util/gpu_launch_config.h:129] Check failed: work_element_count > 0 (0 vs. 0)
[xeon:22138] *** Process received signal ***
[xeon:22138] Signal: Aborted (6)
[xeon:22138] Signal code:  (-6)
[xeon:22138] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x3ef20)[0x7f650fe53f20]
[xeon:22138] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f650fe53e97]
[xeon:22138] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f650fe55801]
[xeon:22138] [ 3] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(+0xaa769c7)[0x7f6464a8b9c7]
[xeon:22138] [ 4] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow7functor9ApplyAdamIN5Eigen9GpuDeviceEfEclERKS3_NS2_9TensorMapINS2_6TensorIfLi1ELi1ElEELi16ENS2_11MakePointerEEESB_SB_NS7_INS2_15TensorFixedSizeIKfNS2_5SizesIJEEELi1ElEELi16ESA_EESH_SH_SH_SH_SH_NS7_INS8_ISD_Li1ELi1ElEELi16ESA_EEb+0x40f)[0x7f646322fb7f]
[xeon:22138] [ 5] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/_pywrap_tensorflow_internal.so(_ZN10tensorflow11ApplyAdamOpIN5Eigen9GpuDeviceEfE7ComputeEPNS_15OpKernelContextE+0x52c)[0x7f646315015c]
[xeon:22138] [ 6] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2(_ZN10tensorflow13BaseGPUDevice7ComputeEPNS_8OpKernelEPNS_15OpKernelContextE+0xe6)[0x7f6475574b76]
[xeon:22138] [ 7] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2(+0xf75665)[0x7f64755df665]
[xeon:22138] [ 8] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2(+0xf75d2f)[0x7f64755dfd2f]
[xeon:22138] [ 9] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2(_ZN5Eigen15ThreadPoolTemplIN10tensorflow6thread16EigenEnvironmentEE10WorkerLoopEi+0x4b1)[0x7f64756cdbc1]
[xeon:22138] [10] /home/i/.local/lib/python3.6/site-packages/tensorflow_core/python/../libtensorflow_framework.so.2(_ZNSt17_Function_handlerIFvvEZN10tensorflow6thread16EigenEnvironment12CreateThreadESt8functionIS0_EEUlvE_E9_M_invokeERKSt9_Any_data+0x43)[0x7f64756caed3]
[xeon:22138] [11] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xbd66f)[0x7f648ff7266f]
[xeon:22138] [12] /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db)[0x7f650fbfd6db]
[xeon:22138] [13] /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7f650ff3688f]
[xeon:22138] *** End of error message ***

The weirdest thing I'm doing in the code is having tensors with a zero-sized dimension (i.e. shape (40,4,0,50)). This happens because I'm running a network architecture search process.

I'm rolling back to 10.1 meanwhile, Regards.

EwoutH · 2019-12-12T12:54:13Z

@pisiiki Did your build include this patch? #34885

pisiiki · 2019-12-12T13:29:12Z

@EwoutH I think so, I did a pull on master yesterday and this was merged 6 days ago.

edit:

I have just crashed with a prebuilt conda tf-gpu 1.15 package. It seems that my code may crash with a segfault or get some numerical overflows depending on driver, python version, tf version, packages, etc. Conda has its own cuda 10.0 runtime for this package. I'm still on 440.33.01(cuda 10.2) driver btw.

I didn't notice before because my previous setup (py3.6 tf 1.15 cuda 10.1) simply threw exceptions. It turned some vars into infs on a train call and I raised the exceptions myself. I supposed all of these were regular instability/overflows, but now I think some may be related to 0-sized dimensions in some tensors.

I will rollback the driver to 10.1, however I guess this is worth checking out. I will also try to reproduce the error with minimal code.

gadagashwini-zz · 2019-12-30T08:38:18Z

@zaccharieramzi, any update on this issue?

zaccharieramzi · 2019-12-31T11:37:07Z

Sorry, I didn't get a chance to check on this with the holiday season and all.
Would you rather have me try to build with cuda 10.2 or try and install cuda 10.1?

lijiaying · 2020-01-02T01:52:32Z

I have a similar issue here. What helped me was building TensorFlow from source. It seems the prebuilt TensorFlow is not quite compatible with CUDA 10.2.

zaccharieramzi · 2020-01-02T11:03:26Z

I rolled back to cuda 10.1, and everything seems to work fine. I am going to close this since tensorflow 2.1 is not supposed to be directly usable with cuda 10.2, and from @lijiaying 's comment, I understand there is no issue with building from source.

alokssingh · 2020-01-16T09:33:06Z

Did you find a solution to your problem?

zaccharieramzi · 2020-01-16T10:16:26Z

@aloksingh3110 like I said in the comment just above, I rolled back to cuda 10.1. This person has run into problems when building from source with cuda 10.2, so I advise you to roll back to 10.1.

SalahAdDin · 2020-01-21T11:05:49Z

I have the same problem, but using CUDA 10.1:

Python 3.5.2 (default, Oct  8 2019, 13:06:37) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2020-01-21 16:33:12.576855: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /usr/lib/x86_64-linux-gnu/libnvinfer.so.6)
2020-01-21 16:33:12.577361: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.27' not found (required by /usr/lib/x86_64-linux-gnu/libnvinfer.so.6)
2020-01-21 16:33:12.577381: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2020-01-21 16:33:15.483537: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3491610000 Hz
2020-01-21 16:33:15.484394: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x4de73f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-01-21 16:33:15.484448: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-01-21 16:33:15.487966: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-01-21 16:33:15.518212: I tensorflow/compiler/xla/service/platform_util.cc:205] StreamExecutor cuda device (0) is of insufficient compute capability: 3.5 required, device is 3.0
2020-01-21 16:33:15.518325: I tensorflow/compiler/jit/xla_gpu_device.cc:136] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Internal: no supported devices found for platform CUDA
2020-01-21 16:33:15.518765: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1555] Found device 0 with properties: 
pciBusID: 0000:05:00.0 name: Quadro K2000 computeCapability: 3.0
coreClock: 0.954GHz coreCount: 2 deviceMemorySize: 1.94GiB deviceMemoryBandwidth: 59.60GiB/s
2020-01-21 16:33:15.518997: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2020-01-21 16:33:15.520344: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-01-21 16:33:15.521455: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-01-21 16:33:15.521690: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-01-21 16:33:15.523131: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-01-21 16:33:15.523940: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-01-21 16:33:15.527679: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-01-21 16:33:15.528518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1651] Ignoring visible gpu device (device: 0, name: Quadro K2000, pci bus id: 0000:05:00.0, compute capability: 3.0) with Cuda compute capability 3.0. The minimum required Cuda capability is 3.5.
2020-01-21 16:33:15.528552: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1096] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-01-21 16:33:15.528566: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102]      0 
2020-01-21 16:33:15.528579: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] 0:   N
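
Note that the log above loads libcudart.so.10.1 successfully but then ignores the GPU because its compute capability (3.0) is below the stated minimum of 3.5. A minimal sketch for spotting this in a log follows; the regular expression assumes the "name: ... computeCapability: X.Y" format shown above, and `unsupported_devices` is a hypothetical helper.

```python
import re

# Minimum taken from the log message above; other builds may differ.
MIN_COMPUTE_CAPABILITY = (3, 5)

DEVICE = re.compile(r"name: (.+?) computeCapability: (\d+)\.(\d+)")


def unsupported_devices(log_text, minimum=MIN_COMPUTE_CAPABILITY):
    """Return (name, capability) pairs for devices below the minimum capability."""
    result = []
    for match in DEVICE.finditer(log_text):
        capability = (int(match.group(2)), int(match.group(3)))
        if capability < minimum:
            result.append((match.group(1), capability))
    return result


if __name__ == "__main__":
    sample = "pciBusID: 0000:05:00.0 name: Quadro K2000 computeCapability: 3.0"
    print(unsupported_devices(sample))
```

A device listed here would be skipped by this binary regardless of which CUDA libraries load.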

hd1090 · 2020-01-22T21:08:56Z

I'm relatively new to this, so I might be wrong, but @SalahAdDin it seems like you're missing the libnvinfer library found in TensorRT (see here). CUDA-10.1 seems to have loaded fine.

maximuslee1226 · 2020-01-28T18:49:17Z

@hd1090. I tried to install tensorRT in cuda 10.1. My gpu is working, but tensorRT installation is erroring out on me with the following message. Do you know what could be wrong here?
(base) prompt$:~$ sudo apt install tensorrt
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
tensorrt : Depends: libnvinfer6 (= 6.0.1-1+cuda10.1) but 6.0.1-1+cuda10.2 is to be installed
Depends: libnvinfer-plugin6 (= 6.0.1-1+cuda10.1) but 6.0.1-1+cuda10.2 is to be installed
Depends: libnvparsers6 (= 6.0.1-1+cuda10.1) but 6.0.1-1+cuda10.2 is to be installed
Depends: libnvonnxparsers6 (= 6.0.1-1+cuda10.1) but 6.0.1-1+cuda10.2 is to be installed
Depends: libnvinfer-bin (= 6.0.1-1+cuda10.1) but it is not going to be installed
Depends: libnvinfer-dev (= 6.0.1-1+cuda10.1) but 7.0.0-1+cuda10.2 is to be installed
Depends: libnvinfer-plugin-dev (= 6.0.1-1+cuda10.1) but 7.0.0-1+cuda10.2 is to be installed
Depends: libnvparsers-dev (= 6.0.1-1+cuda10.1) but 7.0.0-1+cuda10.2 is to be installed
Depends: libnvonnxparsers-dev (= 6.0.1-1+cuda10.1) but 7.0.0-1+cuda10.2 is to be installed
Depends: libnvinfer-samples (= 6.0.1-1+cuda10.1) but it is not going to be installed
Depends: libnvinfer-doc (= 6.0.1-1+cuda10.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

SalahAdDin · 2020-01-29T14:55:15Z

@maximuslee1226
Try doing this:

!sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.0 \
    libnvinfer-dev=6.0.1-1+cuda10.0 \
    libnvinfer-plugin6=6.0.1-1+cuda10.0

Using CUDA 10.1.

alihamid996 · 2020-03-12T16:44:21Z

@zaccharieramzi did you find any solution other than rolling back to 10.1?

zaccharieramzi · 2020-03-13T09:50:05Z

@alihamid996 I didn't try anything else, so I couldn't tell you myself. @lijiaying 's comment suggests that it's possible to build tf 2.1 from source with cuda 10.2 though.

antonywu · 2020-05-01T21:26:07Z

My head hurts just to see it is such a pain in the rear to get all the moving pieces exactly right to have GPU support. NVidia obviously wants you to install 10.2... and here goes Tensorflow only works with 10.1 out of box. Now I have to recompile the OpenCV from scratch.. Just wonderful.

vitalyisaev2 · 2020-05-14T22:56:27Z

Could you please clarify: are there any plans to support 10.2 in the next release of TensorFlow?

Unfortunately, there are no CUDA 10.1 packages for modern RHEL-based repos like CentOS 8 or Fedora 30+. Only CUDA 10.2 is available for these distributions.

mariusmotea · 2020-05-15T11:18:29Z

The new SD card image for Jetson Nano comes with CUDA 10.2 preinstalled, and older images can no longer be downloaded. I'm not sure whether it is possible to downgrade, because it seems NVIDIA blacklists older packages.

bm777 · 2020-05-18T06:52:19Z

Is there any TensorFlow installation solution for those who have CUDA 10.2, gcc 7.5.0 and Ubuntu 18.04 installed? If so, does it only work with TensorFlow 2.1?

mihaimaruseac · 2020-05-18T21:53:12Z

#38194 (comment) seems to have a solution

palisadoes · 2020-05-20T21:15:13Z

I had a similar problem with 10.2 using tf.config.experimental.list_physical_devices('GPU'). CUDA 10.2 was installed with the command below, after installing the Ubuntu 18.04 cuda and nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb repos per the TensorFlow documentation at https://www.tensorflow.org/install/gpu

apt-get install -y --no-install-recommends \
    cuda-10-2 \
    libcudnn7=7.6.5.32-1+cuda10.2 \
    libcudnn7-dev=7.6.5.32-1+cuda10.2 \
    libnvinfer7=7.0.0-1+cuda10.2 \
    libnvinfer-dev=7.0.0-1+cuda10.2 \
    libnvinfer-plugin7=7.0.0-1+cuda10.2

The error when trying tf.config.experimental.list_physical_devices('GPU') can be seen below:

$ python3
Python 3.8.2 (default, Apr 27 2020, 15:53:34) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf                            
>>> tf.config.experimental.list_physical_devices('GPU')
2020-05-20 14:02:50.885725: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-05-20 14:02:50.903802: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:981] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-05-20 14:02:50.904116: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:01:00.0 name: GeForce GTX 1070 Ti computeCapability: 6.1
coreClock: 1.683GHz coreCount: 19 deviceMemorySize: 7.92GiB deviceMemoryBandwidth: 238.66GiB/s
2020-05-20 14:02:50.904232: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-05-20 14:02:50.905355: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2020-05-20 14:02:50.906434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2020-05-20 14:02:50.906614: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2020-05-20 14:02:50.907827: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2020-05-20 14:02:50.908474: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2020-05-20 14:02:50.910932: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2020-05-20 14:02:50.910947: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1598] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[]
>>>

This error Could not load dynamic library 'libcudart.so.10.1' was the clue.

This resolved it:

apt-get -y install cuda-cudart-10-1

petervandenabeele · 2020-05-25T11:22:57Z

UPDATE: WARNING below #34759 (comment)

FYI, we have a tested work-around (symlink fake libcudart.so.10.2 to real libcudart.so.10.1) for TF 2.2 and cuda 10.2 for Ubuntu 20.04 and Windows in #38194 (comment)
and #38194 (comment)

I am using this for days now, mainly with CPU and I never got trouble with it (so libcudart 10.1 and 10.2 really seem compatible, as was promised higher up in that thread NOT SURE !!).

sanjoy · 2020-05-26T05:44:59Z

so libcudart 10.1 and 10.2 really seem compatible, as was promised higher up in that thread

I don't think libcudart 10.1 and 10.2 are ABI compatible (doc). Symlinking 10.2 to 10.1 may seem to work, but there is no guarantee that e.g. this will not fry your GPU.
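
To find out whether a machine already carries such a symlink, one option is to compare the requested soname with the file it resolves to. The sketch below is stdlib-only; `symlink_version_mismatch` is a hypothetical helper, and the path in the usage example is an assumption that should be adjusted to the local install.

```python
import os


def symlink_version_mismatch(path):
    """Return the resolved file name if it does not match the requested soname.

    Returns None when the path resolves to a file whose name equals the
    requested soname or extends it with a further version component.
    """
    requested = os.path.basename(path)
    actual = os.path.basename(os.path.realpath(path))
    if actual == requested or actual.startswith(requested + "."):
        return None
    return actual


if __name__ == "__main__":
    # Hypothetical location; adjust to where the CUDA libraries live locally.
    candidate = "/usr/local/cuda/lib64/libcudart.so.10.1"
    mismatch = symlink_version_mismatch(candidate)
    if mismatch:
        print(f"{candidate} actually resolves to {mismatch}")
    else:
        print(f"{candidate}: no version mismatch detected")
```

This only detects the renamed-symlink workaround; it says nothing about whether the two runtime versions are actually ABI compatible.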

petervandenabeele · 2020-05-26T13:24:06Z

so libcudart 10.1 and 10.2 really seem compatible, as was promised higher up in that thread

I don't think libcudart 10.1 and 10.2 are ABI compatible (doc). Symlinking 10.2 to 10.1 may seem to work, but there is no guarantee that e.g. this will not fry your GPU.

Wow, that would be bad ...

To be clear, I am doing most of the actual work on the CPU (this smallish GM107GLM [Quadro M1200 Mobile] GPU with 2.5G free RAM, faces OOM very quickly for any real work, and when it does not face OOM, it seems a factor 2 slower than the 8 core CPU).

braindotai · 2020-07-27T06:23:58Z

Can anyone tell about when tensorflow gpu would be able to run with cuda 10.2?

mihaimaruseac · 2020-07-27T15:41:14Z

At least after the TF 2.4 release

sanjoy · 2020-07-28T00:55:05Z

Can anyone tell about when tensorflow gpu would be able to run with cuda 10.2?

Our current plan is to move TF 2.4 to CUDA 11.

PointCloudYC · 2021-01-19T15:20:30Z

Managed to install tensorflow-gpu 2.3 with cudatoolkit 10.1 on my cuda 10.2 driver(Jan 19th,2021)

conda create --name tf2 python=3.8.3
conda install cudnn==7.6.4
pip install tensorflow-gpu==2.3

