
Status: CUDA driver version is insufficient for CUDA runtime version #21832

Closed
mforde84 opened this issue Aug 23, 2018 · 28 comments
Assignees
Labels
stat:awaiting response Status - Awaiting response from author type:build/install Build and install issues

Comments

@mforde84

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    Kernel: 2.6.32-573.12.1.el6.x86_64
    Host: RHEL 6.7
    Container: Ubuntu 16.04.5 LTS

  • TensorFlow installed from (source or binary):
    Singularity

  • TensorFlow version (use command below):
    Tensorflow:1.10.0-devel-gpu-py3

  • Python version:
    Python 3.5.2

  • GCC/Compiler version (if compiling from source):
    GCC 5.4.0

  • CUDA/cuDNN version:
    CUDA 9.0

  • GPU model and memory:
    Singularity tensorflow:1.10.0-devel-gpu-py3:~> nvidia-smi
    Thu Aug 23 00:24:41 2018
    +------------------------------------------------------+
    | NVIDIA-SMI 352.39     Driver Version: 352.39         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
    | N/A   39C    P0    58W / 149W |     22MiB / 11519MiB |      0%   E. Process |
    +-------------------------------+----------------------+----------------------+

  • Exact command to reproduce:
    $ # install nvidia driver v352.39
    $ sudo singularity build --sandbox /path/to/sandbox docker://tensorflow/tensorflow:1.10.0-devel-gpu-py3
    $ singularity shell --nv /path/to/sandbox
    Singularity tensorflow:1.10.0-devel-gpu-py3:~> nvidia-smi
    Thu Aug 23 00:24:41 2018
    +------------------------------------------------------+
    | NVIDIA-SMI 352.39     Driver Version: 352.39         |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla K80           Off  | 0000:84:00.0     Off |                    0 |
    | N/A   39C    P0    58W / 149W |     22MiB / 11519MiB |      0%   E. Process |
    +-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
Singularity tensorflow:1.10.0-devel-gpu-py3:~> python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from tensorflow.python.client import device_lib
>>> print(device_lib.list_local_devices())
2018-08-23 00:26:35.424225: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2018-08-23 00:26:38.208490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1405] Found device 0 with properties:
name: Tesla K80 major: 3 minor: 7 memoryClockRate(GHz): 0.8235
pciBusID: 0000:84:00.0
totalMemory: 11.25GiB freeMemory: 11.16GiB
2018-08-23 00:26:38.208576: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1484] Adding visible gpu devices: 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/device_lib.py", line 41, in list_local_devices
    for s in pywrap_tensorflow.list_devices(session_config=session_config)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 1679, in list_devices
    return ListDevices(status)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 519, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InternalError: cudaGetDevice() failed. Status: CUDA driver version is insufficient for CUDA runtime version

Describe the problem

I built a TensorFlow container with Singularity. I think there is a mismatch between the card driver and CUDA libraries on the host and those in the container. Since the container is built as a sandbox, I can make modifications quite easily. Is there a way to install an appropriate CUDA driver and runtime inside the container and have the container use those, instead of pulling incompatible libraries from the host? Is that the right approach, or should I instead update the CUDA driver/libraries on the host to match the container?

@tensorflowbutler tensorflowbutler added the stat:awaiting response Status - Awaiting response from author label Aug 24, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following field in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
Bazel version
Mobile device

@mforde84
Author

Have I written custom code
N/A
Bazel version
N/A
Mobile device
N/A

@mforde84
Author

Would https://github.com/NIH-HPC/gpu4singularity be viable for Singularity 2.6.0 with --nv flags or would I need to make additional modification to library paths?

@ppwwyyxx
Contributor

This is not a TensorFlow issue: according to https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html, your NVIDIA driver is not new enough for CUDA 9.0.

@mforde84
Author

mforde84 commented Aug 24, 2018

Sure. But the question is more on how to integrate compatible drivers into a tensorflow container. The adage about containerization is: build once, run anywhere; and not: build once, run anywhere with Nvidia drivers v485 and above plus a kernel supporting experimental filesystem overlays. Even experimental / unofficial documentation on this scenario would be extremely helpful for most HPC environments that are still running epel6. ¯\_(ツ)_/¯

@ppwwyyxx
Contributor

The world is not perfect. I'm afraid "build once, run anywhere with nvidia drivers>=384.81" is the way to go. At least that's what nvidia says: https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements

Running a CUDA container requires a machine with at least one CUDA-capable GPU and a driver compatible with the CUDA toolkit version you are using.
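A quick way to verify whether a host meets that requirement is a version-aware string comparison; this is a sketch using `sort -V`, plugging in the 384.81 minimum quoted above and the 352.39 driver from the `nvidia-smi` output in this issue (the commented-out `nvidia-smi` query shows how you would read the driver version on a real host):

```shell
# Is the installed driver at least the minimum NVIDIA lists for CUDA 9.0?
required="384.81"
driver="352.39"   # on a real host: driver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
if [ "$(printf '%s\n%s\n' "$required" "$driver" | sort -V | head -n1)" = "$required" ]; then
    echo "driver $driver is new enough for CUDA 9.0"
else
    echo "driver $driver is too old for CUDA 9.0 (needs >= $required)"
fi
```

With the values from this issue it prints the "too old" branch, matching the runtime error above.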

@nicolefinnie

@mforde84 @tensorflowbutler

I hit exactly this problem, and someone else with the same combination (TensorFlow 1.11 + CUDA runtime 9.0 + cuDNN 7.3 + NVIDIA driver 390) hit it too, even though driver 390 is new enough for CUDA runtime 9.0. That person opened an issue on the NVIDIA DevTalk forum:

https://devtalk.nvidia.com/default/topic/1042575/cuda-driver-version-is-insufficient-for-cuda-runtime-version/?offset=2#5289688

I downgraded TensorFlow from 1.11 (the latest conda version) to 1.7 and the problem was solved. My question is: does newer TensorFlow, say 1.10+, have a dependency on specific NVIDIA driver / CUDA versions?

@tatianashp tatianashp assigned azaks2 and unassigned tatianashp Oct 13, 2018
@mforde84
Author

We upgraded to a recent version of drivers 396 and the issue resolved.

@nicolefinnie

nicolefinnie commented Oct 13, 2018

@mforde84 Thanks for the confirmation. That's what I was thinking too, but I had trouble upgrading to 396.54 due to a broken dependency. After reading your confirmation, I managed to install 396.54 and now it works with tensorflow 1.11.0. Thanks! I updated the ticket on the NVIDIA DevTalk as well.

@tatianashp tatianashp added the type:build/install Build and install issues label Oct 13, 2018
@azaks2

azaks2 commented Oct 15, 2018

tensorflow 1.11 + CUDA runtime 9.0 + cudnn 7.3 + nvidia driver 390
That combo should have worked. Note that with 396.54 there will be one more driver upgrade needed once TF switches to CUDA 10.

@hello-wangjj

@nicolefinnie, thanks, I downgraded the tensorflow version to 1.7 and this problem got solved.

@saskra

saskra commented Oct 17, 2018

I tested the recommendations in this thread, but I was not able to install any other driver than 390 on Ubuntu 18.04 and downgrading tensorflow to 1.7 resulted in a new error message:

2018-10-17 09:12:21.434933: E tensorflow/stream_executor/cuda/cuda_dnn.cc:343] Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.2.1.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Segmentation fault (core dumped)

Which is strange, as I had installed version 7.3.1 on my system, but it seems that Anaconda installs its own cuDNN in the environment.
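The compatibility rule in that error message can be written out explicitly. This is a sketch of the check the log describes for cuDNN 7.0+ (major versions must match; the loaded minor version must be at least the minor version TF was compiled against), using the version pairs from the log:

```python
def cudnn_compatible(loaded, compiled):
    """Sketch of the rule in the TF error message above: for cuDNN 7.0+, the
    major versions must match and the loaded minor version must be >= the
    minor version TF was compiled against."""
    return loaded[0] == compiled[0] and loaded[1] >= compiled[1]

# The pairing from the log above: runtime 7.1.2 vs compiled-against 7.2.1
print(cudnn_compatible((7, 1, 2), (7, 2, 1)))   # False -> the crash above
# A 7.3.1 runtime satisfies a binary compiled against 7.2.1
print(cudnn_compatible((7, 3, 1), (7, 2, 1)))   # True
```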

@hello-wangjj

I tested the recommendations in this thread, but I was not able to install any other driver than 390 on Ubuntu 18.04 and downgrading tensorflow to 1.7 resulted in a new error message:

2018-10-17 09:12:21.434933: E tensorflow/stream_executor/cuda/cuda_dnn.cc:343] Loaded runtime CuDNN library: 7.1.2 but source was compiled with: 7.2.1.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Segmentation fault (core dumped)

Which is strange, as I had installed version 7.3.1...

@saskra, I was using Deepin 15.8 with nvidia-driver 390.67, CUDA 9.0, cuDNN 7.0, and a miniconda-installed tensorflow-gpu 1.7, and the problem got solved.

@mforde84
Author

Saskra are you running in a container?

@saskra

saskra commented Oct 17, 2018

No. But I now found the solution: Anaconda creates an environment with its own incompatible cudnn version which has to be overwritten manually. :-)

@PhilipMay
Contributor

No. But I now found the solution: Anaconda creates an environment with its own incompatible cudnn version which has to be overwritten manually. :-)

I have the same problem. :-(
Which version of which exact conda module did you have to use to overwrite?

@saskra

saskra commented Oct 19, 2018

I have Ubuntu 18.04, which needs NVIDIA driver 390. Anaconda brings cuDNN 7.2.1, which seems to be too old for this driver version: https://anaconda.org/anaconda/cudnn. Now I am using the newest cuDNN version (7.3.1), as suggested by the official download site: https://developer.nvidia.com/rdp/cudnn-download. By the way, Anaconda's cuDNN version depends on its TensorFlow version; I have the newest one here as well (1.11).

PS: I suggested to update the version: ContinuumIO/anaconda-issues#10224

@Yongyao

Yongyao commented Nov 29, 2018

@mforde84 Would you mind sharing how you upgraded it?

@Huixxi

Huixxi commented Mar 25, 2019

check whether your nvidia-driver support your cuda version from here https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
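That lookup can be sketched as a small table. The 9.0 minimum (384.81) is the value quoted earlier in this thread; the other entries are transcribed from NVIDIA's release-notes table for x86_64 Linux, so double-check the linked page for your exact toolkit:

```python
# Minimum Linux driver per CUDA toolkit (transcribed from NVIDIA's
# release-notes table linked above; verify against the page itself).
MIN_DRIVER = {
    "9.0": (384, 81),
    "9.1": (390, 46),
    "9.2": (396, 26),
    "10.0": (410, 48),
}

def driver_supports(driver_version, cuda_version):
    """True if driver_version (e.g. '352.39') meets the minimum for
    cuda_version (e.g. '9.0')."""
    major, minor = (int(x) for x in driver_version.split(".")[:2])
    return (major, minor) >= MIN_DRIVER[cuda_version]

print(driver_supports("352.39", "9.0"))   # False -> the error in this issue
print(driver_supports("396.54", "9.0"))   # True
```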

@mmattklaus

@mforde84 Would you mind sharing how you upgraded it?

As for me, upgrading my driver worked out. I run a Windows 10 PC and use TF 1.13.
(Note: just an aside, I needed to activate my virtual environment and start Jupyter Notebook in that env before I was able to use TF in the notebook.)

Here is how I upgraded my driver:

  1. Open Device Manager
  2. Expand the display adapters
  3. Locate your NVIDIA Graphics adapter
  4. Right-click and click Update driver

Alternative

  • I found this software ( GeForce Experience ) on the NVIDIA website for my graphics family which can also be downloaded, installed and used to update the driver(s). This should work as well, though I didn't go that way.

@Huixxi

Huixxi commented Jun 23, 2019

@mforde84 Maybe you can get the solution from there. https://stackoverflow.com/q/41409842/7121726

@ghost

ghost commented Aug 13, 2019

Same issue here and I can't find an appropriate tensorflow version. I currently have ubuntu version 16.04.6, driver version 410.78, cuda version 10, conda version 4.7.11 and none of the above-mentioned tensorflow versions works for me. I tried 1.13.1, 1.7 and 1.14.
Anaconda installs cudnn with version 7.6.0. Edit: I forced conda to use the version 10.0 for cudatoolkit and not cuda10.1_0 as it was before (according to @saskra's suggestion), but nothing changed unfortunately.

Updating anaconda also didn't help. In fact, conda update --all and conda update conda outputs many new errors like:
InvalidArchiveError('Error with archive ... You probably need to delete and re-download or re-create this file. Message from libarchive was: ...')

Creating a conda environment with my current specs or simply running my python script also produces various InvalidArchiveError messages like above:

channels:
  - conda-forge
  - defaults
dependencies:
  - keras=2.2.4
  - nltk=3.3.0
  - numpy=1.15.4
  - pandas=0.23.4
  - python=3.6.6
  - scikit-learn=0.20.0
  - scipy=1.1.0
  - tensorflow=1.7
  - tensorflow-gpu=1.7
  - cython=0.29
  - pip:
    - fasttext==0.8.3
    - fuzzywuzzy==0.17.0
    - python-levenshtein==0.12.0
    - subsample==0.0.6
    - talos
    - tabulate==0.8.3

@agostini01

I had a similar issue using driver 384.130. It turned out that the cudatoolkit version inside my anaconda environment and the CUDA version supported by my driver did not match.

These two links helped me identify my driver and CUDA versions and then install the version of tensorflow_gpu that matched the CUDA on my machine.

To select the appropriate version based on your cuda installation:
https://www.tensorflow.org/install/source#tested_build_configurations

Version                Python version  Compiler  Build tools   cuDNN  CUDA
tensorflow_gpu-1.14.0  2.7, 3.3-3.7    GCC 4.8   Bazel 0.24.1  7.4    10.0
tensorflow_gpu-1.13.1  2.7, 3.3-3.7    GCC 4.8   Bazel 0.19.2  7.4    10.0
tensorflow_gpu-1.12.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0  7      9
tensorflow_gpu-1.11.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0  7      9
tensorflow_gpu-1.10.0  2.7, 3.3-3.6    GCC 4.8   Bazel 0.15.0  7      9
tensorflow_gpu-1.9.0   2.7, 3.3-3.6    GCC 4.8   Bazel 0.11.0  7      9

The CUDA versions may have minor versions (9.0, 9.2), so you should double-check what exactly you are installing with conda.
To check what you have inside your conda environment, and how to install a different version:
https://stackoverflow.com/a/55351774/2971299

So, I identified my cuda version

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

And installed the correct anaconda environment:

conda create -n gpu tensorflow-gpu==1.9.0 jupyter

@ghost

ghost commented Sep 19, 2019

Thank you very much @agostini01. I actually have all my versions aligned correctly. The only thing that actually worked is the second answer here: https://stackoverflow.com/questions/41402409/tensorflow-doesnt-seem-to-see-my-gpu
I uninstalled tensorflow and reinstalled tensorflow-gpu. Apparently they don't go well together?
Now Python sees my GPUs, and when I run watch nvidia-smi I can see my job using them.

@agostini01

@KonstantinaLazaridou no problem. I believe your suggested link is for when you are installing CUDA system-wide.

This line: conda create -n gpu tensorflow-gpu==1.9.0 jupyter cudatoolkit==XX should work as long as you match the anaconda tensorflow-gpu version with the correct anaconda cudatoolkit (XX) and the system-wide installed CUDA driver. Unfortunately I don't remember what to use for the XX value anymore.

Apparently they don't go well together?

Indeed! Nice catch. The advantage of using conda is that you can have tensorflow in one environment and tensorflow-gpu in another.

@MagaretJi

@mforde84 I had a similar issue using driver 384.81, but NVIDIA recommends driver 384.183 for the Tesla K80. So is upgrading to a recent driver version such as 396 a good choice?
GPU: Tesla K80
tensorflow-gpu: 1.10.0
cuDNN: 7.0.5
CUDA: 9.0

2019-12-17 09:55:46.558571: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-17 09:55:46.558747: E tensorflow/stream_executor/cuda/cuda_dnn.cc:463] possibly insufficient driver version: 384.81.0
2019-12-17 09:55:46.558864: F tensorflow/core/kernels/conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)

@turinglife

turinglife commented Mar 9, 2020

NVIDIA driver mismatch

my nvidia driver is 384.90.

before: error which is same as the title of the thread.
tensorflow-gpu 1.15.0 with cudatoolkit 10.0.130 + cudnn 7.6.5

after: Worked
tensorflow-gpu 1.12.0 with cudatoolkit 9.0

Solution:
$ conda uninstall cudatoolkit          # removes 10.0.130
$ conda install tensorflow-gpu=1.12 cudatoolkit=9.0


@shivam1702

shivam1702 commented Jun 14, 2021

This error also occurs if you create a symbolic link that points a CUDA shared-object name at a shared object with a different (higher) version.

For example, for me this error was occurring because I had a symbolic link from /usr/local/cuda-10.0/lib64/libcudart.so pointing towards /usr/local/cuda/lib64/libcudart.so.10.1, among other symlinks.

When I removed just this symlink, the error vanished. However, I noticed that there was no significant difference between GPU and CPU training times, even though the GPU process showed up in nvidia-smi while the CPU one obviously didn't. They were exactly the same. Weird issue.
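One way to audit for this kind of mismatch is to resolve every libcudart symlink and compare the resolved soname version against the toolkit directory it lives in. This sketch recreates the mismatched layout described above under a hypothetical /tmp path (the real files live under /usr/local, as in the comment):

```shell
# Recreate the mismatched layout under /tmp (hypothetical demo paths)
demo=/tmp/cudart-demo
mkdir -p "$demo/cuda-10.0/lib64"
touch "$demo/libcudart.so.10.1"
ln -sf "$demo/libcudart.so.10.1" "$demo/cuda-10.0/lib64/libcudart.so"

# Resolve the link: a cuda-10.0 tree pointing at a .so.10.1 is the red flag
readlink -f "$demo/cuda-10.0/lib64/libcudart.so"
# On a real system, audit everything at once with:
#   ls -l /usr/local/cuda*/lib64/libcudart.so*
```

The `readlink -f` output exposing a 10.1 library inside a cuda-10.0 tree is exactly the situation described above.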
