Status: CUDA driver version is insufficient for CUDA runtime version #21832
Thank you for your post. We noticed you have not filled out the following field in the issue template: "Have I written custom code". Could you update it if it is relevant in your case, or leave it as N/A? Thanks.
Would https://github.com/NIH-HPC/gpu4singularity be viable for Singularity 2.6.0 with the --nv flag, or would I need to make additional modifications to library paths?
This is not a tensorflow issue: according to https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html, your nvidia driver is not new enough for CUDA 9.0.
Sure. But the question is more about how to integrate compatible drivers into a tensorflow container. The adage about containerization is: build once, run anywhere; not: build once, run anywhere with Nvidia drivers v485 and above plus a kernel supporting experimental filesystem overlays. Even experimental / unofficial documentation on this scenario would be extremely helpful for the many HPC environments that are still running epel6. ¯\_(ツ)_/¯
The world is not perfect. I'm afraid "build once, run anywhere with nvidia drivers>=384.81" is the way to go. At least that's what nvidia says: https://github.com/NVIDIA/nvidia-docker/wiki/CUDA#requirements
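The "drivers >= 384.81" requirement linked above can be checked mechanically before launching a container. The following is a minimal sketch: the 384.81 minimum for CUDA 9.0 comes from NVIDIA's table, the `driver_at_least` helper is purely illustrative, and the real `nvidia-smi` query is left as a comment because it needs an actual GPU host.

```shell
# Minimum host driver required by the CUDA runtime inside the container
# (per NVIDIA's compatibility table, CUDA 9.0 needs >= 384.81).
MIN_DRIVER=384.81

# Succeeds (exit 0) if driver version $2 is at least $1.
driver_at_least() {
    min="$1"
    have="$2"
    # sort -V does a version-aware comparison; the smaller version sorts first.
    lowest=$(printf '%s\n%s\n' "$min" "$have" | sort -V | head -n1)
    [ "$lowest" = "$min" ]
}

# On a real host, query the installed driver with:
#   have=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n1)
# Here we use the 352.39 driver from this issue as the example input.
if driver_at_least "$MIN_DRIVER" "352.39"; then
    echo "driver OK for CUDA 9.0"
else
    echo "driver too old for CUDA 9.0: upgrade the host driver"
fi
```

With the 352.39 driver reported in this issue, the check fails, which matches the error in the title.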
I hit exactly this problem, and someone else hit it with the same combination. Downgrading the tensorflow version worked for me too.
We upgraded to a recent driver version (396) and the issue was resolved.
@mforde84 Thanks for the confirmation. That's what I was thinking too, but I had trouble upgrading the driver.
@nicolefinnie, thanks, I downgraded the tensorflow version to 1.7 and this problem got solved.
I tested the recommendations in this thread, but I was not able to install any driver other than 390 on Ubuntu 18.04, and downgrading tensorflow to 1.7 resulted in a new error message.
Which is strange, as I had installed version 7.3.1 on my system, but it seems that anaconda installs its own cudnn in the environment.
@saskra, I used deepin 15.8 with nvidia-driver==390.67, cuda==9.0, cudnn==7.0, and a miniconda-installed tensorflow-gpu==1.7, and the problem got solved.
@saskra, are you running in a container?
No. But I have now found the solution: Anaconda creates an environment with its own incompatible cuDNN version, which has to be overwritten manually. :-)
I have the same problem. :-( |
I have Ubuntu 18.04, which needs Nvidia driver 390. Anaconda brings cuDNN 7.2.1, which seems to be too old for this driver version: https://anaconda.org/anaconda/cudnn
Now I am using the newest cuDNN version (7.3.1) as suggested on the official download site: https://developer.nvidia.com/rdp/cudnn-download
By the way: Anaconda's cuDNN version depends on its TensorFlow version; I have the newest one here as well (1.11).
PS: I suggested updating the version: ContinuumIO/anaconda-issues#10224
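Before overwriting anything, it helps to see exactly which cuDNN conda put into the environment. This is an illustrative sketch: the `extract_cudnn_version` helper and the sample line are hypothetical (the sample mimics `conda list` output format; on a real machine you would pipe `conda list` in), and the `cudnn=7.3.1` pin mirrors the version mentioned above.

```shell
# Pull the cudnn version out of `conda list`-style output so it can be
# compared against what the driver stack actually supports.
extract_cudnn_version() {
    awk '$1 == "cudnn" { print $2 }'
}

# Sample input imitating a line from `conda list`; for real use:
#   conda list | extract_cudnn_version
printf 'cudnn                     7.2.1                cuda9.0_0    anaconda\n' \
    | extract_cudnn_version

# If the printed version is too old for your driver, override it explicitly,
# e.g. (matching the version suggested in the comment above):
#   conda install cudnn=7.3.1
```

Pinning the version in `conda install` prevents conda from silently reintroducing the older cuDNN the next time the environment is solved.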
@mforde84 Would you mind sharing how you upgraded it? |
Check whether your nvidia driver supports your CUDA version here: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
As for me, upgrading my driver worked. I run a Windows 10 PC and use TF 1.13. Here is how I upgraded my driver:
Alternative
@mforde84 Maybe you can find the solution there: https://stackoverflow.com/q/41409842/7121726
Same issue here, and I can't find an appropriate tensorflow version for my current Ubuntu version. Updating anaconda also didn't help. In fact, creating a conda environment with my current specs or simply running my python script also produces various errors.
I had a similar issue using driver 384.130. It turns out that the cudatoolkit version inside the anaconda environment and the cuda version supported by my driver did not match. Two links helped me identify my driver and cuda version and, later, install the correct version of tensorflow_gpu matching the cuda on my machine. To select the appropriate version based on your cuda installation:
The cuda versions may have minor versions (9.0, 9.2), so you should double-check what exactly you are installing with conda. So, I identified my cuda version
And installed the correct anaconda environment:
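Since the exact commands were lost from the comment above, here is an illustrative sketch of that matching step. The `tf_for_cuda` helper is hypothetical, and the CUDA-to-TensorFlow pairs reflect the commonly published build matrix of that era (TF 1.12 against CUDA 9.0, TF 1.13 against CUDA 10.0); double-check them against the tested-configurations page before installing.

```shell
# Map a CUDA toolkit version to a conda tensorflow-gpu pin known to be
# built against it. Pairs are examples from the era of this thread,
# not an exhaustive or authoritative list.
tf_for_cuda() {
    case "$1" in
        9.0)  echo "tensorflow-gpu=1.12" ;;
        10.0) echo "tensorflow-gpu=1.13" ;;
        *)    echo "unknown" ;;
    esac
}

# Identify the CUDA version (on a real machine):
#   nvcc --version        # or: nvidia-smi for the driver side
# Then install the matching pair, pinning *both* packages so conda cannot
# drift to a mismatched combination, e.g. for CUDA 9.0:
#   conda install "$(tf_for_cuda 9.0)" cudatoolkit=9.0
tf_for_cuda 9.0
```

Pinning `cudatoolkit` alongside `tensorflow-gpu` is the key point of the comment above: conda will otherwise happily resolve a 9.2 toolkit that the 384.x driver cannot load.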
|
Thank you very much @agostini01. I actually have all versions aligned correctly. The only thing that actually worked is the second answer here: https://stackoverflow.com/questions/41402409/tensorflow-doesnt-seem-to-see-my-gpu
@KonstantinaLazaridou no problem. I believe your suggested link is for when you are installing cuda system-wide. This line:
@mforde84 I had a similar issue using driver 384.81, but Nvidia recommends driver 384.183 for the Tesla K80. So is upgrading to a recent driver version like 396 a good choice?
2019-12-17 09:55:46.558571: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
### nvidia drivers mismatch
My nvidia driver is 384.90. Before the fix I got the same error as in the title of this thread. Worked solution:
This error also occurs if you create a symbolic link from any CUDA shared object file with a higher version to a shared object with a lower version. For example, for me this error was occurring because I had such a symlink. When I removed just this symlink, the error vanished, but I noticed that there was no significant difference in training time between GPU and CPU, despite the GPU process showing up in nvidia-smi.
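One way to hunt for such mismatched symlinks is to compare each link's soname version with its target's. The `check_cuda_symlinks` helper below is a hypothetical sketch, not a standard tool: real CUDA installs contain legitimate chains like `libcublas.so.9.0 -> libcublas.so.9.0.176`, which the prefix check tolerates, while alternatives-style links with unrelated target names may be flagged spuriously.

```shell
# Print any symlink in a library directory whose name claims a different
# soname version than the file it points to (e.g. libcublas.so.9.0
# pointing at libcublas.so.8.0.61).
check_cuda_symlinks() {
    dir="$1"
    for link in "$dir"/lib*.so.*; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        link_ver=${link##*.so.}      # version claimed by the link name
        target_ver=${target##*.so.}  # version of the actual target
        case "$target_ver" in
            "$link_ver"*) ;;  # consistent chain, e.g. 9.0 -> 9.0.176
            *) echo "MISMATCH: $link -> $target" ;;
        esac
    done
}

# Example usage: check typical CUDA library locations if they exist.
for d in /usr/local/cuda/lib64 /usr/lib/x86_64-linux-gnu; do
    [ -d "$d" ] && check_cuda_symlinks "$d"
done
```

Running it inside and outside the container can show which side the broken link lives on.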
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
Kernel: 2.6.32-573.12.1.el6.x86_64
Host: RHEL 6.7
Container: Ubuntu 16.04.5 LTS
TensorFlow installed from (source or binary):
Singularity
TensorFlow version (use command below):
Tensorflow:1.10.0-devel-gpu-py3
Python version:
Python 3.5.2
GCC/Compiler version (if compiling from source):
GCC 5.4.0
CUDA/cuDNN version:
9
GPU model and memory:
Tesla K80, 11519MiB (full nvidia-smi output in the reproduction steps below)
Exact command to reproduce:
$ # install nvidia driver v352.39
$ sudo singularity build --sandbox /path/to/sandbox docker://tensorflow/tensorflow:1.10.0-devel-gpu-py3
$ singularity shell --nv /path/to/sandbox
Singularity tensorflow:1.10.0-devel-gpu-py3:~> nvidia-smi
Thu Aug 23 00:24:41 2018
+------------------------------------------------------+
| NVIDIA-SMI 352.39 Driver Version: 352.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 Off | 0000:84:00.0 Off | 0 |
| N/A 39C P0 58W / 149W | 22MiB / 11519MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Singularity tensorflow:1.10.0-devel-gpu-py3:~> python3
Python 3.5.2 (default, Nov 23 2017, 16:37:01)
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
Describe the problem
I built a tensorflow container with singularity. I think there might be a mismatch between some of the card drivers and cuda libraries on the host and in the container. I have the container built as a sandbox, so I'm able to make modifications quite easily. I was curious whether there is a way I can install the appropriate cuda driver and runtime in the container and have the container run off those, instead of pulling libraries from the host that are incompatible with the container. Is this the right way to do it? Or should I be updating the cuda drivers / libraries on the host to match the container?
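On the question of which side owns which library: with `--nv`, Singularity bind-mounts the host's driver libraries (`libcuda`, `libnvidia-*`) into the container, so their version always follows the host driver and cannot be replaced from inside the container, while the CUDA runtime and cuDNN ship with the image and are the part you control. A toy sketch to keep the distinction straight (the `host_or_image` helper is purely illustrative, not part of Singularity):

```shell
# Classify a CUDA-related library name by which side of the --nv bind
# provides it. The pattern lists reflect the usual split: driver-side
# libraries are bound from the host; runtime libraries live in the image.
host_or_image() {
    case "$1" in
        libcuda.so*|libnvidia-*)         echo "host (bound by --nv, follows host driver)" ;;
        libcudart*|libcudnn*|libcublas*) echo "image (part of the container)" ;;
        *)                               echo "unknown" ;;
    esac
}

host_or_image libcuda.so.1
host_or_image libcudart.so.9.0

# On a real system, `ldconfig -p | grep -E 'libcuda\.so|libcudart'` inside
# the container shows which copies the loader will actually pick up.
```

So the practical answer to the question above: the driver half cannot be upgraded from inside the container; either upgrade the host driver, or use an image built against a CUDA runtime that the 352.39 driver supports.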