could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #8879
Comments
@gunan may have insight, but in general, Mac support for NVIDIA GPUs is relatively poor, so it is difficult for us to support them. |
Looks like previous instances of the same error message were usually cuDNN version mismatches. If this is caused by a cuDNN version mismatch, you may need to build TF from source. |
My cuDNN version is 5.1 (macOS). I noticed that when there is more GPU memory available, the command-line version of the program works, but the Jupyter notebook crashes with this error. |
5.1 is only the major and minor cuDNN version; there is also a third integer, the patch version. You can check this by looking into your cudnn.h header. I feel like there is more we can get from your logs. |
Here it is: Also terminal output: |
@crack00ns I just encountered this when I tried to free up the GT 750M by unplugging the external monitor, which switches the display to the Iris Pro. Running the ImageNet tutorial, it crashes out like you described:
However, as soon as I enabled discrete graphics again by using the external monitor, it started to work again. So I suggest you use gfxCardStatus to switch on the NVIDIA GPU before you run the code. Given that from 1.1 onwards Mac GPUs will be unsupported, I will move to a Linux desktop with a GTX 1080 Ti soon... |
Thank you so much. Using gfxCardStatus and forcing it to Discrete mode seems to work! |
Looks like the issue is resolved? |
@gunan I encountered a similar problem; how can I resolve it? |
@wangg12 I have a simple workaround for this. It is probably some memory-related issue. Get gfxCardStatus, force-switch to Integrated Only, and then switch back to Discrete Only. It kind of resets the graphics card. Check the GPU memory with cuda-smi; this process frees up GPU memory. Run your code again and it should work. I assume you are using macOS. |
@crack00ns No, I am using ubuntu14.04. I dont know what gfxcardstatus is. |
@wangg12 Ah, then try resetting the GPU with nvidia-smi. Check your GPU memory too. This is some memory-related bug/issue, in my opinion. |
Maybe this is related to the CUDA driver version. I tried the same code on another machine with a higher driver version, and everything went fine. But I'm not 100% sure, because I can't update the driver version to test it for now. Thanks anyway @crack00ns. |
You're welcome. It could be a driver issue too; perhaps @gunan could provide better insight. The workaround works for me for now. |
I'm having this issue and can't figure out why it's happening, because Theano works. The interesting thing is that I have to run Theano with sudo, or pygpu can't find the cudnn handle either. If I try to run this TF script with sudo, I crash immediately with the good old error above. I'm on macOS 10.12 with a 1080 Ti, CUDA 8, and cuDNN 5.1. The log is pretty much the same as above:
I tested the out-of-memory angle, but it doesn't seem to be the issue, because if I set TF to have |
On Ubuntu 16.04, calling 'nvidia-smi' fixed the problem. Thanks to @crack00ns. |
@jagadeesr how did you solve the problem? I use "sudo nvidia-smi -r -i 0" but it displays: |
@SunTiecheng I was able to run 'nvidia-smi' without sudo. After running this cmd, it worked for me. No arguments passed to the cmd. |
@jagadeesr It still doesn't work for me. But thank you very much! |
I faced this issue, and in my case the fix was not resetting the device (e.g. via nvidia-smi --gpu-reset --id=0) but disabling CNMeM, which turned out to be the root cause. I work with Theano, where CNMeM is enabled via a cnmem entry in ~/.theanorc; removing that entry resolved this, or a very similar, issue for me. Unfortunately, TensorFlow uses its own memory management and doesn't use NVIDIA's CNMeM, so I don't know how to configure this here. As far as I understand, there is no externally configurable memory manager for the GPU in TensorFlow; you can only tune GPU usage directly in code. Note: my knowledge of TensorFlow is limited; I am just pointing out what I discovered and how I resolved the issue with Theano on a mobile graphics card.
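On the TensorFlow side, the closest analogue to disabling CNMeM's preallocation that I'm aware of is the per-session GPU options. A minimal sketch (assuming the TF 1.x Session/ConfigProto API):

```python
import tensorflow as tf

# Grow GPU memory on demand instead of preallocating it all up front --
# roughly the TensorFlow counterpart of running Theano without CNMeM.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    print(sess.run(tf.constant("session created with allow_growth")))
```
|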
I am having the same issue : Linux Mint 18.1 Serena
Per another thread, this appears to be an issue with the GPU running out of memory. I have tried to use this code snippet
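(Something along these lines; a sketch of the usual TF 1.x memory-fraction config, with a placeholder fraction rather than my exact value:)

```python
import tensorflow as tf

# Cap TensorFlow at a fraction of the GPU's memory instead of letting it
# claim everything; 0.5 here is just a placeholder value.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.5)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
```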
but no luck. I have been checking nvidia-smi; I get the following:
|
Was this issue solved? I too am having the same error with TF 1.3.0, cuDNN 6.0, CUDA 8.0, and Keras. Altering the GPU options didn't work. Many answers suggest checking for zombie processes running in the background taking up GPU memory (I have none). |
@SimonWalsh1000 I am not sure if this helps, but I was able to fix it by nuking the conda environment I was working in, uninstalling CUDA, and then reinstalling everything using conda. Also, it seemed that the issue may have been associated with the latest NVIDIA driver, so you may want to uninstall your NVIDIA driver as well. |
I think you may have needed cuDNN 6.0 for TF 1.3. |
I also had this issue. To verify that this is a memory-related issue, you can try disabling the GPU and using only the CPU.
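One way to do that (a sketch; the CUDA devices have to be hidden before TensorFlow is imported for the first time):

```python
import os

# Hide all CUDA devices so TensorFlow falls back to the CPU. This must be
# set before the first `import tensorflow`.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

with tf.Session() as sess:
    print(sess.run(tf.constant("running on CPU only")))
```
|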
Running this fixed the issue: sudo rm -rf ~/.nv |
I'm on Windows 10 and encountered this issue. Running "C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe" doesn't solve it for me... There isn't a .nv file/directory under the home directory to delete either. Has anyone solved this for Windows environments? Below is the result of nvidia-smi.exe:
|
I had the same problem. The solution for me was downgrading from cuDNN 7.1.2 to 7.0.5. |
When I lower gpu_memory_fraction (inside with tf.Graph().as_default():), there is no problem. |
Well, I had the same error; for me, simply rebooting Ubuntu 16.04 solved the problem. |
A similar problem happens when I have dual GPUs and run the code while gpu0 is occupied.
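One workaround in that situation (a sketch, assuming the TF 1.x GPUOptions API) is to expose only the idle card to the session:

```python
import tensorflow as tf

# Make only the second GPU (index 1) visible to this session so TF does not
# try to create its cuDNN handle on the already-occupied gpu0.
gpu_options = tf.GPUOptions(visible_device_list="1")

with tf.Session(config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    print(sess.run(tf.constant("using the free GPU")))
```
|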
Thanks, that is useful. |
Hi,
I installed TensorFlow 1.0.1, GPU version, on my MacBook Pro with a GeForce GT 750M. I also installed CUDA 8.0.71 and cuDNN 5.1. I am running a TF script that works fine with the CPU-only TensorFlow, but on the GPU version I get this error (once in a while it works, too).
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
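The crash comes from the convolution kernel (conv_ops.cc), so even a tiny conv exercises the failing cuDNN path; a generic sketch of that kind of op (not my actual script):

```python
import numpy as np
import tensorflow as tf

# A toy conv2d: on the GPU build this goes through cuDNN, which is where
# the handle creation above fails.
x = tf.constant(np.random.rand(1, 28, 28, 1), dtype=tf.float32)
w = tf.constant(np.random.rand(3, 3, 1, 8), dtype=tf.float32)
y = tf.nn.conv2d(x, w, strides=[1, 1, 1, 1], padding="SAME")

with tf.Session() as sess:
    print(sess.run(y).shape)
```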
What is happening here? Is there a bug in TensorFlow? Please advise.
Thanks