
Crash: Could not create cuDNN handle when convnets are used #6698

Closed
ymfa opened this issue Jan 6, 2017 · 147 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower type:build/install Build and install issues

Comments

@ymfa

ymfa commented Jan 6, 2017

TensorFlow (GPU) was imported successfully, but when running a session that involves a convolutional neural network (CNN), Python crashes with the following message:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

The problem persists on every combination of CUDA toolkit 7.5/8.0 and TensorFlow installed from pip/source. Test sessions that do not use CNNs run successfully.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

The issue is similar to #6586, where I first commented. But since I experience the problem on a Mac, it was suggested that I open a separate issue.

Environment info

Operating System: macOS Sierra 10.12.2
Xcode version 8.2 (8C38) (when I later tried CUDA 7.5, I installed Command Line Tools version 7.3.1 because CUDA 7.5 lacked support for the more recent compilers)
Python 3.5.2 (anaconda)

Installed version of CUDA: tried both 8.0 (initially) and 7.5 (reported here, toolkit only -- the driver is still 8.0)
Installed version of cuDNN: 5.1 (different installations according to CUDA versions)
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

lrwxr-xr-x  1 root   wheel        33  5 Jan 20:33 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x@ 1 root   wheel      8280 13 Apr  2016 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x@ 1 root   wheel        45 13 Apr  2016 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudadevrt.a
lrwxr-xr-x@ 1 root   wheel        50 13 Apr  2016 /usr/local/cuda/lib/libcudart.7.5.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.7.5.dylib
lrwxr-xr-x@ 1 root   wheel        46 13 Apr  2016 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.dylib
lrwxr-xr-x@ 1 root   wheel        49 13 Apr  2016 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart_static.a
lrwxr-xr-x  1 root   wheel        16  5 Jan 17:14 /usr/local/cuda/lib/libcudnn.5 -> libcudnn.5.dylib
-rwxr-xr-x@ 1 ymfa   staff  58975112 10 Jun  2016 /usr/local/cuda/lib/libcudnn.5.dylib
lrwxr-xr-x@ 1 ymfa   staff        16 10 Jun  2016 /usr/local/cuda/lib/libcudnn.dylib -> libcudnn.5.dylib
lrwxr-xr-x  1 root   wheel        16  5 Jan 17:14 /usr/local/cuda/lib/libcudnn5.dylib -> libcudnn.5.dylib
-rw-r--r--@ 1 ymfa   staff  56392320 10 Jun  2016 /usr/local/cuda/lib/libcudnn_static.a

I tried installing both from pip and from source. I first installed from the binary pip package:

  1. A link to the pip package you installed:
    tensorflow-gpu
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".
    0.12.head

Later I installed from source (the pip package was uninstalled):

  1. The commit hash (git rev-parse HEAD)
    d67c09d98a576e1fbf2f3609ddb842e53890f31c

  2. The output of bazel version

    Build label: 0.4.3-homebrew
    Build target: bazel-out/local-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
    Build time: Thu Dec 22 15:20:15 2016 (1482420015)
    Build timestamp: 1482420015
    Build timestamp as int: 1482420015

If possible, provide a minimal reproducible example

I made a minimal example by simplifying the network and reducing the training data to only twenty images and two classes for classification. issue.zip contains the Python code and the data. I used two convolutional layers because I found that the network runs without problems with only one convolutional layer.
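For readers without the attachment, here is a minimal sketch of the kind of two-convolutional-layer network that triggers the crash. This is not the code from issue.zip; the layer sizes and variable names are illustrative assumptions against the TF 0.12/1.x API, chosen so that the flattened size (400) matches the MatMul shapes in the tracebacks quoted later in this thread.

import numpy as np
import tensorflow as tf

# Twenty 32x32 RGB images, two classes (all sizes are illustrative).
X = np.random.rand(20, 32, 32, 3).astype(np.float32)
Y = np.eye(2)[np.random.randint(0, 2, 20)].astype(np.float32)

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.float32, (None, 2))

# First convolutional layer; reportedly runs fine on its own.
w1 = tf.Variable(tf.truncated_normal((5, 5, 3, 8), stddev=0.1))
conv1 = tf.nn.relu(tf.nn.conv2d(x, w1, [1, 1, 1, 1], 'VALID'))
pool1 = tf.nn.max_pool(conv1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

# Second convolutional layer; adding this is what triggers the crash.
w2 = tf.Variable(tf.truncated_normal((5, 5, 8, 16), stddev=0.1))
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, w2, [1, 1, 1, 1], 'VALID'))
pool2 = tf.nn.max_pool(conv2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

# Flatten (5 * 5 * 16 = 400) and classify into two classes.
flat = tf.reshape(pool2, (-1, 400))
w3 = tf.Variable(tf.truncated_normal((400, 2), stddev=0.1))
b3 = tf.Variable(tf.zeros(2))
logits = tf.matmul(flat, w3) + b3

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
training_operation = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("Training...")
    sess.run(training_operation, feed_dict={x: X, y: Y})
    print("Training complete!")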

Complete log using CUDA 7.5 and Tensorflow compiled from source

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.7.5.dylib locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 740.18MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

Complete log using CUDA 8.0 and Tensorflow installed from pip

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 590.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:392] error retrieving driver version: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
@EncodeTS
Contributor

EncodeTS commented Jan 7, 2017

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 3.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I ran into exactly the same problem as you with CUDA 8 and TF r0.12.1.

@ymfa
Author

ymfa commented Jan 7, 2017

@EncodeTS I just added a minimal reproducible example to my first post. Could you check if it reproduces the problem on your machine? On my machine, one convolutional layer works but not two convolutional layers, which led me to think that the problem might be caused by some resource limitations.

@michaelisard michaelisard added the stat:awaiting response Status - Awaiting response from author label Jan 7, 2017
@yaroslavvb
Contributor

yaroslavvb commented Jan 7, 2017

I can confirm that @ymfa's minimal example fails on macOS with an NVIDIA 750, but the same example works on Linux with a Titan X.

@EncodeTS
Contributor

EncodeTS commented Jan 8, 2017

The minimal example works on my Ubuntu machine. It seems the issue I encountered occurs only with very low probability on my computer.

@axakak

axakak commented Jan 25, 2017

I'm encountering the same problem. The graph runs fine when forced to the CPU, but crashes on the GPU.

Environment

OS: macOS 10.12.2
GPU: GeForce GT 750M
TF: 0.12.1 (pip install)
Python: 3.6.0
CUDA: 8.0
cuDNN: 5.1

(output of ls -l /path/to/cuda/lib/libcud*):

lrwxr-xr-x  1 root  wheel     33 Dec 14 14:25 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x  1 root  wheel  13504 Dec  2 16:48 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x  1 root  wheel     45 Nov  3 11:40 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudadevrt.a
lrwxr-xr-x  1 root  wheel     50 Nov  3 11:40 /usr/local/cuda/lib/libcudart.8.0.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.8.0.dylib
lrwxr-xr-x  1 root  wheel     46 Nov  3 11:40 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.dylib
lrwxr-xr-x  1 root  wheel     49 Nov  3 11:40 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart_static.a
lrwxr-xr-x  1 root  wheel     47 Dec 14 10:21 /usr/local/cuda/lib/libcudnn.5.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.5.dylib
lrwxr-xr-x  1 root  wheel     45 Dec 14 10:21 /usr/local/cuda/lib/libcudnn.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.dylib
lrwxr-xr-x  1 root  wheel     48 Dec 14 10:21 /usr/local/cuda/lib/libcudnn_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn_static.a

Example

The minimal example provided by @ymfa both fails and succeeds on my setup, depending on the run. Below are three outputs it has produced.
fail(1)

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Abort trap: 6

fail(2)

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.53GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
E tensorflow/stream_executor/cuda/cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "issue.py", line 52, in <module>
    sess.run(training_operation, feed_dict={x: X, y: Y})
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

Caused by op 'MatMul', defined at:
  File "issue.py", line 43, in <module>
    logits = SimpleNet(x)
  File "issue.py", line 34, in SimpleNet
    logits = tf.matmul(fc1, fc1_W) + fc1_b
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

pass

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.71GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
Training complete!

@aselle
Contributor

aselle commented Mar 3, 2017

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@colinator

colinator commented Mar 13, 2017

Not so fast: I see this crash too. MacBook Pro, GeForce GT 650M, TF v1. Running via Jupyter kernels, which I have to restart frequently. Maybe this graphics card is just too weak? Seeing as the OP uses the same card: likely.

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.8.0.dylib locally
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 870.46MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

@TheTesla

I have the same problem with a GTX 960M, cuDNN 5.1.5, and CUDA 8.0.44.

@gururao001

I have the same problem with CentOS and a Titan X.

@BanR

BanR commented Mar 18, 2017

I have the same problem with Ubuntu 14.04 and a GRID K520 (AWS g2.2).

@strickon

strickon commented Mar 28, 2017

I have the same problem on Windows 10 with cuDNN 5.1, CUDA 8, and a GTX 1060. The program works on the CPU version of TensorFlow, but I get these same errors with the GPU version.

@lajos

lajos commented Apr 6, 2017

I had the same issue with a GTX 1060, Windows 8.1, CUDA 8.0.60, and cuDNN 5.0. I upgraded to the latest stable tensorflow-gpu nightly build (currently http://ci.tensorflow.org/job/nightly-win/133/) and cuDNN 5.1. Problem solved.

@jake17007

jake17007 commented Apr 7, 2017

Same issue here.

I was having this issue with the software versions listed below, except TF was version 1.0.0. I then upgraded to TF 1.0.1, ran the same program once, and it worked. I then ran it again and it didn't work; it produced the same error as before.

Tensorflow-gpu 1.0.1
Mac OS X 10.12.3
Cuda 8.0.61
CuDNN 5.1
GeForce GT 750M

@aselle aselle reopened this Apr 7, 2017
@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Apr 7, 2017
@dinara92

dinara92 commented Apr 7, 2017

I'm having the same problem with a GTX 650, Ubuntu 16.04, CUDA 8.0.61, and TF 1.0.0.
It was working just now, though it was giving some low-memory warnings.
Now it doesn't run at all, giving me the same Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) error.

@rtbins

rtbins commented Apr 8, 2017

Having the same issue with a GTX 1080 Ti, Windows 10, CUDA 8.0.61, TF 1.0.1, and cuDNN 5.1.

@strickon

strickon commented Apr 9, 2017

I was able to get a program to work by limiting the GPU usage. In my case, with a 3 GB GTX 1060 on Ubuntu 16.04, it works if I set the GPU option per_process_gpu_memory_fraction to 0.7. Anything higher and I get these errors:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

It could be a case of bad error reporting by TensorFlow; the messages seem completely unrelated to memory. Maybe it is a clue to getting this resolved in a better manner?
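For reference, a minimal sketch of the setting described above, assuming the TF 1.x session API (0.7 is the value that worked on this 3 GB card; tune it for your GPU):

import tensorflow as tf

# Cap TensorFlow at 70% of GPU memory instead of the default near-total grab.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7
session = tf.Session(config=config)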

@drpngx
Contributor

drpngx commented Apr 17, 2017

@zheng-xq is there an obvious setup issue?

@drpngx drpngx added type:build/install Build and install issues stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Apr 17, 2017
@Dibel

Dibel commented Apr 22, 2017

Same issue here. I'm on Windows 10, GTX 1070, CUDA 8.0, cuDNN 5.1.

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

@serans1

serans1 commented Apr 25, 2017

If it helps anyone, there sometimes seem to be zombie processes left over which prevent TF from starting again properly and gave me this error. Killing them works around the issue.
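A minimal sketch of that workaround, assuming a Linux machine with nvidia-smi on the PATH; inspect the PIDs first and only kill processes you know are stale:

import os
import signal
import subprocess

# List the PIDs of all compute processes currently holding the GPU.
out = subprocess.check_output(
    ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"])
for line in out.decode().splitlines():
    pid = int(line.strip())
    print("Killing stale GPU process", pid)  # verify these are stale first!
    os.kill(pid, signal.SIGKILL)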

@strickon

Here is a bit more info on how I temporarily resolved it. I believe these issues are all related to GPU memory allocation and have nothing to do with the errors being reported. There were other errors before this indicating some sort of memory-allocation problem, but the program continued to progress, eventually giving the cuDNN errors that everyone is getting. The reason I believe it works sometimes is that if you use the GPU for other things besides TensorFlow, such as your primary display, the available memory fluctuates. Sometimes you can allocate what you need and other times you can't.

From the API
https://www.tensorflow.org/versions/r0.12/how_tos/using_gpu/
"By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation."

I think this default allocation is broken in some way that causes this erratic behavior and certain situations to work and others to fail.

I have resolved this issue by changing TF's default behavior to allocate a minimum amount of memory and grow as needed, as detailed in the webpage.

config = tf.ConfigProto()
# Start small and let the allocation grow as needed.
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

I have also tried the alternate way and was able to get it to work, and to fail, by experimentally choosing a percentage that worked. In my case it ended up being about 0.7.

config = tf.ConfigProto()
# Fraction from the docs example; in my case about 0.7 was the working value.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Still no word from anyone on the TF team confirming this, but it is worth a shot to see if others can confirm similar behavior.

@vburca

vburca commented Apr 26, 2017

I am also getting the CUDNN_STATUS_NOT_INITIALIZED error. Here is the full error log:

2017-04-26 00:08:57.526234: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
2017-04-26 00:09:01.111706: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-04-26 00:09:01.111805: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-04-26 00:09:01.114040: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-04-26 00:09:01.114232: F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I am on Windows 10, CUDA 8.0, cuDNN 5.1. Can anything be done to avoid these errors? I was able to run some other TensorFlow tests earlier and they worked fine (including the conv op), but now it doesn't work on this new test...

@serans1 What zombie processes are you referring to?

Please let me know if there is a workaround for this. Thank you!

EDIT: This might have been a newbie mistake, but I will just mention it here in case someone else runs into the same issue:
My problem was that I already had an instance of a Jupyter Python Notebook running (whose cells had all been run already, hence loaded in memory), plus another process taking up GPU memory (a minimized video game). Therefore, when I checked the memory usage on my GPU, it was already at around 4+ GB (50+%). I closed the Jupyter Notebook and the other application and re-ran my TensorFlow test. Now everything runs smoothly :) Also, while running I noticed that at peak it uses up to 90% of my GPU memory, so it makes sense why it couldn't initialize cuDNN with less than 50% available in my initial situation.

Sorry again for my mistake! I'm just at the beginning of playing around with this :)

@ghost

ghost commented May 7, 2017

The same problem. Is there any solution to it?

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:392] error retrieving driver version: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

@lazysquid

I have exactly the same issue.
But I can run my code with root access (with sudo).
Currently I'm working on Ubuntu 16.04 with a GTX 960.
My CUDA version is 8.0 and I'm using TensorFlow 1.0.1.

@ThomasWarn

I had the error and I 'fixed' it by closing my multiple instances of Jupyter and closing other applications. I'm new to working with tensorflow in general so it's likely this only fixed my problem.

@kl3eo

kl3eo commented Mar 5, 2019

E tensorflow/stream_executor/cuda/cuda_dnn.cc:353] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I had this issue with CUDA 10.1 + cuDNN 7.5 and TF 1.11 compiled from source with CUDA. The script I was trying to use needed these lines inserted somewhere:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

and then later:

sess = tf.Session(graph=detection_graph, config=config)

With this done there are still a lot of "GPU out of memory" errors, but detection goes on very quickly, as I suppose it should when we're using the GPU. Thanks for sharing!

@HenryChanhy

I faced the same issue, and using the line below fixed it. Check here for details.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64

@sungwonida

> @EncodeTS I just added a minimal reproducible example to my first post. Could you check if it reproduces the problem on your machine? On my machine, one convolutional layer works but not two convolutional layers, which led me to think that the problem might be caused by some resource limitations.

Actually, I'm working on Ubuntu 18.04, not macOS, but it makes sense that this might be caused by some resource limitation. I also faced the same issue on a GTX 1050 Ti (4 GB), but the issue went away when I ran the same architecture on a GTX 1080 Ti (11 GB). Though the environments are not identical between the two systems, I did my best to match them by using a Docker container.

@woshidandan

This problem is generally related to the CUDA version or to GPU memory. If it's the former, the easiest way is to change your CUDA version through Anaconda: if the GPU-memory fixes above don't work, don't worry about which CUDA version the system has installed, just change the CUDA version inside your Anaconda project environment (tested, it works). If it's the latter, you can find ways to solve it in the other answers.

@pathnirvana

pathnirvana commented Jun 6, 2019

If you are still getting this issue, try the following; it worked for me:

tf.config.gpu.set_per_process_memory_growth(True)
tf.config.gpu.set_per_process_memory_fraction(0.4)

tensorflow 2 alpha
cuda 10.0
GTX 1650
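Note for later readers: these tf.config.gpu setters appear to come from the TF 2.0 alpha API and were dropped before the stable 2.0 release; the rough stable TF 2.x equivalent, as later commenters show, is:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)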

@nayash

nayash commented Jun 15, 2019

I have a similar issue: CUDNN_STATUS_ALLOC_FAILED.
I broke my head for 3-4 hours. Finally fixed.
This indeed works, as mentioned above by many:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

But the key is to write it immediately below "import tensorflow as tf", which I wasn't doing; I had written it after all the other imports.
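A sketch of the ordering being described, assuming the TF 1.x API; the point is that the config and session are created right after TensorFlow is imported, before any other imports or graph-building code run:

import tensorflow as tf

# Configure GPU memory growth first, before anything else touches the GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

# Only now do the remaining imports and build the model.
import numpy as np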

@liuzc188

liuzc188 commented Jul 6, 2019

Maybe your tensorflow-gpu version has problems. You should check your own versions and try again: find the matching tensorflow-gpu version number, then uninstall and reinstall it.

@imflash217

imflash217 commented Jul 25, 2019

> it worked for me when adding these lines of code to the beginning of the script @Codersadis
>
> add the following code to the very beginning of the .py file, which solves my problem.
>
> from __future__ import print_function, division
> import tensorflow as tf
> from keras.backend.tensorflow_backend import set_session
> config = tf.ConfigProto()
> config.gpu_options.allow_growth = True
> set_session(tf.Session(config=config))

I am getting the same error with tensorflow-gpu == 1.8.0, cuDNN version 7.0.5 and CUDA 9.1.85, Ubuntu 16.04, even after I add the above suggested solution.
Following is the stack trace:

INFO - Waveunet Training - Running command 'run'
INFO - Waveunet Training - Started
SCRIPT START
EPOCH: 0
Dataset ready!
Training...
Sep_Vars: 10265550
Num of variables65
2019-07-25 05:10:09.872823: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-25 05:10:10.286584: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-25 05:10:10.286914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:00:05.0
totalMemory: 7.92GiB freeMemory: 7.83GiB
2019-07-25 05:10:10.286964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-07-25 05:10:10.640890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-25 05:10:10.640952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-07-25 05:10:10.640968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-07-25 05:10:10.641194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7566 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2019-07-25 05:10:27.643833: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:94] Filling up shuffle buffer (this may take a while): 2054 of 4000
2019-07-25 05:10:35.917445: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:129] Shuffle buffer filled.
2019-07-25 05:10:36.175698: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-07-25 05:10:36.175820: E tensorflow/stream_executor/cuda/cuda_dnn.cc:463] possibly insufficient driver version: 384.183.0
2019-07-25 05:10:36.175842: E tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2019-07-25 05:10:36.175859: F tensorflow/core/kernels/conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted (core dumped)

Please help

@Eugen2525

> I have a similar issue: CUDNN_STATUS_ALLOC_FAILED.
> I broke my head for 3-4 hours. Finally fixed.
> This indeed works, as mentioned above by many:
>
> config = tf.ConfigProto()
> config.gpu_options.allow_growth = True
> session = tf.Session(config=config)
>
> But the key is to write it immediately below "import tensorflow as tf", which I wasn't doing; I had written it after all the other imports.

great reply, worked for me !!

@imflash217

> it worked for me when adding these lines of code to the beginning of the script @Codersadis [...]
>
> I am getting the same error with tensorflow-gpu == 1.8.0, cuDNN version 7.0.5 and CUDA 9.1.85, Ubuntu 16.04, even after I add the above suggested solution. [...]
>
> Please help

Changing the NVIDIA driver to 396+ solved the issue for me.

@nwoyecid

It has to do with the fraction of GPU memory available for TensorFlow to load GPU resources and create the cuDNN handle, known as per_process_gpu_memory_fraction.
Reducing this memory fraction yourself will solve the error.

sess_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7),
    allow_soft_placement=True)

with tf.Session(config=sess_config) as sess:
    sess.run([whatever])

Use as small a fraction as fits in your memory. (In the code above I use 0.7; you can start with 0.3 or even smaller, then increase until you get the same error. That's your limit.)
Pass it to your tf.Session() or tf.train.MonitoredTrainingSession() or your Supervisor's sv.managed_session() as the config.

This should allow your GPU to create a cuDNN handle for your TensorFlow code.

@Kevin-Oudai

I was getting the following error with tensorflow 2.0 in my conda environment.

2019-12-03 23:49:06.381259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-03 23:49:07.220066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-03 23:49:07.236411: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-03 23:49:07.247476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-03 23:49:07.256881: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-03 23:49:07.269536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-03 23:49:07.281954: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-03 23:49:07.295302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-03 23:49:08.589865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-03 23:49:08.599121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-03 23:49:08.610543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-03 23:49:08.616005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4627 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-12-03 23:49:58.521484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-03 23:49:59.604517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-03 23:50:04.209110: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-03 23:50:04.216670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:333] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-12-03 23:50:04.226172: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-03 23:50:04.234741: E tensorflow/stream_executor/cuda/cuda_dnn.cc:333] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-12-03 23:50:04.244958: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node sequential/conv2d/Conv2D}}]]

So I added the following code to my CNN:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

My output is now

2019-12-04 00:10:07.708573: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-12-04 00:10:11.643304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-04 00:10:12.753615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-04 00:10:12.769498: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-04 00:10:12.783900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-04 00:10:54.941468: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-04 00:10:55.372516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-04 00:10:55.383385: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-04 00:10:55.406053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-04 00:10:56.741665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 00:10:56.747255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-04 00:10:56.752302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-04 00:10:56.756861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4627 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-12-04 00:11:08.281356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-04 00:11:08.934804: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-04 00:11:11.870237: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

As everyone suggested, it is due to TensorFlow using all of the GPU/GPUs. My CNN now trains without error.

@rpvelloso

> I was facing the same problem when using the community-supported version of TensorFlow inside a conda environment (i.e. using conda install tensorflow-gpu).
>
> Turns out this version is not actually good in all situations (even though I've been using it on other machines). The best version to use is the pip-installable version from https://www.tensorflow.org/install/pip inside a conda environment. When I did this everything worked.

That solved it for me, thanks!

@RokoMijic

This also resolved the issue for me.

GeForce GTX 1050, CUDA 10.0

Note: this is the only thing I can find that works in TF 2.0 for now. Thanks!

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

@jtiscione

jtiscione commented Jan 4, 2020

> This also resolved the issue for me.
>
> GeForce GTX 1050, CUDA 10.0
>
> Note: this is the only thing I can find that works in TF 2.0 for now. Thanks!
>
> gpus = tf.config.experimental.list_physical_devices('GPU')
> tf.config.experimental.set_memory_growth(gpus[0], True)

This didn't make any difference for me... TF 2.0, RTX 2060, CUDA 10.1, cuDNN 7.6.

This is with 16 GB RAM, 6 GB video memory, and a basic MNIST toy model with one conv layer. No memory problems, just a stack trace.

No GPU problems at all with PyTorch, as usual.

@magomar

magomar commented Jan 10, 2020

In my case, I have two machines, both with RTX 2080 Ti, TF 2.1, CUDA 10.1, cuDNN 7.6. One works, the other raises the aforementioned error. Both machines have the same amount of RAM, 16 GB. There are hardware differences, though, like the CPU. But the problem only occurs when using the GPU.

@panzhufeng
Member

> In my case, I have two machines, both with RTX 2080 Ti, TF 2.1, CUDA 10.1, cuDNN 7.6. One works, the other raises the aforementioned error. Both machines have the same amount of RAM, 16 GB. There are hardware differences, though, like the CPU. But the problem only occurs when using the GPU.

Same platform, same problem

@Samaritan1011001

Samaritan1011001 commented Apr 13, 2020

If you are using the latest TensorFlow and Keras, try this (from here); it worked for me:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

@nitin-barthwal

> This one works for me.
> physical_devices = tf.config.list_physical_devices('GPU')
> tf.config.experimental.set_memory_growth(physical_devices[0], True)

This worked for me. Thanks

@ypaez

ypaez commented Aug 21, 2020

@Samaritan1011001 your solution works for me thanks a lot.

@kitarikes

@Samaritan1011001 your solution works for me, too! thanks xD !

@nic-00

nic-00 commented Jun 2, 2021

> This problem is generally related to the CUDA version or to GPU memory. If it's the former, the easiest way is to change your CUDA version through Anaconda...

Sorry, may I ask how to change the CUDA version directly inside Anaconda? Thanks a lot!
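For anyone else landing on this question: assuming you let conda manage the CUDA runtime, one common approach is to install a matching toolkit inside the activated environment, e.g. conda install cudatoolkit=10.0 cudnn (the version numbers here are illustrative and must match what your TensorFlow build expects). Installing tensorflow-gpu itself from conda will typically pull in a compatible cudatoolkit automatically.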
