
Crash: Could not create cuDNN handle when convnets are used #6698

Closed
ymfa opened this issue Jan 6, 2017 · 147 comments
Labels
stat:awaiting tensorflower Status - Awaiting response from tensorflower type:build/install Build and install issues

Comments

@ymfa

ymfa commented Jan 6, 2017

TensorFlow (GPU) was imported successfully, but when running a session that involves a convolutional neural network (CNN), Python crashes with the following message:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

The problem persists on every combination of CUDA toolkit 7.5/8.0 and TensorFlow installed from pip/source. Test sessions that do not use CNNs run successfully.

What related GitHub issues or StackOverflow threads have you found by searching the web for your problem?

The issue is similar to #6586, where I first commented. But since I experience the problem on a Mac, it was suggested that I open a separate issue.

Environment info

Operating System: macOS Sierra 10.12.2
Xcode version 8.2 (8C38) (when I later tried CUDA 7.5, I installed Command Line Tools version 7.3.1 because CUDA 7.5 lacked support for the more recent compilers)
Python 3.5.2 (anaconda)

Installed version of CUDA: tried both 8.0 (initially) and 7.5 (reported here, toolkit only -- the driver is still 8.0)
Installed version of cuDNN: 5.1 (different installations according to CUDA versions)
(please attach the output of ls -l /path/to/cuda/lib/libcud*):

lrwxr-xr-x  1 root   wheel        33  5 Jan 20:33 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x@ 1 root   wheel      8280 13 Apr  2016 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x@ 1 root   wheel        45 13 Apr  2016 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudadevrt.a
lrwxr-xr-x@ 1 root   wheel        50 13 Apr  2016 /usr/local/cuda/lib/libcudart.7.5.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.7.5.dylib
lrwxr-xr-x@ 1 root   wheel        46 13 Apr  2016 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart.dylib
lrwxr-xr-x@ 1 root   wheel        49 13 Apr  2016 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-7.5/lib/libcudart_static.a
lrwxr-xr-x  1 root   wheel        16  5 Jan 17:14 /usr/local/cuda/lib/libcudnn.5 -> libcudnn.5.dylib
-rwxr-xr-x@ 1 ymfa   staff  58975112 10 Jun  2016 /usr/local/cuda/lib/libcudnn.5.dylib
lrwxr-xr-x@ 1 ymfa   staff        16 10 Jun  2016 /usr/local/cuda/lib/libcudnn.dylib -> libcudnn.5.dylib
lrwxr-xr-x  1 root   wheel        16  5 Jan 17:14 /usr/local/cuda/lib/libcudnn5.dylib -> libcudnn.5.dylib
-rw-r--r--@ 1 ymfa   staff  56392320 10 Jun  2016 /usr/local/cuda/lib/libcudnn_static.a

I tried installing both from pip and from source. I first installed from the binary pip package:

  1. A link to the pip package you installed:
    tensorflow-gpu
  2. The output from python -c "import tensorflow; print(tensorflow.__version__)".
    0.12.head

Later I installed from source (the pip package was uninstalled):

  1. The commit hash (git rev-parse HEAD)
    d67c09d98a576e1fbf2f3609ddb842e53890f31c

  2. The output of bazel version

    Build label: 0.4.3-homebrew
    Build target: bazel-out/local-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
    Build time: Thu Dec 22 15:20:15 2016 (1482420015)
    Build timestamp: 1482420015
    Build timestamp as int: 1482420015

If possible, provide a minimal reproducible example

I made a minimal example by simplifying the network and reducing the training data to only twenty images and two classes for classification. issue.zip contains the Python code and the data. I used two convolutional layers because I found that the network runs without problems with only one convolutional layer.
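For readers without the attachment, here is a minimal sketch of the kind of two-convolutional-layer network that triggers the crash. This is not the code from issue.zip; the layer sizes and variable names are illustrative assumptions against the TF 0.12/1.x API, chosen so that the flattened size (400) matches the MatMul shapes in the tracebacks quoted later in this thread.

import numpy as np
import tensorflow as tf

# Twenty 32x32 RGB images, two classes (all sizes are illustrative).
X = np.random.rand(20, 32, 32, 3).astype(np.float32)
Y = np.eye(2)[np.random.randint(0, 2, 20)].astype(np.float32)

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.float32, (None, 2))

# First convolutional layer; reportedly runs fine on its own.
w1 = tf.Variable(tf.truncated_normal((5, 5, 3, 8), stddev=0.1))
conv1 = tf.nn.relu(tf.nn.conv2d(x, w1, [1, 1, 1, 1], 'VALID'))
pool1 = tf.nn.max_pool(conv1, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

# Second convolutional layer; adding this is what triggers the crash.
w2 = tf.Variable(tf.truncated_normal((5, 5, 8, 16), stddev=0.1))
conv2 = tf.nn.relu(tf.nn.conv2d(pool1, w2, [1, 1, 1, 1], 'VALID'))
pool2 = tf.nn.max_pool(conv2, [1, 2, 2, 1], [1, 2, 2, 1], 'VALID')

# Flatten (5 * 5 * 16 = 400) and classify into two classes.
flat = tf.reshape(pool2, (-1, 400))
w3 = tf.Variable(tf.truncated_normal((400, 2), stddev=0.1))
b3 = tf.Variable(tf.zeros(2))
logits = tf.matmul(flat, w3) + b3

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y))
training_operation = tf.train.AdamOptimizer().minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("Training...")
    sess.run(training_operation, feed_dict={x: X, y: Y})
    print("Training complete!")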

Complete log using CUDA 7.5 and Tensorflow compiled from source

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.7.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.7.5.dylib locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:874] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 740.18MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

Complete log using CUDA 8.0 and Tensorflow installed from pip

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 590.00MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/stream_executor/cuda/cuda_dnn.cc:392] error retrieving driver version: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)
@EncodeTS
Contributor

EncodeTS commented Jan 7, 2017

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:02:00.0
Total memory: 7.92GiB
Free memory: 3.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:02:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I ran into exactly the same problem as you with CUDA 8 and TF r0.12.1.

@ymfa
Author

ymfa commented Jan 7, 2017

@EncodeTS I just added a minimal reproducible example to my first post. Could you check if it reproduces the problem on your machine? On my machine, one convolutional layer works but not two convolutional layers, which led me to think that the problem might be caused by some resource limitations.

@michaelisard michaelisard added the stat:awaiting response Status - Awaiting response from author label Jan 7, 2017
@yaroslavvb
Contributor

yaroslavvb commented Jan 7, 2017

I can confirm that @ymfa's minimal example fails on macOS with an NVIDIA 750, but the same example works on Linux with a Titan X.

@EncodeTS
Contributor

EncodeTS commented Jan 8, 2017

The minimal example works on my Ubuntu machine. It seems the issue I encountered occurs only with very low probability on my computer.

@axakak

axakak commented Jan 25, 2017

I'm encountering the same problem. The graph runs fine when forced to the CPU, but crashes on the GPU.

Environment

OS: macOS 10.12.2
GPU: GeForce GT 750M
TF: 0.12.1 (pip install)
Python: 3.6.0
CUDA: 8.0
cuDNN: 5.1

(output of ls -l /path/to/cuda/lib/libcud*):

lrwxr-xr-x  1 root  wheel     33 Dec 14 14:25 /usr/local/cuda/lib/libcuda.1.dylib -> /usr/local/cuda/lib/libcuda.dylib
-rwxr-xr-x  1 root  wheel  13504 Dec  2 16:48 /usr/local/cuda/lib/libcuda.dylib
lrwxr-xr-x  1 root  wheel     45 Nov  3 11:40 /usr/local/cuda/lib/libcudadevrt.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudadevrt.a
lrwxr-xr-x  1 root  wheel     50 Nov  3 11:40 /usr/local/cuda/lib/libcudart.8.0.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.8.0.dylib
lrwxr-xr-x  1 root  wheel     46 Nov  3 11:40 /usr/local/cuda/lib/libcudart.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart.dylib
lrwxr-xr-x  1 root  wheel     49 Nov  3 11:40 /usr/local/cuda/lib/libcudart_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudart_static.a
lrwxr-xr-x  1 root  wheel     47 Dec 14 10:21 /usr/local/cuda/lib/libcudnn.5.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.5.dylib
lrwxr-xr-x  1 root  wheel     45 Dec 14 10:21 /usr/local/cuda/lib/libcudnn.dylib -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn.dylib
lrwxr-xr-x  1 root  wheel     48 Dec 14 10:21 /usr/local/cuda/lib/libcudnn_static.a -> /Developer/NVIDIA/CUDA-8.0/lib/libcudnn_static.a

Example

The minimal example provided by @ymfa both fails and succeeds on my setup, depending on the run. Below are three outputs it has produced.
fail(1)

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.76GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 
Abort trap: 6

fail(2)

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.53GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
E tensorflow/stream_executor/cuda/cuda_blas.cc:372] failed to create cublas handle: CUBLAS_STATUS_NOT_INITIALIZED
W tensorflow/stream_executor/stream.cc:1390] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1021, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1003, in _run_fn
    status, run_metadata)
  File "/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 469, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "issue.py", line 52, in <module>
    sess.run(training_operation, feed_dict={x: X, y: Y})
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 766, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 964, in _run
    feed_dict_string, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1014, in _do_run
    target_list, options, run_metadata)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1034, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

Caused by op 'MatMul', defined at:
  File "issue.py", line 43, in <module>
    logits = SimpleNet(x)
  File "issue.py", line 34, in SimpleNet
    logits = tf.matmul(fc1, fc1_W) + fc1_b
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/math_ops.py", line 1729, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 1442, in _mat_mul
    transpose_b=transpose_b, name=name)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 759, in apply_op
    op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2240, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1128, in __init__
    self._traceback = _extract_stack()

InternalError (see above for traceback): Blas SGEMM launch failed : a.shape=(20, 400), b.shape=(400, 2), m=20, n=2, k=400
	 [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](Flatten/Reshape, Variable_4/read)]]

pass

I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.dylib locally
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:901] OS X does not support NUMA - returning NUMA node zero
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 750M
major: 3 minor: 0 memoryClockRate (GHz) 0.9255
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.71GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 750M, pci bus id: 0000:01:00.0)
Training...
Training complete!

@aselle
Contributor

aselle commented Mar 3, 2017

Automatically closing due to lack of recent activity. Please update the issue when new information becomes available, and we will reopen the issue. Thanks!

@colinator

colinator commented Mar 13, 2017

Not so fast: I see this crash too. MacBook Pro, GeForce GT 650M, TF v1. Running via Jupyter kernels, which I have to restart frequently. Maybe this graphics card is just too weak? Seeing as the OP uses the same card: likely.

I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcudnn.5.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.8.0.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.1.dylib locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.8.0.dylib locally
...
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties: 
name: GeForce GT 650M
major: 3 minor: 0 memoryClockRate (GHz) 0.9
pciBusID 0000:01:00.0
Total memory: 1023.69MiB
Free memory: 870.46MiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0:   Y 
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GT 650M, pci bus id: 0000:01:00.0)
E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) 

@TheTesla

I have the same problem with a GTX 960M, cuDNN 5.1.5, and CUDA 8.0.44.

@gururao001

I have the same problem with CentOS and a Titan X.

@BanR

BanR commented Mar 18, 2017

I have the same problem with Ubuntu 14.04 and a GRID K520 (AWS g2.2).

@strickon

strickon commented Mar 28, 2017

I have the same problem on Windows 10 with cuDNN 5.1, CUDA 8, and a GTX 1060. The program works on the CPU version of TensorFlow, but I get these same errors with the GPU version.

@lajos

lajos commented Apr 6, 2017

I had the same issue with a GTX 1060, Windows 8.1, CUDA 8.0.60, and cuDNN 5.0. I upgraded to the latest stable tensorflow-gpu nightly build (currently http://ci.tensorflow.org/job/nightly-win/133/) and cuDNN 5.1. Problem solved.

@jake17007

jake17007 commented Apr 7, 2017

Same issue here.

I was having this issue with the software versions listed below, except TF was version 1.0.0. I then upgraded to TF 1.0.1, ran the same program once, and it worked. I then ran it again and it didn't work; it produced the same error as before.

Tensorflow-gpu 1.0.1
Mac OS X 10.12.3
Cuda 8.0.61
CuDNN 5.1
GeForce GT 750M

@aselle aselle reopened this Apr 7, 2017
@aselle aselle removed the stat:awaiting response Status - Awaiting response from author label Apr 7, 2017
@dinara92

dinara92 commented Apr 7, 2017

I'm having the same problem with a GTX 650, Ubuntu 16.04, CUDA 8.0.61, and TF 1.0.0.
It was working just now, though it was giving some low-memory warnings.
Now it doesn't run at all, giving me the same Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms) error.

@rtbins

rtbins commented Apr 8, 2017

Having the same issue with a GTX 1080 Ti, Windows 10, CUDA 8.0.61, TF 1.0.1, and cuDNN 5.1.

@strickon

strickon commented Apr 9, 2017

I was able to get a program to work by limiting the GPU usage. In my case, with a 3 GB GTX 1060 on Ubuntu 16.04, it works if I set the GPU option per_process_gpu_memory_fraction to 0.7. Anything higher and I get these errors:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:397] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:364] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

It could be a case of bad error reporting by TensorFlow; the messages seem completely unrelated to memory. Maybe it is a clue to getting this resolved in a better manner?
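For reference, a minimal sketch of the setting described above, assuming the TF 1.x session API (0.7 is the value that worked on this 3 GB card; tune it for your GPU):

import tensorflow as tf

# Cap TensorFlow at 70% of GPU memory instead of the default near-total grab.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.7
session = tf.Session(config=config)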

@drpngx
Contributor

drpngx commented Apr 17, 2017

@zheng-xq is there an obvious setup issue?

@drpngx drpngx added type:build/install Build and install issues stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Apr 17, 2017
@Dibel

Dibel commented Apr 22, 2017

Same issue here. I'm on Windows 10, GTX 1070, CUDA 8.0, cuDNN 5.1.

E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

@serans1

serans1 commented Apr 25, 2017

If it helps anyone, there sometimes seem to be zombie processes left over which prevent TF from starting again properly and gave me this error. Killing them works around the issue.
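A minimal sketch of that workaround, assuming a Linux machine with nvidia-smi on the PATH; inspect the PIDs first and only kill processes you know are stale:

import os
import signal
import subprocess

# List the PIDs of all compute processes currently holding the GPU.
out = subprocess.check_output(
    ["nvidia-smi", "--query-compute-apps=pid", "--format=csv,noheader"])
for line in out.decode().splitlines():
    pid = int(line.strip())
    print("Killing stale GPU process", pid)  # verify these are stale first!
    os.kill(pid, signal.SIGKILL)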

@strickon

Here is a bit more info on how I temporarily resolved it. I believe these issues are all related to GPU memory allocation and have nothing to do with the errors being reported. There were other errors before this indicating some sort of memory-allocation problem, but the program continued to progress, eventually giving the cuDNN errors that everyone is getting. The reason I believe it works sometimes is that if you use the GPU for other things besides TensorFlow, such as your primary display, the available memory fluctuates. Sometimes you can allocate what you need and other times you can't.

From the API
https://www.tensorflow.org/versions/r0.12/how_tos/using_gpu/
"By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to CUDA_VISIBLE_DEVICES) visible to the process. This is done to more efficiently use the relatively precious GPU memory resources on the devices by reducing memory fragmentation."

I think this default allocation is broken in some way that causes this erratic behavior and certain situations to work and others to fail.

I have resolved this issue by changing TF's default behavior to allocate a minimum amount of memory and grow as needed, as detailed in the webpage.

config = tf.ConfigProto()
# Start small and let the allocation grow as needed.
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

I have also tried the alternate way and was able to get it to work, and to fail, by experimentally choosing a percentage that worked. In my case it ended up being about 0.7.

config = tf.ConfigProto()
# Fraction from the docs example; in my case about 0.7 was the working value.
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

Still no word from anyone on the TF team confirming this, but it is worth a shot to see if others can confirm similar behavior.

@vburca

vburca commented Apr 26, 2017

I am also getting the CUDNN_STATUS_NOT_INITIALIZED error. Here is the full error log:

2017-04-26 00:08:57.526234: I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
2017-04-26 00:09:01.111706: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:359] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2017-04-26 00:09:01.111805: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:366] error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2017-04-26 00:09:01.114040: E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:326] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2017-04-26 00:09:01.114232: F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:659] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

I am on Windows 10, CUDA 8.0, cuDNN 5.1. Can anything be done to avoid these errors? I was able to run some other TensorFlow tests earlier and they worked fine (including the conv op), but now it doesn't work on this new test...

@serans1 What zombie processes are you referring to?

Please let me know if there is a workaround for this. Thank you!

EDIT: This might have been a newbie mistake, but I will just mention it here in case someone else runs into the same issue:
My problem was that I already had an instance of a Jupyter Python Notebook running (whose cells had all been run already, hence loaded in memory), plus another process taking up GPU memory (a minimized video game). Therefore, when I checked the memory usage on my GPU, it was already at around 4+ GB (50+%). I closed the Jupyter Notebook and the other application and re-ran my TensorFlow test. Now everything runs smoothly :) Also, while running I noticed that at peak it uses up to 90% of my GPU memory, so it makes sense why it couldn't initialize cuDNN with less than 50% available in my initial situation.

Sorry again for my mistake! I'm just at the beginning of playing around with this :)

@ghost

ghost commented May 7, 2017

The same problem. Is there any solution to it?

I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 960M
major: 5 minor: 0 memoryClockRate (GHz) 1.176
pciBusID 0000:01:00.0
Total memory: 4.00GiB
Free memory: 3.35GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 960M, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:392] error retrieving driver version: Permission denied: could not open driver version path for reading: /proc/driver/nvidia/version
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\kernels\conv_ops.cc:532] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

@lazysquid

I have exactly the same issue.
But I can run my code with root access (with sudo).
Currently I'm working on Ubuntu 16.04 with a GTX 960.
My CUDA version is 8.0 and I'm using TensorFlow 1.0.1.

@ThomasWarn

I had the error and I 'fixed' it by closing my multiple instances of Jupyter and closing other applications. I'm new to working with tensorflow in general so it's likely this only fixed my problem.

@kl3eo

kl3eo commented Mar 5, 2019

E tensorflow/stream_executor/cuda/cuda_dnn.cc:353] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I had this issue with CUDA 10.1 + cuDNN 7.5 and TF 1.11 compiled from source with CUDA. The script I was trying to use needed these lines inserted somewhere:

config = tf.ConfigProto()
config.gpu_options.allow_growth = True

and then later:

sess = tf.Session(graph=detection_graph, config=config)

With this done there are still a lot of "GPU out of memory" errors, but detection goes on very quickly, as I suppose it should when we're using the GPU. Thanks for sharing!

@HenryChanhy

I faced the same issue, and using the line below fixed it. Check here for details.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/extras/CUPTI/lib64

@sungwonida

> @EncodeTS I just added a minimal reproducible example to my first post. Could you check if it reproduces the problem on your machine? On my machine, one convolutional layer works but not two convolutional layers, which led me to think that the problem might be caused by some resource limitations.

Actually, I'm working on Ubuntu 18.04, not macOS, but it makes sense that this might be caused by some resource limitation. I also faced the same issue on a GTX 1050 Ti (4 GB), but the issue went away when I ran the same architecture on a GTX 1080 Ti (11 GB). Though the environments are not identical between the two systems, I did my best to match them by using a Docker container.

@woshidandan

This problem is generally related to the CUDA version or to GPU memory. If it's the former, the easiest way is to change your CUDA version through Anaconda: if the GPU-memory fixes above don't work, don't worry about which CUDA version the system has installed, just change the CUDA version inside your Anaconda project environment (tested, it works). If it's the latter, you can find ways to solve it in the other answers.

@pathnirvana

pathnirvana commented Jun 6, 2019

If you are still getting this issue, try the following; it worked for me:

tf.config.gpu.set_per_process_memory_growth(True)
tf.config.gpu.set_per_process_memory_fraction(0.4)

tensorflow 2 alpha
cuda 10.0
GTX 1650
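Note for later readers: these tf.config.gpu setters appear to come from the TF 2.0 alpha API and were dropped before the stable 2.0 release; the rough stable TF 2.x equivalent, as later commenters show, is:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)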

@nayash

nayash commented Jun 15, 2019

I have a similar issue: CUDNN_STATUS_ALLOC_FAILED.
I broke my head for 3-4 hours. Finally fixed.
This indeed works, as mentioned above by many:
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

But the key is to write it immediately below "import tensorflow as tf", which I wasn't doing; I had written it after all the other imports.
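A sketch of the ordering being described, assuming the TF 1.x API; the point is that the config and session are created right after TensorFlow is imported, before any other imports or graph-building code run:

import tensorflow as tf

# Configure GPU memory growth first, before anything else touches the GPU.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config)

# Only now do the remaining imports and build the model.
import numpy as np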

@liuzc188

liuzc188 commented Jul 6, 2019

Maybe your tensorflow-gpu version has problems. You should check your own versions and try again: find the matching tensorflow-gpu version number, then uninstall and reinstall it.

@imflash217

imflash217 commented Jul 25, 2019

> it worked for me when adding these lines of code to the beginning of the script @Codersadis
>
> add the following code to the very beginning of the .py file, which solves my problem.
>
> from __future__ import print_function, division
> import tensorflow as tf
> from keras.backend.tensorflow_backend import set_session
> config = tf.ConfigProto()
> config.gpu_options.allow_growth = True
> set_session(tf.Session(config=config))

I am getting the same error with tensorflow-gpu == 1.8.0, cuDNN version 7.0.5 and CUDA 9.1.85, Ubuntu 16.04, even after I add the above suggested solution.
Following is the stack trace:

INFO - Waveunet Training - Running command 'run'
INFO - Waveunet Training - Started
SCRIPT START
EPOCH: 0
Dataset ready!
Training...
Sep_Vars: 10265550
Num of variables65
2019-07-25 05:10:09.872823: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-07-25 05:10:10.286584: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-07-25 05:10:10.286914: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1356] Found device 0 with properties: 
name: Quadro P4000 major: 6 minor: 1 memoryClockRate(GHz): 1.48
pciBusID: 0000:00:05.0
totalMemory: 7.92GiB freeMemory: 7.83GiB
2019-07-25 05:10:10.286964: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2019-07-25 05:10:10.640890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-07-25 05:10:10.640952: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929]      0 
2019-07-25 05:10:10.640968: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0:   N 
2019-07-25 05:10:10.641194: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7566 MB memory) -> physical GPU (device: 0, name: Quadro P4000, pci bus id: 0000:00:05.0, compute capability: 6.1)
2019-07-25 05:10:27.643833: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:94] Filling up shuffle buffer (this may take a while): 2054 of 4000
2019-07-25 05:10:35.917445: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:129] Shuffle buffer filled.
2019-07-25 05:10:36.175698: E tensorflow/stream_executor/cuda/cuda_dnn.cc:455] could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-07-25 05:10:36.175820: E tensorflow/stream_executor/cuda/cuda_dnn.cc:463] possibly insufficient driver version: 384.183.0
2019-07-25 05:10:36.175842: E tensorflow/stream_executor/cuda/cuda_dnn.cc:427] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
2019-07-25 05:10:36.175859: F tensorflow/core/kernels/conv_ops.cc:713] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms) 
Aborted (core dumped)

Please help

@Eugen2525

> I have a similar issue: CUDNN_STATUS_ALLOC_FAILED.
> I broke my head for 3-4 hours. Finally fixed.
> This indeed works, as mentioned above by many:
>
> config = tf.ConfigProto()
> config.gpu_options.allow_growth = True
> session = tf.Session(config=config)
>
> But the key is to write it immediately below "import tensorflow as tf", which I wasn't doing; I had written it after all the other imports.

great reply, worked for me !!

@imflash217

> it worked for me when adding these lines of code to the beginning of the script @Codersadis [...]
>
> I am getting the same error with tensorflow-gpu == 1.8.0, cuDNN version 7.0.5 and CUDA 9.1.85, Ubuntu 16.04, even after I add the above suggested solution. [...]
>
> Please help

Changing the NVIDIA driver to 396+ solved the issue for me.

@nwoyecid

It has to do with the fraction of GPU memory available for TensorFlow to load GPU resources and create the cuDNN handle, known as per_process_gpu_memory_fraction.
Reducing this memory fraction yourself will solve the error.

sess_config = tf.ConfigProto(
    gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=0.7),
    allow_soft_placement=True)

with tf.Session(config=sess_config) as sess:
    sess.run([whatever])

Use as small a fraction as fits in your memory. (In the code above I use 0.7; you can start with 0.3 or even smaller, then increase until you get the same error. That's your limit.)
Pass it to your tf.Session() or tf.train.MonitoredTrainingSession() or your Supervisor's sv.managed_session() as the config.

This should allow your GPU to create a cuDNN handle for your TensorFlow code.

@Kevin-Oudai

I was getting the following error with tensorflow 2.0 in my conda environment.

2019-12-03 23:49:06.381259: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-03 23:49:07.220066: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-03 23:49:07.236411: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-03 23:49:07.247476: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-03 23:49:07.256881: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-03 23:49:07.269536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-03 23:49:07.281954: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-03 23:49:07.295302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-03 23:49:08.589865: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-03 23:49:08.599121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-03 23:49:08.610543: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-03 23:49:08.616005: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4627 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-12-03 23:49:58.521484: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-03 23:49:59.604517: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-03 23:50:04.209110: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-03 23:50:04.216670: E tensorflow/stream_executor/cuda/cuda_dnn.cc:333] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-12-03 23:50:04.226172: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
2019-12-03 23:50:04.234741: E tensorflow/stream_executor/cuda/cuda_dnn.cc:333] Error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
2019-12-03 23:50:04.244958: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node sequential/conv2d/Conv2D}}]]

So I added the following code to my CNN:

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

My output is now

2019-12-04 00:10:07.708573: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudart64_100.dll
2019-12-04 00:10:11.643304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library nvcuda.dll
2019-12-04 00:10:12.753615: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-04 00:10:12.769498: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-04 00:10:12.783900: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-04 00:10:54.941468: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2019-12-04 00:10:55.372516: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties:
name: GeForce GTX 1660 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.59
pciBusID: 0000:01:00.0
2019-12-04 00:10:55.383385: I tensorflow/stream_executor/platform/default/dlopen_checker_stub.cc:25] GPU libraries are statically linked, skip dlopen check.
2019-12-04 00:10:55.406053: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-12-04 00:10:56.741665: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-12-04 00:10:56.747255: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165]      0
2019-12-04 00:10:56.752302: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0:   N
2019-12-04 00:10:56.756861: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 4627 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1660 Ti, pci bus id: 0000:01:00.0, compute capability: 7.5)
2019-12-04 00:11:08.281356: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2019-12-04 00:11:08.934804: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2019-12-04 00:11:11.870237: W tensorflow/stream_executor/cuda/redzone_allocator.cc:312] Internal: Invoking ptxas not supported on Windows
Relying on driver to perform ptx compilation. This message will be only logged once.

As everyone suggested, it is due to TensorFlow using all of the GPU/GPUs. My CNN now trains without error.

@rpvelloso

> I was facing the same problem when using the community-supported version of TensorFlow inside a conda environment (i.e. using conda install tensorflow-gpu).
>
> Turns out this version is not actually good in all situations (even though I've been using it on other machines). The best version to use is the pip-installable version from https://www.tensorflow.org/install/pip inside a conda environment. When I did this everything worked.

That solved it for me, thanks!

@RokoMijic

This also resolved the issue for me.

GeForce GTX 1050, CUDA 10.0

Note: this is the only thing I can find that works in TF 2.0 for now. Thanks!

gpus = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)

@jtiscione

jtiscione commented Jan 4, 2020

> This also resolved the issue for me.
>
> GeForce GTX 1050, CUDA 10.0
>
> Note: this is the only thing I can find that works in TF 2.0 for now. Thanks!
>
> gpus = tf.config.experimental.list_physical_devices('GPU')
> tf.config.experimental.set_memory_growth(gpus[0], True)

This didn't make any difference for me... TF 2.0, RTX 2060, CUDA 10.1, cuDNN 7.6.

This is with 16 GB RAM, 6 GB video memory, and a basic MNIST toy model with one conv layer. No memory problems, just a stack trace.

No GPU problems at all with PyTorch, as usual.

@magomar

magomar commented Jan 10, 2020

In my case, I have two machines, both with RTX 2080 Ti, TF 2.1, CUDA 10.1, cuDNN 7.6. One works, the other raises the aforementioned error. Both machines have the same amount of RAM, 16 GB. There are hardware differences, though, like the CPU. But the problem only occurs when using the GPU.

@panzhufeng
Member

> In my case, I have two machines, both with RTX 2080 Ti, TF 2.1, CUDA 10.1, cuDNN 7.6. One works, the other raises the aforementioned error. Both machines have the same amount of RAM, 16 GB. There are hardware differences, though, like the CPU. But the problem only occurs when using the GPU.

Same platform, same problem

@Samaritan1011001

Samaritan1011001 commented Apr 13, 2020

If you are using the latest TensorFlow and Keras, try this (from here); it worked for me:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # Currently, memory growth needs to be the same across GPUs
    for gpu in gpus:
      tf.config.experimental.set_memory_growth(gpu, True)
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Memory growth must be set before GPUs have been initialized
    print(e)

@nitin-barthwal

> This one works for me.
> physical_devices = tf.config.list_physical_devices('GPU')
> tf.config.experimental.set_memory_growth(physical_devices[0], True)

This worked for me. Thanks

@ypaez

ypaez commented Aug 21, 2020

@Samaritan1011001 your solution works for me thanks a lot.

@kitarikes

@Samaritan1011001 your solution works for me, too! thanks xD !

@nic-00

nic-00 commented Jun 2, 2021

> This problem is generally related to the CUDA version or to GPU memory. If it's the former, the easiest way is to change your CUDA version through Anaconda...

Sorry, may I ask how to change the CUDA version directly inside Anaconda? Thanks a lot!
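For anyone else landing on this question: assuming you let conda manage the CUDA runtime, one common approach is to install a matching toolkit inside the activated environment, e.g. conda install cudatoolkit=10.0 cudnn (the version numbers here are illustrative and must match what your TensorFlow build expects). Installing tensorflow-gpu itself from conda will typically pull in a compatible cudatoolkit automatically.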
