
TF2.0 beta on RTX 2080 throws Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #29632

Closed
srihari-humbarwadi opened this issue Jun 11, 2019 · 10 comments
Assignees
Labels
comp:gpu GPU related issues stat:awaiting response Status - Awaiting response from author TF 2.0 Issues relating to TensorFlow 2.0 type:bug Bug

Comments

@srihari-humbarwadi
Contributor

Please make sure that this is a build/installation issue. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub. tag:build_template

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/a
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: 2.0.0-beta0
  • Python version: 3.6
  • Installed using virtualenv? pip? conda?: built whl and installed with virtualenv
  • Bazel version (if compiling from source): 0.25.0
  • GCC/Compiler version (if compiling from source): 7.3.0
  • CUDA/cuDNN version: 10.0 / 7.5.0
  • GPU model and memory: RTX 2080, driver 418.40.04

Describe the problem
I am trying to build the TF 2.0 beta on my RTX system. The build succeeds, but I cannot run anything once I install the whl. I keep hitting

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I also tried the prebuilt binary, and it throws the same error. Strangely, 1.13.1 works perfectly, but only if I build it from source.

Provide the exact sequence of commands / steps that you executed before running into the problem

Any other info / logs

Python 3.6.7 (default, Oct 22 2018, 11:32:17) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import tensorflow as tf                                                                                                                                                                             

In [2]: model = tf.keras.applications.ResNet50()                                                                                                                                                            
2019-06-11 14:12:10.599679: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-06-11 14:12:10.654528: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.655435: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:02:00.0
2019-06-11 14:12:10.655756: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-11 14:12:10.656708: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-11 14:12:10.672933: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-11 14:12:10.673320: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-11 14:12:10.674796: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-11 14:12:10.675938: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-11 14:12:10.679002: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-11 14:12:10.679234: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.680150: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.680897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-11 14:12:10.681515: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.682282: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.83
pciBusID: 0000:02:00.0
2019-06-11 14:12:10.682345: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-11 14:12:10.682365: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2019-06-11 14:12:10.682391: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2019-06-11 14:12:10.682412: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2019-06-11 14:12:10.682432: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2019-06-11 14:12:10.682451: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2019-06-11 14:12:10.682470: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-11 14:12:10.682548: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.683363: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.684127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2019-06-11 14:12:10.722429: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2019-06-11 14:12:10.842536: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-11 14:12:10.842661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2019-06-11 14:12:10.842716: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2019-06-11 14:12:10.867592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.868473: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.869242: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 14:12:10.869958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7248 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:02:00.0, compute capability: 7.5)

In [3]: model(tf.random.normal([5, 224, 224, 3]))                                                                                                                                                           
2019-06-11 14:13:21.027798: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-11 14:13:22.315487: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-06-11 14:13:22.318659: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

@srihari-humbarwadi
Contributor Author

@ymodak @jvishnuvardhan

@srihari-humbarwadi
Contributor Author

Tried it on a fresh Ubuntu 18.04 installation and followed the official scripts to install CUDA and cuDNN.
Still facing the same issue.

# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804_10.0.130-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update

# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-410
# Reboot. Check that GPUs are visible using the command: nvidia-smi

# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
    cuda-10-0 \
    libcudnn7=7.4.1.5-1+cuda10.0  \
    libcudnn7-dev=7.4.1.5-1+cuda10.0


# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get update && \
        sudo apt-get install nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0 \
        && sudo apt-get update \
        && sudo apt-get install -y --no-install-recommends libnvinfer-dev=5.0.2-1+cuda10.0
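
After installing, a quick sanity check such as the following (a minimal sketch; it only confirms that TensorFlow detects the GPU and can run a cuDNN-free op, not that cuDNN handles can be created) helps separate driver/CUDA setup problems from the cuDNN issue itself:

import tensorflow as tf

# An empty list here points to a driver or CUDA installation problem.
print('Visible GPUs:', tf.config.experimental.list_physical_devices('GPU'))

# A plain matmul does not need cuDNN, so it can succeed even when
# convolution ops later fail with CUDNN_STATUS_INTERNAL_ERROR.
with tf.device('/GPU:0'):
    print(tf.matmul(tf.random.normal([2, 2]), tf.random.normal([2, 2])))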

@srihari-humbarwadi
Contributor Author

The only workaround I could find is this

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

But is there any other way to fix this so that I don't have to use this workaround? This doesn't seem normal to me.
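
For reference, the same allow-growth behaviour can apparently also be enabled without touching the model code, via the TF_FORCE_GPU_ALLOW_GROWTH environment variable (a sketch assuming a TF 2.x build that honours this variable):

import os

# Must be set before TensorFlow creates its GPU devices (i.e. before the first GPU op).
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

import tensorflow as tf

model = tf.keras.applications.ResNet50()
model(tf.random.normal([5, 224, 224, 3]))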

@ymodak ymodak self-assigned this Jun 11, 2019
@ymodak ymodak added the comp:gpu GPU related issues label Jun 11, 2019
@ymodak ymodak assigned caisq and unassigned ymodak Jun 11, 2019
@ymodak ymodak added stat:awaiting tensorflower Status - Awaiting response from tensorflower type:bug Bug labels Jun 11, 2019
@ymodak ymodak assigned aaroey and unassigned caisq Aug 30, 2019
@inkcherry

The only workaround I could find is this

physical_devices = tf.config.experimental.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)

But is there any other way to fix this so that I don't have to use this workaround? This doesn't seem normal to me.

I have the same problem, and I have 3 GPUs. With your approach:
tf.config.experimental.set_memory_growth(physical_devices[0], True)
I got ValueError: Memory growth cannot differ between GPU devices

So I then set:
tf.config.experimental.set_memory_growth(physical_devices[0], True)
tf.config.experimental.set_memory_growth(physical_devices[1], True)
tf.config.experimental.set_memory_growth(physical_devices[2], True)

and that solved the problem. Thanks a lot.
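
In other words, memory growth apparently has to be set consistently across every visible GPU. A loop over list_physical_devices (a minimal sketch of the same workaround, not tied to a specific GPU count) avoids hard-coding the device indices:

import tensorflow as tf

physical_devices = tf.config.experimental.list_physical_devices('GPU')
# Memory growth must match across GPUs and be set before any GPU is initialized.
for gpu in physical_devices:
    tf.config.experimental.set_memory_growth(gpu, True)

This should run right after importing TensorFlow, before the first GPU operation, since the setting cannot be changed once the devices are initialized.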

@jvishnuvardhan
Contributor

jvishnuvardhan commented Sep 26, 2019

@srihari-humbarwadi Sorry for the delay. Is this still an issue, or was it resolved? Could you check with TF 2.0 and let us know whether the issue persists with the latest TF version? Thanks!

@jvishnuvardhan jvishnuvardhan added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Sep 26, 2019
@jvishnuvardhan
Contributor

Automatically closing this out since I understand it to be resolved, but please let me know if I'm mistaken. Thanks!

@tensorflow-bot

tensorflow-bot bot commented Oct 7, 2019

Are you satisfied with the resolution of your issue?

@nikste
Contributor

nikste commented Feb 11, 2020

I still experience this issue with an RTX 2080 on the latest TensorFlow version, with CUDA 10.1 and cuDNN 7.6.

@farahmand-m

farahmand-m commented Jul 24, 2020

I built TensorFlow 2.3 from source on Arch Linux, using CUDA 10.2 and cuDNN 7.6, and I get a similar error:

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR

I have a GeForce 1660 Ti; the latest driver is installed, and PyTorch can use the libraries without any problem.
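
A quick way to double-check that the CUDA/cuDNN stack itself is usable, independently of TensorFlow, is a PyTorch probe along these lines (assuming PyTorch is built against the same CUDA/cuDNN versions):

import torch

# These exercise the same driver/CUDA/cuDNN libraries that TensorFlow loads.
print('CUDA available:', torch.cuda.is_available())
print('cuDNN version:', torch.backends.cudnn.version())
if torch.cuda.is_available():
    print('Device:', torch.cuda.get_device_name(0))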

@majnas

majnas commented Nov 18, 2020

physical_devices = tf.config.experimental.list_physical_devices('GPU')
for pd in physical_devices:
    tf.config.experimental.set_memory_growth(pd, True)
