
Fail to find the dnn implementation while using recurrent layers #45248

Closed
Avditvs opened this issue Nov 28, 2020 · 9 comments

Comments


Avditvs commented Nov 28, 2020

System information

  • OS Platform and Distribution: Ubuntu 18.04 running in WSL2
  • TensorFlow version: 2.3.1
  • Python version: 3.6.9
  • CUDA/cuDNN version: CUDA 10.1 / cuDNN 7.6.5.32
  • GPU model and memory: RTX 2060 6GB

Current behavior
I want to train a model containing Keras LSTM layers, but the following error occurs:

Jupyter output:
UnknownError: Fail to find the dnn implementation. [[{{node CudnnRNN}}]] [[sequential_2/lstm_1/PartitionedCall]] [Op:__inference_train_function_5270]

Console output:
OP_REQUIRES failed at cudnn_rnn_ops.cc:1510 : Unknown: Fail to find the dnn implementation.

Expected behavior
I expect the code to run, since I am able to run Conv2D layers which are properly accelerated by the GPU.
I have already tried multiple things, such as using different TensorFlow/CUDA/cuDNN versions.
I also tried enabling memory growth as described in #36508, but it did not work either (a minimal sketch of that workaround is shown below).
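
For reference, a minimal sketch of the memory-growth workaround from #36508 (assuming a TF 2.x API; this is not the exact code from that issue):

import tensorflow as tf

# Ask TensorFlow to allocate GPU memory incrementally instead of
# reserving it all up front; this must run before any op touches the GPU.
gpus = tf.config.experimental.list_physical_devices('GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)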

Standalone code to reproduce the issue
The environment was set up by following the installation instructions (without installing the NVIDIA driver inside the VM, as mentioned in the NVIDIA documentation): https://www.tensorflow.org/install/gpu#install_cuda_with_apt

I was able to reproduce this issue by running the RNN tutorial from the TensorFlow documentation: https://www.tensorflow.org/guide/keras/rnn (a minimal stand-in that triggers the same code path is sketched below).
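
The tutorial itself is linked above; as a minimal stand-in (with hypothetical shapes and layer sizes), something like the following exercises the same cuDNN-backed LSTM kernel that fails:

import numpy as np
import tensorflow as tf

# A tiny LSTM model; with default layer arguments Keras dispatches to
# the fused CudnnRNN kernel on GPU, which is where the error surfaces.
model = tf.keras.Sequential([
    tf.keras.layers.LSTM(64, input_shape=(10, 8)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# Random dummy data just to trigger a training step.
x = np.random.rand(32, 10, 8).astype('float32')
y = np.random.rand(32, 1).astype('float32')
model.fit(x, y, epochs=1)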

I would appreciate any help to solve this issue.


Avditvs commented Nov 28, 2020

I finally solved the problem by reverting the NVIDIA driver installed on Windows from 465 to 460, as mentioned in the note in the NVIDIA documentation: https://docs.nvidia.com/cuda/wsl-user-guide/index.html

Since using Windows Subsystem for Linux is becoming more and more common, why not add a section to the documentation on setting up TensorFlow with CUDA inside WSL?

@ravikyram ravikyram added comp:gpu GPU related issues TF 2.3 Issues related to TF 2.3 labels Nov 29, 2020
@ravikyram ravikyram added type:build/install Build and install issues and removed type:bug Bug labels Nov 29, 2020
@mihaimaruseac
Collaborator

We are currently discussing moving towards Windows GPU support only via WSL.

@jvishnuvardhan jvishnuvardhan added the stat:awaiting tensorflower Status - Awaiting response from tensorflower label Dec 8, 2020

ICG14 commented Jan 26, 2021

I still have not been able to solve this issue.
I have tried everything mentioned above, but the problem remains the same.
My setup is:

Ubuntu 18.04
CUDA 10.0
TensorFlow 2.0
NVIDIA driver 460 (I have also tried 450, which does not work either)
GeForce RTX 2060
Python 3.7

I have also tried building with CUDA 10.1 and TF 2.1, but that did not solve it either. It is starting to get a little frustrating.

This is what I obtain after fitting:

Epoch 1/50
2021-01-25 18:59:34.964218: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-25 18:59:35.096029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
128/2156 [>.............................] - ETA: 15sWARNING:tensorflow:Can save best model only with val_loss available, skipping.

.2021-01-25 18:59:35.364099: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-01-25 18:59:35.364136: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at cudnn_rnn_ops.cc:1510 : Unknown: Fail to find the dnn implementation.
2021-01-25 18:59:35.364158: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
2021-01-25 18:59:35.364356: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: {{function_node __forward_cudnn_lstm_with_fallback_2517_specialized_for_sequential_lstm_StatefulPartitionedCall_at___inference_distributed_function_3196}} {{function_node __forward_cudnn_lstm_with_fallback_2517_specialized_for_sequential_lstm_StatefulPartitionedCall_at___inference_distributed_function_3196}} Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
[[sequential/lstm/StatefulPartitionedCall]]

All of the cuDNN and CUDA tests pass.
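
Not part of the original report, but a quick way to sanity-check what TensorFlow itself sees (TF 2.x API), independent of the system-level CUDA/cuDNN samples:

import tensorflow as tf

# Confirm the installed wheel was built with CUDA and that the GPU is
# visible to TensorFlow, not just to the system tools.
print("Built with CUDA:", tf.test.is_built_with_cuda())
print("Visible GPUs:", tf.config.experimental.list_physical_devices('GPU'))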


pupscub commented May 16, 2021

> (quoting @ICG14's comment above)

Did you find any solution?
I have the same system configuration and I am facing the same issue while running my DL model.

@sanatmpa1

@iamMOY,

Can you take a look at this link for the tested build configurations, update to the latest stable version (TF 2.6.0), and create a new issue if you still face the problem? Thanks!
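
After upgrading (e.g. pip install tensorflow==2.6.0), a short check that the expected version is installed and the GPU is still visible (assuming a TF 2.x environment):

import tensorflow as tf

# Print the installed TensorFlow version and the GPUs it can see.
print(tf.__version__)
print(tf.config.list_physical_devices('GPU'))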

@sanatmpa1 sanatmpa1 self-assigned this Sep 1, 2021
@sanatmpa1

@Avditvs,

Since the problem was fixed after you downgraded the NVIDIA driver, can you confirm that we are good to close this issue? Thanks!

@sanatmpa1 sanatmpa1 added stat:awaiting response Status - Awaiting response from author and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Sep 1, 2021
@google-ml-butler

This issue has been automatically marked as stale because it has no recent activity. It will be closed if no further activity occurs. Thank you.

@google-ml-butler google-ml-butler bot added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Sep 8, 2021
@google-ml-butler

Closing as stale. Please reopen if you'd like to work on this further.

