keras LSTM Fail to find the dnn implementation #36508

Closed
ARozental opened this issue Feb 6, 2020 · 50 comments
Labels: comp:keras (Keras related issues), TF 2.1 (for tracking issues in 2.1 release), type:bug (Bug)

Comments

@ARozental

System information

  • CUDA/cuDNN version: 10.1
  • GPU model and memory: GeForce RTX 2080
  • TensorFlow version: 2.1.0

Uncommenting the LSTM layer yields the following error:

UnknownError:  [_Derived_]  Fail to find the dnn implementation.
	 [[{{node CudnnRNN}}]]
	 [[sequential_6/bidirectional_2/backward_lstm_3/StatefulPartitionedCall]]
	 [[Reshape_11/_38]] [Op:__inference_distributed_function_39046]

Working code (with the LSTM layer commented out):

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    #tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset, 
                    validation_steps=30)
Saduf2019 assigned Saduf2019 and unassigned ravikyram Feb 7, 2020
Saduf2019 added the comp:keras (Keras related issues) and TF 2.1 (for tracking issues in 2.1 release) labels Feb 7, 2020
@Saduf2019
Contributor

@ARozental Could you please provide us with supporting files and complete standalone code to replicate the issue in our environment?

@alonRozental

alonRozental commented Feb 9, 2020

@Saduf2019
The code is from one of the official TF tutorials. The working version is attached here; uncommenting the LSTM line raises the error:

from __future__ import absolute_import, division, print_function, unicode_literals
import os
import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.python.client import device_lib

dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,
                          as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']

BUFFER_SIZE = 10000
BATCH_SIZE = 64

train_dataset = train_dataset.shuffle(BUFFER_SIZE)
train_dataset = train_dataset.padded_batch(BATCH_SIZE, train_dataset.output_shapes)

test_dataset = test_dataset.padded_batch(BATCH_SIZE, test_dataset.output_shapes)
encoder = info.features['text'].encoder


model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    #tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy',
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
history = model.fit(train_dataset, epochs=10,
                    validation_data=test_dataset, 
                    validation_steps=30)

Also, I use Ubuntu 18.04.
Thanks.

@Saduf2019
Contributor

Saduf2019 commented Feb 10, 2020

@alonRozental I ran the code [on nightly] after un-commenting the LSTM line and did not face any issues. Please find the gist here.

Saduf2019 added the stat:awaiting response (Status - Awaiting response from author) label Feb 10, 2020
@alonRozental

alonRozental commented Feb 10, 2020

@Saduf2019 I'm running TF 2.1.0.
I don't think the problem exists in TF1, which is used in the notebook.
Also, making the following change makes the code work:

    #tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Bidirectional(tf.keras.layers.RNN(tf.keras.layers.LSTMCell(64))),

I would think that those 2 lines should do the same thing (please correct me if I'm wrong) but it seems only the second line works.

@Saduf2019
Contributor

@ARozental I ran the code on nightly ['2.2.0-dev20200210'] and on tensorflow==2.1.0, un-commenting the LSTM line as you requested, and did not face any issues. Please find the gist for 2.1.0 here.

@alonRozental

@Saduf2019 Then I don't know how to replicate it on Colab; maybe it only occurs with specific hardware (ti 2080). In any case, can you confirm that those 2 lines should do the exact same thing? If so, we can look at the difference (which shouldn't exist) between the 2 implementations to find the bug.

Saduf2019 added the type:bug (Bug) label and removed the stat:awaiting response (Status - Awaiting response from author) label Feb 11, 2020
Saduf2019 assigned gowthamkpr and unassigned Saduf2019 Feb 11, 2020
@Lay4U

Lay4U commented Feb 11, 2020

me too

tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED
2020-02-12 04:48:50.916938: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at cudnn_rnn_ops.cc:1510 : Unknown: Fail to find the dnn implementation.
2020-02-12 04:48:50.923690: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Fail to find the dnn implementation.
         [[{{node CudnnRNN}}]]
2020-02-12 04:48:50.931195: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: {{function_node __inference_cudnn_lstm_with_fallback_1954_specialized_for_sequential_1_lstm_StatefulPartitionedCall_at___inference_distributed_function_2139}} {{function_node __inference_cudnn_lstm_with_fallback_1954_specialized_for_sequential_1_lstm_StatefulPartitionedCall_at___inference_distributed_function_2139}} Fail to find the dnn implementation.
         [[{{node CudnnRNN}}]]
         [[sequential_1/lstm/StatefulPartitionedCall]]

@gowthamkpr

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
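
For context on the snippet above: by default TensorFlow maps nearly all GPU memory at startup, which is one plausible reason the cuDNN handle allocation in the logs fails. Memory growth makes it allocate on demand instead. The snippet indexes `physical_devices[0]`, which raises an IndexError when no GPU is visible and leaves other GPUs untouched. A minimal sketch that covers every visible GPU is below; `enable_memory_growth_on_all_gpus` is a hypothetical helper name, not a TensorFlow API.

```python
def enable_memory_growth_on_all_gpus():
    """Turn on memory growth for every visible GPU and return how many were set."""
    # Imported here so this module can still be loaded where TensorFlow is missing.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices('GPU')
    for gpu in gpus:
        # Raises RuntimeError if the GPUs have already been initialized,
        # so call this before building any model or tensor.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Call it once, right after the imports and before any other TensorFlow call. A return value of 0 means TensorFlow did not see a GPU at all, which points to a driver or CUDA installation problem rather than a memory setting.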

gowthamkpr added the stat:awaiting response (Status - Awaiting response from author) label Feb 11, 2020
@alonRozental

@gowthamkpr It doesn't help

@olalakul

olalakul commented Mar 1, 2020

I confirm that it does not help

@Yougigun

Yougigun commented Mar 2, 2020

I confirm that it does not help

gowthamkpr removed the stat:awaiting response (Status - Awaiting response from author) label Mar 4, 2020
gowthamkpr assigned qlzh727 and unassigned gowthamkpr Mar 4, 2020
gowthamkpr added the stat:awaiting tensorflower (Status - Awaiting response from tensorflower) label Mar 4, 2020
@qlzh727
Member

qlzh727 commented Mar 4, 2020

@Saduf2019 I'm running TF 2.1.0.
I don't think the problem exists in TF1 which is used in the notebook.
also making the following change makes the code work:

    #tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Bidirectional(tf.keras.layers.RNN(tf.keras.layers.LSTMCell(64))),

I would think that those 2 lines should do the same thing (please correct me if I'm wrong) but it seems only the second line works.

Those two lines build different graphs under the hood, but should produce the same mathematical result.
The first line uses the cuDNN kernel if a GPU is available, whereas the second line uses the generic kernel on the GPU.

Adding @houtoms from the NVIDIA side. Has there been any recent change to the CudnnRNN kernel?
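
For readers wondering when the first line takes the cuDNN path: the Keras documentation for `tf.keras.layers.LSTM` lists constructor settings that are required for the fused kernel. The sketch below checks a set of keyword arguments against those settings. `lstm_kwargs_allow_cudnn` is a hypothetical helper, not a TensorFlow API, and it ignores the runtime conditions (a visible GPU, right-padded masks, eager execution in the outermost context).

```python
# Hypothetical helper (not part of TensorFlow): compares LSTM constructor
# arguments with the cuDNN requirements listed in the Keras LSTM documentation.
CUDNN_REQUIRED = {
    "activation": "tanh",
    "recurrent_activation": "sigmoid",
    "recurrent_dropout": 0,
    "unroll": False,
    "use_bias": True,
}


def lstm_kwargs_allow_cudnn(**kwargs):
    """Return True if the given LSTM kwargs keep the layer cuDNN-eligible."""
    for name, required in CUDNN_REQUIRED.items():
        # Arguments that are not passed keep their defaults,
        # and the defaults satisfy the requirements.
        if kwargs.get(name, required) != required:
            return False
    return True


print(lstm_kwargs_allow_cudnn(units=64))
print(lstm_kwargs_allow_cudnn(units=64, recurrent_dropout=0.2))
```

With the default arguments used in this issue, `LSTM(64)` stays eligible for the cuDNN kernel, which is why only that line triggers the `CudnnRNN` error while `RNN(LSTMCell(64))` does not.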

@qlzh727
Member

qlzh727 commented Mar 4, 2020

I wasn't able to reproduce this issue on a GPU Colab either. I think this indicates it's an environment issue; we should probably check the CUDA kernel version.

@kaixih
Contributor

kaixih commented Mar 5, 2020

From the error log, cuDNN didn't successfully create the handle, so this doesn't seem to be a cuDNN RNN issue. Can you try some convolution examples to see whether cuDNN is able to create a handle? @ARozental
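
The diagnosis above rests on the "Could not create cudnn handle" line that appears before the `CudnnRNN` failure in the posted logs. A small sketch for pulling that status code out of a captured log is below. `cudnn_handle_failures` is a hypothetical helper, and the sample text is copied from the log earlier in this thread.

```python
import re

# Matches the handle-creation failure that precedes the CudnnRNN error.
HANDLE_ERROR = re.compile(r"Could not create cudnn handle: (CUDNN_STATUS_\w+)")


def cudnn_handle_failures(log_text):
    """Return the cuDNN status codes reported for failed handle creation."""
    return HANDLE_ERROR.findall(log_text)


sample = (
    "tensorflow/stream_executor/cuda/cuda_dnn.cc:329] "
    "Could not create cudnn handle: CUDNN_STATUS_ALLOC_FAILED\n"
    "OP_REQUIRES failed at cudnn_rnn_ops.cc:1510 : Unknown: "
    "Fail to find the dnn implementation.\n"
)
print(cudnn_handle_failures(sample))
```

If the list is non-empty, the failure happened while creating the cuDNN handle, before any RNN-specific code ran.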

@sousandrei

sousandrei commented Jun 6, 2020

OK, I managed to make it work after fighting for a while with CUDA 10.1 and 10.2 (10.2 works nicely with the 2.3 nightly), environments, the OS and everything.

I narrowed it down to a seemingly harmless line.

I was running tf.test.gpu_device_name() to check that there was a GPU and print its name. That command, when run at any time, makes the model fail during training with the error mentioned above: Unknown: Fail to find the dnn implementation

The tf.config.experimental.set_visible_devices command that @shaoeChen mentioned didn't change anything for me so I removed it.

I managed to make it work more reliably by running this right after importing tensorflow (and other libs, but I don't think that changes anything):

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_memory_growth(device=gpus[0], enable=True)

Is this a known bug or some unintended behaviour?

geetachavan1 added this to Done in TensorFlow 2.3.0 Jun 9, 2020
@Verythai

Just a heads-up: I had this error, but I also noticed this error in the output:

Loaded runtime CuDNN library: 7.1.3 but source was compiled with: 7.6.4.  CuDNN library major and minor version needs to match or have higher minor version in case of CuDNN 7.0 or later version.

Resolved by updating my conda env with

conda install -c anaconda cudnn

Yes, simply works! Thank you.
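
The compatibility rule in that message can be checked mechanically. The sketch below parses the log line and applies the rule as the message words it for cuDNN 7.0 or later: same major version, and a runtime minor version at least as high as the compiled one. `cudnn_versions_compatible` is a hypothetical helper, not a TensorFlow function.

```python
import re

# Pattern for the version-mismatch line quoted above.
VERSION_LINE = re.compile(
    r"Loaded runtime CuDNN library: (\d+)\.(\d+)\.(\d+) "
    r"but source was compiled with: (\d+)\.(\d+)\.(\d+)"
)


def cudnn_versions_compatible(log_line):
    """Return True/False for a version line, or None if the line does not match."""
    match = VERSION_LINE.search(log_line)
    if match is None:
        return None
    rt_major, rt_minor, _, built_major, built_minor, _ = map(int, match.groups())
    # Rule from the message: majors must match and the runtime minor
    # version must be equal to or higher than the compiled minor version.
    return rt_major == built_major and rt_minor >= built_minor


line = "Loaded runtime CuDNN library: 7.1.3 but source was compiled with: 7.6.4."
print(cudnn_versions_compatible(line))
```

For the line quoted in this comment the check returns False (runtime minor version 1 is lower than compiled minor version 6), which is consistent with the fix of installing a newer cuDNN.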

@paapu88

paapu88 commented Jul 11, 2020

Why is this closed? I got the same error on Ubuntu 20.04 with jupyterlab '2.1.5', tensorflow 2.2.0 (with GPU) and CUDA version 10.1.105, when building a model in jupyter-lab using a kernel that has tensorflow 2.2.0.

The only thing that helped was the workaround presented earlier:

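import tensorflow
from tensorflow.keras.layers import RNN, LSTMCell

# feature_count and seq_len are assumed to be defined earlier in the notebook.
def build_model(feature_count=feature_count, seq_len=seq_len):
    inputs = tensorflow.keras.Input(shape=(seq_len, feature_count))
    # RNN(LSTMCell(...)) uses the generic kernel rather than the cuDNN kernel.
    X = RNN(LSTMCell(units=seq_len), input_shape=(seq_len, feature_count), return_sequences=True, stateful=False)(inputs)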

Regards, Markus

@wbadry

wbadry commented Jul 14, 2020

@Lay4U @ARozental Please use the below code while importing TensorFlow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

Hello,
Thanks @gowthamkpr 👍
This solved my problem. My configuration is:
OS: Windows 10 x64
Python : 3.6
TensorFlow-GPU : 2.2.0
Cuda : 10.1
Cudnn : 7.6.5

@paulmwatson

conda install -c anaconda cudnn

This worked for us when getting

tensorflow.python.framework.errors_impl.UnknownError:  [_Derived_]  Fail to find the dnn implementation.

Thanks @ElliotVilhelm

@huydhoang

gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_visible_devices(devices=gpus[1], device_type='GPU')
tf.config.experimental.set_memory_growth(device=gpus[1], enable=True)

above work for me.

also worked for me (tf 2.3). Does this mean CUDA was not installed correctly or is this a tensorflow bug?

@marcosclima

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

Worked for me. tks.
Running BiLSTM on TF2.1 with two 2080S

@ziliangok

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

It solved my problem. Using tf 2.2.0 with one 2070s.

@vaecole

vaecole commented Oct 14, 2020

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

It worked for me, running GRU using TF 2.3.0 with one 2060. Thanks!

@TaWeiYeh

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

This solves the problem for me as well.
OS: Ubuntu 18.04
Python : 3.6.9
TensorFlow-GPU : 2.3.0
Cuda : 10.1
Cudnn : 7.6.5

@RRSBG

RRSBG commented Nov 16, 2020

thx, solved the problem:
linux mint 20
geforce RTX 2060

@leimao

leimao commented Dec 12, 2020

I think a lot of the cuDNN-related problems could be solved by adding this code.
https://leimao.github.io/blog/TensorFlow-cuDNN-Failure/

@3d-illusions

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

this solved it for me. What does this do exactly?

@JamieMoon

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

Just getting this:
RuntimeError: Physical devices cannot be modified after being initialized

Did that work for you?

@shubhamdo

@JamieMoon Just close the terminal/python console and run the below code first, then your LSTM

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)
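
The RuntimeError above occurs because memory growth has to be set before the GPUs are initialized, which is hard to guarantee in a long-running console or notebook. The TensorFlow GPU guide describes an environment variable with the same effect. The sketch below assumes the installed TensorFlow build honors `TF_FORCE_GPU_ALLOW_GROWTH`, and it still needs a fresh Python process.

```python
import os

# Assumption: the installed TensorFlow build honors this variable.
# It must be set before TensorFlow initializes the GPU.
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

# Import TensorFlow only after this point, for example:
# import tensorflow as tf
```

Setting the variable in the shell before launching Python or Jupyter achieves the same thing without touching the script.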

@ICG14

ICG14 commented Jan 25, 2021

I still haven't solved this issue.
I have tried everything mentioned above, but the problem persists.
My setup is:

Ubuntu 18.04
CUDA 10.0
Tensorflow 2.0
Nvidia-driver 460 (Although I have tried with 450 and it also does not work)
geForce RTX2060
Python 3.7

I have tried compiling with CUDA 10.1 and TF 2.1, but I still haven't solved it. It is starting to get a little frustrating.

This is what I obtain after fitting:

Epoch 1/50
2021-01-25 18:59:34.964218: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-01-25 18:59:35.096029: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
128/2156 [>.............................] - ETA: 15sWARNING:tensorflow:Can save best model only with val_loss available, skipping.

.2021-01-25 18:59:35.364099: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2021-01-25 18:59:35.364136: W tensorflow/core/framework/op_kernel.cc:1655] OP_REQUIRES failed at cudnn_rnn_ops.cc:1510 : Unknown: Fail to find the dnn implementation.
2021-01-25 18:59:35.364158: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
2021-01-25 18:59:35.364356: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Unknown: {{function_node __forward_cudnn_lstm_with_fallback_2517_specialized_for_sequential_lstm_StatefulPartitionedCall_at___inference_distributed_function_3196}} {{function_node __forward_cudnn_lstm_with_fallback_2517_specialized_for_sequential_lstm_StatefulPartitionedCall_at___inference_distributed_function_3196}} Fail to find the dnn implementation.
[[{{node CudnnRNN}}]]
[[sequential/lstm/StatefulPartitionedCall]]

All cuDNN and CUDA tests work well.

@marcelmotta

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

Just had the same issue here, managed to fix with this solution

My setup:
Windows 10
CUDA 11.2
Tensorflow 2.3
Nvidia Driver 460.x
Geforce RTX 2060
Python 3.8

@trifwn

trifwn commented Apr 22, 2021

Same issue here
I tried all the aforementioned solutions. None seems to resolve the issue

@this-is-shashank

@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!

import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], enable=True)

RuntimeError: Physical devices cannot be modified after being initialized

@frankl1

frankl1 commented Oct 4, 2021

I had the same issue. Updating tensorflow with pip install -U tensorflow solved it

@cloudy-sfu

I have the same problem. The solutions above don't work for me.

OS: ubuntu 20.04
Python: 3.11
Tensorflow: 2.12.0
cuda: 11.8

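import tensorflow as tf
from sklearn.model_selection import train_test_split  # used below; x and y are the poster's own data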

l0 = tf.keras.layers.Input(shape=(x.shape[1], x.shape[2]))
_, l1_h_t, _ = tf.keras.layers.LSTM(64, return_state=True)(l0)
l2 = tf.keras.layers.Dense(128, activation='relu')(l1_h_t)
l3 = tf.keras.layers.Dense(128, activation='relu')(l2)
l5 = tf.keras.layers.Dense(32, activation='relu')(l3)
l6 = tf.keras.layers.Dense(1, activation='linear')(l5)
my_model = tf.keras.Model(l0, l6)
my_model.compile(optimizer='adam', loss='mse')
tensorboard = tf.keras.callbacks.TensorBoard(log_dir=f'raw/4_tensorboard/')
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=60)
save_best = tf.keras.callbacks.ModelCheckpoint('raw/4_lstm.h5', monitor='val_loss', save_best_only=True)
x_train, x_valid, y_train, y_valid = train_test_split(x, y, train_size=0.8, random_state=974238)
history = my_model.fit(
    x_train, y_train, validation_data=(x_valid, y_valid),
    epochs=1000, batch_size=1200, callbacks=[stop_early, save_best, tensorboard]
)
y_hat = my_model.predict(x)
