keras LSTM Fail to find the dnn implementation #36508
Comments
@ARozental Could you please provide the supporting files and complete standalone code so that we can replicate the issue in our environment? |
@Saduf2019
Also, I use Ubuntu 18.04. |
@alonRozental I ran the code [on nightly] after un-commenting the LSTM line and did not face any issues, please find the gist here |
@Saduf2019 I'm running TF 2.1.0.
I would think that those 2 lines should do the same thing (please correct me if I'm wrong) but it seems only the second line works. |
@ARozental I ran the code on nightly ['2.2.0-dev20200210'] and on tensorflow==2.1.0, un-commenting the LSTM line as requested by you and did not face any issues, please find the gist of 2.1.0 here |
@Saduf2019 Then I don't know how to replicate it on Colab; maybe it only occurs with specific hardware (2080 Ti). In any case, can you confirm that those 2 lines should do exactly the same thing? If so, we can look at the difference (which shouldn't exist) between the 2 implementations to find the bug. |
me too
|
@Lay4U @ARozental Please use the below code while importing tensorflow and let me know if the issue still persists. Thanks!
|
@gowthamkpr It doesn't help |
I confirm that it does not help |
Those two lines will build different graphs under the hood, but should produce the same mathematical result. Adding @houtoms from the Nvidia side. Has there been any recent change to the CudnnRNN kernel? |
I wasn't able to reproduce this issue on a GPU Colab either. I think this indicates it's an environment issue; we should probably check the CUDA kernel version. |
From the error log, cuDNN failed to create its handle, so this does not seem to be a cuDNN RNN issue. Can you try some convolution examples to see whether cuDNN is able to create a handle? @ARozental |
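For anyone who wants to try the convolution check suggested above, here is a minimal sketch. It assumes TensorFlow 2.x; the function name `conv_smoke_test` and the tensor sizes are made up for illustration and do not come from this thread. On a GPU machine where cuDNN cannot create its handle, the convolution call would be expected to fail with an error similar to the one reported here. On a CPU-only machine it simply runs on the CPU and says nothing about cuDNN.

```python
def conv_smoke_test():
    """Run one tiny Conv2D forward pass and return the output shape.

    Returns None when TensorFlow is not installed, so the helper can be
    imported anywhere. The name and sizes are illustrative only.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None

    # One 8x8 RGB "image": batch=1, height=8, width=8, channels=3.
    x = tf.random.normal([1, 8, 8, 3])

    # A 3x3 convolution with 4 filters and the default "valid" padding.
    # On a GPU build this is the kind of op that goes through cuDNN.
    layer = tf.keras.layers.Conv2D(filters=4, kernel_size=3)
    y = layer(x)

    return tuple(y.shape)


if __name__ == "__main__":
    print("Conv2D output shape:", conv_smoke_test())
```

If this runs cleanly on the GPU but the LSTM layer still fails, that would point away from a general cuDNN setup problem.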
OK, I managed to make it work after fighting with CUDA 10.1 and 10.2 (10.2 works nicely with the 2.3 nightly), environments, the OS and everything else for a while. I narrowed it down to a seemingly harmless line I was running. Then I managed to make it work more reliably by running this right after importing tensorflow (and other libs, but I don't think that changes anything)
Is this a known bug or some unintended behaviour? |
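The snippet posted at this point did not survive in this copy of the thread. One widely used workaround of this kind, and one that is consistent with the "Physical devices cannot be modified after being initialized" error reported later in the thread, is to enable GPU memory growth immediately after importing TensorFlow. The sketch below is an assumption about what was meant, not a quote from the comment; the helper name `enable_gpu_memory_growth` is hypothetical.

```python
def enable_gpu_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand.

    Returns the number of GPUs that were configured (0 if TensorFlow is
    not installed or no GPU is visible). Hypothetical helper, written as
    a sketch of the workaround discussed in this thread.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0

    configured = 0
    for gpu in tf.config.list_physical_devices("GPU"):
        try:
            # Must run before the GPU is first used by TensorFlow.
            tf.config.experimental.set_memory_growth(gpu, True)
            configured += 1
        except RuntimeError as err:
            # Raised if the device was already initialized, for example
            # because a model or tensor was created earlier in the session.
            print(f"Could not set memory growth for {gpu.name}: {err}")
    return configured


if __name__ == "__main__":
    print("GPUs configured:", enable_gpu_memory_growth())
```

The call has to happen in a fresh Python process or notebook kernel, before any model is built. If the RuntimeError branch is hit, restarting the kernel and running this first is the usual remedy.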
Yes, simply works! Thank you. |
Why is this closed? I got the same error on Ubuntu 20.04 with JupyterLab 2.1.5, TensorFlow 2.2.0 (with GPU) and CUDA 10.1.105 when building a model in JupyterLab using a kernel that has TensorFlow 2.2.0. The only thing that helped is the workaround presented earlier:
Regards, Markus |
Hello, |
This worked for us when getting
Thanks @ElliotVilhelm |
Also worked for me (TF 2.3). Does this mean CUDA was not installed correctly, or is this a TensorFlow bug? |
Worked for me. tks. |
It solved my problem. Using tf 2.2.0 with one 2070s. |
It worked for me, running GRU using TF 2.3.0 with one 2060. Thanks! |
This solves the problem for me as well. |
thx, solved the problem: |
I think a lot of the cuDNN-related problems could be solved by adding this code. |
this solved it for me. What does this do exactly? |
Just getting this: Did that work for you? |
@JamieMoon Just close the terminal/Python console and run the code below first, then your LSTM code:
import tensorflow as tf |
I still haven't solved this issue on Ubuntu 18.04. I have tried compiling with CUDA 10.1 and TF 2.1, but it still fails. It is starting to get a little frustrating. This is what I get when fitting:
Epoch 1/50
.2021-01-25 18:59:35.364099: E tensorflow/stream_executor/cuda/cuda_dnn.cc:329] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
All of my cuDNN and CUDA tests pass. |
Just had the same issue here and managed to fix it with this solution. My setup: |
Same issue here |
RuntimeError: Physical devices cannot be modified after being initialized |
I had the same issue. Updating tensorflow with |
I have the same problem. The solutions above don't work for me. OS: Ubuntu 20.04
|
System information
uncommenting the LSTM layer will yield the following error:
working code: