[TF 2.0] constant folding failed: invalid argument: unsupported type: 21 #29525
Comments
I have the same issue on a TF 2.0 GPU beta0 build, and it really hurts performance. |
Hi @vejvarm What kind of performance do you mean? Training speed or accuracy? |
Hi @llan-ml, sorry for not elaborating on that. By performance I mean the training speed. If I remember correctly, with the warning it took about 2 seconds/batch, while without it I'm at 2 to 4 batches/second, so roughly a 4- to 8-fold slowdown with the warning. I'm not sure about a specific number, but it was significant. As for accuracy, I haven't had the time to run the model long enough to see if there is any impact on that. |
@llan-ml I tried reproducing the issue on Colab with the latest tf-nightly-gpu-2.0-preview but I did not get any error. Can you try once and let us know if it is still an issue. Thanks! |
Just tried it, and to my knowledge it is still there as of 2.0.0.dev20190614. It's just not written directly to the cell output, as it is not an error but a warning. It can be found in the runtime logs of the notebook:
|
@gadagashwini I tested with |
Same issue on tf-gpu 1.14 now |
@rmlarsen this looks like a grappler issue, can you triage? |
I am having a similar issue, also on TensorFlow 2.0 beta with GPU enabled. Interestingly, the error differs depending on whether the GPU is hidden from TensorFlow (see the logs below; one common way to hide the GPU is sketched after this comment).
System information
Code to reproduce the issue
Error with GPU enabled
Error with GPU disabled
|
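Since the commenter's exact method is truncated above, here is a minimal sketch of two common ways to hide the GPU from TensorFlow; both APIs exist in the TF 2.0-era releases discussed in this thread:

```python
import os

# Hide all GPUs before TensorFlow initializes CUDA, forcing CPU kernels.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

import tensorflow as tf

# Equivalent in-process API (promoted to tf.config.set_visible_devices
# in later releases):
tf.config.experimental.set_visible_devices([], "GPU")
print(tf.config.experimental.list_logical_devices("GPU"))  # -> []
```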
Hi, I did some additional testing based on my previous bug-yielding example and would like to report on it, in the hope that it may help track down, and ultimately fix, the issue at stake.
Setting and consequences
What I did was get rid of sequence masking for the BiLSTM layers, thus using a less general model that expects batches of same-length sequences. In this case, I no longer encounter the GPU memory leak (at least, not something that would make my computer crash on the first run of fitting the model); however, an optimization warning is raised, and I have no idea whether it relates to the initial issue or not. It shows up both with and without enabling the use of the GPU, and on each use of the model (not just during fitting).
Warning message
Code
In the code below (the original block is not preserved in this thread; a stand-in sketch follows this comment), I allow distinct batches to contain sequences of different lengths; however, I also made a test using a strict parameter (i.e. setting the
I hope this helps solve the initial issue. Please let me know if there is any additional info I can provide or test I can run to help. At the moment, not being able to fit models with LSTM layers using properly masked variable-length sequences is quite an obstacle to putting code into production under TensorFlow 2.0. I know this is the whole point of a beta release (having a not-yet-quite-stable version out to identify issues that need solving before the actual release), but the programming logic has been so greatly altered compared with TF 1.x that it would also be impractical not to start taking the step (getting used to Eager execution demands a substantial effort after having extensively used the low-level placeholder/session API)... |
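As the commenter's code is not preserved here, the following is a minimal stand-in sketch of the kind of setup described: a bidirectional LSTM fed zero-padded, masked variable-length sequences. All sizes and data are illustrative assumptions, not the original values:

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes -- assumptions, not the original values.
vocab_size, embed_dim, units, max_len = 1000, 64, 128, 40

model = tf.keras.Sequential([
    # mask_zero=True propagates a mask over zero-padded timesteps,
    # i.e. the sequence masking the comments above refer to.
    tf.keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Right-padded integer sequences of effectively varying length.
x = np.random.randint(1, vocab_size, size=(32, max_len))
x[:, max_len // 2:] = 0  # zero padding, masked out by the Embedding layer
y = np.random.randint(0, 2, size=(32, 1))
model.fit(x, y, epochs=1)
```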
Note: this issue is quite similar to the newly-opened #30263 |
Additional test/results (sorry for the multiplication of messages - I really want to provide as much info as possible, hoping it can help solve the issue):
Code:
Conclusion:
|
I also ran into this issue when using masking on a GRU/LSTM layer, though running on CPU does not prevent the memory from blowing up and crashing the machine. In fact, even when running on GPU, system memory maxes out, and the printed errors imply that GPU memory has been completely filled as well. Removing the masking, however, allows training to occur without issue, though the "constant folding failed: Invalid argument: Unsupported type: 21" message still occurs. |
Hi, |
Thank you for sharing this. The issue seems to be at the Grappler level, which, if I am not mistaken, is indeed the mechanism that chooses the backend kernel to use, which can be a CuDNN one... |
Interestingly, I am encountering this issue in TF 1.14, in TF 2.0b1 installed through pip, but not in TF 2.0b1 installed from source using the r2.0 branch, and not always in TF 2.0b1 installed from source using yesterday's state of the master branch. Using this issue's code, on the latter installation, I have a distinct bug, namely repeated prints similar to |
Edit: I should note that I am running on the GPU nightly pip build as of the timestamp on this comment.
Another interesting piece of narrowing information: in the code linked below, everything runs without a hitch if the for loop (tf.while_loop behind the scenes) is removed. That is...
Without the for loop: the tf.function routine runs twice, and the code runs ad infinitum.
With the for loop: the tf.function routine runs twice, a graph placement issue arises, and the code breaks.
Here's the code: https://github.com/jkamalu/tensorflow_bugs/blob/master/LSTMGraphPlacement.py
Another thing worth noting is that this issue appears even without the while loop when using TensorFlow GPU distributed strategies, as seen in #29189.
@pandrey-fr a note: if the CuDNN implementation is not important to you (I don't know why it wouldn't be, but just in case), you can wrap the LSTMCell layer in the RNN layer and it works fine... another hint that this error might be in the optimized implementation. |
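The linked file is not reproduced in this thread; as a rough illustration of the pattern described (a Keras recurrent cell stepped inside a tensor-bounded loop that AutoGraph lowers to tf.while_loop), here is a minimal sketch. The sizes and the choice of LSTMCell are assumptions, not the linked code:

```python
import tensorflow as tf

units = 64  # illustrative size
cell = tf.keras.layers.LSTMCell(units)
cell.build(tf.TensorShape([None, units]))  # create weights outside the loop body

@tf.function
def decode(step_input, steps):
    # step_input: [batch, units]; steps: scalar integer tensor.
    batch = tf.shape(step_input)[0]
    state = [tf.zeros([batch, units]), tf.zeros([batch, units])]
    out = step_input
    # AutoGraph lowers this tensor-bounded Python loop into tf.while_loop,
    # the construct the comment above identifies as the trigger.
    for _ in tf.range(steps):
        out, state = cell(out, states=state)
    return out

print(decode(tf.random.normal([8, units]), tf.constant(5)).shape)
```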
The warning should go away in the next nightly. I'm looking into the original issue with unsupported types in constant folding. |
The issue is that the error handling in many places in Grappler is much too conservative. In this case we bail completely out of folding because we fail to convert a constant of an unknown type early. I'll work on making the code more robust in this sense. |
The particular error in this case was due to ZerosLike being overloaded for DT_VARIANT types: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/constant_op.cc#L267 I am submitting a fix now. |
…or DT_RESOURCE. Addresses: #29525 PiperOrigin-RevId: 257703800
Fix submitted: 2417464 |
Great, thank you @rmlarsen! |
As announced by @rmlarsen, the fix (which is now included in the nightly build) removes the error message; however, it appears (at least in my case) that LSTM layers with masking still won't be moved to the GPU (at least when Eager is enabled; I am still trying to figure out whether it is the case with Eager disabled), which is somewhat confusing. Do you have any idea why this is the case? |
Do you mean they won't be moved to the GPU, or that the graph won't be built with the CuDNN implementation? My bootleg LSTM layers (see the sketch after this comment) exist on the GPU with the standard implementation (I verify this by watching nvidia-smi). I use masking (right-padding, so TF v2.0 CuDNN compatible), but end up having to use RNN-wrapped LSTMCell instances, which don't use the CuDNN implementation. It should be noted that in a while loop for dynamic decoding, the GPU-enabled, CuDNN-compatible tf.keras.layers.LSTM implementation does not function, nor does this specific setup work (even without the while loop) on multiple GPUs via a distributed strategy.
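A minimal sketch of the workaround just described, contrasting the fused layer with the RNN-wrapped cell; the unit count and input shapes are illustrative:

```python
import tensorflow as tf

units = 128  # illustrative

# Fused layer: eligible for the CuDNN kernel when the documented
# conditions hold (default activations, no recurrent dropout,
# right-padded masks, ...).
fused = tf.keras.layers.LSTM(units)

# The workaround: the generic RNN wrapper around an LSTMCell always
# uses the standard (non-CuDNN) implementation, but its ops still
# run on the GPU.
wrapped = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(units))

x = tf.random.normal([4, 10, 32])  # [batch, time, features]
print(fused(x).shape, wrapped(x).shape)  # both (4, 128)
```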
To be honest, I am not quite sure... What I did was use a
If you have any advice as to how to properly keep track of where operations are being performed (maybe also when Eager execution is disabled), I would be glad to hear it! |
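One standard way to track op placement in TF 2.x (a generic sketch, not advice given in the thread) is device placement logging:

```python
import tensorflow as tf

# Log the device every op executes on, both in eager mode and inside
# tf.function-compiled graphs (available since TF 2.0).
tf.debugging.set_log_device_placement(True)

a = tf.random.normal([1024, 1024])
b = tf.matmul(a, a)  # the log line shows whether this ran on GPU:0 or CPU:0
```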
Hi @rmlarsen, I just wanted to let you know that errors resembling this one, along with the decrease in speed, were reintroduced by later nightly builds. This isn't a request for a fix (I downgraded to the July 24 nightly and everything works fine now), but I thought you might like to know, just in case it's a simple thing. With the same code (a multi-GPU setting on TF v2 with LSTM)...
On the July 24 build, the model trains quickly on all GPUs, is correct, and gives spurious error messages
On the August 12 build, the model trains on all GPUs and is correct, but takes roughly 50 times longer. Not an exaggeration.
|
Recent versions of TensorFlow Keras will automatically switch between the cuDNN and TensorFlow implementations. The trained parameters work regardless of the selected implementation. The conditions for using the cuDNN implementation are documented at: https://www.tensorflow.org/api_docs/python/tf/keras/layers/LSTM They boil down to: 1. an NVIDIA GPU is available, 2. certain hyperparameters (e.g. activations) are set to specific values. If the cuDNN implementation is selected, this results in a nice speedup. The TensorFlow requirement is bumped to 1.15.0, as this setup fails with 1.14.0 with a constant folding error in Grappler: tensorflow/tensorflow#29525
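To illustrate the conditions listed in the commit message above, a minimal sketch (the unit count is arbitrary):

```python
import tensorflow as tf

# The defaults satisfy the documented cuDNN conditions:
# activation='tanh', recurrent_activation='sigmoid',
# recurrent_dropout=0, unroll=False, use_bias=True.
cudnn_eligible = tf.keras.layers.LSTM(128)

# Changing any of them (here the activation) silently falls back to
# the generic TensorFlow implementation.
generic_only = tf.keras.layers.LSTM(128, activation="relu")
```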
System information
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): tf-nightly-gpu-2.0-preview 2.0.0.dev20190606
Python version: 3.6.5
Code to reproduce the issue
Other info / logs
Print:
Related to #28626.