tf.while_loop with tf.keras.layers.LSTM broken

**System information**
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
- Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.0 nightly preview (July 12 build, Python 3.6, GPU)
- Python version: 3.6
- CUDA/cuDNN version: 10/7
- GPU model and memory: 3 GeForce GTX GPUs with 8 GB

**Describe the current behavior**

First, note that the LSTM not working with distribution strategies is already being looked into here: https://github.com/tensorflow/tensorflow/issues/29189. I am filing this as a separate issue because it likely has a different cause.

When dynamically decoding a sequence with tf.keras.layers.LSTM inside tf.while_loop, the code fails (see the logs below for details). The failure does not occur with an RNN(LSTMCell) configuration, but LSTM is the only layer that gives access to the CuDNN implementation, aside from GRU (which also fails in this configuration).
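
For readers who cannot open the linked script, the configuration described above can be sketched roughly as follows. This is not the linked reproduction script: the sizes, the `project` layer that feeds each output back in as the next input, and the `decode` function are assumptions made for illustration. As written, the sketch uses the RNN(LSTMCell) variant that is reported to work.

```python
import tensorflow as tf

BATCH, FEATURES, UNITS, STEPS = 2, 4, 8, 5  # illustrative sizes only

# Configuration the report says works. Swapping this line for
# tf.keras.layers.LSTM(UNITS, return_state=True) gives the fused-layer
# variant that the report describes as failing.
decoder = tf.keras.layers.RNN(tf.keras.layers.LSTMCell(UNITS), return_state=True)
# Hypothetical projection that turns each output into the next input.
project = tf.keras.layers.Dense(FEATURES)

# Call both layers once so their variables exist before the loop is traced.
_out, _h, _c = decoder(tf.zeros([BATCH, 1, FEATURES]))
project(_out)


@tf.function
def decode(first_input):
    """Decode STEPS steps, feeding each projected output back as input."""
    results = tf.TensorArray(tf.float32, size=STEPS)
    h0 = tf.zeros([BATCH, UNITS])
    c0 = tf.zeros([BATCH, UNITS])

    def cond(step, x, h, c, results):
        return step < STEPS

    def body(step, x, h, c, results):
        # The layer expects [batch, time, features], so add a time axis of 1.
        out, h, c = decoder(x[:, tf.newaxis, :], initial_state=[h, c])
        x = project(out)
        return step + 1, x, h, c, results.write(step, x)

    _, _, _, _, results = tf.while_loop(
        cond, body, (tf.constant(0), first_input, h0, c0, results))
    # TensorArray.stack() is time-major; return [batch, time, features].
    return tf.transpose(results.stack(), [1, 0, 2])


decoded = decode(tf.zeros([BATCH, FEATURES]))
```

Both layer types accept the same `initial_state` argument and return `(output, h, c)` when `return_state=True`, so only the line that constructs `decoder` needs to change to compare the two configurations.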

**Describe the expected behavior**

The code should use the optimized CuDNN LSTM implementation and behave like the RNN(LSTMCell) approach, i.e., it should not fail.

**Code to reproduce the issue**
https://github.com/jkamalu/tensorflow_bugs/blob/master/LSTMGraphPlacement.py

**Other info / logs**

2019-07-12 11:37:10.386140: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1558] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set.  If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU.  To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-07-12 11:38:30.548248: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] function_optimizer failed: Invalid argument: Input 1 of node se_q3/seq_encoder/while/body/_195/TensorListPushBack_49 was passed int32 from se_q3/seq_encoder/while/body/_195/decoder_c/lstm_3/StatefulPartitionedCall:9 incompatible with expected variant.
2019-07-12 11:38:37.853257: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] function_optimizer failed: Invalid argument: Input 1 of node se_q3/seq_encoder/while/body/_195/TensorListPushBack_49 was passed int32 from se_q3/seq_encoder/while/body/_195/decoder_c/lstm_3/StatefulPartitionedCall:9 incompatible with expected variant.
2019-07-12 11:38:39.689929: W tensorflow/core/common_runtime/process_function_library_runtime.cc:672] Ignoring multi-device function optimization failure: Invalid argument: Input 1 of node se_q3/seq_encoder/while/body/_195/TensorListPushBack_77 was passed int32 from se_q3/seq_encoder/while/body/_195/decoder_c/lstm_2/StatefulPartitionedCall:9 incompatible with expected variant.
2019-07-12 11:38:45.280991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-07-12 11:38:45.755520: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at partitioned_function_ops.cc:113 : Invalid argument: Cannot place the graph because a reference or resource edge connects colocation groups with incompatible assigned devices: /job:localhost/replica:0/task:0/device:GPU:0 vs /job:localhost/replica:0/task:0/device:CPU:0. The edge src node is while_20/exit/_94 , and the dst node is while_0_RetVal
2019-07-12 11:38:45.755562: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: Cannot place the graph because a reference or resource edge connects colocation groups with incompatible assigned devices: /job:localhost/replica:0/task:0/device:GPU:0 vs /job:localhost/replica:0/task:0/device:CPU:0. The edge src node is while_20/exit/_94 , and the dst node is while_0_RetVal
	 [[{{node se_q3/seq_encoder/while/body/_195/decoder_c/lstm_2/StatefulPartitionedCall}}]]
	 [[If_9/else/_2424/gradients/while_grad/while_grad/body/_11561/gradients/TensorArrayV2Read/TensorListGetItem_grad/TensorListLength/TensorListPopBack/_1920]]
2019-07-12 11:38:45.755854: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: Cannot place the graph because a reference or resource edge connects colocation groups with incompatible assigned devices: /job:localhost/replica:0/task:0/device:GPU:0 vs /job:localhost/replica:0/task:0/device:CPU:0. The edge src node is while_20/exit/_94 , and the dst node is while_0_RetVal
	 [[{{node se_q3/seq_encoder/while/body/_195/decoder_c/lstm_2/StatefulPartitionedCall}}]]
[I 11:38:49.971 NotebookApp] Saving file at /SEQ3_LSTM_CUDA.ipynb


tf.while_loop with tf.keras.layers.LSTM broken #30639

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

tf.while_loop with tf.keras.layers.LSTM broken #30639

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions