Skip to content

tf.while_loop with tf.keras.layers.LSTM broken #30639

@jkamalu

Description

@jkamalu

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device:
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): the july 12 p36 gpu 2.0 nightly preview
  • Python version: 3.6
  • CUDA/cuDNN version: 10/7
  • GPU model and memory: 3 GeForce GTX w/8 GB

Describe the current behavior

First, I want to mention that the LSTM not working with distributed strategies is already being looked into here: #29189 -- I wanted to highlight this as a separate issue, because it likely has a different source...

Basically, when dynamically decoding a sequence with an LSTM and tf.while_loop, the code breaks (see logs below for more detail). This does not happen with an RNN(LSTMCell) configuration, but the LSTM is the only CuDNN access point, aside from GRU (which also does not work in this configuration).

Describe the expected behavior

The code should use the optimized CuDNN LSTM implementation and behave as the RNN(LSTMCell) approach i.e. not fail.

Code to reproduce the issue
https://github.com/jkamalu/tensorflow_bugs/blob/master/LSTMGraphPlacement.py

Other info / logs

2019-07-12 11:37:10.386140: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1558] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-07-12 11:38:30.548248: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] function_optimizer failed: Invalid argument: Input 1 of node se_q3/seq_encoder/while/body/_195/TensorListPushBack_49 was passed int32 from se_q3/seq_encoder/while/body/_195/decoder_c/lstm_3/StatefulPartitionedCall:9 incompatible with expected variant.
2019-07-12 11:38:37.853257: E tensorflow/core/grappler/optimizers/meta_optimizer.cc:502] function_optimizer failed: Invalid argument: Input 1 of node se_q3/seq_encoder/while/body/_195/TensorListPushBack_49 was passed int32 from se_q3/seq_encoder/while/body/_195/decoder_c/lstm_3/StatefulPartitionedCall:9 incompatible with expected variant.
2019-07-12 11:38:39.689929: W tensorflow/core/common_runtime/process_function_library_runtime.cc:672] Ignoring multi-device function optimization failure: Invalid argument: Input 1 of node se_q3/seq_encoder/while/body/_195/TensorListPushBack_77 was passed int32 from se_q3/seq_encoder/while/body/_195/decoder_c/lstm_2/StatefulPartitionedCall:9 incompatible with expected variant.
2019-07-12 11:38:45.280991: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-07-12 11:38:45.755520: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at partitioned_function_ops.cc:113 : Invalid argument: Cannot place the graph because a reference or resource edge connects colocation groups with incompatible assigned devices: /job:localhost/replica:0/task:0/device:GPU:0 vs /job:localhost/replica:0/task:0/device:CPU:0. The edge src node is while_20/exit/_94 , and the dst node is while_0_RetVal
2019-07-12 11:38:45.755562: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: Cannot place the graph because a reference or resource edge connects colocation groups with incompatible assigned devices: /job:localhost/replica:0/task:0/device:GPU:0 vs /job:localhost/replica:0/task:0/device:CPU:0. The edge src node is while_20/exit/_94 , and the dst node is while_0_RetVal
[[{{node se_q3/seq_encoder/while/body/_195/decoder_c/lstm_2/StatefulPartitionedCall}}]]
[[If_9/else/_2424/gradients/while_grad/while_grad/body/_11561/gradients/TensorArrayV2Read/TensorListGetItem_grad/TensorListLength/TensorListPopBack/_1920]]
2019-07-12 11:38:45.755854: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Invalid argument: Cannot place the graph because a reference or resource edge connects colocation groups with incompatible assigned devices: /job:localhost/replica:0/task:0/device:GPU:0 vs /job:localhost/replica:0/task:0/device:CPU:0. The edge src node is while_20/exit/_94 , and the dst node is while_0_RetVal
[[{{node se_q3/seq_encoder/while/body/_195/decoder_c/lstm_2/StatefulPartitionedCall}}]]
[I 11:38:49.971 NotebookApp] Saving file at /SEQ3_LSTM_CUDA.ipynb

Metadata

Metadata

Assignees

Labels

TF 2.0Issues relating to TensorFlow 2.0comp:kerasKeras related issuestype:bugBug

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions