
TF 2.0: Cannot use recurrent_dropout with LSTMs/GRUs #29187


Description

@sbagroy986

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): No (one-line modification to a stock example)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 14.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): tensorflow-gpu==2.0.0-alpha0 (also fails with every other tf 2.0 build I have explored)
  • Python version: 3.6
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: Tried multiple
  • GPU model and memory: Tried multiple

Describe the current behavior
The program crashes with a TypeError as below:

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: encoder/unified_gru/ones_like:0

This occurs when trying to backpropagate gradients through an LSTM/GRU that has recurrent_dropout enabled.
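
For reference, the failing configuration is just a standard Keras recurrent layer with recurrent_dropout set. A minimal sketch of the kind of layer involved (the layer size and argument values here are illustrative, not taken from the notebook):

```python
import tensorflow as tf

# Illustrative only: any tf.keras.layers.LSTM/GRU with recurrent_dropout > 0
# hits this error once gradients are taken through it inside a tf.function.
gru = tf.keras.layers.GRU(
    256,
    return_sequences=True,
    return_state=True,
    recurrent_dropout=0.1)  # removing this argument makes the error go away
```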

Describe the expected behavior
The model should train without errors when recurrent_dropout is set on the LSTM/GRU layers.

Code to reproduce the issue
Since this problem shows up at training time, one needs the entire training pipeline (dataset, model, etc.) set up to demonstrate the bug. As a result, I used the Neural Machine Translation tutorial from TensorFlow and modified its model to include recurrent_dropout. The full code is in this Colab notebook; run the code blocks up to and including the training block to reproduce the bug. A stripped-down sketch of the failing pattern is included below for reference.
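
The following is a minimal, self-contained sketch of the failure mode, not the notebook code (shapes, layer sizes, and the loss are made up for illustration). On the affected 2.0 builds it should raise the same TypeError once the gradient is taken through the recurrent_dropout path inside the tf.function:

```python
import numpy as np
import tensorflow as tf

# A single GRU with recurrent_dropout, trained on random data
# inside a @tf.function train step.
gru = tf.keras.layers.GRU(32, recurrent_dropout=0.2)
dense = tf.keras.layers.Dense(1)
optimizer = tf.keras.optimizers.Adam()

x = tf.constant(np.random.rand(8, 10, 4), dtype=tf.float32)
y = tf.constant(np.random.rand(8, 1), dtype=tf.float32)

@tf.function
def train_step(x, y):
  with tf.GradientTape() as tape:
    pred = dense(gru(x, training=True))
    loss = tf.reduce_mean(tf.square(pred - y))
  variables = gru.trainable_variables + dense.trainable_variables
  grads = tape.gradient(loss, variables)
  optimizer.apply_gradients(zip(grads, variables))
  return loss

print(train_step(x, y))  # should reproduce the TypeError on affected builds
```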

Other info / logs

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
 in ()
      8 
      9   for (batch, (inp, targ)) in enumerate(dataset.take(steps_per_epoch)):
---> 10     batch_loss = train_step(inp, targ, enc_hidden)
     11     total_loss += batch_loss
     12 

6 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    436         # Lifting succeeded, so variables are initialized and we can run the
    437         # stateless function.
--> 438         return self._stateless_fn(*args, **kwds)
    439     else:
    440       canon_args, canon_kwds = self._canonicalize_function_inputs(args, kwds)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   1286     """Calls a graph function specialized to the inputs."""
   1287     graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
-> 1288     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   1289 
   1290   @property

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _filtered_call(self, args, kwargs)
    572     """
    573     return self._call_flat(
--> 574         (t for t in nest.flatten((args, kwargs))
    575          if isinstance(t, (ops.Tensor,
    576                            resource_variable_ops.ResourceVariable))))

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _call_flat(self, args)
    625     # Only need to override the gradient in graph mode and when we have outputs.
    626     if context.executing_eagerly() or not self.outputs:
--> 627       outputs = self._inference_function.call(ctx, args)
    628     else:
    629       self._register_gradient()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in call(self, ctx, args)
    413             attrs=("executor_type", executor_type,
    414                    "config_proto", config),
--> 415             ctx=ctx)
    416       # Replace empty list with None
    417       outputs = outputs or None

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     68     if any(ops._is_keras_symbolic_tensor(x) for x in inputs):
     69       raise core._SymbolicException
---> 70     raise e
     71   # pylint: enable=protected-access
     72   return tensors

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     58     tensors = pywrap_tensorflow.TFE_Py_Execute(ctx._handle, device_name,
     59                                                op_name, inputs, attrs,
---> 60                                                num_outputs)
     61   except core._NotOkStatusException as e:
     62     if name is not None:

TypeError: An op outside of the function building code is being passed
a "Graph" tensor. It is possible to have Graph tensors
leak out of the function building context by including a
tf.init_scope in your function building code.
For example, the following function will fail:
  @tf.function
  def has_init_scope():
    my_constant = tf.constant(1.)
    with tf.init_scope():
      added = my_constant * 2
The graph tensor has name: encoder/unified_gru/ones_like:0

Labels

TF 2.0 (Issues relating to TensorFlow 2.0), comp:keras (Keras related issues), type:bug (Bug)
