Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs.. #39908

Closed
Bocharick opened this issue May 27, 2020 · 18 comments
Labels
comp:keras (Keras related issues), regression issue (To spot regression issues in latest version), TF 2.2 (Issues related to TF 2.2), type:bug (Bug)

Comments

@Bocharick

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): VERSION: 2.2.0; GIT_VERSION: v2.2.0-rc4-8-g2b96f3662b
  • Python version: 3.7
  • CUDA/cuDNN version: CUDA 10.1, cuDNN 7.6.5
  • GPU model and memory: GTX 1080 8Gb; 16 Gb RAM

Describe the current behavior
My code works fine on 2.1.1 but does not work on 2.2.0 (Error log №1 below).
I found empirically that the problem appears if dropout or recurrent_dropout is used in the GRU layers.
I also tried changing GRU to LSTM; same problem.
I tried tf.compat.v1.experimental.output_all_intermediates() with both True and False; it has no effect.
On 2.2.0 it works ONLY if I remove the dropout and recurrent_dropout options from the GRU layers AND disable eager execution with tf.compat.v1.disable_eager_execution().
But if I remove the dropouts and eager execution stays enabled, I get another error (Error log №2 below).

Standalone code to reproduce the issue
Test case with this problem:
https://colab.research.google.com/drive/1HUayaLsHNZ30JaBlxvLyQz7Evf1FnsD5?usp=sharing
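In case the notebook is unavailable, here is a minimal sketch of the kind of model that triggers the error; the input shape, layer sizes, and names are assumptions, not the exact code from the colab:

```python
import tensorflow as tf

# Hypothetical minimal repro: shapes, units, and names are assumptions.
inputs = tf.keras.Input(shape=(100, 8), name="x")
x = tf.keras.layers.GRU(
    64,
    dropout=0.2,
    recurrent_dropout=0.2,  # removing both dropouts avoids Error log №1
    name="GRU_1")(inputs)
outputs = tf.keras.layers.Dense(4, activation="softmax", name="y")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="sgd", loss="categorical_crossentropy")

# With tf.compat.v1.disable_eager_execution() called before building the model,
# model.fit(...) on TF 2.2.0 fails with the "Connecting to invalid output 163" error.
```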

Other info / logs
Error log №1:

---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1364     try:
-> 1365       return fn(*args)
   1366     except errors.OpError as e:

14 frames
InvalidArgumentError: Node 'training/SGD/gradients/gradients/GRU_1/while_grad/GRU_1/while_grad': Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs. Try using tf.compat.v1.experimental.output_all_intermediates(True).

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1382                     '\nsession_config.graph_options.rewrite_options.'
   1383                     'disable_meta_optimizer = True')
-> 1384       raise type(e)(node_def, op, message)
   1385 
   1386   def _extend_graph(self):

InvalidArgumentError: Node 'training/SGD/gradients/gradients/GRU_1/while_grad/GRU_1/while_grad': Connecting to invalid output 163 of source node GRU_1/while which has 163 outputs. Try using tf.compat.v1.experimental.output_all_intermediates(True).

Error log №2:

tf.keras.utils.plot_model(model)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-14-8c47125ededc> in <module>()
----> 1 tf.keras.utils.plot_model(model)

1 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/vis_utils.py in model_to_dot(model, show_shapes, show_layer_names, rankdir, expand_nested, dpi, subgraph)
    141 
    142     # Append a wrapped layer's label to node's label, if it exists.
--> 143     layer_name = layer.name
    144     class_name = layer.__class__.__name__
    145 

AttributeError: 'dict' object has no attribute 'name'
model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-1d2f84f55c4b> in <module>()
----> 1 model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)

10 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    966           except Exception as e:  # pylint:disable=broad-except
    967             if hasattr(e, "ag_error_metadata"):
--> 968               raise e.ag_error_metadata.to_exception(e)
    969             else:
    970               raise

AttributeError: in user code:

    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:571 train_function  *
        outputs = self.distribute_strategy.run(
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:951 run  **
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2290 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/distribute/distribute_lib.py:2649 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py:543 train_step  **
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:391 update_state
        self._build(y_pred, y_true)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:322 _build
        self._metrics, y_true, y_pred)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1118 map_structure_up_to
        **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1214 map_structure_with_tuple_paths_up_to
        *flat_value_lists)]
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1213 <listcomp>
        results = [func(*args, **kwargs) for args in zip(flat_path_list,
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/nest.py:1116 <lambda>
        lambda _, *values: func(*values),  # Discards the path arg.
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:421 _get_metric_objects
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:421 <listcomp>
        return [self._get_metric_object(m, y_t, y_p) for m in metrics]
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/compile_utils.py:442 _get_metric_object
        y_t_rank = len(y_t.shape.as_list())

    AttributeError: 'NoneType' object has no attribute 'shape'

@Bocharick Bocharick added the type:bug Bug label May 27, 2020
@ravikyram ravikyram added comp:keras Keras related issues regression issue To spot regression issues in latest version TF 2.2 Issues related to TF 2.2 labels May 28, 2020
@ravikyram ravikyram assigned rmothukuru and unassigned ravikyram May 28, 2020
@ravikyram
Contributor

I tried in Colab with TF 2.1.0 and I am not seeing any issue. However, I am able to reproduce the issue with TF 2.2.0 and the nightly version (2.3.0-dev20200527). Please find the gist here. Thanks!

@ravikyram ravikyram assigned ymodak and unassigned rmothukuru May 28, 2020
@Bocharick
Author

Bocharick commented May 28, 2020

I tried in Colab with TF 2.1.0 and I am not seeing any issue. However, I am able to reproduce the issue with TF 2.2.0 and the nightly version (2.3.0-dev20200527). Please find the gist here. Thanks!

That is what I was talking about. The bottom line is that in the new "stable" version 2.2.0 the code does not work. This is either a bug or some kind of change that I do not understand.

I want to use the new version 2.2.0 in all my work code, but this one does not want to work, so I reported it as a bug.

@ymodak ymodak assigned qlzh727 and unassigned ymodak May 28, 2020
@qlzh727
Member

qlzh727 commented Jun 1, 2020

Thanks for reporting the issue.

I think there are several issues we need to address here:

  1. The model is built with 2 inputs and 4 outputs. However, the training data only has 2 inputs and 2 outputs. This is causing the issue for Error log №2. After I removed the 2 extra outputs when building the model, model.fit() still failed with the error below. I need to check with the runtime team to see what the root cause is there.
2020-05-31 22:05:45.953509: W tensorflow/core/framework/op_kernel.cc:1760] OP_REQUIRES failed at variable_ops.cc:100 : Already exists: Resource __per_step_0/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_7/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
Traceback (most recent call last):
  File "/Users/scottzhu/Library/Preferences/PyCharmCE2018.1/scratches/scratch_15.py", line 73, in <module>
    model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 108, in _method_wrapper
    return method(self, *args, **kwargs)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/keras/engine/training.py", line 1090, in fit
    tmp_logs = train_function(iterator)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 766, in __call__
    result = self._call(*args, **kwds)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 826, in _call
    return self._stateless_fn(*args, **kwds)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2812, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1838, in _filtered_call
    cancellation_manager=cancellation_manager)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1915, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 549, in call
    ctx=ctx)
  File "/Users/scottzhu/tf-nightly/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.AlreadyExistsError:  Resource __per_step_0/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_8/tmp_var/N10tensorflow19TemporaryVariableOp6TmpVarE
	 [[{{node gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/functional_1/BGRU_0/forward_GRU_0/while_grad/body/_347/gradient_tape/functional_1/BGRU_0/forward_GRU_0/while/gradients/AddN_8/tmp_var}}]] [Op:__inference_train_function_7797]

Function call stack:
train_function
  2. I can also confirm Error log №1 appears when tf.compat.v1.disable_eager_execution() is added together with dropout and recurrent_dropout. It is probably related to the runtime and how the gradient is generated, which I need to confirm with the runtime team as well.

@qlzh727
Member

qlzh727 commented Jun 1, 2020

Btw, I think the issue is probably related to #38906, where we have the same finding for tensorflow.python.framework.errors_impl.AlreadyExistsError.

Also, disable_eager_execution() will probably cause some side effects in our code base, since it falls back to some legacy behavior, which might not be recommended for current users. Do you really need eager mode turned off, or are you just trying that to see if it works around the issue?

@Bocharick
Author

The model is built with 2 inputs and 4 outputs. However, the training data only has 2 inputs and 2 outputs. This is causing the issue for Error log №2. [...]

It's OK for the model to have more outputs than labels. In my case they are just two extra outputs that apply a simple argmax to the two classification outputs, so when I use this trained model in production the argmax values are already available as model outputs. Very useful.

When I start this training on TF 2.1.1 it only gives a gentle warning:
WARNING:tensorflow:Output y_before_argmaxed missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to y_before_argmaxed.
WARNING:tensorflow:Output y_after_argmaxed missing from loss dictionary. We assume this was done on purpose. The fit and evaluate APIs will not be expecting any data to be passed to y_after_argmaxed.
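A minimal sketch of that pattern, assuming hypothetical shapes and layer sizes and reusing the output names from the warnings above (on TF 2.1 the partial loss dict below only triggers those warnings; per the discussion here, on TF 2.2 fit expects every output to have a matching label):

```python
import tensorflow as tf

# Hypothetical illustration of a model with extra argmax "convenience" outputs.
inputs = tf.keras.Input(shape=(100, 8), name="x")
h = tf.keras.layers.GRU(64, name="GRU_1")(inputs)
y_before = tf.keras.layers.Dense(4, activation="softmax", name="y_before")(h)
y_after = tf.keras.layers.Dense(4, activation="softmax", name="y_after")(h)

# Extra outputs that just take the argmax of the classification outputs;
# they are only needed at inference time and have no labels.
y_before_argmaxed = tf.keras.layers.Lambda(
    lambda t: tf.argmax(t, axis=-1), name="y_before_argmaxed")(y_before)
y_after_argmaxed = tf.keras.layers.Lambda(
    lambda t: tf.argmax(t, axis=-1), name="y_after_argmaxed")(y_after)

model = tf.keras.Model(
    inputs, [y_before, y_after, y_before_argmaxed, y_after_argmaxed])

# Losses are given only for the two "real" outputs.
model.compile(optimizer="sgd",
              loss={"y_before": "categorical_crossentropy",
                    "y_after": "categorical_crossentropy"})
```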

@Bocharick
Author

Do you really need eager mode turned off, or are you just trying that to see if it works around the issue?

No, I don't need eager mode off; I just tried different setups to get my code to work, but without success.

@qlzh727
Member

qlzh727 commented Jun 2, 2020

It's OK for the model to have more outputs than labels. In my case they are just two extra outputs that apply a simple argmax to the two classification outputs [...] When I start this training on TF 2.1.1 it only gives a gentle warning. [...]

OK, I guess this might be a regression, since we refactored the training logic a bit between 2.1 and 2.2. Currently the code expects each output of the model to have a matching label.

@omalleyt12
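One way to keep the argmax outputs for inference while satisfying this expectation is to train a model whose outputs all have labels and expose the extra outputs through a separate inference-only model. A hedged sketch, reusing the hypothetical tensors from the earlier example:

```python
# Hypothetical workaround sketch: train only on the outputs that have labels.
train_model = tf.keras.Model(inputs, [y_before, y_after])
train_model.compile(optimizer="sgd",
                    loss={"y_before": "categorical_crossentropy",
                          "y_after": "categorical_crossentropy"})
# train_model.fit(parsed_alldata_dataset, steps_per_epoch=1000, epochs=100)

# The inference model adds the argmax outputs and shares weights with train_model,
# so it can be saved or exported after training without retraining anything.
infer_model = tf.keras.Model(
    inputs, [y_before, y_after, y_before_argmaxed, y_after_argmaxed])
```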

@Bocharick
Author

OK, I guess this might be a regression, since we refactored the training logic a bit between 2.1 and 2.2. Currently the code expects each output of the model to have a matching label.

Now I have removed these outputs, and eager mode is enabled, but:

  1. tf.keras.utils.plot_model is not working
  2. model.fit is not working

Code on Colab:
https://colab.research.google.com/drive/1HUayaLsHNZ30JaBlxvLyQz7Evf1FnsD5?usp=sharing

@qlzh727
Member

qlzh727 commented Jun 3, 2020

model.fit is failing in the same way as #38906, and we will send a fix very soon.

@qlzh727
Member

qlzh727 commented Jun 4, 2020

Btw, 80a9367 should fix the issue for training. Let me verify it once we have a new nightly pip package.

@qlzh727
Member

qlzh727 commented Jun 5, 2020

I tested your colab with the latest nightly, and it is working now. Closing this issue.

@qlzh727 qlzh727 closed this as completed Jun 5, 2020

@goldiegadde goldiegadde added this to Done in TensorFlow 2.3.0 Jun 5, 2020
@boluoyu

boluoyu commented Jul 22, 2020

I tested your colab with the latest nightly, and it is working now. Closing this issue.

@qlzh727 I still get an error when using tf-nightly (2.4.0-dev20200722).

@qlzh727
Member

qlzh727 commented Jul 22, 2020

@qlzh727 I still get an error when using tf-nightly (2.4.0-dev20200722).

Do you have a colab to repro the issue?

@boluoyu

boluoyu commented Jul 23, 2020

Do you have a colab to repro the issue?

@qlzh727 here https://colab.research.google.com/drive/1XuQWSLa41BFcHlAD2S25cKa6OdxNEuAf?usp=sharing

@qlzh727
Member

qlzh727 commented Jul 23, 2020

It seems that your code disables eager execution, and I think the code will work if eager is enabled. Is there a reason you disable eager execution?

@boluoyu

boluoyu commented Jul 24, 2020

It seems that your code disables eager execution, and I think the code will work if eager is enabled. Is there a reason you disable eager execution?

@qlzh727 It's a long story. I want to use an attention model and extract the attention scores, but I can't find any TF2 API for that, so I considered training the model in TF2 mode and saving it in TF1 mode. I have been searching the net for a long time with no luck. I reproduced the issue on Colab. If you have good suggestions, please give me some advice. Thank you. https://colab.research.google.com/drive/1hq4WWM481pcKH8JoO43gFciZrFeKe-3_?usp=sharing

Edit: I've found a solution to save the model in TF2 mode, but it still causes an error when eager execution is disabled.
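For pulling attention scores (or any intermediate tensor) out of a trained functional model in TF2, a sub-model built over the trained model's graph usually works with eager execution left on. A rough sketch, where the model path and layer name are assumptions, not values from the colab:

```python
import tensorflow as tf

# Hypothetical sketch: extract an intermediate tensor (e.g. attention scores)
# from a trained functional model without disabling eager execution.
trained = tf.keras.models.load_model("my_trained_model")      # assumed path
attn = trained.get_layer("attention_scores").output           # assumed layer name
score_model = tf.keras.Model(inputs=trained.input, outputs=attn)

# scores = score_model.predict(batch_of_inputs)  # eager execution stays enabled
score_model.save("attention_score_model")        # TF2 SavedModel format
```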

@boluoyu

boluoyu commented Jul 27, 2020

Why does disabling eager execution cause an error? Can the issue be fixed?
Edit: I've found a solution to save the model in TF2 mode.
