
AttentionMechanism is not compatible with Eager Execution #535

Closed
kazemnejad opened this issue Sep 21, 2019 · 8 comments · Fixed by #547

@kazemnejad
Contributor

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): binary(2.0.0-dev20190920)
  • TensorFlow-Addons version and how it was installed (source or binary): binary(0.6.0-dev)
  • Python version: 3.6
  • Is GPU used? (yes/no): Yes

Describe the bug

In the context of eager execution, we need to re-setup the memory on each step of training. However, the current API does not seem to support this behavior. The following code snippet is from the _BaseAttentionMechanism.__call__(...) method.

if self._memory_initialized:
    if len(inputs) not in (2, 3):
        raise ValueError(
            "Expect the inputs to have 2 or 3 tensors, got %d" %
            len(inputs))
    if len(inputs) == 2:
        # We append the calculated memory here so that the graph will be
        # connected.
        inputs.append(self.values)
return super(_BaseAttentionMechanism, self).__call__(inputs, **kwargs)

As you can see, once the memory has been initialized, the method assumes all future inputs are only there to query the memory. Therefore, a second call to this method to re-setup the memory will raise an error.
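
For what it's worth, a minimal sketch that reproduces the error (LuongAttention and the shapes are only illustrative choices; any _BaseAttentionMechanism subclass behaves the same way):

import tensorflow as tf
import tensorflow_addons as tfa

mechanism = tfa.seq2seq.LuongAttention(units=32)
memory = tf.random.normal([4, 10, 32])  # [batch, time, depth]

# First call: processes and caches the memory, as documented.
mechanism(memory, setup_memory=True)

# Second call, e.g. on the next training batch when running eagerly:
# _memory_initialized is already True, so the len(inputs) check in
# __call__ fires and a ValueError is raised.
mechanism(memory, setup_memory=True)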

Other info / logs
I encountered this issue while I was working on #335.

Ideas to solve:
1- Use AttentionMechanism.setup_memory(...) to re-setup the attention memory (see the sketch after this list). However, as far as I know, the API does not recommend this usage.
2- Set AttentionMechanism._memory_initialized to False at the beginning of the model's call method.
3- Internally fix this behavior, e.g. change AttentionMechanism.__call__(...) to handle re-setting the memory.
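
A rough sketch of option 1, i.e. re-setting up the memory explicitly once per batch (LuongAttention, the shapes, and the loop are illustrative, not a definitive pattern):

import tensorflow as tf
import tensorflow_addons as tfa

units = 32
mechanism = tfa.seq2seq.LuongAttention(units)

for step in range(2):
    # Pretend these are the encoder outputs for the current training
    # batch.
    memory = tf.random.normal([4, 10, units])  # [batch, time, depth]

    # Option 1: explicit per-batch setup; unlike calling the mechanism a
    # second time with setup_memory=True, this does not raise.
    mechanism.setup_memory(memory)

    # ... the decoder / AttentionWrapper can now query the memory of the
    # current batch as usual.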

@guillaumekln
Contributor

I think 1. is currently the most explicit approach.

If I understand correctly, 2. and 3. do not seem to be compatible with step-by-step decoding; at the very least, it would be inefficient to run the memory layer at each step.

@kazemnejad
Contributor Author

The problem I had with the first option was this comment in the AttentionMechanism.call() method:

# Return the processed memory in order to create the Keras
# connectivity data for it.

I'm not sure how using .setup_memory() would affect this. Is it okay to rely entirely on this method to set up the memory instead of AttentionMechanism.call(..., setup_memory=True)?

For the 3rd option, I meant calling the method with setup_memory=True, i.e. mechanism(memory, memory_mask=mask, setup_memory=True). As far as I remember, this kind of invocation does not affect step-by-step decoding, since the memory is processed only once in the model's graph. Am I right about this?

Furthermore, two consecutive calls to this method will cause an error too (this is the case with the Keras training loop, in which the model's .call(...) can be invoked multiple times).

mechanism(memory, memory_mask=mask, setup_memory=True)
mechanism(memory, memory_mask=mask, setup_memory=True)

will raise an error due to the following condition:

if self._memory_initialized:
    if len(inputs) not in (2, 3):
        (...)

@kazemnejad
Contributor Author

Hi @guillaumekln,
Did you have time to investigate this issue any further? Thanks.

@guillaumekln
Contributor

Sorry I did not. Maybe @qlzh727 has more Keras expertise to advise on this subject.

@qlzh727
Member

qlzh727 commented Sep 27, 2019

@kazemnejad, in the eager context, why do we need to "re-setup the memory on each step of training"? Does the memory change after each step?

If we need to make the behavior diverge between the eager and graph contexts, we could do that within the call body.
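
For reference, a tiny illustration (not tensorflow_addons code) of the check such a branch could be based on:

import tensorflow as tf

# True in the eager context (including layers executed with
# run_eagerly=True), False while tracing inside a tf.function, so a
# layer body can branch on it.
print(tf.executing_eagerly())  # True at the top level in TF 2.x

@tf.function
def traced():
    print("inside tf.function:", tf.executing_eagerly())  # prints False
    return tf.constant(0)

traced()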

@kazemnejad
Contributor Author

kazemnejad commented Sep 27, 2019

To be clear, by "step" I meant one step of training (one batch flowing through the model's graph), not one step of dynamic decoding in RNNs.

I hypothesize that in eager execution the graph is dynamic, so there are no symbolic tensors; tensors are created eagerly upon op invocation. This means that if we don't call .setup_memory() on each run, the AttentionMechanism will simply hold a reference to the EagerTensor created in the previous step. Moreover, I observed this behavior empirically when I created and trained a Keras model with the run_eagerly=True argument.

@qlzh727
Member

qlzh727 commented Sep 27, 2019

I see. run_eagerly=True is more like a debug mode where all the tensor inputs/outputs of a layer are just eager tensors (NumPy-array-like). It has poor performance, but it allows the user to debug and trace numeric values if needed.

In that case, I agree that the cached value for the memory will be incorrect and should be reset/re-populated per batch. I think the call() should take that into consideration, which is option 3 as you stated above.

@qlzh727
Member

qlzh727 commented Sep 27, 2019

Sorry, I meant __call__; somehow GitHub was converting it to call.
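
For the record, a rough sketch of what honoring setup_memory inside __call__ could look like (illustrative only, not necessarily the change that was merged in #547):

def __call__(self, inputs, **kwargs):
    if kwargs.get("setup_memory", False):
        # The caller explicitly asked to (re-)process the memory, e.g.
        # once per training batch when running eagerly, so drop the
        # cached values instead of raising.
        self._memory_initialized = False
    if self._memory_initialized:
        if len(inputs) not in (2, 3):
            raise ValueError(
                "Expect the inputs to have 2 or 3 tensors, got %d" %
                len(inputs))
        if len(inputs) == 2:
            # We append the calculated memory here so that the graph will be
            # connected.
            inputs.append(self.values)
    return super(_BaseAttentionMechanism, self).__call__(inputs, **kwargs)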
