
AttentionMechanism is not compatible with Eager Execution #535

Closed
kazemnejad opened this issue Sep 21, 2019 · 8 comments · Fixed by #547

@kazemnejad
Contributor

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
  • TensorFlow version and how it was installed (source or binary): binary(2.0.0-dev20190920)
  • TensorFlow-Addons version and how it was installed (source or binary): binary(0.6.0-dev)
  • Python version: 3.6
  • Is GPU used? (yes/no): Yes

Describe the bug

In the context of eager execution, we need to re-setup the memory on each step of training. However, the current API does not seem to support this behavior. The following code snippet is from the _BaseAttentionMechanism.__call__(...) method.

if self._memory_initialized:
    if len(inputs) not in (2, 3):
        raise ValueError(
            "Expect the inputs to have 2 or 3 tensors, got %d" %
            len(inputs))
    if len(inputs) == 2:
        # We append the calculated memory here so that the graph will be
        # connected.
        inputs.append(self.values)
return super(_BaseAttentionMechanism, self).__call__(inputs, **kwargs)

As you can see, once the memory has been initialized, the method assumes all future inputs are only there to query the memory. Therefore, a second call to this method to re-setup the memory will raise an error.
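
For what it's worth, a minimal sketch that reproduces the error (LuongAttention and the shapes are only illustrative choices; any _BaseAttentionMechanism subclass behaves the same way):

import tensorflow as tf
import tensorflow_addons as tfa

mechanism = tfa.seq2seq.LuongAttention(units=32)
memory = tf.random.normal([4, 10, 32])  # [batch, time, depth]

# First call: processes and caches the memory, as documented.
mechanism(memory, setup_memory=True)

# Second call, e.g. on the next training batch when running eagerly:
# _memory_initialized is already True, so the len(inputs) check in
# __call__ fires and a ValueError is raised.
mechanism(memory, setup_memory=True)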

Other info / logs
I encountered this issue while I was working on #335.

Ideas to solve:
1- Use AttentionMechanism.setup_memory(...) to re-setup the attention memory (see the sketch after this list). However, as far as I know, the API does not recommend this usage.
2- Set AttentionMechanism._memory_initialized to False at the beginning of the model's call method.
3- Internally fix this behavior, e.g. change AttentionMechanism.__call__(...) to handle re-setting the memory.
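
A rough sketch of option 1, i.e. re-setting up the memory explicitly once per batch (LuongAttention, the shapes, and the loop are illustrative, not a definitive pattern):

import tensorflow as tf
import tensorflow_addons as tfa

units = 32
mechanism = tfa.seq2seq.LuongAttention(units)

for step in range(2):
    # Pretend these are the encoder outputs for the current training
    # batch.
    memory = tf.random.normal([4, 10, units])  # [batch, time, depth]

    # Option 1: explicit per-batch setup; unlike calling the mechanism a
    # second time with setup_memory=True, this does not raise.
    mechanism.setup_memory(memory)

    # ... the decoder / AttentionWrapper can now query the memory of the
    # current batch as usual.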

@guillaumekln
Contributor

I think 1. is currently the most explicit approach.

If I understand correctly, 2. and 3. do not seem to be compatible with step-by-step decoding; at the very least, it would be inefficient to run the memory layer at each step.

@kazemnejad
Contributor Author

The problem I had with the first option was this comment in the AttentionMechanism.call() method:

# Return the processed memory in order to create the Keras
# connectivity data for it.

I'm not sure how using .setup_memory() would affect this. Is it okay to rely entirely on this method to set up the memory instead of AttentionMechanism.call(..., setup_memory=True)?

For the 3rd option, I meant calling the method with setup_memory=True, i.e. mechanism(memory, memory_mask=mask, setup_memory=True). As far as I remember, this kind of invocation does not affect step-by-step decoding, since the memory is processed only once in the model's graph. Am I right about this?

Furthermore, two consecutive calls to this method will cause an error too (this is the case with the Keras training loop, in which the model's .call(...) can be invoked multiple times).

mechanism(memory, memory_mask=mask, setup_memory=True)
mechanism(memory, memory_mask=mask, setup_memory=True)

will raise an error due to the following condition:

if self._memory_initialized:
    if len(inputs) not in (2, 3):
        (...)

@kazemnejad
Contributor Author

Hi @guillaumekln,
Did you have time to investigate this issue any further? Thanks.

@guillaumekln
Contributor

Sorry I did not. Maybe @qlzh727 has more Keras expertise to advise on this subject.

@qlzh727
Member

qlzh727 commented Sep 27, 2019

@kazemnejad, in the eager context, why do we need to "re-setup the memory on each step of training"? Does the memory change after each step?

If we need to make the behavior diverge between the eager and graph contexts, we could do that within the call body.
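
For reference, a tiny illustration (not tensorflow_addons code) of the check such a branch could be based on:

import tensorflow as tf

# True in the eager context (including layers executed with
# run_eagerly=True), False while tracing inside a tf.function, so a
# layer body can branch on it.
print(tf.executing_eagerly())  # True at the top level in TF 2.x

@tf.function
def traced():
    print("inside tf.function:", tf.executing_eagerly())  # prints False
    return tf.constant(0)

traced()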

@kazemnejad
Contributor Author

kazemnejad commented Sep 27, 2019

To be clear, by "step" I meant one step of training (one batch flowing through the model's graph), not one step of dynamic decoding in RNNs.

I hypothesize that in eager execution the graph is dynamic, so there are no symbolic tensors; tensors are created eagerly upon op invocation. This means that if we don't call .setup_memory() on each run, the AttentionMechanism will simply hold a reference to the EagerTensor created in the previous step. Moreover, I observed this behavior empirically when I created and trained a Keras model with the run_eagerly=True argument.

@qlzh727
Member

qlzh727 commented Sep 27, 2019

I see. run_eagerly=True is more like a debug mode where all the tensor inputs/outputs of a layer are just eager tensors (NumPy-array-like). It has poor performance, but it allows the user to debug and trace numeric values if needed.

In that case, I agree that the cached value for the memory will be incorrect and should be reset/re-populated per batch. I think the call() should take that into consideration, which is option 3 as you stated above.

@qlzh727
Member

qlzh727 commented Sep 27, 2019

Sorry, I meant __call__; somehow GitHub was converting it to call.
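
For the record, a rough sketch of what honoring setup_memory inside __call__ could look like (illustrative only, not necessarily the change that was merged in #547):

def __call__(self, inputs, **kwargs):
    if kwargs.get("setup_memory", False):
        # The caller explicitly asked to (re-)process the memory, e.g.
        # once per training batch when running eagerly, so drop the
        # cached values instead of raising.
        self._memory_initialized = False
    if self._memory_initialized:
        if len(inputs) not in (2, 3):
            raise ValueError(
                "Expect the inputs to have 2 or 3 tensors, got %d" %
                len(inputs))
        if len(inputs) == 2:
            # We append the calculated memory here so that the graph will be
            # connected.
            inputs.append(self.values)
    return super(_BaseAttentionMechanism, self).__call__(inputs, **kwargs)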
