New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AttentionMachanism is not compatible with Eager Execution #535
Comments
I think 1. is currently the most explicit approach. If I understand correctly, 2. and 3. do not seem to be compatible with step-by-step decoding, at least it would inefficient to run the memory layer at each step. |
The problem I had with the first option was this comment in the
I'm not sure how using the For the 3rd option, I meant calling the method with Furthermore, two consecutive calls to this method will cause an error too (this is the case with the Keras training loop in which the model's mechanism(memory, memory_mask=mask, setup_memory=True)
mechanism(memory, memory_mask=mask, setup_memory=True) will raise an error due to the following condition: if self._memory_initialized:
if len(inputs) not in (2, 3):
(...) |
Hi @guillaumekln, |
Sorry I did not. Maybe @qlzh727 has more Keras expertise to advise on this subject. |
@kazemnejad, in the eager context, why do we need to "re-setup the memory on each step of training"? Does the memory change after each step? If we need to diverge the behavior between eager and graph context, we could do this within the call body. |
To be clear, by the step, I meant one step of training (one batch flowing through the model's graph), not one step of dynamic decoding in RNNs. I hypothesize that in the eager execution, the graph is dynamic so there are no symbolic tensors; tensors are created eagerly upon OPS invocation. This means that, if we don't call the |
I see. "run_eagerly=True" is more like a debug mode where all the tensor input/output to layer will just be eager tensor (numpy array like). It will have a bad performance, but will allow user to debug and trace the numeric value if needed. In that case, I agree that the cached value for memory will be incorrect, and should be reset/populated per batch. I think the call() should take that into consideration, which is the option 3 you stated above. |
sorry, I mean |
System information
Describe the bug
In the context of eager execution, we need to re-setup the memory on each step of training. However, it seems that the current API does not provide this kind of behavior. The following code snippet is from
_BaseAttentionMechanism.__call__(...)
method.As you can see, once the memory gets initialized, it assumes future inputs will be only to query the memory. Therefore the second call to this method (to re-setup the memory) will raise an error.
Other info / logs
I encountered this issue when i was working on #335
Ideas to solve:
1- Use
AttetionMechanism.setup_memory(...)
to re-setup the attention memory. But as far as I know, the API does not recommend this usage.2- Set
AttetionMechanism._memory_initialized
to False at the beginning of the model's call method.3- Internally fix this behavior. e.g. Change
AttentionMechanism.__call__(...)
to consider re-setting the memory.The text was updated successfully, but these errors were encountered: