Feature Request: How to Access Attention Weights of Attention Wrapper #11067
Comments
Thanks for the issue, but this question looks like a feature request that belongs in TensorBoard.
@aselle, perhaps the second question belongs in TensorBoard, but I don't think the first one does at all. Even if TensorBoard had the visualization capability I'm looking for, if I wanted to add a summary to visualize the attention weights, I wouldn't know how to access the weights in order to add them as summaries. I think that either an example or additional documentation would be useful and relevant to TensorFlow, not TensorBoard.
@aselle, if you have time, maybe you could answer the question on Stack Overflow? It's clear I'm not the only person struggling to access the `AttentionWrapper` weights.
@ebrevdo, you were helpful on an earlier thread. If you have the time, could you please provide an answer to my first question of how to access the attention mechanism's weights?
For anyone else wondering, you can access the alignments by setting `alignment_history=True` when constructing the `AttentionWrapper` and then reading `alignment_history` from the final decoder state.
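For concreteness, here is a minimal sketch of that approach. All names, sizes, and the placeholder inputs are illustrative, not taken from the original post:

```python
import tensorflow as tf

batch_size, enc_steps, dec_steps, input_dim, num_units = 4, 500, 30, 32, 128

# Stand-ins for the real encoder outputs and decoder inputs.
encoder_outputs = tf.placeholder(tf.float32, [batch_size, enc_steps, num_units])
decoder_inputs = tf.placeholder(tf.float32, [batch_size, dec_steps, input_dim])

attention_mechanism = tf.contrib.seq2seq.BahdanauAttention(
    num_units=num_units, memory=encoder_outputs)
decoder_cell = tf.contrib.seq2seq.AttentionWrapper(
    tf.contrib.rnn.LSTMCell(num_units), attention_mechanism,
    attention_layer_size=num_units,
    alignment_history=True)  # record the attention weights at every step

helper = tf.contrib.seq2seq.TrainingHelper(
    decoder_inputs, sequence_length=tf.fill([batch_size], dec_steps))
decoder = tf.contrib.seq2seq.BasicDecoder(
    decoder_cell, helper,
    initial_state=decoder_cell.zero_state(batch_size, tf.float32))
outputs, final_state, _ = tf.contrib.seq2seq.dynamic_decode(decoder)

# final_state.alignment_history is a TensorArray; stack() turns it into a
# tensor of shape [decoder_steps, batch_size, encoder_steps].
alignments = final_state.alignment_history.stack()
```

The stacked tensor is what you would fetch with `sess.run` or feed into a summary op.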
Your approach is the right one. We don't enable alignment history by
default because:
1. It requires extra memory.
2. It's not always possible, e.g. when using a beam-search decoder.
@ebrevdo, can you clarify what Also, to confirm, the
@ebrevdo, I'm having a strange problem with my sequence-to-sequence model's alignment history. My model's alignment values are initially uniformly distributed (~0.002, for encoder outputs of 500 steps), which makes sense. However, the alignment values remain roughly the same (~0.002) even after training, despite the fact that my model's accuracy climbs from chance (25%) to 100%. My problem is structured so that the only way to do well at it is for the decoder to learn to pay attention. I have no idea what might be causing this. Has anyone experienced anything similar, or does anyone have suggestions for debugging this issue?

Edit: I'm posting this on Stack Overflow. @aselle, if you know the answer, or know someone who knows, I'd appreciate it!
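One rough way to check whether the attention has actually learned to focus, rather than eyeballing raw numbers (a sketch; `alignments_val` is assumed to be the stacked `alignment_history` tensor fetched with `sess.run` after training):

```python
import numpy as np

def summarize_alignments(alignments_val):
    """alignments_val: array of shape [decoder_steps, batch_size, encoder_steps]."""
    # Peak weight per decoder step: ~1/encoder_steps (~0.002 for 500 steps)
    # means the attention is still uniform; values near 1.0 mean it is focused.
    peak = alignments_val.max(axis=2)
    # Entropy per decoder step: log(encoder_steps) is the uniform baseline.
    entropy = -(alignments_val * np.log(alignments_val + 1e-9)).sum(axis=2)
    print("mean peak weight:", peak.mean())
    print("mean entropy:", entropy.mean(),
          "(uniform baseline:", np.log(alignments_val.shape[2]), ")")
```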
@oahziur , do you have an idea of what might be causing my problem? |
I thought I'd add more information and code, in case that helps someone help me. As background, both my inputs and labeled outputs at each time step are vectors of a fixed shape. I reduced my model to a single-layer encoder and single-layer decoder to eliminate any mistake I might be making with multi-layered architectures. The encoder is a bidirectional RNN.

At the start of training, my `alignment_history` values are roughly uniform, as described above. During training, my model converges to 100% classification accuracy on both training and validation data. The model never sees the same training data twice, so I'm 99% confident that the model isn't memorizing the training data. However, after training, the values of `alignment_history` have effectively not changed; the values now look randomly scattered between 0.00185 and 0.00219.

My code is relatively straightforward. I have a class encapsulating my model. One method instantiates an RNN cell:
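A minimal sketch of a cell-construction method along these lines, assuming a plain LSTM cell with an illustrative size (not the original code):

```python
def _make_rnn_cell(self, num_units=128):
    # Single LSTM cell; with the model reduced to one encoder layer and one
    # decoder layer, no MultiRNNCell wrapper is needed.
    return tf.contrib.rnn.LSTMCell(num_units)
```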
I have one method for building the encoder:
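A sketch of what a bidirectional encoder built from two such cells might look like (a guess at the structure, with illustrative names):

```python
def _build_encoder(self, inputs, sequence_lengths):
    # Bidirectional RNN: forward and backward outputs are concatenated, so
    # the attention memory has 2 * num_units features per encoder step.
    cell_fw = self._make_rnn_cell()
    cell_bw = self._make_rnn_cell()
    (out_fw, out_bw), (state_fw, _) = tf.nn.bidirectional_dynamic_rnn(
        cell_fw, cell_bw, inputs,
        sequence_length=sequence_lengths, dtype=tf.float32)
    encoder_outputs = tf.concat([out_fw, out_bw], axis=-1)
    # One common simplification: use only the forward final state to
    # initialize the decoder.
    return encoder_outputs, state_fw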
I similarly have another method for building the decoder:
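And a sketch of a decoder method that wires in `BahdanauAttention` with `alignment_history=True`, assuming a `TrainingHelper` and `BasicDecoder` (again illustrative, not the original code):

```python
def _build_decoder(self, encoder_outputs, encoder_state, memory_lengths,
                   decoder_inputs, decoder_lengths, batch_size):
    attention = tf.contrib.seq2seq.BahdanauAttention(
        num_units=128, memory=encoder_outputs,
        memory_sequence_length=memory_lengths)
    cell = tf.contrib.seq2seq.AttentionWrapper(
        self._make_rnn_cell(), attention,
        attention_layer_size=128,
        alignment_history=True)  # keep the per-step attention weights
    initial_state = cell.zero_state(batch_size, tf.float32).clone(
        cell_state=encoder_state)
    helper = tf.contrib.seq2seq.TrainingHelper(decoder_inputs, decoder_lengths)
    decoder = tf.contrib.seq2seq.BasicDecoder(cell, helper, initial_state)
    outputs, final_state, _ = tf.contrib.seq2seq.dynamic_decode(decoder)
    # final_state.alignment_history is the TensorArray of attention weights.
    return outputs.rnn_output, final_state
```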
I use both of these methods, and then project the output of the decoder to the same dimensionality as my labels.
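For the projection, something along these lines, with `label_dim`, `decoder_outputs`, and `labels` standing in for the actual names:

```python
logits = tf.layers.dense(decoder_outputs, label_dim)  # project to label size
loss = tf.losses.softmax_cross_entropy(onehot_labels=labels, logits=logits)
```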
Most of my code was written before the NMT tutorial was released, so I read the tutorial's code and then stepped through it, but I can't find any glaring differences from my implementation. I do have a couple of additional questions.
@lmthang, if you could help, I'd really appreciate it!
OS: macOS Sierra version 10.12.5
TensorFlow Version: v1.2.0-rc2-21-g12f033d 1.2.0
This is a two-part request related to `tensorflow.contrib.seq2seq`. I would like the ability to visualize the attention weights of the `AttentionWrapper`, but I'm hampered by the lack of examples, and I'm struggling to infer the input for the `previous_alignments` argument of `BahdanauAttention`'s `__call__` method.

First, could someone clarify how to access the attention weights?
Second, would it be possible to add some tool that visualizes the attention weights (possibly to TensorBoard)?
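For the second part, one workable stopgap until a dedicated tool exists is to turn the recorded alignments into a TensorBoard image summary. This is a sketch, assuming `alignment_history=True` was set as above and `final_state` comes from `dynamic_decode`:

```python
# Stack the TensorArray: [decoder_steps, batch_size, encoder_steps].
alignments = final_state.alignment_history.stack()
# Reorder to [batch_size, decoder_steps, encoder_steps] and add a channel
# dimension so each example becomes a grayscale attention map.
attention_images = tf.expand_dims(tf.transpose(alignments, [1, 0, 2]), -1)
tf.summary.image("attention_alignments", attention_images)
```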