In nmt_with_attention, the GRU in the decoder is not connected: the previous step's state is not passed to the current step #38248

Closed
hihell opened this issue Apr 5, 2020 · 10 comments · Fixed by tensorflow/text#626
Labels: comp:keras (Keras related issues), type:docs-bug (Document issues)

Comments

@hihell commented Apr 5, 2020

https://www.tensorflow.org/tutorials/text/nmt_with_attention

The decoder is trained step by step, but the previous step's hidden state is not passed to the GRU at the current step:

# passing the concatenated vector to the GRU
output, state = self.gru(x)

Is this intentional or a bug? I checked several NMT-with-attention papers, and unlike this tutorial, their decoders do pass the state from one step to the next.

Thanks in advance!
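
For context, here is the surrounding decoder step from the tutorial, reconstructed from the notebook (so treat names and shapes as approximate rather than verbatim). The returned state only reaches the next step as the attention query; the GRU itself starts from a fresh state each step:

class Decoder(tf.keras.Model):
    # Approximate sketch of the tutorial's decoder step, not verbatim.
    def call(self, x, hidden, enc_output):
        # The previous decoder state is used as the attention query.
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after embedding: (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # Concatenate the context vector with the embedded input token.
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # The GRU is called without initial_state, so its recurrent state
        # does not carry over from the previous decoding step.
        output, state = self.gru(x)

        output = tf.reshape(output, (-1, output.shape[2]))
        x = self.fc(output)
        return x, state, attention_weights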

hihell added the type:docs-bug (Document issues) label Apr 5, 2020
jvishnuvardhan added the comp:keras (Keras related issues) label Apr 7, 2020
@ManishAradwad (Contributor)

Hello, I'd like to work on this issue. Can you please give me some pointers on where I should start?

@MarkDaoust (Member)

I think you're right.

@ManishAradwad if you want to take a shot at a fix, I think all it needs is: output, state = self.gru(x, initial_state=hidden)
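
In context, the proposed change would be just this (a sketch, not tested here):

# hidden is the decoder state returned by the previous step's call
output, state = self.gru(x, initial_state=hidden)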

@yashk2810 (Member) commented Apr 9, 2020

Be careful with that. The last time I did that, the attention plots didn't come out right. So please test and see if the attention plots remain similar to what they are now.

The x fed to the GRU is already built from the hidden state via the attention context vector; see the attention equations in the code.
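
To spell that out, here is roughly the tutorial's Bahdanau-style attention (an approximate reconstruction): the previous decoder state is the query, so it already shapes the GRU input through the context vector, even without initial_state.

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: previous decoder hidden state, shape (batch, units)
        # values: encoder output, shape (batch, max_len, units)
        query_with_time_axis = tf.expand_dims(query, 1)

        # score shape: (batch, max_len, 1)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))

        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape: (batch, units)
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights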

@hihell (Author) commented Apr 9, 2020

@MarkDaoust
I agree with @yashk2810
I did exactly the same thing with initial_state=hidden, and the attention looks strange; the training loss does not converge after 100 epochs (it goes down at first, then goes back up and ends at 0.2).

@MarkDaoust (Member)

Thanks for taking a look @hihell

"the attention is strange"

Can you be more specific?

I'll dive in and try to figure out what's wrong.

@ManishAradwad (Contributor)

I tried re-running the notebook with initial_state=hidden; the loss after 10 epochs is 0.076 (previously it was 0.0958). The attention plots are not exactly the same, but the translations are more or less similar.
For example, what was previously translated as "try to find out" is now "try to figure it out".
Are these significant differences?

@hihell (Author) commented Apr 10, 2020

@MarkDaoust
To show the strange attention, the experiment uses 60k samples and 10 epochs.

gru(x): train loss 0.08
[attention plot]

gru(x, initial_state=hidden): train loss 0.09
[attention plot]

@hihell (Author) commented Apr 18, 2020

@MarkDaoust
Hi, is there any update?

@MarkDaoust (Member)

I'm working on this now.

TensorFlow-Docs-Copybara pushed a commit to tensorflow/docs that referenced this issue Apr 23, 2021
- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes tensorflow/tensorflow#38248

PiperOrigin-RevId: 370168273
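
For reference, a minimal illustrative sketch of the two Keras layers the rewrite is built around; this is not the notebook's code, and in TF 2.5 TextVectorization still lives under tf.keras.layers.experimental.preprocessing:

import tensorflow as tf

# TextVectorization maps raw strings to padded sequences of token ids.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000, output_sequence_length=10)
vectorizer.adapt(["how are you", "are you still home"])
tokens = vectorizer(tf.constant(["are you home"]))  # shape (1, 10), int token ids

# AdditiveAttention computes Bahdanau-style attention over encoder outputs.
attention = tf.keras.layers.AdditiveAttention()
query = tf.random.normal([2, 7, 256])    # decoder RNN output: (batch, t_dec, units)
value = tf.random.normal([2, 12, 256])   # encoder output: (batch, t_enc, units)
context = attention([query, value])      # shape (batch, t_dec, units)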
@nikitamaia (Member)

Fixed in 9e18593

tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 370250185
tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 375597559