In nmt_with_attention, the GRU in the decoder is not connected: the previous step's state is not passed to the current step #38248

Closed
hihell opened this issue Apr 5, 2020 · 10 comments · Fixed by tensorflow/text#626
Labels: comp:keras (Keras related issues), type:docs-bug (Document issues)

Comments

@hihell commented Apr 5, 2020

https://www.tensorflow.org/tutorials/text/nmt_with_attention

The decoder is trained step by step, but the previous step's hidden state is not passed to the GRU at the current step:

# passing the concatenated vector to the GRU
output, state = self.gru(x)

Is this intentional or a bug? I checked several NMT-with-attention papers, and unlike this tutorial, their decoders do pass the state from one step to the next.

Thanks in advance!
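
For context, here is the surrounding decoder step from the tutorial, reconstructed from the notebook (so treat names and shapes as approximate rather than verbatim). The returned state only reaches the next step as the attention query; the GRU itself starts from a fresh state each step:

class Decoder(tf.keras.Model):
    # Approximate sketch of the tutorial's decoder step, not verbatim.
    def call(self, x, hidden, enc_output):
        # The previous decoder state is used as the attention query.
        context_vector, attention_weights = self.attention(hidden, enc_output)

        # x shape after embedding: (batch_size, 1, embedding_dim)
        x = self.embedding(x)

        # Concatenate the context vector with the embedded input token.
        x = tf.concat([tf.expand_dims(context_vector, 1), x], axis=-1)

        # The GRU is called without initial_state, so its recurrent state
        # does not carry over from the previous decoding step.
        output, state = self.gru(x)

        output = tf.reshape(output, (-1, output.shape[2]))
        x = self.fc(output)
        return x, state, attention_weights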

hihell added the type:docs-bug (Document issues) label Apr 5, 2020
jvishnuvardhan added the comp:keras (Keras related issues) label Apr 7, 2020
@ManishAradwad (Contributor)

Hello, I'd like to work on this issue. Can you please give me some pointers on where I should start?

@MarkDaoust (Member)

I think you're right.

@ManishAradwad if you want to take a shot at a fix, I think all it needs is: output, state = self.gru(x, initial_state=hidden)
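
In context, the proposed change would be just this (a sketch, not tested here):

# hidden is the decoder state returned by the previous step's call
output, state = self.gru(x, initial_state=hidden)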

@yashk2810 (Member) commented Apr 9, 2020

Be careful with that. The last time I did that, the attention plots didn't come out right. So please test and see if the attention plots remain similar to what they are now.

The x fed to the GRU is already built from the hidden state via the attention context vector; see the attention equations in the code.
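
To spell that out, here is roughly the tutorial's Bahdanau-style attention (an approximate reconstruction): the previous decoder state is the query, so it already shapes the GRU input through the context vector, even without initial_state.

class BahdanauAttention(tf.keras.layers.Layer):
    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, query, values):
        # query: previous decoder hidden state, shape (batch, units)
        # values: encoder output, shape (batch, max_len, units)
        query_with_time_axis = tf.expand_dims(query, 1)

        # score shape: (batch, max_len, 1)
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))

        attention_weights = tf.nn.softmax(score, axis=1)

        # context_vector shape: (batch, units)
        context_vector = tf.reduce_sum(attention_weights * values, axis=1)
        return context_vector, attention_weights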

@hihell (Author) commented Apr 9, 2020

@MarkDaoust
I agree with @yashk2810
I did exactly the same thing with initial_state=hidden, and the attention looks strange; the training loss does not converge after 100 epochs (it goes down at first, then goes back up and ends at 0.2).

@MarkDaoust (Member)

Thanks for taking a look @hihell

"the attention is strange"

Can you be more specific?

I'll dive in and try to figure out what's wrong.

@ManishAradwad (Contributor)

I tried re-running the notebook with initial_state=hidden; the loss after 10 epochs is 0.076 (previously it was 0.0958). The attention plots are not exactly the same, but the translations are more or less similar.
For example, what was previously translated as "try to find out" is now "try to figure it out".
Are these significant differences?

@hihell (Author) commented Apr 10, 2020

@MarkDaoust
To show the strange attention, the experiment uses 60k samples and 10 epochs.

gru(x): train loss 0.08
[attention plot]

gru(x, initial_state=hidden): train loss 0.09
[attention plot]

@hihell (Author) commented Apr 18, 2020

@MarkDaoust
Hi, is there any update?

@MarkDaoust (Member)

I'm working on this now.

TensorFlow-Docs-Copybara pushed a commit to tensorflow/docs that referenced this issue Apr 23, 2021
- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes tensorflow/tensorflow#38248

PiperOrigin-RevId: 370168273
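
For reference, a minimal illustrative sketch of the two Keras layers the rewrite is built around; this is not the notebook's code, and in TF 2.5 TextVectorization still lives under tf.keras.layers.experimental.preprocessing:

import tensorflow as tf

# TextVectorization maps raw strings to padded sequences of token ids.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=5000, output_sequence_length=10)
vectorizer.adapt(["how are you", "are you still home"])
tokens = vectorizer(tf.constant(["are you home"]))  # shape (1, 10), int token ids

# AdditiveAttention computes Bahdanau-style attention over encoder outputs.
attention = tf.keras.layers.AdditiveAttention()
query = tf.random.normal([2, 7, 256])    # decoder RNN output: (batch, t_dec, units)
value = tf.random.normal([2, 12, 256])   # encoder output: (batch, t_enc, units)
context = attention([query, value])      # shape (batch, t_dec, units)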
@nikitamaia (Member)

Fixed in 9e18593

tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 370250185
tf-text-github-robot pushed a commit to tensorflow/text that referenced this issue May 25, 2021
For TF2.5

- Use the TextVectorization layer.
- Use the AdditiveAttention layer.
- tf.function the translate loop for text->text export.
- Add more inline explanations, and sanity checks.
- Add shape assertions throughout the code to make it easier to follow.

Fixes: tensorflow/tensorflow#38248
Fixes: tensorflow/tensorflow#39654
See also: tensorflow/tensorflow#49237
PiperOrigin-RevId: 375597559