Teacher-forcing in the Transformer tutorial #30852

Closed
UltraSpecialException opened this issue Jul 18, 2019 · 2 comments


@UltraSpecialException

commented Jul 18, 2019

Thank you for submitting a TensorFlow documentation issue. Per our GitHub
policy, we only address code/doc bugs, performance issues, feature requests, and
build/installation issues on GitHub.

The TensorFlow docs are open source! To get involved, read the documentation
contributor guide: https://www.tensorflow.org/community/contribute/docs

Please provide a link to the documentation entry, for example:
https://www.tensorflow.org/beta/tutorials/text/transformer#training_and_checkpointing

Description of issue (what needs changing):

Teacher-forcing seems to not be implemented?

Clear description

The documentation here says that training uses teacher forcing, but the code shown doesn't appear to implement it. The variable tar_real holds the true outputs, yet it seems to be used only for the loss and accuracy computations?

Please let me know if I'm making a mistake here! Thanks in advance.

@oanush oanush self-assigned this Jul 19, 2019

@oanush oanush added the comp:model label Jul 19, 2019

@oanush oanush assigned ymodak and unassigned oanush Jul 19, 2019

@tlkh

Contributor

commented Jul 21, 2019

Each train_step takes in inp and tar objects from the dataset in the training loop. Teacher forcing is indeed used, since the correct target from the dataset is always fed to the decoder as input during training (as opposed to the model's own, possibly incorrect, prediction from the previous time step):

  • tar is split into tar_inp and tar_real (offset from each other by one token)
  • inp and tar_inp are used as input to the model
  • the model produces an output, which is compared with tar_real to calculate the loss
  • the model output is then discarded (it is never fed back in as input)
  • repeat loop
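
The split in the first bullet can be sketched in a few lines. This is a minimal pure-Python illustration rather than the tutorial's own code: the token IDs and the split_target helper are made up for the example, and the tutorial applies the equivalent slicing to batched tensors.

```python
# Hypothetical token IDs, assumed for illustration: 1 = start token, 2 = end token.
START, END = 1, 2

def split_target(tar):
    """Split each target sequence into decoder input and expected output.

    tar_inp drops the last token and tar_real drops the first, so
    tar_real[i][t] is the token that should follow tar_inp[i][t].
    """
    tar_inp = [seq[:-1] for seq in tar]
    tar_real = [seq[1:] for seq in tar]
    return tar_inp, tar_real

tar = [[START, 11, 12, 13, END]]
tar_inp, tar_real = split_target(tar)
print(tar_inp)   # [[1, 11, 12, 13]]
print(tar_real)  # [[11, 12, 13, 2]]
```

Because tar_inp comes straight from the dataset, the decoder input at every position is the ground truth, which is exactly what makes this teacher forcing even though tar_real itself only appears in the loss.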

Teacher forcing is a procedure ... in which during training the model receives the ground truth output y(t) as input at time t+1.
Page 372, Deep Learning, 2016.
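
To make the definition concrete, here is a rough sketch contrasting the decoder inputs under teacher forcing with those of a free-running decoder. Everything in it is hypothetical: toy_model is a deliberately wrong stand-in for a real decoder step, and the helper names are invented for this example.

```python
def toy_model(prev_token):
    """Hypothetical stand-in for one decoder step.

    Deliberately imperfect: the true sequence increments by 1, this guesses +2.
    """
    return prev_token + 2

def decoder_inputs_teacher_forced(target):
    # The input at step t+1 is the ground-truth token y(t),
    # no matter what the model predicted at step t.
    return target[:-1]

def decoder_inputs_free_running(start_token, steps):
    # The input at step t+1 is the model's own prediction from step t,
    # so early mistakes propagate into later inputs.
    inputs = [start_token]
    for _ in range(steps - 1):
        inputs.append(toy_model(inputs[-1]))
    return inputs

target = [10, 11, 12, 13, 14]
tf_inputs = decoder_inputs_teacher_forced(target)
fr_inputs = decoder_inputs_free_running(target[0], len(target) - 1)
print(tf_inputs)  # [10, 11, 12, 13]
print(fr_inputs)  # [10, 12, 14, 16]
```

With teacher forcing the inputs stay on the true sequence; in the free-running case they drift as soon as the model makes an error.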

@UltraSpecialException

Author

commented Jul 22, 2019

@tlkh Thanks for the response! I see what you mean.
