
[WIP] Add "Attention Is All You Need" transformer and translation example #422

Closed
andr-ec wants to merge 35 commits

Conversation

@andr-ec commented Mar 26, 2020

This pull request adds the transformer from "Attention Is All You Need" (as presented in "The Annotated Transformer") to the models folder, and adds a translation example for the WMT'14 English-German data.
I'd love to hear any feedback, and any help would be great!

This would close #148
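
For orientation while reviewing, here is a minimal sketch of the scaled dot-product attention at the core of the paper, written against the Swift for TensorFlow API. The function name and shapes are illustrative rather than this PR's actual code, and it omits the padding and causal masks a real translation model needs:

import TensorFlow

/// Attention(Q, K, V) = softmax(Q K^T / sqrt(dk)) V, from "Attention Is All You Need".
/// Shapes: query [batch, queryLength, dk], key [batch, keyLength, dk],
/// value [batch, keyLength, dv]. Masking is omitted for brevity.
func scaledDotProductAttention(
    query: Tensor<Float>,
    key: Tensor<Float>,
    value: Tensor<Float>
) -> Tensor<Float> {
    let dk = Float(key.shape[key.rank - 1])
    // Scale the raw scores by sqrt(dk) so the softmax gradients stay well behaved.
    let scores = matmul(query, transposed: false, key, transposed: true) / dk.squareRoot()
    // softmax normalizes over the last axis, i.e. over the keys.
    return matmul(softmax(scores), value)
}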

TODO:

  • implement transformer
  • add dataset
  • working training loop (a rough sketch follows this list)
  • clean up comments
  • split dataset into eval and add eval into training loop
  • add testing dataset and test loop
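
As a rough, hypothetical sketch of the "working training loop" item above: the usual Swift for TensorFlow step is a valueWithGradient call around a cross-entropy loss, followed by an optimizer update. TinyClassifier and the random batch below are stand-ins so the snippet is self-contained; the real loop would feed tokenized WMT batches through the transformer instead.

import TensorFlow

// Stand-in model so the training-step sketch compiles; not the PR's transformer.
struct TinyClassifier: Layer {
    var dense = Dense<Float>(inputSize: 16, outputSize: 4)

    @differentiable
    func callAsFunction(_ input: Tensor<Float>) -> Tensor<Float> {
        dense(input)
    }
}

var model = TinyClassifier()
let optimizer = Adam(for: model, learningRate: 1e-3)

let features = Tensor<Float>(randomNormal: [8, 16])  // stand-in input batch
let labels = Tensor<Int32>(zeros: [8])               // stand-in target labels

// One training step: differentiate the loss with respect to the model, then update.
// For sequence models, logits and labels are flattened to [batch * length, vocab] first.
let (loss, gradient) = valueWithGradient(at: model) { model -> Tensor<Float> in
    softmaxCrossEntropy(logits: model(features), labels: labels)
}
optimizer.update(&model, along: gradient)
print("loss:", loss)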

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



@andr-ec (Author) commented Mar 26, 2020

@googlebot I signed it!

@googlebot

CLAs look good, thanks!


@brettkoonce (Contributor)

SUGOI DESU NE (Japanese: "This is amazing!")

import TensorFlow

///// Input to an attention layer.
//public struct AttentionInput<Scalar: TensorFlowFloatingPoint>: Differentiable {
Review comment from a Contributor:
might clean this out if it's not in use


struct WMTTranslationTask {
// https://nlp.stanford.edu/projects/nmt/
// WMT'14 English-German data
Review comment from a Contributor:
WMT == 💪!
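
For anyone who wants to poke at the data outside the training loop, a tiny loader for these line-aligned files could look like the sketch below. The helper name and file layout (e.g. train.en / train.de from the Stanford NMT page) are assumptions for illustration, not this PR's actual API:

import Foundation

/// Reads a line-aligned parallel corpus such as the preprocessed WMT'14
/// English-German files from https://nlp.stanford.edu/projects/nmt/.
/// Hypothetical helper for illustration only.
func loadParallelCorpus(
    sourcePath: String,
    targetPath: String
) throws -> [(source: String, target: String)] {
    let sourceLines = try String(contentsOfFile: sourcePath, encoding: .utf8)
        .split(separator: "\n")
    let targetLines = try String(contentsOfFile: targetPath, encoding: .utf8)
        .split(separator: "\n")
    // Sentence pairs are aligned by line number in these files.
    precondition(sourceLines.count == targetLines.count,
                 "Parallel files must have the same number of lines.")
    return zip(sourceLines, targetLines).map {
        (source: String($0), target: String($1))
    }
}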

@texasmichelle (Member)
Thank you for putting this together! A translation example would indeed be a fantastic addition to the repo.

Between GPT-2 and BERT, we're trying to get a handle on the code duplication among transformer-based models. We'd like to reuse code among common concepts, which a translation example could also utilize. If you're up for it, we could use some help with #436 before compounding the problem by adding more transformer code.

Breaking this up into smaller PRs would also be helpful, such as a standalone dataset PR, or at least one that leverages existing components; how many separate implementations of multi-head attention do we really need? Apropos multi-head attention, it appears you could easily use the existing libraries, since the code looks nearly identical, but your name is listed as author in the header, so maybe there are some major changes.
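
To make the reuse argument concrete, multi-head attention is just scaled dot-product attention applied to head-wise slices of the model dimension, which is why the implementations across models end up nearly identical. Below is an illustrative sketch, not the repo's shared API; the single-head helper is repeated so the block stands alone, and the learned input/output projections of a real layer are omitted:

import TensorFlow

/// Single-head building block: softmax(Q K^T / sqrt(dk)) V.
func scaledDotProductAttention(
    _ query: Tensor<Float>, _ key: Tensor<Float>, _ value: Tensor<Float>
) -> Tensor<Float> {
    let dk = Float(key.shape[key.rank - 1])
    let scores = matmul(query, transposed: false, key, transposed: true) / dk.squareRoot()
    return matmul(softmax(scores), value)
}

/// Multi-head attention over [batch, seqLen, modelSize] tensors. Illustrative
/// only: real layers apply learned Q/K/V and output projections around this.
func multiHeadAttention(
    query: Tensor<Float>, key: Tensor<Float>, value: Tensor<Float>,
    headCount: Int
) -> Tensor<Float> {
    let modelSize = query.shape[query.rank - 1]
    precondition(modelSize % headCount == 0, "headCount must divide modelSize")
    let headSize = modelSize / headCount
    let (batch, queryLength, keyLength) = (query.shape[0], query.shape[1], key.shape[1])
    var heads: [Tensor<Float>] = []
    for head in 0..<headCount {
        let lo = head * headSize
        // Each head attends over its own slice of the feature dimension.
        let q = query.slice(lowerBounds: [0, 0, lo], upperBounds: [batch, queryLength, lo + headSize])
        let k = key.slice(lowerBounds: [0, 0, lo], upperBounds: [batch, keyLength, lo + headSize])
        let v = value.slice(lowerBounds: [0, 0, lo], upperBounds: [batch, keyLength, lo + headSize])
        heads.append(scaledDotProductAttention(q, k, v))
    }
    // Concatenate the per-head outputs back along the feature axis.
    return Tensor(concatenating: heads, alongAxis: 2)
}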

@marcrasi (Contributor)

Hi, there hasn't been activity on this PR for a while. We're going to close it to keep our list of open PRs small. Of course, feel free to reopen any time if you have time to come back and work on this!

@marcrasi closed this Apr 15, 2020