Changing the colab link to the right one #20104

Merged · 4 commits · Jun 19, 2018
@@ -18,7 +18,8 @@
}
],
"private_outputs": true,
"collapsed_sections": []
"collapsed_sections": [],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
@@ -41,7 +42,7 @@
"# Neural Machine Translation with Attention\n",
"\n",
"<table align=\"left\"><td>\n",
"<a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/tensorflow/tensorflow/tree/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/NMT_with_Attention.ipynb\">\n",
"<a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/NMT_with_Attention.ipynb\">\n",
" <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a> \n",
"</td><td>\n",
"<a target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/NMT_with_Attention.ipynb\"><img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on Github</a></td></table>"
@@ -476,12 +477,12 @@
"\n",
"Pseudo-code:\n",
"\n",
- 1. *score = FC(tanh(FC(EO) + FC(H)))*\n",
- 2. *attention weights = softmax(score, axis = 1)*. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, hidden_size)*. Max_length is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
- 3. *context vector = sum(attention weights * EO, axis = 1)*. Same reason as above for choosing axis as 1.\n",
- 4. *embedding output = The input to the decoder X is passed through an embedding layer.*\n",
- 5. *merged vector = concat(embedding output, context vector)*\n",
- 6. *This merged vector is then given to the GRU*\n",
+ * score = FC(tanh(FC(EO) + FC(H)))*\n",
+ * attention weights = softmax(score, axis = 1)*. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, hidden_size)*. Max_length is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
+ * context vector = sum(attention weights * EO, axis = 1)*. Same reason as above for choosing axis as 1.\n",
+ * embedding output = The input to the decoder X is passed through an embedding layer.*\n",
+ * merged vector = concat(embedding output, context vector)*\n",
+ * This merged vector is then given to the GRU*\n",
" \n",
"The shapes of all the vectors at each step have been specified in the comments in the code.\n",
" \n",
@@ -646,7 +647,7 @@
},
"cell_type": "markdown",
"source": [
"## Step 5: Define the optimizers and the loss function"
"## Define the optimizers and the loss function"
]
},
{
@@ -695,7 +696,7 @@
},
"cell_type": "markdown",
"source": [
"## Step 6: Training\n",
"## Training\n",
"\n",
"* Here we pass the input through the encoder which return *encoder output* and the *encoder hidden state*.\n",
"* The encoder output, encoder hidden state and the decoder input (which is the \"start\" token) is passed to the decoder.\n",
@@ -790,7 +791,7 @@
},
"cell_type": "markdown",
"source": [
"## Step 7: Translate\n",
"## Translate\n",
"\n",
"* The evaluate function is similar to the training loop. The only change is that we don't use teacher forcing here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
"* We stop predicting when the model predicts the *'end' token*.\n",