Changing the colab link to the right one #20104

Merged · 4 commits · Jun 19, 2018
@@ -18,7 +18,8 @@
}
],
"private_outputs": true,
"collapsed_sections": []
"collapsed_sections": [],
"toc_visible": true
},
"kernelspec": {
"name": "python3",
@@ -41,7 +42,7 @@
"# Neural Machine Translation with Attention\n",
"\n",
"<table align=\"left\"><td>\n",
"<a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/tensorflow/tensorflow/tree/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/NMT_with_Attention.ipynb\">\n",
"<a target=\"_blank\" href=\"https://colab.sandbox.google.com/github/tensorflow/tensorflow/blob/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/NMT_with_Attention.ipynb\">\n",
" <img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a> \n",
"</td><td>\n",
"<a target=\"_blank\" href=\"https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/eager/python/examples/nmt_with_attention/NMT_with_Attention.ipynb\"><img width=32px src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View source on Github</a></td></table>"
@@ -476,12 +477,12 @@
"\n",
"Pseudo-code:\n",
"\n",
- 1. *score = FC(tanh(FC(EO) + FC(H)))*\n",
- 2. *attention weights = softmax(score, axis = 1)*. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, hidden_size)*. Max_length is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
- 3. *context vector = sum(attention weights * EO, axis = 1)*. Same reason as above for choosing axis as 1.\n",
- 4. *embedding output = The input to the decoder X is passed through an embedding layer.*\n",
- 5. *merged vector = concat(embedding output, context vector)*\n",
- 6. *This merged vector is then given to the GRU*\n",
+ * score = FC(tanh(FC(EO) + FC(H)))*\n",
+ * attention weights = softmax(score, axis = 1)*. Softmax by default is applied on the last axis but here we want to apply it on the *1st axis*, since the shape of score is *(batch_size, max_length, hidden_size)*. Max_length is the length of our input. Since we are trying to assign a weight to each input, softmax should be applied on that axis.\n",
+ * context vector = sum(attention weights * EO, axis = 1)*. Same reason as above for choosing axis as 1.\n",
+ * embedding output = The input to the decoder X is passed through an embedding layer.*\n",
+ * merged vector = concat(embedding output, context vector)*\n",
+ * This merged vector is then given to the GRU*\n",
" \n",
"The shapes of all the vectors at each step have been specified in the comments in the code.\n",
" \n",
@@ -646,7 +647,7 @@
},
"cell_type": "markdown",
"source": [
"## Step 5: Define the optimizers and the loss function"
"## Define the optimizers and the loss function"
]
},
{
@@ -695,7 +696,7 @@
},
"cell_type": "markdown",
"source": [
"## Step 6: Training\n",
"## Training\n",
"\n",
"* Here we pass the input through the encoder which return *encoder output* and the *encoder hidden state*.\n",
"* The encoder output, encoder hidden state and the decoder input (which is the \"start\" token) is passed to the decoder.\n",
@@ -790,7 +791,7 @@
},
"cell_type": "markdown",
"source": [
"## Step 7: Translate\n",
"## Translate\n",
"\n",
"* The evaluate function is similar to the training loop. The only change is that we don't use teacher forcing here. The input to the decoder at each time step is its previous predictions along with the hidden state and the encoder output.\n",
"* We stop predicting when the model predicts the *'end' token*.\n",