From 792f314da7d21678a536dd9833fec0d2216d080f Mon Sep 17 00:00:00 2001
From: Stefan Schweter
Date: Sat, 3 Mar 2018 18:42:00 +0100
Subject: [PATCH 1/4] readme: formatting fix

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 755d080b6..cd9ab9331 100644
--- a/README.md
+++ b/README.md
@@ -154,7 +154,7 @@ For all translation problems, we suggest to try the Transformer model:
 this should reach a BLEU score of about 28 on the English-German data-set,
 which is close to state-of-the art. If training on a single GPU, try the
 `--hparams_set=transformer_base_single_gpu` setting. For very good results
-or larger data-sets (e.g., for English-French)m, try the big model
+or larger data-sets (e.g., for English-French), try the big model
 with `--hparams_set=transformer_big`.
 
 ## Basics

From 6c4ef8109e27d51cbb9e70c949d9062661ea612b Mon Sep 17 00:00:00 2001
From: Stefan Schweter
Date: Sat, 3 Mar 2018 18:42:21 +0100
Subject: [PATCH 2/4] docs: formatting fix

---
 docs/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/docs/index.md b/docs/index.md
index 8860e03b7..b7d0236c9 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -69,7 +69,7 @@ For language modeling, we have these data-sets in T2T:
 * LM1B (a billion-word corpus): `--problems=languagemodel_lm1b32k` for
   subword-level modeling and `--problems=languagemodel_lm1b_characters`
   for character-level modeling.
- 
+
 We suggest to start with `--model=transformer` on this task and use
 `--hparams_set=transformer_small` for PTB and
 `--hparams_set=transformer_base` for LM1B.
@@ -95,7 +95,7 @@ For speech-to-text, we have these data-sets in T2T:
 For summarizing longer text into shorter one we have these data-sets:
 * CNN/DailyMail articles summarized into a few sentences:
   `--problems=summarize_cnn_dailymail32k`
- 
+
 We suggest to use `--model=transformer` and
 `--hparams_set=transformer_prepend` for this task.
 This yields good ROUGE scores.
@@ -118,5 +118,5 @@ For all translation problems, we suggest to try the Transformer model:
 this should reach a BLEU score of about 28 on the English-German data-set,
 which is close to state-of-the art. If training on a single GPU, try the
 `--hparams_set=transformer_base_single_gpu` setting. For very good results
-or larger data-sets (e.g., for English-French)m, try the big model
+or larger data-sets (e.g., for English-French), try the big model
 with `--hparams_set=transformer_big`.

From 7b2791800f675306592f802d42a499c320a4cc32 Mon Sep 17 00:00:00 2001
From: Stefan Schweter
Date: Sat, 3 Mar 2018 18:42:44 +0100
Subject: [PATCH 3/4] lstm: minor spelling fixes

---
 tensor2tensor/models/lstm.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tensor2tensor/models/lstm.py b/tensor2tensor/models/lstm.py
index 9c6144e9f..d05c1f599 100644
--- a/tensor2tensor/models/lstm.py
+++ b/tensor2tensor/models/lstm.py
@@ -65,7 +65,7 @@ def dropout_lstm_cell():
     attention_mechanism_class = tf.contrib.seq2seq.BahdanauAttention
   else:
     raise ValueError("Unknown hparams.attention_mechanism = %s, must be "
-                     "luong or bahdanu." % hparams.attention_mechanism)
+                     "luong or bahdanau." % hparams.attention_mechanism)
   attention_mechanism = attention_mechanism_class(
       hparams.hidden_size, encoder_outputs)
 
@@ -338,7 +338,7 @@ def lstm_attention():
 
 @registry.register_hparams
 def lstm_bahdanau_attention_multi():
-  """Multi-head Bahdanu attention."""
+  """Multi-head Bahdanau attention."""
   hparams = lstm_bahdanau_attention()
   hparams.num_heads = 4
   return hparams

From 033b667539cbff17f39c8473d65599f2f65a274a Mon Sep 17 00:00:00 2001
From: Stefan Schweter
Date: Sat, 3 Mar 2018 18:43:05 +0100
Subject: [PATCH 4/4] distributed: reference to tf.contrib.learn is
 deprecated. Link to tf.estimator.RunConfig is used instead. Introduced with
 https://github.com/tensorflow/tensorflow/commit/c7caa2d87daa37b66811ac99f997ad02acd4ecc8

---
 docs/distributed_training.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/distributed_training.md b/docs/distributed_training.md
index 9ed9778da..95b499f87 100644
--- a/docs/distributed_training.md
+++ b/docs/distributed_training.md
@@ -5,7 +5,7 @@ training.
 
 T2T uses TensorFlow Estimators and so distributed training is configured with
 the `TF_CONFIG` environment variable that is read by the
-[RunConfig](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/estimator/run_config.py)
+[RunConfig](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/estimator/run_config.py)
 along with a set of flags.
 
 ## `TF_CONFIG`
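
Context for PATCH 4/4: `tf.estimator.RunConfig` reads the cluster layout and the current task's role from the `TF_CONFIG` environment variable referenced in the patched doc. Below is a minimal sketch of exporting that variable from Python; the host:port addresses and the one-master/one-ps layout are illustrative placeholders, not values taken from these patches or from the T2T docs.

```python
# Minimal sketch of the TF_CONFIG structure parsed by tf.estimator.RunConfig.
# The cluster addresses and task assignment are placeholder assumptions.
import json
import os

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "master": ["host1:2222"],  # training process
        "ps": ["host2:2222"],      # parameter server
    },
    "task": {"type": "master", "index": 0},  # role of this process
})
```

Each process in the cluster would export the same `cluster` map and change only the `task` entry to its own role, e.g. `{"type": "ps", "index": 0}` on the parameter server.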