Very low BLEU score (3.47) for German-to-English translation #1623
Description
I followed the Walkthrough for the German-to-English translation problem and trained for 250K steps on a Google Cloud instance with a Tesla K80 GPU. The resulting BLEU score is 3.47, both cased and uncased. The paper reports very good results for English to German, yet I find this de-en BLEU score very low. There might be something wrong in my method. What could be the issue?
Environment information
OS: $ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 9.9 (stretch)
Release: 9.9
$ pip freeze | grep tensor
jupyter-tensorboard==0.1.10
mesh-tensorflow==0.0.5
tensor2tensor==1.13.4
tensorboard==1.14.0
tensorflow-datasets==1.0.2
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
tensorflow-hub==0.4.0
tensorflow-metadata==0.13.0
tensorflow-probability==0.7.0rc0
tensorflow-serving-api-gpu==1.13.0
tensorflow-transform==0.13.0
$ python -V
Python 2.7.13
For bugs: reproduction and error logs
Saving dict for global step 250000: global_step = 250000, loss = 1.8727324, metrics-translate_ende_wmt32k_rev/targets/accuracy = 0.63982284, metrics-translate_ende_wmt32k_rev/targets/accuracy_per_sequence = 0.00935551, metrics-translate_ende_wmt32k_rev/targets/accuracy_top5 = 0.83673584, metrics-translate_ende_wmt32k_rev/targets/approx_bleu_score = 0.31759676, metrics-translate_ende_wmt32k_rev/targets/neg_log_perplexity = -1.8884568, metrics-translate_ende_wmt32k_rev/targets/rouge_2_fscore = 0.40385386, metrics-translate_ende_wmt32k_rev/targets/rouge_L_fscore = 0.57753134
BLEU_uncased = 3.47
BLEU_cased = 3.47
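As far as I understand, the internal approx_bleu_score is computed on subword tokens with teacher forcing, so it is not directly comparable to t2t-bleu; still, a quick arithmetic check on the eval dict above (my own reading, not part of the logs) shows how large the gap is:

```python
import math

# Values copied from the eval dict above; pure arithmetic, no other assumptions.
neg_log_perplexity = -1.8884568
approx_bleu = 0.31759676

print(math.exp(-neg_log_perplexity))  # per-token perplexity, ~6.61
print(100 * approx_bleu)              # ~31.8 on the approx_bleu scale vs. 3.47 decoded
```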
# Steps to reproduce:
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_ende_wmt32k

t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu

t2t-decoder \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=decode_this.txt \
  --decode_to_file=Englishtranslation.en

t2t-bleu --translation=Englishtranslation.en --reference=ref-translation.en
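One sanity check that often explains a near-zero BLEU: the hypothesis and reference files being misaligned (different line counts, or outputs written in a different order than the inputs). A minimal sketch, assuming only the file names from the commands above:

```python
from __future__ import print_function  # environment above is Python 2.7
import io

# File names are the ones used in the t2t-decoder / t2t-bleu commands above.
src = io.open("decode_this.txt", encoding="utf-8").read().splitlines()
hyp = io.open("Englishtranslation.en", encoding="utf-8").read().splitlines()
ref = io.open("ref-translation.en", encoding="utf-8").read().splitlines()

# Mismatched line counts or shuffled outputs both crater BLEU.
print(len(src), len(hyp), len(ref))
for s, h, r in list(zip(src, hyp, ref))[:3]:
    print("SRC:", s)
    print("HYP:", h)
    print("REF:", r)
    print()
```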
### Additional observations
I also decoded WMT newstest2013.de, which was part of the training set. This too produced a comparatively low BLEU score.
BLEU_uncased = 27.47
BLEU_cased = 25.99
After tokenizing with the Moses tokenizer and calculating the BLEU score with multi-bleu.perl:
BLEU = 26.08, 61.7/32.7/19.8/12.4 (BP=0.982, ratio=0.982, hyp_len=67394, ref_len=68604)
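For what it's worth, the multi-bleu.perl printout is internally consistent; recomputing the score from the printed n-gram precisions and brevity penalty (pure arithmetic on the numbers above):

```python
import math

precisions = [0.617, 0.327, 0.198, 0.124]  # printed 1- to 4-gram precisions
bp = 0.982                                 # printed brevity penalty

bleu = bp * math.exp(sum(math.log(p) for p in precisions) / 4.0)
print(round(100 * bleu, 2))  # ~26.05; matches the reported 26.08 up to rounding
```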
Since newstest2013 was part of the training set, I expected a much higher BLEU score.
I also referenced issue #317.
Training stopped after 250K steps. Should I continue training to 500K steps? Please suggest.