Very low BLEU score (3.47) for German-to-English translation #1623
Description
I followed the Walkthrough for the German-to-English translation problem and trained for 250K steps on a Google Cloud instance with a Tesla K80 GPU. The resulting BLEU score is 3.47, both cased and uncased. The paper reports very good results for English to German, yet I find this de-en BLEU score very low. There might be something wrong in my method. What could be the issue?
Environment information
OS: $ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 9.9 (stretch)
Release: 9.9
$ pip freeze | grep tensor
jupyter-tensorboard==0.1.10
mesh-tensorflow==0.0.5
tensor2tensor==1.13.4
tensorboard==1.14.0
tensorflow-datasets==1.0.2
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
tensorflow-hub==0.4.0
tensorflow-metadata==0.13.0
tensorflow-probability==0.7.0rc0
tensorflow-serving-api-gpu==1.13.0
tensorflow-transform==0.13.0
$ python -V
Python 2.7.13
For bugs: reproduction and error logs
Saving dict for global step 250000: global_step = 250000, loss = 1.8727324, metrics-translate_ende_wmt32k_rev/targets/accuracy = 0.63982284, metrics-translate_ende_wmt32k_rev/targets/accuracy_per_sequence = 0.00935551, metrics-translate_ende_wmt32k_rev/targets/accuracy_top5 = 0.83673584, metrics-translate_ende_wmt32k_rev/targets/approx_bleu_score = 0.31759676, metrics-translate_ende_wmt32k_rev/targets/neg_log_perplexity = -1.8884568, metrics-translate_ende_wmt32k_rev/targets/rouge_2_fscore = 0.40385386, metrics-translate_ende_wmt32k_rev/targets/rouge_L_fscore = 0.57753134
BLEU_uncased = 3.47
BLEU_cased = 3.47
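As far as I understand, the internal approx_bleu_score is computed on subword tokens with teacher forcing, so it is not directly comparable to t2t-bleu; still, a quick arithmetic check on the eval dict above (my own reading, not part of the logs) shows how large the gap is:

```python
import math

# Values copied from the eval dict above; pure arithmetic, no other assumptions.
neg_log_perplexity = -1.8884568
approx_bleu = 0.31759676

print(math.exp(-neg_log_perplexity))  # per-token perplexity, ~6.61
print(100 * approx_bleu)              # ~31.8 on the approx_bleu scale vs. 3.47 decoded
```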
# Steps to reproduce:
t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_ende_wmt32k

t2t-trainer \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu

t2t-decoder \
  --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=decode_this.txt \
  --decode_to_file=Englishtranslation.en

t2t-bleu --translation=Englishtranslation.en --reference=ref-translation.en
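One sanity check that often explains a near-zero BLEU: the hypothesis and reference files being misaligned (different line counts, or outputs written in a different order than the inputs). A minimal sketch, assuming only the file names from the commands above:

```python
from __future__ import print_function  # environment above is Python 2.7
import io

# File names are the ones used in the t2t-decoder / t2t-bleu commands above.
src = io.open("decode_this.txt", encoding="utf-8").read().splitlines()
hyp = io.open("Englishtranslation.en", encoding="utf-8").read().splitlines()
ref = io.open("ref-translation.en", encoding="utf-8").read().splitlines()

# Mismatched line counts or shuffled outputs both crater BLEU.
print(len(src), len(hyp), len(ref))
for s, h, r in list(zip(src, hyp, ref))[:3]:
    print("SRC:", s)
    print("HYP:", h)
    print("REF:", r)
    print()
```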
### Additional observations
I also decoded WMT newstest2013.de, which was part of the training set. This too produced a comparatively low BLEU score.
BLEU_uncased = 27.47
BLEU_cased = 25.99
After tokenizing with the Moses tokenizer and calculating the BLEU score with multi-bleu.perl:
BLEU = 26.08, 61.7/32.7/19.8/12.4 (BP=0.982, ratio=0.982, hyp_len=67394, ref_len=68604)
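For what it's worth, the multi-bleu.perl printout is internally consistent; recomputing the score from the printed n-gram precisions and brevity penalty (pure arithmetic on the numbers above):

```python
import math

precisions = [0.617, 0.327, 0.198, 0.124]  # printed 1- to 4-gram precisions
bp = 0.982                                 # printed brevity penalty

bleu = bp * math.exp(sum(math.log(p) for p in precisions) / 4.0)
print(round(100 * bleu, 2))  # ~26.05; matches the reported 26.08 up to rounding
```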
Since newstest2013 was part of the training set, I expected a much higher BLEU score.
I also referenced issue #317.
Training stopped after 250K steps. Should I continue training to 500K steps? Please suggest.