This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Very low BLEU score(3.47) for German to English translation #1623

Open
minump opened this issue Jul 5, 2019 · 0 comments


minump commented Jul 5, 2019

Description

I followed the Walkthrough for the German to English translation problem and trained for 250K steps on a Google Cloud instance with a Tesla K80 GPU. The resulting BLEU score is 3.47, both cased and uncased. I see that the paper reports very good results for English to German, yet I find my de-en BLEU score very low. There might be something wrong in my method; what could be the issue?
...

Environment information

OS:  $ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.9 (stretch)
Release:        9.9

$ pip freeze | grep tensor
jupyter-tensorboard==0.1.10
mesh-tensorflow==0.0.5
tensor2tensor==1.13.4
tensorboard==1.14.0
tensorflow-datasets==1.0.2
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
tensorflow-hub==0.4.0
tensorflow-metadata==0.13.0
tensorflow-probability==0.7.0rc0
tensorflow-serving-api-gpu==1.13.0
tensorflow-transform==0.13.0

$ python -V
Python 2.7.13

For bugs: reproduction and error logs

Saving dict for global step 250000: global_step = 250000, loss = 1.8727324, metrics-translate_ende_wmt32k_rev/targets/accuracy = 0.63982284, metrics-translate_ende_wmt32k_rev/targets/accuracy_per_sequence = 0.00935551, metrics-translate_ende_wmt32k_rev/targets/accuracy_top5 = 0.83673584, metrics-translate_ende_wmt32k_rev/targets/approx_bleu_score = 0.31759676, metrics-translate_ende_wmt32k_rev/targets/neg_log_perplexity = -1.8884568, metrics-translate_ende_wmt32k_rev/targets/rouge_2_fscore = 0.40385386, metrics-translate_ende_wmt32k_rev/targets/rouge_L_fscore = 0.57753134

BLEU_uncased = 3.47
BLEU_cased = 3.47

# Steps to reproduce:
t2t-datagen --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_ende_wmt32k

t2t-trainer --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu

t2t-decoder --data_dir=$DATA_DIR --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer --hparams_set=transformer_base_single_gpu \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=decode_this.txt --decode_to_file=Englishtranslation.en

t2t-bleu --translation=Englishtranslation.en --reference=ref-translation.en
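For reference, the corpus-level BLEU that a tool like t2t-bleu reports can be approximated in plain Python. This is a simplified sketch (single reference, whitespace tokenization, no smoothing), not tensor2tensor's actual implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus BLEU-4 with brevity penalty, uniform n-gram weights."""
    clipped = [0] * max_n   # n-gram matches, clipped by reference counts
    total = [0] * max_n     # total hypothesis n-grams
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            total[n - 1] += sum(hc.values())
            clipped[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
    if min(clipped) == 0:      # any zero precision makes unsmoothed BLEU zero
        return 0.0
    log_prec = sum(math.log(c / t) for c, t in zip(clipped, total)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)
```

A perfect hypothesis scores 100; any missing 4-gram match drives the unsmoothed score to 0 on short sentences, which is why single-sentence checks can look misleadingly harsh.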

Additional details
I also decoded WMT newstest2013.de, which was part of the training set. This too produced a comparatively low BLEU score:
BLEU_uncased = 27.47
BLEU_cased = 25.99
After tokenizing with the Moses tokenizer and calculating the BLEU score with multi-bleu.perl:
BLEU = 26.08, 61.7/32.7/19.8/12.4 (BP=0.982, ratio=0.982, hyp_len=67394, ref_len=68604)
Since this was part of the training set, I expected a much higher BLEU score.
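As a sanity check, the multi-bleu.perl line above can be reproduced by hand from the printed n-gram precisions and the brevity penalty, since BLEU = BP * exp(mean of log precisions):

```python
import math

# Values taken from the multi-bleu.perl output above
precisions = [0.617, 0.327, 0.198, 0.124]  # 1- to 4-gram precisions
hyp_len, ref_len = 67394, 68604

# Brevity penalty applies because the hypothesis is shorter than the reference
bp = math.exp(1 - ref_len / hyp_len)
bleu = 100 * bp * math.exp(sum(math.log(p) for p in precisions) / 4)
print(round(bleu, 2))  # ~26.06; matches the reported 26.08 up to rounding of the printed precisions
```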
I also referenced issue #317.
Training stopped after 250K steps. Should I train to 500K steps?

Please suggest.