This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Very low BLEU score (3.47) for German to English translation #1623

Open
@minump

Description

I followed the Walkthrough for the German to English translation problem and trained for 250K steps on a Google Cloud instance with a Tesla K80 GPU. The resulting BLEU score is 3.47, both cased and uncased. The paper reports very good results for English to German, but I find my de-en BLEU score to be very low. There might be something wrong with my method. What could the issue be?
...

Environment information

OS:  $ lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.9 (stretch)
Release:        9.9

$ pip freeze | grep tensor
jupyter-tensorboard==0.1.10
mesh-tensorflow==0.0.5
tensor2tensor==1.13.4
tensorboard==1.14.0
tensorflow-datasets==1.0.2
tensorflow-estimator==1.14.0
tensorflow-gpu==1.14.0
tensorflow-hub==0.4.0
tensorflow-metadata==0.13.0
tensorflow-probability==0.7.0rc0
tensorflow-serving-api-gpu==1.13.0
tensorflow-transform==0.13.0

$ python -V
Python 2.7.13

For bugs: reproduction and error logs

Saving dict for global step 250000: global_step = 250000, loss = 1.8727324,
metrics-translate_ende_wmt32k_rev/targets/accuracy = 0.63982284,
metrics-translate_ende_wmt32k_rev/targets/accuracy_per_sequence = 0.00935551,
metrics-translate_ende_wmt32k_rev/targets/accuracy_top5 = 0.83673584,
metrics-translate_ende_wmt32k_rev/targets/approx_bleu_score = 0.31759676,
metrics-translate_ende_wmt32k_rev/targets/neg_log_perplexity = -1.8884568,
metrics-translate_ende_wmt32k_rev/targets/rouge_2_fscore = 0.40385386,
metrics-translate_ende_wmt32k_rev/targets/rouge_L_fscore = 0.57753134

BLEU_uncased = 3.47
BLEU_cased = 3.47
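
One caveat when comparing the two numbers above: approx_bleu_score in the eval dict is reported on a 0-1 scale, while t2t-bleu prints BLEU on a 0-100 scale. Rescaled, the internal estimate at step 250K is roughly 31.8, far above the decoded 3.47; if the two metrics are at all comparable, that gap would point at the decoding/scoring step rather than at training itself. A quick check using only the numbers printed above:

```python
# approx_bleu_score from the step-250000 eval dict above (0-1 scale).
approx_bleu = 0.31759676

# t2t-bleu reports BLEU on a 0-100 scale, so rescale before comparing.
internal_estimate = 100 * approx_bleu
decoded_bleu = 3.47  # BLEU_cased / BLEU_uncased from t2t-bleu above

print(f"internal estimate: {internal_estimate:.2f}, decoded: {decoded_bleu}")
```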

# Steps to reproduce:
t2t-datagen --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=translate_ende_wmt32k

t2t-trainer --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu

t2t-decoder --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --decode_hparams="beam_size=4,alpha=0.6" \
  --decode_from_file=decode_this.txt \
  --decode_to_file=Englishtranslation.en

t2t-bleu --translation=Englishtranslation.en --reference=ref-translation.en

### Description
Also decoded WMT newstest2013.de, which was part of the training set. This too produced a comparatively low BLEU score:
BLEU_uncased = 27.47
BLEU_cased = 25.99
After tokenizing with the Moses tokenizer and scoring with multi-bleu.perl:
BLEU = 26.08, 61.7/32.7/19.8/12.4 (BP=0.982, ratio=0.982, hyp_len=67394, ref_len=68604)
Since this data was part of the training set, I expected a much higher BLEU score.
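
As a sanity check on the multi-bleu.perl line above: BLEU is the geometric mean of the four n-gram precisions multiplied by the brevity penalty, and the reported components do reproduce the reported score (a minimal sketch, using only the numbers printed above; the small difference comes from the precisions being rounded to one decimal):

```python
import math

# The four n-gram precisions (1-gram to 4-gram) and lengths reported
# by multi-bleu.perl above.
precisions = [0.617, 0.327, 0.198, 0.124]
hyp_len, ref_len = 67394, 68604

# Brevity penalty: 1 if the hypothesis is longer than the reference,
# otherwise exp(1 - ref_len / hyp_len).
bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)

# BLEU = BP * geometric mean of the n-gram precisions.
bleu = bp * math.exp(sum(math.log(p) for p in precisions) / 4)
print(f"BP = {bp:.3f}, BLEU = {100 * bleu:.2f}")
```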
I also referenced issue #317.
Training stopped after 250K steps. Should I train to 500K steps?
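
If continuing training is the answer: as far as I can tell, t2t-trainer stops at --train_steps (an absolute step count, defaulting to 250000) and resumes from the latest checkpoint in --output_dir, so re-running with a higher value should pick up where it left off. A sketch, assuming the same directories and flags as above:

```shell
# Resume from the latest checkpoint in $OUTPUT_DIR and train up to
# step 500K.  --train_steps is an absolute target, not an increment.
t2t-trainer --data_dir=$DATA_DIR \
  --output_dir=$OUTPUT_DIR \
  --problem=translate_ende_wmt32k_rev \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --train_steps=500000
```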

Please suggest.
