
Unable to reproduce WMT En2De results #539

Closed
Zrachel opened this issue Jan 24, 2018 · 5 comments

Comments

@Zrachel

Zrachel commented Jan 24, 2018

I'm reproducing the WMT14 EN-DE experiment.
Here is my training result:

INFO:tensorflow:Validation (step 250000): loss = 1.67542, metrics-translate_ende_wmt32k/accuracy = 0.650801, global_step = 248002, 
metrics-translate_ende_wmt32k/accuracy_per_sequence = 0.0304319, metrics-translate_ende_wmt32k/accuracy_top5 = 0.834215, 
metrics-translate_ende_wmt32k/rouge_L_fscore = 0.569232, metrics-translate_ende_wmt32k/approx_bleu_score = 0.323901, 
metrics-translate_ende_wmt32k/rouge_2_fscore = 0.394656, metrics-translate_ende_wmt32k/neg_log_perplexity = -1.86773

Here is my decoding result:

+ perl /home/zhangruiqing01/tools/MT/mosesdecoder/scripts/generic/multi-bleu.perl /home/zhangruiqing01/MT/Common/python/data/wmt14en-de/input/stanford/test/newstest2014.de.atat
BLEU = 20.91, 48.3/25.6/15.6/9.9 (BP=1.000, ratio=1.062, hyp_len=70763, ref_len=66661)

I've referred to issue #317 and found that my training loss is larger than his 1.56711 at step 250000. Is there anything wrong with my training process?

My Environment:
Using 4 K40 GPU cards

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=../t2t_data
TMP_DIR=../t2t_datagen
TRAIN_DIR=$MODEL-$HPARAMS

stanford=~/MT/Common/python/data/wmt14en-de/input/stanford/test       

Training code:

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='batch_size=1024' \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=4

Testing code:

YEAR=2014
BEAM_SIZE=5
ALPHA=0.6

t2t-decoder \
    --data_dir=$DATA_DIR \
    --problems=$PROBLEM  \
    --model=$MODEL \
    --hparams_set=$HPARAMS \
    --batch_size=5 \
    --output_dir=$TRAIN_DIR \
    --decode_beam_size=$BEAM_SIZE \
    --decode_alpha=$ALPHA \
    --decode_from_file=$stanford/newstest${YEAR}.en

translation=$stanford/newstest${YEAR}.en.${MODEL}.${HPARAMS}.${PROBLEM}.beam4.alpha${ALPHA}.decodes
groundtruth=$stanford/newstest${YEAR}.de
#Tokenize the translation
perl ~/MT/Common/python/data/nist/bin/mosesdecoder/scripts/tokenizer/tokenizer.perl -l de < $translation > $translation.tok
perl ~/MT/Common/python/data/nist/bin/mosesdecoder/scripts/tokenizer/tokenizer.perl -l de < $groundtruth > $groundtruth.tok

#Do compound splitting on the translation
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < $translation.tok > $translation.atat
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < $groundtruth.tok > $groundtruth.atat

#Score the translation
perl ~/tools/MT/mosesdecoder/scripts/generic/multi-bleu.perl $groundtruth.atat < $translation.atat
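
For illustration, this is roughly what the compound-splitting step above does to a hyphenated token (the input string is just a made-up example, not data from this run):

echo "state-of-the-art Ergebnis" | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g'
# prints: state ##AT##-##AT## of ##AT##-##AT## the ##AT##-##AT## art Ergebnis
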
@vince62s
Contributor

Read this: #444
Also, if you used 1.4.2, there is probably something wrong with the latest version.

@martinpopel
Contributor

There is a problem since v1.3.0: #529. It seems it can be worked around by increasing the warmup steps (and training long enough), but in any case the optimal hyper-parameters changed in 1.3.0.
Note also that batch_size=1024 (with just 4 GPUs) is much lower than in the "Attention Is All You Need" paper, so you should train for many more steps and/or expect a lower BLEU (even with the optimal warmup steps and learning rate).
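
As a minimal sketch, this is how that advice could be applied via t2t-trainer's --hparams override; the warmup value of 16000 is only an illustrative guess, not a number taken from this thread:

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='batch_size=1024,learning_rate_warmup_steps=16000' \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=4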

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Verified that a recent run on 8 P100 GPUs with transformer, transformer_base, and translate_ende_wmt32k achieved negative log perplexity of -1.533 (higher is better) after 250k steps.

[attached image: lossbase (training loss curve)]

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Zoomed in:

[attached image: lossbase (zoomed-in training loss curve)]

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Will close in favor of #529. Let's continue the discussion there.
