
Unable to reproduce WMT En2De results #539

Closed
Zrachel opened this issue Jan 24, 2018 · 5 comments

Comments

@Zrachel

Zrachel commented Jan 24, 2018

I'm reproducing the WMT14 EN-DE experiment.
Here is my training result:

INFO:tensorflow:Validation (step 250000): loss = 1.67542, metrics-translate_ende_wmt32k/accuracy = 0.650801, global_step = 248002, 
metrics-translate_ende_wmt32k/accuracy_per_sequence = 0.0304319, metrics-translate_ende_wmt32k/accuracy_top5 = 0.834215, 
metrics-translate_ende_wmt32k/rouge_L_fscore = 0.569232, metrics-translate_ende_wmt32k/approx_bleu_score = 0.323901, 
metrics-translate_ende_wmt32k/rouge_2_fscore = 0.394656, metrics-translate_ende_wmt32k/neg_log_perplexity = -1.86773

Here is my decoding result:

+ perl /home/zhangruiqing01/tools/MT/mosesdecoder/scripts/generic/multi-bleu.perl /home/zhangruiqing01/MT/Common/python/data/wmt14en-de/input/stanford/test/newstest2014.de.atat
BLEU = 20.91, 48.3/25.6/15.6/9.9 (BP=1.000, ratio=1.062, hyp_len=70763, ref_len=66661)

I've referred to issue #317 and found that my training loss is larger than his 1.56711 at step 250000. Is there anything wrong with my training process?

My Environment:
Using 4 K40 GPU cards

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_base

DATA_DIR=../t2t_data
TMP_DIR=../t2t_datagen
TRAIN_DIR=$MODEL-$HPARAMS

stanford=~/MT/Common/python/data/wmt14en-de/input/stanford/test       

Training code:

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='batch_size=1024' \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=4

Testing code:

YEAR=2014
BEAM_SIZE=5
ALPHA=0.6

t2t-decoder \
    --data_dir=$DATA_DIR \
    --problems=$PROBLEM  \
    --model=$MODEL \
    --hparams_set=$HPARAMS \
    --batch_size=5 \
    --output_dir=$TRAIN_DIR \
    --decode_beam_size=$BEAM_SIZE \
    --decode_alpha=$ALPHA \
    --decode_from_file=$stanford/newstest${YEAR}.en

translation=$stanford/newstest${YEAR}.en.${MODEL}.${HPARAMS}.${PROBLEM}.beam4.alpha${ALPHA}.decodes
groundtruth=$stanford/newstest${YEAR}.de
#Tokenize the translation
perl ~/MT/Common/python/data/nist/bin/mosesdecoder/scripts/tokenizer/tokenizer.perl -l de < $translation > $translation.tok
perl ~/MT/Common/python/data/nist/bin/mosesdecoder/scripts/tokenizer/tokenizer.perl -l de < $groundtruth > $groundtruth.tok

#Do compound splitting on the translation
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < $translation.tok > $translation.atat
perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g' < $groundtruth.tok > $groundtruth.atat

#Score the translation
perl ~/tools/MT/mosesdecoder/scripts/generic/multi-bleu.perl $groundtruth.atat < $translation.atat
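
For illustration, this is roughly what the compound-splitting step above does to a hyphenated token (the input string is just a made-up example, not data from this run):

echo "state-of-the-art Ergebnis" | perl -ple 's{(\S)-(\S)}{$1 ##AT##-##AT## $2}g'
# prints: state ##AT##-##AT## of ##AT##-##AT## the ##AT##-##AT## art Ergebnis
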
@vince62s
Contributor

Read this: #444
Also, if you used 1.4.2, there is probably something wrong with the latest version.

@martinpopel
Contributor

There is a problem since v1.3.0: #529. It seems it can be worked around by increasing the warmup steps (and training long enough), but in any case the optimal hyper-parameters changed in 1.3.0.
Note also that batch_size=1024 (with just 4 GPUs) is much lower than in the "Attention Is All You Need" paper, so you should train for many more steps and/or expect a lower BLEU (even with the optimal warmup steps and learning rate).
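
As a minimal sketch, this is how that advice could be applied via t2t-trainer's --hparams override; the warmup value of 16000 is only an illustrative guess, not a number taken from this thread:

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --hparams='batch_size=1024,learning_rate_warmup_steps=16000' \
  --output_dir=$TRAIN_DIR \
  --worker_gpu=4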

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Verified that a recent run on 8 P100 GPUs with transformer, transformer_base, and translate_ende_wmt32k achieved negative log perplexity of -1.533 (higher is better) after 250k steps.

[attached image: lossbase (training loss curve)]

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Zoomed in:

[attached image: lossbase (zoomed-in training loss curve)]

@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Will close in favor of #529. Let's continue the discussion there.
