Evaluate metrics in WMT task #50

cshanbo · 2017-06-26T02:39:12Z

Hi,
I read the paper "Attention Is All You Need". The results on the WMT tasks are really exciting.

But I found that the paper gives no detailed explanation of exactly which metrics were used in the WMT translation tasks.

What I really mean by detailed explanation:

What evaluation script was used? For example, mteval-v11b.pl or multi-bleu.perl?
Is the evaluation case-sensitive or case-insensitive?
Do we need to de-tokenize the output before evaluating?

update

a tiny mis-spelling here
deocding -> decoding

Thank you so much


mehmedes · 2017-06-26T14:04:57Z

Is it possible to run an evaluation script during training after every n_th step?

lukaszkaiser · 2017-06-30T00:50:02Z

We added a utils/get_ende_bleu.sh script that has the commands we used to go from detokenized decodes (produced by t2t_trainer --decode_from_file) to BLEU. It requires MOSES and perl, so you might need to look into the script and adjust paths to run it. But it's probably the best answer to your questions:
(1) the script provided (which uses MOSES tokenizer and multi-bleu)
(2) It is case-sensitive (coming from MOSES)
(3) Yes, the script is for de-tokenized output produced by the trainer on XXX_tokens_32k

If you're running on wmt_ende_bpe32k then instead of the tokenizer call in the script, do this:
perl -ple 's{@@ }{}g' > $decodes_file.target
(4) This is hard, because it needs perl and MOSES and we don't want to call them during training
(it's especially a problem in the distributed setting, where machines don't have MOSES and might not have perl).
That's why we have our approximate BLEU metric that gives us an idea where we are.
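
For readers who want to do the BPE step from point (3) without perl, the substitution can be sketched in Python. This is only an illustrative equivalent of the one-liner above, not code from the repository; the function name strip_bpe is hypothetical, and it assumes the usual "@@ " continuation marker on every non-final subword piece.

```python
def strip_bpe(line: str, marker: str = "@@ ") -> str:
    """Rejoin BPE subword pieces by deleting every occurrence of the marker.

    Mirrors the perl substitution s{@@ }{}g: each "@@ " is removed, so
    "setz@@ ung" becomes "setzung". The name strip_bpe is hypothetical.
    """
    return line.replace(marker, "")


def strip_bpe_lines(lines):
    """Apply strip_bpe to each line, keeping line endings as they are."""
    return [strip_bpe(line) for line in lines]


# Small usage example on an in-memory list of decoded lines.
decoded = ["ein@@ e Über@@ setz@@ ung .\n", "no markers here\n"]
joined = strip_bpe_lines(decoded)
```

The result would then be scored with the same evaluation script in place of the tokenizer output, as described above.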

Hope that helps, feel free to reopen with more questions!

lukaszkaiser closed this as completed Jun 30, 2017
