This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

Evaluate metrics in WMT task #50

Closed
cshanbo opened this issue Jun 26, 2017 · 3 comments

Comments

@cshanbo
Contributor

cshanbo commented Jun 26, 2017

Hi,
I read the paper "Attention Is All You Need". The results on the WMT tasks are really exciting.

But I found that the paper gives no detailed explanation of exactly which metrics were used in the WMT translation tasks.

What I really mean by detailed explanation:

  1. What evaluation script was used? For example, mteval-v11b.pl, or multi-bleu.perl
  2. Is the evaluation case sensitive or insensitive?
  3. Do we need to de-tokenize the output before evaluating?

update

a tiny mis-spelling here
deocding -> decoding

Thank you so much

@mehmedes

  4. Is it possible to run an evaluation script during training after every n-th step?

@lukaszkaiser
Contributor

We added the utils/get_ende_bleu.sh script, which contains the commands we used to go from detokenized decodes (produced by t2t_trainer --decode_from_file) to BLEU. It requires MOSES and perl, so you may need to open the script and adjust paths before running it. It is probably the best answer to your questions:
(1) The script provided, which uses the MOSES tokenizer and multi-bleu.
(2) It is case-sensitive (this comes from MOSES).
(3) Yes, the script is for de-tokenized output produced by the trainer on XXX_tokens_32k.

  • If you're running on wmt_ende_bpe32k then, instead of the tokenizer call in the script, do this:
    perl -ple 's{@@ }{}g' > $decodes_file.target

(4) This is hard, because it needs perl and MOSES and we don't want to call them during training (it's especially a problem in the distributed setting, where machines don't have MOSES and might not have perl). That's why we have our approximate BLEU metric, which gives us an idea of where we are.
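For anyone without perl at hand, here is a minimal Python sketch of what the one-liner above does for BPE output: it deletes every "@@ " continuation marker so that subword pieces are joined back into words. The function name undo_bpe and the sample sentence are made up for illustration and are not part of the repository.

```python
import sys


def undo_bpe(line: str) -> str:
    """Join BPE pieces by deleting every '@@ ' marker.

    Mirrors the substitution s{@@ }{}g from the perl one-liner.
    (Hypothetical helper, not part of tensor2tensor.)
    """
    return line.replace("@@ ", "")


def main() -> None:
    # Read decodes from stdin and write merged text to stdout,
    # one line in, one line out, like perl -ple.
    for raw in sys.stdin:
        print(undo_bpe(raw.rstrip("\n")))


if __name__ == "__main__":
    main()
```

Assuming the sketch is saved as undo_bpe.py (a made-up file name), it would be used like the perl command, reading decodes on stdin and redirecting stdout to the target file. Like the perl version, it leaves a trailing "@@" at the very end of a line untouched, since no space follows it.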

Hope that helps, feel free to reopen with more questions!


4 participants
@lukaszkaiser @cshanbo @mehmedes and others