Add WMT test sets via sacrebleu #39

mjpost · 2021-11-09T17:47:03Z

[Extracting from #30]

It would be nice to add support for sacrebleu-style builtin test sets, e.g.,

# one option
$ cat system.txt | comet -t wmt20 -l de-en [other args]

# another option
$ cat system.txt | comet --sacrebleu-testset wmt20/de-en
$ cat system.txt | comet --sacrebleu-testset mtedx/valid/pt-es

You could accomplish this by just using sacrebleu as a library. It’s pretty easy:

from sacrebleu.utils import get_source, get_references, get_files

# trigger sacrebleu test set
# make these optional: nargs="?" for argparse
if args.source is None and args.references is None:
    if args.sacrebleu_dataset is None:
        raise ValueError("provide a source and references, or a sacrebleu test set")

    # some test sets are hierarchical, e.g., "mtedx/valid"
    test_set, langpair = args.sacrebleu_dataset.rsplit("/", maxsplit=1)
    source = get_source(test_set, langpair)
    ref = get_references(test_set, langpair)

    # alternative
    source, ref, _ = get_files(test_set, langpair)
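
For anyone wanting to try the argument handling on its own, here is a minimal, stdlib-only sketch of the parsing side of the snippet above. It leaves out the sacrebleu calls entirely, since the helper names in this post are written from memory and may not match what the installed sacrebleu version exports (check `sacrebleu.utils` before relying on them). The names `build_parser`, `split_testset`, and `resolve_testset` are hypothetical and exist only for illustration, as does the choice to map `--sacrebleu-testset` onto `args.sacrebleu_dataset`.

```python
import argparse
from typing import Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser: source/references become optional flags."""
    parser = argparse.ArgumentParser(prog="comet-sketch")
    parser.add_argument("-s", "--source", nargs="?", default=None)
    parser.add_argument("-r", "--references", nargs="?", default=None)
    # Assumed flag name; stored under the attribute used in the snippet above.
    parser.add_argument(
        "--sacrebleu-testset", dest="sacrebleu_dataset", default=None
    )
    return parser


def split_testset(spec: str) -> Tuple[str, str]:
    """Split 'wmt20/de-en' or 'mtedx/valid/pt-es' at the LAST slash."""
    test_set, sep, langpair = spec.rpartition("/")
    if not sep or not test_set or not langpair:
        raise ValueError(f"expected TESTSET/LANGPAIR, got {spec!r}")
    return test_set, langpair


def resolve_testset(args: argparse.Namespace) -> Optional[Tuple[str, str]]:
    """Return (test_set, langpair) when no explicit files were given."""
    if args.source is None and args.references is None:
        if args.sacrebleu_dataset is None:
            raise ValueError(
                "provide a source and references, or a sacrebleu test set"
            )
        return split_testset(args.sacrebleu_dataset)
    # Explicit files take precedence; nothing to fetch from sacrebleu.
    return None


if __name__ == "__main__":
    cli_args = build_parser().parse_args(
        ["--sacrebleu-testset", "mtedx/valid/pt-es"]
    )
    print(resolve_testset(cli_args))
```

Splitting at the last slash is what keeps hierarchical names such as "mtedx/valid" intact, and the explicit check on the separator turns a bare "wmt20" into a readable error instead of an unpacking failure.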

Originally posted by @mjpost in #30 (comment)

ricardorei closed this as completed in a4c2cf1 Nov 13, 2021
