missing WMT 2017 word_level/test.tags required for predictor-estimator evaluation #24

Closed
ninalopatina opened this issue Apr 22, 2019 · 4 comments
Labels
bug Something isn't working

Comments

@ninalopatina

Describe the bug
The WMT 2017 test data set is missing the word_level/test.tags file, which is required for predictor-estimator evaluation.

To Reproduce
Steps to reproduce the behavior:

  1. Run everything in the quickstart instructions for the predest model (correcting typos and directory specifications along the way)
  2. Use the WMT 2017 train and test data (in the specified directories)
  3. Run kiwi evaluate --config experiments/evaluate_estimator.yaml
  4. See error "path must exist: data/WMT17/word_level/test.tags"

Expected behavior
First, I expected the evaluation to run. Second, I expected to find the WMT 2017 word_level/test.tags file, but it was not in the download from the WMT test website.

Screenshots
$ kiwi evaluate --config experiments/evaluate_estimator.yaml
usage: kiwi evaluate [--config CONFIG] [--save-config SAVE_CONFIG] [-d] [-q] [--type {probs,tags}]
[--format {wmt17,wmt18}] [--pred-format {wmt17,wmt18}] [--sents-avg {probs,tags}]
[--gold-sents GOLD_SENTS] [--gold-target GOLD_TARGET] [--gold-source GOLD_SOURCE]
[--gold-cal GOLD_CAL] [--input-dir INPUT_DIR [INPUT_DIR ...]]
[--pred-sents PRED_SENTS [PRED_SENTS ...]] [--pred-target PRED_TARGET [PRED_TARGET ...]]
[--pred-gaps PRED_GAPS [PRED_GAPS ...]] [--pred-source PRED_SOURCE [PRED_SOURCE ...]]
[--pred-cal PRED_CAL]
kiwi evaluate: error: argument --gold-target: path must exist: data/WMT17/word_level/test.tags

The error is correct in that the file does not exist. I don't know where to find this file.
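
For anyone hitting the same error, a small pre-flight check can confirm which input files are absent before `kiwi evaluate` is invoked. This is only a sketch using the Python standard library, not part of OpenKiwi; the helper name `missing_paths` is made up for illustration, and the only path taken from this report is `data/WMT17/word_level/test.tags`.

```python
from pathlib import Path


def missing_paths(paths):
    """Return, as strings, the given paths that do not exist on disk."""
    return [str(p) for p in map(Path, paths) if not p.exists()]


if __name__ == "__main__":
    # Only test.tags is named in the error above; add any other files
    # your own evaluation config points at (assumption: paths are
    # relative to the directory you run kiwi from).
    required = ["data/WMT17/word_level/test.tags"]
    for path in missing_paths(required):
        print(f"missing: {path}")
```

Running it from the same working directory as the `kiwi evaluate` command should list the same file the argument parser complains about.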

Environment (please complete the following information):

  • OS: Linux
  • OpenKiwi version 0.1.1
  • Python version 3.6.5

Additional context
The 2018 test data doesn't have a .tags file either.

@ninalopatina ninalopatina added the bug Something isn't working label Apr 22, 2019
@captainvera
Contributor

captainvera commented Apr 23, 2019

Hi @ninalopatina, thanks for experimenting with OpenKiwi!

The WMT 2017 test data is available on their website: here (there is a purple link with gold-standard labels)

We should probably also include a link to these in the Quickstart document to make them more visible.

On the other hand, the test tags for 2018 are not public (because they are the same as the 2019 ones, and 2019 is currently accepting submissions).

I'm closing this issue as it is solved and not really a bug with OpenKiwi. Feel free to re-open if you have any other questions!

Edit: I had wrongly stated that the 2017 gold files were also not available.

@ninalopatina
Author

ninalopatina commented Apr 23, 2019

Thanks for looking into this so quickly, @captainvera. I had attempted to run this with the test data for 2017 & 2018, which I had obtained from the same site you linked. For both years, the test data includes only a .mt, a .src, and an .align file. There is no .tags file for either year, nor for 2016. Should I replace the test set links with the dev set, to have a .tags file to evaluate with?

@captainvera
Contributor

Hey @ninalopatina! The .tags file is downloaded from a different location than the other test files. It's pointed out here:

[image]

You can download the .tags from there and evaluate your model! You could also replace it with the dev set, but (if you trained using the dev set as validation) that would just give you your validation scores, which you should already be familiar with, and not a "real" evaluation.
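
Since the gold .tags file comes from a different download than the .mt file, it may be worth checking that the two line up before evaluating. The sketch below is not OpenKiwi code; it assumes the WMT17-style layout of one whitespace-separated tag per whitespace-separated MT token, one sentence per line, which may not hold for other formats (for example ones that also tag gaps). The function name `check_tags_alignment` is hypothetical.

```python
def check_tags_alignment(mt_lines, tag_lines):
    """Return (line_number, n_tokens, n_tags) for every sentence whose
    token count differs from its tag count.

    Assumes one tag per whitespace-separated MT token (an assumption
    about the file layout, not something OpenKiwi guarantees).
    """
    if len(mt_lines) != len(tag_lines):
        raise ValueError(
            f"{len(mt_lines)} MT sentences but {len(tag_lines)} tag lines"
        )
    mismatches = []
    for number, (mt, tags) in enumerate(zip(mt_lines, tag_lines), start=1):
        n_tokens = len(mt.split())
        n_tags = len(tags.split())
        if n_tokens != n_tags:
            mismatches.append((number, n_tokens, n_tags))
    return mismatches
```

To use it on real files, read each one with something like `Path("data/WMT17/word_level/test.tags").read_text(encoding="utf-8").splitlines()` and pass the two lists in; the location of the matching .mt file is whatever your own setup uses. An empty result means every sentence has as many tags as tokens.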

@ninalopatina
Author

Thanks so much, @captainvera, I missed those links! I was thinking of testing out the pipeline with the dev data until the test data becomes available, but this will work much better.
