[QUESTION] What is the data format for QE ranker model? #54

Closed
taquynhnga2001 opened this issue Feb 4, 2022 · 1 comment
Labels
question Further information is requested

Comments

@taquynhnga2001

I want to train my own metric: a referenceless ranker model. According to https://github.com/Unbabel/COMET/blob/0.1.0/docs/source/training.md, the training data is a CSV file with src, mt, ref and score columns. A ranker model, however, needs a positive (pos) and a negative (neg) hypothesis to train on and, as I understand it, does not need a score. Is the data format the same for training a ranker model?

@taquynhnga2001 taquynhnga2001 added the question Further information is requested label Feb 4, 2022
@ricardorei
Collaborator

Hi @taquynhnga2001, yes, the ranker needs a different format: you need the daRR (DA relative ranks) data.

You can find that data in this issue: #36

WMT 17 -> 19 (this archive includes both relative ranks and DA scores):

wget https://unbabel-experimental-data-sets.s3-eu-west-1.amazonaws.com/comet/da/wmt-metrics.zip

and 2020 DA Relative-Ranks:

wget https://unbabel-experimental-data-sets.s3.eu-west-1.amazonaws.com/wmt/2020-daRR.csv.tar.gz
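
For anyone who only has DA-scored data, here is a minimal sketch of how relative-rank pairs could be derived from it. It is not taken from the COMET code base. The output column names (`src`, `pos`, `neg`), the helper name `da_to_relative_ranks`, and the 25-point minimum score gap are all assumptions made for illustration, so check the header of the downloaded daRR files for the exact columns the ranker expects.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def da_to_relative_ranks(rows, min_gap=25.0):
    """Turn DA-scored rows (src, mt, score) into (src, pos, neg) pairs.

    Hypothetical helper: two translations of the same source form a pair
    only when their scores differ by at least `min_gap` (assumed value).
    """
    by_src = defaultdict(list)
    for row in rows:
        by_src[row["src"]].append((row["mt"], float(row["score"])))

    pairs = []
    for src, hyps in by_src.items():
        for (mt_a, s_a), (mt_b, s_b) in combinations(hyps, 2):
            if abs(s_a - s_b) < min_gap:
                continue  # scores too close to call one translation better
            pos, neg = (mt_a, mt_b) if s_a > s_b else (mt_b, mt_a)
            pairs.append({"src": src, "pos": pos, "neg": neg})
    return pairs


# Toy DA data, invented for this example.
da_rows = [
    {"src": "Guten Morgen", "mt": "Good morning", "score": "90"},
    {"src": "Guten Morgen", "mt": "Good tomorrow", "score": "40"},
    {"src": "Guten Morgen", "mt": "Nice morning", "score": "80"},
]
pairs = da_to_relative_ranks(da_rows)

# Serialize the pairs as CSV text (assumed column order: src, pos, neg).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["src", "pos", "neg"])
writer.writeheader()
writer.writerows(pairs)
csv_text = buffer.getvalue()
```

Since the model in question is referenceless, the sketch leaves out a `ref` column; a reference-based ranker would presumably carry one alongside each pair.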
