[QUESTION] What is the data format for QE ranker model? #54

taquynhnga2001 · 2022-02-04T06:52:37Z

I want to train my own metric: a referenceless ranker model. According to https://github.com/Unbabel/COMET/blob/0.1.0/docs/source/training.md, the training data is a CSV file with src, mt, ref, and score columns. A ranker model needs a positive and a negative hypothesis to train on and, as I understand it, does not need a score. Is the data format the same for training a ranker model?

ricardorei · 2022-02-04T16:15:46Z

Hi @taquynhnga2001, yes, the ranker needs a different format: you need the daRR (DA relative ranks) data.

You can find that data in this issue: #36

WMT 17 to 19 (this includes relative ranks and DA scores):

wget https://unbabel-experimental-data-sets.s3-eu-west-1.amazonaws.com/comet/da/wmt-metrics.zip

and 2020 DA Relative-Ranks:

wget https://unbabel-experimental-data-sets.s3.eu-west-1.amazonaws.com/wmt/2020-daRR.csv.tar.gz
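
To illustrate how ranker data differs from the src, mt, ref, score format, here is a minimal sketch that turns DA-style rows into relative-rank pairs. It does not describe how the daRR files above were actually built. The output column names `pos` and `neg`, the function name `build_relative_ranks`, and the `min_gap` threshold are all assumptions made for illustration; check the header of the downloaded files for the real column names.

```python
from collections import defaultdict
from itertools import combinations


def build_relative_ranks(rows, min_gap=0.0):
    """Turn DA-style rows (src, mt, ref, score) into better/worse pairs.

    The output keys "pos" and "neg" are hypothetical names chosen for this
    sketch; they are not confirmed by the thread.
    """
    # Group all translations of the same source segment together.
    by_segment = defaultdict(list)
    for row in rows:
        by_segment[(row["src"], row["ref"])].append(row)

    pairs = []
    for (src, ref), group in by_segment.items():
        # Compare every pair of translations of the same segment.
        for a, b in combinations(group, 2):
            gap = float(a["score"]) - float(b["score"])
            # Skip ties and pairs whose scores are too close to rank reliably.
            if abs(gap) <= min_gap:
                continue
            better, worse = (a, b) if gap > 0 else (b, a)
            pairs.append(
                {"src": src, "pos": better["mt"], "neg": worse["mt"], "ref": ref}
            )
    return pairs
```

Each output row carries a better and a worse hypothesis for the same source and no score, which matches the pairwise setup described in the question. For a referenceless model the `ref` field could simply be ignored.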

taquynhnga2001 added the question label Feb 4, 2022

ricardorei closed this as completed Feb 4, 2022
