# Assignment 1. Learning to Rank

Download the MQ2007 dataset from [here](https://1drv.ms/f/s!Aqi9ONgj3OqPaynoZZSZVfHPJd0).  
For this assignment we will **use only Fold1**.  

Data split: train, validation and test datasets are given.  
**You have to use hold-out validation to justify your hyperparameter selection.**  
**In each task you need to surpass the baseline.**

Quality metric: NDCG@10
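
As a reference point, NDCG@10 for a single query could be computed roughly as below. This is a minimal sketch, not the official evaluation script: the function names are hypothetical, and the exponential gain $2^{rel} - 1$, the $\log_2$ discount, and the choice to score queries without relevant documents as 0 are all assumptions that should be stated in your report.

```python
import numpy as np

def dcg_at_k(relevances, k=10):
    """DCG@k for relevance labels listed in ranked order (best-ranked first)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    # Assumed convention: gain 2^rel - 1, discount log2(position + 1).
    gains = 2.0 ** rel - 1.0
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(gains / discounts))

def ndcg_at_k(true_relevance, scores, k=10):
    """NDCG@k for one query: documents are ranked by descending score."""
    true_relevance = np.asarray(true_relevance, dtype=float)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ideal = dcg_at_k(np.sort(true_relevance)[::-1], k)
    # Assumed convention: a query with no relevant documents scores 0.
    if ideal == 0.0:
        return 0.0
    return dcg_at_k(true_relevance[order], k) / ideal
```

The dataset-level metric would then be the mean of `ndcg_at_k` over all queries in the split.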

Data description:  
*relevance* - relevance label; the higher, the better  
*qid* - query id  
*docid* - document id  
*features* - 46-dim human-engineered feature vector   



In [4]:
import pandas as pd
import numpy as np

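def preprocess(filename):
    """Parse a LETOR-format file into a DataFrame with one row per query-document pair."""
    records = []
    with open(filename, 'r') as f:
        for line in f:
            # Line layout: <relevance> qid:<id> 1:<v1> ... 46:<v46> #docid = <docid> ...
            items = line.split()
            rec = {}
            rec['relevance'] = int(items[0])
            rec['qid'] = int(items[1].split(':')[1])
            rec['features'] = [float(items[i].split(':')[1]) for i in range(2, 2 + 46)]
            rec['docid'] = items[50]  # items[48] is '#docid', items[49] is '='
            records.append(rec)
    return pd.DataFrame(records)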
    

df_train = preprocess('MQ2007/Fold1/train.txt')
df_valid = preprocess('MQ2007/Fold1/vali.txt')
df_test = preprocess('MQ2007/Fold1/test.txt')

In [5]:
df_train.head()

|   | relevance | qid | features | docid |
|---|-----------|-----|----------|-------|
| 0 | 0 | 10 | [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ... | GX000-00-0000000 |
| 1 | 1 | 10 | [0.03131, 0.666667, 0.5, 0.166667, 0.033206, 0... | GX000-24-12369390 |
| 2 | 1 | 10 | [0.078682, 0.166667, 0.5, 0.333333, 0.080022, ... | GX000-62-7863450 |
| 3 | 1 | 10 | [0.019058, 1.0, 1.0, 0.5, 0.022591, 0.0, 0.0, ... | GX016-48-5543459 |
| 4 | 0 | 10 | [0.039477, 0.0, 0.75, 0.166667, 0.040555, 0.0,... | GX037-87-3082362 |


## Task 1 (20 points)
Train a **linear regression model** to predict the relevance score for query-document pairs. Use the scikit-learn library.  
Using the trained model, rank the documents for each query by predicted relevance score.  

baseline: NDCG@10 == 0.05
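
The ranking step described above (sorting the documents of each query by predicted score) might look like the following sketch. The helper name `rank_per_query` and the toy scores are made up for illustration; in the assignment the scores would come from the fitted model's `predict` output.

```python
import numpy as np

def rank_per_query(qids, scores):
    """Return {qid: row indices of that query, sorted by descending score}."""
    qids = np.asarray(qids)
    scores = np.asarray(scores, dtype=float)
    rankings = {}
    for qid in np.unique(qids):
        rows = np.flatnonzero(qids == qid)           # rows belonging to this query
        order = np.argsort(-scores[rows], kind="stable")
        rankings[int(qid)] = rows[order]             # best-scored document first
    return rankings

# Toy example with made-up scores for two queries.
qids = [10, 10, 10, 11, 11]
scores = [0.2, 0.9, 0.5, 0.1, 0.4]
rankings = rank_per_query(qids, scores)
```

Documents are never compared across queries: the sort is done independently inside each `qid` group.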

Answer the questions:  
1. Which loss function did you choose and why?  
1. Did you apply any transformation to the relevance score, and if so, why?  
    
Use your findings in the next tasks.

## Task 2 (20 points)

Implement and train a **RankNet model** to rank the documents in each query. Use PyTorch.  
For the base model $f(x)$, use a multilayer perceptron.  
Note that during training you have to consider every $(doc_i, doc_j)$ pair within a query.
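
To make the "every pair in the query" requirement concrete, here is a NumPy sketch of the pairwise logistic loss for one query. It only illustrates the computation; the assignment itself asks for PyTorch, where the same broadcasting works on tensors. The function name is hypothetical, and restricting the loss to pairs where $doc_i$ is strictly more relevant than $doc_j$ is an assumption (the original formulation also allows tied pairs with target 0.5).

```python
import numpy as np

def ranknet_loss(scores, relevance):
    """Mean pairwise logistic loss over all pairs (i, j) with rel_i > rel_j."""
    scores = np.asarray(scores, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    # diff[i, j] = s_i - s_j for every ordered pair in the query.
    diff = scores[:, None] - scores[None, :]
    # Assumption: only pairs where i is strictly more relevant than j contribute.
    mask = relevance[:, None] > relevance[None, :]
    if not mask.any():
        return 0.0  # no informative pairs in this query
    # -log(sigmoid(s_i - s_j)) = log(1 + exp(-(s_i - s_j))), computed stably.
    return float(np.logaddexp(0.0, -diff[mask]).mean())
```

For a query with $n$ documents the score-difference matrix has $n^2$ entries, so processing one query per batch keeps memory use predictable.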

Answer the questions: 
1. Does the result depend on transformations of the relevance score?

baseline: NDCG@10 == 0.1  

## Task 3 (20 points)

Implement and train a **ListNet model** to rank the documents in each query. Use PyTorch.  
For the base model $\phi(x)$, use a multilayer perceptron.
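
For orientation, the top-one-probability variant of the ListNet loss for a single query can be sketched as below. This is a NumPy illustration under assumptions, not the required PyTorch implementation: the function names are hypothetical, and using a plain softmax over the raw relevance labels as the target distribution is one possible choice among several.

```python
import numpy as np

def log_softmax(x):
    """Numerically stable log-softmax of a 1-D array."""
    x = np.asarray(x, dtype=float)
    shifted = x - x.max()
    return shifted - np.log(np.sum(np.exp(shifted)))

def listnet_top1_loss(scores, relevance):
    """Cross-entropy between top-one distributions of labels and scores."""
    # Assumption: target distribution is softmax over raw relevance labels.
    target = np.exp(log_softmax(relevance))
    return float(-np.sum(target * log_softmax(scores)))
```

Because the target depends on the scale of the labels, any transformation of the relevance score changes the target distribution directly.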

baseline: NDCG@10 == 0.15 

Answer the questions:   
1. Does the result depend on transformations of the relevance score?
1. Which model performs better, RankNet or ListNet, and why?  
1. Does your answer agree with the results in the paper? If it does not, what are the reasons?  

## Task 4 (20 points)

Implement and train a **LambdaRank model** to rank the documents in each query, optimizing NDCG. Use PyTorch.  
For the base model $\phi(x)$, use a multilayer perceptron.  
Note that during training you have to consider every $(doc_i, doc_j)$ pair within a query.  
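
The per-document lambdas for one query could be computed along the following lines. This NumPy sketch is illustrative only and makes several assumptions: the function name and the `k` and `sigma` parameters are hypothetical, gains use $2^{rel} - 1$, NDCG is truncated at `k`, and the sign is chosen so that a positive lambda means the document's score should increase.

```python
import numpy as np

def lambdarank_lambdas(scores, relevance, k=10, sigma=1.0):
    """Per-document lambdas for one query (positive = push the score up)."""
    scores = np.asarray(scores, dtype=float)
    relevance = np.asarray(relevance, dtype=float)
    n = scores.size
    # Current 0-based rank position of every document.
    order = np.argsort(-scores, kind="stable")
    position = np.empty(n, dtype=int)
    position[order] = np.arange(n)
    # Positions beyond the cutoff k contribute nothing to NDCG@k.
    discount = np.where(position < k, 1.0 / np.log2(position + 2.0), 0.0)
    gain = 2.0 ** relevance - 1.0
    top = np.sort(gain)[::-1][:k]
    ideal_dcg = np.sum(top / np.log2(np.arange(2, top.size + 2)))
    if ideal_dcg == 0.0:
        return np.zeros(n)  # no relevant documents, nothing to optimize
    # |delta NDCG| when documents i and j swap their positions.
    delta_ndcg = np.abs((gain[:, None] - gain[None, :])
                        * (discount[:, None] - discount[None, :])) / ideal_dcg
    diff = scores[:, None] - scores[None, :]
    wins = relevance[:, None] > relevance[None, :]
    # sigma / (1 + exp(sigma * (s_i - s_j))), computed stably.
    strength = sigma * np.exp(-np.logaddexp(0.0, sigma * diff))
    lam = np.where(wins, strength * delta_ndcg, 0.0)
    # Each pair pushes the more relevant document up and the other one down.
    return lam.sum(axis=1) - lam.sum(axis=0)
```

In PyTorch, one way to use such values is to pass their negation as the `gradient` argument of `scores.backward(...)`, so that a minimizing optimizer moves the scores in the direction of the lambdas.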

baseline: NDCG@10 == 0.15 

Answer the questions: 
1. Which model performs better, LambdaRank or ListNet, and why? 
1. Does your answer agree with the results in the paper? If it does not, what are the reasons?

## Task 5

Suppose we change our quality metric from NDCG to the Kendall rank correlation coefficient (Kendall tau).  
Derive and write down the new $\lambda'_{ij}$ for this metric for the LambdaRank model.  
Train a LambdaRank model to rank the documents in each query, optimizing Kendall tau.  
Evaluate Kendall tau for the model from Task 4.  
Compare the performance of the new model with that of the model from Task 4. 
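
For the evaluation part, Kendall tau between predicted scores and relevance labels for one query can be sketched as follows. The function name is hypothetical, and the tie handling is an assumption: this is the tau-a variant, in which tied pairs count as zero but stay in the denominator, so with many tied labels it cannot reach 1. `scipy.stats.kendalltau` computes the tie-corrected tau-b by default and is a reasonable alternative.

```python
import numpy as np

def kendall_tau_a(scores, relevance):
    """Kendall tau-a for one query: (concordant - discordant) / all pairs."""
    s = np.asarray(scores, dtype=float)
    r = np.asarray(relevance, dtype=float)
    n = s.size
    if n < 2:
        return 0.0  # assumed convention for queries with a single document
    upper = np.triu_indices(n, k=1)  # each unordered pair exactly once
    score_sign = np.sign(s[:, None] - s[None, :])[upper]
    label_sign = np.sign(r[:, None] - r[None, :])[upper]
    # +1 for concordant pairs, -1 for discordant pairs, 0 for ties.
    return float(np.sum(score_sign * label_sign) / score_sign.size)
```

As with NDCG, the dataset-level value would be the mean over queries.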