# Terrier Learning to Rank Examples

This notebook demonstrates how to use PyTerrier for learning to rank.

## Preparation

Let's install PyTerrier, as usual.

In [1]:
#!pip install python-terrier
!pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier

Collecting python-terrier
  Cloning https://github.com/terrier-org/pyterrier.git to /tmp/pip-install-dipseer9/python-terrier
  Running command git clone -q https://github.com/terrier-org/pyterrier.git /tmp/pip-install-dipseer9/python-terrier
Collecting pyjnius~=1.3.0
  Downloading https://files.pythonhosted.org/packages/d8/50/098cb5fb76fb7c7d99d403226a2a63dcbfb5c129b71b7d0f5200b05de1f0/pyjnius-1.3.0-cp36-cp36m-manylinux2010_x86_64.whl (1.1MB)
Collecting wget
  Downloading https://files.pythonhosted.org/packages/47/6a/62e288da7bcda82b935ff0c6cfe542970f04e29c756b0e147251b2fb251f/wget-3.2.zip
Collecting pytrec_eval
  Downloading https://files.pythonhosted.org/packages/36/0a/5809ba805e62c98f81e19d6007132712945c78e7612c11f61bac76a25ba3/pytrec_eval-0.4.tar.gz
Collecting matchpy
  Downloading https://files.pythonhosted.org/packages/47/95/d265b944ce391bb2fa9982d7506bbb197bb55c5088ea74448a5ffcaeefab/matchpy-0.5.1-py3-none-any

## Init 

You must run `pt.init()` before using any other PyTerrier functions and classes.

Arguments:    
- `version` - terrier IR version e.g. "5.2"    
- `mem` - megabytes allocated to java e.g. 4096

In [2]:
import numpy as np
import pandas as pd
import pyterrier as pt
if not pt.started():
  pt.init()

terrier-assemblies 5.2  jar-with-dependencies not found, downloading to /root/.pyterrier...
Done
terrier-python-helper 0.0.3  jar not found, downloading to /root/.pyterrier...
Done


## Load Files and Index

Again, let's focus on the small Vaswani test collection. It's easily accessible via the dataset API.

In [3]:
dataset = pt.datasets.get_dataset("vaswani")

indexref = dataset.get_index()
topics = dataset.get_topics()
qrels = dataset.get_qrels()

Downloading vaswani index to /root/.pyterrier/corpora/vaswani/index
Downloading vaswani topics to /root/.pyterrier/corpora/vaswani/query-text.trec
Downloading vaswani qrels to /root/.pyterrier/corpora/vaswani/qrels


## Multi-stage Retrieval

In this experiment, we re-rank the results obtained from a BM25 ranking by adding more features. We then pass these features to a regression technique, such as Random Forests, for re-ranking.

Conceptually, this pipeline has three stages:
1. BM25 ranking
2. Re-rank by each of the features ("TF_IDF" and "PL2")
3. Apply the Random Forests



In [0]:
#this ranker will make the candidate set of documents for each query
BM25 = pt.BatchRetrieve(indexref, controls = {"wmodel": "BM25"})

#these rankers we will use to re-rank the BM25 results
TF_IDF =  pt.BatchRetrieve(indexref, controls = {"wmodel": "TF_IDF"})
PL2 =  pt.BatchRetrieve(indexref, controls = {"wmodel": "PL2"})

OK, so how do we combine these?

In [0]:
pipe = BM25 >> (TF_IDF ** PL2)

Here, we are using two PyTerrier operators:
 - `>>` means "then", and takes the output documents of BM25 and puts them into the next stage. This means that TF_IDF and PL2 are ONLY applied on the documents that BM25 has identified.
 - `**` means feature-union - which makes each ranker into a feature in the `features` column of the results.
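
To make the two operators more concrete, here is a small library-free sketch of the idea: a first stage picks the candidates, and every feature function is then applied only to those candidates. The documents, function names and scoring rules are all hypothetical toy stand-ins, not PyTerrier code or Terrier weighting models.

```python
# Toy illustration of "then" (>>) and feature union (**).
# Everything here is hypothetical; it only mimics the shape of the data flow.

DOCS = {
    "d1": "chemical reactions in liquid",
    "d2": "chemical chemical analysis",
    "d3": "microwave techniques",
}

def first_stage(query, k=2):
    """Candidate generation: rank matching docs by raw term count, keep the top k."""
    scored = [(doc_id, text.split().count(query)) for doc_id, text in DOCS.items()]
    matched = [(doc_id, score) for doc_id, score in scored if score > 0]
    return sorted(matched, key=lambda pair: -pair[1])[:k]

def feature_a(query, doc_id):
    """Hypothetical feature: term count normalised by document length."""
    words = DOCS[doc_id].split()
    return words.count(query) / len(words)

def feature_b(query, doc_id):
    """Hypothetical feature: 1.0 if the document starts with the query term."""
    return 1.0 if DOCS[doc_id].split()[0] == query else 0.0

def then_feature_union(query):
    """Apply every feature function, but only to the first-stage candidates."""
    rows = []
    for doc_id, score in first_stage(query):
        features = [feature_a(query, doc_id), feature_b(query, doc_id)]
        rows.append({"docno": doc_id, "score": score, "features": features})
    return rows

rows = then_feature_union("chemical")
for row in rows:
    print(row)
```

Note that `d3` never reaches the feature functions, because the first stage did not return it.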

Let's take a look at the output:

In [0]:
#this cell won't work until Terrier 5.3 has been released
#pipe.transform("chemical end:2")

Once that cell runs, the results have a "features" column containing the TF_IDF and PL2 scores for each document.

*A note about efficiency*: doing retrieval and then re-ranking the documents again can be slow. For this reason, PyTerrier provides `FeaturesBatchRetrieve`, which retrieves the candidates and computes the extra features together. Let's try this:

In [19]:
fbr = pt.FeaturesBatchRetrieve(indexref, controls = {"wmodel": "BM25"}, features=["SAMPLE", "WMODEL:TF_IDF", "WMODEL:PL2"]) 
#let's look at the top-ranked results
(fbr %2).transform("chemical")

  0%|          | 0/1 [00:00<?, ?q/s]mqt has 1 terms while fatresultset has 1
Using all of 
Term: chemic$[firstmatchscore] qtw=1.0 es=term2659 Nt=20 TF=21 maxTF=2147483647 @{0 38530 6} scored for wm TF_IDF
mqt has 1 terms while fatresultset has 1
Using all of 
Term: chemic$[firstmatchscore] qtw=1.0 es=term2659 Nt=20 TF=21 maxTF=2147483647 @{0 38530 6} scored for wm PL2c1.0
100%|██████████| 1/1 [00:00<00:00, 42.02q/s]


| | qid | docid | rank | docno | score | features |
|---|---|---|---|---|---|---|
| 0 | 1 | 10702 | 0 | 10703 | 13.472012 | [13.472012496423268, 7.38109017620895, 6.99922... |
| 1 | 1 | 1055 | 1 | 1056 | 12.517082 | [12.517081895047532, 6.857899681644975, 6.3584... |
| 2 | 1 | 4885 | 2 | 4886 | 12.228161 | [12.22816082084599, 6.69960466053696, 6.181368... |


Aside: In this case, we also added a "SAMPLE" feature, to get the BM25 scores as a feature. That's why you see 3 features for each document.

This kind of optimisation is common in PyTerrier, so PyTerrier supports automatic pipeline optimisation through the `.compile()` method.

In [20]:
pipe_fast = pipe.compile()
(pipe_fast %2).transform("chemical")

  0%|          | 0/1 [00:00<?, ?q/s]mqt has 1 terms while fatresultset has 1
Using all of 
Term: chemic$[firstmatchscore] qtw=1.0 es=term2659 Nt=20 TF=21 maxTF=2147483647 @{0 38530 6} scored for wm TF_IDF
mqt has 1 terms while fatresultset has 1
Using all of 
Term: chemic$[firstmatchscore] qtw=1.0 es=term2659 Nt=20 TF=21 maxTF=2147483647 @{0 38530 6} scored for wm PL2c1.0
100%|██████████| 1/1 [00:00<00:00, 23.72q/s]

Applying 8 rules





| | qid | docid | rank | docno | score | features |
|---|---|---|---|---|---|---|
| 0 | 1 | 10702 | 0 | 10703 | 13.472012 | [7.38109017620895, 6.9992254918907575] |
| 1 | 1 | 1055 | 1 | 1056 | 12.517082 | [6.857899681644975, 6.358419229871986] |
| 2 | 1 | 4885 | 2 | 4886 | 12.228161 | [6.69960466053696, 6.181368165774688] |


# Learning models and re-ranking

OK, let's move on to the actual machine learning. We can use standard Python ML techniques. We will demonstrate a few here, including ones from scikit-learn and XGBoost.

In each case, the pattern is the same:
 - Create a transformer that does the re-ranking
 - Call the fit() method on the created object with the training topics (and validation topics as necessary)
 - Evaluate the results with the Experiment function by using the test topics

Firstly, let's separate our topics into train/validation/test.

In [0]:
train_topics, valid_topics, test_topics = np.split(topics, [int(.6*len(topics)), int(.8*len(topics))])
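
As a sanity check on what those cut points do, the following sketch reproduces the same 60/20/20 split on a plain list. The topic count of 93 is an assumption about the Vaswani topic set (it is consistent with the 55 and 19 query counts shown in the progress bars later on), and `topic_ids` is a hypothetical stand-in for the rows of `topics`.

```python
# Illustrative only: n_topics = 93 is an assumption about the Vaswani topic set.
n_topics = 93
topic_ids = list(range(1, n_topics + 1))  # hypothetical stand-in for the topics rows

# Same cut points as the np.split call: 60% and 80% of the way through.
cut_train = int(0.6 * n_topics)
cut_valid = int(0.8 * n_topics)

train_ids = topic_ids[:cut_train]
valid_ids = topic_ids[cut_train:cut_valid]
test_ids = topic_ids[cut_valid:]

print(len(train_ids), len(valid_ids), len(test_ids))
```

The split is sequential rather than shuffled: the first 60% of the rows become training topics, the next 20% validation topics, and the remainder test topics.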

## scikit-learn RandomForestRegressor

Our first learning-to-rank model uses scikit-learn's [RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html). 

We use `pt.pipelines.LTR_pipeline`, which is a PyTerrier transformer that passes the document features as "X" features to the Random Forest. To learn (fit) the Random Forest model, we invoke the `fit()` method on the entire pipeline, specifying the queries (topics) and relevance assessments (qrels). The latter provide the "Y" labels for the Random Forest fitting.
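
The sketch below shows, on made-up toy rows, how a feature matrix "X" and a label vector "Y" could be assembled from retrieved documents and relevance assessments. The data structures and the `build_training_data` helper are hypothetical, and treating unjudged documents as label 0 is an assumption for illustration rather than a statement about PyTerrier's internals.

```python
# Hypothetical toy data; the real pipeline works on pandas DataFrames.
results = [
    {"qid": "1", "docno": "10703", "features": [13.47, 7.38, 7.00]},
    {"qid": "1", "docno": "1056", "features": [12.52, 6.86, 6.36]},
    {"qid": "2", "docno": "4886", "features": [12.23, 6.70, 6.18]},
]
# (qid, docno) -> relevance label
qrels = {("1", "10703"): 1, ("2", "4886"): 1}

def build_training_data(results, qrels):
    """Return one feature vector and one label per retrieved document."""
    X, y = [], []
    for row in results:
        X.append(row["features"])
        # Assumption: documents without a judgement count as non-relevant (0).
        y.append(qrels.get((row["qid"], row["docno"]), 0))
    return X, y

X, y = build_training_data(results, qrels)
print(y)  # [1, 0, 1]
```

A regressor fitted on such `X` and `y` predicts a relevance score per document, and sorting each query's documents by that prediction gives the re-ranked list.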

NB: Random Forests average many trees trained on bootstrap samples, which makes them relatively robust to overfitting, so we do not provide validation data to `fit()`.

Equally, we could use any other regression learner from sklearn and adjust its parameters ourselves.

Finally, we run `Experiment()` on the test data to compare performance.
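
The experiments in this notebook report `"map"`, i.e. mean average precision. As a reminder of what that number measures, here is a minimal textbook-style sketch on toy data. The function names and inputs are hypothetical, and evaluation tools may differ in details such as which queries are included in the average.

```python
def average_precision(ranked_docnos, relevant):
    """AP for one query: mean of precision@k over the ranks k holding a relevant doc."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)

def mean_average_precision(run, judgements):
    """MAP: the average of AP over all judged queries."""
    aps = [average_precision(run.get(qid, []), rel) for qid, rel in judgements.items()]
    return sum(aps) / len(aps)

# Toy rankings and relevance judgements (made up for illustration).
run = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
judgements = {"q1": {"a", "c"}, "q2": {"y"}}

map_score = mean_average_precision(run, judgements)
print(round(map_score, 4))  # 0.6667
```

Here `q1` has AP (1/1 + 2/3) / 2 and `q2` has AP 1/2, so the mean is 2/3.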

In [22]:
from sklearn.ensemble import RandomForestRegressor

BaselineLTR = fbr >> pt.pipelines.LTR_pipeline(RandomForestRegressor(n_estimators=400))
BaselineLTR.fit(train_topics, qrels)

results = pt.pipelines.Experiment([PL2, BaselineLTR], test_topics, qrels, ["map"], names=["PL2 Baseline", "LTR Baseline"])
results

  0%|          | 0/55 [00:00<?, ?q/s]mqt has 6 terms while fatresultset has 6
Using all of 
Term: measur$[firstmatchscore] qtw=1.0 es=term70 Nt=1226 TF=1511 maxTF=2147483647 @{0 166571 4} scored for wm TF_IDF
Term: dielectr$[firstmatchscore] qtw=1.0 es=term493 Nt=232 TF=308 maxTF=2147483647 @{0 73422 7} scored for wm TF_IDF
Term: constant$[firstmatchscore] qtw=1.0 es=term487 Nt=430 TF=523 maxTF=2147483647 @{0 52178 4} scored for wm TF_IDF
Term: liquid$[firstmatchscore] qtw=1.0 es=term285 Nt=49 TF=57 maxTF=2147483647 @{0 155217 5} scored for wm TF_IDF
Term: microwav$[firstmatchscore] qtw=1.0 es=term139 Nt=376 TF=458 maxTF=2147483647 @{0 171263 1} scored for wm TF_IDF
Term: techniqu$[firstmatchscore] qtw=1.0 es=term94 Nt=410 TF=445 maxTF=2147483647 @{0 276100 1} scored for wm TF_IDF
mqt has 6 terms while fatresultset has 6
Using all of 
Term: measur$[firstmatchscore] qtw=1.0 es=term70 Nt=1226 TF=1511 maxTF=2147483647 @{0 166571 4} scored for wm PL2c1.0
Term: dielectr$[firstmatchscore] qt

| | name | map |
|---|---|---|
| 0 | PL2 Baseline | 0.206031 |
| 1 | LTR Baseline | 0.146463 |


## XGBoost Pipeline

We now demonstrate the use of a LambdaMART implementation from [XGBoost](https://xgboost.readthedocs.io/en/latest/). Again, PyTerrier provides a transformer object, namely `XGBoostLTR_pipeline`, whose constructor takes the actual XGBoost model that you want to train. We took the XGBoost configuration from [their example code](https://github.com/dmlc/xgboost/blob/master/demo/rank/rank.py).

Call the `fit()` method on the full pipeline with the training and validation topics.
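
Unlike a pointwise regressor, a ranking objective such as `rank:ndcg` compares documents within the same query, so the learner needs to know where each query's documents start and end. XGBoost's ranker accepts this as a list of per-query group sizes. The sketch below shows how such sizes can be derived from a qid column that is already sorted by query. The `qids` values are made up, and whether the pipeline computes the groups in exactly this way is an assumption.

```python
from itertools import groupby

# Hypothetical qid column of a feature table, already sorted by qid.
qids = ["1", "1", "1", "2", "2", "3"]

# One entry per query: how many consecutive rows belong to it.
group_sizes = [len(list(rows)) for _, rows in groupby(qids)]

print(group_sizes)  # [3, 2, 1]
```

The sizes must sum to the number of rows in the feature matrix, and the rows must stay in the same order as the groups.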

Evaluate the results with the `Experiment` function, using the test topics.

In [24]:
import xgboost as xgb
params = {'objective': 'rank:ndcg', 
          'learning_rate': 0.1, 
          'gamma': 1.0, 'min_child_weight': 0.1,
          'max_depth': 6,
          'verbose': 2,
          'random_state': 42 
         }

BaseLTR_LM = fbr >> pt.pipelines.XGBoostLTR_pipeline(xgb.sklearn.XGBRanker(**params))
BaseLTR_LM.fit(train_topics, qrels, valid_topics, qrels)

  0%|          | 0/55 [00:00<?, ?q/s]mqt has 6 terms while fatresultset has 6
Using all of 
Term: measur$[firstmatchscore] qtw=1.0 es=term70 Nt=1226 TF=1511 maxTF=2147483647 @{0 166571 4} scored for wm TF_IDF
Term: dielectr$[firstmatchscore] qtw=1.0 es=term493 Nt=232 TF=308 maxTF=2147483647 @{0 73422 7} scored for wm TF_IDF
Term: constant$[firstmatchscore] qtw=1.0 es=term487 Nt=430 TF=523 maxTF=2147483647 @{0 52178 4} scored for wm TF_IDF
Term: liquid$[firstmatchscore] qtw=1.0 es=term285 Nt=49 TF=57 maxTF=2147483647 @{0 155217 5} scored for wm TF_IDF
Term: microwav$[firstmatchscore] qtw=1.0 es=term139 Nt=376 TF=458 maxTF=2147483647 @{0 171263 1} scored for wm TF_IDF
Term: techniqu$[firstmatchscore] qtw=1.0 es=term94 Nt=410 TF=445 maxTF=2147483647 @{0 276100 1} scored for wm TF_IDF
mqt has 6 terms while fatresultset has 6
Using all of 
Term: measur$[firstmatchscore] qtw=1.0 es=term70 Nt=1226 TF=1511 maxTF=2147483647 @{0 166571 4} scored for wm PL2c1.0
Term: dielectr$[firstmatchscore] qt

And evaluate the results.

In [0]:
allresultsLM = pt.pipelines.Experiment([PL2, BaseLTR_LM],
                                test_topics,                                  
                                qrels, ["map"], 
                                names=["PL2 Baseline", "LambdaMART"])
allresultsLM

  0%|          | 0/19 [00:00<?, ?q/s]mqt has 3 terms while fatresultset has 2
Using all of 
Term: linear$[firstmatchscore] qtw=1.0 es=term12 Nt=398 TF=471 maxTF=2147483647 @{0 154639 6} scored for wm PL2c1.0
Term: network$[firstmatchscore] qtw=1.0 es=term227 Nt=607 TF=999 maxTF=2147483647 @{0 180057 5} scored for wm PL2c1.0
mqt has 3 terms while fatresultset has 2
Using all of 
Term: linear$[firstmatchscore] qtw=1.0 es=term12 Nt=398 TF=471 maxTF=2147483647 @{0 154639 6} scored for wm TF_IDF
Term: network$[firstmatchscore] qtw=1.0 es=term227 Nt=607 TF=999 maxTF=2147483647 @{0 180057 5} scored for wm TF_IDF
mqt has 4 terms while fatresultset has 4
Using all of 
Term: transistor$[firstmatchscore] qtw=1.0 es=term54 Nt=640 TF=1045 maxTF=2147483647 @{0 286138 4} scored for wm PL2c1.0
Term: phase$[firstmatchscore] qtw=1.0 es=term274 Nt=509 TF=688 maxTF=2147483647 @{0 201355 1} scored for wm PL2c1.0
Term: split$[firstmatchscore] qtw=1.0 es=term1325 Nt=47 TF=55 maxTF=2147483647 @{0 260640 1} sc

| | name | map |
|---|---|---|
| 0 | PL2 Baseline | 0.206031 |
| 1 | LambdaMART | 0.19743 |
