# Terrier Learning to Rank Examples

This notebook demonstrates how to use PyTerrier for learning to rank.

## Preparation

Let's install PyTerrier, as usual.

In [1]:
%pip install -q python-terrier

## Init

You must run `pt.init()` before other PyTerrier functions and classes.

`pt.init()` takes arguments such as:    
- `version` - Terrier version e.g. "5.2"    
- `mem` - megabytes allocated to JVM e.g. 4096

See also: https://pyterrier.readthedocs.io/en/latest/installation.html

In [2]:
import numpy as np
import pandas as pd
import pyterrier as pt
if not pt.started():
  pt.init()

PyTerrier 0.10.0 has loaded Terrier 5.8 (built by craigm on 2023-11-01 18:05) and terrier-helper 0.0.8



## Load Files and Index

Again, let's focus on the small Vaswani test collection. It's easily accessible via the dataset API.

In [3]:
dataset = pt.datasets.get_dataset("vaswani")

indexref = dataset.get_index()
topics = dataset.get_topics()
qrels = dataset.get_qrels()

## Multi-stage Retrieval

In this experiment, we re-rank the results obtained from a BM25 ranking by adding more features. We then pass these features to a regression technique, such as Random Forests, for re-ranking.

Conceptually, this pipeline has three stages:
1. BM25 ranking
2. Re-rank by each of the features ("TF_IDF" and "PL2")
3. Apply the Random Forest



In [4]:
#this ranker will make the candidate set of documents for each query
BM25 = pt.BatchRetrieve(indexref, wmodel="BM25")

#these rankers we will use to re-rank the BM25 results
TF_IDF =  pt.BatchRetrieve(indexref, wmodel="TF_IDF")
PL2 =  pt.BatchRetrieve(indexref, wmodel="PL2")

OK, so how do we combine these?

In [5]:
pipe = BM25 >> (TF_IDF ** PL2)

Here, we are using two PyTerrier operators:
 - `>>` means "then": it takes the output documents of BM25 and passes them to the next stage. This means that TF_IDF and PL2 are ONLY applied to the documents that BM25 has identified.
 - `**` means feature-union: it turns each ranker into a feature in the `features` column of the results.
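To make the two operators more concrete, here is a minimal conceptual sketch in plain Python. It is not how PyTerrier implements them: the helper names (`then`, `feature_union`), the toy collection and the toy scoring functions are all assumptions made up for illustration.

```python
# Conceptual sketch only: plain functions over lists of dicts, not PyTerrier's classes.

def then(first, second):
    """Mimic `first >> second`: feed the output of `first` into `second`."""
    return lambda query: second(query, first(query))

def feature_union(*scorers):
    """Mimic `a ** b`: score every candidate with every scorer and collect the
    scores in a `features` list, keeping the candidate's original score and order."""
    def apply(query, candidates):
        return [dict(doc, features=[s(query, doc) for s in scorers]) for doc in candidates]
    return apply

# Hypothetical toy collection and scoring functions (assumptions for illustration).
DOCS = {"d1": "chemical reactions in chemical plants",
        "d2": "chemical engineering",
        "d3": "radio waves"}

def term_count(query, doc):
    """Toy feature: how often the query term occurs in the document."""
    return float(DOCS[doc["docno"]].split().count(query))

def length_normalised(query, doc):
    """Toy feature: term count divided by document length."""
    words = DOCS[doc["docno"]].split()
    return words.count(query) / len(words)

def first_stage(query):
    """Candidate generation: keep only documents containing the query term."""
    hits = [{"docno": d, "score": float(text.split().count(query))}
            for d, text in DOCS.items() if query in text.split()]
    return sorted(hits, key=lambda h: -h["score"])

toy_pipe = then(first_stage, feature_union(term_count, length_normalised))
results = toy_pipe("chemical")
```

Note that `d3` never reaches the feature scorers, because the first stage did not retrieve it.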

Let's take a look at the output:

In [6]:
pipe.search("chemical").head(2)

|   | qid | docid | docno | rank | score | query | features |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10702 | 10703 | 0 | 13.472012 | chemical | [7.38109017620895, 6.9992254918907575] |
| 1 | 1 | 1055 | 1056 | 1 | 12.517082 | chemical | [6.857899681644975, 6.358419229871986] |


See, we now have a "features" column with numbers representing the TF_IDF and PL2 feature scores.

*A note about efficiency*: doing retrieval and then re-ranking the documents again can be slow. For this reason, PyTerrier provides `FeaturesBatchRetrieve`, which computes the features alongside the initial retrieval. Let's try this:

In [7]:
fbr = pt.FeaturesBatchRetrieve(indexref, wmodel="BM25", features=["WMODEL:TF_IDF", "WMODEL:PL2"])
# let's look at the top 2 results
(fbr %2).search("chemical")

|   | qid | query | docid | rank | features | docno | score |
|---|---|---|---|---|---|---|---|
| 0 | 1 | chemical | 10702 | 0 | [1.9972714735280614, 1.590216305943686] | 10703 | 13.472012 |
| 1 | 1 | chemical | 1055 | 1 | [2.5168371014881425, 2.1297038460724336] | 1056 | 12.517082 |


This kind of optimisation is common in PyTerrier, so PyTerrier supports automatic pipeline optimisation via the `.compile()` function.

In [8]:
pipe_fast = pipe.compile()
(pipe_fast %2).search("chemical")

Applying 8 rules


|   | qid | docid | docno | rank | score | query | features |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10702 | 10703 | 0 | 13.472012 | chemical | [7.38109017620895, 6.9992254918907575] |
| 1 | 1 | 1055 | 1056 | 1 | 12.517082 | chemical | [6.857899681644975, 6.358419229871986] |


Finally, we often want the initial retrieval score to be a feature as well. We can do this in one of two ways:
 - by adding a `SAMPLE` feature to FeaturesBatchRetrieve
 - or in the original feature-union definition, including an identity Transformer

In [9]:
fbr = pt.FeaturesBatchRetrieve(indexref, wmodel="BM25", features=["SAMPLE", "WMODEL:TF_IDF", "WMODEL:PL2"])
pipe3f = BM25 >> (pt.Transformer.identity() ** TF_IDF ** PL2)

(pipe3f %2).search("chemical")

|   | qid | docid | docno | rank | score | query | features |
|---|---|---|---|---|---|---|---|
| 0 | 1 | 10702 | 10703 | 0 | 13.472012 | chemical | [13.472012496423268, 7.38109017620895, 6.99922... |
| 1 | 1 | 1055 | 1056 | 1 | 12.517082 | chemical | [12.517081895047532, 6.857899681644975, 6.3584... |


# Learning models and re-ranking

OK, let's get on to the actual machine learning. We can use standard Python ML techniques. We will demonstrate a few here, including ones from scikit-learn and XGBoost.

In each case, the pattern is the same:
 - Create a transformer that does the re-ranking
 - Call the fit() method on the created object with the training topics (and validation topics as necessary)
 - Evaluate the results with the Experiment function by using the test topics

First, let's separate our topics into train/validation/test.

In [10]:
train_topics, valid_topics, test_topics = np.split(topics, [int(.6*len(topics)), int(.8*len(topics))])
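The two cut points passed to `np.split` give a contiguous 60/20/20 split: the topics are not shuffled, so the first 60% go to training, the next 20% to validation and the rest to testing. A minimal sketch of the same arithmetic on a plain list (the helper name `split_60_20_20` and the toy query ids are assumptions for illustration):

```python
def split_60_20_20(items):
    """Contiguous split at 60% and 80% of the length, mirroring the two
    cut points given to np.split above. The order of items is preserved."""
    first_cut = int(.6 * len(items))
    second_cut = int(.8 * len(items))
    return items[:first_cut], items[first_cut:second_cut], items[second_cut:]

# Hypothetical query ids standing in for the rows of the topics dataframe.
toy_qids = [str(i) for i in range(1, 11)]
train, valid, test = split_60_20_20(toy_qids)
```

With ten toy topics, this puts six in `train`, two in `valid` and two in `test`.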

## scikit-learn RandomForestRegressor

Our first learning-to-rank model uses scikit-learn's [RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html).

We use `pt.ltr.apply_learned_model()`, which returns a PyTerrier Transformer that passes the document features as the "X" features to the Random Forest. To learn (fit) the model, we invoke the `fit()` method on the entire pipeline, specifying the queries (topics) and relevance assessments (qrels). The latter provide the "Y" labels for fitting the Random Forest.
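As a rough illustration of what "X" and "Y" mean here, the sketch below pairs each retrieved document's feature vector with its relevance label. It is a plain-Python sketch, not PyTerrier's code: the helper name `build_training_data` and the toy data are hypothetical, and treating unjudged documents as non-relevant (label 0) is an assumption.

```python
def build_training_data(results, qrels):
    """Turn ranked results and relevance judgements into (X, y) for a
    pointwise regressor.

    results: list of dicts with "qid", "docno" and "features"
    qrels:   dict mapping (qid, docno) -> relevance label
    Assumption: documents without a judgement get label 0.
    """
    X = [r["features"] for r in results]
    y = [qrels.get((r["qid"], r["docno"]), 0) for r in results]
    return X, y

# Hypothetical toy results and judgements.
toy_results = [
    {"qid": "1", "docno": "d1", "features": [7.4, 7.0]},
    {"qid": "1", "docno": "d2", "features": [6.9, 6.4]},
    {"qid": "2", "docno": "d1", "features": [3.1, 2.8]},
]
toy_qrels = {("1", "d1"): 1, ("2", "d9"): 1}

X, y = build_training_data(toy_results, toy_qrels)
```

Each row of `X` is one (query, document) pair, so a regressor fitted on `(X, y)` learns to predict a relevance score from the feature values alone.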

NB: Random Forests average many trees trained on bootstrap samples, which makes them relatively robust to overfitting, so we do not provide validation data to `fit()`.

We could equally use any other regression learner from scikit-learn and adjust its parameters ourselves.

Finally, we use `pt.Experiment()` on the test topics to compare performance.

In [11]:
from sklearn.ensemble import RandomForestRegressor

BaselineLTR = fbr >> pt.ltr.apply_learned_model(RandomForestRegressor(n_estimators=400))
BaselineLTR.fit(train_topics, qrels)

results = pt.Experiment([PL2, BaselineLTR], test_topics, qrels, ["map"], names=["PL2 Baseline", "LTR Baseline"])
results

|   | name | map |
|---|---|---|
| 0 | PL2 Baseline | 0.206031 |
| 1 | LTR Baseline | 0.1539 |
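For readers unfamiliar with the `map` measure, here is a minimal sketch of the usual definition of mean average precision. The function names and toy data are assumptions for illustration, and the evaluation tooling behind `pt.Experiment()` may differ in details, such as how queries without relevant documents are handled.

```python
def average_precision(ranked_docnos, relevant):
    """Average of precision@k over the ranks k at which a relevant document
    appears, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def mean_average_precision(run, qrels):
    """Mean of average precision over all queries in the run."""
    return sum(average_precision(docs, qrels.get(qid, set()))
               for qid, docs in run.items()) / len(run)

# Hypothetical toy run (ranked docnos per query) and relevance judgements.
toy_run = {"q1": ["a", "b", "c"], "q2": ["x", "y"]}
toy_rel = {"q1": {"a", "c"}, "q2": {"y"}}

toy_map = mean_average_precision(toy_run, toy_rel)
```

In the toy run, `q1` has relevant documents at ranks 1 and 3, and `q2` has its only relevant document at rank 2.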


## XGBoost Pipeline

We now demonstrate the use of a LambdaMART implementation from [XGBoost](https://xgboost.readthedocs.io/en/latest/). Again, PyTerrier provides a Transformer object from `pt.ltr.apply_learned_model()`, this time passing `form='ltr'` as a kwarg.

`apply_learned_model()` takes as its argument the actual XGBoost model that you want to train. We took the XGBoost configuration from [their example code](https://github.com/dmlc/xgboost/blob/master/demo/rank/rank.py).
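Unlike a pointwise regressor, a LambdaMART-style ranker needs to know which rows belong to the same query. One common way to express this is a list of group sizes over rows that are already grouped by query. The sketch below shows that bookkeeping in plain Python. The helper name `group_sizes` and the toy query ids are hypothetical, and it is an assumption that `form='ltr'` arranges something equivalent internally.

```python
from itertools import groupby

def group_sizes(qids):
    """Number of consecutive rows per query.
    Rows must already be grouped by qid (e.g. sorted by query)."""
    return [len(list(rows)) for _, rows in groupby(qids)]

# Hypothetical qid column of a result set: 3 documents for query 1,
# 2 for query 2 and 1 for query 3.
toy_qid_column = ["1", "1", "1", "2", "2", "3"]
sizes = group_sizes(toy_qid_column)
```

The sizes sum to the number of rows, so a ranker can recover the query boundaries from them.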

Call the `fit()` method on the full pipeline with the training and validation topics.

The same pipeline can also be used with [LightGBM](https://github.com/microsoft/LightGBM).

Evaluate the results with the Experiment function by using the test topics.

In [12]:
import xgboost as xgb
params = {'objective': 'rank:ndcg',
          'learning_rate': 0.1,
          'gamma': 1.0, 'min_child_weight': 0.1,
          'max_depth': 6,
          'random_state': 42
         }

BaseLTR_LM = fbr >> pt.ltr.apply_learned_model(xgb.sklearn.XGBRanker(**params), form='ltr')
BaseLTR_LM.fit(train_topics, qrels, valid_topics, qrels)

And evaluate the results.

In [13]:
allresultsLM = pt.Experiment([PL2, BaseLTR_LM],
                                test_topics,
                                qrels, ["map"],
                                names=["PL2 Baseline", "LambdaMART"])
allresultsLM

|   | name | map |
|---|---|---|
| 0 | PL2 Baseline | 0.206031 |
| 1 | LambdaMART | 0.208878 |
