# PyTerrier - Example Experiment

# Preparation

```python
%pip install -q python-terrier
```


```python
import pyterrier as pt
```

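We use the Dataset interface to quickly access a test collection. Here that is the Vaswani NPL collection, which provides an index, topics (queries) and qrels (relevance judgments).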

```python
vaswani = pt.datasets.get_dataset("vaswani")
```
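A test collection combines a document index, a set of topics and the relevance judgments (qrels) for those topics. PyTerrier returns topics and qrels as pandas DataFrames. Topics have `qid` and `query` columns, and qrels have `qid`, `docno` and `label` columns.

The sketch below mimics that layout using plain lists of dicts, so it runs without PyTerrier or pandas. It then groups the judgments by query, which is the lookup an evaluation needs. The rows are made-up toy data, not Vaswani content, and the helper `judgments_by_query` is hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows (not from Vaswani) that use the same column names
# as PyTerrier's topics and qrels DataFrames
topics = [
    {"qid": "1", "query": "first toy query"},
    {"qid": "2", "query": "second toy query"},
]
qrels = [
    {"qid": "1", "docno": "d3", "label": 1},
    {"qid": "1", "docno": "d7", "label": 1},
    {"qid": "2", "docno": "d1", "label": 1},
]

def judgments_by_query(qrels):
    """Group qrels rows into {qid: {docno: label}} for lookup during evaluation."""
    judged = defaultdict(dict)
    for row in qrels:
        judged[row["qid"]][row["docno"]] = row["label"]
    return dict(judged)

judged = judgments_by_query(qrels)
for topic in topics:
    # A topic with no judgments would get an empty dict here
    relevant = judged.get(topic["qid"], {})
    print(topic["qid"], topic["query"], "->", sorted(relevant))
```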

# Experiment

The `pt.Experiment` function performs retrieval and evaluation in a declarative fashion. Several retrieval systems can be run and evaluated with a single function call.

First, create the `Retriever` objects with the configuration you wish to use. Here we create one per weighting model (`wmodel`).

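```python
# Load the Vaswani index once and share it across the three retrievers
index = vaswani.get_index()

# One Retriever per weighting model to be compared
TF_IDF = pt.terrier.Retriever(index, wmodel="TF_IDF")
BM25 = pt.terrier.Retriever(index, wmodel="BM25")
PL2 = pt.terrier.Retriever(index, wmodel="PL2")
```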

Call `pt.Experiment` with the list of retrieval systems, the topics, the qrels and a list of evaluation measures.

Optional arguments:
 - `perquery=False` - if set to `True`, show the results for each query instead of the mean
 - `dataframe=True` - return the results as a dataframe if `True`, or as a dictionary if `False`
 - `round=4` - round all measures to 4 decimal places
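Conceptually, an experiment works in three steps. Each system is run on every topic, and each ranking is scored against the qrels. The per-query scores are then either reported as they are (as with `perquery=True`) or averaged; the `map` measure, for example, is the mean of per-query average precision.

The following is a minimal, hypothetical sketch of that loop in plain Python. It is not PyTerrier's implementation, which computes measures through pytrec_eval. A few simplifications apply:
- It computes only average precision.
- The toy "systems" are stand-in callables that ignore the query text.
- `ndigits` plays the role of `round`, renamed to avoid shadowing the built-in.

```python
from statistics import mean

def average_precision(ranking, relevant):
    """Average precision: sum of precision@k over the ranks k that hold a
    relevant document, divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)

def mini_experiment(systems, topics, qrels, perquery=False, ndigits=None):
    """Hypothetical illustration of the idea behind pt.Experiment (not its code):
    run every system on every topic, score each ranking, then aggregate."""
    results = {}
    for name, system in systems.items():
        per_query = {
            qid: average_precision(system(query), qrels.get(qid, set()))
            for qid, query in topics.items()
        }
        # perquery=True keeps one value per query; otherwise report the mean
        values = per_query if perquery else {"all": mean(per_query.values())}
        if ndigits is not None:
            values = {key: round(v, ndigits) for key, v in values.items()}
        results[name] = values
    return results

# Toy data: topics {qid: query}, qrels {qid: set of relevant docnos}, and two
# stand-in "systems" that ignore the query and return a fixed ranking
topics = {"q1": "first toy query", "q2": "second toy query"}
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
systems = {
    "sysA": lambda query: ["d1", "d2", "d3"],
    "sysB": lambda query: ["d2", "d3", "d1"],
}
print(mini_experiment(systems, topics, qrels, ndigits=4))
# {'sysA': {'all': 0.6667}, 'sysB': {'all': 0.7917}}
```

Passing the systems as a mapping from name to callable mirrors how `pt.Experiment` takes a list of retrieval objects and labels each row by system name.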

```python
pt.Experiment(
    [TF_IDF, BM25, PL2],
    vaswani.get_topics(),
    vaswani.get_qrels(),
    ["map", "ndcg"])
```

|   | name | map | ndcg |
|---|------|-----|------|
| 0 | BR(TF_IDF) | 0.290905 | 0.615367 |
| 1 | BR(BM25) | 0.296517 | 0.621197 |
| 2 | BR(PL2) | 0.276264 | 0.601225 |
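The `ndcg` column is a mean over all topics. For a single topic, nDCG rewards placing highly relevant documents near the top. Each retrieved document's relevance label is divided by $\log_2(k+1)$, where $k$ is its rank, and the sum is normalised by the score of the best possible ordering of the judged documents.

Below is a minimal sketch of this common formulation, using linear gains and an ideal ranking built from all judged documents. It is meant to be close in spirit to trec_eval's `ndcg`, but the function names are hypothetical and edge-case handling may differ.

```python
import math

def dcg(gains):
    """Discounted cumulative gain: the gain at rank k is divided by log2(k + 1)."""
    return sum(g / math.log2(k + 1) for k, g in enumerate(gains, start=1))

def ndcg(ranking, labels):
    """nDCG of one ranking.

    `labels` maps docno -> graded relevance label; unjudged documents get gain 0.
    The ideal ranking sorts all judged labels in decreasing order.
    """
    gains = [labels.get(docno, 0) for docno in ranking]
    ideal_dcg = dcg(sorted(labels.values(), reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical graded judgments for one query, and two candidate rankings
labels = {"d1": 2, "d2": 1}
print(ndcg(["d2", "d3", "d1"], labels))  # approximately 0.760
print(ndcg(["d1", "d2"], labels))        # 1.0 for the ideal ordering
```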


## Declaring a baseline

Use the `baseline=` kwarg to specify which system, given by its index in the list of systems, should be considered the baseline in this experiment. This enables significance testing (a paired t-test) against the baseline. It also adds columns counting the queries each system improved (`+`) and degraded (`-`) with respect to the baseline.
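The baseline comparison works query by query: for each measure, a system's score on a query is compared with the baseline's score on the same query. The sketch below assumes per-query scores are already available as two aligned lists. It counts improvements and degradations (ties count as neither) and computes the paired t statistic from the per-query differences. The function name and scores are hypothetical.

Converting the statistic into a two-sided p-value requires the t distribution with $n-1$ degrees of freedom. `scipy.stats.ttest_rel(system, baseline)` computes both the statistic and that p-value directly.

```python
import math
from statistics import mean, stdev

def compare_to_baseline(baseline, system):
    """Compare a system's per-query scores with a baseline's (same query order).

    Returns (queries improved, queries degraded, paired t statistic).
    """
    if len(baseline) != len(system) or len(baseline) < 2:
        raise ValueError("need at least two queries scored by both systems")
    diffs = [s - b for b, s in zip(baseline, system)]
    improved = sum(d > 0 for d in diffs)
    degraded = sum(d < 0 for d in diffs)
    # Paired t-test: t = mean(d) / (sd(d) / sqrt(n)) with n - 1 degrees of freedom;
    # undefined (division by zero) when all differences are identical
    t = mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))
    return improved, degraded, t

# Hypothetical per-query AP values for four queries
baseline_ap = [0.2, 0.5, 0.4, 0.3]
system_ap = [0.3, 0.4, 0.6, 0.5]
print(compare_to_baseline(baseline_ap, system_ap))
# improved=3, degraded=1, t is approximately 1.414
```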



```python
pt.Experiment(
    [TF_IDF, BM25, PL2],
    vaswani.get_topics(),
    vaswani.get_qrels(),
    ["map", "ndcg"],
    baseline=0)
```

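|   | name | map | ndcg | map + | map - | map p-value | ndcg + | ndcg - | ndcg p-value |
|---|------|-----|------|-------|-------|-------------|--------|--------|--------------|
| 0 | BR(TF_IDF) | 0.290905 | 0.615367 | | | | | | |
| 1 | BR(BM25) | 0.296517 | 0.621197 | 46.0 | 45.0 | 0.237317 | 45.0 | 46.0 | 0.143493 |
| 2 | BR(PL2) | 0.276264 | 0.601225 | 13.0 | 77.0 | 0.008827 | 17.0 | 73.0 | 0.002583 |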


We can also apply a correction for multiple testing. Use the `correction=` kwarg; `'b'` is shorthand for `'bonferroni'`. This adds columns such as "map p-value corrected" and "map reject", along with their `ndcg` counterparts. The former shows the new, adjusted p-value, and the latter shows whether the null hypothesis can be rejected at $\alpha=0.05$.
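Bonferroni is the simplest such correction. With $m$ comparisons, each p-value is multiplied by $m$ (capped at 1), and the null hypothesis is rejected when the adjusted value is at most $\alpha$. The corrected values in the table below are consistent with $m = 3$; for example, $0.237317 \times 3 = 0.711951$. The minimal sketch below uses that value as a stated assumption, and the function name is hypothetical.

```python
def bonferroni(p_values, alpha=0.05, m=None):
    """Bonferroni correction for multiple testing.

    Each p-value is multiplied by the number of tests m (capped at 1), and the
    null hypothesis is rejected where the adjusted p-value is at most alpha.
    """
    m = len(p_values) if m is None else m
    adjusted = [min(1.0, p * m) for p in p_values]
    reject = [p_adj <= alpha for p_adj in adjusted]
    return adjusted, reject

# Raw "map p-value" entries for BM25 and PL2 from the baseline experiment;
# m=3 is an assumption that reproduces the corrected values (up to rounding)
adjusted, reject = bonferroni([0.237317, 0.008827], m=3)
print([round(p, 6) for p in adjusted], reject)
# [0.711951, 0.026481] [False, True]
```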

```python
pt.Experiment(
    [TF_IDF, BM25, PL2],
    vaswani.get_topics(),
    vaswani.get_qrels(),
    ["map", "ndcg"],
    baseline=0,
    correction='b')
```

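|   | name | map | ndcg | map + | map - | map p-value | map reject | map p-value corrected | ndcg + | ndcg - | ndcg p-value | ndcg reject | ndcg p-value corrected |
|---|------|-----|------|-------|-------|-------------|------------|-----------------------|--------|--------|--------------|-------------|------------------------|
| 0 | BR(TF_IDF) | 0.290905 | 0.615367 | | | | False | | | | | False | |
| 1 | BR(BM25) | 0.296517 | 0.621197 | 46.0 | 45.0 | 0.237317 | False | 0.711951 | 45.0 | 46.0 | 0.143493 | False | 0.43048 |
| 2 | BR(PL2) | 0.276264 | 0.601225 | 13.0 | 77.0 | 0.008827 | True | 0.026482 | 17.0 | 73.0 | 0.002583 | True | 0.00775 |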
