# Tutorial with Full-Rank Retrievers

This notebook shows how to conduct post-hoc experiments with the IR Experiment Platform.

To start the notebook, please clone the archived shared task repository:

```
git clone git@github.com:tira-io/ir-experiment-platform-benchmarks.git
```

Inside the cloned repository, start JupyterLab with the following command, which automatically installs a minimal virtual environment:
```
make jupyterlab
```

The notebook covers how to run full-rank approaches submitted to TIRA in reproducibility/replicability experiments on the same or new data.

For each software submitted to TIRA, the `tira` integration into PyTerrier loads the submitted Docker image and executes it in PyTerrier pipelines, so a first execution can take slightly longer.


## Import Dependencies

In [3]:
import pandas as pd
pd.set_option('display.max_colwidth', 0)

from tira.local_client import Client
tira = Client()

import pyterrier as pt
if not pt.started():
    pt.init()


## Initialize a Full-Rank Retriever

We create a PyTerrier retriever called `submitted_baseline` that wraps an approach submitted to a shared task in TIRA.
The approach is identified by the name `ir-benchmarks/tira-ir-starter/BM25 (tira-ir-starter-pyterrier)`, i.e., the software `BM25 (tira-ir-starter-pyterrier)` submitted to `ir-benchmarks` by the team `tira-ir-starter` (which hosts baselines).
This software consists of two stages: the first component builds a PyTerrier index, and the second performs the actual retrieval with BM25.
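
To illustrate the two-stage structure (indexing, then BM25 retrieval), here is a minimal pure-Python sketch. It is not the code inside the submitted Docker image: the function names `build_index` and `bm25_retrieve`, the whitespace tokenizer, the toy documents, the parameter values `k1=1.2` and `b=0.75`, and the particular IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Stage 1 (hypothetical helper): build an inverted index from {docno: text}."""
    postings = defaultdict(dict)  # term -> {docno: term frequency}
    doc_len = {}
    for docno, text in docs.items():
        terms = text.lower().split()  # naive whitespace tokenization (assumption)
        doc_len[docno] = len(terms)
        for term, tf in Counter(terms).items():
            postings[term][docno] = tf
    avg_len = sum(doc_len.values()) / len(doc_len)
    return {'postings': postings, 'doc_len': doc_len, 'avg_len': avg_len}


def bm25_retrieve(index, query, k1=1.2, b=0.75):
    """Stage 2 (hypothetical helper): score with BM25, return (docno, rank, score)."""
    n_docs = len(index['doc_len'])
    scores = defaultdict(float)
    for term in query.lower().split():
        docs_with_term = index['postings'].get(term, {})
        df = len(docs_with_term)
        if df == 0:
            continue  # term does not occur in the collection
        # One common, always-positive IDF variant (assumption).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        for docno, tf in docs_with_term.items():
            # Length normalization relative to the average document length.
            norm = 1 - b + b * index['doc_len'][docno] / index['avg_len']
            scores[docno] += idf * tf * (k1 + 1) / (tf + k1 * norm)
    # Sort by descending score; break ties by docno for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [(docno, rank, score) for rank, (docno, score) in enumerate(ranked, start=1)]


# Toy collection (made up for this sketch).
docs = {
    'd1': 'measurement of dielectric constant of liquids',
    'd2': 'design of waveguide fed microwave antennas',
    'd3': 'microwave techniques for dielectric measurement',
}
index = build_index(docs)
results = bm25_retrieve(index, 'dielectric constant of liquids')
```

Separating the two stages means the index is built once and can be reused for any number of queries, which mirrors the split described above.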

With this API, any full-rank approach submitted in TIRA can be executed and re-executed, e.g., on new data.

We can run the retriever on any dataset integrated in `ir_datasets`.
Here, we use `vaswani` to show the overall functionality with a fast example.

In [4]:
submitted_baseline = tira.pt.retriever(
    'ir-benchmarks/tira-ir-starter/BM25 (tira-ir-starter-pyterrier)',
    dataset='vaswani',
)


Next, we run the actual retrieval, here on two topics to keep the result set small.

In [5]:
topics = pd.DataFrame([
    {'qid': 1, 'query': 'MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES'},
    {'qid': 2, 'query': 'MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS'},
])

submitted_baseline(topics)

| | qid | query | q0 | docno | rank | score | system |
|---|---|---|---|---|---|---|---|
| 0 | 1 | MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES | Q0 | 8172 | 1 | 24.566031 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 1 | 1 | MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES | Q0 | 9881 | 2 | 22.110514 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 2 | 1 | MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES | Q0 | 5502 | 3 | 21.717148 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 3 | 1 | MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES | Q0 | 1502 | 4 | 19.478355 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 4 | 1 | MEASUREMENT OF DIELECTRIC CONSTANT OF LIQUIDS BY THE USE OF MICROWAVE TECHNIQUES | Q0 | 9859 | 5 | 18.626342 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 1995 | 2 | MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS | Q0 | 4833 | 996 | 5.161525 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 1996 | 2 | MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS | Q0 | 3529 | 997 | 5.161525 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 1997 | 2 | MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS | Q0 | 271 | 998 | 5.161525 | pyterrier.default_pipelines.wmodel_batch_retrieve |
| 1998 | 2 | MATHEMATICAL ANALYSIS AND DESIGN DETAILS OF WAVEGUIDE FED MICROWAVE RADIATIONS | Q0 | 2429 | 999 | 5.161525 | pyterrier.default_pipelines.wmodel_batch_retrieve |
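
The result columns `qid`, `q0`, `docno`, `rank`, `score`, and `system` correspond to the six fields of a TREC-style run file. As a rough sketch of how such rows could be serialized, optionally keeping only the top-ranked documents per query, consider the helper below. The function `to_trec_run_lines` and its `top_k` parameter are hypothetical and not part of `tira` or PyTerrier; the sample rows reuse values from the output above. With a DataFrame, the input rows could be obtained via `DataFrame.to_dict('records')`.

```python
def to_trec_run_lines(rows, top_k=None):
    """Format result rows (dicts) as TREC run lines; hypothetical helper.

    If top_k is given, rows whose rank exceeds top_k are dropped.
    """
    lines = []
    for row in rows:
        if top_k is not None and row['rank'] > top_k:
            continue
        lines.append(
            f"{row['qid']} {row['q0']} {row['docno']} "
            f"{row['rank']} {row['score']} {row['system']}"
        )
    return lines


# Three rows copied from the result shown above.
system = 'pyterrier.default_pipelines.wmodel_batch_retrieve'
rows = [
    {'qid': 1, 'q0': 'Q0', 'docno': '8172', 'rank': 1, 'score': 24.566031, 'system': system},
    {'qid': 1, 'q0': 'Q0', 'docno': '9881', 'rank': 2, 'score': 22.110514, 'system': system},
    {'qid': 2, 'q0': 'Q0', 'docno': '4833', 'rank': 996, 'score': 5.161525, 'system': system},
]

all_lines = to_trec_run_lines(rows)
top_lines = to_trec_run_lines(rows, top_k=5)
```

Filtering by the existing `rank` column avoids re-sorting, under the assumption that ranks are already assigned per query, as they are in the output above.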
