# PyTerrier Notebook for Full-Rank Submissions

This notebook is a baseline full-rank submission for [TIRA](https://tira.io)/[TIREx](https://tira.io/tirex): it builds a PyTerrier index and then creates a run with BM25.

### Step 1: Ensure Libraries are Imported

In [1]:
import os

# Detect if we are in the TIRA sandbox
# Install the required dependencies if we are not in the sandbox.
if 'TIRA_DATASET_ID' not in os.environ:
    !pip3 install python-terrier tira ir_datasets
else:
    print('We are in the TIRA sandbox.')

In [3]:
from tira.third_party_integrations import ensure_pyterrier_is_loaded, persist_and_normalize_run

# This loads and starts PyTerrier so that it also works inside the TIRA sandbox.
ensure_pyterrier_is_loaded()

# PyTerrier must be imported after the call to ensure_pyterrier_is_loaded in TIRA.
import pyterrier as pt


Due to execution in TIRA, I have patched ir_datasets to always return the single input dataset mounted to the sandbox.


### Step 2: Load the Data

In [4]:
data = pt.get_dataset('irds:ir-lab-jena-leipzig-wise-2023/validation-20231104-training')

Load ir_dataset "ir-lab-jena-leipzig-wise-2023/validation-20231104-training" from tira.


In [5]:
print('See the first two queries:')
topics = data.get_topics('title')
print(topics.head(2))

See the first two queries:
No settings given in /root/.tira/.tira-settings.json. I will use defaults.


       qid              query
0  q072224     purchase money
1  q072226  purchase used car


### Step 3: Build the Index

In [6]:
print('Build index:')
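# Remove any stale index from a previous run before indexing.
!rm -Rf /tmp/index
# Store up to 100 characters of each docno in the index metadata.
iter_indexer = pt.IterDictIndexer("/tmp/index", meta={'docno': 100}, verbose=True)
indexref = iter_indexer.index(data.get_corpus_iter())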

print('Done. Index is created')

Build index:
No settings given in /root/.tira/.tira-settings.json. I will use defaults.


ir-lab-jena-leipzig-wise-2023/validation-20231104-training documents: 100%|██████████| 61307/61307 [01:55<00:00, 530.37it/s]


Done. Index is created
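
Conceptually, the indexer consumes an iterable of dictionaries, each carrying a `docno` and the document text, and builds an inverted index that maps terms to the documents containing them. The following is a minimal pure-Python sketch of that idea, not Terrier's implementation: the toy documents, the whitespace tokenizer, and the `build_inverted_index` helper are illustrative assumptions, and a real Terrier index involves further processing such as stopword removal and stemming.

```python
from collections import Counter, defaultdict


def toy_corpus_iter():
    # Hypothetical documents in the dict shape that the indexer consumes.
    yield {"docno": "d1", "text": "purchase a used car"}
    yield {"docno": "d2", "text": "money for a car purchase"}
    yield {"docno": "d3", "text": "used books"}


def build_inverted_index(docs):
    """Map each term to a {docno: term frequency} posting dictionary."""
    index = defaultdict(dict)
    for doc in docs:
        # Whitespace tokenization is an assumption of this sketch.
        counts = Counter(doc["text"].lower().split())
        for term, tf in counts.items():
            index[term][doc["docno"]] = tf
    return index


index = build_inverted_index(toy_corpus_iter())
# Docnos of all documents that contain the term "purchase".
print(sorted(index["purchase"]))
```

At query time, a retrieval model only needs to look at the postings of the query terms rather than scanning every document.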


### Step 4: Create the Retrieval Pipeline

In [7]:
bm25 = pt.BatchRetrieve(indexref, wmodel="BM25", verbose=True)
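
To illustrate what the BM25 weighting model computes, here is a small self-contained sketch of a textbook BM25 variant. It is not Terrier's implementation and may differ from it in details such as the IDF formula and parameter defaults; the toy documents, the whitespace tokenizer, the `bm25_scores` helper, and the values `k1=1.2` and `b=0.75` are assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical toy collection; tokenization is plain whitespace splitting.
DOCS = {
    "d1": "purchase a used car",
    "d2": "money for a car purchase",
    "d3": "used books",
}


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document for the query with a textbook BM25 variant."""
    tokenized = {docno: text.lower().split() for docno, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))

    scores = {}
    for docno, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue  # Terms absent from the document contribute nothing.
            # Non-negative IDF variant; rarer terms receive a higher weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization dampens the term frequency in long documents.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[docno] = score
    return scores


scores = bm25_scores("purchase used car", DOCS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy collection, `d1` ranks first because it is the only document that matches all three query terms.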

### Step 5: Create and Persist the Run

In [8]:
print('Create run')
run = bm25(topics)
print('Done, run was created')

Create run


BR(BM25): 100%|██████████| 882/882 [00:44<00:00, 19.92q/s]


Done, run was created


In [9]:
persist_and_normalize_run(run, 'bm25-baseline')

I use the environment variable "TIRA_OUTPUT_DIR" to determine where I should store the run file using "." as default.
Done. run file is stored under "./run.txt".
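
Assuming the persisted `run.txt` follows the standard six-column TREC run format (`qid Q0 docno rank score tag`), the sketch below shows how such lines could be produced from scored results. It does not reproduce what `persist_and_normalize_run` does internally: the `to_trec_lines` helper, the example document identifiers and scores, and the choice to start ranks at 1 are all assumptions of this illustration.

```python
from collections import defaultdict


def to_trec_lines(results, system_tag):
    """Format (qid, docno, score) tuples as TREC run lines.

    Within each query, documents are ordered by descending score and
    ranks are assigned starting at 1 (an assumption of this sketch).
    """
    by_query = defaultdict(list)
    for qid, docno, score in results:
        by_query[qid].append((docno, score))

    lines = []
    for qid in sorted(by_query):
        ranked = sorted(by_query[qid], key=lambda pair: pair[1], reverse=True)
        for rank, (docno, score) in enumerate(ranked, start=1):
            lines.append(f"{qid} Q0 {docno} {rank} {score} {system_tag}")
    return lines


# Hypothetical documents and scores for the two example queries shown earlier.
results = [
    ("q072224", "doc-b", 7.5),
    ("q072224", "doc-a", 9.25),
    ("q072226", "doc-c", 4.0),
]
for line in to_trec_lines(results, "bm25-baseline"):
    print(line)
```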
