# Small Retrieval Baseline with PyTerrier

This is a simple baseline submission: it loads a prebuilt PyTerrier index, ranks the documents for each query with BM25, and writes the resulting run file.
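For readers new to BM25, the ranking function can be sketched in a few lines of plain Python. This is a minimal sketch of textbook Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75. It uses a Lucene-style log(1 + ...) IDF, while Terrier's `BM25` weighting model uses its own IDF formulation, so absolute scores will differ from the run produced in this notebook. The names `bm25_term_score` and `bm25_score` are hypothetical.

```python
import math

# Textbook Okapi BM25 sketch (assumption: Lucene-style IDF), not Terrier's exact formula.

def bm25_term_score(tf, df, doc_len, avg_doc_len, num_docs, k1=1.2, b=0.75):
    """BM25 contribution of one query term to one document.

    tf: term frequency in the document; df: number of documents containing the term.
    """
    # IDF: rare terms (small df) get larger weights; the 1 + ... keeps it positive
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    # Length normalisation: documents longer than average are penalised when b > 0
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    # Term-frequency saturation: each extra occurrence adds less than the previous one
    return idf * tf * (k1 + 1) / (tf + norm)


def bm25_score(query_terms, doc_tf, doc_len, avg_doc_len, df, num_docs):
    """Sum the per-term contributions of the query terms that occur in the document."""
    return sum(
        bm25_term_score(doc_tf[t], df[t], doc_len, avg_doc_len, num_docs)
        for t in query_terms
        if doc_tf.get(t, 0) > 0
    )
```

Terms that do not occur in the document contribute nothing, so a query is scored by summing only over its matching terms.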

### Step 1: Import All Libraries


```python
from tira.third_party_integrations import ensure_pyterrier_is_loaded, persist_and_normalize_run
from tira.rest_api_client import Client

# Ensure that PyTerrier is loaded so that it also works in the TIRA sandbox
ensure_pyterrier_is_loaded()
import pyterrier as pt

# TIRA REST client, used below to load the prebuilt index
tira = Client()
```

```
PyTerrier 0.10.0 has loaded Terrier 5.8 (built by craigm on 2023-11-01 18:05) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.
```


### Step 2: Load the Dataset

```python
# Load the dataset through PyTerrier's ir_datasets integration (the 'irds:' prefix)
dataset_id = 'longeval-tiny-train-20240315-training'
data = pt.get_dataset(f'irds:ir-lab-padua-2024/{dataset_id}')
```
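This step only loads the dataset; a simple first analysis could summarise its topics. The sketch below assumes the topics are available as (qid, query) pairs, mirroring the `qid` and `query` columns of PyTerrier topic frames. `topic_statistics` is a hypothetical helper, and whitespace splitting is a simplification of real tokenisation.

```python
from collections import Counter


def topic_statistics(topics):
    """Summarise (qid, query) pairs: topic count, mean query length, frequent terms.

    Hypothetical helper; `topics` is a list of (qid, query_text) tuples.
    """
    # Query length in whitespace-separated terms (no stemming or stopword removal)
    lengths = [len(query.split()) for _, query in topics]
    # Lower-cased term counts over all queries; ties keep first-seen order
    term_counts = Counter(term.lower() for _, query in topics for term in query.split())
    return {
        "num_topics": len(topics),
        "mean_query_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "most_common_terms": term_counts.most_common(3),
    }
```

With the notebook's objects this could be called as `topic_statistics(list(zip(topics["qid"], topics["query"])))` after `topics = data.get_topics("title")`.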

### Step 3: Load the Index

```python
# Load the prebuilt PyTerrier index for this dataset from the tira-ir-starter
index = tira.pt.index('ir-benchmarks/tira-ir-starter/Index (tira-ir-starter-pyterrier)', dataset_id)
```
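`tira.pt.index(...)` loads an index that was already built by the tira-ir-starter, so no indexing happens in this notebook. To illustrate which statistics a BM25 ranker reads from such an index, here is a toy in-memory inverted index. It is an illustration only, not Terrier's on-disk format, and it skips the stopword removal and stemming that Terrier's default term pipeline applies. `build_inverted_index` is a hypothetical name.

```python
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Build a toy inverted index from a {docno: text} mapping (hypothetical helper).

    Returns postings (term -> {docno: tf}), document lengths and the
    collection statistics a BM25 scorer needs.
    """
    postings = defaultdict(dict)
    doc_lengths = {}
    for docno, text in docs.items():
        terms = text.lower().split()  # naive tokenisation; no stemming or stopwords
        doc_lengths[docno] = len(terms)
        for term, tf in Counter(terms).items():
            postings[term][docno] = tf
    num_docs = len(docs)
    avg_doc_len = sum(doc_lengths.values()) / num_docs if num_docs else 0.0
    return {
        "postings": dict(postings),  # document frequency of a term = len(postings[term])
        "doc_lengths": doc_lengths,
        "num_docs": num_docs,
        "avg_doc_len": avg_doc_len,
    }
```

Term frequencies, document lengths, document frequencies and the average document length are exactly the inputs of the BM25 formula.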

### Step 4: Create the Retrieval Pipeline


```python
# BM25 retrieval over the index; verbose=True shows a progress bar
bm25 = pt.BatchRetrieve(index, wmodel="BM25", verbose=True)
```
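`bm25` is a PyTerrier transformer: applied to a topics frame, it returns a results frame with one row per retrieved (query, document) pair, including `qid`, `docno`, `rank` and `score` columns. The plain-Python sketch below mimics that transformation under simplifying assumptions: documents are scored exhaustively rather than through posting lists, `score_fn` is any relevance function, and `term_overlap` is a toy stand-in for BM25. Both function names are hypothetical.

```python
def retrieve(topics, docs, score_fn, num_results=10):
    """Turn (qid, query) topics into ranked result rows (hypothetical sketch).

    score_fn(query, text) -> float is any relevance function (BM25 in the notebook).
    Returns a list of dicts with qid, docno, rank (starting at 0) and score.
    """
    results = []
    for qid, query in topics:
        scored = [(score_fn(query, text), docno) for docno, text in docs.items()]
        scored = [(s, d) for s, d in scored if s > 0]  # keep only matching documents
        # Highest score first; ties broken by docno so the output is deterministic
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for rank, (score, docno) in enumerate(scored[:num_results]):
            results.append({"qid": qid, "docno": docno, "rank": rank, "score": score})
    return results


def term_overlap(query, text):
    """Toy stand-in scorer: number of distinct query terms present in the text."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))
```

Because the scorer is passed in as a function, swapping the toy scorer for a BM25 scorer changes the ranking without touching the rest of the pipeline, which mirrors how PyTerrier separates the weighting model from the retrieval transformer.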

### Step 5: Create and Persist the Run


```python
# Retrieve results for every topic, using the title field as the query
print('Create run')
run = bm25(data.get_topics("title"))
print('Done, run was created')
```


```
Create run
Download: 376kiB [00:00, 2.97MiB/s]
Download finished. Extract...
Extraction finished:  /root/.tira/extracted_datasets/ir-lab-padua-2024/longeval-tiny-train-20240315-training/
BR(BM25): 100%|██████████| 672/672 [00:06<00:00, 98.97q/s]
Done, run was created
```
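The progress bar reports 672 queries. Before submitting, a quick sanity check can confirm that every topic received results and that ranks follow descending scores. `summarize_run` is a hypothetical helper that works on rows given as dictionaries; a pandas run frame can be converted with `run.to_dict("records")`, and the expected query ids can be taken from `data.get_topics("title")["qid"]`.

```python
from collections import defaultdict


def summarize_run(rows, expected_qids):
    """Sanity-check a run given as dicts with qid, docno, rank and score (hypothetical helper).

    Reports queries without results and queries whose ranks do not follow
    descending scores.
    """
    by_query = defaultdict(list)
    for row in rows:
        by_query[row["qid"]].append(row)
    missing = sorted(set(expected_qids) - set(by_query))
    misordered = []
    for qid, qrows in by_query.items():
        qrows.sort(key=lambda r: r["rank"])
        scores = [r["score"] for r in qrows]
        # A higher score at a worse rank means the ranking is inconsistent
        if any(a < b for a, b in zip(scores, scores[1:])):
            misordered.append(qid)
    return {
        "queries_with_results": len(by_query),
        "missing_qids": missing,
        "misordered_qids": sorted(misordered),
        "max_results_per_query": max((len(q) for q in by_query.values()), default=0),
    }
```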


```python
# Normalize the run and store it as a run file; 'bm25-default_weights' is the system name
persist_and_normalize_run(run, 'bm25-default_weights')
```

```
I use the environment variable "TIRA_OUTPUT_DIR" to determine where I should store the run file using "." as default.
Done. run file is stored under "./run.txt".
```
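The log shows that the output directory is read from the `TIRA_OUTPUT_DIR` environment variable, defaulting to `.`. The sketch below shows the general idea of persisting a run, assuming the common six-column TREC run format (`qid Q0 docno rank score tag`) with the system name as the tag. It does not reproduce the normalisation that `persist_and_normalize_run` performs, and `resolve_output_dir`, `format_trec_run` and `write_trec_run` are hypothetical names.

```python
import os

# Assumption: six-column TREC run format; not the exact output of persist_and_normalize_run.


def resolve_output_dir(default="."):
    """Use TIRA_OUTPUT_DIR if it is set, otherwise fall back to the default."""
    return os.environ.get("TIRA_OUTPUT_DIR", default)


def format_trec_run(rows, system_name):
    """Format result rows as TREC run lines: qid Q0 docno rank score tag."""
    lines = []
    # Group lines by query and order them by rank within each query
    for row in sorted(rows, key=lambda r: (r["qid"], r["rank"])):
        lines.append(f'{row["qid"]} Q0 {row["docno"]} {row["rank"]} {row["score"]} {system_name}')
    return "\n".join(lines) + "\n"


def write_trec_run(rows, system_name, filename="run.txt"):
    """Write the formatted run into the resolved output directory and return its path."""
    path = os.path.join(resolve_output_dir(), filename)
    with open(path, "w", encoding="utf-8") as f:
        f.write(format_trec_run(rows, system_name))
    return path
```

Whether ranks start at 0 or 1 in the written file depends on the normalisation applied before writing; this sketch writes whatever `rank` value each row carries.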
