# InB2 Improvement

## Step 1: Import libraries

No additional libraries are needed beyond those used by the baselines.
`ensure_pyterrier_is_loaded` makes sure PyTerrier (`pt`) is loaded and initialized.
`persist_and_normalize_run` is used later to normalize the run and write it to a text file.

In [1]:
from tira.third_party_integrations import ensure_pyterrier_is_loaded, persist_and_normalize_run
from tira.rest_api_client import Client
import pyterrier as pt

In [2]:
# Ensure PyTerrier is loaded
ensure_pyterrier_is_loaded()

# Initialize the TIRA client
tira = Client()

PyTerrier 0.10.0 has loaded Terrier 5.8 (built by craigm on 2023-11-01 18:05) and terrier-helper 0.0.8

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


## Step 2: Load the data and create the index

The index has already been built in PyTerrier format and is loaded from TIRA.
We use the IR Anthology and ACL Anthology corpus.

In [3]:
# Load the IR Anthology and ACL Anthology dataset
pt_dataset = pt.get_dataset('irds:ir-lab-sose-2024/ir-acl-anthology-20240504-training')

# Load the pre-built PyTerrier index from TIRA
index = tira.pt.index('ir-lab-sose-2024/tira-ir-starter/Index (tira-ir-starter-pyterrier)', pt_dataset)

## Step 3: Define the retrieval pipeline

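We use the InB2 retrieval model, a divergence-from-randomness (DFR) weighting model that combines an inverse document frequency basic model with the Bernoulli after-effect and normalisation 2.
[Class details](http://terrier.org/docs/v4.1/javadoc/org/terrier/matching/models/InB2.html)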
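
To make the three InB2 components more concrete, here is a minimal pure-Python sketch of how a single query term could be scored in a single document. It is an illustration of the DFR building blocks, not Terrier's actual code: the function name `inb2_score`, its parameter names, the base-2 logarithms, the `+1` / `+0.5` smoothing constants, and the default `c=1.0` are all assumptions made for this example.

```python
import math


def inb2_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, coll_freq,
               key_freq=1.0, c=1.0):
    """Illustrative InB2-style score of one query term in one document.

    tf        -- term frequency in the document
    doc_len   -- length of the document (in tokens)
    coll_freq -- total occurrences of the term in the collection
    doc_freq  -- number of documents containing the term
    All constants below are assumptions for this sketch.
    """
    # Normalisation 2: rescale tf by document length relative to the average.
    tfn = tf * math.log2(1.0 + c * avg_doc_len / doc_len)
    # Bernoulli after-effect (B): first normalisation of the information gain.
    after_effect = (coll_freq + 1.0) / (doc_freq * (tfn + 1.0))
    # Basic model In: inverse document frequency.
    idf = math.log2((n_docs + 1.0) / (doc_freq + 0.5))
    return key_freq * after_effect * tfn * idf


# Hypothetical statistics, chosen only to exercise the function.
score = inb2_score(tf=2, doc_len=100, avg_doc_len=100,
                   n_docs=1000, doc_freq=10, coll_freq=20)
print(score)
```

Under these assumptions, a longer document with the same term frequency receives a lower score, and a term that occurs in fewer documents receives a higher one. The document score for a query would be the sum of such term scores.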

In [4]:
# Define the InB2 retrieval pipeline
inb2 = pt.BatchRetrieve(index, wmodel="InB2")

In [5]:
# Preview the first three topics
print('First, we have a short look at the first three topics:')
print(pt_dataset.get_topics('text').head(3))

First, we have a short look at the first three topics:
  qid                                     query
0   1  retrieval system improving effectiveness
1   2  machine learning language identification
2   3             social media detect self harm


## Step 4: Create the run

In [6]:
# Perform the retrieval using InB2
print('Now we do the retrieval with InB2...')
run_inb2 = inb2(pt_dataset.get_topics('text'))

Now we do the retrieval with InB2...


In [7]:
# Display the first 10 entries of the InB2 run
print('Done. Here are the first 10 entries of the InB2 run:')
print(run_inb2.head(10))

Done. Here are the first 10 entries of the InB2 run:
  qid   docid                               docno  rank      score  \
0   1   94858        2004.cikm_conference-2004.47     0  14.131396   
1   1  125137   1989.ipm_journal-ir0volumeA25A4.2     1  13.910089   
2   1   94415       2008.cikm_conference-2008.183     2  12.828211   
3   1   82490     1998.sigirconf_conference-98.33     3  12.761525   
4   1  125817  2005.ipm_journal-ir0volumeA41A5.11     4  12.722657   
5   1  125153   2008.ipm_journal-ir0volumeA44A3.9     5  12.663065   
6   1   82472     1998.sigirconf_conference-98.15     6  12.555249   
7   1   84876       2016.ntcir_conference-2016.90     7  12.546686   
8   1  111300        2005.trec_conference-2005.26     8  12.387169   
9   1  124801   2006.ipm_journal-ir0volumeA42A3.2     9  12.375590   

                                      query  
0  retrieval system improving effectiveness  
1  retrieval system improving effectiveness  
2  retrieval system improving effectiv

## Step 5: Persist the run file for subsequent evaluations

In [8]:
# Normalize the run and write it to run.txt
persist_and_normalize_run(run_inb2, system_name='In_B2', default_output='../runs')

The run file is normalized outside the TIRA sandbox, I will store it at "../runs".
Done. run file is stored under "../runs/run.txt".
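
For orientation, the sketch below formats a few rows of the run in the standard six-column TREC run layout (`qid Q0 docno rank score tag`), which evaluation tools typically expect. It assumes that `run.txt` follows this layout; the helper `to_trec_lines` is hypothetical and is not part of TIRA or PyTerrier, and any normalization that `persist_and_normalize_run` applies (for example to ranks or scores) is not reproduced here. The rows are the first three entries of the InB2 run shown in Step 4.

```python
def to_trec_lines(rows, tag):
    """Format (qid, docno, rank, score) tuples as TREC-style run lines.

    Hypothetical helper for illustration only.
    """
    return [
        f"{qid} Q0 {docno} {rank} {score} {tag}"
        for qid, docno, rank, score in rows
    ]


# First three entries of the InB2 run from Step 4.
rows = [
    ("1", "2004.cikm_conference-2004.47", 0, 14.131396),
    ("1", "1989.ipm_journal-ir0volumeA25A4.2", 1, 13.910089),
    ("1", "2008.cikm_conference-2008.183", 2, 12.828211),
]

lines = to_trec_lines(rows, tag="In_B2")
for line in lines:
    print(line)
```

Each line carries the query ID, the literal `Q0`, the document number, the rank, the score, and the system name given as `system_name`.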
