# SearcherR1 in PyTerrier-RAG

This notebook demonstrates SearcherR1 within the PyTerrier-RAG framework.

## Setup - you'll need these installed...

In [None]:
# %pip install vllm bitsandbytes
# %pip install pyterrier_dr git+https://github.com/terrierteam/pyterrier_rag.git

In [None]:
import pyterrier as pt
import pyterrier_rag

Let's get a BM25 retriever.

In [None]:
sparse_index = pt.Artifact.from_hf('pyterrier/ragwiki-terrier')

# Queries from the model may contain tokens that Terrier doesn't like. We can remove them and put them back later.
bm25 = pt.rewrite.tokenise() >> sparse_index.bm25(include_fields=['docno', 'text', 'title']) >> pt.rewrite.reset()

Java started (triggered by tokenise) and loaded: pyterrier.java, pyterrier.terrier.java [version=5.11 (build: craig.macdonald 2025-01-13 21:29), helper_version=0.0.8]


14:59:14.571 [main] WARN org.terrier.structures.BaseCompressingMetaIndex -- Structure meta reading lookup file directly from disk (SLOW) - try index.meta.index-source=fileinmem in the index properties file. 160.3 MiB of memory would be required.
14:59:14.580 [main] WARN org.terrier.structures.BaseCompressingMetaIndex -- Structure meta reading data file directly from disk (SLOW) - try index.meta.data-source=fileinmem in the index properties file. 8.2 GiB of memory would be required.
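
To make the "remove them and put them back later" idea concrete, here is a minimal standard-library sketch of the pattern. It is not PyTerrier's implementation: the helper names `tokenise_query` and `reset_query` are hypothetical, the regular expression is a simplification, and the use of a `query_0` field to hold the original text is an assumption about how the saved query is stored.

```python
import re

def tokenise_query(row):
    """Save the original query under 'query_0' (assumed name) and clean 'query'."""
    cleaned = " ".join(re.findall(r"[A-Za-z0-9]+", row["query"]))
    return {**row, "query_0": row["query"], "query": cleaned}

def reset_query(row):
    """Restore the original query text and drop the saved copy."""
    restored = {k: v for k, v in row.items() if k != "query_0"}
    restored["query"] = row["query_0"]
    return restored

row = {"qid": "1", "query": "what is Terrier's \"BM25\" (weighting model)?"}
cleaned = tokenise_query(row)
print(cleaned["query"])               # the text the retriever would see
print(reset_query(cleaned)["query"])  # the original text, restored afterwards
```

The point of the round trip is that downstream components (here, the generator) still see the query exactly as the model wrote it, while the retriever only ever sees the cleaned form.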


In [6]:
bm25.search("chemical reactions")

Unnamed: 0,qid,docid,docno,text,title,rank,score,query
0,1,1027780,1027780,Chemical change Chemical changes occur when a ...,"""Chemical change""",0,29.908465,chemical reactions
1,1,53321,53321,are called reactants or reagents. Chemical rea...,"""Chemical reaction""",1,29.825437,chemical reactions
2,1,860125,860125,Chemical reaction A chemical reaction is a pro...,"""Chemical reaction""",2,29.526942,chemical reactions
3,1,17245956,17245956,Autocatalysis A single chemical reaction is sa...,Autocatalysis,3,29.098607,chemical reactions
4,1,20592029,20592029,other meaning of chemical synthesis is narrow ...,"""Chemical synthesis""",4,29.060437,chemical reactions
...,...,...,...,...,...,...,...,...
995,1,7756419,7756419,developed countries. The use of water for tran...,Water,995,21.984838,chemical reactions
996,1,11794771,11794771,methionine adenosyltransferase. This continuou...,Enzyme,996,21.984838,chemical reactions
997,1,15114947,15114947,Cycloaddition A cycloaddition is a chemical re...,Cycloaddition,997,21.984838,chemical reactions
998,1,11118096,11118096,"serve as an inhibitor. In addition, the sulfon...","""Transition state analog""",998,21.969369,chemical reactions


Now instantiate the model. We quantize it so that it fits on a single GPU; this should just about work on a Colab GPU.

In [7]:
searcherR1 = pyterrier_rag.R1Searcher(bm25, 
                                      verbose=False,
                                      model_kw_args={'tensor_parallel_size':1, 
                                                     'dtype':'bfloat16', # data type for model
                                                     'quantization':"bitsandbytes", # how to quantize the model
                                                     'gpu_memory_utilization':0.6, # how much memory to use on GPU
                                                     'max_model_len':92000}) # limit on input length, to reduce GPU memory consumption

INFO 04-24 14:59:14 [__init__.py:239] Automatically detected platform cuda.
INFO 04-24 14:59:21 [config.py:689] This model supports multiple tasks: {'embed', 'score', 'reward', 'generate', 'classify'}. Defaulting to 'generate'.
INFO 04-24 14:59:21 [config.py:1901] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 04-24 14:59:23 [core.py:61] Initializing a V1 LLM engine (v0.8.4) with config: model='XXsongLALA/Qwen-2.5-7B-base-RAG-RL', speculative_config=None, tokenizer='XXsongLALA/Qwen-2.5-7B-base-RAG-RL', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=92000, download_dir=None, load_format=LoadFormat.BITSANDBYTES, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto',

Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]


INFO 04-24 14:59:27 [gpu_model_runner.py:1291] Model loading took 5.2277 GiB and 2.902277 seconds
INFO 04-24 14:59:32 [backends.py:416] Using cache directory: /root/.cache/vllm/torch_compile_cache/4afabdf0b2/rank_0_0 for vLLM's torch.compile
INFO 04-24 14:59:32 [backends.py:426] Dynamo bytecode transform time: 5.12 s
INFO 04-24 14:59:33 [backends.py:115] Directly load the compiled graph for shape None from the cache
INFO 04-24 14:59:37 [monitor.py:33] torch.compile takes 5.12 s in total
INFO 04-24 14:59:38 [kv_cache_utils.py:634] GPU KV cache size: 129,984 tokens
INFO 04-24 14:59:38 [kv_cache_utils.py:637] Maximum concurrency for 92,000 tokens per request: 1.41x
INFO 04-24 15:00:02 [gpu_model_runner.py:1626] Graph capturing finished in 24 secs, took 5.60 GiB
INFO 04-24 15:00:02 [core.py:163] init engine (profile, create kv cache, warmup model) took 35.11 seconds
INFO 04-24 15:00:02 [core_client.py:435] Core engine process 0 ready.
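
The memory figures in the log can be sanity-checked with a little arithmetic. The sketch below reproduces the reported concurrency from the KV cache size and `max_model_len`, then gives back-of-envelope weight sizes. The parameter count of 7e9 and the 4-bit width are assumptions (the log states neither), so treat the weight estimates as rough bounds rather than exact values.

```python
# Figures copied from the vLLM log above.
kv_cache_tokens = 129_984
max_model_len = 92_000

# How many full-length requests fit in the KV cache at once.
concurrency = kv_cache_tokens / max_model_len
print(f"Maximum concurrency: {concurrency:.2f}x")

# Rough weight sizes. ASSUMPTIONS: ~7e9 parameters, 4-bit quantized weights.
GIB = 1024 ** 3
params = 7e9
bf16_gib = params * 2 / GIB        # 2 bytes per parameter
four_bit_gib = params * 0.5 / GIB  # half a byte per parameter
print(f"bf16 weights:  ~{bf16_gib:.1f} GiB")
print(f"4-bit weights: ~{four_bit_gib:.1f} GiB")
```

The logged load size of 5.2277 GiB falls between the two estimates, which would be consistent with some layers being kept at higher precision, though the log does not confirm this.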


In [8]:
searcherR1.search("how would I express a dense retrieval pipeline in PyTerrier?")

Unnamed: 0,qid,query,iteration,all_queries,output,stop_reason_final,qanswer
0,1,how would I express a dense retrieval pipeline...,0,[],To express a dense retrieval pipeline in PyTer...,shot_down,I don't know.


In [9]:
dataset = pt.get_dataset('rag:nq')
from ir_measures import define_byquery
Iterations = define_byquery(lambda qrels, run: run.iloc[0].iteration, name="Iterations")
pt.Experiment(
    [searcherR1],
    dataset.get_topics('dev').head(100), # NB: remove .head(100) to run on all dev topics
    dataset.get_answers('dev'),
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM, Iterations, "mrt"],
    batch_size=1,
    verbose=True,
    names=['SearcherR1(BM25)']
)

pt.Experiment: 100%|██████████| 100/100 [15:51<00:00,  9.51s/batches]


Unnamed: 0,name,F1,EM,Iterations,mrt
0,SearcherR1(BM25),0.336667,0.27,2.13,9510.409416
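
For readers unfamiliar with the answer-quality measures, the following is a sketch of exact match and token-level F1 as they are commonly defined for question answering (lowercase, strip punctuation and articles, then compare). Whether `pyterrier_rag.measures.EM` and `pyterrier_rag.measures.F1` normalise in exactly this way is an assumption; the function names here are illustrative only.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, remove punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """1.0 if the normalized prediction equals any normalized gold answer."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

def token_f1(prediction, gold_answers):
    """Best token-overlap F1 between the prediction and any gold answer."""
    pred_tokens = normalize(prediction).split()
    best = 0.0
    for gold in gold_answers:
        gold_tokens = normalize(gold).split()
        overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(gold_tokens)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

print(exact_match("The Eiffel Tower", ["Eiffel Tower"]))
print(token_f1("Eiffel Tower in Paris", ["Eiffel Tower"]))
```

F1 gives partial credit for overlapping tokens, which is why it sits above EM in the table.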


In [10]:
dataset = pt.get_dataset('rag:nq')
from ir_measures import define_byquery
Iterations = define_byquery(lambda qrels, run: run.iloc[0].iteration, name="Iterations")
pt.Experiment(
    [searcherR1],
    dataset.get_topics('dev').head(100), # NB: remove .head(100) to run on all dev topics
    dataset.get_answers('dev'),
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM, Iterations, "mrt"],
    batch_size=1,
    verbose=True,
    names=['SearcherR1(BM25)']
)

pt.Experiment: 100%|██████████| 100/100 [04:32<00:00,  2.73s/batches]


Unnamed: 0,name,F1,EM,Iterations,mrt
0,SearcherR1(BM25),0.336667,0.27,2.13,2725.103279


## Dense Retrieval

Let's use an E5 model for dense retrieval. We'll download an E5 index from Hugging Face.

NB: This index is quite large, so it typically won't work on a free Colab instance.

In [26]:
import pyterrier_dr

e5 = pyterrier_dr.E5() >> pt.Artifact.from_hf("pyterrier/ragwiki-e5.flex") >> sparse_index.text_loader(['docno', 'title', 'text'])
e5[1].verbose = False

In [27]:
e5.search("chemical reactions").head(5)

Unnamed: 0,qid,query,query_vec,docid,score,rank,docno,text,title
0,1,chemical reactions,"[-0.024651688, 0.0076081795, -0.041764636, 0.0...",860125,0.871463,0,860125,Chemical reaction A chemical reaction is a pro...,"""Chemical reaction"""
1,1,chemical reactions,"[-0.024651688, 0.0076081795, -0.041764636, 0.0...",12495298,0.861304,1,12495298,redistribution of substances in the human body...,"""Chemical reaction"""
2,1,chemical reactions,"[-0.024651688, 0.0076081795, -0.041764636, 0.0...",14698044,0.859305,2,14698044,a unimolecular reaction is the cis–trans isome...,"""Chemical reaction"""
3,1,chemical reactions,"[-0.024651688, 0.0076081795, -0.041764636, 0.0...",53321,0.858646,3,53321,are called reactants or reagents. Chemical rea...,"""Chemical reaction"""
4,1,chemical reactions,"[-0.024651688, 0.0076081795, -0.041764636, 0.0...",1027780,0.856919,4,1027780,Chemical change Chemical changes occur when a ...,"""Chemical change"""
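
The pipeline above has three stages: the E5 encoder adds a `query_vec`, the index scores documents against it, and the text loader attaches `title` and `text`. The middle stage can be sketched in a few lines of plain Python. This is a toy illustration with made-up two-dimensional vectors and a brute-force scan; the use of cosine similarity is an assumption, and a real index would use much larger vectors and a far more efficient search.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_search(query_vec, doc_vecs, k=2):
    """Score every document against the query vector and return the top k."""
    scored = [(docno, cosine(query_vec, vec)) for docno, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical document vectors, for illustration only.
toy_docs = {"d1": [1.0, 0.0], "d2": [0.6, 0.8], "d3": [0.0, 1.0]}
top = dense_search([0.8, 0.6], toy_docs)
for rank, (docno, score) in enumerate(top):
    print(rank, docno, round(score, 3))
```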


Now make a copy of the SearcherR1 model, set up to use E5 as its retriever.

In [28]:
import copy
searcherR1_e5 = copy.copy(searcherR1)
searcherR1_e5.retriever = e5
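
`copy.copy` makes a shallow copy, so the new object can point at a different retriever while still referring to the same underlying model object, which avoids loading the model a second time. A minimal sketch of that behaviour with a stand-in class follows; `ToyPipeline` and its attribute names are hypothetical, and the real object may hold additional state.

```python
import copy

class ToyPipeline:
    """Stand-in for an object that holds a retriever and an expensive model."""
    def __init__(self, retriever, model):
        self.retriever = retriever
        self.model = model

shared_model = object()  # placeholder for a loaded LLM engine
original = ToyPipeline(retriever="bm25", model=shared_model)

variant = copy.copy(original)  # shallow copy: attributes are shared references
variant.retriever = "e5"       # rebinding affects only the copy

print(original.retriever, variant.retriever)
print(variant.model is original.model)
```

Because the model is shared, any mutable state held inside it is shared between the two pipelines as well.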

Let's check it works.

Oddly, the extracted answer is "yes", even though the model's reasoning (in the output column) is reasonable.

In [29]:
res = searcherR1_e5.search("Which borders in the middle east contentious?")
print(res["qanswer"])
print(res.iloc[0]["output"])

0     yes 
Name: qanswer, dtype: object
To determine which borders in the Middle East are contentious, I need to identify the borders that are currently causing disputes or tensions between countries in the region. 

Step 1: Identify the major borders in the Middle East.
The major borders in the Middle East include:
1. The border between Israel and Palestine
2. The border between Syria and Turkey
3. The border between Iraq and Iran
4. The border between Saudi Arabia and Yemen
5. The border between Egypt and Libya

Step 2: Determine which of these borders are contentious.
I'll use a search query to find out which of these borders are contentious:

<|begin_of_query|> contentious borders in the Middle East<|end_of_query|>

<|begin_of_documents|>
(1)has also been established in academic scholarship between border and frontier, the latter denoting a state of mind rather than state boundaries. Types of borders :- superimposed border,relict border, physical borders,geometric border In the pas
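
The output shows the protocol the model follows: it wraps each search request in `<|begin_of_query|>` and `<|end_of_query|>` tags, and the retrieved passages come back after `<|begin_of_documents|>`. As a rough sketch of how such queries could be pulled out of the generated text, the snippet below uses a regular expression; this is an illustration, not the code pyterrier_rag uses, and `extract_queries` is a hypothetical helper.

```python
import re

# Non-greedy match between the query tags seen in the output above.
QUERY_PATTERN = re.compile(r"<\|begin_of_query\|>(.*?)<\|end_of_query\|>", re.DOTALL)

def extract_queries(output):
    """Return every search query the model emitted, in order, without padding."""
    return [match.strip() for match in QUERY_PATTERN.findall(output)]

sample = (
    "I'll use a search query to find out which of these borders are contentious:\n\n"
    "<|begin_of_query|> contentious borders in the Middle East<|end_of_query|>\n"
)
queries = extract_queries(sample)
print(queries)
```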

Let's see how it compares to the model using the BM25 retriever.

In [30]:
dataset = pt.get_dataset('rag:nq')
from ir_measures import define_byquery
Iterations = define_byquery(lambda qrels, run: run.iloc[0].iteration, name="Iterations")
pt.Experiment(
    [searcherR1_e5],
    dataset.get_topics('dev').head(100), # NB: remove .head(100) to run on all dev topics
    dataset.get_answers('dev'),
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM, Iterations, "mrt"],
    batch_size=1,
    verbose=True,
    names=['SearcherR1(E5)']
)

pt.Experiment: 100%|██████████| 100/100 [11:23<00:00,  6.84s/batches]


Unnamed: 0,name,F1,EM,Iterations,mrt
0,SearcherR1(E5),0.423571,0.35,1.86,6834.606079


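Much higher! On these 100 dev queries, F1 rises from 0.337 with BM25 to 0.424 with E5, and EM from 0.27 to 0.35, while the mean number of iterations drops from 2.13 to 1.86.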