# R1-Searcher in PyTerrier-RAG

This notebook demonstrates R1-Searcher within the PyTerrier-RAG framework.

## Setup - you'll need these installed...

In [1]:
%pip install vllm bitsandbytes
#%pip install pyterrier_dr pyterrier-rag

Note: you may need to restart the kernel to use updated packages.


In [2]:
import pyterrier as pt
import pyterrier_rag

Let's get a BM25 retriever.

In [3]:
sparse_index = pt.Artifact.from_hf('pyterrier/ragwiki-terrier')

# Queries from the model may contain tokens that Terrier doesn't like. We remove them before retrieval and restore the original query afterwards.
bm25 = pt.rewrite.tokenise() >> sparse_index.bm25(include_fields=['docno', 'text', 'title']) >> pt.rewrite.reset()

Java started (triggered by tokenise) and loaded: pyterrier.java, pyterrier.terrier.java [version=5.11 (build: craig.macdonald 2025-01-13 21:29), helper_version=0.0.8]
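
The "remove them and put them back later" pattern in the pipeline above can be illustrated in plain Python. This is only a conceptual sketch: `tokenise` and `reset` here are hypothetical stand-ins operating on a dict, the regex is a simplification rather than Terrier's real tokeniser, and the `query_0` column name follows PyTerrier's convention for stashing the previous query.

```python
import re

def tokenise(row: dict) -> dict:
    """Stash the original query in query_0 and put a cleaned version in query."""
    # Simplified cleaning: keep only runs of ASCII letters and digits.
    cleaned = " ".join(re.findall(r"[A-Za-z0-9]+", row["query"]))
    return {**row, "query_0": row["query"], "query": cleaned}

def reset(row: dict) -> dict:
    """Move query_0 back into query, undoing the rewrite."""
    restored = dict(row)
    restored["query"] = restored.pop("query_0")
    return restored

row = {"qid": "1", "query": "Venice, Italy: west of which city?"}
stashed = tokenise(row)
print(stashed["query"])         # cleaned query that the retriever would see
print(reset(stashed)["query"])  # original query, restored after retrieval
```

Restoring the original query matters here because the results are handed back to the model, which should see its own query text rather than the cleaned version.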


In [4]:
bm25.search("chemical reactions")

Unnamed: 0,qid,docid,docno,text,title,rank,score,query
0,1,1027780,1027780,Chemical change Chemical changes occur when a ...,"""Chemical change""",0,29.908465,chemical reactions
1,1,53321,53321,are called reactants or reagents. Chemical rea...,"""Chemical reaction""",1,29.825437,chemical reactions
2,1,860125,860125,Chemical reaction A chemical reaction is a pro...,"""Chemical reaction""",2,29.526942,chemical reactions
3,1,17245956,17245956,Autocatalysis A single chemical reaction is sa...,Autocatalysis,3,29.098607,chemical reactions
4,1,20592029,20592029,other meaning of chemical synthesis is narrow ...,"""Chemical synthesis""",4,29.060437,chemical reactions
...,...,...,...,...,...,...,...,...
995,1,7756419,7756419,developed countries. The use of water for tran...,Water,995,21.984838,chemical reactions
996,1,11794771,11794771,methionine adenosyltransferase. This continuou...,Enzyme,996,21.984838,chemical reactions
997,1,15114947,15114947,Cycloaddition A cycloaddition is a chemical re...,Cycloaddition,997,21.984838,chemical reactions
998,1,11118096,11118096,"serve as an inhibitor. In addition, the sulfon...","""Transition state analog""",998,21.969369,chemical reactions


Now instantiate the model. We have to quantize it to fit onto a single GPU. This should just about work on a Colab GPU.

In [5]:
searcherR1 = pyterrier_rag.R1Searcher(bm25, 
                                      verbose=False,
                                      model_kw_args={'tensor_parallel_size':1, 
                                                     'dtype':'bfloat16', # data type for model
                                                     'quantization':"bitsandbytes", # how to quantize the model
                                                     'gpu_memory_utilization':0.6, # how much memory to use on GPU
                                                     'max_model_len':92000}) # limit on context length, to reduce GPU memory consumption

INFO 07-14 20:23:14 [__init__.py:239] Automatically detected platform cuda.
INFO 07-14 20:23:27 [config.py:689] This model supports multiple tasks: {'score', 'classify', 'embed', 'reward', 'generate'}. Defaulting to 'generate'.
INFO 07-14 20:23:27 [config.py:1901] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 07-14 20:23:29 [core.py:61] Initializing a V1 LLM engine (v0.8.4) with config: model='XXsongLALA/Qwen-2.5-7B-base-RAG-RL', speculative_config=None, tokenizer='XXsongLALA/Qwen-2.5-7B-base-RAG-RL', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=92000, download_dir=None, load_format=LoadFormat.BITSANDBYTES, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=bitsandbytes, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto',

Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]


INFO 07-14 20:23:34 [gpu_model_runner.py:1291] Model loading took 5.2277 GiB and 2.489406 seconds
INFO 07-14 20:23:41 [backends.py:416] Using cache directory: /root/.cache/vllm/torch_compile_cache/2d8725ec30/rank_0_0 for vLLM's torch.compile
INFO 07-14 20:23:41 [backends.py:426] Dynamo bytecode transform time: 6.78 s
INFO 07-14 20:23:42 [backends.py:115] Directly load the compiled graph for shape None from the cache
INFO 07-14 20:23:46 [monitor.py:33] torch.compile takes 6.78 s in total
INFO 07-14 20:23:47 [kv_cache_utils.py:634] GPU KV cache size: 400,912 tokens
INFO 07-14 20:23:47 [kv_cache_utils.py:637] Maximum concurrency for 92,000 tokens per request: 4.36x
INFO 07-14 20:24:14 [gpu_model_runner.py:1626] Graph capturing finished in 27 secs, took 5.51 GiB
INFO 07-14 20:24:14 [core.py:163] init engine (profile, create kv cache, warmup model) took 40.56 seconds
INFO 07-14 20:24:14 [core_client.py:435] Core engine process 0 ready.
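
The "Maximum concurrency" figure in the log above appears to be the KV cache size divided by the per-request token limit. A quick check, using only the two numbers reported in the log:

```python
# Values copied from the vLLM log above.
kv_cache_tokens = 400_912  # "GPU KV cache size: 400,912 tokens"
max_model_len = 92_000     # the max_model_len passed to R1Searcher

# How many maximum-length requests fit in the KV cache at once.
max_concurrency = kv_cache_tokens / max_model_len
print(f"{max_concurrency:.2f}x")
```

For a fixed KV cache size, a smaller `max_model_len` would raise this ratio, which is one reason to cap it when GPU memory is tight.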


In [6]:
results = searcherR1.search("what city is directly west of Venice?")
results

Unnamed: 0,qid,query,iteration,all_queries,qanswer,stop_reason_final,output
0,1,what city is directly west of Venice?,1,"[(0, Venice, Italy, west of which city )]",Padua,finished,To find out what city is directly west of Veni...


In [7]:
results.qanswer

0     Padua 
Name: qanswer, dtype: object

In [11]:
print(results.output.iloc[0])

To find out what city is directly west of Venice, I need to follow these steps:

1. Identify the location of Venice.
2. Determine which city is directly west of that location.

**Step 1: Identify the location of Venice.**

Venice is a city located in northeastern Italy, on a group of islands in the Adriatic Sea.

**Step 2: Determine which city is directly west of that location.**

I'll use a map or geographical information to find out which city is directly west of Venice. 

<|begin_of_query|> Venice, Italy, west of which city<|end_of_query|>

<|begin_of_documents|>
(1)the city center of Treviso. Daily coach service operated by DRD to Ljubljana (Slovenia) through the Venice Marco Polo Airport (Venice) and the Friuli Venezia Giulia Airport (Trieste) A public coach vice versa service from Treviso to Padua is operated by SITA. Treviso Airport Treviso Airport, , is an international airport located west-southwest of Treviso and approximately away from the city of Venice, Italy. It is mainly
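
The output above shows the model wrapping its search request in `<|begin_of_query|>` and `<|end_of_query|>` markers, with retrieved passages returned between document markers. As a rough sketch of how such a query could be pulled out of the generated text (the helper `extract_queries` is hypothetical and is not claimed to be how PyTerrier-RAG does it):

```python
import re
from typing import List

# Matches the text between the query markers seen in the output above.
QUERY_PATTERN = re.compile(
    r"<\|begin_of_query\|>(.*?)<\|end_of_query\|>", re.DOTALL
)

def extract_queries(generated_text: str) -> List[str]:
    """Return every search query found in the text, with whitespace stripped."""
    return [q.strip() for q in QUERY_PATTERN.findall(generated_text)]

sample = (
    "I'll use a map or geographical information. "
    "<|begin_of_query|> Venice, Italy, west of which city<|end_of_query|>"
)
print(extract_queries(sample))
```

Each extracted query would then be sent to the retriever (BM25 here), which corresponds to one entry in the `all_queries` column shown earlier.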

In [9]:
dataset = pt.get_dataset('rag:nq')
from ir_measures import define_byquery
Iterations = define_byquery(lambda qrels, run: run.iloc[0].iteration, name="Iterations")
pt.Experiment(
    [searcherR1],
    dataset.get_topics('dev').head(100), # NB: remove .head(100) to run on all dev topics
    dataset.get_answers('dev'),
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM, Iterations, "mrt"],
    batch_size=1,
    verbose=True,
    names=['SearcherR1(BM25)']
)

pt.Experiment: 100%|██████████| 100/100 [07:17<00:00,  4.37s/batches]


Unnamed: 0,name,F1,EM,Iterations,mrt
0,SearcherR1(BM25),0.309667,0.23,2.11,4372.33446
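
F1 is at least as high as EM in the table above because F1 gives partial credit for overlapping tokens, while EM requires the whole answer to match. The sketch below shows the usual token-level definitions; the lowercasing and whitespace splitting are simplifying assumptions, and the exact normalisation used by `pyterrier_rag.measures` may differ.

```python
from collections import Counter

def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the answers match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())

def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall between the two answers."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Padua ", "padua"))
print(token_f1("the city of Padua", "Padua"))
```

A verbose but correct answer such as "the city of Padua" scores zero on EM yet still earns a non-zero F1.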


## Dense Retrieval

Let's use an E5 model for dense retrieval. We'll download an E5 index from Hugging Face.

NB: This index is quite large, so it typically won't work on a free Colab instance.

In [None]:
import pyterrier_dr

e5 = pyterrier_dr.E5() >> pt.Artifact.from_hf("pyterrier/ragwiki-e5.flex") >> sparse_index.text_loader(['docno', 'title', 'text'])

In [None]:
e5.search("chemical reactions").head(5)

Now make a copy of the SearcherR1 setup that uses E5 as its retriever.

In [None]:
import copy
searcherR1_e5 = copy.copy(searcherR1)
searcherR1_e5.retriever = e5
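
`copy.copy` makes a shallow copy: the new object gets its own attribute bindings, but the attribute values themselves are shared. The toy classes below (`FakeModel` and `FakeSearcher`, both hypothetical) illustrate why this lets the copy swap its retriever without touching the original; that `R1Searcher` holds its loaded model in a shared attribute like this is an assumption.

```python
import copy

class FakeModel:
    """Hypothetical stand-in for an expensive, already-loaded LLM."""

class FakeSearcher:
    """Hypothetical stand-in for a searcher holding a retriever and a model."""
    def __init__(self, retriever, model):
        self.retriever = retriever
        self.model = model

original = FakeSearcher(retriever="bm25", model=FakeModel())
clone = copy.copy(original)  # shallow copy: attributes are shared by reference
clone.retriever = "e5"       # rebinding on the clone leaves the original alone

print(original.retriever, clone.retriever)
print(clone.model is original.model)
```

Under that assumption, the shallow copy avoids loading a second copy of the model onto the GPU.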

Let's check that it works.

Weirdly, we get an answer of "yes", but the model's reasoning (from the output column) was reasonable.

In [None]:
res = searcherR1_e5.search("Which borders in the middle east contentious?")
print(res["qanswer"])
print(res.iloc[0]["output"])

Let's see how it compares to the model using a BM25 retriever.

In [None]:
dataset = pt.get_dataset('rag:nq')
from ir_measures import define_byquery
Iterations = define_byquery(lambda qrels, run: run.iloc[0].iteration, name="Iterations")
pt.Experiment(
    [searcherR1_e5],
    dataset.get_topics('dev').head(100), # NB: remove .head(100) to run on all dev topics
    dataset.get_answers('dev'),
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM, Iterations, "mrt"],
    batch_size=1,
    verbose=True,
    names=['SearcherR1(E5)']
)

Much higher than with BM25!