# Search-R1

Search-R1 is a model trained to generate search queries as part of its inference process. The original implementation is infer.py in https://github.com/PeterGriffinJin/Search-R1, by the original authors.


In [None]:
%pip install -q python-terrier accelerate pyterrier_t5
%pip install -q git+https://github.com/terrierteam/pyterrier_rag.git

In [2]:
import pyterrier as pt
import pyterrier_rag

## Retrieval Setup

Let's get a BM25 retriever. This (Terrier) retriever also returns the 'text' and 'title' metadata for passages.

In [3]:
sparse_index = pt.Artifact.from_hf('pyterrier/ragwiki-terrier')

# Queries from R1 may contain tokens that Terrier doesn't like. We remove them before retrieval and restore the original query afterwards.
bm25 = pt.rewrite.tokenise() >> sparse_index.bm25(include_fields=['docno', 'text', 'title']) >> pt.rewrite.reset()


Java started (triggered by tokenise) and loaded: pyterrier.java, pyterrier.terrier.java [version=5.11 (build: craig.macdonald 2025-01-13 21:29), helper_version=0.0.8]


12:47:39.879 [main] WARN org.terrier.structures.BaseCompressingMetaIndex -- Structure meta reading lookup file directly from disk (SLOW) - try index.meta.index-source=fileinmem in the index properties file. 160.3 MiB of memory would be required.
12:47:39.894 [main] WARN org.terrier.structures.BaseCompressingMetaIndex -- Structure meta reading data file directly from disk (SLOW) - try index.meta.data-source=fileinmem in the index properties file. 8.2 GiB of memory would be required.


## Search-R1 model

We invoke SearchR1 using our BM25 retrieval pipeline. By default, SearchR1 takes only 3 passages from the specified retriever.

In [4]:
r1_bm25 = pyterrier_rag.SearchR1(bm25)

Loading checkpoint shards:   0%|          | 0/7 [00:00<?, ?it/s]

Let's try it out. We get back a dataframe with one row, with the generated answer in the qanswer column.

In [5]:
res = r1_bm25.search("what are chemical reactions?")
res

| | qid | query | qanswer | output | iteration | all_queries |
|---|---|---|---|---|---|---|
| 0 | 1 | what are chemical reactions? | chemical transformation of one set of chemical... | <\|im_start\|>system\nYou are a helpful assistan... | 1 | [(0, what are chemical reactions )] |


We can read the full answer from the qanswer column:

In [6]:
res.iloc[0].qanswer

'chemical transformation of one set of chemical substances to another'

The all_queries column shows the queries that were passed to the search engine, each paired with the iteration in which it was issued:

In [7]:
res.iloc[0].all_queries

[(0, ' what are chemical reactions ')]

Finally, we can also see the full output of the model, including its reasoning, its generated query, the retrieved documents, and the generated final answer...

In [8]:
print(res.iloc[0].output)

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Answer the given question. You must conduct reasoning inside <think> and </think> first every time you get new information. After reasoning, if you find you lack some knowledge, you can call a search engine by <search> query </search> and it will return the top searched results between <information> and </information>. You can search as many times as your want. If you find no further external knowledge needed, you can directly provide the answer inside <answer> and </answer>, without detailed illustrations. For example, <answer> Beijing </answer>. Question: what are chemical reactions?
<|im_end|>
<|im_start|>assistant


<think>I need to find out what chemical reactions are. I'll search for it.</think>

<search> what are chemical reactions </search>

<information>Doc 1(Title: "Chemical change") Chemical change Chemical changes occur when a substance combines with another to form a new substance, called chemical s
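
The tagged transcript above can be taken apart programmatically. The following is a minimal sketch, assuming the output follows the `<think>`, `<search>` and `<answer>` tag convention described in the prompt; `parse_trace` is a hypothetical helper (not part of pyterrier_rag), and the `example` string is a hand-abbreviated transcript rather than real model output.

```python
import re

def parse_trace(output: str) -> dict:
    """Split a Search-R1 style transcript into thoughts, queries and final answer."""
    # Only inspect the assistant turn, so that tags quoted in the user prompt
    # (e.g. "<answer> Beijing </answer>") are not mistaken for model output.
    marker = "<|im_start|>assistant"
    _, found, assistant_part = output.partition(marker)
    if not found:
        assistant_part = output

    def grab(tag: str) -> list:
        # Non-greedy match; DOTALL lets a tag body span several lines.
        pattern = rf"<{tag}>(.*?)</{tag}>"
        return [m.strip() for m in re.findall(pattern, assistant_part, flags=re.DOTALL)]

    answers = grab("answer")
    return {
        "thoughts": grab("think"),
        "queries": grab("search"),
        "answer": answers[-1] if answers else None,
    }

# Hand-abbreviated transcript (an assumption for illustration only).
example = (
    "<|im_start|>user\n"
    "For example, <answer> Beijing </answer>. Question: what are chemical reactions?\n"
    "<|im_end|>\n"
    "<|im_start|>assistant\n"
    "<think>I need to find out what chemical reactions are. I'll search for it.</think>\n"
    "<search> what are chemical reactions </search>\n"
    "<answer> chemical transformation of one set of chemical substances to another </answer>"
)

parsed = parse_trace(example)
print(parsed["queries"])
print(parsed["answer"])
```

In the notebook, the same helper could be applied to `res.iloc[0].output`, although the qanswer and all_queries columns already expose the answer and the queries directly.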

## Improving the Retriever

Because SearchR1 takes only the top 3 passages, precision at the top ranks is very important. Let's rerank the top 20 BM25 passages using the MonoT5 cross-encoder.

In [9]:
from pyterrier_t5 import MonoT5ReRanker
monoT5 = MonoT5ReRanker()
r1_monoT5 = r1_bm25.clone_for_retriever(bm25 % 20 >> monoT5)
r1_monoT5.search("what are chemical reactions?").iloc[0]

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
monoT5:   0%|          | 0/5 [00:00<?, ?batches/s]Passing a tuple of `past_key_values` is deprecated and will be removed in Transformers v4.48.0. You should pass an instance of `EncoderDecoderCache` instead, e.g. `past_key_values=EncoderDecoderCache.from_

qid                                                            1
query                               what are chemical reactions?
qanswer        chemical transformation of one set of chemical...
output         <|im_start|>system\nYou are a helpful assistan...
iteration                                                      1
all_queries                 [(0,  what are chemical reactions )]
Name: 0, dtype: object
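
To make the `bm25 % 20 >> monoT5` pipeline more concrete, here is a rough pure-Python sketch of the same idea: cut the first-stage ranking to a small candidate set, rescore the candidates with a stronger model, and keep only the best few for the generator. The functions `top_k`, `rerank` and `toy_scorer` are hypothetical; `toy_scorer` is a simple term-overlap stand-in for a cross-encoder and is not how MonoT5 scores passages.

```python
def top_k(results, k):
    """Keep the k highest-scoring results (the effect of a rank cutoff such as `% 20`)."""
    return sorted(results, key=lambda r: r["score"], reverse=True)[:k]

def rerank(results, scorer, query):
    """Rescore every candidate with `scorer` and sort by the new score."""
    rescored = [{**r, "score": scorer(query, r["text"])} for r in results]
    return sorted(rescored, key=lambda r: r["score"], reverse=True)

def toy_scorer(query, text):
    # Hypothetical stand-in for a cross-encoder:
    # the fraction of query terms that occur in the passage.
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)

# Made-up first-stage ranking (illustrative data, not real BM25 output).
first_stage = [
    {"docno": "d1", "text": "history of chemistry", "score": 9.0},
    {"docno": "d2", "text": "chemical reactions transform substances", "score": 7.5},
    {"docno": "d3", "text": "reactions in nuclear physics", "score": 7.0},
    {"docno": "d4", "text": "cooking recipes", "score": 1.0},
]

query = "chemical reactions"
candidates = top_k(first_stage, 3)                            # first-stage cutoff
passages = top_k(rerank(candidates, toy_scorer, query), 2)    # rerank, keep the best
print([p["docno"] for p in passages])
```

The first-stage cutoff trades recall for speed, since the expensive scorer only sees a few candidates, while the reranker decides which of those candidates reach the generator.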

Let's see whether using monoT5 quantitatively improves the results.

## Evaluation

Now let's run a quick experiment on Natural Questions, comparing our two R1 invocations. I'm also going to add a custom measure that reports how many (search/thought) iterations each setting used.

In [24]:
dataset = pt.get_dataset('rag:nq')
from ir_measures import define_byquery
Iterations = define_byquery(lambda qrels, run: run.iloc[0].iteration, name="Iterations")
pt.Experiment(
    [r1_bm25, r1_monoT5],
    dataset.get_topics('dev').head(100), # NB: remove .head(100) to run on all dev topics
    dataset.get_answers('dev'),
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM, Iterations],
    batch_size=25,
    verbose=True,
    names=['R1(BM25)', 'R1(monoT5)']
)

pt.Experiment: 100%|██████████| 8/8 [19:32<00:00, 146.58s/batches]


| | name | F1 | EM | Iterations |
|---|---|---|---|---|
| 0 | R1(BM25) | 0.433 | 0.34 | 2.48 |
| 1 | R1(monoT5) | 0.531571 | 0.44 | 2.2 |


So on this 100-query sample, reranking BM25 with monoT5 improves answer quality (F1 from 0.433 to 0.532, EM from 0.34 to 0.44) and reduces the mean number of iterations from 2.48 to 2.2.
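
For readers unfamiliar with the two answer-quality measures, the sketch below shows how exact match (EM) and token-level F1 are commonly computed for question answering. It assumes SQuAD-style normalisation (lowercasing, stripping punctuation and English articles) and takes the best score over all gold answers; the exact normalisation used by pyterrier_rag.measures may differ, and `normalize`, `exact_match` and `f1` are hypothetical names.

```python
import re
import string
from collections import Counter

def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold_answers):
    """1.0 if the normalised prediction equals any normalised gold answer, else 0.0."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))

def f1(prediction, gold_answers):
    """Best token-level F1 between the prediction and any gold answer."""
    def single(pred, gold):
        p, g = normalize(pred).split(), normalize(gold).split()
        # Multiset intersection counts each shared token at most as often
        # as it appears in both strings.
        overlap = sum((Counter(p) & Counter(g)).values())
        if overlap == 0:
            return 0.0
        precision, recall = overlap / len(p), overlap / len(g)
        return 2 * precision * recall / (precision + recall)
    return max(single(prediction, g) for g in gold_answers)

# Made-up prediction and gold answer, for illustration only.
pred = "the chemical transformation"
gold = ["chemical transformation of substances"]
print(exact_match(pred, gold), round(f1(pred, gold), 3))
```

EM rewards only answers that match a gold answer after normalisation, while F1 gives partial credit for overlapping tokens, which is why F1 is the higher of the two in the table above.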

## What about Dense Retrieval?

Don't fear, there is also a dense index for wiki available. Instructions coming soon.