# Demonstration Experiments with Precomputation and Caching

This notebook contains example experiments demonstrating the efficiency benefits of precomputation within `pt.Experiment()` and of wrapping an expensive cross-encoder in a `ScorerCache`.

This notebook was used to produce the timings reported in "On Precomputation and Caching in Information Retrieval Experiments with Pipeline Architectures", published in the Proceedings of the Second International Workshop on Open Web Search (WOWS 2025).

```bibtex
@inproceedings{macdonald2025wows,
 author = {Craig Macdonald and Sean MacAvaney},
 title = {On Precomputation and Caching in Information Retrieval Experiments with Pipeline Architectures},
 booktitle = {Proceedings of the Second International Workshop on Open Web Search (WOWS 2025)},
 year = 2025,
}
```

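Experiments are conducted on the MSMARCO passage corpus, using the 43 judged queries of the TREC 2019 Deep Learning track. Each pipeline retrieves the top k passages with BM25 (for k in {20, 50, 100, 200}), re-ranks them with MonoT5, and then re-ranks MonoT5's top 10 with DuoT5.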

In [2]:
%pip install -q python-terrier pyterrier_caching pyterrier_t5

Note: you may need to restart the kernel to use updated packages.


In [3]:
import pyterrier as pt

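Load the pre-built Terrier index of the MSMARCO passage corpus and create a BM25 retriever that also returns the passage text, which the T5 re-rankers need.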

In [4]:
index = pt.IndexFactory.of(pt.get_dataset('msmarco_passage').get_index('terrier_stemmed_text'), memory=False)
bm25 = pt.terrier.Retriever(index, metadata=['docno', 'text'], wmodel='BM25', verbose=True)

Java started (triggered by IndexFactory.of) and loaded: pyterrier.java, pyterrier.terrier.java [version=5.11 (build: craig.macdonald 2025-01-13 21:29), helper_version=0.0.8]


10:36:26.090 [main] WARN org.terrier.structures.BaseCompressingMetaIndex -- Structure meta reading data file directly from disk (SLOW) - try index.meta.data-source=fileinmem in the index properties file. 1.9 GiB of memory would be required.


In [5]:
bm25.search("chemical reactions")

TerrierRetr(BM25): 100%|██████████| 1/1 [00:00<00:00,  4.03q/s]


|     | qid | docid | docno | text | rank | score | query |
|-----|-----|-------|-------|------|------|-------|-------|
| 0 | 1 | 1725585 | 1725585 | Share. Reaction Definition: A reaction or chem... | 0 | 26.160870 | chemical reactions |
| 1 | 1 | 397884 | 397884 | the difference between chemical reaction and c... | 1 | 26.030967 | chemical reactions |
| 2 | 1 | 1074096 | 1074096 | Reaction Definition: A reaction or chemical re... | 2 | 25.820800 | chemical reactions |
| 3 | 1 | 7012661 | 7012661 | Learn about the different types of chemical re... | 3 | 25.808845 | chemical reactions |
| 4 | 1 | 1074095 | 1074095 | Share. Reaction Definition: A reaction or chem... | 4 | 25.740887 | chemical reactions |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 995 | 1 | 6069643 | 6069643 | Metabolic pathways in human organism form vast... | 995 | 21.435788 | chemical reactions |
| 996 | 1 | 7208929 | 7208929 | There are a few telltale signs that can help i... | 996 | 21.435788 | chemical reactions |
| 997 | 1 | 8777339 | 8777339 | Reactions in which energy is released are exot... | 997 | 21.435788 | chemical reactions |
| 998 | 1 | 397890 | 397890 | In a chemical reaction, chemical equilibrium i... | 998 | 21.435341 | chemical reactions |


In [6]:
from pyterrier_t5 import MonoT5ReRanker, DuoT5ReRanker
monoT5 = MonoT5ReRanker() # loads castorini/monot5-base-msmarco by default
duoT5 = DuoT5ReRanker() # loads castorini/duot5-base-msmarco by default

You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565


In [7]:
dataset = pt.get_dataset("irds:msmarco-passage/trec-dl-2019/judged")

In [None]:
!rm -rf monoT5.cache # start from an empty cache, so the first cached run below is cold

In [8]:
from pyterrier.measures import *
def one(): # no precomputation, no caching
    return pt.Experiment(
        [bm25 % k >> monoT5 % 10 >> duoT5 for k in [20, 50, 100, 200]],
        dataset.get_topics(),
        dataset.get_qrels(),
        [nDCG@10],
        precompute_prefix=False # <---- disable precomputation 
    )

def two(): # precomputation, no caching
    return pt.Experiment(
        [bm25 % k >> monoT5 % 10 >> duoT5 for k in [20, 50, 100, 200]],
        dataset.get_topics(),
        dataset.get_qrels(),
        [nDCG@10],
        precompute_prefix=True # <---- enable precomputation 
    )

def three_and_four(): # precomputation, caching
    from pyterrier_caching import ScorerCache
    cached_scorer = ScorerCache('monoT5.cache', monoT5)
    return pt.Experiment(
        [bm25 % k >> cached_scorer % 10 >> duoT5 for k in [20, 50, 100, 200]],
        dataset.get_topics(),
        dataset.get_qrels(),
        [nDCG@10],
        precompute_prefix=True # <---- enable precomputation 
    )
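With `precompute_prefix=True`, `pt.Experiment()` identifies the leading part that all pipelines share (here only `TerrierRetr(BM25)`, since the rank cutoffs differ), runs it once, and passes its results to the remaining stages of each pipeline; the `Precomputing results of 43 topics on shared pipeline component TerrierRetr(BM25)` messages in the outputs reflect this. The toy sketch below models the idea in plain Python, representing each pipeline as a list of hypothetical stage names. It is an illustration under these assumptions, not PyTerrier's implementation.

```python
# Toy model of prefix precomputation: pipelines are lists of (hypothetical)
# stage names. This is NOT how PyTerrier implements it internally.
from typing import List, Sequence


def common_prefix(pipelines: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest run of leading stages shared by all pipelines."""
    prefix: List[str] = []
    for stages in zip(*pipelines):
        # zip stops at the shortest pipeline; a position is shared only if
        # every pipeline has the same stage there
        if all(stage == stages[0] for stage in stages):
            prefix.append(stages[0])
        else:
            break
    return prefix


def count_stage_runs(pipelines: Sequence[Sequence[str]], precompute_prefix: bool) -> int:
    """Count how many stage executions an experiment would perform."""
    if not precompute_prefix:
        return sum(len(p) for p in pipelines)
    shared = common_prefix(pipelines)
    # the shared prefix runs once; only the remaining suffixes run per pipeline
    return len(shared) + sum(len(p) - len(shared) for p in pipelines)


# bm25 % k >> monoT5 % 10 >> duoT5, written as stage names
pipelines = [["BM25", f"RankCutoff({k})", "MonoT5", "RankCutoff(10)", "DuoT5"]
             for k in [20, 50, 100, 200]]
print(common_prefix(pipelines))
print(count_stage_runs(pipelines, precompute_prefix=False))
print(count_stage_runs(pipelines, precompute_prefix=True))
```

Under this model BM25 runs once instead of four times (17 stage executions instead of 20). Since BM25 is cheap relative to the T5 re-rankers, the wall-time saving from precomputation alone is expected to be modest.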

In [9]:
%time one()

TerrierRetr(BM25): 100%|██████████| 43/43 [00:04<00:00,  8.67q/s]
monoT5: 100%|██████████| 212/212 [00:05<00:00, 38.37batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.93queries/s]
TerrierRetr(BM25): 100%|██████████| 43/43 [00:03<00:00, 11.12q/s]
monoT5: 100%|██████████| 527/527 [00:11<00:00, 46.66batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.95queries/s]
TerrierRetr(BM25): 100%|██████████| 43/43 [00:03<00:00, 11.70q/s]
monoT5: 100%|██████████| 1052/1052 [00:22<00:00, 46.67batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.95queries/s]
TerrierRetr(BM25): 100%|██████████| 43/43 [00:03<00:00, 11.33q/s]
monoT5: 100%|██████████| 2102/2102 [00:44<00:00, 46.77batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.93queries/s]

CPU times: user 3min 8s, sys: 4.48 s, total: 3min 13s
Wall time: 3min 11s





|     | name | nDCG@10 |
|-----|------|---------|
| 0 | (TerrierRetr(BM25) >> RankCutoff(20) >> MonoT5... | 0.628905 |
| 1 | (TerrierRetr(BM25) >> RankCutoff(50) >> MonoT5... | 0.677657 |
| 2 | (TerrierRetr(BM25) >> RankCutoff(100) >> MonoT... | 0.701306 |
| 3 | (TerrierRetr(BM25) >> RankCutoff(200) >> MonoT... | 0.71067 |


In [10]:
%time two()

Precomputing results of 43 topics on shared pipeline component TerrierRetr(BM25)
TerrierRetr(BM25): 100%|██████████| 43/43 [00:03<00:00, 12.33q/s]
monoT5: 100%|██████████| 212/212 [00:04<00:00, 47.16batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.95queries/s]
monoT5: 100%|██████████| 527/527 [00:11<00:00, 47.05batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.95queries/s]
monoT5: 100%|██████████| 1052/1052 [00:22<00:00, 46.40batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.93queries/s]
monoT5: 100%|██████████| 2102/2102 [00:45<00:00, 46.52batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.94queries/s]

CPU times: user 2min 54s, sys: 1.96 s, total: 2min 56s
Wall time: 2min 55s





|     | name | nDCG@10 |
|-----|------|---------|
| 0 | (TerrierRetr(BM25) >> RankCutoff(20) >> MonoT5... | 0.628905 |
| 1 | (TerrierRetr(BM25) >> RankCutoff(50) >> MonoT5... | 0.677657 |
| 2 | (TerrierRetr(BM25) >> RankCutoff(100) >> MonoT... | 0.701306 |
| 3 | (TerrierRetr(BM25) >> RankCutoff(200) >> MonoT... | 0.71067 |


In [11]:
%time three_and_four()

Precomputing results of 43 topics on shared pipeline component TerrierRetr(BM25)
TerrierRetr(BM25): 100%|██████████| 43/43 [00:03<00:00, 11.69q/s]
monoT5: 100%|██████████| 212/212 [00:04<00:00, 47.23batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.92queries/s]
monoT5: 100%|██████████| 315/315 [00:06<00:00, 46.65batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.93queries/s]
monoT5: 100%|██████████| 525/525 [00:11<00:00, 46.97batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.94queries/s]
monoT5: 100%|██████████| 1050/1050 [00:22<00:00, 46.53batches/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.92queries/s]

CPU times: user 2min 16s, sys: 1.99 s, total: 2min 18s
Wall time: 2min 19s





|     | name | nDCG@10 |
|-----|------|---------|
| 0 | (TerrierRetr(BM25) >> RankCutoff(20) >> Sqlite... | 0.630382 |
| 1 | (TerrierRetr(BM25) >> RankCutoff(50) >> Sqlite... | 0.677996 |
| 2 | (TerrierRetr(BM25) >> RankCutoff(100) >> Sqlit... | 0.701042 |
| 3 | (TerrierRetr(BM25) >> RankCutoff(200) >> Sqlit... | 0.710616 |
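In this first cached run, MonoT5 processes fewer batches for k = 50, 100 and 200 (315, 525 and 1050, versus 527, 1052 and 2102 without caching): the (query, passage) pairs already scored for a smaller k are read from the cache instead of being re-scored, so only the newly retrieved passages reach the model. The toy sketch below illustrates this with an in-memory dict keyed on (query, docno). The class and function names are hypothetical, and this is not the `pyterrier_caching` implementation, which persists scores to disk (the pipeline names above show a `Sqlite...` component).

```python
# Toy scorer cache: only (query, docno) pairs not seen before are sent to the
# expensive scorer. Names are hypothetical, not pyterrier_caching's API.
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (query, docno)


class ToyScorerCache:
    def __init__(self, scorer: Callable[[List[Pair]], List[float]]):
        self.scorer = scorer                # expensive: scores a batch of pairs
        self.cache: Dict[Pair, float] = {}
        self.scored = 0                     # pairs actually sent to the scorer

    def score(self, pairs: List[Pair]) -> List[float]:
        # de-duplicate (keeping order) and keep only pairs not yet cached
        missing = [p for p in dict.fromkeys(pairs) if p not in self.cache]
        if missing:
            for pair, s in zip(missing, self.scorer(missing)):
                self.cache[pair] = s
            self.scored += len(missing)
        return [self.cache[p] for p in pairs]


def fake_cross_encoder(pairs: List[Pair]) -> List[float]:
    # stand-in for MonoT5: any deterministic function of the pair will do
    return [float(len(q) + len(d)) for q, d in pairs]


cache = ToyScorerCache(fake_cross_encoder)
ranking = [f"d{i}" for i in range(200)]  # a fixed first-stage ranking
for k in [20, 50, 100, 200]:
    cache.score([("q1", docno) for docno in ranking[:k]])
print(cache.scored)
```

Scoring the top 20, 50, 100 and 200 of one fixed ranking sends 200 pairs to the scorer instead of 370, and repeating the whole loop sends none, which mirrors the second run below where no MonoT5 progress bars appear.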


In [12]:
%time three_and_four()

Precomputing results of 43 topics on shared pipeline component TerrierRetr(BM25)
TerrierRetr(BM25): 100%|██████████| 43/43 [00:03<00:00, 11.59q/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.90queries/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.88queries/s]
duoT5: 100%|██████████| 43/43 [00:22<00:00,  1.90queries/s]
duoT5: 100%|██████████| 43/43 [00:23<00:00,  1.81queries/s]

CPU times: user 1min 34s, sys: 2.09 s, total: 1min 36s
Wall time: 1min 36s





|     | name | nDCG@10 |
|-----|------|---------|
| 0 | (TerrierRetr(BM25) >> RankCutoff(20) >> Sqlite... | 0.630382 |
| 1 | (TerrierRetr(BM25) >> RankCutoff(50) >> Sqlite... | 0.677996 |
| 2 | (TerrierRetr(BM25) >> RankCutoff(100) >> Sqlit... | 0.701042 |
| 3 | (TerrierRetr(BM25) >> RankCutoff(200) >> Sqlit... | 0.710616 |
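The wall-clock times reported by `%time` above can be compared directly. The snippet below simply copies those four times (converted to seconds) and computes each configuration's saving relative to `one()`.

```python
# Wall-clock times copied from the %time outputs above, in seconds.
wall_times = {
    "one(): no precomputation, no caching": 3 * 60 + 11,
    "two(): precomputation": 2 * 60 + 55,
    "three_and_four(), cold cache": 2 * 60 + 19,
    "three_and_four(), warm cache": 1 * 60 + 36,
}

baseline = wall_times["one(): no precomputation, no caching"]
for name, seconds in wall_times.items():
    # percentage of the baseline wall time saved by this configuration
    saving = 100 * (baseline - seconds) / baseline
    print(f"{name:40s} {seconds:4d}s  {saving:5.1f}% faster than one()")
```

With these numbers, precomputation alone saves about 8% of the wall time, caching MonoT5 saves about 27% on a cold cache, and about 50% once the cache is warm; at that point the remaining time is dominated by DuoT5, at roughly 22 s per pipeline.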


NB: For `three_and_four()`, effectiveness differs very slightly from `one()` and `two()`. The largest difference is about 0.0015 nDCG@10 (0.628905 vs 0.630382 at k=20); the others appear at the 4th or 5th decimal place. This is due to changes in GPU batching when scoring with MonoT5. Scores are usually identical when the same data is sent to the GPU in the same order, but differences can arise when, for example, a batch is composed differently because some of the values were already cached from a prior round.
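One plausible contributor to such discrepancies (an assumption for illustration, not something measured in this notebook) is that floating-point arithmetic is not associative: if values are accumulated in a different order, for instance because a batch is composed differently, the result can change in the last few bits. A minimal, standard-library-only illustration:

```python
# Floating-point addition is not associative: adding the same values in a
# different order can give a slightly different result.
import math

values = [0.1, 0.2, 0.3]

left_to_right = (values[0] + values[1]) + values[2]
right_to_left = values[0] + (values[1] + values[2])

print(left_to_right, right_to_left, left_to_right == right_to_left)

# math.fsum returns a correctly rounded sum, so it does not depend on order
print(math.fsum(values), math.fsum(reversed(values)))
```

Tiny differences of this kind in the MonoT5 scores can swap the order of near-tied passages, which then propagates to the nDCG@10 values.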