# ColBERT PRF Demo - TREC Deep Learning Track 2019 & 2020 on MSMARCO Passage


This notebook demonstrates the application of ColBERT PRF, a pseudo-relevance feedback approach for ColBERT dense retrieval.

**Citation**
Xiao Wang, Craig Macdonald, Nicola Tonellotto, Iadh Ounis. Pseudo-Relevance Feedback for Multiple Representation Dense Retrieval. In Proceedings of ICTIR 2021. https://arxiv.org/abs/2106.11251


**Pre-requisites**
This requires PyTerrier, pyterrier_colbert and FAISS-GPU to be installed and working.

```python
#conda install faiss-gpu==0.6.3
!pip install python-terrier
!pip install git+https://github.com/cmacdonald/pyterrier_colbert.git
```

See the [pyterrier_colbert README](https://github.com/terrierteam/pyterrier_colbert/blob/main/README.md) for more information on pre-requisites.

## PyTerrier Setup

Ensure that FAISS-GPU is installed and working, and set up PyTerrier.

In [1]:
import faiss
assert faiss.get_num_gpus() > 0

In [2]:
import pyterrier as pt
pt.init()

dataset = pt.get_dataset("trec-deep-learning-passages")
checkpoint="http://www.dcs.gla.ac.uk/~craigm/ecir2021-tutorial/colbert_model_checkpoint.zip"

PyTerrier 0.6.0 has loaded Terrier 5.5 (built by craigmacdonald on 2021-05-20 13:12)


## PyTerrier ColBERT setup

This notebook assumes that you already have an index for the MSMARCO passage ranking corpus. If not, it can be created as follows:
```python
from pyterrier_colbert.indexing import ColBERTIndexer
indexer = ColBERTIndexer(checkpoint, "/path/to/index", "index_name", ids=True)
indexer.index(dataset.get_corpus_iter())
```

We use a ColBERT checkpoint trained by the University of Glasgow on the MSMARCO passage ranking triples file for 44k batches. It is available online, and our version of ColBERT downloads it automatically. Loading the ColBERT model takes about a minute. You can ignore the warnings about training on a downstream task and about Torch versions.

In [4]:
from pyterrier_colbert.ranking import ColBERTFactory
#update this to the location of your ColBERT index for MSMARCO passage ranking.
index=("/nfs/indices/colbert_passage","index_name3")

#our GPU didn't have enough memory for both the FAISS index and ColBERT PRF, so we set gpu=False here
pytcolbert = ColBERTFactory(checkpoint, *index, gpu=False)

  warn("Gpu disabled, YMMV")
Some weights of the model checkpoint at bert-base-uncased were not used when initializing ColBERT: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing ColBERT from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).
- This IS NOT expected if you are initializing ColBERT from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ColBERT were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['linear.weig

[Jun 22, 14:18:53] #> Loading model checkpoint.
[Jun 22, 14:18:53] #> Loading checkpoint http://www.dcs.gla.ac.uk/~craigm/ecir2021-tutorial/colbert_model_checkpoint.zip




[Jun 22, 14:19:04] #> checkpoint['epoch'] = 0
[Jun 22, 14:19:04] #> checkpoint['batch'] = 44500


Next, we instantiate the standard dense retrieval pipeline for ColBERT. The first instantiation takes some time (5-7 minutes), as the FAISS index and the embeddings index are loaded into memory.

In [5]:
dense_e2e = pytcolbert.end_to_end()

[Jun 22, 14:19:05] #> Loading the FAISS index from /nfs/indices/colbert_passage/index_name3/ivfpq.faiss ..
[Jun 22, 14:19:40] #> Building the emb2pid mapping..
[Jun 22, 14:20:13] len(self.emb2pid) = 687989391
Loading reranking index, memtype=mem


Loading index shards to memory: 100%|██████████| 24/24 [07:11<00:00, 17.98s/shard]


Let's look at the output of `dense_e2e` for a given query. The docnos are ranked by descending score, and the columns include the query tokens and their embedded representations.

In [6]:
dense_e2e.search("chemical reactions").head(5)

| | qid | query | docid | query_toks | query_embs | score | docno | rank |
|---|---|---|---|---|---|---|---|---|
| 1590 | 1 | chemical reactions | 5453527 | [tensor(101), tensor(1), tensor(5072), tensor(... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | 28.381031 | 5453527 | 0 |
| 787 | 1 | chemical reactions | 7605154 | [tensor(101), tensor(1), tensor(5072), tensor(... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | 27.98658 | 7605154 | 1 |
| 1786 | 1 | chemical reactions | 1833253 | [tensor(101), tensor(1), tensor(5072), tensor(... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | 27.973652 | 1833253 | 2 |
| 1785 | 1 | chemical reactions | 1833252 | [tensor(101), tensor(1), tensor(5072), tensor(... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | 27.823465 | 1833252 | 3 |
| 1692 | 1 | chemical reactions | 7575552 | [tensor(101), tensor(1), tensor(5072), tensor(... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | 27.791912 | 7575552 | 4 |


## ColBERT PRF

Now we instantiate two ColBERT PRF pipelines: one that reranks the initial result set, and one that performs a new FAISS retrieval. The first instantiation incurs a delay of about 3 minutes, while the collection and document frequencies are calculated by scanning the embeddings index.

In [7]:
prf_rank = pytcolbert.prf(rerank=False)
prf_rerank = pytcolbert.prf(rerank=True)

[Jun 22, 14:27:38] #> Building the emb2tid mapping..
687989391
Computing collection frequencies
Done
Loading doclens


  0%|          | 4504/8841823 [00:00<03:16, 45028.23d/s]

Computing document frequencies


100%|██████████| 8841823/8841823 [03:11<00:00, 46080.63d/s]
100%|██████████| 30522/30522 [00:00<00:00, 292554.98it/s]
  0%|          | 0/30522 [00:00<?, ?it/s]

Done


100%|██████████| 30522/30522 [00:00<00:00, 294430.20it/s]
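
The collection and document frequencies gathered in this step are per-token statistics over the indexed passages. As a rough illustration of what such a scan computes, here is a minimal sketch on toy token-id lists rather than the real embeddings index. The function name `token_frequencies` and the smoothed IDF formula are illustrative assumptions, not the pyterrier_colbert code.

```python
import math
from collections import Counter

def token_frequencies(docs):
    """Return (collection frequency, document frequency) counters for token ids."""
    cf, df = Counter(), Counter()
    for tokens in docs:
        cf.update(tokens)        # every occurrence counts
        df.update(set(tokens))   # each passage counts at most once per token
    return cf, df

# Toy corpus: three passages given as lists of (made-up) token ids.
docs = [[5, 7, 7], [7, 9], [5, 5, 5]]
cf, df = token_frequencies(docs)

# One common smoothed IDF (an assumption here); rarer tokens get a higher value.
num_docs = len(docs)
idf = {tid: math.log((num_docs + 1) / (df[tid] + 1)) for tid in df}
```

Token 9 occurs in a single passage, so it receives a higher IDF than tokens 5 and 7, which each occur in two.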


A ColBERT PRF pipeline can be searched in the same way:

In [8]:
prf_rank.search("chemical reactions").head(4)

| | qid | query | docid | query_toks | query_embs | query_weights | score | docno | rank |
|---|---|---|---|---|---|---|---|---|---|
| 673 | 1 | chemical reactions | 7605154 | [##´, vinegar, baking, reactions, substances, ... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | [tensor(1.), tensor(1.), tensor(1.), tensor(1.... | 64.681244 | 7605154 | 0 |
| 3050 | 1 | chemical reactions | 2765749 | [##´, vinegar, baking, reactions, substances, ... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | [tensor(1.), tensor(1.), tensor(1.), tensor(1.... | 60.661751 | 2765749 | 1 |
| 3256 | 1 | chemical reactions | 5453527 | [##´, vinegar, baking, reactions, substances, ... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | [tensor(1.), tensor(1.), tensor(1.), tensor(1.... | 59.867943 | 5453527 | 2 |
| 7359 | 1 | chemical reactions | 1833253 | [##´, vinegar, baking, reactions, substances, ... | [[tensor(0.0681), tensor(-0.0083), tensor(0.11... | [tensor(1.), tensor(1.), tensor(1.), tensor(1.... | 59.789986 | 1833253 | 3 |


In the output, the `query_weights` column is now present, showing the weight of each query embedding. Here the expansion embeddings have a weight of 1.0, the same as the original query embeddings.
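
ColBERT scores a passage by summing, over the query embeddings, the maximum similarity to any passage embedding (MaxSim). The sketch below assumes that `query_weights` simply scales each of those maxima. It uses plain Python lists and a hypothetical helper name, `weighted_maxsim`, and is not the pyterrier_colbert implementation.

```python
def weighted_maxsim(query_embs, query_weights, doc_embs):
    """Weighted MaxSim: each query embedding's best match is scaled by its weight."""
    total = 0.0
    for q, w in zip(query_embs, query_weights):
        # Best dot-product match of this query embedding among the passage embeddings.
        best = max(sum(a * b for a, b in zip(q, d)) for d in doc_embs)
        total += w * best
    return total

# Toy data: two original query embeddings plus one expansion embedding (the last).
query_embs = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
doc_embs = [[1.0, 0.0], [0.6, 0.8]]

equal = weighted_maxsim(query_embs, [1.0, 1.0, 1.0], doc_embs)
halved = weighted_maxsim(query_embs, [1.0, 1.0, 0.5], doc_embs)
```

With all weights at 1.0 the expansion embedding contributes as much as an original query embedding. Lowering its weight reduces only its own contribution to the score.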

The parameters of ColBERT PRF can be varied using the keyword arguments of the `pytcolbert.prf()` method. These are as follows (extracted from the documentation):
 - `rerank` (bool): Whether to rerank the initial documents, or to perform a new set retrieval to gather new documents.
 - `fb_docs` (int): Number of passages to use as feedback. Defaults to 3.
 - `k` (int): Number of clusters to apply on the embeddings of the top K documents. Defaults to 24.
 - `fb_embs` (int): Number of expansion embeddings to add to the query. Defaults to 10.
 - `beta` (float): Weight of the new embeddings compared to the original embeddings. Defaults to 1.0.
 
For example, a pipeline with a different parameter configuration can be instantiated as follows:
```python
prf_pipe2 = pytcolbert.prf(rerank=False, fb_docs=2, k=30, fb_embs=10, beta=0.5)
```
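
To make the roles of `fb_embs` and `beta` more concrete, the following sketch selects expansion embeddings from cluster centroids that are assumed to be already computed. It takes the `k` centroids and an importance score for each as given (for example, an IDF-style score of the token nearest to each centroid). `expand_query` and its arguments are hypothetical names, the clustering step is omitted, and weighting every expansion embedding by `beta` alone is a simplification. The actual weighting in pyterrier_colbert may differ.

```python
def expand_query(query_embs, centroids, importance, fb_embs=10, beta=1.0):
    """Append the fb_embs most important centroids to the query embeddings.

    Returns (embeddings, weights): original embeddings keep weight 1.0 and
    each selected expansion embedding receives weight beta (a simplification).
    """
    # Rank centroid indices by descending importance and keep the top fb_embs.
    ranked = sorted(range(len(centroids)), key=lambda i: importance[i], reverse=True)
    chosen = ranked[:fb_embs]
    embs = list(query_embs) + [centroids[i] for i in chosen]
    weights = [1.0] * len(query_embs) + [beta] * len(chosen)
    return embs, weights

# Toy data: one query embedding and three centroids with made-up importance scores.
query_embs = [[1.0, 0.0]]
centroids = [[0.0, 1.0], [0.6, 0.8], [0.8, 0.6]]
importance = [0.2, 1.5, 0.9]

embs, weights = expand_query(query_embs, centroids, importance, fb_embs=2, beta=0.5)
```

In this toy run the two highest-scoring centroids are appended after the original query embedding, each with weight 0.5.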

## Evaluation on TREC 2019

We now compare standard ColBERT dense retrieval with the ranking and reranking pipelines of ColBERT PRF using the TREC 2019 Deep Learning track topics & qrels. We use the same evaluation measures as the [TREC 2019 Deep Learning track Overview paper](https://arxiv.org/abs/2003.07820).

Mean response times (`mrt`, in milliseconds) are higher than those in the paper because no GPU is used here.

In [9]:
from pyterrier.measures import *
pt.Experiment(
    [
        dense_e2e,
        prf_rank,
        prf_rerank
    ],
    dataset.get_topics('test-2019'),
    dataset.get_qrels('test-2019'),
    eval_metrics=[ AP(rel=2)@1000, nDCG@10, RR(rel=2)@10, "mrt"],
    batch_size=10,
    drop_unused=True,
    names = ["ColBERT E2E","ColBERT-PRF Ranker beta=1","ColBERT-PRF ReRanker beta=1"]
)

| | name | AP(rel=2)@1000 | nDCG@10 | RR(rel=2)@10 | mrt |
|---|---|---|---|---|---|
| 0 | ColBERT E2E | 0.430957 | 0.693407 | 0.852713 | 2348.040845 |
| 1 | ColBERT-PRF Ranker beta=1 | 0.543034 | 0.735153 | 0.885659 | 7382.965954 |
| 2 | ColBERT-PRF ReRanker beta=1 | 0.503836 | 0.736944 | 0.885659 | 5962.463029 |
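
The `rel=2` argument on `AP` and `RR` sets the minimum graded label that counts as relevant for these binary measures. A minimal sketch of that thresholding for reciprocal rank follows, on made-up judgements. The helper `reciprocal_rank` and the docnos are hypothetical, and PyTerrier computes these measures itself.

```python
def reciprocal_rank(ranked_docnos, qrels, rel=2, cutoff=10):
    """Reciprocal rank of the first passage with label >= rel within the cutoff."""
    for rank, docno in enumerate(ranked_docnos[:cutoff], start=1):
        if qrels.get(docno, 0) >= rel:
            return 1.0 / rank
    return 0.0

# Made-up graded judgements for one query; unjudged passages count as 0.
qrels = {"d1": 1, "d2": 3, "d3": 0}
ranking = ["d3", "d1", "d2"]

rr_rel2 = reciprocal_rank(ranking, qrels, rel=2)  # first hit is d2 at rank 3
rr_rel1 = reciprocal_rank(ranking, qrels, rel=1)  # d1 at rank 2 now counts
```

Raising the threshold makes the measure stricter: a passage with label 1 is ignored under `rel=2`, so the first relevant passage is found further down the ranking.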


## Evaluation on TREC 2020 

Further, the same experiment can be performed on the TREC 2020 topics and qrels:

In [10]:
pt.Experiment(
    [
        dense_e2e,
        prf_rank,
        prf_rerank
    ],
    dataset.get_topics('test-2020'),
    dataset.get_qrels('test-2020'),
    eval_metrics=[ AP(rel=2)@1000, nDCG@10, RR(rel=2)@10, "mrt"],
    drop_unused=True,
    batch_size=10,
    names = ["ColBERT E2E","ColBERT-PRF Ranker beta=1","ColBERT-PRF ReRanker beta=1"]
)

| | name | AP(rel=2)@1000 | nDCG@10 | RR(rel=2)@10 | mrt |
|---|---|---|---|---|---|
| 0 | ColBERT E2E | 0.464902 | 0.687093 | 0.85 | 2341.704021 |
| 1 | ColBERT-PRF Ranker beta=1 | 0.496238 | 0.699296 | 0.837654 | 7302.986394 |
| 2 | ColBERT-PRF ReRanker beta=1 | 0.491899 | 0.70062 | 0.837654 | 6025.785316 |
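
To compare the two years on the same footing, the relative change in `AP(rel=2)@1000` over the ColBERT E2E baseline can be computed directly from the values reported in the result tables. This is a small arithmetic sketch; `relative_change` is a hypothetical helper, not part of PyTerrier.

```python
def relative_change(baseline, value):
    """Fractional change of value with respect to baseline."""
    return (value - baseline) / baseline

# AP(rel=2)@1000 values copied from the TREC 2019 and TREC 2020 result tables.
ap = {
    "2019": {"ColBERT E2E": 0.430957, "ColBERT-PRF Ranker beta=1": 0.543034,
             "ColBERT-PRF ReRanker beta=1": 0.503836},
    "2020": {"ColBERT E2E": 0.464902, "ColBERT-PRF Ranker beta=1": 0.496238,
             "ColBERT-PRF ReRanker beta=1": 0.491899},
}

changes = {
    year: {name: relative_change(scores["ColBERT E2E"], value)
           for name, value in scores.items() if name != "ColBERT E2E"}
    for year, scores in ap.items()
}

for year, per_system in changes.items():
    for name, change in per_system.items():
        print(f"{year} {name}: {change:+.1%}")
```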
