# PyTerrier CIKM 2021 Tutorial Notebook - Part 4.1 - DeepImpact

This notebook gives attendees hands-on experience building transformer pipelines in [PyTerrier](https://github.com/terrier-org/pyterrier).

It demonstrates how to use [DeepImpact learned sparse retrieval](https://github.com/DI4IR/SIGIR2021) for end-to-end indexing and retrieval in PyTerrier, as provided by the [*PyTerrier DeepImpact*](https://github.com/terrierteam/pyterrier_deepimpact) plugin.

You will index a collection and run retrieval over it using `pyterrier_deepimpact`.

NB: DeepImpact indexing is time-consuming. For this reason, we are not able to demonstrate DeepImpact on corpora larger than Vaswani (11k abstracts) within the time constraints of the tutorial. Please refer to [this example](https://github.com/terrierteam/pyterrier_deepimpact/blob/main/cord19_example.py) for an experiment conducted using the [CORD19 corpus](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251955/).

## Setup

In the following, we will set up the libraries required to execute the notebook.

In [None]:
# Fixing a temporary visualisation problem on Google Colab
!pip uninstall -y  pandas
!pip install pandas

Next, we install the required packages: PyTerrier and its DeepImpact plugin.

In [None]:
!pip install --upgrade git+https://github.com/terrier-org/pyterrier.git#egg=python-terrier
!pip install --upgrade git+https://github.com/terrierteam/pyterrier_deepimpact.git

## [PyTerrier](https://github.com/terrier-org/pyterrier) initialization

Let's get [PyTerrier](https://github.com/terrier-org/pyterrier) started. This will download the latest version of the [Terrier](http://terrier.org) IR platform.

In [None]:
import pyterrier as pt
pt.init()

Next, we import the custom PyTerrier indexer provided by the DeepImpact plugin.

In [None]:
from pyt_deepimpact import DeepImpactIndexer

## [DeepImpact](https://github.com/DI4IR/SIGIR2021) inverted indexing

We (automatically) download a pre-trained version of the DeepImpact model, and we use it to generate a [Terrier](http://terrier.org) inverted index for the Vaswani collection. Note that the pre-trained DeepImpact model requires no stemming and no stopword removal.

In [None]:
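vaswani = pt.get_dataset("vaswani")

pt_index_path = './terrier_di_vaswani'

# Standard Terrier indexer with an empty term pipeline
# (no stemming, no stopword removal)
parent = pt.index.IterDictIndexer(pt_index_path)
parent.setProperty("termpipelines", "")
# Wrap the Terrier indexer with the DeepImpact indexer, processing 32 documents per batch
indexer = DeepImpactIndexer(parent, batch_size=32)
indexer.index(vaswani.get_corpus_iter())

# Open the resulting index for retrieval
index_ref = pt.IndexRef.of(pt_index_path + "/data.properties")
index_di = pt.IndexFactory.of(index_ref)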
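
To build some intuition for what a learned sparse index contains, the following is a minimal sketch in plain Python. It assumes that each posting carries a learned integer impact in the slot where a term frequency would normally be stored, and that a document's score is the sum of the impacts of the query terms it contains. The toy index, its values, and the `score_query` helper are hypothetical and are not part of `pyterrier_deepimpact`.

```python
# Toy illustration of impact-based scoring (all numbers below are made up).
from collections import defaultdict

# Hypothetical inverted index: term -> {docno: impact stored in the tf slot}
toy_impact_index = {
    "neural": {"d1": 12, "d2": 3},
    "ranking": {"d1": 5, "d3": 9},
    "sparse": {"d2": 7},
}

def score_query(query_terms, index):
    """Sum the stored impacts of every query term that occurs in each document."""
    scores = defaultdict(int)
    for term in query_terms:
        for docno, impact in index.get(term, {}).items():
            scores[docno] += impact
    # Sort by descending score, breaking ties by docno for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

ranking = score_query(["neural", "ranking"], toy_impact_index)
print(ranking)  # [('d1', 17), ('d3', 9), ('d2', 3)]
```

Under this assumption, a weighting model that simply returns the stored frequency is enough to rank documents, since all of the learning has already happened at indexing time.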


## [Terrier](http://terrier.org) inverted indexing

We now build a regular [Terrier](http://terrier.org) inverted index for the Vaswani collection. To be consistent with the DeepImpact index generated previously, we disable stemming and stopword removal.

In [None]:
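pt_index_path = './terrier_vaswani'

# Standard Terrier indexer with an empty term pipeline
# (no stemming, no stopword removal), matching the DeepImpact index
indexer = pt.index.IterDictIndexer(pt_index_path)
indexer.setProperty("termpipelines", "")
# index() returns a reference to the newly built index
index_ref = indexer.index(vaswani.get_corpus_iter())

# Open the resulting index for retrieval
index = pt.IndexFactory.of(index_ref)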

# Retrieval experiments

Now that indexing has completed, we can prepare an experiment. We retrieve ranked documents for the official Vaswani topics with both indices, and compute several effectiveness metrics: mean average precision (MAP), reciprocal rank, and nDCG at rank cutoff 10.

In [None]:
pt.Experiment([
        pt.BatchRetrieve(index, wmodel="BM25"),
        pt.BatchRetrieve(index_di, wmodel="Tf")
    ],
    vaswani.get_topics(), vaswani.get_qrels(),
    names=['bm25', "deep_impact"],
    eval_metrics=["map", "recip_rank", "ndcg_cut_10"]
)
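
To make the reported metrics more concrete, here is a minimal sketch that computes reciprocal rank, average precision, and nDCG@10 for a single toy ranking. It assumes binary relevance and the usual log2 rank discount; the helper functions and toy data are hypothetical, and the evaluation tooling used by `pt.Experiment` may differ in details such as graded gains or tie handling.

```python
import math

def reciprocal_rank(ranked_docnos, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked_docnos, relevant):
    """Mean of precision values at the ranks where relevant documents appear."""
    hits, precision_sum = 0, 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def ndcg_at_k(ranked_docnos, relevant, k=10):
    """Binary-gain nDCG with a log2(rank + 1) discount, cut off at rank k."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, docno in enumerate(ranked_docnos[:k], start=1)
        if docno in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy ranking for one query: relevant documents appear at ranks 2 and 4.
toy_ranking = ["d3", "d1", "d7", "d2"]
toy_relevant = {"d1", "d2"}

rr = reciprocal_rank(toy_ranking, toy_relevant)
ap = average_precision(toy_ranking, toy_relevant)
ndcg = ndcg_at_k(toy_ranking, toy_relevant, k=10)
print(rr, ap, round(ndcg, 4))
```

The `map` and `recip_rank` columns in the experiment table are averages of such per-query values over all topics.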

# That's all folks

Once you have finished all of the Part 4 notebooks, please don't forget to complete our exit quiz: https://forms.office.com/r/RiYSAxAKhk