# SIGIR 2022 Showcase
- showcase some features of the [`ir_axioms`](https://github.com/webis-de/ir_axioms) framework
- example data:
    - MSMARCO documents and index
    - TREC Deep Learning topics

## Preparation

1. initialize PyTerrier

In [1]:
from pyterrier import started, init

if not started():
    init(tqdm="auto")

PyTerrier 0.8.1 has loaded Terrier 5.6 (built by craigmacdonald on 2021-09-17 13:27)

No etc/terrier.properties, using terrier.default.properties for bootstrap configuration.


2. download MSMARCO and TREC 2019/2020 Deep Learning tracks (using `ir_datasets`)

In [2]:
from pyterrier.datasets import get_dataset

msmarco = get_dataset("irds:msmarco-passage")
msmarco_trec_2019 = get_dataset("irds:msmarco-passage/trec-dl-2019/judged")
msmarco_trec_2020 = get_dataset("irds:msmarco-passage/trec-dl-2020/judged")

3. index MSMARCO passages (skip if index already exists)

In [3]:
from os.path import exists
from pyterrier import IndexRef
from pyterrier.index import IterDictIndexer

if not exists("./msmarco-index/"):
    index = IterDictIndexer("./msmarco-index/").index(msmarco.get_corpus_iter())
else:
    index = IndexRef.of("./msmarco-index/")

### Baseline Rankers
- define initial retrieval with [PyTerrier](https://pyterrier.readthedocs.io/)
    - BM25
    - PL2

In [4]:
from pyterrier.batchretrieve import BatchRetrieve

bm25 = BatchRetrieve(index, wmodel="BM25")
pl2 = BatchRetrieve(index, wmodel="PL2")

## Post-hoc Axiomatic Analyses
- create axiomatic analyses with the `AxiomaticExperiment` class
- parameters:
    - retrieval systems
    - axioms
    - depth (how many documents to consider when building preference matrices)
    - topics and qrels
    - index location
    - document dataset (for full texts)

In [5]:
from ir_axioms.axiom import TFC1, STMC1_fastText, STMC2_fastText, LB1
from ir_axioms.backend.pyterrier.experiment import AxiomaticExperiment

experiment = AxiomaticExperiment(
    [bm25, pl2],
    names=["BM25", "PL2"],
    axioms=[TFC1(), STMC1_fastText(), STMC2_fastText(), LB1()],
    depth=5,
    topics=msmarco_trec_2020.get_topics(),
    qrels=msmarco_trec_2020.get_qrels(),
    index=index,
    dataset=msmarco,
    filter_by_qrels=True,
    verbose=True,
)

### Calculate Pairwise Preferences
- `DataFrame` of all pairwise preferences (up to the specified depth)
- rows: document pairs
- columns: preferences of all axioms + original DF columns
- used in further analyses

In [6]:
experiment.preferences

| | qid | docid_a | docno_a | rank_a | score_a | query | name | label_a | iteration_a | docid_b | ... | rank_b | score_b | label_b | iteration_b | ORIG_preference | ORACLE_preference | TFC1_preference | STMC1-fastText_preference | STMC2-fastText_preference | LB1_preference |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1030303 | 8726436 | 8726436 | 0 | 54.354218 | who is aziz hashim | BM25 | 3 | 0 | 8726436 | ... | 0 | 54.354218 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 1030303 | 8726436 | 8726436 | 0 | 54.354218 | who is aziz hashim | BM25 | 3 | 0 | 8726433 | ... | 1 | 44.220280 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 2 | 1030303 | 8726436 | 8726436 | 0 | 54.354218 | who is aziz hashim | BM25 | 3 | 0 | 8726435 | ... | 2 | 40.514966 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 3 | 1030303 | 8726436 | 8726436 | 0 | 54.354218 | who is aziz hashim | BM25 | 3 | 0 | 8726429 | ... | 3 | 39.687092 | 0 | 0 | 1 | 1 | 0 | -1 | 0 | 0 |
| 4 | 1030303 | 8726436 | 8726436 | 0 | 54.354218 | who is aziz hashim | BM25 | 3 | 0 | 8726437 | ... | 4 | 35.847262 | 3 | 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1318 | 997622 | 2182282 | 2182282 | 4 | 17.011348 | where is the show shameless filmed | PL2 | 0 | 0 | 5430574 | ... | 0 | 18.361460 | 0 | 0 | -1 | 0 | 0 | 1 | 0 | 0 |
| 1319 | 997622 | 2182282 | 2182282 | 4 | 17.011348 | where is the show shameless filmed | PL2 | 0 | 0 | 5430569 | ... | 1 | 18.093962 | 0 | 0 | -1 | 0 | 0 | -1 | -1 | 0 |
| 1320 | 997622 | 2182282 | 2182282 | 4 | 17.011348 | where is the show shameless filmed | PL2 | 0 | 0 | 73465 | ... | 2 | 17.972888 | 3 | 0 | -1 | -1 | 0 | -1 | 0 | -1 |
| 1321 | 997622 | 2182282 | 2182282 | 4 | 17.011348 | where is the show shameless filmed | PL2 | 0 | 0 | 4543865 | ... | 3 | 17.548734 | 3 | 0 | -1 | -1 | 0 | -1 | 0 | -1 |


In [7]:
experiment.preferences[[
    "name", "query", "docno_a", "docno_b", 
    "ORIG_preference", "ORACLE_preference", "STMC1-fastText_preference"
]].head()

| | name | query | docno_a | docno_b | ORIG_preference | ORACLE_preference | STMC1-fastText_preference |
|---|---|---|---|---|---|---|---|
| 0 | BM25 | who is aziz hashim | 8726436 | 8726436 | 0 | 0 | 0 |
| 1 | BM25 | who is aziz hashim | 8726436 | 8726433 | 1 | 1 | 1 |
| 2 | BM25 | who is aziz hashim | 8726436 | 8726435 | 1 | 0 | 1 |
| 3 | BM25 | who is aziz hashim | 8726436 | 8726429 | 1 | 1 | -1 |
| 4 | BM25 | who is aziz hashim | 8726436 | 8726437 | 1 | 0 | 1 |

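The ORIG and ORACLE columns in the output above are consistent with a simple reading: ORIG prefers the document with the better (lower) rank, and ORACLE prefers the document with the higher relevance label. The following is a minimal sketch of that reading in plain Python, not the `ir_axioms` implementation; the helper names `sign`, `orig_preference`, and `oracle_preference` are hypothetical.

```python
def sign(x):
    """Map a number to -1, 0, or 1."""
    return (x > 0) - (x < 0)


def orig_preference(rank_a, rank_b):
    # Assumption: a lower rank means the document was retrieved earlier,
    # so document a is preferred (1) when rank_a < rank_b.
    return sign(rank_b - rank_a)


def oracle_preference(label_a, label_b):
    # Assumption: the document with the higher relevance label is preferred.
    return sign(label_a - label_b)


# Three documents from the BM25 ranking for "who is aziz hashim" (see output above).
ranking = [
    {"docno": "8726436", "rank": 0, "label": 3},
    {"docno": "8726433", "rank": 1, "label": 0},
    {"docno": "8726435", "rank": 2, "label": 3},
]

# One row per ordered document pair (including a document paired with itself).
pairs = [
    {
        "docno_a": a["docno"],
        "docno_b": b["docno"],
        "ORIG_preference": orig_preference(a["rank"], b["rank"]),
        "ORACLE_preference": oracle_preference(a["label"], b["label"]),
    }
    for a in ranking
    for b in ranking
]

for pair in pairs:
    print(pair)
```

Under these assumptions, the pair (8726435, 8726433) gets an ORIG preference of -1 and an ORACLE preference of 1, so the ranking and the judgments disagree on it.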

### Preference Distribution
- distribution of axiom preferences compared to ORIG
- for how many document pairs does each axiom...
    - return a "zero" preference?
    - agree with ORIG?
    - disagree with ORIG?
- identify "useful" axioms
- interesting: axioms that often disagree with ORIG

In [8]:
experiment.preference_distribution

| | axiom | axiom == 0 | axiom == ORIG | axiom != ORIG |
|---|---|---|---|---|
| 0 | ORIG | 0 | 1056 | 0 |
| 1 | ORACLE | 521 | 308 | 227 |
| 2 | TFC1 | 943 | 93 | 20 |
| 3 | STMC1-fastText | 11 | 620 | 425 |
| 4 | STMC2-fastText | 952 | 59 | 45 |
| 5 | LB1 | 819 | 146 | 91 |

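The three counts can be reproduced by hand on a few rows. Below is a minimal sketch, assuming that pairs with an ORIG preference of 0 are skipped (each row of the table above sums to 1056, fewer than the number of rows in `experiment.preferences`, which suggests this but does not prove it). The function `preference_distribution` here is a hypothetical stand-in, not the library's code.

```python
def preference_distribution(rows, axiom_names):
    """Count zero, agreeing, and disagreeing preferences per axiom."""
    result = {}
    for name in axiom_names:
        counts = {"axiom == 0": 0, "axiom == ORIG": 0, "axiom != ORIG": 0}
        for row in rows:
            orig = row["ORIG"]
            if orig == 0:
                # Assumption: pairs without an ORIG preference are ignored.
                continue
            preference = row[name]
            if preference == 0:
                counts["axiom == 0"] += 1
            elif preference == orig:
                counts["axiom == ORIG"] += 1
            else:
                counts["axiom != ORIG"] += 1
        result[name] = counts
    return result


# The first five preference rows shown earlier in this notebook.
rows = [
    {"ORIG": 0, "ORACLE": 0, "STMC1-fastText": 0},
    {"ORIG": 1, "ORACLE": 1, "STMC1-fastText": 1},
    {"ORIG": 1, "ORACLE": 0, "STMC1-fastText": 1},
    {"ORIG": 1, "ORACLE": 1, "STMC1-fastText": -1},
    {"ORIG": 1, "ORACLE": 0, "STMC1-fastText": 1},
]

distribution = preference_distribution(rows, ["ORACLE", "STMC1-fastText"])
for name, counts in distribution.items():
    print(name, counts)
```

On these five rows, STMC1-fastText agrees with ORIG three times and disagrees once, which mirrors the pattern in the full table: it rarely abstains and disagrees often.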

### Inconsistent Pairs
- document pairs where...
    - ORIG contradicts ORACLE
    - at least 1 axiom could correct the mistake
- useful for focussed debugging of retrieval systems

In [9]:
inconsistent_pair = experiment.inconsistent_pairs.head(1)
inconsistent_pair[[
    "name", "query", "docno_a", "docno_b", 
    "ORIG_preference", "ORACLE_preference", "STMC1-fastText_preference"
]]

| | name | query | docno_a | docno_b | ORIG_preference | ORACLE_preference | STMC1-fastText_preference |
|---|---|---|---|---|---|---|---|
| 11 | BM25 | who is aziz hashim | 8726435 | 8726433 | -1 | 1 | 1 |


In [None]:
msmarco.irds_ref().docs_store().get(inconsistent_pair["docno_a"].iloc[0]).text

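The two conditions for an inconsistent pair can be written as a small filter. This is a sketch under a stated assumption: "contradicts" is read strictly here, as ORIG and ORACLE having opposite, non-zero preferences; the library may use a looser definition. The function name `is_inconsistent` is hypothetical.

```python
def is_inconsistent(row, axiom_columns):
    """Check whether ORIG contradicts ORACLE and some axiom sides with ORACLE."""
    orig = row["ORIG_preference"]
    oracle = row["ORACLE_preference"]
    # Assumption: a contradiction means opposite signs (1 vs. -1).
    contradicts = orig * oracle < 0
    # At least one axiom would have corrected the mistake.
    correctable = any(row[column] == oracle for column in axiom_columns)
    return contradicts and correctable


axiom_columns = ["STMC1-fastText_preference"]

# Rows taken from the preference tables shown in this notebook.
rows = [
    {"docno_a": "8726436", "docno_b": "8726433",
     "ORIG_preference": 1, "ORACLE_preference": 1, "STMC1-fastText_preference": 1},
    {"docno_a": "8726436", "docno_b": "8726429",
     "ORIG_preference": 1, "ORACLE_preference": 1, "STMC1-fastText_preference": -1},
    {"docno_a": "8726435", "docno_b": "8726433",
     "ORIG_preference": -1, "ORACLE_preference": 1, "STMC1-fastText_preference": 1},
]

inconsistent = [row for row in rows if is_inconsistent(row, axiom_columns)]
for row in inconsistent:
    print(row["docno_a"], row["docno_b"])
```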
## Axiomatic Re-Ranking
- improve consistency with axioms
- steps:
    1. combine/weight multiple axioms
    1. re-rank with combined axiom

### Majority Vote
- idea: only change order where multiple axioms agree
    - majority vote between specified axioms
    - each axiom votes for its preference
    - fall back to the original ranking order if no preference reaches a majority (>= 50% of the votes)
- combine axioms with Python operators
    - `%`: majority vote (with 50% majority)
    - `|`: fallback, if preference is 0
- combined axiom is itself an `Axiom`

In [12]:
from ir_axioms.axiom import ORIG

majority_vote = (TFC1() % STMC1_fastText() % STMC2_fastText() % LB1()) | ORIG()

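To make the voting rule concrete, here is a minimal sketch on plain integer preferences. It assumes that a side wins when it collects at least 50% of all votes and strictly more votes than the opposite side; the exact threshold and tie handling in `ir_axioms` may differ. The functions `majority_vote_preference` and `fallback` are hypothetical illustrations of `%` and `|`.

```python
def majority_vote_preference(preferences, minimum_votes=0.5):
    """Combine individual axiom preferences (-1, 0, 1) by majority vote."""
    threshold = minimum_votes * len(preferences)
    positive = sum(1 for p in preferences if p > 0)
    negative = sum(1 for p in preferences if p < 0)
    # Assumption: a side needs >= 50% of all votes and must beat the other side.
    if positive >= threshold and positive > negative:
        return 1
    if negative >= threshold and negative > positive:
        return -1
    return 0


def fallback(primary, secondary):
    """Use the secondary preference only if the primary one is 0."""
    return primary if primary != 0 else secondary


# Votes in the order TFC1, STMC1-fastText, STMC2-fastText, LB1.
# Only one of four axioms votes, so the original order (ORIG = 1) is kept.
print(fallback(majority_vote_preference([0, 1, 0, 0]), 1))

# Two of four axioms prefer document b, which reaches the 50% threshold.
print(fallback(majority_vote_preference([0, -1, 0, -1]), -1))

# A 2:2 tie yields no majority, so the ORIG preference decides.
print(fallback(majority_vote_preference([1, 1, -1, -1]), -1))
```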
### KwikSort
- similar to quicksort
    1. select pivot
    1. compare and split rankings before and after the pivot
    1. repeat for both sub-rankings
- results depend on pivot selection
- implemented as PyTerrier `Transformer`
    - easy to use with PyTerrier pipelines (e.g., to re-rank top-10)

In [13]:
from ir_axioms.backend.pyterrier.transformers import KwikSortReranker

kwiksort = KwikSortReranker(
    axiom=majority_vote,
    index=index,
    dataset=msmarco,
    verbose=True
)
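# On PyTerrier transformers, `%` is a rank cutoff (keep the top 10),
# `>>` chains transformers, and `^` appends the remaining results
# of the original ranker after the re-ranked top 10.
bm25_kwiksort = (bm25 % 10 >> kwiksort) ^ bm25
pl2_kwiksort = (pl2 % 10 >> kwiksort) ^ pl2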

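The KwikSort steps listed above can be sketched in a few lines of plain Python. This is an illustration of the general algorithm, not the `KwikSortReranker` code: the function `kwiksort_ranking`, the random pivot choice, and the tie handling (keep the original side of the pivot) are assumptions made for this example.

```python
from random import Random


def kwiksort_ranking(items, preference, rng):
    """Re-order items using only pairwise preferences against a pivot."""
    if len(items) <= 1:
        return list(items)
    # 1. select a pivot (here: uniformly at random)
    pivot_index = rng.randrange(len(items))
    pivot = items[pivot_index]
    before, after = [], []
    # 2. split the remaining items into those before and after the pivot
    for index, item in enumerate(items):
        if index == pivot_index:
            continue
        pref = preference(item, pivot)
        if pref > 0 or (pref == 0 and index < pivot_index):
            before.append(item)  # preferred over the pivot, or tie and originally earlier
        else:
            after.append(item)
    # 3. repeat for both sub-rankings
    return (
        kwiksort_ranking(before, preference, rng)
        + [pivot]
        + kwiksort_ranking(after, preference, rng)
    )


# Hypothetical scores that induce a transitive preference.
scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5, "d4": 0.7}


def score_preference(a, b):
    return (scores[a] > scores[b]) - (scores[a] < scores[b])


reranked = kwiksort_ranking(["d1", "d2", "d3", "d4"], score_preference, Random(0))
print(reranked)  # ['d2', 'd4', 'd3', 'd1']
```

With a transitive preference like this one, every pivot choice yields the same order. Axiom preferences are not guaranteed to be transitive, which is why results can depend on pivot selection.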
### Evaluate Re-Rankers
- evaluate with PyTerrier `Experiment`
- use re-ranked pipelines like any other PyTerrier pipeline

In [14]:
from pyterrier.pipelines import Experiment

experiment = Experiment(
    [bm25, bm25_kwiksort, pl2, pl2_kwiksort],
    names=["BM25", "BM25 + KwikSort", "PL2", "PL2 + KwikSort"],
    topics=msmarco_trec_2020.get_topics(),
    qrels=msmarco_trec_2020.get_qrels(),
    eval_metrics=["ndcg_cut_10", "map"],
    verbose=True,
)
experiment

| | name | ndcg_cut_10 | map |
|---|---|---|---|
| 0 | BM25 | 0.493627 | 0.358724 |
| 1 | BM25 + KwikSort | 0.492106 | 0.353486 |
| 2 | PL2 | 0.483847 | 0.345541 |
| 3 | PL2 + KwikSort | 0.48488 | 0.344907 |


#### Results
- only slight changes in effectiveness (all differences in nDCG@10 and MAP are below 0.01)
- but improved consistency with axioms
- more explainable