# PyTerrier Starter: Full-Rank with ChatNoir for Task 2 in Touché 2023

This is the [PyTerrier](https://github.com/terrier-org/pyterrier) baseline for [task 2 on Evidence Retrieval for Causal Questions](https://touche.webis.de/clef23/touche23-web/evidence-retrieval-for-causal-questions.html) in [Touché 2023](https://touche.webis.de/clef23/touche23-web/).

This notebook implements "full-rank" retrieval with ChatNoir so that the results retrieved by ChatNoir can be easily re-ranked.

### Adapt the notebook locally

You can adapt and run this baseline locally with Docker, and you can deploy and run it directly in [TIRA.io](https://www.tira.io/task/touche-2023-task-2).

With Docker installed, you can start this notebook with the command:

```
docker run --rm -ti \
    -p 8888:8888 \
    -v ${PWD}:/workspace \
    webis/tira-touche23-task-2-pyterrier-baseline:0.0.2 \
    jupyter-lab --allow-root --ip 0.0.0.0
```

### Deployment in TIRA

To deploy an approach in TIRA, you upload the image and specify the command to be executed inside it. TIRA provides personalized documentation on how to upload the image. To run this notebook in TIRA, specify the following command:

```
/workspace/run-pyterrier-notebook.py --notebook /workspace/full-rank-pipeline.ipynb --input $inputDataset --output $outputDir
```

You can dry-run this on your machine by executing the command:

```
tira-run \
    --input-directory ${PWD}/sample-input/full-rank \
    --image webis/tira-touche23-task-2-pyterrier-baseline:0.0.2 \
    --command '/workspace/run-pyterrier-notebook.py --notebook /workspace/full-rank-pipeline.ipynb --input $inputDataset --output $outputDir'
```

In the example above, `/workspace/run-pyterrier-notebook.py --notebook /workspace/full-rank-pipeline.ipynb --input $inputDataset --output $outputDir` is the command that you would enter in TIRA, and the `--input-directory` flag points to the inputs.

This creates a run file `tira-output/run.txt` with content like the following (`cat sample-output/run.txt | head -3`):

```
1 NEU clueweb12-1106wb-16-17437 1 1182.2903 chatnoir-baseline
1 NEU clueweb12-0302wb-19-28258 2 1168.2012 chatnoir-baseline
1 NEU clueweb12-1212wb-00-02238 3 1167.9583 chatnoir-baseline
```
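Each line of the run file is whitespace-separated. The sketch below parses the three sample lines into typed records. The field names (`qid`, `stance`, `docno`, `rank`, `score`, `tag`) are an assumed reading of the six columns, based on the notebook storing the stance label in the `Q0` column and passing `chatnoir-baseline` as the run name; `RunEntry` and `parse_run_line` are hypothetical helpers, not part of TIRA or PyTerrier.

```python
from typing import NamedTuple


class RunEntry(NamedTuple):
    """One line of the run file (field names are an assumed interpretation)."""
    qid: str
    stance: str
    docno: str
    rank: int
    score: float
    tag: str


def parse_run_line(line: str) -> RunEntry:
    """Split one whitespace-separated run line into typed fields."""
    qid, stance, docno, rank, score, tag = line.split()
    return RunEntry(qid, stance, docno, int(rank), float(score), tag)


# The three sample lines shown above
sample = """\
1 NEU clueweb12-1106wb-16-17437 1 1182.2903 chatnoir-baseline
1 NEU clueweb12-0302wb-19-28258 2 1168.2012 chatnoir-baseline
1 NEU clueweb12-1212wb-00-02238 3 1167.9583 chatnoir-baseline
"""

entries = [parse_run_line(line) for line in sample.splitlines() if line.strip()]

for entry in entries:
    print(entry.rank, entry.docno, entry.stance, entry.score)
```

Parsing the file this way is a quick sanity check before uploading: a line with a missing or extra column fails at the tuple unpacking instead of silently producing a malformed run.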


### Additional Resources

- The [PyTerrier tutorial](https://github.com/terrier-org/ecir2021tutorial)
- The [PyTerrier documentation](https://pyterrier.readthedocs.io/en/latest/)
- The [TIRA quickstart](https://touche.webis.de/clef23/touche23-web/evidence-retrieval-for-causal-questions.html#tira-quickstart)



### Step 1: Import Dependencies

In [2]:
import json

import pandas as pd
import pyterrier as pt
from tqdm import tqdm
from tira.third_party_integrations import (
    ensure_pyterrier_is_loaded,
    get_input_directory_and_output_directory,
    get_preconfigured_chatnoir_client,
    persist_and_normalize_run,
)

ensure_pyterrier_is_loaded()
input_directory, output_directory = get_input_directory_and_output_directory('/workspace/sample-input/full-rank')

chatnoir = get_preconfigured_chatnoir_client(
    config_directory=input_directory,
    features=[],
    verbose=True,
    num_results=1000,
    page_size=1000,
)

I will use a small hardcoded example located in /workspace/sample-input/full-rank.
The output directory is /tmp/
ChatNoir Client will retrieve the top-1000 with page size of 1000 from index ClueWeb22 with 25 retries.


### Step 2: Load the Data

In [3]:
print('Step 2: Load the data.')

queries = pt.io.read_topics(input_directory + '/queries.xml', format='trecxml')

Step 2: Load the data.


### Step 3: Create Run

In [4]:
print('Step 3: Create Run.')
run = chatnoir(queries)

Step 3: Create Run.


Searching with ChatNoir: 100%|███████████████████████████████████████████████████████████████████████████████████| 3/3 [02:17<00:00, 45.98s/query]


In [5]:
run.head(3)

| | qid | query | docno | score | rank |
|---|---|---|---|---|---|
| 1000 | 2 | does regular exercise lower blood pressure | clueweb22-en0037-54-15622 | 3306.2217 | 0 |
| 1001 | 2 | does regular exercise lower blood pressure | clueweb22-en0010-17-05649 | 3057.9604 | 1 |
| 1002 | 2 | does regular exercise lower blood pressure | clueweb22-en0012-40-06930 | 2979.3528 | 2 |
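Because the run is a plain DataFrame with `qid`, `query`, `docno`, `score`, and `rank` columns, a re-ranker only has to replace the scores and recompute the ranks per query. The following is a minimal sketch of that idea on invented toy data; `rerank` and `toy_rescore` are hypothetical names, and the scorer is a placeholder for a real model, not a recommended method.

```python
import pandas as pd

# Toy stand-in for the `run` DataFrame; document IDs and scores are invented
run = pd.DataFrame({
    "qid": ["1", "1", "1", "2", "2"],
    "query": ["query one"] * 3 + ["query two"] * 2,
    "docno": ["d1", "d2", "d3", "d4", "d5"],
    "score": [3.0, 2.0, 1.0, 5.0, 4.0],
    "rank": [0, 1, 2, 0, 1],
})


def toy_rescore(row: pd.Series) -> float:
    """Placeholder scorer: negates the score, which reverses the order."""
    return -row["score"]


def rerank(run: pd.DataFrame, scorer) -> pd.DataFrame:
    """Re-score every row, then sort and renumber ranks within each query."""
    reranked = run.copy()
    reranked["score"] = reranked.apply(scorer, axis=1)
    reranked = reranked.sort_values(["qid", "score"], ascending=[True, False])
    # Ranks start at 0 here, matching the DataFrame shown above
    reranked["rank"] = reranked.groupby("qid").cumcount()
    return reranked.reset_index(drop=True)


reranked = rerank(run, toy_rescore)
print(reranked[["qid", "docno", "score", "rank"]])
```

A real scorer would typically read `row["query"]` and the document text rather than the previous score; the sort-and-renumber step stays the same.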


### Step 4: Stance Detection

In [6]:
print('Step 4: Run stance detection')

def detect_stance(query_document_pair):
    # As a baseline, always return the neutral stance.
    return 'NEU'

# Each row (one query-document pair) is passed to detect_stance as a pandas Series.
run['Q0'] = run.apply(detect_stance, axis=1)


Step 4: Run stance detection


### Step 5: Persist Run

In [7]:
print('Step 5: Persist Run.')

persist_and_normalize_run(run, 'chatnoir-baseline', output_file=output_directory + '/run.txt')

print('Done...')

Step 5: Persist Run.
Done...


In [8]:
!head -3 {output_directory}/run.txt

1 NEU clueweb22-en0035-94-10429 1 1693.283 chatnoir-baseline
1 NEU clueweb22-en0044-02-00366 2 1539.9128 chatnoir-baseline
1 NEU clueweb22-en0031-00-09280 3 1405.4568 chatnoir-baseline
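The persisted file uses 1-based ranks, while the DataFrame in Step 3 uses 0-based ranks, so some normalization happens before writing. The sketch below shows one plausible version of that step in plain pandas: sort by score within each query, renumber from 1, and format each row as a run line. It is an illustration under that assumption, not the implementation of `persist_and_normalize_run`; `to_run_lines` and the toy data are hypothetical.

```python
import pandas as pd

# Toy run with invented document IDs and scores, stance already in `Q0`
toy_run = pd.DataFrame({
    "qid": ["1", "1", "2"],
    "Q0": ["NEU", "NEU", "NEU"],
    "docno": ["d1", "d2", "d3"],
    "score": [1.25, 3.5, 2.0],
})


def to_run_lines(run: pd.DataFrame, tag: str) -> list:
    """Sort by descending score per query, assign 1-based ranks, format lines."""
    ordered = run.sort_values(["qid", "score"], ascending=[True, False]).copy()
    ordered["rank"] = ordered.groupby("qid").cumcount() + 1
    return [
        f"{row.qid} {row.Q0} {row.docno} {row.rank} {row.score} {tag}"
        for row in ordered.itertuples(index=False)
    ]


lines = to_run_lines(toy_run, "toy-run")
print("\n".join(lines))
```

Keeping the formatting in a small function like this makes it easy to inspect a few lines before handing the DataFrame to the real persistence helper.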
