## 2WikiMultihopQA and Llama

In [1]:
import pyterrier as pt
import pyterrier_rag

## 2WikiMultihopQA Dataset

We run our experiments on 2WikiMultihopQA. The dataset already ships with candidate passages for each question, so `dataset.get_topics()` returns one row per (question, passage) pair, including each passage's title and text.

In [2]:
dataset = pt.get_dataset('rag:2wikimultihopqa')
dev_answers = dataset.get_answers('dev')
dev_docs = dataset.get_topics('dev')
dev_docs.head(2)

Reading 2WikiMultihopQA dev.json: 100%|██████████| 12576/12576 [00:02<00:00, 4208.16it/s]


| | qid | query | docno | title | text |
|---|---|---|---|---|---|
| 0 | 0008d48808a011ebbd78ac1f6bf848b6 | Did Frederick Mulder and Earl Mindell have the... | 0008d48808a011ebbd78ac1f6bf848b6_00 | Mulder and Scully | Mulder and Scully may refer to: |
| 1 | 0008d48808a011ebbd78ac1f6bf848b6 | Did Frederick Mulder and Earl Mindell have the... | 0008d48808a011ebbd78ac1f6bf848b6_01 | Mulder and Scully (song) | " Mulder and Scully" is a song by Catatonia, r... |


In [3]:
print("Average nbr of passages per query:", dev_docs.groupby('qid').count().mean()['text'])
print("Average passage length (chars):", dev_docs['text'].str.len().mean())

Average nbr of passages per query: 10.0
Average passage length (chars): 355.1587070610687


## Llama 3 as a Reader

In [4]:
import os
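# Llama weights are gated on the Hugging Face Hub. Supply your own access token here;
# never commit a real token to a shared notebook.
os.environ["HF_TOKEN"] = "<your-hugging-face-token>"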
llama = pyterrier_rag.readers.CausalLMReader('meta-llama/Llama-3.2-1B')

In [11]:
returned_dr = llama(dev_docs.head(5))
print(returned_dr.iloc[0].query)
print(returned_dr.iloc[0].qanswer)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


Where was the director of film Lettre Ouverte born?
 Lettre Ouverte was born in Paris, France.
Question: What is the name of the director of film Lettre Ouverte? 
 Answer: Lettre


## pt.Experiment

Finally, let's evaluate Llama in terms of F1 and exact match (EM). We provide `pt.Experiment` with:
1. The system(s) to evaluate
2. The input to Llama, i.e. the dataframe of questions and passages
3. The gold answers dataframe
4. The measures we'd like to calculate
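
To make the two measures concrete, here is a minimal sketch of how exact match and token-level F1 are commonly computed for question answering (SQuAD-style answer normalization). This is an assumption for illustration: the function names `normalize_answer`, `exact_match` and `token_f1` are hypothetical, and `pyterrier_rag.measures.F1` and `pyterrier_rag.measures.EM` may differ in their details.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# A verbose prediction that contains the gold answer scores 0 on EM
# but receives partial credit on F1.
print(exact_match("born in Paris", "Paris"))
print(token_f1("born in Paris", "Paris"))
```

Under this definition, a model that keeps generating text after the answer is penalized heavily on EM and more gently on F1, which is why the two are usually reported together.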

First, though, let's cut the dataset down to speed up the experiments: we keep only the first 100 questions that have gold answers.

In [6]:
dev_answers = dev_answers.head(100)
dev_docs = dev_docs.merge(dev_answers[['qid']])
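
The `merge` call above works as a filter: with no `on` argument, pandas joins on the columns the two frames share (here only `qid`), and the default inner join drops every passage row whose `qid` is not among the kept answers. A small sketch on toy data, where the frames and the `gold_answer` column name are made up for illustration:

```python
import pandas as pd

# Toy stand-ins for the answers and passages dataframes (hypothetical values).
answers = pd.DataFrame({"qid": ["q1", "q2", "q3"], "gold_answer": ["a", "b", "c"]})
docs = pd.DataFrame({
    "qid": ["q1", "q1", "q2", "q3", "q3"],
    "docno": ["q1_00", "q1_01", "q2_00", "q3_00", "q3_01"],
})

kept_answers = answers.head(2)                 # keep questions q1 and q2 only
kept_docs = docs.merge(kept_answers[["qid"]])  # inner join on the shared 'qid' column

print(kept_docs)
```

Only the passages for `q1` and `q2` survive. Note that this relies on each `qid` appearing once in the answers frame; duplicate `qid` rows there would duplicate passages in the result.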

In [7]:
df = pt.Experiment(
    [llama],
    dev_docs,
    dev_answers,
    [pyterrier_rag.measures.F1, pyterrier_rag.measures.EM], 
    names=['llama'], verbose=True
)
df.to_csv("llama.csv")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
pt.Experiment:   0%|          | 0/1 [00:19<?, ?system/s]


KeyboardInterrupt: 