In [1]:
import dspy

In [2]:
from dspy.datasets import HotPotQA

# Load the dataset.
dataset = HotPotQA(train_seed=1, train_size=20, eval_seed=2023, dev_size=50, test_size=0)

# Tell DSPy that the 'question' field is the input. Any other fields are labels and/or metadata.
trainset = [x.with_inputs('question') for x in dataset.train]
devset = [x.with_inputs('question') for x in dataset.dev]

len(trainset), len(devset)

  table = cls._concat_blocks(blocks, axis=0)


(20, 50)

In [3]:
dataset.train[0].items()

[('question',
  'At My Window was released by which American singer-songwriter?'),
 ('answer', 'John Townes Van Zandt')]

# Use Ollama
- Use the `Ollamalocal` in the dspy to interact with ollama model.

In [4]:
ollama_model = dspy.OllamaLocal(
    model='phi3',
    model_type='text',
    max_tokens=350,
    temperature=0.7,
    top_p=0.9,
    frequency_penalty=1.17,
    top_k=40,
    timeout_s=180
)

In [5]:
ollama_model("Tell me about the weather on pluto?")

[" Pluto, being a dwarf planet located in our solar system's Kuiper Belt, experiences extremely harsh and cold conditions. Its average surface temperature is estimated to be around -375 to -400 degrees Fahrenheit (-225 to -240 degrees Celsius). \n\nThe weather on Pluto consists mostly of thin atmospheric layers made up mainly of nitrogen, with trace amounts of methane and carbon monoxide. The atmosphere is so tenuous that it only exerts a pressure equivalent to the difference between mountain air at sea level and top-floor altitude in tall buildings on Earth – about 1/10 millionth atmospheric pressure (approximately 2e-8 bar).\n\nThe weather patterns are also influenced by Pluto's elliptical orbit around the Sun, which causes seasonal changes. However, due to its great distance from the Sun and extremely low temperatures, these seasons have little effect on surface conditions compared to what we experience here on Earth. The atmosphere expands when Pluto is closest to the sun (in perih

# Configure LLM 
- In order to use the ollama model, set the DsPy settings.

In [6]:
dspy.settings.configure(lm=ollama_model)

- The `Signature` is more like a `Task` that you want to be performed.
- The docstring is like `system` prompt.
- For the example below (QA) even the input and the output fields are defined.

In [7]:
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words.")

In [8]:
# Define predictor. 
# The predictor is "informed" about the task to perform
generate_answer = dspy.Predict(BasicQA)

In [9]:
dev_example = devset[18]
pred = generate_answer(question=dev_example.question)
print(dev_example)
print(pred)

Example({'question': 'What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?', 'answer': 'English', 'gold_titles': {'Robert Irvine', 'Restaurant: Impossible'}}) (input_keys={'question'})
Prediction(
    answer='Answer: American-born Frenchman'
)


In [10]:
# You can inspect the ollama history to see the exact prompt
ollama_model.inspect_history(1)




Answer questions with short factoid answers.

---

Follow the following format.

Question: ${question}
Answer: often between 1 and 5 words.

---

Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?
Answer:[32m Answer: American-born Frenchman[0m





'\n\n\nAnswer questions with short factoid answers.\n\n---\n\nFollow the following format.\n\nQuestion: ${question}\nAnswer: often between 1 and 5 words.\n\n---\n\nQuestion: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?\nAnswer:\x1b[32m Answer: American-born Frenchman\x1b[0m\n\n\n'

# COT
- Lets check how this looks as a `chain of thought` task.
- Note DsPy changes the prompt template (at least thats how I see it)

In [11]:
# The predictor is changing, the signature is not
generate_answer_with_cot = dspy.ChainOfThought(BasicQA)

# Lets run it on the same input
pred = generate_answer_with_cot(question=dev_example.question)

# Print the prediction
print(pred)

Prediction(
    rationale='Question: What is the nationality of the chef and restaurateur featured in Restaurant: Impossible?\nReasoning: To find this information, we can recall details from episodes or general knowledge about reality TV shows. The show "Restaurant: Impossible" focuses on struggling restaurant owners who are helped by celebrity chefs to improve their businesses.',
    answer='American'
)


# Using the Retrieval Logic

In [12]:
# A retrieval machine is needed
colbertv2_wiki17_abstracts = dspy.ColBERTv2(url='http://20.102.90.50:2017/wiki17_abstracts')
dspy.settings.configure(lm=ollama_model, rm=colbertv2_wiki17_abstracts)



In [13]:
# retrieve the top 2 passages/contexts
retrieve = dspy.Retrieve(k=3)
topk_passages = retrieve(dev_example.question).passages

print(f"Top {retrieve.k} passages for the question : {dev_example.question} \n", '-'*30, '\n')

for idx, passage in enumerate(topk_passages):
    print(f"{idx+1}]", passage, '\n')

Top 3 passages for the question : What is the nationality of the chef and restaurateur featured in Restaurant: Impossible? 
 ------------------------------ 

1] Restaurant: Impossible | Restaurant: Impossible is an American reality television series, featuring chef and restaurateur Robert Irvine, that aired on Food Network from 2011 to 2016. 

2] Jean Joho | Jean Joho is a French-American chef and restaurateur. He is chef/proprietor of Everest in Chicago (founded in 1986), Paris Club Bistro & Bar and Studio Paris in Chicago, The Eiffel Tower Restaurant in Las Vegas, and Brasserie JO in Boston. 

3] List of Restaurant: Impossible episodes | This is the list of the episodes for the American cooking and reality television series "Restaurant Impossible", produced by Food Network. The premise of the series is that within two days and on a budget of $10,000, celebrity chef Robert Irvine renovates a failing American restaurant with the goal of helping to restore it to profitability and promin

In [14]:
retrieve("When was the first FIFA World Cup held?").passages[0]

'History of the FIFA World Cup | The FIFA World Cup was first held in 1930, when FIFA president Jules Rimet decided to stage an international football tournament. The inaugural edition, held in 1930, was contested as a final tournament of only thirteen teams invited by the organization. Since then, the World Cup has experienced successive expansions and format remodeling to its current 32-team final tournament preceded by a two-year qualifying process, involving over 200 teams from around the world.'

# Program 1

- A complete program
- A RAG pipeline.
- Given a question, search and retrieve the top 3 passages in wikipedia and then use them as context for LLM
- Generate an answer from the LLM.

In [15]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words.")

# now the actual program. `Module`
- It needs 2 things.
    1. The `__init__` method that declares the sub-module it needs. In this case the `dspy.Retrieve` and `dspy.ChainOfThought`.
    2. The `forward` method will describe the control flow of answering the question using the modules we have.

In [16]:
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question):
        context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)



# Compiling the RAG
- A training set of 20 QAs
- A metric of validations, to check if the answer is correct. We can also check of the retrieved context is correct.
- A teleprompter that can optimize the programs.

To me this is more like having a `few shot examples`, and way to validate the responses from the LLM or the retrieval engine, and a teleprompter that optimzes your prompt for the task.

The author of DsPy are using a different language, but underneath the goal is just creating and optimizing a RAG prompt (basically prompt-engineering)

In [17]:
import dspy.evaluate
from dspy.teleprompt import BootstrapFewShot

# Validation logic: Check that the predict answer is correct
# Also check that the retrieved ccontext does actually contain the answer.

def validate_context_and_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM, answer_PM

# Set up a basic teleprompter, which will compile out RAG pipeline
teleprompter = BootstrapFewShot(metric=validate_context_and_answer, max_bootstrapped_demos=10)

# Compile
compile_rag = teleprompter.compile(RAG(), trainset=trainset)
    

 50%|█████     | 10/20 [00:15<00:15,  1.53s/it]


In [18]:
# Now it compiled. Lets test it

In [19]:
test_question = "What castle did David Gregory inherit?"

pred = compile_rag(test_question)

In [20]:
print(f"Question: {test_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contets (truncated): {[c[:200]+'...' for c in pred.context]}")

Question: What castle did David Gregory inherit?
Predicted Answer: There seems to be some confusion; however, the castle inherited by David Gregory (physician) mentioned herein is Kinnairdy Castle based on external knowledge beyond these passages.
Retrieved Contets (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'Gregory Tarchaneiotes | Gregory Tarchaneiotes (Greek: Γρηγόριος Ταρχανειώτης , Italian: "Gregorio Tracanioto" or "Tracamoto" ) was a "protospatharius" and the long-reigning catepan of Italy from 998 t...', 'David Gregory (mathematician) | David Gregory (originally spelt Gregorie) FRS (? 1659 – 10 October 1708) was a Scottish mathematician and astronomer. He was professor of mathematics at the University ...']


In [21]:
# To peek at the learned objects
for name, parameter in compile_rag.named_predictors():
    print(name)
    print(parameter.demos[0])
    print()


generate_answer
Example({'augmented': True, 'context': ['At My Window (album) | At My Window is an album released by Folk/country singer-songwriter Townes Van Zandt in 1987. This was Van Zandt\'s first studio album in the nine years that followed 1978\'s "Flyin\' Shoes", and his only studio album recorded in the 1980s. Although the songwriter had become less prolific, this release showed that the quality of his material remained high.', 'Little Window | Little Window is the debut album of American singer-songwriter Baby Dee. The album was released in 2002 on the Durtro label. It was produced, composed, and performed entirely by Dee.', 'Windows and Walls | Windows and Walls is the eighth album by American singer-songwriter Dan Fogelberg, released in 1984 (see 1984 in music). The first single, "The Language of Love", reached 13 on the U.S. "Billboard" Hot 100 chart. Although the follow-up, "Believe in Me", missed the Top 40 of the pop chart, peaking at No. 48, it became the singer\'s fou

In [22]:
# Let's test on the dev set.

# Evaluate the answers
- Using exact match

In [23]:
from dspy.evaluate.evaluate import Evaluate

# Define the evaluation
evaluate_on_hotpotqa = Evaluate(devset=devset, 
                                num_threads=1, 
                                display_progress=True,
                                display_table=5)

# Evaluate the `compiled_rag` program with the `answer_exact_match` metric.
metric = dspy.evaluate.answer_exact_match
evaluate_on_hotpotqa(compile_rag, metric=metric)

Average Metric: 2 / 50  (4.0): 100%|██████████| 50/50 [02:38<00:00,  3.18s/it] 


Unnamed: 0,question,example_answer,gold_titles,context,pred_answer,answer_exact_match
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...","Winston Churchill served as Prime Minister of the United Kingdom during World War II, playing a crucial role in leading Britain through one of its...",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}",['2017–18 Pittsburgh Penguins season | The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was...,National Hockey League (NHL) Expansion Draft conducted during the 2017–18 season,False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...","The Wings entered a new era, following the retirement of Canadian retired professional ice hockey player and current general manager of the Tampa Bay Lightning...",False
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Castle', 'Crichton Collegiate Church'}","[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...","The river near the Crichton Collegiate Church is not directly mentioned in the provided context, but based on its proximity to Crichton Castle which lies...",False
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","[""Æthelweard of East Anglia | Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties...",King Alfred the Great,✔️ [True]


4.0

- One thing I notice, is that the last pred should be `True`. 
- I think the exact_match is looking for exact string match. So, even though all the answers above are `True`, some are labeled as False.

# Evaluating the Retrieval
- The dev set includes the gold titles that should be retrieved. So we can use them for evaluation

In [24]:
def gold_passages_retrieved(example, pred, trace=None):
    gold_titles = set(map(dspy.evaluate.normalize_text, example['gold_titles']))
    found_titles = set(map(dspy.evaluate.normalize_text, [c.split(' | ')[0] for c in pred.context]))

    return gold_titles.issubset(found_titles)

compiled_rag_retrieval_score = evaluate_on_hotpotqa(compile_rag, metric=gold_passages_retrieved)

Average Metric: 13 / 50  (26.0): 100%|██████████| 50/50 [02:52<00:00,  3.45s/it]


Unnamed: 0,question,example_answer,gold_titles,context,pred_answer,gold_passages_retrieved
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...","No, only Cangzhou is located in the Hebei province of China. Qionghai does not appear to be part of this province based on the provided...",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}",['2017–18 Pittsburgh Penguins season | The 2017–18 Pittsburgh Penguins season will be the 51st season for the National Hockey League ice hockey team that was...,"The National Hockey League (NHL) conducted the draft in which Marc-Andre Fleury was selected by the Vegas Golden Knights for their inaugural 2017-18 season, known...",✔️ [True]
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...","The Wings entered a new era, following the retirement of Canadian retired professional ice hockey player Steve Yzerman. However, it's important to clarify that as...",✔️ [True]
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Castle', 'Crichton Collegiate Church'}","[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...","The river near Crichton Collegiate Church is not directly mentioned in either of the provided contexts [1] or [2]. However, using historical knowledge and geographical...",✔️ [True]
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","[""Æthelweard of East Anglia | Æthelweard (died 854) was a 9th-century king of East Anglia, the long-lived Anglo-Saxon kingdom which today includes the English counties...","In the 10th Century A.D., Ealhswith had a son called Æthelweard by King Alfred the Great. However, it should be noted that while this information...",False


# Program 2: Multi-Hop Search
- Useful to answering complex question such as where was the singer of the song XYZ born? 
- To answer the question above, first the name of the singer would have to be identified, and then where he is from. These 2 pieces for information may not be in the same place.
- Approach is to retrieve results, and then generate additional queries to gather additional information if necessary.


- Still use the `GenerateAnswer` Signature. We also need a Signature for the `hop` behaviour.

In [25]:
class GenerateSearchQuery(dspy.Signature):
    """Write a simple search query that will help answer a complex question."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    query = dspy.OutputField()

# Note could have used GenerateAnswer.signature.context above to avoid duplication.    

In [26]:
# Implementation for simple Baleen (name is author's name for the approach )
from dsp.utils import deduplicate

class SimplifiedBaleen(dspy.Module):
    def __init__(self, passages_per_hop=3, max_hops=2):
        super().__init__()

        self.generate_query = [dspy.ChainOfThought(GenerateSearchQuery) for _ in range(max_hops)]
        self.retrieve = dspy.Retrieve(k=passages_per_hop)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)
        self.max_hops = max_hops

    def forward(self, question):
        context = []
        for hop in range(self.max_hops):
            query = self.generate_query[hop](context=context, question=question).query
            passages = self.retrieve(query).passages
            context = deduplicate(context + passages)

        pred = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=pred.answer)

# Zero-Shot
- Without compiling. Means we will be more dependent on the capabilties of model, and how good it is out of the box.


In [27]:
test_question = "How many storeys are in the castle that David Gregory inherited?"
# Get the prediction. This contains `pred.context` and `pred.answer`.
uncompiled_baleen = SimplifiedBaleen()

pred = uncompiled_baleen(test_question)

# Print the contexts and the answer.
print(f"Question: {test_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts (truncated): {[c[:200] + '...' for c in pred.context]}")


Question: How many storeys are in the castle that David Gregory inherited?
Predicted Answer: Five storeys.
Retrieved Contexts (truncated): ['David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinn...', 'David Gregory (footballer, born 1970) | Born in Polstead, Gregory began his career at Ipswich Town, making 32 appearances between 1987–1995. He made two appearances on loan at Hereford United and thre...', 'Karl D. Gregory Cooperative House | Karl D. Gregory Cooperative House is a member of the Inter-Cooperative Council at the University of Michigan. The structure that stands at 1617 Washtenaw was origin...', 'Kinnairdy Castle | Kinnairdy Castle is a tower house, having five storeys and a garret, two miles south of Aberchirder, Aberdeenshire, Scotland. The alternative name is Old Kinnairdy....', 'Kinfauns Castle | Kinfauns Castle was designed b

In [28]:
ollama_model.inspect_history(3)




Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context:
[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory's use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»
[2] «David Gregory (footballer, born 1970) | Born in Polstead, Gregory beg

'\n\n\nWrite a simple search query that will help answer a complex question.\n\n---\n\nFollow the following format.\n\nContext: may contain relevant facts\n\nQuestion: ${question}\n\nReasoning: Let\'s think step by step in order to ${produce the query}. We ...\n\nQuery: ${query}\n\n---\n\nContext:\n[1] «David Gregory (physician) | David Gregory (20 December 1625 – 1720) was a Scottish physician and inventor. His surname is sometimes spelt as Gregorie, the original Scottish spelling. He inherited Kinnairdy Castle in 1664. Three of his twenty-nine children became mathematics professors. He is credited with inventing a military cannon that Isaac Newton described as "being destructive to the human species". Copies and details of the model no longer exist. Gregory\'s use of a barometer to predict farming-related weather conditions led him to be accused of witchcraft by Presbyterian ministers from Aberdeen, although he was never convicted.»\n[2] «David Gregory (footballer, born 1970) | Born 

# Compiling the Baleen program
Now is the time to compile our multi-hop (SimplifiedBaleen) program.

We will first define our validation logic, which will simply require that:

- The predicted answer matches the gold answer.
- The retrieved context contains the gold answer.
- None of the generated queries is rambling (i.e., none exceeds 100 characters in length).
- None of the generated queries is roughly repeated (i.e., none is within 0.8 or higher F1 score of earlier queries).

In [29]:
def validate_context_and_answer_and_hops(example, pred, trace=None):
    if not dspy.evaluate.answer_exact_match(example, pred): return False
    if not dspy.evaluate.answer_passage_match(example, pred): return False

    hops = [example.question] + [outputs.query for *_, outputs in trace if 'query' in outputs]

    if max([len(h) for h in hops]) > 100: return False
    if any(dspy.evaluate.answer_exact_match_str(hops[idx], hops[:idx], frac=0.8) for idx in range(2, len(hops))): return False

    return True

In [30]:
teleprompter = BootstrapFewShot(metric=validate_context_and_answer_and_hops)
compiled_baleen = teleprompter.compile(SimplifiedBaleen(), teacher=SimplifiedBaleen(passages_per_hop=2), trainset=trainset)


100%|██████████| 20/20 [02:13<00:00,  6.69s/it]


# Evaluating the Retrieval
Earlier, it appeared like our simple RAG program was not very effective at finding all evidence required for answering each question. Is this resolved by the adding some extra steps in the forward function of SimplifiedBaleen? What about compiling, does it help for that?

The answer for these questions is not always going to be obvious. However, DSPy makes it extremely easy to try many diverse approaches with minimal effort.

Let's evaluate the quality of retrieval of our compiled and uncompiled Baleen pipelines!

In [31]:
uncompiled_baleen_retrieval_score = evaluate_on_hotpotqa(uncompiled_baleen, metric=gold_passages_retrieved)


Average Metric: 20 / 50  (40.0): 100%|██████████| 50/50 [06:15<00:00,  7.51s/it]


Unnamed: 0,question,example_answer,gold_titles,context,pred_answer,gold_passages_retrieved
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...",No,False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}",['NHL Central Scouting Bureau | The NHL Central Scouting Services (CSS) is a department within the National Hockey League that ranks prospects for the NHL...,"It was not specifically the National Hockey League (NHL) that conducted Marc-André Fleury's selection to join the Vegas Golden Knights in the 2017-18 season, but...",False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","['Steve Yzerman | Stephen Gregory ""Steve"" Yzerman ( ; born May 9, 1965) is a Canadian retired professional ice hockey player and current general manager...",Steve Yzerman,False
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Castle', 'Crichton Collegiate Church'}","[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...",River Tyne,✔️ [True]
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}",['Mother of Kings | Mother of Kings is a historical novel by Poul Anderson. It was first published in 2001 by Tor Books. The book...,Edward,False


In [32]:
compiled_baleen_retrieval_score = evaluate_on_hotpotqa(compiled_baleen, metric=gold_passages_retrieved)


Average Metric: 11 / 50  (22.0): 100%|██████████| 50/50 [08:16<00:00,  9.92s/it]


Unnamed: 0,question,example_answer,gold_titles,context,pred_answer,gold_passages_retrieved
0,Are both Cangzhou and Qionghai in the Hebei province of China?,no,"{'Cangzhou', 'Qionghai'}","['Cangzhou | Cangzhou () is a prefecture-level city in eastern Hebei province, People\'s Republic of China. At the 2010 census, Cangzhou\'s built-up (""or metro"") area...","No, only Cangzhou is located within the Hebei province of China. Qionghai County is not mentioned as being part of this region; it belongs to...",False
1,Who conducts the draft in which Marc-Andre Fleury was drafted to the Vegas Golden Knights for the 2017-18 season?,National Hockey League,"{'2017–18 Pittsburgh Penguins season', '2017 NHL Expansion Draft'}","[""Marc-André Fleury | Marc-André Fleury (born November 28, 1984) is a French-Canadian professional ice hockey goaltender playing for the Vegas Golden Knights of the National...","The NHL (National Hockey League) conducts the Entry Draft where teams select eligible players, including Marc-Andre Fleury for their 2017-18 season roster. However, it's important...",False
2,"The Wings entered a new era, following the retirement of which Canadian retired professional ice hockey player and current general manager of the Tampa Bay...",Steve Yzerman,"{'2006–07 Detroit Red Wings season', 'Steve Yzerman'}","['Jim Schoenfeld | James Grant Schoenfeld (born September 4, 1952) is a Canadian professional ice hockey executive and former player. He is currently the assistant...",There is no information provided about any Canadian retired professional ice hockey player who became a General Manager for the Tampa Bay Lightning after retiring...,False
3,What river is near the Crichton Collegiate Church?,the River Tyne,"{'Crichton Castle', 'Crichton Collegiate Church'}","[""Crichton Collegiate Church | Crichton Collegiate Church is situated about 0.6 mi south west of the hamlet of Crichton in Midlothian, Scotland. Crichton itself is...","There is no information provided about ""Crichton Collegiate Church"" being near any river, as it does not appear in the given texts related to Anatoly...",✔️ [True]
4,In the 10th Century A.D. Ealhswith had a son called Æthelweard by which English king?,King Alfred the Great,"{'Æthelweard (son of Alfred)', 'Ealhswith'}","['Æthelweard (son of Alfred) | Æthelweard (d. 920 or 922) was the younger son of King Alfred the Great and Ealhswith.', 'Eadred Ætheling | Eadred...",There seems to be some confusion with the information provided as none of the context directly answers the question about Æthelweard and King Edmund I...,False


In [33]:
print(f"## Retrieval Score for RAG: {compiled_rag_retrieval_score}")  # note that for RAG, compilation has no effect on the retrieval step
print(f"## Retrieval Score for uncompiled Baleen: {uncompiled_baleen_retrieval_score}")
print(f"## Retrieval Score for compiled Baleen: {compiled_baleen_retrieval_score}")

## Retrieval Score for RAG: 26.0
## Retrieval Score for uncompiled Baleen: 40.0
## Retrieval Score for compiled Baleen: 22.0


In [34]:
compiled_baleen("How many storeys are in the castle that David Gregory inherited?")
ollama_model.inspect_history(n=3)




Write a simple search query that will help answer a complex question.

---

Follow the following format.

Context: may contain relevant facts

Question: ${question}

Reasoning: Let's think step by step in order to ${produce the query}. We ...

Query: ${query}

---

Context:
[1] «1972 FA Charity Shield | The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.»
[2] «1968 FA Charity Shield | The 1968 FA Charity Shield was a football match played on 3 August 1968 between Football League champions Manchester City and FA Cup winners West Bromwich Albion. It was the 46th Charity Shield match and was played at City's home ground, Maine Road. Manchester City won 6–1.»

Question: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield

Reasoning: Let's think step by step in order to Context: The context provided includes information about different years when football clubs participated in the FA Charity Shield. One of them mention

'\n\n\nWrite a simple search query that will help answer a complex question.\n\n---\n\nFollow the following format.\n\nContext: may contain relevant facts\n\nQuestion: ${question}\n\nReasoning: Let\'s think step by step in order to ${produce the query}. We ...\n\nQuery: ${query}\n\n---\n\nContext:\n[1] «1972 FA Charity Shield | The 1972 FA Charity Shield was contested between Manchester City and Aston Villa.»\n[2] «1968 FA Charity Shield | The 1968 FA Charity Shield was a football match played on 3 August 1968 between Football League champions Manchester City and FA Cup winners West Bromwich Albion. It was the 46th Charity Shield match and was played at City\'s home ground, Maine Road. Manchester City won 6–1.»\n\nQuestion: In what year was the club founded that played Manchester City in the 1972 FA Charity Shield\n\nReasoning: Let\'s think step by step in order to Context: The context provided includes information about different years when football clubs participated in the FA Charit

- I notice that there is not much difference between (maybe actually degradation after compilation)
- However, this could be because I am using `phi3`, which is a small model.