# Homework and bakeoff: Few-shot OpenQA with DSP

In [1]:
__author__ = "Christopher Potts and Omar Khattab"
__version__ = "CS224u, Stanford, Spring 2023"

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cgpotts/cs224u/blob/master/hw_openqa.ipynb)
[![Open in SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/cgpotts/cs224u/blob/master/hw_openqa.ipynb)

If Colab is opened with this badge, please **save a copy to drive** (from the File menu) before running the notebook.

## Overview

The goal of this homework is to explore retrieval-augmented in-context learning. This is an exciting area that brings together a number of recent task ideas and modeling innovations. We will use the [DSP programming library](https://github.com/stanfordnlp/dsp) to build systems in this new mode.

Our core task is __open-domain question answering (OpenQA)__. In this task, all that is given by the dataset is a question text, and the task is to answer that question. By contrast, in modern QA tasks, the dataset provides a text and a gold passage, usually with a firm guarantee that the answer will be a substring of the passage. 

OpenQA is substantially harder than standard QA. The usual strategy is to use a _retriever_ to find passages in a large collection of texts and train a _reader_ to find answers in those passages. This means we have no guarantee that the retrieved passage will contain the answer we need. If we don't retrieve a passage containing the answer, our reader has no hope of succeeding. Although this is challenging, it is much more realistic and widely applicable than standard QA. After all, with the right retriever, an OpenQA system could be deployed over the entire Web.

The task posed by this homework is harder even than OpenQA. We are calling this task __few-shot OpenQA__. The defining feature of this task is that the reader is simply a frozen, general purpose language model. It accepts string inputs (prompts) and produces text in response. It is not trained to answer questions per se, and nothing about its structure ensures that it will respond with a substring of the prompt corresponding to anything like an answer.

__Few-shot QA__ (but not OpenQA!) is explored in the famous GPT-3 paper ([Brown et al. 2020](https://arxiv.org/abs/2005.14165)). The authors are able to get traction on the problem using GPT-3, an incredible finding. Our task here – __few-shot OpenQA__ – pushes this even further by retrieving passages to use in the prompt rather than assuming that the gold passage can be used in the prompt. If we can make this work, then it should be a major step towards flexibly and easily deploying QA technologies in new domains.

In summary:

| Task             | Passage given | Task-specific reader training |Task-specific retriever training  | 
|-----------------:|:-------------:|:-----------------------------:|:--------------------------------:|
| QA               | yes           | yes                           | n/a                              |
| OpenQA           | no            | yes                           | maybe                            |
| Few-shot QA      | yes           | no                            | n/a                              |
| Few-shot OpenQA  | no            | no                            | maybe                            | 

Just to repeat: your mission is to explore the final line in this table. The core notebook and assignment don't address the issue of training the retriever in a task-specific way, but this is something you could pursue for a final project; [the ColBERT codebase](https://github.com/stanford-futuredata/ColBERT) makes easy.

As usual, this notebook sets up the task and provides starter code. We will be relying on the DSP library, which allows us to define retrieval-augmented in-context learning systems in code. We first provide two fully implemented examples:

* _Few-shot OpenQA_: The given input is a question and the goal is to provide an answer. Some _demonstration_ Q/A pairs are sampled from a train set (in our case, SQuAD).

* _Few-shot QA with context_: The given input is a question with an associated evidence passage, and the goal is to provide an answer. The _demonstrations_ are now Q/A pairs with associated gold evidence passages. These are sampled from a train set (in our case, SQuAD).

The above examples are followed by some assignment questions aimed at helping you to think creatively about the problem. The first of these defines a core system for our target task:

* _Few-shot OpenQA with context_: This is like _few-shot QA with context_ except the passages are now retrieved from a large search index using ColBERT. 

The second question illustrates how to use the powerful DSP `annotate` function to improve the set of demonstrations used by the system.

It is a requirement of the bake-off that a general-purpose language model be used. In particular, trained QA systems cannot be used at all, and no fine-tuning is allowed either. See the original system question at the bottom of this message for guidance on which models are allowed.

Note: the models we are working with here are _big_. This poses a challenge that is increasingly common in NLP: you have to pay one way or another. You can pay to use the GPT-3 API, or you can pay to use an Eleuther model on a heavy-duty cluster computer, or you can pay with time by using an Eleuther model on a more modest computer.  __For now, though, the Cohere models are free to use, so they should be your first choice; see [setup.ipynb](setup.ipynb) if you don't have an account__.

## Set-up

We have sought to make this notebook self-contained and easy to use on a personal computer, on Google Colab, and in Sagemaker Studio. For personal computer use, we assume you have already done everything in [setup.ipynb](setup.ipynb]). For cloud usage, the next few code blocks should handle all set-up steps.

In [2]:
try: 
    # This library is our indicator that the required installs
    # need to be done.
    import datasets
    root_path = '.'
except ModuleNotFoundError:
    !git clone https://github.com/cgpotts/cs224u/
    !pip install -r cs224u/requirements.txt
    root_path = 'dsp'

In [3]:
import cohere
from datasets import load_dataset
import openai
import os
import dsp

In [4]:
os.environ["AZURE_OPENAI_KEY"]='68f239e80dbf48a0a2a69004f6af0d57'
os.environ["AZURE_OPENAI_ENDPOINT"]='https://xsum.openai.azure.com/'
os.environ["DSP_NOTEBOOK_CACHEDIR"] = os.path.join(root_path, 'cache')

openai_key = os.getenv('AZURE_OPENAI_KEY')  # or replace with your API key (optional)
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") # your endpoint should look like the following https://YOUR_RESOURCE_NAME.openai.azure.com/
openai.api_type = 'azure'
openai.api_version = '2023-03-15-preview' # this may change in the future
openai.engine='gpt4'
cohere_key = os.getenv('COHERE_API_KEY')  # or replace with your API key (optional)

colbert_server = 'http://index.contextual.ai:8893/api/search'

Here we establish the Language Model `lm` and Retriever Model `rm` that we will be using. The defaults for `lm` are just for development. You may want to develop using an inexpensive model and then do your final evalautions wih an expensive one.

In [5]:
lm = dsp.GPT3(model='gpt-35', api_key=openai_key, engine='gpt35')

# Options for Cohere: command-medium-nightly, command-xlarge-nightly
#lm = dsp.Cohere(model='command-xlarge-nightly', api_key=cohere_key)

rm = dsp.ColBERTv2(url=colbert_server)

dsp.settings.configure(lm=lm, rm=rm)

Here's a command you can run to see which OpenAI models are available; OpenAI has entered into an increasingly closed mode where many older models are not available, so there are likely to be some surprises lurking here:

In [6]:
[d["root"] for d in openai.Model.list()["data"]]

[]

## SQuAD

Our core development dataset is [SQuAD](https://rajpurkar.github.io/SQuAD-explorer/). We chose this dataset because it is well-known and widely used, and it is large enough to support lots of meaningful development work, without, though, being so large as to require lots of compute power. It is also useful that it has gold passages supporting the standard QA formulation, so we can see how well our LM performs with an "oracle" retriever that always retrieves the gold passage.

In [7]:
squad = load_dataset("squad")

The following utility just reads a SQuAD split in as a list of `SquadExample` instances:

In [8]:
def get_squad_split(squad, split="validation"):
    """
    Use `split='train'` for the train split.

    Returns
    -------
    list of SquadExample named tuples with attributes
    id, title, context, question, answers

    """
    data = zip(*[squad[split][field] for field in squad[split].features])
    return [dsp.Example(id=eid, title=title, context=context, question=q, answer=a['text']) 
            for eid, title, context, q, a in data]

### SQuAD train

To build few-shot prompts, we will often sample SQuAD train examples, so we load that split here:

In [9]:
squad_train = get_squad_split(squad, split="train")
squad_train

[{'id': '5733be284776f41900661182',
  'title': 'University_of_Notre_Dame',
  'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
  'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
  'answer': ['Saint Bernadette Soubirous']},
 {'id': '5733be284776f4190066117f',
  'title': 'University_of_Notre_Dame',
  '

### SQuAD dev

In [10]:
squad_dev = get_squad_split(squad)
squad_dev

[{'id': '56be4db0acb8001400a502ec',
  'title': 'Super_Bowl_50',
  'context': 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.',
  'question': 'Which NFL team represented the AFC at Super Bowl 50?',
  'answer': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos']},
 {'i

### SQuAD dev sample

Evaluations are expensive in this new era! Here's a small sample to use for dev assessments:

In [11]:
dev_exs = sorted(squad_dev, key=lambda x: hash(x.id))[: 200]

## Evaluation

Our evaluation protocols are the standard ones for SQuAD and related tasks: exact match of the answer (EM) and token-level F1. We'll reply primarily on DSP for these evaluation utilities; the following is a light modification of `dsp.evaluation.utils.evaluateAnswer`, which is itself built evaluation code from [apple/ml-qrecc](https://github.com/apple/ml-qrecc/blob/main/utils/evaluate_qa.py) repository. It performs very basic string normalization before doing the core comparisons.

In [12]:
from dsp.utils import EM, F1
import tqdm
import pandas as pd

def evaluateAnswer(fn, dev):
    """Evaluate a DSP program on `dev`.

    Parameters
    ----------
    fn : DSP system
    def : list of `dsp.Example` instances

    Returns
    -------
    dict with keys "df", "em", "f1" storung assessment data
    """
    data = []
    for example in tqdm.tqdm(dev):
        prediction = fn(example)
        d = dict(example)
        pred = prediction.answer
        d['prediction'] = pred
        d['em'] = EM(pred, example.answer)
        d['f1'] = F1(pred, example.answer)
        data.append(d)
    df = pd.DataFrame(data)
    em = round(100.0 * df['em'].sum() / len(dev), 1)
    df['em'] = df['em'].apply(lambda x: '✔️' if x else '❌')
    f1 = df['f1'].mean()
    return {'df': df, 'em': em, 'f1': f1}

## DSP basics

### LM usage

Here's the most basic way to use the LM:

In [13]:
lm("Which U.S. states border no U.S. states?")

['\n\nHawaii and Alaska are the only two U.S. states that border no other U.S. states.']

Keyword arguments to the underlying LM are passed through:

In [14]:
lm("Which U.S. states border no U.S. states?", temperature=0.9, n=5)

['\n\nHawaii and Alaska.',
 '\n\nHawaii and Alaska are the only two U.S. states that border no other U.S. states.',
 '\n\nHawaii and Alaska are the only two U.S. states that do not border other U.S. states.',
 '\n\nAlaska and Hawaii are the only two U.S. states that border no other U.S. states.',
 '\n\nThe following U.S. states border no other U.S. states: Hawaii, Alaska, and Connecticut.']

With `lm.inspect_history`, we can see the most recent language model calls:

In [15]:
lm.inspect_history(n=1)





Which U.S. states border no U.S. states?[32m

Hawaii and Alaska.[0m[31m 	 (and 4 other completions)[0m





### Prompt templates

In DSP, the more usual way to call the LM is to define a prompt template. Here we define a generic QA prompt template:

In [16]:
Question = dsp.Type(
    prefix="Question:", 
    desc="${the question to be answered}")

Answer = dsp.Type(
    prefix="Answer:", 
    desc="${a short factoid answer, often between 1 and 5 words}", 
    format=dsp.format_answers)

qa_template = dsp.Template(
    instructions="Answer questions with short factoid answers.", 
    question=Question(), 
    answer=Answer())

And here is a self-contained example that uses our question and template to create a prompt:

In [17]:
states_ex = dsp.Example(
    question="Which U.S. states border no U.S. states?",
    demos=dsp.sample(squad_train, k=3))

print(qa_template(states_ex))

Answer questions with short factoid answers.

---

Follow the following format.

Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Question: Immunoassays are able to detect what type of proteins?
Answer: generated by an infected organism in response to a foreign agent

Question: Why did country borders not affect differences in style within Gothic architecture?
Answer: proximity of some regions

Question: Which U.S. states border no U.S. states?
Answer:


### Prompt-based generation

We can how put the above pieces together to call the model with our constructed prompt:

In [18]:
states_ex, states_compl = dsp.generate(qa_template)(states_ex, stage='basics')

In [19]:
print(states_ex)
print(states_compl.answer)

{'question': 'Which U.S. states border no U.S. states?', 'demos': [{'id': '56bf6b0f3aeaaa14008c9604', 'title': 'Beyoncé', 'context': 'Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".', 'question': 'What album made her a worldwide known artist?', 'answer': ['Dangerously in Love']}, {'id': '57342720d058e614000b6a29', 'title': 'Infection', 'context': "

And here's precisely what the model saw and did:

In [20]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Question: Immunoassays are able to detect what type of proteins?
Answer: generated by an infected organism in response to a foreign agent

Question: Why did country borders not affect differences in style within Gothic architecture?
Answer: proximity of some regions

Question: Which U.S. states border no U.S. states?
Answer:[32m Hawaii, Alaska[0m





### Retrieval

The final major component of our systems is retrieval. When we defined `rm`, we connected to a remote ColBERT index and retriever system that we can now use for search.

In [21]:
states_ex.question

'Which U.S. states border no U.S. states?'

The basic `dsp.retrieve` method returns only passages:

In [22]:
passages = dsp.retrieve(states_ex.question, k=2)

In [23]:
passages

['Mexico–United States border | has the shortest. Among the states in Mexico, Chihuahua has the longest border with the United States, while Nuevo León has the shortest. Texas borders four Mexican states—Tamaulipas, Nuevo León, Coahuila, and Chihuahua—the most of any U.S. states. New Mexico and Arizona each borders two Mexican states (Chihuahua and Sonora; Sonora and Baja California, respectively). California borders only Baja California. Three Mexican states border two U.S. states each: Baja California borders California and Arizona; Sonora borders Arizona and New Mexico; and Chihuahua borders New Mexico and Texas. Tamaulipas, Nuevo León, and Coahuila each borders only one U.S. state: Texas. The',
 'Mexico–United States border | fertile lands along the rivers in both countries. The Rio Grande frequently meanders along the Texas–Mexico border. As a result, the United States and Mexico have a treaty by which the Rio Grande is maintained as the border, with new cut-offs and islands being

If we need passages with scores and other metadata, we can call `rm` directly:

In [24]:
rm(states_ex.question, k=2)

[{'pid': 6140356,
  'prob': 0.614211429888272,
  'rank': 1,
  'score': 22.40652084350586,
  'text': 'Mexico–United States border | has the shortest. Among the states in Mexico, Chihuahua has the longest border with the United States, while Nuevo León has the shortest. Texas borders four Mexican states—Tamaulipas, Nuevo León, Coahuila, and Chihuahua—the most of any U.S. states. New Mexico and Arizona each borders two Mexican states (Chihuahua and Sonora; Sonora and Baja California, respectively). California borders only Baja California. Three Mexican states border two U.S. states each: Baja California borders California and Arizona; Sonora borders Arizona and New Mexico; and Chihuahua borders New Mexico and Texas. Tamaulipas, Nuevo León, and Coahuila each borders only one U.S. state: Texas. The',
  'long_text': 'Mexico–United States border | has the shortest. Among the states in Mexico, Chihuahua has the longest border with the United States, while Nuevo León has the shortest. Texas bor

## Few-shot OpenQA

With the above pieces in place, we can define our first DSP system. This one does few-shot OpenQA with no context passages. In essense, our prompts contain

1. A sequences of Q/A demonstrations (no context passages).
2. The target question (no context passage).

Here is the full system; note the use of the decorator `@dsp.transformation` – this will ensure that no `example` instances are modified when the program is used.

In [25]:
@dsp.transformation
def few_shot_openqa(example, train=squad_train, k=2): 
    example.demos = dsp.sample(train, k=k)
    example, completions = dsp.generate(qa_template)(example, stage='qa')
    return completions

There are really just two steps here. Let's go through them individually. Our example:

In [26]:
ex = squad_dev[0].copy()

ex

{'id': '56be4db0acb8001400a502ec',
 'title': 'Super_Bowl_50',
 'context': 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.',
 'question': 'Which NFL team represented the AFC at Super Bowl 50?',
 'answer': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos']}

We add some demonstrations:

In [27]:
ex.demos = dsp.sample(squad_train, k=2)

ex

{'id': '56be4db0acb8001400a502ec',
 'title': 'Super_Bowl_50',
 'context': 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.',
 'question': 'Which NFL team represented the AFC at Super Bowl 50?',
 'answer': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos'],
 'demos': 

And then we call the LM using `qa_template`:

In [28]:
ex, ex_compl = dsp.generate(qa_template)(ex, stage='qa')

Here, `ex_compl` is a `Completions` instance. We will typically use only the `answer` attribute:

In [29]:
print(ex_compl.answer)

Denver Broncos


And, as a final check, we can see precisely what the LM saw:

In [30]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Question: ${the question to be answered}
Answer: ${a short factoid answer, often between 1 and 5 words}

---

Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Question: Immunoassays are able to detect what type of proteins?
Answer: generated by an infected organism in response to a foreign agent

Question: Which NFL team represented the AFC at Super Bowl 50?
Answer:[32m Denver Broncos[0m





## Few-shot QA with context

The above system makes no use of evidence passages. As a first step toward bringing in such passages, we define a regular few-shot QA system. For this system, prompts contain:

1. A sequences of Q/A demonstrations, each with a gold context passage.
2. The target question with a gold context passage.

This kind of system is very demanding in terms of data, since we need to have gold evidence passages for every Q/A pair used for demonstations and the Q that is our target. Datasets like SQuAD support this, but it's a rare situation in the world. (Our next system will address this by dropping the need for gold passages).

### Template with context

The first step toward defining this system is a new prompt template that includes context:

In [31]:
Context = dsp.Type(
    prefix="Context:\n",
    desc="${sources that may contain relevant content}",
    format=dsp.passages2text)

qa_template_with_passages = dsp.Template(
    instructions=qa_template.instructions,
    context=Context(), 
    question=Question(), 
    answer=Answer())

Here's what this does for a SQUaD example:

In [66]:
print(qa_template_with_passages(ex))

Answer questions with short factoid answers.

---

Context:
Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".
Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Context:
Complex serological techniques have been developed into what are known as Immunoassays. Immunoassays can use the basic antibody – antigen binding as th

### The system

And here is the full system; the code is identical to `few_shot_openqa` except we now use `qa_template_with_passages`:

In [33]:
@dsp.transformation
def few_shot_qa_with_context(example, train=squad_train, k=3):
    example.demos = dsp.sample(train, k=k)
    generator = dsp.generate(qa_template_with_passages)
    example, completions = generator(example, stage='qa')
    return completions

In [34]:
print(few_shot_qa_with_context(squad_dev[0]).answer)

Denver Broncos


In [35]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Context:
Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".
Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Context:
Complex serological techniques have been developed into what are known as Immunoassays. Immunoassays can use the basic antibody – antigen binding a

## Dev evaluations

This quick section shows some full evaluations using `evaluateAnswer` (see [Evaluation](#Evaluation) above). Depending on which model you're using, these evaluations could be expensive, so you might want to run them only sparingly. Here I am running them on just 25 dev examples to further avoid cost run-ups.

In [36]:
tiny_dev = dev_exs[: 25]

In [38]:
# few_shot_openqa_results = evaluateAnswer(few_shot_openqa, tiny_dev)

# print(few_shot_openqa_results['em'])
# print(few_shot_openqa_results['f1'])

You can also see the full set of results:

In [39]:
# few_shot_openqa_results['df'].head()

In [94]:
# few_shot_qa_results = evaluateAnswer(few_shot_qa_with_context, tiny_dev)

# print(few_shot_qa_results['em'])
# print(few_shot_qa_results['f1'])

  4%|▍         | 1/25 [00:00<00:13,  1.82it/s]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

  8%|▊         | 2/25 [00:10<02:22,  6.21s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 12%|█▏        | 3/25 [00:22<03:09,  8.63s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 16%|█▌        | 4/25 [00:35<03:39, 10.47s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 20%|██        | 5/25 [00:52<04:12, 12.65s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 24%|██▍       | 6/25 [01:11<04:45, 15.00s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 28%|██▊       | 7/25 [01:22<04:04, 13.60s/it]

Backing off 0.0 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 32%|███▏      | 8/25 [01:36<03:57, 13.94s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 36%|███▌      | 9/25 [02:00<04:33, 17.08s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 40%|████      | 10/25 [02:27<04:57, 19.86s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 44%|████▍     | 11/25 [02:41<04:15, 18.24s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 48%|████▊     | 12/25 [02:51<03:25, 15.82s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 52%|█████▏    | 13/25 [03:07<03:09, 15.75s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 56%|█████▌    | 14/25 [03:45<04:06, 22.37s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 60%|██████    | 15/25 [03:57<03:14, 19.44s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 64%|██████▍   | 16/25 [04:15<02:49, 18.79s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 68%|██████▊   | 17/25 [04:38<02:42, 20.25s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 72%|███████▏  | 18/25 [05:01<02:27, 21.14s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 76%|███████▌  | 19/25 [05:24<02:09, 21.66s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 80%|████████  | 20/25 [05:42<01:41, 20.39s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 84%|████████▍ | 21/25 [05:54<01:11, 17.88s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 88%|████████▊ | 22/25 [06:16<00:57, 19.21s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 92%|█████████▏| 23/25 [06:27<00:33, 16.71s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

 96%|█████████▌| 24/25 [06:38<00:14, 14.87s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x7f2a74b4e3a0> with args (<dsp.modules.gpt3.GPT3 object at 0x7f2a74ac9550>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously 

100%|██████████| 25/25 [06:48<00:00, 16.33s/it]

72.0
0.7701960784313725





In [41]:
# few_shot_qa_results['df'].head()

## Question 1: Few-shot OpenQA with context [3 points]

Your task here is to define a first instance of our target system: Few-shot OpenQA with context passages. To do this, you simply complete `few_shot_openqa_with_context`:

In [42]:
squad_train[0]

{'id': '5733be284776f41900661182',
 'title': 'University_of_Notre_Dame',
 'context': 'Architecturally, the school has a Catholic character. Atop the Main Building\'s gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend "Venite Ad Me Omnes". Next to the Main Building is the Basilica of the Sacred Heart. Immediately behind the basilica is the Grotto, a Marian place of prayer and reflection. It is a replica of the grotto at Lourdes, France where the Virgin Mary reputedly appeared to Saint Bernadette Soubirous in 1858. At the end of the main drive (and in a direct line that connects through 3 statues and the Gold Dome), is a simple, modern stone statue of Mary.',
 'question': 'To whom did the Virgin Mary allegedly appear in 1858 in Lourdes France?',
 'answer': ['Saint Bernadette Soubirous']}

In [86]:



@dsp.transformation
def few_shot_openqa_with_context(example, train=squad_train, k=3):
    
    # Sample `k` demonstrations from `train`:
    ##### YOUR CODE HERE
    k_demos = dsp.sample(train, k)

    # For each demonstration, retrieve one passage and add it
    # as the `context` attribute` so we can use our template
    # `qa_template_with_passages`:
    ##### YOUR CODE HERE
    for demo in k_demos:
        demo.context = dsp.retrieve(demo.question, k=1)
    

    # Add the list of demonstrations to `example` as the `demos` attribute:
    ##### YOUR CODE HERE
    example.demos = k_demos


    # Retrieve a context passage for `example` itself and add it
    # as the `context` attribute:
    ##### YOUR CODE HERE
    example.context = dsp.retrieve(example.question, k=1)


    # Use `dsp.generate` to call the model on `example` using
    # `qa_template_with_passages`:
    ##### YOUR CODE HERE
    generator = dsp.generate(qa_template_with_passages)



    # Return the Completions instance returned by `dsp.generate`:
    ##### YOUR CODE HERE
    example, completions = generator(example, stage='qa')
    return completions



A quick test you can use:

In [124]:
def test_few_shot_openqa_with_context(func):
    ex = dsp.Example(question="Q0", context="C0", answer=["A0"])
    train = [
        dsp.Example(question="Q1", context=None, answer=["A1"]),
        dsp.Example(question="Q2", context=None, answer=["A2"]),
        dsp.Example(question="Q3", context=None, answer=["A3"])]
    compl = func(ex, train=train, k=2)
    errcount = 0
    # Check the LM was used as expected:
    if len(compl.data) != 1:
        errcount += 1
        print(f"Error for `{func.__name__}`: Unexpected LM output.")
    data = compl.data[0]
    # Check that the right number of demos was used:
    demos = data['demos']
    if len(demos) > 2:
        errcount += 1
        print(f"Error for `{func.__name__}`: "
              f"Unexpected demo count: {len(demos)}")
    # Check that context passages were included in the prompt:
    fields = compl.template.fields
    if not any(f.name == 'Context:' for f in fields):
        errcount += 1
        print(f"Error for `{func.__name__}`: "
              f"No context passages in the prompt.")
    # Check that the context passages were retrieved:
    if data['context'] == "C0":
        errcount += 1
        print(f"Error for `{func.__name__}`: "
              f"No context passage retrieved for the target.")
    for d in demos:
        if d['context'] is None:
            errcount += 1
            print(f"Error for `{func.__name__}`: "
                  f"No context passage retrieved for demo {d}.")
    if errcount == 0:
        print(f"No errors found for `{func.__name__}`")

In [45]:
test_few_shot_openqa_with_context(few_shot_openqa_with_context)

No errors found for `few_shot_openqa_with_context`


In [82]:
print(few_shot_openqa_with_context(dev_exs[0]).answer)

deep sequencing


In [83]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Context:
[1] «Janet (album) | worldwide sales of over 14 million copies, it is Janet's best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»
[2] «Global Recording Artist of the Year | her third studio album, "25". It became the best-selling album of the year worldwide, selling 17.4 million copies. The album's lead single, "Hello" reached number one in nearly all countries where it charted, an

Here's an optional evaluation of the system using `tiny_dev`:

In [80]:
few_shot_openqa_with_context_results = evaluateAnswer(
    few_shot_openqa_with_context, tiny_dev)

few_shot_openqa_with_context_results['f1']

  8%|▊         | 2/25 [00:00<00:06,  3.56it/s]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 12%|█▏        | 3/25 [00:15<02:17,  6.24s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 16%|█▌        | 4/25 [00:28<03:10,  9.05s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 20%|██        | 5/25 [00:40<03:19,  9.96s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 24%|██▍       | 6/25 [00:51<03:16, 10.33s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 28%|██▊       | 7/25 [01:02<03:08, 10.45s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 32%|███▏      | 8/25 [01:17<03:24, 12.00s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 36%|███▌      | 9/25 [01:33<03:32, 13.31s/it]

Backing off 0.5 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 40%|████      | 10/25 [01:46<03:18, 13.21s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 44%|████▍     | 11/25 [02:01<03:09, 13.50s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 48%|████▊     | 12/25 [02:17<03:07, 14.40s/it]

Backing off 0.1 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 52%|█████▏    | 13/25 [02:32<02:53, 14.47s/it]

Backing off 0.2 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 56%|█████▌    | 14/25 [02:43<02:27, 13.45s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 60%|██████    | 15/25 [02:55<02:09, 12.96s/it]

Backing off 0.4 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 64%|██████▍   | 16/25 [03:15<02:17, 15.26s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 68%|██████▊   | 17/25 [03:33<02:09, 16.14s/it]

Backing off 1.0 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 72%|███████▏  | 18/25 [03:50<01:53, 16.25s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 76%|███████▌  | 19/25 [04:11<01:46, 17.81s/it]

Backing off 0.9 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 80%|████████  | 20/25 [04:25<01:22, 16.49s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 84%|████████▍ | 21/25 [04:49<01:15, 18.79s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 88%|████████▊ | 22/25 [05:01<00:50, 16.81s/it]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 92%|█████████▏| 23/25 [05:22<00:36, 18.03s/it]

Backing off 0.3 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

 96%|█████████▌| 24/25 [05:36<00:16, 16.79s/it]

Backing off 0.7 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nContext:\n«Janet (album) | worldwide sales of over 14 million copies, it is Janet\'s best selling album. Although Jackson has reached superstar status in the United States, she has yet to achieve the same level of response internationally. According to Nacy Berry, vice chairman of Virgin Records, "Janet" marked the first time the label "had centrally coordinated and strategized a campaign on a worldwide basis" which ultimately brought her to a plateau of global recognition. Her historic multimillion-dollar contract made her the highest-paid artist in history, until brother Michael renegotiated his contract with Sony Music Entertainment only days later. Sonia Murry noted that she»\nQuestion: What album made her a worldwide known artist?\nAnswer: Dangerously in Love\n\nContext:

100%|██████████| 25/25 [05:54<00:00, 14.17s/it]


0.5872727272727273

In [133]:
few_shot_openqa_with_context_results['df'].head()

Unnamed: 0,id,title,context,question,answer,prediction,em,f1
0,571c4132dd7acb1400e4c0b3,Oxygen,"In the meantime, on August 1, 1774, an experim...",Why is Priestley usually given credit for bein...,"[published his findings first, he published hi...",He published his findings first.,✔️,1.0
1,5733f9fa4776f4190066161f,French_and_Indian_War,"Colonel Monckton, in the sole British success ...",Who captured Fort Beausejour?,"[Colonel Monckton, Colonel Monckton, Colonel M...",Jonathan Eddy,❌,0.0
2,56d9c551dc89441400fdb7d2,Super_Bowl_50,"In late November 2015, reports surfaced statin...",Which single did Beyoncé and Coldplay collabor...,"[Hymn for the Weekend, Hymn for the Weekend, M...","""Up&Up""",❌,0.0
3,5705e26d75f01819005e76d5,Southern_California,"Southern California, often abbreviated SoCal, ...","Despite being traditionall described as ""eight...","[10 counties, 10, 10]",45,❌,0.0
4,56dfb5777aa994140058e025,Nikola_Tesla,After leaving Edison's company Tesla partnered...,What was produced at tesla's company?,"[dynamo electric machine commutators, electric...","TVs, radio receivers, transistors, integrated ...",❌,0.0


## Question 2: Using annotate

This question is designed to give you some experience with DSP's powerful `annotate` method. You can think of this as a generic tool for defining general aspects of your prompt. Here we will use it to filter the set of demonstrations we use.

The overall idea here is that the demonstrations we sample might vary in quality in ways that could impact model performance. For example, if we want to try to push the model to provide extractive answers as in classical QA – answers that are substrings of the evidence passage – then it works against our interests to include demonstrations where the model is unabel to do this.

We will do this in two parts to facilitate testing.

### Task 1: Filtering demonstrations 1 [2 points]

This is the heart of the question: complete `filter_demos` so that, given a demonstration `d` and a list of demonstrations `demos`, it keeps `d` if and only if

1. The passage retrieved for `d` contrains `d.answer`, and
2. The model's generation for `d` based on `qa_template_with_passages` contains `d.answer`.

In [239]:
@dsp.transformation
def filter_demos(d):

    # Retrieve a passage for `d.question` and make sure that it
    # contains `d.answer`. Use `dsp.passage_match` for this!
    # return None if there is no match.
    ##### YOUR CODE HERE
    
    # For gpt-3.5-4k, too many passages and demos are retrieved would cause query fail to run. You shouldn't retrieve or sample too much otherwise use gpt-3.5-16k

    passages = dsp.retrieve(d.question, k=1)

    for passage in passages:
        if dsp.passage_match(passage, d.answer):
            d.context.append(passage)

    # Sample `k=3` demonstrations to help the model assess this
    # potential demonstration:
    ##### YOUR CODE HERE
    demos = []
    for i in range(3):
        demo = dsp.sample(squad_train, k=1)[0]
        if demo.context and dsp.passage_match(demo.context, demo.answer):
            demos.append(demo)
    
    d.demos = demos
    


    # Generate an answer based on `qa_template_with_passages`
    # and use `dsp.answer_match` to check that the predicted answer
    # contains `d.answer`. If it does not, return None.
    ##### YOUR CODE HERE
    generator = dsp.generate(qa_template_with_passages)
    d, completions = generator(d, stage='qa')
    if dsp.answer_match(completions.data[0]['answer'], d.answer) is False:
        return None


    # Return d, if you got this far:
    ##### YOUR CODE HERE
    return d



Here's a test; this is not an ideal unit test because we don't know which LM you will be using, but it should clarify our intentions and help you with debugging.

In [232]:
def test_filter_demos(func):
    # This example should be filtered at the retrieval step, since
    # 👽 is not in the index:
    ex1 = dsp.Example(
        question="Who is 👽?", context="C0", answer=["👽"])
    result1 = func(ex1)
    errcount = 0
    if result1 is not None:
        errcount += 1
        print(f"Error for `{func.__name__}`: Expected {None}, got {result1}")
    # This example should not be filtered given our tester LM:
    ex2 = dsp.Example(
        question="Who is Beyoncé?", context="Beyoncé", answer=["Beyoncé"])
    # This example should be filtered given our tester LM:
    ex3 = dsp.Example(
        question="Who is Beyoncé?", context="C0", answer=["NO MATCH"])
    class TestLM:
        def __init__(self, **kwargs):
            self.kwargs = kwargs
            self.history = []

        def __call__(self, prompt, **kwargs):
            answer = ["Beyoncé"]
            return answer
    dsp.settings.configure(lm=TestLM(), rm=rm)
    try:
        result2 = func(ex2)
        if result2 is None:
            errcount += 1
            print(f"Error for `{func.__name__}`: "
                  f"Expected example not to be filtered by `answer_match`.")
        result3 = func(ex3)
        if result3 is not None:
            errcount += 1
            print(f"Error for `{func.__name__}`: "
                  f"Expected example to be filtered by `answer_match`.")
    except:
        raise
    finally:
        # Restore the actual model:
        dsp.settings.configure(lm=lm, rm=rm)
    if errcount == 0:
        print(f"No errors detected for `{func.__name__}`")

In [233]:
test_filter_demos(filter_demos)
lm.inspect_history(n=1)

No errors detected for `filter_demos`




Answer questions with short factoid answers.

---

Follow the following format.

Context:
${sources that may contain relevant content}

Question: ${the question to be answered}

Answer: ${a short factoid answer, often between 1 and 5 words}

---

Context:
C0

Question: Who is 👽?

Answer:[32m Alien.[0m





### Task 2: Full filtering program [1 point]

The task is to complete `few_shot_openqa_with_context_and_demo_filtering` as a few-shot OpenQA system like the one from Question 1, but using the filtering mechanism defined by `filter_demos`.

In [220]:
@dsp.transformation
def few_shot_openqa_with_context_and_demo_filtering(example, train=squad_train, k=3):

    # Sample 20 demonstrations:
    ##### YOUR CODE HERE
    demos = dsp.sample(train, 20)

    # Filter the demonstrations using `annotate` and `filter_demos`.
    # The user's `k` should be used to specify the maximum number of
    # demonstrations kept at this stage.
    ##### YOUR CODE HERE
    function = dsp.annotate(filter_demos)
    result = function(demos, k)

    # Add the list of filtered demonstrations as a the `demos`
    # attribute of `example`:
    ##### YOUR CODE HERE
    example.demos = result


    # Retrieve a context passage for `example.question` and add it
    # as the `context` attribute for the example:
    ##### YOUR CODE HERE
    example.context = dsp.retrieve(example.question, k=3)


    # Generate a prediction using `qa_template_with_passages` as
    # we did before:
    ##### YOUR CODE HERE
    generator = dsp.generate(qa_template_with_passages)
    example, completions = generator(example, stage='qa')


    # Return the generated `Completions` instance:
    ##### YOUR CODE HERE
    return completions



Our previous test should suffice to help with debugging this program:

In [240]:
test_few_shot_openqa_with_context(
    few_shot_openqa_with_context_and_demo_filtering)

['A1']
['A1']


TypeError: normalize() argument 2 must be str, not list

Quiick example:

In [222]:
print(few_shot_openqa_with_context_and_demo_filtering(dev_exs[1],k=1).answer)

Robert Underwood Johnson


In [170]:
dev_exs[1]

{'id': '56e11f05e3433e1400422c2f',
 'title': 'Nikola_Tesla',
 'context': 'Tesla was asocial and prone to seclude himself with his work. However, when he did engage in a social life, many people spoke very positively and admiringly of Tesla. Robert Underwood Johnson described him as attaining a "distinguished sweetness, sincerity, modesty, refinement, generosity, and force." His loyal secretary, Dorothy Skerrit, wrote: "his genial smile and nobility of bearing always denoted the gentlemanly characteristics that were so ingrained in his soul." Tesla\'s friend, Julian Hawthorne, wrote, "seldom did one meet a scientist or engineer who was also a poet, a philosopher, an appreciator of fine music, a linguist, and a connoisseur of food and drink.":80',
 'question': 'Who said Tesla had a "distinguished sweetness"?',
 'answer': ['Robert Underwood Johnson',
  'Robert Underwood Johnson',
  'Robert Underwood Johnson']}

In [223]:
lm.inspect_history(n=1)





Answer questions with short factoid answers.

---

Follow the following format.

Context:
${sources that may contain relevant content}

Question: ${the question to be answered}

Answer: ${a short factoid answer, often between 1 and 5 words}

---

Context:
Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".

Question: What album made her a worldwide kn

Here is code for an optional initial evaluation with `tiny_dev`:

In [224]:
filtering_results = evaluateAnswer(
    few_shot_openqa_with_context_and_demo_filtering, tiny_dev)

filtering_results['f1']

  0%|          | 0/25 [00:00<?, ?it/s]

Backing off 0.6 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nFollow the following format.\n\nContext:\n${sources that may contain relevant content}\n\nQuestion: ${the question to be answered}\n\nAnswer: ${a short factoid answer, often between 1 and 5 words}\n\n---\n\nContext:\nFrom the 10th to the 13th century, Romanesque architecture had become a pan-European style and manner of construction, affecting buildings in countries as far apart as Ireland, Croatia, Sweden and Sicily. The same wide geographic area was then affected by the development of Gothic architecture, but the acceptance of the Gothic style and methods of construction differed from place to place, as did the expressions of Gothic taste. The proximity of some regions meant that modern country borders do not define divisions of style. On the other hand, some regions such a

  4%|▍         | 1/25 [00:47<18:57, 47.39s/it]

Backing off 0.8 seconds after 1 tries calling function <function GPT3.request at 0x7fea388d4dc0> with args (<dsp.modules.gpt3.GPT3 object at 0x7fea388868b0>, 'Answer questions with short factoid answers.\n\n---\n\nFollow the following format.\n\nContext:\n${sources that may contain relevant content}\n\nQuestion: ${the question to be answered}\n\nAnswer: ${a short factoid answer, often between 1 and 5 words}\n\n---\n\nContext:\nBeyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny\'s Child. Managed by her father, Mathew Knowles, the group became one of the world\'s best-selling girl groups of all time. Their hiatus saw the release of Beyoncé\'s debut album, Dangerously in Love (2003), which established her as a solo a

  4%|▍         | 1/25 [00:50<20:12, 50.51s/it]


KeyboardInterrupt: 

In [154]:
filtering_results['df'].head()

Unnamed: 0,id,title,context,question,answer,prediction,em,f1
0,571c4132dd7acb1400e4c0b3,Oxygen,"In the meantime, on August 1, 1774, an experim...",Why is Priestley usually given credit for bein...,"[published his findings first, he published hi...",priority in publication.,❌,0.0
1,5733f9fa4776f4190066161f,French_and_Indian_War,"Colonel Monckton, in the sole British success ...",Who captured Fort Beausejour?,"[Colonel Monckton, Colonel Monckton, Colonel M...",British,✔️,1.0
2,56d9c551dc89441400fdb7d2,Super_Bowl_50,"In late November 2015, reports surfaced statin...",Which single did Beyoncé and Coldplay collabor...,"[Hymn for the Weekend, Hymn for the Weekend, M...","""Hymn for the Weekend""",✔️,1.0
3,5705e26d75f01819005e76d5,Southern_California,"Southern California, often abbreviated SoCal, ...","Despite being traditionall described as ""eight...","[10 counties, 10, 10]",10,✔️,1.0
4,56dfb5777aa994140058e025,Nikola_Tesla,After leaving Edison's company Tesla partnered...,What was produced at tesla's company?,"[dynamo electric machine commutators, electric...",electronics,❌,0.0


## Question 3: Your original system [3 points]

This question asks you to design your own few-shot OpenQA system. All of the code above can be used and modified for this, and the requirement is just that you try something new that goes beyond what we've done so far. 

Terms for the bake-off:

* You can make free use of SQuAD and other publicly available data.

* The LM must be an autoregressive language model. No trained QA components can be used. This includes general purpose LMs that have been fine-tuned for QA. (We have obviously waded into some vague territory here. The spirit of this is to make use of frozen, general-purpose models. We welcome questions about exactly how this is defined, since it could be instructive to explore this.)

Here are some ideas for the original system:

* We have so far sampled randomly from the SQuaD train set to create few-shot prompts. One might instead sample passages that have some connection to the target question. See `dsp.knn`, for example.

* There are a lot of parameters to our LMs that we have so far ignored. Exploring different values might lead to better results. The `temperature` parameter is highly impactful for our task.

* We have so far made no use of the scores from the LM or the RM.

* We have so far made no use of DSP's functionality for self-consistency. See the DPS intro notebook for examples.

__Original system instructions__:

In the cell below, please provide a brief technical description of your original system, so that the teaching team can gain an understanding of what it does. This will help us to understand your code and analyze all the submissions to identify patterns and strategies.

In [57]:
# PLEASE MAKE SURE TO INCLUDE THE FOLLOWING BETWEEN THE START AND STOP COMMENTS:
#   1) Textual description of your system.
#   2) The code for your original system.
# PLEASE MAKE SURE NOT TO DELETE OR EDIT THE START AND STOP COMMENTS
'''
优化思路：
1. 更改temperature,让lm更加多样性的回答
2. 对采样的few-shot使其和问题更加相关
3. 利用语言模型或者检索模型的分数
4. 使用DSP的self-consistency
'''

# START COMMENT: Enter your system description in this cell.

Rationale = dsp.Type(
    prefix="Rationale: Let's think step by step with given Context.",
    desc="${a step-by-step deduction that identifies the correct response, which will be provided below}"
)

qa_template_with_passages_and_CoT = dsp.Template(
    instructions=qa_template.instructions,
    context=Context(), 
    question=Question(),
    rational=Rationale(),
    answer=Answer())


@dsp.transformation
def my_few_shot_openqa_with_context_and_demo_filtering(example, train=squad_train, k=3):
    
    @dsp.transformation
    def filter_demos1(d):

        passages = dsp.retrieve(d.question, k=1)

        for passage in passages:
            if dsp.passage_match(passage, d.answer):
                d.context.append(passage)

        # Sample `k=3` demonstrations to help the model assess this
        # potential demonstration:
        ##### YOUR CODE HERE
        demos = []
        for i in range(3):
            demo = dsp.sample(squad_train, k=1)[0]
            if demo.context and dsp.passage_match(demo.context, demo.answer):
                demos.append(demo)
        
        d.demos = demos


        generator = dsp.generate(qa_template_with_passages_and_CoT)
        d, completions = generator(d, stage='qa')
        if dsp.answer_match(completions.data[0]['answer'], d.answer) is False:
            return None
        d.rational = completions.data[0]['rational']

        return d


    # Sample 20 demonstrations:
    ##### YOUR CODE HERE
    demos = dsp.sample(train, 20)

    # Filter the demonstrations using `annotate` and `filter_demos`.
    # The user's `k` should be used to specify the maximum number of
    # demonstrations kept at this stage.
    ##### YOUR CODE HERE
    function = dsp.annotate(filter_demos1)
    
    result = function(demos, 1)

    # Add the list of filtered demonstrations as a the `demos`
    # attribute of `example`:
    ##### YOUR CODE HERE
    example.demos = result


    # Retrieve a context passage for `example.question` and add it
    # as the `context` attribute for the example:
    ##### YOUR CODE HERE
    example.context = dsp.retrieve(example.question, k=1)



    generator = dsp.generate(qa_template_with_passages_and_CoT, n=10, temperature=0.7)
    example, completions = generator(example, stage='qa')
    completions = dsp.majority(completions)



    return completions




# STOP COMMENT: Please do not remove this comment.

In [65]:

# generator = dsp.generate(qa_template_with_passages_and_CoT)
# example, completions = generator(ex, stage='qa')
print(ex)
filtering_results = evaluateAnswer(
    my_few_shot_openqa_with_context_and_demo_filtering, [ex])
# lm.inspect_history(n=2)

{'id': '56be4db0acb8001400a502ec', 'title': 'Super_Bowl_50', 'context': 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.', 'question': 'Which NFL team represented the AFC at Super Bowl 50?', 'answer': ['Denver Broncos', 'Denver Broncos', 'Denver Broncos'], 'demos': [{'id

  0%|          | 0/1 [00:00<?, ?it/s]

***************8
[{'id': '57269de45951b619008f77e3', 'title': 'Gothic_architecture', 'context': ['Gothic architecture | and Sweden and Sicily. The same wide geographic area was then affected by the development of Gothic architecture, but the acceptance of the Gothic style and methods of construction differed from place to place, as did the expressions of Gothic taste. The proximity of some regions meant that modern country borders did not define divisions of style. On the other hand, some regions such as England and Spain produced defining characteristics rarely seen elsewhere, except where they have been carried by itinerant craftsmen, or the transfer of bishops. Many different factors like geographical/geological, economic, social, or political situations caused the'], 'question': 'Why did country borders not affect differences in style within Gothic architecture?', 'answer': ['proximity of some regions'], 'demos': [{'id': '56bf6b0f3aeaaa14008c9604', 'title': 'Beyoncé', 'context': 'B

100%|██████████| 1/1 [00:16<00:00, 16.22s/it]


AssertionError: _typ

Exception ignored in: 'pandas._libs.lib.is_interval'
Traceback (most recent call last):
  File "/home/aiscuser/.local/lib/python3.9/site-packages/dsp/primitives/predict.py", line 38, in __getattr__
    assert False, name
AssertionError: _typ


AssertionError: _typ

Exception ignored in: 'pandas._libs.lib.is_interval'
Traceback (most recent call last):
  File "/home/aiscuser/.local/lib/python3.9/site-packages/dsp/primitives/predict.py", line 38, in __getattr__
    assert False, name
AssertionError: _typ


KeyError: '_typ'

Exception ignored in: 'pandas._libs.lib.is_interval'
Traceback (most recent call last):
  File "/home/aiscuser/.conda/envs/cs224u/lib/python3.9/site-packages/pandas/core/internals/construction.py", line 1030, in convert
    arr = lib.maybe_convert_objects(
KeyError: '_typ'


KeyError: '_typ'

Exception ignored in: 'pandas._libs.lib.is_interval'
Traceback (most recent call last):
  File "/home/aiscuser/.conda/envs/cs224u/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1180, in maybe_infer_to_datetimelike
    return lib.maybe_convert_objects(  # type: ignore[return-value]
KeyError: '_typ'


AssertionError: _typ

Exception ignored in: 'pandas._libs.lib.is_interval'
Traceback (most recent call last):
  File "/home/aiscuser/.local/lib/python3.9/site-packages/dsp/primitives/predict.py", line 38, in __getattr__
    assert False, name
AssertionError: _typ


KeyError: '_typ'

Exception ignored in: 'pandas._libs.lib.is_interval'
Traceback (most recent call last):
  File "/home/aiscuser/.conda/envs/cs224u/lib/python3.9/site-packages/pandas/core/dtypes/cast.py", line 1180, in maybe_infer_to_datetimelike
    return lib.maybe_convert_objects(  # type: ignore[return-value]
KeyError: '_typ'


In [59]:
print(tiny_dev)

[{'id': '57264e2f708984140094c1e3', 'title': 'Black_Death', 'context': 'In October 2010, the open-access scientific journal PLoS Pathogens published a paper by a multinational team who undertook a new investigation into the role of Yersinia pestis in the Black Death following the disputed identification by Drancourt and Raoult in 1998. They assessed the presence of DNA/RNA with Polymerase Chain Reaction (PCR) techniques for Y. pestis from the tooth sockets in human skeletons from mass graves in northern, central and southern Europe that were associated archaeologically with the Black Death and subsequent resurgences. The authors concluded that this new research, together with prior analyses from the south of France and Germany, ". . . ends the debate about the etiology of the Black Death, and unambiguously demonstrates that Y. pestis was the causative agent of the epidemic plague that devastated Europe during the Middle Ages".', 'question': 'How did scientists assess the DNA/RNA of yer

In [64]:
filtering_results = evaluateAnswer(
    my_few_shot_openqa_with_context_and_demo_filtering, tiny_dev[2:3])

filtering_results['f1']

  0%|          | 0/1 [00:00<?, ?it/s]

***************8
[{'id': '57269de45951b619008f77e3', 'title': 'Gothic_architecture', 'context': ['Gothic architecture | and Sweden and Sicily. The same wide geographic area was then affected by the development of Gothic architecture, but the acceptance of the Gothic style and methods of construction differed from place to place, as did the expressions of Gothic taste. The proximity of some regions meant that modern country borders did not define divisions of style. On the other hand, some regions such as England and Spain produced defining characteristics rarely seen elsewhere, except where they have been carried by itinerant craftsmen, or the transfer of bishops. Many different factors like geographical/geological, economic, social, or political situations caused the'], 'question': 'Why did country borders not affect differences in style within Gothic architecture?', 'answer': ['proximity of some regions'], 'demos': [{'id': '56bf6b0f3aeaaa14008c9604', 'title': 'Beyoncé', 'context': 'B

100%|██████████| 1/1 [00:03<00:00,  3.10s/it]


0.0

In [87]:
lm.inspect_history(n=3)





Answer questions with short factoid answers.

---

Context:
Beyoncé Giselle Knowles-Carter (/biːˈjɒnseɪ/ bee-YON-say) (born September 4, 1981) is an American singer, songwriter, record producer and actress. Born and raised in Houston, Texas, she performed in various singing and dancing competitions as a child, and rose to fame in the late 1990s as lead singer of R&B girl-group Destiny's Child. Managed by her father, Mathew Knowles, the group became one of the world's best-selling girl groups of all time. Their hiatus saw the release of Beyoncé's debut album, Dangerously in Love (2003), which established her as a solo artist worldwide, earned five Grammy Awards and featured the Billboard Hot 100 number-one singles "Crazy in Love" and "Baby Boy".
Question: What album made her a worldwide known artist?
Answer: Dangerously in Love

Context:
Complex serological techniques have been developed into what are known as Immunoassays. Immunoassays can use the basic antibody – antigen binding a

In [93]:
filtering_results['df']

Unnamed: 0,id,title,context,question,answer,prediction,em,f1
0,5730c059069b531400832306,United_Methodist_Church,"The common pattern comes from John Wesley, who...",When did John Wesley provide a revised version...,[When the Methodists in America were separated...,American Revolution,❌,0.0
1,572a1a5c6aef051400155284,Economic_inequality,While acknowledging the central role economic ...,What needs to be made to ensure poorer members...,"[special efforts, special efforts, special eff...",special efforts,✔️,1.0
2,572648e8dd62a815002e8076,Ctenophora,Ranging from about 1 millimeter (0.039 in) to ...,What are the hairs on ctenophores called?,"[cilia, cilia, cilia]",ctenes,❌,0.0
3,5726a4a9708984140094ccb7,Genghis_Khan,"For the next several years, Hoelun and her chi...",Which of Temüjin's brothers took up the role o...,"[Begter, Begter, Begter]",Asael,❌,0.0
4,57283dbeff5b5019007d9fca,Doctor_Who,It has won the Short Form of the Hugo Award fo...,What Doctor Who episode won a Hugo Award in 2010?,"[The Waters of Mars, The Waters of Mars, The W...","""The Waters of Mars""",✔️,1.0
5,572869b84b864d19001649ae,Yuan_dynasty,Kublai's government after 1262 was a compromis...,What did Kublai's government have to balance b...,[preserving Mongol interests in China and sati...,Mongol interests and Chinese subjects,❌,0.588235
6,572ffb02b2c2fd14005686b7,Rhine,"From the Eocene onwards, the ongoing Alpine or...",What rift system developed in the Alpine orogeny?,"[N–S, N–S, N–S rift system]",European Cenozoic Rift System,❌,0.571429
7,57376df3c3c5551400e51ed9,Force,Pushing against an object on a frictional surf...,Static friction balances what force when there...,"[applied, applied force, applied force, applie...",applied force,✔️,1.0
8,5729d44b1d04691400779615,Economic_inequality,According to PolitiFact the top 400 richest Am...,What Institute published findings in September...,"[Institute for Policy Studies, Institute for P...",Institute for Policy Studies,✔️,1.0
9,56d9a0eadc89441400fdb63f,Super_Bowl_50,Peyton Manning became the first quarterback ev...,How old was Manning when he played Super Bowl 50?,"[39., 39, 39, 39]",39,✔️,1.0


## Question 4: Bakeoff entry [1 point]

For the bake-off, you simply need to be able to run your system on the file 

```data/openqa/cs224u-openqa-test-unlabeled.txt```

The following code should download it for you if necessary:

In [None]:
if not os.path.exists(os.path.join("data", "openqa", "cs224u-openqa-test-unlabeled.txt")):
    !mkdir -p data/openqa
    !wget https://web.stanford.edu/class/cs224u/data/cs224u-openqa-test-unlabeled.txt -P data/openqa/

If the above fails, you can just download https://web.stanford.edu/class/cs224u/data/cs224u-openqa-test-unlabeled.txt and place it in `data/openqa`.

This file contains only questions. The starter code below will help you structure this. It writes a file "cs224u-openqa-bakeoff-entry.json" to the current directory. That file should be uploaded as-is. Please do not change its name.

In [None]:
import json

def create_bakeoff_submission(fn):
    """"
    The argument `fn` is a DSP program with the same signature as the 
    ones we wrote above: `dsp.Example` to `dsp.Completions`.
    """

    filename = os.path.join("data", "openqa", "cs224u-openqa-test-unlabeled.txt")

    # This should become a mapping from questions (str) to response
    # dicts from your system.
    gens = {} 

    with open(filename) as f:
        questions = f.read().splitlines()

    questions = [dsp.Example(question=q) for q in questions]

    # `questions` is the list of `dsp.Example` instances you need to 
    # evaluate your system on. 
    #
    # Here we loop over the questions, run the system `fn`, and
    # store its `answer` value as the prediction:
    for question in tqdm.tqdm(questions):
        gens[question.question] = fn(question).answer

    # Quick tests we advise you to run: 
    # 1. Make sure `gens` is a dict with the questions as the keys:
    assert all(q.question in gens for q in questions)
    # 2. Make sure the values are dicts and have the key we will use:
    assert all(isinstance(d, str) for d in gens.values())

    # And finally the output file:
    with open("cs224u-openqa-bakeoff-entry.json", "wt") as f:
        json.dump(gens, f, indent=4)

Here's what it looks like to evaluate our first program, `few_shot_openqa`, on the bakeoff data:

In [None]:
# create_bakeoff_submission(few_shot_openqa_with_context)