This notebook introduces DSPy through a simple few-shot prompt engineering example.

### Setting Up

In [1]:
import os

import dspy
from dotenv import load_dotenv

# Load variables from the .env file into the environment.
load_dotenv()

# The model name and API key are read from the environment.
model = os.getenv('model')
api_key = os.getenv('GROQ_API_KEY')

# Configure DSPy to use a Groq-hosted model as the default LM.
lm = dspy.GROQ(model=model, api_key=api_key)
dspy.configure(lm=lm)

In [2]:
# !pip install -U pip
# !pip install dspy-ai
# !pip install openai~=0.28.1
# # !pip install -e $repo_path

import dspy
import sys
import os
import json

### Getting Started

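We'll start by setting up the language model (LM). **DSPy** supports multiple API and local models. The first cell of this notebook configures a Groq-hosted model, with the model name and API key read from a `.env` file. If you would rather work with GPT-3.5 (`gpt-3.5-turbo`) through OpenAI, use the commented-out cell below instead.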
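The setup cell reads `model` and `GROQ_API_KEY` with `os.getenv`, which quietly returns `None` when a variable is missing. Below is a minimal sketch of a guard that fails early with a readable message. `require_env` and the `DEMO_MODEL_NAME` variable are hypothetical names used only for illustration; they are not part of DSPy or python-dotenv.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set; add it to your .env file."
        )
    return value


# Demo with a throwaway variable so the snippet runs anywhere.
os.environ["DEMO_MODEL_NAME"] = "example-model"
print(require_env("DEMO_MODEL_NAME"))
```

In the notebook itself, the same helper could wrap the two `os.getenv` calls, assuming you want a hard failure rather than a `None` model name.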


# For those using OpenAI

In [None]:
# turbo = dspy.OpenAI(model='gpt-3.5-turbo',api_key ="")

# dspy.settings.configure(lm=turbo)


In [None]:
# #loading in the data
# !git clone https://github.com/skandavivek/transformerQA-finetuning.git

Cloning into 'transformerQA-finetuning'...


In [5]:
import pandas as pd
df = pd.read_csv('train.csv')


In [6]:
# There are two kinds of answers: ANSWERNOTFOUND, or an answer span extracted from the review.
df['human_ans_spans'].astype(str)

0                                      ANSWERNOTFOUND
1                                      ANSWERNOTFOUND
2                                      ANSWERNOTFOUND
3                            this show is OUTSTANDING
4       The costume design by Susan Matheson is great
                            ...                      
2496                                   ANSWERNOTFOUND
2497                                   ANSWERNOTFOUND
2498                                   ANSWERNOTFOUND
2499                                   ANSWERNOTFOUND
2500                                   ANSWERNOTFOUND
Name: human_ans_spans, Length: 2501, dtype: object

In [7]:
df1=df.loc[df['human_ans_spans']=='ANSWERNOTFOUND'].reset_index(drop=True)
df2=df.loc[df['human_ans_spans']!='ANSWERNOTFOUND'].reset_index(drop=True)


In [8]:
df1=df1[:4][['question','review','human_ans_spans']]
df2=df2[:4][['question','review','human_ans_spans']]

df1.columns=['question','context','answer']

df2.columns=['question','context','answer']
df_c=pd.concat([df1,df2]).sort_index().reset_index(drop=True)
df_c


| | question | context | answer |
|---|---|---|---|
| 0 | Who is the author of this series? | Whether it be in her portrayal of a nerdy lesb... | ANSWERNOTFOUND |
| 1 | Is this series good and excelent? | At the time of my review, there had been 910 c... | this show is OUTSTANDING |
| 2 | How is the costume design? | "Fright Night" is great! This is how the story... | The costume design by Susan Matheson is great |
| 3 | Can we enjoy the movie along with our family ? | An outstanding romantic comedy, 13 Going on 30... | ANSWERNOTFOUND |
| 4 | What criticism deserves the movie Passion of C... | Revenge of the Sith is for children and for ad... | Oh my god is this a STUPID film |
| 5 | Does this one good? | To let the truth be known, I watched this movi... | ANSWERNOTFOUND |
| 6 | How are the special effects? | As with all Star Wars (or science fiction) mov... | ANSWERNOTFOUND |
| 7 | How is the scene? | The best KingKong movie ever made in my opinio... | KingKong |


In [9]:
qa_pairs = json.loads(df_c.to_json(orient="records"))

In [10]:
qa_pairs[0]

{'question': 'Who is the author of this series?',
 'context': "Whether it be in her portrayal of a nerdy lesbian or a punk rock rebel, Maslany's plural personalities, (though very stereotypical), are entertaining eye-candy. Combined with a complex and unpredictable plot line, this show is surprisingly addictive. ANSWERNOTFOUND",
 'answer': 'ANSWERNOTFOUND'}

In [11]:
# Create dataset
dataset = [dspy.Example(x).with_inputs('question','context') for x in qa_pairs]

trainset = [x.with_inputs('question','context') for x in dataset[:4]]
devset = [x.with_inputs('question','context') for x in dataset[4:]]

In [12]:
train_example = trainset[0]
print(f"Question: {train_example.question}")
print(f"Context: {train_example.context}")
print(f"Answer: {train_example.answer}")

Question: Who is the author of this series?
Context: Whether it be in her portrayal of a nerdy lesbian or a punk rock rebel, Maslany's plural personalities, (though very stereotypical), are entertaining eye-candy. Combined with a complex and unpredictable plot line, this show is surprisingly addictive. ANSWERNOTFOUND
Answer: ANSWERNOTFOUND


In [13]:
dev_example = devset[0]


After loading the raw data, we applied `x.with_inputs('question', 'context')` to each example to tell **DSPy** that the input fields of each example are `question` and `context`. Any other fields are labels or metadata that are not given to the system.
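To make the input/label distinction concrete, here is a plain-Python illustration of the idea. It is not how DSPy implements `with_inputs`; `split_fields` is a hypothetical helper, and the context string is a placeholder rather than a real review.

```python
from typing import Dict, Iterable, Tuple


def split_fields(
    record: Dict[str, str], input_keys: Iterable[str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a record into the fields shown to the model and the held-back labels."""
    keys = set(input_keys)
    inputs = {k: v for k, v in record.items() if k in keys}
    labels = {k: v for k, v in record.items() if k not in keys}
    return inputs, labels


record = {
    "question": "How is the costume design?",
    "context": "(placeholder review text)",
    "answer": "The costume design by Susan Matheson is great",
}
inputs, labels = split_fields(record, ["question", "context"])
print(sorted(inputs))  # fields the system is allowed to see
print(sorted(labels))  # fields kept aside for evaluation
```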

In [14]:
print(f"For this dataset, training examples have input keys {train_example.inputs().keys()} and label keys {train_example.labels().keys()}")
print(f"For this dataset, dev examples have input keys {dev_example.inputs().keys()} and label keys {dev_example.labels().keys()}")

For this dataset, training examples have input keys ['question', 'context'] and label keys ['answer']
For this dataset, dev examples have input keys ['question', 'context'] and label keys ['answer']


### Building Blocks

##### Using the Language Model: **Signatures** & **Predictors**

Every call to the LM in a **DSPy** program needs to have a **Signature**.

A signature consists of three simple elements:

- A minimal description of the sub-task the LM is supposed to solve.
- A description of one or more input fields (e.g., input question) that we will give to the LM.
- A description of one or more output fields (e.g., the question's answer) that we will expect from the LM.

Let's define a simple signature for basic question answering.

In [15]:
class BasicQA(dspy.Signature):
    """Answer questions with short factoid answers."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

In `BasicQA`, the docstring describes the sub-task here (i.e., answering questions). Each `InputField` or `OutputField` can optionally contain a description `desc` too. When it's not given, it's inferred from the field's name (e.g., `question`).
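As a rough illustration of how these three elements could turn into prompt text, the sketch below rebuilds the layout that `lm.inspect_history` prints later in this notebook. It only mimics that layout; `field_line` and `render_format` are hypothetical helpers, not DSPy functions, and DSPy's actual rendering may differ between versions.

```python
from typing import Dict, Optional


def field_line(name: str, desc: Optional[str]) -> str:
    """Render one field; fall back to a ${name} placeholder when no desc is given."""
    placeholder = "${" + name + "}"
    return name.capitalize() + ": " + (desc if desc else placeholder)


def render_format(
    instructions: str,
    inputs: Dict[str, Optional[str]],
    outputs: Dict[str, Optional[str]],
) -> str:
    """Combine the task description and field descriptions into one prompt header."""
    header = [instructions, "", "---", "", "Follow the following format.", ""]
    body = [field_line(n, d) for n, d in {**inputs, **outputs}.items()]
    return "\n".join(header + body)


prompt = render_format(
    "Answer questions with short factoid answers.",
    inputs={"question": None},
    outputs={"answer": "often between 1 and 5 words"},
)
print(prompt)
```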

Notice that there isn't anything special about this signature in **DSPy**. We can just as easily define a signature that takes a long snippet from a PDF and outputs structured information, for instance.

Anyway, now that we have a signature, let's define and use a **Predictor**. A predictor is a module that knows how to use the LM to implement a signature. Importantly, predictors can **learn** to fit their behavior to the task!

In [16]:
# Define the predictor.
generate_answer = dspy.Predict(BasicQA)

# Call the predictor on a particular input. BasicQA declares only `question`, so that is all we pass.
pred = generate_answer(question=dev_example.question)

# Print the input and the prediction.
print(f"Question: {dev_example.question}")
print(f"Predicted Answer: {pred.answer}")

 		You are using the client GroqLM, which will be removed in DSPy 2.6.
 		Changing the client is straightforward and will let you use new features (Adapters) that improve the consistency of LM outputs, especially when using chat LMs. 

 		Learn more about the changes and how to migrate at
 		https://github.com/stanfordnlp/dspy/blob/main/examples/migration.ipynb


Question: What criticism deserves the movie Passion of Christ by Mel Gibson?
Predicted Answer: Anti-Semitic backlash.


In [18]:
lm.inspect_history(n=1)




Answer questions with short factoid answers.

---

Follow the following format.

Question: ${question}
Answer: often between 1 and 5 words

---

Question: What criticism deserves the movie Passion of Christ by Mel Gibson?
Answer: Anti-Semitic backlash.






In [19]:
# Define the predictor. Notice we're just changing the class. The signature BasicQA is unchanged.
generate_answer_with_chain_of_thought = dspy.ChainOfThought(BasicQA)

# Call the predictor on the same input.
pred = generate_answer_with_chain_of_thought(question=dev_example.question)

# Print the input, the chain of thought, and the prediction.
print(f"Question: {dev_example.question}")
print(f"Thought: {pred.rationale.split(':', 1)[1].strip()}")
print(f"Predicted Answer: {pred.answer}")

Question: What criticism deserves the movie Passion of Christ by Mel Gibson?
Thought: What criticism deserves the movie Passion of Christ by Mel Gibson?
Reasoning: Let's think step by step in order to consider the controversy surrounding the film's depiction of violence and its perceived anti-Semitic themes.
Predicted Answer: Anti-Semitic backlash.
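The `split(':', 1)[1]` expression in the cell above assumes the rationale contains a colon, and it keeps the echoed question in the output. A small defensive variant in plain Python is sketched below; `extract_reasoning` is a hypothetical helper, and the `Reasoning:` marker is an assumption based on the output shown in this notebook.

```python
def extract_reasoning(rationale: str, marker: str = "Reasoning:") -> str:
    """Return the text after `marker`, or the whole rationale if the marker is absent."""
    _, found, tail = rationale.partition(marker)
    return tail.strip() if found else rationale.strip()


# Rationale string copied from this notebook's output.
rationale = (
    "Question: What criticism deserves the movie Passion of Christ by Mel Gibson?\n"
    "Reasoning: Let's think step by step in order to consider the controversy "
    "surrounding the film's depiction of violence and its perceived anti-Semitic themes."
)
print(extract_reasoning(rationale))
print(extract_reasoning("no marker here"))
```

Using `str.partition` avoids the `IndexError` that indexing the result of `split` raises when the separator is missing.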


In [20]:
pred.rationale

"Question: What criticism deserves the movie Passion of Christ by Mel Gibson?\nReasoning: Let's think step by step in order to consider the controversy surrounding the film's depiction of violence and its perceived anti-Semitic themes."

### Program 1: Basic Few Shot

Let's define our first complete program for this task. We'll build a few-shot pipeline for answer generation.


Let's start by defining this signature: `context, question --> answer`.

In [21]:
class GenerateAnswer(dspy.Signature):
    """Answer questions with short factoid answers."""

    context = dspy.InputField(desc="may contain relevant facts")
    question = dspy.InputField()
    answer = dspy.OutputField(desc="often between 1 and 5 words")

Great. Now let's define the actual program. This is a class that inherits from `dspy.Module`.

It needs two methods:

- The `__init__` method simply declares the sub-modules it needs. Here that is only `dspy.ChainOfThought`, defined to implement our `GenerateAnswer` signature; the `dspy.Retrieve` step is commented out because each example already supplies its own context.
- The `forward` method will describe the control flow of answering the question using the modules we have.

In [22]:
class fs(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()

        #self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_answer = dspy.ChainOfThought(GenerateAnswer)

    def forward(self, question,context):
        #context = self.retrieve(question).passages
        prediction = self.generate_answer(context=context, question=question)
        return dspy.Prediction(context=context, answer=prediction.answer)

##### Compiling the few shot program

Having defined this program, let's now **compile** it. Compiling a program will update the parameters stored in each module. In our setting, this is primarily in the form of collecting and selecting good demonstrations for inclusion in your prompt(s).

Compiling depends on three things:

1. **A training set.** We'll just use the 4 question–answer examples from `trainset` above.
1. **A metric for validation.** We'll define a quick `validate_answer` that checks that the predicted answer exactly matches the reference answer. A passage-match check (that the context actually contains the answer) is left commented out.
1. **A specific teleprompter.** The **DSPy** compiler includes a number of **teleprompters** that can optimize your programs.

**Teleprompters:** Teleprompters are powerful optimizers that can take any program and learn to bootstrap and select effective prompts for its modules. Hence the name, which means "prompting at a distance".

Different teleprompters offer various tradeoffs in terms of how much they optimize cost versus quality, etc. We will use a simple default `BootstrapFewShot` in this notebook.


_If you're into analogies, you could think of this as your training data, your loss function, and your optimizer in a standard DNN supervised learning setup. Whereas SGD is a basic optimizer, there are more sophisticated (and more expensive!) ones like Adam or RMSProp._
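The bootstrapping idea can be sketched in a few lines of plain Python: run the program on each training example and keep only the runs that the metric accepts as demonstrations. This is a heavily simplified illustration, not DSPy's `BootstrapFewShot` implementation. `bootstrap_demos`, `always_not_found`, and `exact` are hypothetical names, the stand-in "program" is a fixed function rather than an LM, and the context strings are placeholders.

```python
from typing import Callable, Dict, List

Example = Dict[str, str]


def bootstrap_demos(
    program: Callable[[str, str], str],
    trainset: List[Example],
    metric: Callable[[Example, str], bool],
    max_demos: int = 4,
) -> List[Example]:
    """Keep training examples whose prediction passes the metric, up to max_demos."""
    demos: List[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        predicted = program(example["question"], example["context"])
        if metric(example, predicted):
            demos.append({**example, "answer": predicted})
    return demos


def always_not_found(question: str, context: str) -> str:
    """Stand-in for an LM call: always gives the same answer."""
    return "ANSWERNOTFOUND"


def exact(example: Example, predicted: str) -> bool:
    """Toy metric: the prediction must equal the reference answer."""
    return example["answer"] == predicted


toy_trainset = [
    {"question": "Who is the author of this series?",
     "context": "(placeholder review text)", "answer": "ANSWERNOTFOUND"},
    {"question": "How is the scene?",
     "context": "(placeholder review text)", "answer": "KingKong"},
]
demos = bootstrap_demos(always_not_found, toy_trainset, exact)
print(len(demos), "demonstration(s) kept")
```

Only the first toy example survives, because the stand-in program's answer matches its reference; the second is discarded by the metric.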

In [23]:
from dspy.teleprompt import BootstrapFewShot

# Validation logic: check that the predicted answer exactly matches the reference answer.
# The passage-match check (does the context contain the answer?) is left commented out.
def validate_answer(example, pred, trace=None):
    answer_EM = dspy.evaluate.answer_exact_match(example, pred)
    #answer_PM = dspy.evaluate.answer_passage_match(example, pred)
    return answer_EM

# Set up a basic teleprompter, which will compile our program.
teleprompter = BootstrapFewShot(metric=validate_answer)

# Compile!
compiled_fs = teleprompter.compile(fs(), trainset=trainset)

100%|██████████| 4/4 [00:05<00:00,  1.45s/it]

Bootstrapped 0 full traces after 3 examples for up to 1 rounds, amounting to 4 attempts.
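To show how strict an exact-match metric is, here is a plain-Python sketch. It assumes SQuAD-style normalization (lowercasing, dropping punctuation and articles, collapsing whitespace); the exact rules inside `dspy.evaluate.answer_exact_match` may differ, and `normalize` and `exact_match` are hypothetical helpers.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True only when the normalized prediction equals the normalized gold answer."""
    return normalize(prediction) == normalize(gold)


print(exact_match("Anti-Semitic backlash.", "anti-semitic backlash"))
print(exact_match("The KingKong.", "KingKong"))
print(exact_match("I cannot answer this question.", "ANSWERNOTFOUND"))
```

Under a metric like this, any prediction that is a full sentence rather than the literal reference string counts as a miss, however reasonable it reads.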





Now that we've compiled our program, let's try it out.

In [24]:
# Ask any question you like to this simple few-shot program.


my_question = "Do you like Avocados?"
context = dev_example.context

# Get the prediction. This contains `pred.context` and `pred.answer`.
pred = compiled_fs(my_question,context)

# Print the contexts and the answer.
print(f"Question: {my_question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Retrieved Contexts: {pred.context}")

Question: Do you like Avocados?
Predicted Answer: I apologize, but it seems that the context provided is a review of a movie, and the question is not related to the movie. Therefore, I cannot provide an answer to the question "Do you like Avocados?" as it is not relevant to the context.
Retrieved Contexts: Revenge of the Sith is for children and for adults with the minds of children.  To a grownup, this movie is moronic and boring.The only good part of the movie is the very end when we see the birth of Princess Leia and her twin brother Luke Skywalker, and they are placed in foster homes because their mom is dead and their dad is the enemy.  The babies are cute, and you really can't lose by showing newborn babies.Aside from that, this movie is horrible.  The fight scenes are worse than kung fu flicks with idiots hopping around like bunnies.  The acting, particularly that of Anakin Skywalker, is really juvenile and pathetic.  The writing is worst of all.  The writing really deserves to 

Excellent. How about we inspect the last prompt for the LM?

In [25]:
lm.inspect_history(n=1)




Answer questions with short factoid answers.

---

Context: An outstanding romantic comedy, 13 Going on 30, brings to the screen exactly what the title implies: the story of a 13-year old girl who has her wish fulfilled and wakes up seven years later in the body of her 30-year old self!13 Going on 30 is based on the hit 80's movie "BIG" starring Tom Hanks, and it is a film about human relations, hope and second chances, but most importantly about trust, love, and inner strength.Jennifer Garner (who is ABSOLUTELY GORGEOUS!!!), Mark Rufallo, Andy Serkis, and the rest of the cast, have outdone themselves with their performances, which are exceptional to say the least.  All the actors, without exceptions, give it their 100% and it really shows (the chemistry is AMAZING)! Very well written and very well presented, the movie is without a doubt guaranteed to provide more than just a few laughs, not to mention a few tears.  The film is simple enough, but does a great job of describing peopl

'\n\n\nAnswer questions with short factoid answers.\n\n---\n\nContext: An outstanding romantic comedy, 13 Going on 30, brings to the screen exactly what the title implies: the story of a 13-year old girl who has her wish fulfilled and wakes up seven years later in the body of her 30-year old self!13 Going on 30 is based on the hit 80\'s movie "BIG" starring Tom Hanks, and it is a film about human relations, hope and second chances, but most importantly about trust, love, and inner strength.Jennifer Garner (who is ABSOLUTELY GORGEOUS!!!), Mark Rufallo, Andy Serkis, and the rest of the cast, have outdone themselves with their performances, which are exceptional to say the least.  All the actors, without exceptions, give it their 100% and it really shows (the chemistry is AMAZING)! Very well written and very well presented, the movie is without a doubt guaranteed to provide more than just a few laughs, not to mention a few tears.  The film is simple enough, but does a great job of describ