In [None]:
# https://dspy-docs.vercel.app/docs/quick-start/minimal-example

In [1]:
import dspy
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

# Set up the LM
turbo = dspy.OpenAI(model='gpt-3.5-turbo-instruct', max_tokens=250)
dspy.settings.configure(lm=turbo)

# Load math questions from the GSM8K dataset
gsm8k = GSM8K()
gsm8k_trainset, gsm8k_devset = gsm8k.train[:10], gsm8k.dev[:10]



In [13]:
gsm8k_trainset[0].keys()

['question', 'gold_reasoning', 'answer']

In [10]:
gsm8k_trainset[0].question

"The result from the 40-item Statistics exam Marion and Ella took already came out. Ella got 4 incorrect answers while Marion got 6 more than half the score of Ella. What is Marion's score?"

In [14]:
gsm8k_trainset[0].gold_reasoning

"Ella's score is 40 items - 4 items = <<40-4=36>>36 items. Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items."

In [15]:
gsm8k_trainset[0].answer

'24'
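
The `gold_reasoning` string above embeds calculator annotations of the form `<<expr=result>>`. A minimal sketch of stripping them for readability follows; the helper name `strip_calc_annotations` is hypothetical and not part of DSPy.

```python
import re

def strip_calc_annotations(reasoning: str) -> str:
    """Remove <<expr=result>> calculator annotations from a reasoning string."""
    # Hypothetical helper for illustration only; not a DSPy function.
    return re.sub(r"<<[^<>]*>>", "", reasoning)

# The gold reasoning shown in the cell output above.
gold = (
    "Ella's score is 40 items - 4 items = <<40-4=36>>36 items. "
    "Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. "
    "So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items."
)

print(strip_calc_annotations(gold))
```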

In [2]:
class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought("question -> answer")
    
    def forward(self, question):
        return self.prog(question=question)

In [21]:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

# Set up the optimizer: we want to "bootstrap" (i.e., self-generate) up to 4 demonstrations for our CoT program,
# alongside up to 4 labeled examples drawn from the training set.
config = dict(max_bootstrapped_demos=4, max_labeled_demos=4)

# Optimize! Use the `gsm8k_metric` here. In general, the metric tells the optimizer how well a candidate program is doing.
teleprompter = BootstrapFewShotWithRandomSearch(metric=gsm8k_metric, **config)
optimized_cot = teleprompter.compile(CoT(), trainset=gsm8k_trainset, valset=gsm8k_devset)

Going to sample between 1 and 4 traces per predictor.
Will attempt to train 16 candidate sets.


Average Metric: 6 / 10  (60.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1219.24it/s]


Average Metric: 6 / 10  (60.0%)
Score: 60.0 for set: [0]
New best score: 60.0 for seed -3
Scores so far: [60.0]
Best score: 60.0


Average Metric: 8 / 10  (80.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1957.39it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
New best score: 80.0 for seed -2
Scores so far: [60.0, 80.0]
Best score: 80.0


 50%|███████████████████████████████████████████████████████████████████████████████████                                                                                   | 5/10 [00:00<00:00, 2591.96it/s]


Bootstrapped 4 full traces after 6 examples in round 0.


Average Metric: 7 / 10  (70.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2191.15it/s]


Average Metric: 7 / 10  (70.0%)
Score: 70.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0]
Best score: 80.0
Average of max per entry across top 1 scores: 0.8
Average of max per entry across top 2 scores: 0.9
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 50%|███████████████████████████████████████████████████████████████████████████████████                                                                                   | 5/10 [00:00<00:00, 3579.37it/s]


Bootstrapped 4 full traces after 6 examples in round 0.


Average Metric: 8 / 10  (80.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2865.35it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0]
Best score: 80.0
Average of max per entry across top 1 scores: 0.8
Average of max per entry across top 2 scores: 0.9
Average of max per entry across top 3 scores: 0.9
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 20%|█████████████████████████████████▏                                                                                                                                    | 2/10 [00:00<00:00, 3236.35it/s]


Bootstrapped 2 full traces after 3 examples in round 0.


Average Metric: 8 / 10  (80.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2827.30it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0]
Best score: 80.0
Average of max per entry across top 1 scores: 0.8
Average of max per entry across top 2 scores: 0.9
Average of max per entry across top 3 scores: 0.9
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|████████████████▌                                                                                                                                                     | 1/10 [00:00<00:00, 1572.08it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 9 / 10  (90.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2507.51it/s]


Average Metric: 9 / 10  (90.0%)
Score: 90.0 for set: [4]
New best score: 90.0 for seed 2
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 30%|█████████████████████████████████████████████████▊                                                                                                                    | 3/10 [00:00<00:00, 1385.02it/s]


Bootstrapped 2 full traces after 4 examples in round 0.


Average Metric: 8 / 10  (80.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2615.88it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 20%|█████████████████████████████████▏                                                                                                                                    | 2/10 [00:00<00:00, 2684.35it/s]


Bootstrapped 2 full traces after 3 examples in round 0.


Average Metric: 4 / 10  (40.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2742.63it/s]


Average Metric: 4 / 10  (40.0%)
Score: 40.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 30%|██████████████████████████████████████████████████▍                                                                                                                     | 3/10 [00:00<00:00, 90.17it/s]


Bootstrapped 3 full traces after 4 examples in round 0.


Average Metric: 6 / 10  (60.0): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 587.82it/s]


Average Metric: 6 / 10  (60.0%)
Score: 60.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|████████████████▌                                                                                                                                                     | 1/10 [00:00<00:00, 2132.34it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 6 / 10  (60.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2360.06it/s]


Average Metric: 6 / 10  (60.0%)
Score: 60.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 60%|███████████████████████████████████████████████████████████████████████████████████████████████████▌                                                                  | 6/10 [00:00<00:00, 2016.82it/s]

Bootstrapped 3 full traces after 7 examples in round 0.



Average Metric: 7 / 10  (70.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2483.89it/s]


Average Metric: 7 / 10  (70.0%)
Score: 70.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 20%|█████████████████████████████████▏                                                                                                                                    | 2/10 [00:00<00:00, 1438.87it/s]

Bootstrapped 2 full traces after 3 examples in round 0.



Average Metric: 8 / 10  (80.0): 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 942.41it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 40%|██████████████████████████████████████████████████████████████████▍                                                                                                   | 4/10 [00:00<00:00, 3205.43it/s]


Bootstrapped 4 full traces after 5 examples in round 0.


Average Metric: 7 / 10  (70.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2835.91it/s]


Average Metric: 7 / 10  (70.0%)
Score: 70.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|████████████████▌                                                                                                                                                     | 1/10 [00:00<00:00, 2680.07it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 6 / 10  (60.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2905.25it/s]


Average Metric: 6 / 10  (60.0%)
Score: 60.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0, 60.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 60%|███████████████████████████████████████████████████████████████████████████████████████████████████▌                                                                  | 6/10 [00:00<00:00, 2940.96it/s]


Bootstrapped 4 full traces after 7 examples in round 0.


Average Metric: 7 / 10  (70.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2973.42it/s]


Average Metric: 7 / 10  (70.0%)
Score: 70.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0, 60.0, 70.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 40%|██████████████████████████████████████████████████████████████████▍                                                                                                   | 4/10 [00:00<00:00, 4051.49it/s]

Bootstrapped 4 full traces after 5 examples in round 0.



Average Metric: 7 / 10  (70.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2677.33it/s]


Average Metric: 7 / 10  (70.0%)
Score: 70.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0, 60.0, 70.0, 70.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 40%|██████████████████████████████████████████████████████████████████▍                                                                                                   | 4/10 [00:00<00:00, 3704.40it/s]


Bootstrapped 3 full traces after 5 examples in round 0.


Average Metric: 8 / 10  (80.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2275.31it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0, 60.0, 70.0, 70.0, 80.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 10%|████████████████▌                                                                                                                                                     | 1/10 [00:00<00:00, 2046.00it/s]


Bootstrapped 1 full traces after 2 examples in round 0.


Average Metric: 6 / 10  (60.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2209.85it/s]


Average Metric: 6 / 10  (60.0%)
Score: 60.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0, 60.0, 70.0, 70.0, 80.0, 60.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0


 30%|█████████████████████████████████████████████████▊                                                                                                                    | 3/10 [00:00<00:00, 3895.64it/s]


Bootstrapped 2 full traces after 4 examples in round 0.


Average Metric: 8 / 10  (80.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 2057.24it/s]


Average Metric: 8 / 10  (80.0%)
Score: 80.0 for set: [4]
Scores so far: [60.0, 80.0, 70.0, 80.0, 80.0, 90.0, 80.0, 40.0, 60.0, 60.0, 70.0, 80.0, 70.0, 60.0, 70.0, 70.0, 80.0, 60.0, 80.0]
Best score: 90.0
Average of max per entry across top 1 scores: 0.9
Average of max per entry across top 2 scores: 1.0
Average of max per entry across top 3 scores: 1.0
Average of max per entry across top 5 scores: 1.0
Average of max per entry across top 8 scores: 1.0
Average of max per entry across top 9999 scores: 1.0
19 candidate programs found.
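
The "Average of max per entry across top k scores" lines in the log above can be read as an oracle-style statistic: for each dev example, take the best result among the k highest-scoring candidates, then average over examples. The sketch below illustrates that reading on made-up data. The per-example 0/1 subscores and the function name are assumptions; this is not taken from the DSPy source.

```python
def avg_of_max_per_entry(candidates, k):
    """Average, over examples, of the best subscore among the top-k candidates.

    `candidates` is a list of (overall_score, per_example_subscores) pairs.
    Hypothetical helper illustrating one reading of the log line.
    """
    top_k = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    # Transpose so each entry holds one example's subscores across candidates.
    per_example = zip(*(subscores for _, subscores in top_k))
    best = [max(entry) for entry in per_example]
    return sum(best) / len(best)

# Made-up data: three candidates scored on four examples (1 = correct).
candidates = [
    (75.0, [1, 1, 1, 0]),
    (50.0, [1, 0, 0, 1]),
    (25.0, [0, 0, 1, 0]),
]

for k in (1, 2, 3):
    print(k, avg_of_max_per_entry(candidates, k))
```

Under this reading, a value of 1.0 for small k means the top candidates jointly solve every dev example, even though no single candidate does.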


In [22]:
from dspy.evaluate import Evaluate

# Set up the evaluator, which can be used multiple times.
evaluate = Evaluate(devset=gsm8k_devset, metric=gsm8k_metric, num_threads=4, display_progress=True, display_table=0)

# Evaluate our `optimized_cot` program.
evaluate(optimized_cot)

Average Metric: 9 / 10  (90.0): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1830.22it/s]

Average Metric: 9 / 10  (90.0%)





90.0
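
Both the optimizer and the evaluator above take a `metric` callable. As a rough sketch of what such a metric can look like, the function below compares the last integer found in the gold and predicted answers. The names `parse_last_integer` and `integer_match_metric` are hypothetical, and this is not claimed to be how `gsm8k_metric` is implemented. The `(example, pred, trace=None)` signature follows the usual DSPy metric convention.

```python
import re
from types import SimpleNamespace

def parse_last_integer(text):
    """Return the last integer found in `text`, or None if there is none."""
    # Simplification: handles integers only (with optional thousands commas).
    matches = re.findall(r"-?\d[\d,]*", str(text))
    if not matches:
        return None
    return int(matches[-1].replace(",", ""))

def integer_match_metric(example, pred, trace=None):
    """Hypothetical metric: True when gold and predicted integers agree."""
    gold = parse_last_integer(example.answer)
    guess = parse_last_integer(pred.answer)
    return gold is not None and gold == guess

# Stand-ins for a dataset example and a program prediction.
example = SimpleNamespace(answer="24")
good_pred = SimpleNamespace(answer="Marion scored 24")
bad_pred = SimpleNamespace(answer="36")

print(integer_match_metric(example, good_pred))
print(integer_match_metric(example, bad_pred))
```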

In [41]:
optimized_cot

prog = ChainOfThought(Signature(question -> answer
    instructions='Given the fields `question`, produce the fields `answer`.'
    question = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'input', 'prefix': 'Question:', 'desc': '${question}'})
    answer = Field(annotation=str required=True json_schema_extra={'__dspy_field_type': 'output', 'prefix': 'Answer:', 'desc': '${answer}'})
))

In [29]:
dir(optimized_cot)

['__call__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__slotnames__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_assert_failures',
 '_base_init',
 '_compiled',
 '_suggest_failures',
 'activate_assertions',
 'candidate_programs',
 'deepcopy',
 'dump_state',
 'forward',
 'load',
 'load_state',
 'map_named_predictors',
 'named_parameters',
 'named_predictors',
 'parameters',
 'predictors',
 'prog',
 'reset_copy',
 'save']

In [23]:
turbo.inspect_history(n=10)





Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: ${answer}

---

Question: The average score on last week's Spanish test was 90. Marco scored 10% less than the average test score and Margaret received 5 more points than Marco. What score did Margaret receive on her test?
Reasoning: Let's think step by step in order to find Margaret's score. We know that the average score was 90, so Marco's score was 10% less than 90, which is 90 * 0.9 = 81. Then, Margaret's score was 5 more points than Marco's, which is 81 + 5 = 86.
Answer: 86

---

Question: Amaya scored 20 marks fewer in Maths than she scored in Arts. She also got 10 marks more in Social Studies than she got in Music. If she scored 70 in Music and scored 1/10 less in Maths, what's the total number of marks she scored in all the subjects?
Answer: 296

---

Question: Megan pays $16