# Review Summarization Prompt Optimization with DSPy (SIMBA/MIPROv2)

This script demonstrates how to optimize prompts in a DSPy program for summarizing user reviews.

## References:
- https://dspy.ai/learn/optimization/optimizers/#__tabbed_1_3
- https://dspy.ai/tutorials/classification_finetuning/
- https://dspy.ai/tutorials/math/

In [1]:
# 1. Setup
# Import required libraries and configure the language model.
import dspy
from typing import List, Dict
from app.llm.review_summarizer import SummarizeSignature, ReviewSummarizer
import os

# Configure the language model (replace with your preferred model)
# dspy.configure(lm=dspy.LM('gpt-4.1-nano'))

In [2]:
# 3. Prepare Training Data
# Each example is a list of reviews and a reference summary.
trainset = [
    dspy.Example(
        reviews=[
            "Room was clean and spacious.",
            "Excellent location, but noisy at night.",
            "Staff was helpful and check-in was quick."
        ],
        # summary="The hotel has clean, spacious rooms and helpful staff, but can be noisy at night due to its location.",
        reference="Users like the cleanliness of the hotel. The adjecent streets are noisy."
    ).with_inputs("reviews"),
    dspy.Example(
        reviews=[
            "Battery life is impressive.",
            "Screen quality is not as good as expected.",
            "Affordable price for the features offered."
        ],
        # summary="The product offers great battery life and features for its price, but the screen quality could be better.",
        reference="Users like that the product offers greaat battery life. The screen quality could be better though."
    ).with_inputs("reviews"),
]

In [3]:
from typing import Literal

# 4. Define LLM as Judge Metric
class JudgeSignature(dspy.Signature):
    """Judge if the summary is a good, faithful, and persuasive summary of the reviews."""
    reviews = dspy.InputField()
    reference = dspy.InputField()
    score: Literal[0,1] = dspy.OutputField(desc="1 if the summary is a good, faithful, and persuasive summary of the reviews, 0 otherwise.")

llm_judge = dspy.ChainOfThought(JudgeSignature)


def llm_judge_metric(example, prediction, trace=None):
    return prediction.summary == example.reference

def llm_judge_metric(example, prediction, trace=None):
    result = llm_judge(
        reviews="\n".join(example.reviews),
        reference=example.reference
    )
    return float(result.score)

In [4]:
# 5. Run the Optimizer (SIMBA or MIPROv2)
optimizer = dspy.MIPROv2(
    metric=llm_judge_metric,
    num_threads=4,
    max_bootstrapped_demos=2
)

# Compile (optimize) the program
optimized_summarizer = optimizer.compile(
    ReviewSummarizer(),
    trainset=trainset,
    requires_permission_to_run=False
)

2025/05/25 18:18:50 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 10
minibatch: False
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 1

2025/05/25 18:18:50 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/25 18:18:50 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/05/25 18:18:50 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...


Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


100%|██████████| 1/1 [00:02<00:00,  2.18s/it]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 4/6


100%|██████████| 1/1 [00:00<00:00, 401.14it/s]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/6


100%|██████████| 1/1 [00:00<00:00, 540.57it/s]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 6/6


100%|██████████| 1/1 [00:00<00:00, 398.55it/s]
2025/05/25 18:18:52 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/05/25 18:18:52 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
class SummarizeSignature(dspy.Signature):
    """Summarize the provided user reviews to maximize purchase intent.
    # Role
    You are an expert review summarizer. You know what makes people tick and buy when they read a review summary.
    # Instructions
    - Summarize the provided reviews
    - Adjust the format so that people are more likely to purchase
    """
    reviews = dspy.InputField(desc="All user reviews as a markdown list.")
    summary = dspy.OutputField(desc="A persuasive summary of the reviews.")



2025/05/25 18:18:55 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...

2025/05/25 18:19:10 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/05/25 18:19:10 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Summarize the provided user reviews to maximize purchase intent.
# Role
You are an expert review summarizer. You know what makes people tick and buy when they read a review summary.
# Instructions
- Summarize the provided reviews
- Adjust the format so that people are more likely to purchase

2025/05/25 18:19:10 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are an expert review summarizer tasked with creating persuasive summaries of user reviews to boost purchase intent. Given a list of reviews formatted as a markdown list, generate a concise, compelling summary that highlights positive features and addresses potential concerns in a way that encourages potential customers to make a purchase or engage with the product or service. F

  0%|          | 0/1 [00:00<?, ?it/s]



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:04<00:00,  4.19s/it]

2025/05/25 18:19:14 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:14 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 100.0

2025/05/25 18:19:14 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 10 =====



  0%|          | 0/1 [00:00<?, ?it/s]



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:01<00:00,  1.16s/it]

2025/05/25 18:19:16 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:16 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2025/05/25 18:19:16 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0]
2025/05/25 18:19:16 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:16 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 10 =====



  0%|          | 0/1 [00:00<?, ?it/s]



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:02<00:00,  2.11s/it]

2025/05/25 18:19:18 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0]
2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 10 =====







Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 541.41it/s]

2025/05/25 18:19:18 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:18 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 10 =====



  0%|          | 0/1 [00:00<?, ?it/s]



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:01<00:00,  1.30s/it]

2025/05/25 18:19:19 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
2025/05/25 18:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:19 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 10 =====



  0%|          | 0/1 [00:00<?, ?it/s]



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:01<00:00,  1.11s/it]

2025/05/25 18:19:20 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 10 =====







Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 1658.48it/s]

2025/05/25 18:19:20 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 10 =====



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 1597.22it/s]

2025/05/25 18:19:20 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 10 =====



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 805.67it/s]

2025/05/25 18:19:20 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 10 =====



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 793.02it/s]

2025/05/25 18:19:20 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 10 =====



Average Metric: 1.00 / 1 (100.0%): 100%|██████████| 1/1 [00:00<00:00, 3344.74it/s]

2025/05/25 18:19:20 INFO dspy.evaluate.evaluate: Average Metric: 1.0 / 1 (100.0%)
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 18:19:20 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 100.0!





In [5]:
# 6. Evaluate the Optimized Program
# Test the optimized summarizer on a new set of reviews.
test_reviews = [
    "The app is easy to use and very intuitive.",
    "Customer support was quick to respond.",
    "Some features are missing compared to competitors."
]
result = optimized_summarizer(reviews=test_reviews)
print("Summary:", result.summary)

Summary: Customers love how user-friendly and intuitive the app is, making it easy to get started right away. They also appreciate the quick and responsive customer support. While some users wish for additional features found in competitors, the overall experience is positive, making this app a reliable choice for those seeking simplicity and excellent support.


In [11]:
dspy.inspect_history(n=2)





[34m[2025-05-25T18:19:20.719894][0m

[31mSystem message:[0m

Your input fields are:
1. `reviews` (str)
2. `reference` (str)
Your output fields are:
1. `reasoning` (str)
2. `score` (Literal[0, 1]): 1 if the summary is a good, faithful, and persuasive summary of the reviews, 0 otherwise.
All interactions will be structured in the following way, with the appropriate values filled in.

Inputs will have the following structure:

[[ ## reviews ## ]]
{reviews}

[[ ## reference ## ]]
{reference}

Outputs will be a JSON object with the following fields.

{
  "reasoning": "{reasoning}",
  "score": "{score}        # note: the value you produce must exactly match (no extra characters) one of: 0; 1"
}
In adhering to this structure, your objective is: 
        Judge if the summary is a good, faithful, and persuasive summary of the reviews.


[31mUser message:[0m

[[ ## reviews ## ]]
Battery life is impressive.
Screen quality is not as good as expected.
Affordable price for the features off

In [6]:
# 7. Save and Load the Optimized Program
# Ensure the output directory exists
output_dir = os.path.abspath('../prompt')
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, "optimized_summarizer.json")

optimized_summarizer.save(output_path)

loaded_summarizer = ReviewSummarizer()
loaded_summarizer.load(path=output_path) 