# Review Summarization Prompt Optimization with DSPy (SIMBA/MIPROv2)

This notebook demonstrates how to optimize the prompts of a DSPy program that summarizes user reviews in Czech and Slovak, using an LLM-as-judge metric and the MIPROv2 optimizer.

## References:
- https://dspy.ai/learn/optimization/optimizers/#__tabbed_1_3
- https://dspy.ai/tutorials/classification_finetuning/
- https://dspy.ai/tutorials/math/

In [1]:
# 1. Setup
# Import required libraries and configure the language model.
import os

import dspy

from app.llm.language import SupportedLanguage
from app.llm.review_summarizer import SummarizeSignature, ReviewSummarizer

# Configure the language model before running the cells below
# (uncomment and replace with your preferred model).
# dspy.configure(lm=dspy.LM('gpt-4.1-nano'))

In [2]:
# 2. Prepare Training Data
trainset = [
    # Hotel reviews in Czech
    dspy.Example(
        reviews=[
            "Room was clean and spacious.",
            "Excellent location, but noisy at night.",
            "Staff was helpful and check-in was quick."
        ],
        language="cs",  # Simple string
        reference="Uživatelé oceňují čistotu hotelu. Přilehlé ulice jsou hlučné."
    ).with_inputs("reviews", "language"),
    
    # Hotel reviews in Slovak
    dspy.Example(
        reviews=[
            "Room was clean and spacious.",
            "Excellent location, but noisy at night.",
            "Staff was helpful and check-in was quick."
        ],
        language="sk",  # Simple string
        reference="Používatelia oceňujú čistotu hotela. Priľahlé ulice sú hlučné."
    ).with_inputs("reviews", "language"),
    
    # # Product reviews in Czech
    # dspy.Example(
    #     reviews=[
    #         "Battery life is impressive.",
    #         "Screen quality is not as good as expected.",
    #         "Affordable price for the features offered."
    #     ],
    #     language="cs",  # Simple string
    #     reference="Uživatelé oceňují dlouhou výdrž baterie. Kvalita obrazovky by mohla být lepší."
    # ).with_inputs("reviews", "language"),
    
    # Product reviews in Slovak
    dspy.Example(
        reviews=[
            "Battery life is impressive.",
            "Screen quality is not as good as expected.",
            "Affordable price for the features offered."
        ],
        language="sk",  # Simple string
        reference="Používatelia oceňujú dlhú výdrž batérie. Kvalita obrazovky by mohla byť lepšia."
    ).with_inputs("reviews", "language")
]
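
With only three active examples, an automatic train/validation split leaves very little data on each side (the optimizer log further down reports a validation set of 2). The sketch below shows one way to split explicitly while keeping each language represented in the training portion. `split_by_language` is a hypothetical helper, not part of DSPy; it only assumes each example exposes a `language` attribute, as the `dspy.Example` objects above do.

```python
from collections import defaultdict
from types import SimpleNamespace


def split_by_language(examples, val_fraction=0.5):
    """Split examples into (train, val), grouping by language code."""
    groups = defaultdict(list)
    for ex in examples:
        groups[ex.language].append(ex)

    train, val = [], []
    for _, items in sorted(groups.items()):
        # Hold out at least one item per language, but never the whole group;
        # a single-item group stays entirely in the training portion.
        n_val = min(len(items) - 1, max(1, round(len(items) * val_fraction)))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Stand-in objects for illustration; in the notebook these would be dspy.Example instances.
demo = [
    SimpleNamespace(language=code, idx=i)
    for i, code in enumerate(["cs", "sk", "sk", "cs", "sk"])
]
train, val = split_by_language(demo)
print(len(train), len(val))
```

The two lists could then be handed to the optimizer separately (MIPROv2's `compile` accepts a `valset` argument alongside `trainset`), which makes the validation size a deliberate choice rather than a by-product of the default split.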

In [3]:
from typing import Literal

# 3. Define an LLM-as-Judge Metric
class JudgeSignature(dspy.Signature):
    """Judge if the summary is a good, faithful, and persuasive summary of the reviews in the specified language."""
    reviews = dspy.InputField()
    language = dspy.InputField(desc="Language code (cs = czech or sk = slovak)")
    reference = dspy.InputField()
    prediction = dspy.InputField()
    score: Literal[0,1] = dspy.OutputField(
        desc="1 if the prediction is a good, faithful, and persuasive summary of the reviews in the specified language, 0 otherwise."
    )

llm_judge = dspy.ChainOfThought(JudgeSignature)


def llm_judge_metric(example, prediction, trace=None):
    """Judge if the generated summary matches the reference in the specified language."""
    result = llm_judge(
        reviews="\n".join(example.reviews),
        language=example.language,  # Just pass the string directly
        reference=example.reference,
        prediction=prediction.summary
    )
    return float(result.score)
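
The metric above raises if the prediction has no `summary` or if the judge returns a score that cannot be converted to a float, and an exception inside a metric can interrupt an optimization run. A minimal sketch of a defensive wrapper follows. `make_guarded_metric` and `fake_judge` are hypothetical names introduced for illustration; in the notebook the wrapped callable would be `llm_judge`.

```python
from types import SimpleNamespace


def make_guarded_metric(judge):
    """Wrap a judge callable so that unusable outputs score 0.0 instead of raising."""

    def metric(example, prediction, trace=None):
        summary = (getattr(prediction, "summary", None) or "").strip()
        if not summary:
            return 0.0  # nothing to judge, so skip the judge call entirely
        try:
            result = judge(
                reviews="\n".join(example.reviews),
                language=example.language,
                reference=example.reference,
                prediction=summary,
            )
            return float(result.score)
        except (AttributeError, TypeError, ValueError):
            return 0.0  # judge returned a missing or non-numeric score

    return metric


# Stand-in judge for illustration only; it does not call a language model.
def fake_judge(**kwargs):
    return SimpleNamespace(score="1" if "čist" in kwargs["prediction"] else "0")


example = SimpleNamespace(
    reviews=["Room was clean."], language="cs", reference="Čistý pokoj."
)
guarded = make_guarded_metric(fake_judge)
print(guarded(example, SimpleNamespace(summary="Pokoj je čistý.")))
print(guarded(example, SimpleNamespace(summary="   ")))
```

Scoring failures as 0.0 is one possible policy; logging them separately may be preferable when judge errors should not be confused with genuinely poor summaries.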

In [4]:
# 4. Run the Optimizer (MIPROv2 shown here; SIMBA is an alternative)
optimizer = dspy.MIPROv2(
    metric=llm_judge_metric,
    num_threads=4,
    max_bootstrapped_demos=2
)

# Compile (optimize) the program
optimized_summarizer = optimizer.compile(
    ReviewSummarizer(),
    trainset=trainset,
    requires_permission_to_run=False
)

2025/05/25 19:04:08 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 10
minibatch: False
num_fewshot_candidates: 6
num_instruct_candidates: 3
valset size: 2

2025/05/25 19:04:08 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/25 19:04:08 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/05/25 19:04:08 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=6 sets of demonstrations...


Bootstrapping set 1/6
Bootstrapping set 2/6
Bootstrapping set 3/6


100%|██████████| 1/1 [00:06<00:00,  6.44s/it]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 4/6


100%|██████████| 1/1 [00:00<00:00, 429.57it/s]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/6


100%|██████████| 1/1 [00:00<00:00, 495.02it/s]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 6/6


100%|██████████| 1/1 [00:00<00:00, 545.00it/s]
2025/05/25 19:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/05/25 19:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/05/25 19:04:14 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing N=3 instructions...



Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
class SummarizeSignature(dspy.Signature):
    """Summarize the provided user reviews to maximize purchase intent.
    # Role
    You are an expert review summarizer. You know what makes people tick and buy when they read a review summary.
    # Instructions
    - Summarize the provided reviews
    - Adjust the format so that people are more likely to purchase
    - Provide the summary in the specified language
    """
    reviews = dspy.InputField(desc="All user reviews as a markdown list.")
    language = dspy.InputField(desc="The language code for the output summary (cs = czech or sk = slovak).")
    summary = dspy.OutputField(desc="A persuasive summary of the reviews in the specified language.")



2025/05/25 19:04:30 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/05/25 19:04:30 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Summarize the provided user reviews to maximize purchase intent.
# Role
You are an expert review summarizer. You know what makes people tick and buy when they read a review summary.
# Instructions
- Summarize the provided reviews
- Adjust the format so that people are more likely to purchase
- Provide the summary in the specified language

2025/05/25 19:04:30 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are an expert in crafting persuasive summaries of user reviews, especially in high-stakes scenarios where convincing potential customers is critical. Your task is to analyze the provided list of reviews and generate a concise, compelling summary that emphasizes the positive aspects and addresses potential concerns to maximize the likelihood of purchase. The summary must be tailored to the specified language, capturing the

Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:04<00:00,  2.48s/it]

2025/05/25 19:04:35 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:04:35 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 100.0

2025/05/25 19:04:35 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:04<00:00,  2.16s/it]

2025/05/25 19:04:40 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:04:40 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 3'].
2025/05/25 19:04:40 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0]
2025/05/25 19:04:40 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:04:40 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:13<00:00,  6.68s/it]

2025/05/25 19:04:53 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0]
2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:00<00:00, 1294.54it/s]

2025/05/25 19:04:53 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0]
2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:04:53 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:05<00:00,  2.65s/it]

2025/05/25 19:04:58 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:04:58 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 2'].
2025/05/25 19:04:58 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:04:58 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:04:58 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:04<00:00,  2.08s/it]

2025/05/25 19:05:02 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:00<00:00, 2732.45it/s]

2025/05/25 19:05:02 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 0'].
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 8 / 10 =====







Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:00<00:00, 3049.29it/s]

2025/05/25 19:05:02 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 9 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:00<00:00, 3788.89it/s]

2025/05/25 19:05:02 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 4'].
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 10 / 10 =====



Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:00<00:00, 2209.85it/s]

2025/05/25 19:05:02 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 5'].
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 11 / 10 =====







Average Metric: 2.00 / 2 (100.0%): 100%|██████████| 2/2 [00:00<00:00, 3283.21it/s]

2025/05/25 19:05:02 INFO dspy.evaluate.evaluate: Average Metric: 2.0 / 2 (100.0%)
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 0'].
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/05/25 19:05:02 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 100.0!
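
Every trial in the log above scored 100.0 on a validation set of two examples, so the search had no signal for preferring one instruction or few-shot set over another. The sketch below flags that situation from a list of trial scores. `summarize_trial_scores` is a hypothetical helper, not a DSPy function, and the scores are copied by hand from the final "Scores so far" log line.

```python
def summarize_trial_scores(scores, tolerance=1e-9):
    """Return basic stats and whether the scores are too uniform to rank candidates."""
    if not scores:
        raise ValueError("scores must not be empty")
    spread = max(scores) - min(scores)
    return {
        "trials": len(scores),
        "best": max(scores),
        "spread": spread,
        "saturated": len(scores) > 1 and spread <= tolerance,
    }


# Values copied from the final "Scores so far" log line above (11 entries, all 100.0).
logged_scores = [100.0] * 11
report = summarize_trial_scores(logged_scores)
print(report)
```

A saturated result like this usually suggests trying a larger or harder validation set, or a stricter metric, before treating the returned program as meaningfully better than the baseline.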





In [5]:
# 5. Test the Optimized Program
test_reviews = [
    "The app is easy to use and very intuitive.",
    "Customer support was quick to respond.",
    "Some features are missing compared to competitors."
]

# Test Czech
czech_result = optimized_summarizer(
    reviews="\n".join(test_reviews),
    language=SupportedLanguage.CZECH
)
print("Czech Summary:", czech_result.summary)

# Test Slovak
slovak_result = optimized_summarizer(
    reviews="\n".join(test_reviews),
    language=SupportedLanguage.SLOVAK
)
print("\nSlovak Summary:", slovak_result.summary)

Czech Summary: Tato aplikace je jednoduchá na používání a velmi intuitivní, což zajišťuje pohodlné ovládání. Podpora zákazníků je rychlá a spolehlivá, což zvyšuje vaši jistotu při používání. Přestože některé funkce chybí ve srovnání s konkurencí, celkově nabízí skvělý zážitek a je skvělou volbou pro ty, kteří hledají jednoduchost a rychlou podporu.

Slovak Summary: Táto aplikácia je jednoduchá na používanie a veľmi intuitívna, čo z nej robí ideálnu voľbu pre každého. Rýchla a efektívna zákaznícka podpora zabezpečuje, že vaše otázky budú vždy rýchlo vyriešené. Hoci niektoré funkcie chýbajú v porovnaní s konkurenciou, jej jednoduché ovládanie a spoľahlivosť ju robia skvelou voľbou pre tých, ktorí hľadajú efektívne riešenie s ľahkým ovládaním.
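
The training examples pass `language` as the plain strings "cs" and "sk", while the test cell passes `SupportedLanguage` members. If the program does not already normalize this internally, the prompt seen at inference time can differ from the one seen during optimization. A minimal sketch of normalizing both forms to a code string follows. The `Language` enum is a hypothetical stand-in, because the definition of `SupportedLanguage` is not shown here; it is assumed that its values are the two language codes.

```python
from enum import Enum


class Language(Enum):
    """Hypothetical stand-in for app.llm.language.SupportedLanguage."""

    CZECH = "cs"
    SLOVAK = "sk"


def to_language_code(language):
    """Return the lowercase language code for an enum member or a plain string."""
    if isinstance(language, Enum):
        language = language.value
    code = str(language).strip().lower()
    if code not in {member.value for member in Language}:
        raise ValueError(f"Unsupported language: {language!r}")
    return code


print(to_language_code(Language.CZECH))  # cs
print(to_language_code(" SK "))          # sk
```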


In [6]:
# 6. Inspect the Most Recent LM Calls
dspy.inspect_history(n=2)





[2025-05-25T19:05:09.286843]

System message:

Your input fields are:
1. `reviews` (str): All user reviews as a markdown list.
2. `language` (str): The language code for the output summary (cs = czech or sk = slovak).
Your output fields are:
1. `summary` (str): A persuasive summary of the reviews in the specified language.
All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## reviews ## ]]
{reviews}

[[ ## language ## ]]
{language}

[[ ## summary ## ]]
{summary}

[[ ## completed ## ]]
In adhering to this structure, your objective is: 
        Summarize the provided user reviews to maximize purchase intent.
        # Role
        You are an expert review summarizer. You know what makes people tick and buy when they read a review summary.
        # Instructions
        - Summarize the provided reviews
        - Adjust the format so that people are more likely to purchase
        - Provide the summary in the specified

In [7]:
# 7. Save and Load the Optimized Program
# Ensure the output directory exists
output_dir = os.path.abspath('../prompt')
os.makedirs(output_dir, exist_ok=True)
output_path = os.path.join(output_dir, "optimized_summarizer.json")

optimized_summarizer.save(output_path)

loaded_summarizer = ReviewSummarizer()
loaded_summarizer.load(path=output_path) 
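
After saving, it can be useful to confirm that the file on disk is readable before relying on it elsewhere. The sketch below assumes that saving to a `.json` path writes a plain JSON object; `describe_saved_state` is a hypothetical helper, and the keys in the demonstration file are made up for illustration rather than taken from DSPy's actual state layout.

```python
import json
import os
import tempfile


def describe_saved_state(path):
    """Load a saved JSON state file and return its sorted top-level keys."""
    with open(path, encoding="utf-8") as handle:
        state = json.load(handle)
    if not isinstance(state, dict):
        raise ValueError(f"Expected a JSON object in {path}")
    return sorted(state)


# Demonstration with a stand-in file; the keys below are illustrative only.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo_state.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump({"predictor": {"demos": []}, "metadata": {}}, handle)
    keys = describe_saved_state(demo_path)

print(keys)
```

In the notebook, the same helper could be pointed at `output_path` to see which components were persisted.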