<a href="https://colab.research.google.com/github/wesslen/llm-examples/blob/main/notebooks/dspy_intro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
!uv pip install --system dspy==2.6.15
!uv pip install --system sacrebleu

In [2]:
import dspy

dspy.__version__

'2.6.15'

In [3]:
from google.colab import userdata

lm = dspy.LM("gemini/gemini-2.5-flash-lite", api_key=userdata.get('GOOGLE_API_KEY'))
dspy.configure(lm=lm)

In [4]:
lm("What is 2+2")

['2 + 2 = 4']

### dspy.Predict and dspy.ChainOfThought

In [17]:
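# dspy.Predict runs inference with DSPy's default prompt template.
# Swap in dspy.ChainOfThought to add a `reasoning` output field
# (the Prediction shown below includes one, so it appears to come from a ChainOfThought run).
translate = dspy.Predict("source -> translation")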

translate(source="$200 de bonificación de recompensas en efectivo")

Prediction(
    reasoning='The user wants to translate the Spanish phrase "$200 de bonificación de recompensas en efectivo" into English. The phrase translates to "$200 cash rewards bonus".',
    translation='$200 cash rewards bonus'
)

In [6]:
lm.inspect_history(n=1)





[2025-08-16T20:41:08.238140]

System message:

Your input fields are:
1. `source` (str)

Your output fields are:
1. `translation` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## source ## ]]
{source}

[[ ## translation ## ]]
{translation}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `source`, produce the fields `translation`.


User message:

[[ ## source ## ]]
$200 de bonificación de recompensas en efectivo

Respond with the corresponding output fields, starting with the field `[[ ## translation ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## translation ## ]]
$200 cash rewards bonus

[[ ## completed ## ]]
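
The history above shows the delimiter format the model is asked to follow: each field is introduced by a `[[ ## name ## ]]` header. As a rough illustration only, a response in that shape could be split into fields as below. This is not DSPy's own parser; `parse_fields` and `HEADER` are hypothetical names introduced for this sketch.

```python
import re

# Hypothetical helper for illustration; DSPy's adapters do their own parsing.
HEADER = re.compile(r"^\[\[ ## (\w+) ## \]\]\s*$")

def parse_fields(response: str) -> dict:
    """Split a '[[ ## name ## ]]'-delimited response into {name: value}."""
    fields, current, buffer = {}, None, []
    for line in response.splitlines():
        match = HEADER.match(line.strip())
        if match:
            # A new header closes the field collected so far.
            if current is not None:
                fields[current] = "\n".join(buffer).strip()
            current, buffer = match.group(1), []
        elif current is not None:
            buffer.append(line)
    if current is not None:
        fields[current] = "\n".join(buffer).strip()
    # The 'completed' marker only signals the end of the output.
    fields.pop("completed", None)
    return fields

response = """[[ ## translation ## ]]
$200 cash rewards bonus

[[ ## completed ## ]]"""

print(parse_fields(response))
```

The same function handles a ChainOfThought response, where a `reasoning` header precedes `translation`.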







### Custom dspy.Signature

In [54]:
class Translation(dspy.Signature):
    """Translate to English"""
    # lang = dspy.InputField(desc="language to translate from")
    source = dspy.InputField(desc="a sentence for translation")
    translation = dspy.OutputField(desc="translation in english")

translate = dspy.ChainOfThought(Translation)

translate

predict = Predict(StringSignature(source -> reasoning, translation
    instructions='Translate to English'
    source = Field(annotation=str required=True json_schema_extra={'desc': 'a sentence for translation', '__dspy_field_type': 'input', 'prefix': 'Source:'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    translation = Field(annotation=str required=True json_schema_extra={'desc': 'translation in english', '__dspy_field_type': 'output', 'prefix': 'Translation:'})
))

### dspy.ReAct

In [46]:
# Tool for the ReAct agent: extract an integer from a free-text answer

def parse_integer_answer(answer, only_first_line=True):
    try:
        if only_first_line:
            answer = answer.strip().split('\n')[0]

        # find the last token that has a number in it
        answer = [token for token in answer.split() if any(c.isdigit() for c in token)][-1]
        answer = answer.split('.')[0]
        answer = ''.join([c for c in answer if c.isdigit()])
        answer = int(answer)

    except (ValueError, IndexError):
        # print(answer)
        answer = 0

    return answer

# Give the tool to a ReAct agent; max_iters caps the number of reasoning/tool-use steps
react_module = dspy.ReAct("question -> answer", tools=[parse_integer_answer], max_iters=2)

question = 'Sarah has 5 apples. She buys 7 more apples from the store. How many apples does Sarah have now?'
result = react_module(question=question)

print(f"Question: {question}")
print(f"Final Predicted Answer (after ReAct process): {result.answer}")

Question: Sarah has 5 apples. She buys 7 more apples from the store. How many apples does Sarah have now?
Final Predicted Answer (after ReAct process): 12
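
To make the tool's behaviour concrete, the helper passed to `dspy.ReAct` above can be exercised on its own, without an LM. The function body is copied from the cell above; the sample strings are made up for illustration.

```python
def parse_integer_answer(answer, only_first_line=True):
    try:
        if only_first_line:
            answer = answer.strip().split('\n')[0]

        # find the last token that has a number in it
        answer = [token for token in answer.split() if any(c.isdigit() for c in token)][-1]
        answer = answer.split('.')[0]
        answer = ''.join([c for c in answer if c.isdigit()])
        answer = int(answer)

    except (ValueError, IndexError):
        answer = 0

    return answer

# Illustrative inputs (not from the notebook run)
samples = [
    "The answer is 12.",
    "about 1,234 apples",
    "3.75",
    "-5",
    "no digits here",
]

for text in samples:
    print(repr(text), "->", parse_integer_answer(text))
```

Note the edge cases: decimals are truncated at the first `.`, a leading minus sign is dropped because only digit characters are kept, and text without any digits falls back to `0`.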


In [47]:
lm.inspect_history(n=1)





[2025-08-16T21:29:27.648520]

System message:

Your input fields are:
1. `question` (str)
2. `trajectory` (str)

Your output fields are:
1. `reasoning` (str)
2. `answer` (str)

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## question ## ]]
{question}

[[ ## trajectory ## ]]
{trajectory}

[[ ## reasoning ## ]]
{reasoning}

[[ ## answer ## ]]
{answer}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Given the fields `question`, produce the fields `answer`.


User message:

[[ ## question ## ]]
Sarah has 5 apples. She buys 7 more apples from the store. How many apples does Sarah have now?

[[ ## trajectory ## ]]
[[ ## thought_0 ## ]]
The user is asking a simple arithmetic question. I need to calculate the total number of apples Sarah has. Sarah starts with 5 apples and buys 7 more. So, the total is 5 + 7 = 12. I should use the `parse_integer_answer` tool to provide the 

### dspy.CodeAct

In [49]:
# requires dspy >= 3.0.0 -- see https://github.com/stanfordnlp/dspy/releases/tag/3.0.0
# from dspy import CodeAct

# def factorial(n):
#     """Calculate factorial of n"""
#     if n <= 1:  # also covers n == 0, which would otherwise recurse forever
#         return 1
#     return n * factorial(n-1)

# act = CodeAct("n->factorial", tools=[factorial])
# result = act(n=5)
# result # Returns 120

### dspy.Parallel

In [50]:
import dspy

parallel = dspy.Parallel(num_threads=2)
predict = dspy.Predict("question -> answer")
result = parallel(
    [
        (predict, dspy.Example(question="1+1").with_inputs("question")),
        (predict, dspy.Example(question="2+2").with_inputs("question"))
    ]
)
result

Processed 2 / 2 examples: 100%|██████████| 2/2 [00:01<00:00,  1.90it/s]


[Prediction(
     answer='2'
 ),
 Prediction(
     answer='4'
 )]

## Evaluation

### Metrics

In [91]:
# Recall the Translation signature from earlier, now with a `lang` input field

class Translation(dspy.Signature):
    """Translate to English"""
    lang = dspy.InputField(desc="language to translate from")
    source = dspy.InputField(desc="a sentence for translation")
    translation = dspy.OutputField(desc="translation in english")

translate = dspy.ChainOfThought(Translation)

In [83]:
from sacrebleu.metrics import BLEU, CHRF
import dspy

def bleu_metric(example, pred, trace=None):
    """
    DSPy metric for BLEU score using sacreBLEU.

    Args:
        example: Training/dev example containing reference text(s)
        pred: Prediction from DSPy program containing generated text

    Returns:
        float: BLEU score (0-100 scale) normalized to 0-1 for DSPy

    Expected format:
        - example.reference: reference text (string)
        - pred.translation: generated text (string)
    """
    # sacreBLEU expects a list of hypotheses and a list of reference streams
    references = [[example.reference]]  # one reference stream holding one sentence
    hypotheses = [pred.translation]

    # Calculate BLEU score
    bleu = BLEU()
    score = bleu.corpus_score(hypotheses, references)

    # Return normalized score (BLEU is 0-100, normalize to 0-1)
    return score.score / 100.0

example = dspy.Example(
    source="¿Puedo domiciliar la nómina?",
    reference="Can I set up direct deposit for my salary?",
    lang="Spanish"
    ).with_inputs("source", "lang")  # `reference` is the label, not an input

pred = translate(source=example.source, lang=example.lang)

print(pred)
print(f"Ground Truth: {example.reference}")
print(f"BLEU Score: {bleu_metric(example, pred):.3f}")

Prediction(
    reasoning='The user wants to translate the Spanish sentence "¿Puedo domiciliar la nómina?" into English.\nThe sentence translates to "Can I direct deposit my salary?" or "Can I have my salary paid into my account?". "Domiciliar la nómina" specifically refers to setting up direct deposit for one\'s salary.',
    translation='Can I direct deposit my salary?'
)
Ground Truth: Can I set up direct deposit for my salary?
BLEU Score: 0.234
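
A score of 0.234 for a translation the reasoning treats as correct can look surprising. BLEU rewards exact n-gram overlap and penalizes short hypotheses, so a concise paraphrase loses points quickly. The sketch below counts the overlaps by hand. It is an approximation under stated assumptions, not sacreBLEU itself: the regex tokenizer is a simplification, and the handling of zero-match orders follows my understanding of sacreBLEU's default `exp` smoothing, so treat the printed score as indicative only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (assumption): words and punctuation as separate tokens.
    return re.findall(r"\w+|[^\w\s]", text)

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_matches(hyp, ref, n):
    """Return (matched, total) n-gram counts, clipping by reference counts."""
    hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched, sum(hyp_counts.values())

hyp = tokenize("Can I direct deposit my salary?")
ref = tokenize("Can I set up direct deposit for my salary?")

stats = [clipped_matches(hyp, ref, n) for n in range(1, 5)]

# Zero-match orders get a halving pseudo-precision (assumed 'exp' smoothing).
log_sum, penalty = 0.0, 1
for matched, total in stats:
    if matched == 0:
        penalty *= 2
        precision = 1.0 / (penalty * total)
    else:
        precision = matched / total
    log_sum += math.log(precision)

# Brevity penalty: hypotheses shorter than the reference are scaled down.
brevity = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
approx_bleu = brevity * math.exp(log_sum / 4)

for n, (matched, total) in enumerate(stats, start=1):
    print(f"{n}-gram precision: {matched}/{total}")
print(f"approximate BLEU: {approx_bleu:.3f}")
```

Every unigram matches, but only one trigram and no 4-gram does, and the hypothesis is three tokens shorter than the reference. This is one reason to pair BLEU with a judge-style check on small datasets.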


### LLM-as-Judge

In [74]:
class FactJudge(dspy.Signature):
    """Judge if the answer is translated correctly based on the reference (gold) standard."""

    source = dspy.InputField(desc="Source text for translation")
    reference = dspy.InputField(desc="Reference (gold) translation for the source text")
    translation = dspy.InputField(desc="Translation of the source text")
    factually_correct: bool = dspy.OutputField(desc="Is the translation correct based on the reference (gold) standard?")

judge = dspy.ChainOfThought(FactJudge)

judgment = judge(source=example.source, reference=example.reference, translation=pred.translation)

judgment

Prediction(
    reasoning='The translation "Can I direct deposit my salary?" is a good and accurate translation of the source text "¿Puedo domiciliar la nómina?". It captures the meaning of setting up direct deposit for a salary. The reference translation "Can I set up direct deposit for my salary?" is also accurate and conveys the same meaning. The provided translation is slightly more concise but equally correct.',
    factually_correct=True
)

### Development Dataset

In [84]:
import pandas as pd

# Assume our ground truth eval dataset is a pandas dataframe
trainset_df = pd.DataFrame({
    'source': [
        "¿Puedo domiciliar la nómina?",
        "Je veux transférer de l'argent.",
        "Quel est le solde de mon compte chèques?",
        "Quiero pagar mi factura.",
        "Necesito un extracto bancario."
    ],
    'reference': [
        "Can I set up direct deposit for my salary?",
        "I want to transfer money.",
        "What is the balance of my checking account?",
        "I want to pay my bill.",
        "I need a bank statement."
    ],
    'lang': [
        "Spanish",
        "French Canadian",
        "French Canadian",
        "Spanish",
        "Spanish"
    ]
})

devset_df = pd.DataFrame({
    'source': [
        "¿Me pueden abonar los intereses del plazo fijo que vence hoy?",
        "Je veux faire un virement par Interac e-Transfer à mon conjoint.",
        "¿Cuándo se me cargará la comisión por sobregiro?",
        "Mon hypothèque est-elle assurable par la SCHL?",
        "Necesito desbloquear mi tarjeta porque salté el PIN tres veces."
    ],
    'reference': [
        "Can you credit me the interest from my term deposit that matures today?",
        "I want to make an Interac e-Transfer to my spouse.",
        "When will I be charged the overdraft fee?",
        "Is my mortgage insurable by CMHC?",
        "I need to unlock my card because I entered the wrong PIN three times."
    ],
    'lang': [
        "Spanish",
        "French Canadian",
        "Spanish",
        "French Canadian",
        "Spanish"
    ]
})

devset = [
    dspy.Example(source=row.source, reference=row.reference, lang=row.lang).with_inputs("source", "lang")
    for _, row in devset_df.iterrows()
]

trainset = [
    dspy.Example(source=row.source, reference=row.reference, lang=row.lang).with_inputs("source", "lang")
    for _, row in trainset_df.iterrows()
]

### Run Evaluation

In [97]:
from dspy.evaluate import Evaluate

evaluate_program = Evaluate(devset=devset, metric=bleu_metric, num_threads=2, display_progress=True, display_table=5)

evaluate_program(translate)

Average Metric: 3.67 / 5 (73.4%): 100%|██████████| 5/5 [00:00<00:00, 988.94it/s]

2025/08/16 22:22:54 INFO dspy.evaluate.evaluate: Average Metric: 3.671174247859164 / 5 (73.4%)





| | source | reference | lang | reasoning | translation | bleu_metric |
|---|---|---|---|---|---|---|
| 0 | ¿Me pueden abonar los intereses del plazo fijo que vence hoy? | Can you credit me the interest from my term deposit that matures t... | Spanish | The user wants to translate a Spanish sentence into English. The s... | Can I be credited with the interest from the fixed-term deposit th... | ✔️ [0.339] |
| 1 | Je veux faire un virement par Interac e-Transfer à mon conjoint. | I want to make an Interac e-Transfer to my spouse. | French Canadian | The user wants to translate a sentence from French Canadian to Eng... | I want to make an Interac e-Transfer to my spouse. | ✔️ [1.000] |
| 2 | ¿Cuándo se me cargará la comisión por sobregiro? | When will I be charged the overdraft fee? | Spanish | The user wants to translate a Spanish sentence into English. The s... | When will I be charged the overdraft fee? | ✔️ [1.000] |
| 3 | Mon hypothèque est-elle assurable par la SCHL? | Is my mortgage insurable by CMHC? | French Canadian | The user wants to translate a question from French Canadian to Eng... | Is my mortgage insurable by the CMHC? | ✔️ [0.595] |
| 4 | Necesito desbloquear mi tarjeta porque salté el PIN tres veces. | I need to unlock my card because I entered the wrong PIN three times. | Spanish | The user wants to translate a Spanish sentence into English. The s... | I need to unlock my card because I entered the PIN incorrectly thr... | ✔️ [0.738] |


73.42
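
The value returned by `evaluate_program` is the mean of the per-example metric scores, expressed as a percentage. A quick check using the rounded scores from the `bleu_metric` column above (the small gap to 73.42 comes from that rounding):

```python
# Rounded per-example BLEU scores copied from the evaluation table
row_scores = [0.339, 1.000, 1.000, 0.595, 0.738]

total = sum(row_scores)
average_pct = 100 * total / len(row_scores)

print(f"{total:.2f} / {len(row_scores)} ({average_pct:.1f}%)")
```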

In [98]:
lm.inspect_history(n=1)





[2025-08-16T22:22:54.778838]

System message:

Your input fields are:
1. `lang` (str): language to translate from
2. `source` (str): a sentence for translation

Your output fields are:
1. `reasoning` (str)
2. `translation` (str): translation in english

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## lang ## ]]
{lang}

[[ ## source ## ]]
{source}

[[ ## reasoning ## ]]
{reasoning}

[[ ## translation ## ]]
{translation}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Translate to English


User message:

[[ ## lang ## ]]
Spanish

[[ ## source ## ]]
Necesito desbloquear mi tarjeta porque salté el PIN tres veces.

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## translation ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
The user wants to tran

In [None]:
# Exercise: loop through `devset` and score each translation with your judge
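
One possible way to approach this exercise is sketched below. `judge_pass_rate` is a hypothetical helper; it only assumes that `program` and `judge` are callables accepting the keyword arguments used earlier in this notebook and returning objects with `.translation` and `.factually_correct` attributes. The stub callables and examples are stand-ins so the sketch runs without an LM.

```python
from types import SimpleNamespace

def judge_pass_rate(program, judge, examples):
    """Fraction of examples whose translation the judge marks as correct."""
    verdicts = []
    for example in examples:
        pred = program(source=example.source, lang=example.lang)
        judgment = judge(
            source=example.source,
            reference=example.reference,
            translation=pred.translation,
        )
        verdicts.append(bool(judgment.factually_correct))
    return sum(verdicts) / len(verdicts) if verdicts else 0.0

# Stand-ins so the sketch runs offline; swap in `translate`, `judge`, and `devset`.
def fake_translate(source, lang):
    return SimpleNamespace(translation=source.upper())

def fake_judge(source, reference, translation):
    return SimpleNamespace(factually_correct=(translation == reference))

examples = [
    SimpleNamespace(source="hola", lang="Spanish", reference="HOLA"),
    SimpleNamespace(source="adios", lang="Spanish", reference="goodbye"),
]

print(judge_pass_rate(fake_translate, fake_judge, examples))
```

With the real objects, the call would be `judge_pass_rate(translate, judge, devset)`, which makes one judge call per dev example.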

## DSPy Optimizers

### LabeledFewShot

In [99]:
from dspy.teleprompt import LabeledFewShot

labeled_fewshot_optimizer = LabeledFewShot(k=3)
labeled_fewshot_program_compiled = labeled_fewshot_optimizer.compile(student=translate, trainset=trainset, sample=True)

In [100]:
evaluate_program(labeled_fewshot_program_compiled)

Average Metric: 3.67 / 5 (73.4%): 100%|██████████| 5/5 [00:00<00:00, 600.83it/s]

2025/08/16 22:23:45 INFO dspy.evaluate.evaluate: Average Metric: 3.671174247859164 / 5 (73.4%)





| | source | reference | lang | reasoning | translation | bleu_metric |
|---|---|---|---|---|---|---|
| 0 | ¿Me pueden abonar los intereses del plazo fijo que vence hoy? | Can you credit me the interest from my term deposit that matures t... | Spanish | The user wants to translate a Spanish sentence into English. The s... | Can I be credited with the interest from the fixed-term deposit th... | ✔️ [0.339] |
| 1 | Je veux faire un virement par Interac e-Transfer à mon conjoint. | I want to make an Interac e-Transfer to my spouse. | French Canadian | The user wants to translate a sentence from French Canadian to Eng... | I want to make an Interac e-Transfer to my spouse. | ✔️ [1.000] |
| 2 | ¿Cuándo se me cargará la comisión por sobregiro? | When will I be charged the overdraft fee? | Spanish | The user wants to translate a Spanish sentence into English. The s... | When will I be charged the overdraft fee? | ✔️ [1.000] |
| 3 | Mon hypothèque est-elle assurable par la SCHL? | Is my mortgage insurable by CMHC? | French Canadian | The user wants to translate a question from French Canadian to Eng... | Is my mortgage insurable by the CMHC? | ✔️ [0.595] |
| 4 | Necesito desbloquear mi tarjeta porque salté el PIN tres veces. | I need to unlock my card because I entered the wrong PIN three times. | Spanish | The user wants to translate a Spanish sentence into English. The s... | I need to unlock my card because I entered the PIN incorrectly thr... | ✔️ [0.738] |


73.42

In [101]:
lm.inspect_history(n=1)





[2025-08-16T22:23:45.381296]

System message:

Your input fields are:
1. `lang` (str): language to translate from
2. `source` (str): a sentence for translation

Your output fields are:
1. `reasoning` (str)
2. `translation` (str): translation in english

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## lang ## ]]
{lang}

[[ ## source ## ]]
{source}

[[ ## reasoning ## ]]
{reasoning}

[[ ## translation ## ]]
{translation}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Translate to English


User message:

[[ ## lang ## ]]
Spanish

[[ ## source ## ]]
Necesito desbloquear mi tarjeta porque salté el PIN tres veces.

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## translation ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Response:

[[ ## reasoning ## ]]
The user wants to tran

### BootstrapFewShot

In [102]:
from dspy.teleprompt import BootstrapFewShot

fewshot_optimizer = BootstrapFewShot(metric=bleu_metric, max_bootstrapped_demos=4, max_labeled_demos=16, max_rounds=1, max_errors=10)

bootstrap_fewshot_program_compiled = fewshot_optimizer.compile(student = translate, trainset=trainset)


 80%|████████  | 4/5 [00:01<00:00,  3.54it/s]

Bootstrapped 4 full traces after 4 examples for up to 1 rounds, amounting to 4 attempts.
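
The log line above reports traces being "bootstrapped". Conceptually, the optimizer runs the program on training examples and keeps the runs that pass the metric as few-shot demonstrations. The following is a heavily simplified sketch of that idea and not DSPy's implementation: `bootstrap_demos`, the stub program, the exact-match metric, and the toy data are all hypothetical. In this sketch a run passes when its score is truthy, or when it meets `threshold` if one is given.

```python
from types import SimpleNamespace

def bootstrap_demos(program, metric, trainset, max_demos=4, threshold=None):
    """Collect up to `max_demos` (example, prediction) pairs that pass `metric`."""
    demos = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        pred = program(example)
        score = metric(example, pred)
        passed = bool(score) if threshold is None else score >= threshold
        if passed:
            demos.append((example, pred))
    return demos

# Toy stand-ins so the sketch runs without an LM.
lookup = {"hola": "hello", "gracias": "thanks", "adios": "bye"}

def stub_program(example):
    return SimpleNamespace(translation=lookup.get(example.source, ""))

def exact_match(example, pred):
    return float(pred.translation == example.reference)

trainset = [
    SimpleNamespace(source="hola", reference="hello"),
    SimpleNamespace(source="adios", reference="goodbye"),
    SimpleNamespace(source="gracias", reference="thanks"),
]

demos = bootstrap_demos(stub_program, exact_match, trainset, max_demos=2)
print([example.source for example, _ in demos])
```

Because a graded metric such as BLEU is rarely exactly zero, a truthiness check lets almost every run through; passing a threshold makes the filter stricter.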





In [103]:
evaluate_program(bootstrap_fewshot_program_compiled)

Average Metric: 4.18 / 5 (83.6%): 100%|██████████| 5/5 [00:03<00:00,  1.45it/s]

2025/08/16 22:26:22 INFO dspy.evaluate.evaluate: Average Metric: 4.1783394171316885 / 5 (83.6%)





| | source | reference | lang | reasoning | translation | bleu_metric |
|---|---|---|---|---|---|---|
| 0 | ¿Me pueden abonar los intereses del plazo fijo que vence hoy? | Can you credit me the interest from my term deposit that matures t... | Spanish | The user wants to translate the Spanish sentence "¿Me pueden abona... | Can you credit the interest from the fixed-term deposit that matur... | ✔️ [0.441] |
| 1 | Je veux faire un virement par Interac e-Transfer à mon conjoint. | I want to make an Interac e-Transfer to my spouse. | French Canadian | The user wants to translate a sentence from French Canadian to Eng... | I want to make an Interac e-Transfer to my spouse. | ✔️ [1.000] |
| 2 | ¿Cuándo se me cargará la comisión por sobregiro? | When will I be charged the overdraft fee? | Spanish | The user wants to translate the Spanish sentence "¿Cuándo se me ca... | When will I be charged the overdraft fee? | ✔️ [1.000] |
| 3 | Mon hypothèque est-elle assurable par la SCHL? | Is my mortgage insurable by CMHC? | French Canadian | The user wants to translate a sentence from French Canadian to Eng... | Is my mortgage insurable by CMHC? | ✔️ [1.000] |
| 4 | Necesito desbloquear mi tarjeta porque salté el PIN tres veces. | I need to unlock my card because I entered the wrong PIN three times. | Spanish | The user wants to translate the Spanish sentence "Necesito desbloq... | I need to unlock my card because I entered the PIN incorrectly thr... | ✔️ [0.738] |


83.57

In [104]:
lm.inspect_history(n=1)





[2025-08-16T22:26:22.637068]

System message:

Your input fields are:
1. `lang` (str): language to translate from
2. `source` (str): a sentence for translation

Your output fields are:
1. `reasoning` (str)
2. `translation` (str): translation in english

All interactions will be structured in the following way, with the appropriate values filled in.

[[ ## lang ## ]]
{lang}

[[ ## source ## ]]
{source}

[[ ## reasoning ## ]]
{reasoning}

[[ ## translation ## ]]
{translation}

[[ ## completed ## ]]

In adhering to this structure, your objective is: 
        Translate to English


User message:

[[ ## lang ## ]]
Spanish

[[ ## source ## ]]
¿Puedo domiciliar la nómina?

Respond with the corresponding output fields, starting with the field `[[ ## reasoning ## ]]`, then `[[ ## translation ## ]]`, and then ending with the marker for `[[ ## completed ## ]]`.


Assistant message:

[[ ## reasoning ## ]]
The user wants to translate the Spanish sentence "¿Pu

### MIPROv2

In [106]:
# Import the optimizer
from dspy.teleprompt import MIPROv2

# Initialize optimizer
teleprompter = MIPROv2(
    metric=bleu_metric,
    auto="light", # Can choose between light, medium, and heavy optimization runs
)

# Optimize program
print("Optimizing program with MIPRO...")
optimized_program = teleprompter.compile(
    translate.deepcopy(),
    trainset=trainset,
    max_bootstrapped_demos=2,
    max_labeled_demos=3,
)

# Save the optimized program for future use
optimized_program.save("./mipro_optimized_v1.json")

# Evaluate the optimized program
print("Evaluate optimized program...")
evaluate_program(optimized_program, devset=devset[:])

2025/08/16 22:30:59 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: False
num_candidates: 5
valset size: 4



Optimizing program with MIPRO...
[93m[1mProjected Language Model (LM) Calls[0m

Based on the parameters you have set, the maximum number of LM calls is projected as follows:

[93m- Prompt Generation: [94m[1m10[0m[93m data summarizer calls + [94m[1m5[0m[93m * [94m[1m1[0m[93m lm calls in program + ([94m[1m2[0m[93m) lm calls in program-aware proposer = [94m[1m17[0m[93m prompt model calls[0m
[93m- Program Evaluation: [94m[1m4[0m[93m examples in val set * [94m[1m7[0m[93m batches = [94m[1m28[0m[93m LM program calls[0m

[93m[1mEstimated Cost Calculation:[0m

[93mTotal Cost = (Number of calls to task model * (Avg Input Token Length per Call * Task Model Price per Input Token + Avg Output Token Length per Call * Task Model Price per Output Token)
            + (Number of program calls * (Avg Input Token Length per Call * Task Prompt Price per Input Token + Avg Output Token Length per Call * Prompt Model Price per Output Token).[0m

For a preliminary e

2025/08/16 22:31:01 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/08/16 22:31:01 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates for our program and for creating instructions.

2025/08/16 22:31:01 INFO dspy.teleprompt.mipro_optimizer_v2: Bootstrapping N=5 sets of demonstrations...


Bootstrapping set 1/5
Bootstrapping set 2/5
Bootstrapping set 3/5


100%|██████████| 1/1 [00:00<00:00, 448.11it/s]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 4/5


100%|██████████| 1/1 [00:00<00:00, 492.12it/s]


Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.
Bootstrapping set 5/5


100%|██████████| 1/1 [00:00<00:00, 493.91it/s]
2025/08/16 22:31:01 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 2: PROPOSE INSTRUCTION CANDIDATES <==
2025/08/16 22:31:01 INFO dspy.teleprompt.mipro_optimizer_v2: We will use the few-shot examples from the previous step, a generated dataset summary, a summary of the program code, and a randomly selected prompting tip to propose instructions.
2025/08/16 22:31:01 INFO dspy.teleprompt.mipro_optimizer_v2: 
Proposing instructions...



Bootstrapped 1 full traces after 0 examples for up to 1 rounds, amounting to 1 attempts.


2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: Proposed Instructions for Predictor 0:

2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: 0: Translate to English

2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: 1: You are a professional translator tasked with translating Spanish questions into English. Your translation must be accurate and natural-sounding. Provide a step-by-step reasoning process for your translation before giving the final English translation.

Lang: Spanish
Source: ¿Cuál es el plazo de entrega?
Reasoning: Let's think step by step in order to The user is asking about a delivery timeframe in Spanish. The phrase "¿Cuál es el plazo de entrega?" directly translates to "What is the delivery time?" or "What is the delivery deadline?". In a business context, "delivery time" is more common and natural.
Translation: What is the delivery time?

2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: 2: You are a highly skilled translat

Average Metric: 4.00 / 4 (100.0%): 100%|██████████| 4/4 [00:00<00:00, 527.60it/s]

2025/08/16 22:31:02 INFO dspy.evaluate.evaluate: Average Metric: 4.000000000000002 / 4 (100.0%)
2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: Default program score: 100.0

2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 2 / 7 =====



Average Metric: 3.66 / 4 (91.5%): 100%|██████████| 4/4 [00:00<00:00, 885.67it/s]

2025/08/16 22:31:02 INFO dspy.evaluate.evaluate: Average Metric: 3.6606328636027627 / 4 (91.5%)
2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 91.52 with parameters ['Predictor 0: Instruction 1', 'Predictor 0: Few-Shot Set 1'].
2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 91.52]
2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/16 22:31:02 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 3 / 7 =====



Average Metric: 2.10 / 4 (52.5%): 100%|██████████| 4/4 [00:00<00:00, 607.36it/s]

2025/08/16 22:31:03 INFO dspy.evaluate.evaluate: Average Metric: 2.0998009940387377 / 4 (52.5%)
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 52.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1'].
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 91.52, 52.5]
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 4 / 7 =====



Average Metric: 3.25 / 4 (81.3%): 100%|██████████| 4/4 [00:00<00:00, 1135.05it/s]

2025/08/16 22:31:03 INFO dspy.evaluate.evaluate: Average Metric: 3.251271577077013 / 4 (81.3%)
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 81.28 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 1'].
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 91.52, 52.5, 81.28]
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 5 / 7 =====



Average Metric: 2.10 / 4 (52.5%): 100%|██████████| 4/4 [00:00<00:00, 601.85it/s]

2025/08/16 22:31:03 INFO dspy.evaluate.evaluate: Average Metric: 2.0998009940387377 / 4 (52.5%)
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 52.5 with parameters ['Predictor 0: Instruction 2', 'Predictor 0: Few-Shot Set 1'].
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 91.52, 52.5, 81.28, 52.5]
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 6 / 7 =====



Average Metric: 4.00 / 4 (100.0%): 100%|██████████| 4/4 [00:00<00:00, 1238.99it/s]

2025/08/16 22:31:03 INFO dspy.evaluate.evaluate: Average Metric: 4.000000000000002 / 4 (100.0%)
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 4', 'Predictor 0: Few-Shot Set 3'].
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 91.52, 52.5, 81.28, 52.5, 100.0]
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: ===== Trial 7 / 7 =====



Average Metric: 4.00 / 4 (100.0%): 100%|██████████| 4/4 [00:00<00:00, 903.02it/s]

2025/08/16 22:31:03 INFO dspy.evaluate.evaluate: Average Metric: 4.000000000000002 / 4 (100.0%)
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Score: 100.0 with parameters ['Predictor 0: Instruction 0', 'Predictor 0: Few-Shot Set 1'].
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Scores so far: [100.0, 91.52, 52.5, 81.28, 52.5, 100.0, 100.0]
2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Best score so far: 100.0


2025/08/16 22:31:03 INFO dspy.teleprompt.mipro_optimizer_v2: Returning best identified program with score 100.0!



Evaluate optimized program...
Average Metric: 3.67 / 5 (73.4%): 100%|██████████| 5/5 [00:00<00:00, 959.97it/s]

2025/08/16 22:31:03 INFO dspy.evaluate.evaluate: Average Metric: 3.671174247859164 / 5 (73.4%)





| | source | reference | lang | reasoning | translation | bleu_metric |
|---|---|---|---|---|---|---|
| 0 | ¿Me pueden abonar los intereses del plazo fijo que vence hoy? | Can you credit me the interest from my term deposit that matures t... | Spanish | The user wants to translate a Spanish sentence into English. The s... | Can I be credited with the interest from the fixed-term deposit th... | ✔️ [0.339] |
| 1 | Je veux faire un virement par Interac e-Transfer à mon conjoint. | I want to make an Interac e-Transfer to my spouse. | French Canadian | The user wants to translate a sentence from French Canadian to Eng... | I want to make an Interac e-Transfer to my spouse. | ✔️ [1.000] |
| 2 | ¿Cuándo se me cargará la comisión por sobregiro? | When will I be charged the overdraft fee? | Spanish | The user wants to translate a Spanish sentence into English. The s... | When will I be charged the overdraft fee? | ✔️ [1.000] |
| 3 | Mon hypothèque est-elle assurable par la SCHL? | Is my mortgage insurable by CMHC? | French Canadian | The user wants to translate a question from French Canadian to Eng... | Is my mortgage insurable by the CMHC? | ✔️ [0.595] |
| 4 | Necesito desbloquear mi tarjeta porque salté el PIN tres veces. | I need to unlock my card because I entered the wrong PIN three times. | Spanish | The user wants to translate a Spanish sentence into English. The s... | I need to unlock my card because I entered the PIN incorrectly thr... | ✔️ [0.738] |


73.42
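
To put the four dev-set evaluations side by side, the scores reported in the outputs above can be tabulated with a few lines of plain Python. The labels are informal names chosen here for the sketch; the numbers are copied from the `evaluate_program` results in this notebook.

```python
# Dev-set scores copied from the evaluation outputs above
dev_scores = {
    "baseline ChainOfThought": 73.42,
    "LabeledFewShot": 73.42,
    "BootstrapFewShot": 83.57,
    "MIPROv2 (light)": 73.42,
}

baseline = dev_scores["baseline ChainOfThought"]
for name, score in dev_scores.items():
    print(f"{name:<24} {score:6.2f}  ({score - baseline:+.2f} vs. baseline)")

best = max(dev_scores, key=dev_scores.get)
print(f"best on devset: {best}")
```

With only five dev examples, the BootstrapFewShot gain reflects changes on two sentences (rows 0 and 3 in the tables above), so it should be read as a pointer for further testing and not as a stable ranking. MIPROv2 scored 100.0 on its four-example validation split yet matched the baseline on the dev set, which is consistent with that caution.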

In [107]:
optimized_program

predict = Predict(StringSignature(lang, source -> reasoning, translation
    instructions='Translate to English'
    lang = Field(annotation=str required=True json_schema_extra={'desc': 'language to translate from', '__dspy_field_type': 'input', 'prefix': 'Lang:'})
    source = Field(annotation=str required=True json_schema_extra={'desc': 'a sentence for translation', '__dspy_field_type': 'input', 'prefix': 'Source:'})
    reasoning = Field(annotation=str required=True json_schema_extra={'prefix': "Reasoning: Let's think step by step in order to", 'desc': '${reasoning}', '__dspy_field_type': 'output'})
    translation = Field(annotation=str required=True json_schema_extra={'desc': 'translation in english', '__dspy_field_type': 'output', 'prefix': 'Translation:'})
))