## Tutorial: Optimizing a Prompt

![TextGrad](https://github.com/vinid/data/blob/master/logo_full.png?raw=true)

An autograd engine -- for textual gradients!

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/zou-group/TextGrad/blob/main/examples/notebooks/Prompt-Optimization.ipynb)
[![GitHub license](https://img.shields.io/badge/License-MIT-blue.svg)](https://lbesson.mit-license.org/)
[![Arxiv](https://img.shields.io/badge/arXiv-2406.07496-B31B1B.svg)](https://arxiv.org/abs/2406.07496)
[![Documentation Status](https://readthedocs.org/projects/textgrad/badge/?version=latest)](https://textgrad.readthedocs.io/en/latest/?badge=latest)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/textgrad)](https://pypi.org/project/textgrad/)
[![PyPI](https://img.shields.io/pypi/v/textgrad)](https://pypi.org/project/textgrad/)

**Objectives:**

* In this tutorial, we will optimize a system prompt with TextGrad on the BBH object-counting task.

**Requirements:**

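* You need an Anthropic API key to run this tutorial, because the engines below use `claude-3-opus-20240229`. The first cell reads the key from the Colab secret `ANTHROPIC_API_KEY` and exports it as the environment variable `ANTHROPIC_API_KEY`.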


In [2]:
!pip install textgrad # you might need to restart the notebook after installing textgrad
!pip install anthropic
!pip install litellm --upgrade
from google.colab import userdata
import os
os.environ['ANTHROPIC_API_KEY'] = userdata.get('ANTHROPIC_API_KEY')
from anthropic import Anthropic
import concurrent.futures  # import the submodule explicitly; a bare "import concurrent" does not load it
from dotenv import load_dotenv
from tqdm import tqdm
import textgrad as tg
from textgrad.tasks import load_task
import numpy as np
import random

load_dotenv(override=True)




False

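Let's first define some support functions: seeding, per-sample evaluation, dataset evaluation, and a validation check that reverts a prompt update when it lowers accuracy.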

In [3]:
def set_seed(seed):
    np.random.seed(seed)
    random.seed(seed)

In [4]:
def eval_sample(item, eval_fn, model):
    """
    Evaluate the model's answer to a single (question, answer) pair.

    Returns the evaluation result as an int.
    """
    x, y = item
    x = tg.Variable(x, requires_grad=False, role_description="query to the language model")
    y = tg.Variable(y, requires_grad=False, role_description="correct answer for the query")
    response = model(x)
    try:
        # Evaluation functions that accept named inputs
        eval_output_variable = eval_fn(inputs=dict(prediction=response, ground_truth_answer=y))
        return int(eval_output_variable.value)
    except Exception:
        # Fallback for evaluation functions that take a list of variables and expose parse_output
        eval_output_variable = eval_fn([x, y, response])
        eval_output_parsed = eval_fn.parse_output(eval_output_variable)
        return int(eval_output_parsed)

In [5]:
def eval_dataset(test_set, eval_fn, model, max_samples: int = None):
    # Evaluate at most max_samples items (all of them by default)
    if max_samples is None:
        max_samples = len(test_set)
    accuracy_list = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        futures = []
        for sample in test_set:
            future = executor.submit(eval_sample, sample, eval_fn, model)
            futures.append(future)
            if len(futures) >= max_samples:
                break
        # Collect results as they finish and show the running accuracy in the progress bar
        tqdm_loader = tqdm(concurrent.futures.as_completed(futures), total=len(futures), position=0)
        for future in tqdm_loader:
            acc_item = future.result()
            accuracy_list.append(acc_item)
            tqdm_loader.set_description(f"Accuracy: {np.mean(accuracy_list)}")
    return accuracy_list
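
The submit-then-collect pattern in `eval_dataset` can be tried without any API calls. The sketch below is a simplified illustration: `score_sample` and `score_dataset` are hypothetical stand-ins for `eval_sample` and `eval_dataset`, and the toy samples are made up.

```python
import concurrent.futures
from statistics import mean


def score_sample(sample):
    """Hypothetical stand-in for eval_sample: 1 if the pair matches, else 0."""
    prediction, target = sample
    return int(prediction == target)


def score_dataset(samples, max_samples=None, max_workers=2):
    """Score up to max_samples items concurrently and return the 0/1 scores."""
    if max_samples is None:
        max_samples = len(samples)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(score_sample, s) for s in samples[:max_samples]]
        # as_completed yields futures in completion order, not submission order,
        # so the list is suitable for averaging but not for per-item lookup.
        return [f.result() for f in concurrent.futures.as_completed(futures)]


toy_samples = [("5", "5"), ("3", "4"), ("7", "7"), ("1", "1")]
scores = score_dataset(toy_samples, max_samples=3)
print(len(scores), mean(scores))
```

Because results arrive in completion order, the returned list should only be used for aggregate statistics such as the mean.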

In [6]:
def run_validation_revert(system_prompt: tg.Variable, results, model, eval_fn, val_set):
    val_performance = np.mean(eval_dataset(val_set, eval_fn, model))
    previous_performance = np.mean(results["validation_acc"][-1])
    print("val_performance: ", val_performance)
    print("previous_performance: ", previous_performance)
    previous_prompt = results["prompt"][-1]

    if val_performance < previous_performance:
        print(f"rejected prompt: {system_prompt.value}")
        system_prompt.set_value(previous_prompt)
        val_performance = previous_performance

    results["validation_acc"].append(val_performance)
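
The accept-or-revert rule can be isolated from the model calls. The following sketch assumes a hypothetical `PromptHolder` class in place of `tg.Variable` and a hypothetical `accept_or_revert` helper. Unlike `run_validation_revert`, it also records the prompt itself, which the tutorial does later in the training loop.

```python
class PromptHolder:
    """Hypothetical stand-in for tg.Variable with only the pieces used here."""

    def __init__(self, value):
        self.value = value

    def set_value(self, value):
        self.value = value


def accept_or_revert(prompt, history, new_score):
    """Keep the new prompt unless it scores below the last recorded score."""
    previous_score = history["validation_acc"][-1]
    if new_score < previous_score:
        # Roll back to the last accepted prompt and keep its score
        prompt.set_value(history["prompt"][-1])
        new_score = previous_score
    history["validation_acc"].append(new_score)
    history["prompt"].append(prompt.value)
    return prompt.value


history = {"prompt": ["v0"], "validation_acc": [0.90]}
prompt = PromptHolder("v1")
print(accept_or_revert(prompt, history, 0.85))  # lower score, so the prompt reverts
prompt.set_value("v2")
print(accept_or_revert(prompt, history, 0.95))  # higher score, so the prompt is kept
```

The comparison is strict, so a new prompt that ties the previous score is kept rather than reverted.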

In [8]:
set_seed(12)
llm_api_eval = tg.get_engine(engine_name="claude-3-opus-20240229")
llm_api_test = tg.get_engine(engine_name="claude-3-opus-20240229")
tg.set_backward_engine(llm_api_eval, override=True)

# Load the data and the evaluation function
train_set, val_set, test_set, eval_fn = load_task("BBH_object_counting", evaluation_api=llm_api_eval)
print("Train/Val/Test Set Lengths: ", len(train_set), len(val_set), len(test_set))
STARTING_SYSTEM_PROMPT = train_set.get_task_description()


Train/Val/Test Set Lengths:  50 100 100


This is the system prompt we are going to start from:

In [9]:
print(STARTING_SYSTEM_PROMPT)


You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.
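
The prompt asks for a final line of the form `Answer: $VALUE`, which makes the response easy to check with plain string handling. The sketch below only illustrates that idea: `parse_answer` and `is_correct` are hypothetical helpers, not the evaluation function returned by `load_task`.

```python
import re
from typing import Optional


def parse_answer(response: str) -> Optional[str]:
    """Return the value on the final 'Answer: ...' line, or None if absent."""
    lines = [line.strip() for line in response.strip().splitlines() if line.strip()]
    if not lines:
        return None
    match = re.match(r"^Answer:\s*(.+)$", lines[-1])
    return match.group(1).strip() if match else None


def is_correct(response: str, ground_truth: str) -> int:
    """Score 1 when the parsed answer equals the ground truth, else 0."""
    parsed = parse_answer(response)
    return int(parsed is not None and parsed == str(ground_truth).strip())


response = "There are 2 apples and 3 pears.\n2 + 3 = 5\nAnswer: 5"
print(parse_answer(response))
print(is_correct(response, "5"))
```

A response that omits the final `Answer:` line scores 0 here even if its reasoning is right, which is one reason the format instruction matters.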


In [12]:
import numpy as np

# Define a cleaning function
def clean_dataset(dataset):
    cleaned = []
    for x, y in dataset:
        # If x is a NumPy integer, convert it to a built-in int
        if isinstance(x, (np.integer, np.int64)):
            x = int(x)
        # Do the same for y
        if isinstance(y, (np.integer, np.int64)):
            y = int(y)
        cleaned.append((x, y))
    return cleaned

# 1. Clean the datasets (convert numpy.int64 values to int)
test_set = clean_dataset(test_set)
val_set = clean_dataset(val_set)

# 2. Print the type to confirm (x is the question text, so this shows <class 'str'>)
print("Data type after cleaning:", type(test_set[0][0]))

Data type after cleaning: <class 'str'>


In [13]:
train_loader = tg.tasks.DataLoader(train_set, batch_size=3, shuffle=True)


# Testing the 0-shot performance of the evaluation engine
system_prompt = tg.Variable(STARTING_SYSTEM_PROMPT,
                            requires_grad=True,
                            role_description="system prompt to the language model")
model_evaluation = tg.BlackboxLLM(llm_api_eval, system_prompt)

system_prompt = tg.Variable(STARTING_SYSTEM_PROMPT,
                            requires_grad=True,
                            role_description="structured system prompt to a somewhat capable language model that specifies the behavior and strategies for the QA task")
model = tg.BlackboxLLM(llm_api_test, system_prompt)

optimizer = tg.TextualGradientDescent(engine=llm_api_eval, parameters=[system_prompt])

results = {"test_acc": [], "prompt": [], "validation_acc": []}
results["test_acc"].append(eval_dataset(test_set, eval_fn, model))
results["validation_acc"].append(eval_dataset(val_set, eval_fn, model))
results["prompt"].append(system_prompt.get_value())


  0%|          | 0/100 [00:00<?, ?it/s]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   1%|          | 1/100 [00:05<08:15,  5.00s/it]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   2%|▏         | 2/100 [00:07<05:24,  3.31s/it]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   3%|▎         | 3/100 [00:09<04:55,  3.05s/it]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   4%|▍         | 4/100 [00:11<03:50,  2.40s/it]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   5%|▌         | 5/100 [00:14<04:26,  2.80s/it]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   6%|▌         | 6/100 [00:15<03:08,  2.01s/it]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
Accuracy: 1.0:   7%|▋         | 7/100 [00:20<05:00,  3.23s/it]INFO:textgrad:L

In [15]:
for epoch in range(3):
    for steps, (batch_x, batch_y) in enumerate((pbar := tqdm(train_loader, position=0))):
        pbar.set_description(f"Training step {steps}. Epoch {epoch}")
        optimizer.zero_grad()
        losses = []
        for (x, y) in zip(batch_x, batch_y):
            if hasattr(x, 'item'): x = x.item()
            if hasattr(y, 'item'): y = y.item()

            x = str(x)
            y = str(y)
            x = tg.Variable(x, requires_grad=False, role_description="query to the language model")
            y = tg.Variable(y, requires_grad=False, role_description="correct answer for the query")
            response = model(x)
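            try:
                # Evaluation functions that accept named inputs
                eval_output_variable = eval_fn(inputs=dict(prediction=response, ground_truth_answer=y))
            except Exception:
                # Fallback for evaluation functions that take a list of variables
                eval_output_variable = eval_fn([x, y, response])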
            losses.append(eval_output_variable)
        total_loss = tg.sum(losses)
        total_loss.backward()
        optimizer.step()

        run_validation_revert(system_prompt, results, model, eval_fn, val_set)

        print("sys prompt: ", system_prompt)
        test_acc = eval_dataset(test_set, eval_fn, model)
        results["test_acc"].append(test_acc)
        results["prompt"].append(system_prompt.get_value())
        if steps == 3:
            break

Training step 0. Epoch 0: : 0it [00:00, ?it/s]INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
INFO:textgrad:LLMCall function forward
INFO:textgrad:StringBasedFunction
INFO:textgrad:Idempotent backward
INFO:textgrad:Idempotent backward
INFO:textgrad:Idempotent backward
INFO:textgrad:_backward_through_string_fn prompt
INFO:textgrad:_backward_through_string_fn gradient
INFO:textgrad:_backward_through_llm prompt
INFO:textgrad:_backward_through_llm gradient
INFO:textgrad:_backward_through_string_fn prompt
INFO:textgrad:_backward_through_string_fn gradient
INFO:textgrad:_backward_through_llm prompt
INFO:textgrad:_backward_through_llm gradient
INFO:textgrad:_backward_through_string_fn prompt
INFO:textgrad:_backward_through_string_fn gradient
INFO:textgrad:_backward_through_llm prompt
INFO:textgrad:_backward_through_llm gradient
INFO:textgrad:TextualGradientDescent prompt for update
INFO:textgrad:

val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  0.99
previous_performance:  1.0
rejected prompt: You will answer a reasoning question. Think step by step. (...) following format: 'Answer: $VALUE' where VALUE is a numerical value.
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.



val_performance:  1.0
previous_performance:  1.0
sys prompt:  You will answer a reasoning question. Think step by step. The last line of your response should be of the following format: 'Answer: $VALUE' where VALUE is a numerical value.


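
After the loop, `results["test_acc"]` holds one list of per-sample scores for each evaluation and `results["prompt"]` holds the prompt in effect at that point. A small summary makes the trajectory easier to read than the raw logs. In the sketch below, `summarize` is a hypothetical helper and the scores are made up for illustration; they are not results from the run above.

```python
from statistics import mean


def summarize(results):
    """Return (step, mean test accuracy, prompt) for each recorded step."""
    return [
        (step, mean(scores), prompt)
        for step, (scores, prompt) in enumerate(zip(results["test_acc"], results["prompt"]))
    ]


# Made-up scores for illustration only
toy_results = {
    "test_acc": [[1, 0, 1, 1], [1, 1, 1, 1], [1, 1, 0, 1]],
    "prompt": ["prompt A", "prompt B", "prompt C"],
}
for step, acc, prompt in summarize(toy_results):
    print(step, acc, prompt)
```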