# Introduction

[<img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />](https://colab.research.google.com/github/marshmellow77/automated-prompt-engineering/blob/main/automated-prompt-engineering.ipynb)


This notebook demonstrates how to use Google's Gemini model to automate prompt engineering.

Prompt engineering is a powerful way to improve the responses of large language models (LLMs). But it is also a manual, tedious, iterative process, and it quickly accumulates technical debt and waste since each handcrafted prompt is specific to a model (and its version) as well as the task at hand.

In this notebook we will learn how to use the DSPy library to automatically create prompts that are optimised for a specific model and the task at hand.


# Manual Prompt Engineering

Manual prompt engineering is very tedious - let's look at an example where we carefully handcraft a prompt for our task and model.

## Setup

In [1]:
# As of 3 April 2024, VertexAI is not yet integrated into DSPy. But there already exists a PR for it which we can leverage.
!pip install -U git+https://github.com/marshmellow77/dspy.git@seedstart-random-search#egg=dspy-ai

Collecting dspy-ai
  Cloning https://github.com/marshmellow77/dspy.git (to revision seedstart-random-search) to /tmp/pip-install-d1pr11ly/dspy-ai_be82bc32eedd4963a2a90688cf0acdd3
  Running command git clone --filter=blob:none --quiet https://github.com/marshmellow77/dspy.git /tmp/pip-install-d1pr11ly/dspy-ai_be82bc32eedd4963a2a90688cf0acdd3
  Running command git checkout -q seedstart-random-search
  error: pathspec 'seedstart-random-search' did not match any file(s) known to git
  error: subprocess-exited-with-error

  × git checkout -q seedstart-random-search did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.

In [2]:
!pip install --upgrade google-cloud-aiplatform
!pip install Jinja2

Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.74.0-py2.py3-none-any.whl.metadata (31 kB)
Downloading google_cloud_aiplatform-1.74.0-py2.py3-none-any.whl (6.5 MB)
Installing collected packages: google-cloud-aiplatform
  Attempting uninstall: google-cloud-aiplatform
    Found existing installation: google-cloud-aiplatform 1.73.0
    Uninstalling google-cloud-aiplatform-1.73.0:
      Successfully uninstalled google-cloud-aiplatform-1.73.0
Successfully installed google-cloud-aiplatform-1.74.0




In [1]:
import os
import sys

IS_COLAB = "google.colab" in sys.modules
if not IS_COLAB:
    raise ValueError("This notebook should be run using Google Colab.")

if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if "google.colab" in sys.modules:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

In [2]:
import vertexai

project_id = "my-project-269206"
vertexai.init(project=project_id)

In [3]:
from vertexai.generative_models import GenerativeModel

gemini_pro = GenerativeModel("gemini-1.0-pro")

## Zero shot attempt

Let's first try to use Gemini Pro for a mathematical text question.

In [4]:
prompt = """Given the fields `question`, produce the fields `answer`.

Question: Heather is going to sew 150 aprons that are to be used for a kiddie crew program.
She already was able to sew 13 aprons, and today, she sewed three times as many aprons.
How many aprons should she sew tomorrow if she wants to sew half of the remaining number of aprons needed?

Answer:"""

# The correct answer is 49.
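
As a quick sanity check on the expected answer, the arithmetic of the apron question can be worked through in plain Python. The variable names are illustrative and not part of the original notebook.

```python
# Worked arithmetic for the apron question (variable names are illustrative).
total_needed = 150             # aprons Heather must sew in total
sewn_before = 13               # aprons already sewn
sewn_today = 3 * sewn_before   # "three times as many" as the 13 already sewn
remaining = total_needed - sewn_before - sewn_today  # 150 - 13 - 39 = 98
sew_tomorrow = remaining // 2  # half of the remaining aprons

print(sew_tomorrow)  # 49
```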

In [5]:
config = {"temperature": 0.1}

In [6]:
response = gemini_pro.generate_content(contents=prompt, generation_config=config)
print(response.text)

## Answer

Heather needs to sew 150 - 13 = 137 aprons in total.

Today, she sewed 3 x 13 = 39 aprons.

So, she still needs to sew 137 - 39 = 98 aprons.

If she wants to sew half of the remaining number of aprons tomorrow, she needs to sew 98 / 2 = 49 aprons.

Therefore, Heather should sew **49 aprons** tomorrow. 



In the run shown above Gemini Pro arrives at the correct answer (49), but a zero-shot prompt gives no guarantee of that. Let's use best practices, including chain of thought and few-shot prompting, to make Gemini's performance more reliable!

## Few shot prompting with Chain of Thought

In [7]:
prompt = """Given the fields `question`, produce the fields `answer`.

---

Follow the following format.

Question: <Question>
Rationale: Let's think step by step ...
Answer: <Answer>

---

Question: A gumball machine has red, green, and blue gumballs. The machine has half as many blue gumballs as red gumballs.
For each blue gumball, the machine has 4 times as many green gumballs. If the machine has 16 red gumballs how many gumballs are in the machine?
Rationale: Let's think step by step.
First, we can find the number of blue gumballs in the machine.
Since the machine has half as many blue gumballs as red gumballs, and there are 16 red gumballs, there must be 16 / 2 = 8 blue gumballs.
Next, we can find the number of green gumballs in the machine.
Since the machine has 4 times as many green gumballs as blue gumballs, there must be 8 x 4 = 32 green gumballs.
Finally, we can add up the number of red, blue, and green gumballs to find the total number of gumballs in the machine: 16 + 8 + 32 = 56.
Answer: 56

---

Question: Rachel makes $12.00 as a waitress in a coffee shop. In one hour, she serves 20 different people and they all leave her a $1.25 tip. How much money did she make in that hour?
Rationale: Let's think step by step.
First, we need to find out how much money Rachel made from tips. She served 20 people and each person left her a $1.25 tip, so she made 20 * $1.25 = $25.00 in tips.
Next, we need to add her hourly wage to the money she made from tips to find out how much money she made in total. She made $12.00 per hour, so in one hour she made $12.00 + $25.00 = $37.00.
Answer: 37

---

Question: Heather is going to sew 150 aprons that are to be used for a kiddie crew program. She already was able to sew 13 aprons, and today, she sewed three times as many aprons. How many aprons should she sew tomorrow if she wants to sew half of the remaining number of aprons needed?
Rationale:"""

In [8]:
response = gemini_pro.generate_content(contents=prompt, generation_config=config)
print(response.text)

Let's think step by step.

First, we need to find out how many aprons Heather still needs to sew. She needs to sew 150 aprons in total, and she has already sewn 13 aprons, so she still needs to sew 150 - 13 = 137 aprons.

Next, we need to find out how many aprons Heather sewed today. She sewed three times as many aprons as she already had sewn, so she sewed 3 * 13 = 39 aprons today.

Now, we need to find out how many aprons Heather still needs to sew after sewing 39 aprons today. She still needs to sew 137 aprons, and she has already sewn 39 aprons today, so she still needs to sew 137 - 39 = 98 aprons.

Finally, we need to find out how many aprons Heather should sew tomorrow if she wants to sew half of the remaining number of aprons needed. She still needs to sew 98 aprons, and she wants to sew half of them tomorrow, so she should sew 98 / 2 = 49 aprons tomorrow.

Answer: 49


Nice, this worked!

Now we have a good prompt for our model and the task at hand (mathematical text questions). But there are a few issues:
* Our prompt works well on our model, but what if we want to use another model or another version (e.g. Gemini Ultra or Gemini 1.5)? Will it still work for those models?
* We had to develop a few examples, and coming up with the rationale for each example was especially tedious.

The question is, could we automate this process so that next time we need to repeat this exercise we can just automatically create few shot examples that are optimised for our model and the task at hand?
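
As a first step towards automation, the handcrafted prompt above can at least be assembled from data rather than typed by hand. The following is a minimal sketch, assuming each example is a dict with `question`, `rationale` and `answer` keys; the helper name `build_few_shot_prompt` and the toy example are hypothetical and not part of the original notebook.

```python
from typing import Dict, List

INSTRUCTION = "Given the fields `question`, produce the fields `answer`."
FORMAT_BLOCK = (
    "Follow the following format.\n\n"
    "Question: <Question>\n"
    "Rationale: Let's think step by step ...\n"
    "Answer: <Answer>"
)


def build_few_shot_prompt(examples: List[Dict[str, str]], question: str) -> str:
    """Assemble a chain-of-thought few-shot prompt from worked examples."""
    parts = [INSTRUCTION, FORMAT_BLOCK]
    for ex in examples:
        parts.append(
            f"Question: {ex['question']}\n"
            f"Rationale: Let's think step by step.\n{ex['rationale']}\n"
            f"Answer: {ex['answer']}"
        )
    # The final block leaves the rationale open for the model to complete.
    parts.append(f"Question: {question}\nRationale:")
    return "\n\n---\n\n".join(parts)


# Toy example (made up for illustration only).
demo = [{"question": "What is 2 + 3?", "rationale": "2 plus 3 equals 5.", "answer": "5"}]
prompt_text = build_few_shot_prompt(demo, "What is 4 + 4?")
print(prompt_text)
```

This only removes the copy-and-paste work; choosing which examples to include and writing their rationales is still manual.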

# Automated prompt engineering with DSPy

DSPy is a library that allows us to automate this process. Let's see how it works.

## Setup

In [10]:
pip install dspy

Collecting dspy
  Downloading dspy-2.5.43-py3-none-any.whl.metadata (7.3 kB)
Collecting asyncer==0.0.8 (from dspy)
  Downloading asyncer-0.0.8-py3-none-any.whl.metadata (6.7 kB)
Collecting backoff (from dspy)
  Downloading backoff-2.2.1-py3-none-any.whl.metadata (14 kB)
Collecting datasets (from dspy)
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting diskcache (from dspy)
  Downloading diskcache-5.6.3-py3-none-any.whl.metadata (20 kB)
Collecting json-repair (from dspy)
  Downloading json_repair-0.31.0-py3-none-any.whl.metadata (11 kB)
Collecting litellm==1.53.7 (from litellm[proxy]==1.53.7->dspy)
  Downloading litellm-1.53.7-py3-none-any.whl.metadata (33 kB)
Collecting magicattr~=0.1.6 (from dspy)
  Downloading magicattr-0.1.6-py2.py3-none-any.whl.metadata (3.2 kB)
Collecting optuna (from dspy)
  Downloading optuna-4.1.0-py3-none-any.whl.metadata (16 kB)
Collecting ujson (from dspy)
  Downloading ujson-5.10.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_6

In [11]:
import dspy

In [26]:
dspy_gemini_pro = dspy.GoogleVertexAI(
    "gemini-1.0-pro-002",
    temperature=0,
)

dspy.settings.configure(lm=dspy_gemini_pro)

## Dataset

We will use the [GSM8K dataset](https://paperswithcode.com/dataset/gsm8k), which consists of linguistically diverse grade school math word problems.

In [27]:
from dspy.datasets.gsm8k import GSM8K, gsm8k_metric

gms8k = GSM8K()


100%|██████████| 7473/7473 [00:00<00:00, 14918.49it/s]
100%|██████████| 1319/1319 [00:00<00:00, 13844.22it/s]


In [28]:
train, val, test = gms8k.train[:60], gms8k.dev[:20], gms8k.test[:20]

In [29]:
train[0]

Example({'question': "The result from the 40-item Statistics exam Marion and Ella took already came out. Ella got 4 incorrect answers while Marion got 6 more than half the score of Ella. What is Marion's score?", 'gold_reasoning': "Ella's score is 40 items - 4 items = <<40-4=36>>36 items. Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items.", 'answer': '24'}) (input_keys={'question'})

In [30]:
train[0].gold_reasoning

"Ella's score is 40 items - 4 items = <<40-4=36>>36 items. Half of Ella's score is 36 items / 2 = <<36/2=18>>18 items. So, Marion's score is 18 items + 6 items = <<18+6=24>>24 items."

We can see that the dataset has a field `gold_reasoning`, which already provides the reasoning. Since generating this reasoning is what we want to automate, let's blank out this field in the training and validation datasets.

In [31]:
# Iterate through datasets and modify the dicts
for dataset in [train, val]:
    for example in dataset:
        example["gold_reasoning"] = ""

In [32]:
train[0].gold_reasoning

''

## Defining the signature

Signatures allow you to tell the LM what it needs to do, rather than specifying how to ask the LM to do it.

In [33]:
class GSM8KSignature(dspy.Signature):
    """Answer math problems with numbers or short phrases."""

    question = dspy.InputField()
    answer = dspy.OutputField(desc="Usually a number or short phrase.")

Now we can use this signature to run a test with Gemini.

In [34]:
generate_answer = dspy.Predict(GSM8KSignature)
pred = generate_answer(question=test[0].question)

print(f"Question: {test[0].question}")
print(f"Predicted Answer: {pred.answer}")
print(f"Actual Answer: {test[0].answer}")

Question: Amber, Micah, and Ahito ran 52 miles in total. Amber ran 8 miles. Micah ran 3.5 times what Amber ran. How many miles did Ahito run?
Predicted Answer: Answer: 27 miles
Actual Answer: 16


In [35]:
dspy_gemini_pro.inspect_history(n=1)




Answer math problems with numbers or short phrases.

---

Follow the following format.

Question: ${question}
Answer: Usually a number or short phrase.

---

Question: Amber, Micah, and Ahito ran 52 miles in total. Amber ran 8 miles. Micah ran 3.5 times what Amber ran. How many miles did Ahito run?
Answer: Answer: 27 miles





'\n\n\nAnswer math problems with numbers or short phrases.\n\n---\n\nFollow the following format.\n\nQuestion: ${question}\nAnswer: Usually a number or short phrase.\n\n---\n\nQuestion: Amber, Micah, and Ahito ran 52 miles in total. Amber ran 8 miles. Micah ran 3.5 times what Amber ran. How many miles did Ahito run?\nAnswer:\x1b[32mAnswer: 27 miles\x1b[0m\n\n\n'
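
To make the link between the signature and the prompt shown above more concrete, here is a minimal sketch of how a template with this layout could be rendered from a signature's instructions and fields. The function `render_prompt` and its tuple-based field description are hypothetical; this is not DSPy's actual rendering code, only an illustration of the layout seen in the history output.

```python
def render_prompt(instructions, fields, inputs):
    """Render a zero-shot prompt.

    fields: list of (name, description, is_input) tuples.
    inputs: dict mapping input field names to their values.
    """
    format_lines = []
    query_lines = []
    for name, desc, is_input in fields:
        label = name.capitalize()
        if is_input:
            # Input fields show a ${name} placeholder in the format section.
            format_lines.append(f"{label}: ${{{name}}}")
            query_lines.append(f"{label}: {inputs[name]}")
        else:
            # Output fields show their description and are left open in the query.
            format_lines.append(f"{label}: {desc}")
            query_lines.append(f"{label}:")
    return "\n\n---\n\n".join([
        instructions,
        "Follow the following format.\n\n" + "\n".join(format_lines),
        "\n".join(query_lines),
    ])


fields = [
    ("question", "", True),
    ("answer", "Usually a number or short phrase.", False),
]
rendered = render_prompt(
    "Answer math problems with numbers or short phrases.",
    fields,
    {"question": "What is 2 + 2?"},
)
print(rendered)
```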

Gemini didn't get this one right. Let's evaluate Gemini on the test dataset to establish a baseline.

## Model evaluation with zero shot

To run the evaluation programmatically we define a DSPy module. These modules abstract a prompting technique (like chain of thought or ReAct). Crucially, they are generalized to handle any DSPy Signature.

In [36]:
class GSM8KModule(dspy.Module):
    def __init__(self):
        super().__init__()
        # here we use the dspy.Predict module which uses zero shot prompting to generate answers
        self.prog = dspy.Predict(GSM8KSignature)

    def forward(self, question):
        return self.prog(question=question)

In [37]:
gsm8k_zero_shot = GSM8KModule()

In [38]:
from dspy.evaluate import Evaluate

NUM_THREADS = 4 # number of threads to use for parallel processing
evaluate = Evaluate(
    devset=test, # the test set
    metric=gsm8k_metric, # the metric to use -> this will convert responses to integers to compare with the gold answers
    num_threads=NUM_THREADS,
    display_progress=True,
    display_table=20, # how many rows to display
)

In [39]:
evaluate(gsm8k_zero_shot)

Average Metric: 5.00 / 20 (25.0%): 100%|██████████| 20/20 [00:12<00:00,  1.63it/s]

2024/12/16 05:05:48 INFO dspy.evaluate.evaluate: Average Metric: 5 / 20 (25.0%)





|   | question | gold_reasoning | example_answer | pred_answer | gsm8k_metric |
|---|---|---|---|---|---|
| 0 | Amber, Micah, and Ahito ran 52 miles in total. Amber ran 8 miles. ... | Amber ran <<8=8>>8 miles. Micah ran 3.5 * 8 miles = <<3.5*8=28>>28... | 16 | Question: Amber, Micah, and Ahito ran 52 miles in total. Amber ran... | |
| 1 | Miguel uses 2 pads of paper a week for his drawing. If there are 3... | Miguel uses 30 x 2 = <<30*2=60>>60 sheets of paper every week. The... | 240 | Question: Miguel uses 2 pads of paper a week for his drawing. If t... | |
| 2 | At a certain grade level, three-fourths of students have a desktop... | Twenty students represent 1 - 3/4 = 1/4 of the students at that le... | 80 | Question: At a certain grade level, three-fourths of students have... | |
| 3 | Comet Halley orbits the sun every 75 years. Bill's dad saw the Com... | Bill saw the Comet for the second time when he was 30 years * 3= <... | 15 | Answer: 75 | |
| 4 | Tom plants 10 trees a year. Every year he also chops down 2 trees ... | He gets 10-2=<<10-2=8>>8 new trees a year After 10 years he has 8*... | 91 | Answer: 100 trees. Here's the breakdown: * Starts with 50 trees. *... | |
| 5 | John picks 4 bananas on Wednesday. Then he picks 6 bananas on Thur... | Combining Wednesday and Thursday, John has 4 bananas + 6 bananas =... | 22 | Answer: 20 | |
| 6 | Peyton scheduled after-work activities of a one hour yoga class on... | Peyton’s cooking class will last 3 * 1 = <<3*1=3>>3 hours. The mus... | 8 | Answer: 8.5 hours | ✔️ [True] |
| 7 | Ben has 4 tubes of blue paint and 3 tubes of yellow paint. Jasper ... | Jasper has 4/2= <<4/2=2>>2 tubes of blue paint Jasper has 3*3=<<3*... | 11 | Answer: 8 | |
| 8 | Elaina is holding the final concert in her tour. To celebrate her ... | The concert, minus the encore, lasted for 65-minute concert – 15-m... | 25 | 25 minutes | ✔️ [True] |
| 9 | Hannah slips on a banana peel and breaks her arm. The doctor charg... | First find the length of the visit in hours: 30 minutes / 60 minut... | 482 | $760 | |


25.0
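
The comment in the evaluation cell says that the metric converts responses to integers before comparing them with the gold answers. Below is a minimal sketch of such a metric, assuming it keeps the last number-like token on the first line of the response and discards anything after a decimal point. The function names are hypothetical and this is not claimed to be DSPy's exact `gsm8k_metric` implementation. Under that assumption, a prediction such as `Answer: 8.5 hours` would count as matching the gold answer `8`, which is consistent with the row marked correct in the results above.

```python
from types import SimpleNamespace


def sketch_parse_integer(text: str) -> int:
    """Take the last number-like token on the first line and drop decimals."""
    first_line = text.strip().split("\n")[0]
    numeric_tokens = [tok for tok in first_line.split() if any(c.isdigit() for c in tok)]
    if not numeric_tokens:
        return 0  # assumption: fall back to 0 when no number is found
    integer_part = numeric_tokens[-1].split(".")[0]
    digits = "".join(c for c in integer_part if c.isdigit())
    return int(digits) if digits else 0


def sketch_metric(gold, pred) -> bool:
    """Compare gold and predicted answers after integer parsing."""
    return sketch_parse_integer(str(gold.answer)) == sketch_parse_integer(str(pred.answer))


gold = SimpleNamespace(answer="8")
pred = SimpleNamespace(answer="Answer: 8.5 hours")
print(sketch_metric(gold, pred))  # True
```

A metric of this kind is lenient about units and formatting, but it also means that a truncated decimal can be counted as correct.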

# Bootstrapping few shot examples

Now we will use a teacher model to bootstrap few-shot examples that will (hopefully) improve Gemini Pro's performance on the test dataset. The teacher is meant to be Gemini Ultra, although the cell below sets `teacher_model_id` to `gemini-1.0-pro`, so in the run shown here Gemini Pro acts as its own teacher. The teacher generates reasoning examples from the training data; the optimizer assembles several candidate sets of these examples and scores each one on the validation dataset using `gsm8k_metric`, i.e. the metric we want to optimise for. The best-scoring set is then used as the few-shot prompt that we eventually send to Gemini Pro.
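
The idea described above can be sketched as a small toy loop. This is only an illustration under simplifying assumptions, not DSPy's implementation: the teacher, the metric and the validation scorer are stand-in functions, and every name below is hypothetical.

```python
import random


def bootstrap_random_search(teacher, evaluate, trainset, metric,
                            num_candidates=2, max_demos=2, seed=0):
    """Toy sketch: keep teacher traces that pass the metric, then try
    several random demo subsets and return the best-scoring one."""
    rng = random.Random(seed)

    # 1) Bootstrap: keep only teacher outputs that the metric accepts.
    traces = []
    for example in trainset:
        prediction = teacher(example["question"])
        if metric(example, prediction):
            traces.append({"question": example["question"], **prediction})

    # 2) Random search: sample demo subsets and score each candidate.
    best_score, best_demos = float("-inf"), []
    for _ in range(num_candidates):
        demos = rng.sample(traces, k=min(max_demos, len(traces)))
        score = evaluate(demos)
        if score > best_score:
            best_score, best_demos = score, demos
    return best_demos, best_score


# Toy stand-ins (all made up for illustration).
toy_trainset = [{"question": f"{a}+{b}", "answer": str(a + b)}
                for a, b in [(1, 2), (3, 4), (5, 6), (7, 8)]]


def toy_teacher(question):
    a, b = map(int, question.split("+"))
    total = 0 if a == 5 else a + b  # deliberately wrong for one example
    return {"reasoning": f"{a} plus {b} is {total}", "answer": str(total)}


def toy_metric(example, prediction):
    return example["answer"] == prediction["answer"]


def toy_evaluate(demos):
    # Stand-in for "validation accuracy of the student prompted with these demos".
    return sum(int(d["answer"]) for d in demos)


best_demos, best_score = bootstrap_random_search(
    toy_teacher, toy_evaluate, toy_trainset, toy_metric
)
print(len(best_demos), best_score)
```

The key point is that incorrect teacher outputs never become demonstrations, and the final demo set is chosen by measured validation performance rather than by hand.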

First we define a Chain of Thought module:

In [40]:
class ZeroShotCoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought(
            GSM8KSignature,
        )

    def forward(self, question):
        return self.prog(question=question)

In [41]:
from dspy.teleprompt import BootstrapFewShotWithRandomSearch

Now we can start the bootstrapping:

In [42]:
from datetime import datetime

RUN_FROM_SCRATCH = True
bootstrapped_demos = 8 # max number of bootstrapped (teacher-generated) demos per predictor
labeled_demos = 3 # max number of labeled demos taken directly from the training set
candidate_programs = 2 # how many candidate demo sets will be bootstrapped and evaluated
teacher_model_id = "gemini-1.0-pro"

if RUN_FROM_SCRATCH:
    dspy_gemini_ultra = dspy.GoogleVertexAI(
        teacher_model_id,
        temperature=0,
    )
    dspy.settings.configure(lm=dspy_gemini_ultra, timeout=0)
    config = dict(
        max_bootstrapped_demos=bootstrapped_demos,
        max_labeled_demos=labeled_demos,
        num_candidate_programs=candidate_programs,
        num_threads=4,
        stop_at_score=100.0,
    )
    bootstrap_optimizer = BootstrapFewShotWithRandomSearch(
        metric=gsm8k_metric, **config
    )
    cot_fewshot = bootstrap_optimizer.compile(ZeroShotCoT(), trainset=train, valset=val)

    # save the bootstrap demonstrations for future use
    timestamp_str = datetime.now().strftime("%Y%m%d-%H%M%S")
    filename = f"{timestamp_str}_{teacher_model_id}_{bootstrapped_demos}_{labeled_demos}_{candidate_programs}.json"
    cot_fewshot.save(filename)
else:
    cot_fewshot = ZeroShotCoT()
    cot_fewshot.load("20240403-173150_gemini-1.0-ultra_8_3_2.json")

Going to sample between 1 and 8 traces per predictor.
Will attempt to bootstrap 2 candidate sets.
Average Metric: 15.00 / 20 (75.0%): 100%|██████████| 20/20 [00:11<00:00,  1.75it/s]

2024/12/16 05:28:18 INFO dspy.evaluate.evaluate: Average Metric: 15 / 20 (75.0%)



New best score: 75.0 for seed -3
Scores so far: [75.0]
Best score so far: 75.0
Average Metric: 8.00 / 20 (40.0%): 100%|██████████| 20/20 [00:07<00:00,  2.55it/s]

2024/12/16 05:28:26 INFO dspy.evaluate.evaluate: Average Metric: 8 / 20 (40.0%)



Scores so far: [75.0, 40.0]
Best score so far: 75.0


 32%|███▏      | 19/60 [00:28<01:01,  1.51s/it]


Bootstrapped 8 full traces after 19 examples for up to 1 rounds, amounting to 19 attempts.
Average Metric: 13.00 / 20 (65.0%): 100%|██████████| 20/20 [00:08<00:00,  2.39it/s]

2024/12/16 05:29:03 INFO dspy.evaluate.evaluate: Average Metric: 13 / 20 (65.0%)



Scores so far: [75.0, 40.0, 65.0]
Best score so far: 75.0


 42%|████▏     | 25/60 [00:37<00:52,  1.49s/it]


Bootstrapped 7 full traces after 25 examples for up to 1 rounds, amounting to 25 attempts.
Average Metric: 16.00 / 20 (80.0%): 100%|██████████| 20/20 [00:08<00:00,  2.38it/s]

2024/12/16 05:29:49 INFO dspy.evaluate.evaluate: Average Metric: 16 / 20 (80.0%)



New best score: 80.0 for seed 0
Scores so far: [75.0, 40.0, 65.0, 80.0]
Best score so far: 80.0


 17%|█▋        | 10/60 [00:14<01:13,  1.48s/it]


Bootstrapped 3 full traces after 10 examples for up to 1 rounds, amounting to 10 attempts.
Average Metric: 16.00 / 20 (80.0%): 100%|██████████| 20/20 [00:08<00:00,  2.36it/s]

2024/12/16 05:30:12 INFO dspy.evaluate.evaluate: Average Metric: 16 / 20 (80.0%)



Scores so far: [75.0, 40.0, 65.0, 80.0, 80.0]
Best score so far: 80.0
5 candidate programs found.


After this step we have our examples ready, and we can test Gemini Pro on the same test dataset as above.

In [43]:
dspy.settings.configure(lm=dspy_gemini_pro, timeout=0)

In [44]:
evaluate(cot_fewshot)

Average Metric: 18.00 / 20 (90.0%): 100%|██████████| 20/20 [00:08<00:00,  2.23it/s]

2024/12/16 05:30:21 INFO dspy.evaluate.evaluate: Average Metric: 18 / 20 (90.0%)





|   | question | gold_reasoning | example_answer | rationale | pred_answer | gsm8k_metric |
|---|---|---|---|---|---|---|
| 0 | Amber, Micah, and Ahito ran 52 miles in total. Amber ran 8 miles. ... | Amber ran <<8=8>>8 miles. Micah ran 3.5 * 8 miles = <<3.5*8=28>>28... | 16 | calculate how many miles Ahito ran. We know that Amber ran 8 miles... | 16 | ✔️ [True] |
| 1 | Miguel uses 2 pads of paper a week for his drawing. If there are 3... | Miguel uses 30 x 2 = <<30*2=60>>60 sheets of paper every week. The... | 240 | calculate the number of sheets of paper Miguel uses every month. W... | 240 | ✔️ [True] |
| 2 | At a certain grade level, three-fourths of students have a desktop... | Twenty students represent 1 - 3/4 = 1/4 of the students at that le... | 80 | calculate the total number of students. We know that three-fourths... | 80 students | ✔️ [True] |
| 3 | Comet Halley orbits the sun every 75 years. Bill's dad saw the Com... | Bill saw the Comet for the second time when he was 30 years * 3= <... | 15 | calculate Bill's age when he saw the Comet for the first time. We ... | 315 | |
| 4 | Tom plants 10 trees a year. Every year he also chops down 2 trees ... | He gets 10-2=<<10-2=8>>8 new trees a year After 10 years he has 8*... | 91 | calculate the number of trees Tom has left after 10 years. 1. Afte... | 91 | ✔️ [True] |
| 5 | John picks 4 bananas on Wednesday. Then he picks 6 bananas on Thur... | Combining Wednesday and Thursday, John has 4 bananas + 6 bananas =... | 22 | calculate the total number of bananas John has. On Wednesday, John... | 22 | ✔️ [True] |
| 6 | Peyton scheduled after-work activities of a one hour yoga class on... | Peyton’s cooking class will last 3 * 1 = <<3*1=3>>3 hours. The mus... | 8 | calculate the total number of hours Peyton's after-work activities... | 8 hours | ✔️ [True] |
| 7 | Ben has 4 tubes of blue paint and 3 tubes of yellow paint. Jasper ... | Jasper has 4/2= <<4/2=2>>2 tubes of blue paint Jasper has 3*3=<<3*... | 11 | calculate the number of tubes of paint Jasper has. Ben has 4 tubes... | 11 | ✔️ [True] |
| 8 | Elaina is holding the final concert in her tour. To celebrate her ... | The concert, minus the encore, lasted for 65-minute concert – 15-m... | 25 | calculate the runtime of Elaina's usual concerts. We know that the... | 25 | ✔️ [True] |
| 9 | Hannah slips on a banana peel and breaks her arm. The doctor charg... | First find the length of the visit in hours: 30 minutes / 60 minut... | 482 | calculate the total cost of the doctor's visit. We first need to c... | 482 | ✔️ [True] |


90.0

Nice, this improved Gemini Pro's performance significantly, from 25% to 90% :)

In [45]:
dspy_gemini_pro.inspect_history(n=1)




Answer math problems with numbers or short phrases.

---

Follow the following format.

Question: ${question}
Reasoning: Let's think step by step in order to ${produce the answer}. We ...
Answer: Usually a number or short phrase.

---

Question: A tank contains 6000 liters of water, 2000 liters evaporated, and then 3500 liters were drained by Bob. How many liters are in the tank if it now rains for 30 minutes and every 10 minutes 350 liters of rain are added to the tank?
Reasoning: Let's think step by step in order to calculate the number of liters in the tank. We start with 6000 liters, then subtract the 2000 liters that evaporated, and then subtract the 3500 liters that were drained by Bob. This leaves us with 500 liters. Then, we add the rain that fell for 30 minutes. Every 10 minutes, 350 liters of rain are added to the tank, so for 30 minutes, 1050 liters of rain are added to the tank. This gives us a final total of 1550 liters in the tank.
Answer: 1550

---

Question: Louise i

"\n\n\nAnswer math problems with numbers or short phrases.\n\n---\n\nFollow the following format.\n\nQuestion: ${question}\nReasoning: Let's think step by step in order to ${produce the answer}. We ...\nAnswer: Usually a number or short phrase.\n\n---\n\nQuestion: A tank contains 6000 liters of water, 2000 liters evaporated, and then 3500 liters were drained by Bob. How many liters are in the tank if it now rains for 30 minutes and every 10 minutes 350 liters of rain are added to the tank?\nReasoning: Let's think step by step in order to calculate the number of liters in the tank. We start with 6000 liters, then subtract the 2000 liters that evaporated, and then subtract the 3500 liters that were drained by Bob. This leaves us with 500 liters. Then, we add the rain that fell for 30 minutes. Every 10 minutes, 350 liters of rain are added to the tank, so for 30 minutes, 1050 liters of rain are added to the tank. This gives us a final total of 1550 liters in the tank.\nAnswer: 1550\n\n---