In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/llama-3/transformers/8b-chat-hf/1/model.safetensors.index.json
/kaggle/input/llama-3/transformers/8b-chat-hf/1/model-00003-of-00004.safetensors
/kaggle/input/llama-3/transformers/8b-chat-hf/1/config.json
/kaggle/input/llama-3/transformers/8b-chat-hf/1/LICENSE
/kaggle/input/llama-3/transformers/8b-chat-hf/1/model-00001-of-00004.safetensors
/kaggle/input/llama-3/transformers/8b-chat-hf/1/model.py
/kaggle/input/llama-3/transformers/8b-chat-hf/1/USE_POLICY.md
/kaggle/input/llama-3/transformers/8b-chat-hf/1/tokenizer.json
/kaggle/input/llama-3/transformers/8b-chat-hf/1/tokenizer_config.json
/kaggle/input/llama-3/transformers/8b-chat-hf/1/example_text_completion.py
/kaggle/input/llama-3/transformers/8b-chat-hf/1/test_tokenizer.py
/kaggle/input/llama-3/transformers/8b-chat-hf/1/requirements.txt
/kaggle/input/llama-3/transformers/8b-chat-hf/1/tokenizer.py
/kaggle/input/llama-3/transformers/8b-chat-hf/1/model-00004-of-00004.safetensors
/kaggle/input/llama-3/transformers/8b-chat-hf

In [2]:
!pip install -q -U -i https://pypi.org/simple/ bitsandbytes
!pip install -q -U accelerate

In [3]:
import kagglehub

# Download latest version
path = kagglehub.model_download("metaresearch/llama-3/transformers/8b-chat-hf")

print("Path to model files:", path)

Attaching model 'metaresearch/llama-3/transformers/8b-chat-hf' to your Kaggle notebook...


Path to model files: /kaggle/input/llama-3/transformers/8b-chat-hf/1


In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
device = torch.device("cuda:0")
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.float16, device_map=device, quantization_config=bnb_config)
tokenizer = AutoTokenizer.from_pretrained(path)

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
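As a rough sanity check on why the model is loaded in 4-bit (`load_in_4bit=True`, `bnb_4bit_quant_type="nf4"`): weight memory scales with the parameter count times the bits per parameter. This sketch assumes a nominal 8e9 parameters (the exact Llama 3 8B count differs slightly). It ignores the per-block quantization constants, any layers bitsandbytes leaves unquantized, activations, and the KV cache, so real GPU usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # assumption: nominal "8B"; the exact parameter count differs slightly

fp16_gb = weight_memory_gb(N_PARAMS, 16)  # weights stored as torch.float16
nf4_gb = weight_memory_gb(N_PARAMS, 4)    # weights stored as 4-bit NF4
print(f"fp16: ~{fp16_gb:.0f} GB, nf4: ~{nf4_gb:.0f} GB")
```

Under these assumptions, fp16 weights alone would roughly fill a 16 GB GPU, while the 4-bit weights take about a quarter of that.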


In [5]:
import random
from datasets import load_dataset

# Load the GSM8K dataset
gsm8k = load_dataset("gsm8k", "main", split="train")

# Sample 5 random examples
random.seed(42)  # Set seed for reproducibility
sampled_examples = random.sample(list(gsm8k), 5)

Downloading readme:   0%|          | 0.00/7.94k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/2.31M [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/419k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/7473 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/1319 [00:00<?, ? examples/s]
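`random.sample(list(gsm8k), 5)` first materialises all 7,473 training rows as Python dicts. A lighter sketch samples row positions instead, assuming only the split size is known. With the same seed it should pick the same positions as the list-based call, because `random.sample` chooses elements by index. The helper name `sample_indices` is hypothetical. With a `datasets.Dataset`, the rows could then be fetched with `gsm8k.select(indices)`.

```python
import random


def sample_indices(n_rows: int, k: int, seed: int = 42) -> list[int]:
    """Reproducibly draw k distinct row positions from range(n_rows)."""
    rng = random.Random(seed)  # private generator: leaves the global random state untouched
    return rng.sample(range(n_rows), k)


indices = sample_indices(7473, 5)  # 7473 = size of the GSM8K train split
print(indices)
```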

In [6]:
# Define the problem texts and answers
problems = [example['question'] for example in sampled_examples]
actual_answers = [int(example['answer'].split()[-1]) for example in sampled_examples]
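GSM8K reference solutions end with a line of the form `#### 72`. Because of this, `split()[-1]` works as long as the final number has no thousands separator. The following is a slightly more defensive sketch, using the hypothetical helper name `parse_gsm8k_answer`. It splits on the `####` marker and strips commas before converting.

```python
def parse_gsm8k_answer(answer_text: str) -> int:
    """Return the final integer answer from a GSM8K reference solution."""
    if "####" not in answer_text:
        raise ValueError("no '####' final-answer marker found")
    final = answer_text.rsplit("####", 1)[1].strip()
    return int(final.replace(",", ""))  # tolerate thousands separators such as "1,000"


print(parse_gsm8k_answer("6 + 6 = 12\n#### 12"))
```

It would be used as `actual_answers = [parse_gsm8k_answer(ex["answer"]) for ex in sampled_examples]`.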

In [7]:
problems, actual_answers

(['For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?',
  'Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?',
  'Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?',
  "James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in th

In [8]:
eot_token_id = tokenizer.convert_tokens_to_ids('<|eot_id|>')  # id of the end-of-turn token that closes each chat turn
eot_token_id

128009

## Basic Prompting

In [9]:
generation_params = {
    "max_new_tokens": 512,
    #"max_length": 50,
    "pad_token_id": tokenizer.pad_token_id,
    "num_return_sequences": 1,
}
correct = 0
for problem, answer in zip(problems, actual_answers):
    message = [
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            {problem}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    input_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=input_ids.to(model.device),
        **generation_params,
        eos_token_id=eot_token_id
    )
    
    response = tokenizer.decode(generated_ids[0])[len(prompt) :]
    predicted_answer = int(response.split()[-1].replace("<|eot_id|>",""))
    correct += int(predicted_answer == answer)
    
    print(f"Problem: {problem}")
    print(f"Generated Text: {response}\n")
    print(f"True answer: {answer}")

print(f"Correct answer: {correct}/5")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
2024-07-10 08:10:46.621279: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-07-10 08:10:46.621404: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-07-10 08:10:46.748033: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered

Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?
Generated Text: Let's break down the problem step by step:

* For every 12 cans, you receive $0.50. To find out how many times 12 goes into 144, divide 144 by 12: 144 ÷ 12 = 12. So, you would receive $0.50 × 12 = $6.00 for recycling cans.
* For every 5 kilograms of newspapers, you receive $1.50. To find out how many times 5 goes into 20, divide 20 by 5: 20 ÷ 5 = 4. So, you would receive $1.50 × 4 = $6.00 for collecting newspapers.
* Add the money received from recycling cans and collecting newspapers: $6.00 + $6.00 = $12.00

ANSWER: 12<|eot_id|>

True answer: 12


Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?
Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so if Matthew picked 36 strawberries, Natalie picked half of that, which is 36/2 = 18 strawberries.
4. In total, they picked 16 (Betty) + 36 (Matthew) + 18 (Natalie) = 70 strawberries.
5. They used their strawberries to make jam. One jar of jam uses 7 strawberries, so they can make 70/7 = 10 jars of jam.
6. They sold each jar of jam at $4, so they made 10 x $4 = $40.

ANSWER: 40<|eot_id|>

True answer: 40


Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?
Generated Text: Let's break down the problem step by step:

1. The stack of books is 12 inches thick, and each inch is equivalent to 80 pages. So, the total number of pages in the stack is:

12 inches x 80 pages/inch = 960 pages

2. Jack has 6 books. To find the average number of pages per book, we can divide the total number of pages by the number of books:

960 pages ÷ 6 books = 160 pages/book

So, each book has an average of 160 pages.

ANSWER: 160<|eot_id|>

True answer: 160


Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?
Generated Text: Let's break down the problem step by step:

1. James starts with 500 Legos.
2. He uses half of them, which is 500 / 2 = 250 Legos.
3. He puts the remaining Legos back in the box, except for 5 missing pieces.
4. To find the number of Legos in the box, subtract the 5 missing pieces from the total number of Legos used: 500 - 250 - 5 = 245 Legos.

ANSWER: 245<|eot_id|>

True answer: 245
Problem: Ines had $20 in her purse. She bought 3 pounds of peaches, which are $2 per pound at the local farmers’ market. How much did she have left?
Generated Text: Let's solve the problem step by step!

Ines had $20 initially. She bought 3 pounds of pe

### Analysis

**Accuracy**

Five samples are not enough to draw strong conclusions, but according to these results LLaMA-8b solved all 5 problems correctly.

**Reasoning**

For every problem, the model lays out a clear reasoning path: all intermediate steps are logical and lead to correct intermediate results.

**Consistency**

The prompt specifies only the format of the final answer (to make parsing easy), yet the model opens almost every response with the same phrase, "Let's break down the problem step by step". The format of the intermediate steps is less consistent and could be improved: sometimes it is a bulleted list, sometimes a numbered list, and sometimes plain text.
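The cells in this notebook parse the prediction with `int(response.split()[-1].replace("<|eot_id|>", ""))`. That works for the responses shown, but it would raise on an answer such as `ANSWER: 12.00` or `ANSWER: $12`, or on a response that never reaches its `ANSWER:` line (for example, one cut off by `max_new_tokens`). A more tolerant sketch, using the hypothetical helper name `extract_answer`, takes the last `ANSWER:` match and returns `None` when there is none. That way, an unparseable response counts as wrong instead of stopping the loop.

```python
import re
from typing import Optional

# "ANSWER:" followed by an optional "$", then an optionally signed number
# that may contain thousands separators and a decimal part.
ANSWER_RE = re.compile(r"ANSWER:\s*\$?\s*(-?[\d,]*\.?\d+)")


def extract_answer(response: str) -> Optional[float]:
    """Return the number on the last 'ANSWER:' line, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


print(extract_answer("$6.00 + $6.00 = $12.00\n\nANSWER: 12<|eot_id|>"))
```

Because `12.0 == 12` and `None == 12` is `False`, the existing line `correct += int(predicted_answer == answer)` keeps working with this helper.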

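**Conclusion**

LLaMA-8b is too powerful a model to properly evaluate the strength of prompt engineering techniques: the basic prompt already solves every sampled problem, leaving no room to measure an improvement.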

## Prompt Engineering with Context
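Both prompts share the same template. The context variant only prepends a scene-setting sentence and adds a `Measurement unit:` line. The small builder below makes that difference explicit. It is a hypothetical helper, not part of the original cells. It also avoids a side effect of the triple-quoted f-strings used in the cells: the code's indentation and leading and trailing newlines end up inside the prompt text.

```python
from typing import Optional


def build_messages(problem: str, context: Optional[str] = None,
                   unit: Optional[str] = None) -> list[dict]:
    """Build a single-turn chat message for the basic or the context prompt."""
    lines = []
    if context:
        lines.append(context.strip())  # optional scene-setting sentence
    lines += ["Please solve the following math problem:", problem]
    if unit:
        lines.append(f"Measurement unit: {unit}")
    lines += [
        "",
        "End your response with the final answer in the following form:",
        "ANSWER: VALUE",
        "Make sure that VALUE is the number without any measurement units",
    ]
    # Joining explicit lines keeps code indentation out of the prompt text.
    return [{"role": "user", "content": "\n".join(lines)}]


print(build_messages("What is 2 + 2?", context="Picture James. ", unit="Lego")[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` exactly as the cells do.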

In [10]:
correct = 0
contexts = [
    "Imagine you're part of a community initiative focused on recycling. ",
    "Consider a scenario where three friends, Betty, Matthew, and Natalie, engage in a fun activity of picking strawberries. ",
    "Envision a scenario where Jack is an avid reader. ",
    "Picture James, who loves building with Legos. ",
    "Ines goes to the local farmers’ market intending to buy some fresh peaches."
]
measurement_units = ["$", "$", "pages", "Lego", "$"]
for context, problem, unit, answer in zip(contexts, problems, measurement_units, actual_answers):
    message = [
        {
            "role": "user",
            "content": f"""
            {context}
            Please solve the following math problem:
            {problem}
            Measurement unit: {unit}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    input_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=input_ids.to(model.device),
        **generation_params,
        eos_token_id=eot_token_id
    )
    
    response = tokenizer.decode(generated_ids[0])[len(prompt) :]
    predicted_answer = int(response.split()[-1].replace("<|eot_id|>",""))
    correct += int(predicted_answer == answer)
    
    print(f"Problem: {problem}")
    print(f"Generated Text: {response}\n")
    print(f"True answer: {answer}")

print(f"Correct answer: {correct}/5")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?
Generated Text: Let's break down the problem step by step!

For the cans, we have 144 cans. Since we receive $0.50 for every 12 cans, we can divide 144 by 12 to find the number of times we can recycle 12 cans:

144 ÷ 12 = 12

So, we can recycle 12 times. Since we receive $0.50 for each recycling, we multiply the number of times we can recycle (12) by the amount we receive ($0.50):

12 × $0.50 = $6.00

For the newspapers, we have 20 kilograms. Since we receive $1.50 for every 5 kilograms, we can divide 20 by 5 to find the number of times we can collect 5 kilograms:

20 ÷ 5 = 4

So, we can collect 4 times. Since we receive $1.50 for each collection, we multiply the number of times we can collect (4) by the amount we receive ($1.50):

4 × $1.50 = $6.00

Now, let's add the m

Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?
Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so he picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so if Matthew picked x strawberries, Natalie picked x/2 strawberries. Since Matthew picked 36 strawberries, we can set up the equation x = 36 and solve for x/2, which gives us Natalie's strawberry count as 36/2 = 18 strawberries.
4. Total strawberries picked = Betty's strawberries + Matthew's strawberries + Natalie's strawberries = 16 + 36 + 18 = 70 strawberries.
5. To make jam, they used 7 strawberries per jar. To find the number of jars they can make, divid

Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?
Generated Text: Let's break down the problem step by step!

Jack has a stack of books that is 12 inches thick. Since 80 pages is one inch thick, we can calculate the total number of pages in the stack:

12 inches × 80 pages/inch = 960 pages

Jack has 6 books, so to find the average number of pages in each book, we'll divide the total number of pages by 6:

960 pages ÷ 6 books = 160 pages/book

So, each book has an average of 160 pages.

ANSWER: 160<|eot_id|>

True answer: 160


Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?
Generated Text: Let's break down the problem step by step:

1. James starts with 500 Legos.
2. He uses half of them, which is 500 / 2 = 250 Legos.
3. He puts the remaining Legos back in the box, except for 5 missing pieces.
4. To find the number of Legos in the box, subtract the number of Legos used (250) from the original number (500), and then subtract the 5 missing pieces: 500 - 250 - 5 = 245 Legos.

ANSWER: 245<|eot_id|>

True answer: 245
Problem: Ines had $20 in her purse. She bought 3 pounds of peaches, which are $2 per pound at the local farmers’ market. How much did she have left?
Generated Text: Let's solve the problem step by step!

Ines

### Analysis

**Accuracy**

Five samples are not enough to draw strong conclusions, but according to these results LLaMA-8b solved all 5 problems correctly.

**Reasoning**

For every problem, the model lays out a clear reasoning path: all intermediate steps are logical and lead to correct intermediate results.

**Consistency**

The prompt specifies only the format of the final answer (to make parsing easy), yet the model opens almost every response with the same phrase, "Let's break down the problem step by step". The format of the intermediate steps is less consistent and could be improved: sometimes it is a numbered list and sometimes plain text.

**Conclusion**

This technique does not seem to change anything: accuracy, reasoning, and response format look the same as with the basic prompt.

## Self-Consistency
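Self-consistency samples several reasoning paths (here with `do_sample=True`, `top_k=50`, `top_p=0.95`) and returns the most frequent final answer. The cell below prints the mean and standard deviation of the five sampled answers and adds every individual sample to `correct` (hence the `/25`). So it measures per-sample accuracy rather than a voted answer. The following is a minimal majority-vote sketch. It assumes the answers have already been parsed to numbers, or to `None` when parsing failed. `majority_vote` is a hypothetical helper name. The example input is the five answers recorded below for the recycling problem.

```python
from collections import Counter
from collections.abc import Hashable, Sequence
from typing import Optional


def majority_vote(preds: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the most frequent non-None prediction; ties go to the one seen first."""
    valid = [p for p in preds if p is not None]
    if not valid:
        return None
    # Counter.most_common orders elements with equal counts by first occurrence.
    return Counter(valid).most_common(1)[0][0]


preds = [12, 12, 12, 12, 36]  # sampled answers for the recycling problem
print(majority_vote(preds))     # 12
print(sum(preds) / len(preds))  # 16.8
```

Per problem, `correct += int(majority_vote(preds) == answer)` would then count one voted answer per problem, out of 5. The single outlier pulls the mean up to 16.8, while the vote still returns 12.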

In [11]:
import numpy as np
generation_params = {
    "max_new_tokens": 512,
    #"max_length": 50,
    "pad_token_id": tokenizer.pad_token_id,
    "num_return_sequences": 1,
    "top_p": 0.95,  # Nucleus sampling
    "top_k": 50,    # Top-k sampling
    "do_sample": True  # Enable sampling
}
num_responses = 5
correct = 0
for problem, answer in zip(problems, actual_answers):
    message = [
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            {problem}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    input_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    preds = []
    print(f"Problem: {problem}")
    for _ in range(num_responses):
        generated_ids = model.generate(
            input_ids=input_ids.to(model.device),
            **generation_params,
            eos_token_id=eot_token_id
        )
        response = tokenizer.decode(generated_ids[0])[len(prompt) :]
        predicted_answer = int(response.split()[-1].replace("<|eot_id|>",""))
        correct += int(predicted_answer == answer)
        print(f"Generated Text: {response}\n")
        preds.append(predicted_answer)
    print(f"Mean: {np.mean(preds)}")
    print(f"Std: {np.std(preds)}")
    print(f"True answer: {answer}")

print(f"Correct answer: {correct}/25")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?


Generated Text: Let's solve the problem step by step.

For the cans, you receive $0.50 for every 12 cans. Since you have 144 cans, you can divide 144 by 12 to find the number of groups of 12 cans:

144 ÷ 12 = 12

Since you have 12 groups of 12 cans, you can multiply the number of groups by the amount you receive per group:

12 × $0.50 = $6.00

For the newspapers, you receive $1.50 for every 5 kilograms. Since you have 20 kilograms, you can divide 20 by 5 to find the number of groups of 5 kilograms:

20 ÷ 5 = 4

Since you have 4 groups of 5 kilograms, you can multiply the number of groups by the amount you receive per group:

4 × $1.50 = $6.00

To find the total amount of money you receive, add the amount for the cans and the amount for the newspapers:

$6.00 + $6.00 = $12.00

ANSWER: 12<|eot_id|>



Generated Text: Let's break down the problem:

For every 12 cans recycled, you receive $0.50. To find the total amount of money received from cans, you can divide the total number of cans (144) by 12 and multiply by $0.50:

144 cans ÷ 12 = 12 groups of 12 cans = 12 x $0.50 = $6.00

For every 5 kilograms of newspapers, you receive $1.50. To find the total amount of money received from newspapers, you can divide the total weight of newspapers (20 kg) by 5 and multiply by $1.50:

20 kg ÷ 5 = 4 groups of 5 kg = 4 x $1.50 = $6.00

Now, add the money received from cans and newspapers:

$6.00 + $6.00 = $12.00

ANSWER: 12<|eot_id|>



Generated Text: Let's break down the problem step by step:

* For every 12 cans recycled, you receive $0.50. To find the total amount of money received from recycling cans, you can divide the total number of cans (144) by 12 and multiply by $0.50:
144 ÷ 12 = 12 (cans per group)
12 × $0.50 = $6.00

* For every 5 kilograms of newspapers, you receive $1.50. To find the total amount of money received from collecting newspapers, you can divide the total number of kilograms (20) by 5 and multiply by $1.50:
20 ÷ 5 = 4 (kilograms per group)
4 × $1.50 = $6.00

* Add the total amounts received from recycling cans and collecting newspapers:
$6.00 + $6.00 = $12.00

ANSWER: 12<|eot_id|>



Generated Text: Let's break this problem down step by step:

1. For every 12 cans recycled, you receive $0.50. To find out how many times 12 goes into 144 cans, divide 144 by 12:

144 ÷ 12 = 12

Since 12 cans are equal to $0.50, you would receive:

12 x $0.50 = $6.00

2. For every 5 kilograms of newspapers, you receive $1.50. To find out how many times 5 goes into 20 kilograms, divide 20 by 5:

20 ÷ 5 = 4

Since 5 kilograms are equal to $1.50, you would receive:

4 x $1.50 = $6.00

3. To find the total amount of money you would receive, add the amounts from recycling cans and newspapers:

$6.00 (from cans) + $6.00 (from newspapers) = $12.00

ANSWER: 12<|eot_id|>



Generated Text: Let's break down the problem step by step:

* For every 12 cans, you receive $0.50. To find out how many times 12 goes into 144, divide 144 by 12: 144 ÷ 12 = 12. Since 12 cans gives you $0.50, you would get 0.50 × 12 = $6.00 for recycling 144 cans.
* For every 5 kilograms of newspapers, you receive $1.50. You have 20 kilograms of newspapers, so multiply 20 by 1.50 to get: 20 × 1.50 = $30.00
* To find the total amount of money you would receive, add the amounts: $6.00 + $30.00 = $36.00

ANSWER: 36<|eot_id|>

Mean: 16.8
Std: 9.6
True answer: 12
Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?


Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Matthew also picked twice as many strawberries as Natalie, so if Matthew picked x strawberries, Natalie picked x/2 strawberries. Since Matthew picked 36 strawberries, we can set up the equation:
x = 36
x/2 = 36/2 = 18
So, Natalie picked 18 strawberries.
4. Total number of strawberries picked by Betty, Matthew, and Natalie is:
16 + 36 + 18 = 70
5. Each jar of jam uses 7 strawberries, so the total number of jars they can make is:
70 / 7 = 10 jars
6. They sold each jar at $4, so the total amount of money they made is:
10 jars x $4/jar = $40

ANSWER: 40<|eot_id|>



Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so if Matthew picked x strawberries, Natalie picked x/2 strawberries. Since Matthew picked 36 strawberries, we can set up the equation x/2 = 36/2, which simplifies to x = 36.
4. So, Matthew and Natalie picked 36 and 18 strawberries, respectively (since 36/2 = 18).
5. In total, they picked 16 + 36 + 18 = 70 strawberries.
6. Since one jar of jam uses 7 strawberries, they can make 70/7 = 10 jars of jam.
7. They sold each jar of jam for $4, so they made 10 x 4 = 40 dollars.

ANSWER: 40<|eot_id|>



Generated Text: Let's break down the problem step by step!

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Since Matthew picked twice as many strawberries as Natalie, Natalie picked half of what Matthew picked. So, Natalie picked 36 ÷ 2 = 18 strawberries.
4. In total, they picked 16 + 36 + 18 = 70 strawberries.
5. One jar of jam uses 7 strawberries. Let's divide the total number of strawberries by 7 to find out how many jars of jam they can make: 70 ÷ 7 = 10 jars.
6. Each jar is sold at $4, so they can make 10 × $4 = $40 from the strawberries.

ANSWER: 40<|eot_id|>



Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so if Matthew picked 36 strawberries, Natalie picked half of that, which is 36 / 2 = 18 strawberries.
4. Total strawberries picked: 16 (Betty) + 36 (Matthew) + 18 (Natalie) = 70 strawberries.
5. To make jam, they need 7 strawberries per jar. They can make 70 / 7 = 10 jars of jam.
6. Each jar is sold at $4, so they can make 10 x $4 = $40 from the jam.

ANSWER: 40<|eot_id|>



Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so he picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so if we call the number of strawberries Natalie picked "x", we can set up the equation: x × 2 = 36. Solving for x, we get x = 18.
4. The total number of strawberries picked is: 16 (Betty) + 36 (Matthew) + 18 (Natalie) = 70 strawberries.
5. One jar of jam uses 7 strawberries, so the total number of jars they can make is: 70 strawberries ÷ 7 strawberries per jar = 10 jars.
6. They sold each jar at $4, so the total amount of money they made is: 10 jars × $4 per jar = $40.

ANSWER: 40<|eot_id|>

Mean: 40.0
Std: 0.0
True answer: 40
Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?


Generated Text: Let's break this down step by step!

1. We know that the stack of books is 12 inches thick.
2. We know that 80 pages is equal to 1 inch thick.
3. To find the total number of pages in the stack, we can multiply the thickness of the stack (12 inches) by the number of pages per inch (80 pages/inch):

12 inches × 80 pages/inch = 960 pages

So, the stack has 960 pages.

4. Jack has 6 books. To find the average number of pages per book, we can divide the total number of pages (960) by the number of books (6):

960 pages ÷ 6 books = 160 pages/book

So, each book has 160 pages on average.

ANSWER: 160<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break it down step by step!

1. The stack of books is 12 inches thick, which means it has a total thickness of 12 x 80 = 960 pages.
2. Jack has 6 books, so to find the total number of pages, we can divide the total thickness by 6: 960 pages ÷ 6 books = 160 pages per book.

So, the average number of pages in each book is 160.

ANSWER: 160<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem step by step:

1. The stack of books is 12 inches thick.
2. 80 pages is equal to 1 inch thick.
3. To find the total number of pages in the stack, we can multiply the thickness of the stack (12 inches) by the number of pages per inch (80):

12 inches × 80 pages/inch = 960 pages
4. Since Jack has 6 books, we need to divide the total number of pages (960) by the number of books (6) to find the average number of pages per book:

960 pages ÷ 6 books = 160 pages/book

So, the final answer is:

ANSWER: 160<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem step by step:

1. Jack has a stack of books that is 12 inches thick.
2. 80 pages is one inch thick, so the total number of pages in the stack is:
12 inches x 80 pages/inch = 960 pages
3. Jack has 6 books, so to find the average number of pages in each book, we divide the total number of pages by 6:
960 pages ÷ 6 books = 160 pages/book

So, each book has an average of 160 pages.

ANSWER: 160<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem step by step:

1. Jack's stack of books is 12 inches thick, which means it contains 12 * 80 = 960 pages (since 80 pages is one inch thick).
2. Jack has 6 books, so to find the total number of pages, multiply the number of pages in the stack by 1 (since the stack contains 1 of the books): 960 * 1 = 960 pages.
3. To find the average number of pages per book, divide the total number of pages by the number of books: 960 / 6 = 160.

ANSWER: 160<|eot_id|>

Mean: 160.0
Std: 0.0
True answer: 160
Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break this problem down step by step!

1. James starts with 500 Legos and uses half of them. So, he uses 500/2 = 250 pieces.
2. The remaining pieces are put away in the box, except for 5 missing pieces. So, the number of leftover pieces is:
500 - 250 - 5 = 245

The answer is:
ANSWER: 245<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break this problem down step by step!

James starts with 500 Legos. He uses half of them to build his castle, which means he uses 500/2 = 250 pieces.

The remaining pieces are the leftover pieces that he puts back in the box. However, he loses 5 pieces along the way, so the number of leftover pieces that he puts back in the box is:

500 - 250 = 250 pieces (initial number of pieces minus the number used)

minus 5 missing pieces = 250 - 5 = 245 pieces

The final answer is:

ANSWER: 245<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem step by step:

1. James starts with 500 Legos.
2. He uses half of them, which is 500 / 2 = 250 Legos.
3. He puts away the rest of the Legos, except for 5 missing pieces. So, he puts away 500 - 250 - 5 = 245 Legos.

The answer is:
ANSWER: 245<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem:

James starts with 500 Legos. He uses half of them, which is:

500 ÷ 2 = 250

So, he uses 250 Legos.

The leftover pieces are:

500 - 250 = 250

James puts all of the leftover pieces back in the box, except for 5 missing pieces. Therefore, the number of Legos in the box at the end is:

250 - 5 = 245

ANSWER: 245<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem step by step:

1. James starts with 500 Legos and uses half of them. To find half of 500, we can divide 500 by 2:

500 ÷ 2 = 250

So, James uses 250 pieces to build the castle.

2. This leaves 500 - 250 = 250 pieces remaining.

3. James puts all the leftover pieces back in the box, except for the 5 missing pieces. To find the total number of pieces in the box, we subtract the 5 missing pieces from the remaining 250 pieces:

250 - 5 = 245

ANSWER: 245<|eot_id|>

Mean: 245.0
Std: 0.0
True answer: 245
Problem: Ines had $20 in her purse. She bought 3 pounds of peaches, which are $2 per pound at the local farmers’ market. How much did she have left?


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's solve the problem!

Ines started with $20.

She bought 3 pounds of peaches, which cost $2 per pound. To find the total cost of the peaches, multiply the number of pounds by the price per pound:

3 pounds × $2 per pound = $6

So, Ines spent $6 on peaches. To find out how much she has left, subtract the cost of the peaches from the initial amount of money she had:

$20 - $6 = $14

ANSWER: 14<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break down the problem:

Ines had $20 initially.
She bought 3 pounds of peaches at $2 per pound. To find the total cost, multiply the number of pounds by the price per pound:

3 pounds x $2/pound = $6

So, she spent $6 on peaches.
To find out how much she has left, subtract the cost from her initial amount:

$20 - $6 = $14

ANSWER: 14<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's solve the problem step by step:

Ines had $20 initially.

She bought 3 pounds of peaches at $2 per pound, so the total cost is:

3 pounds x $2 per pound = $6

She spent $6 on the peaches, so she now has:

$20 (initial amount) - $6 (amount spent) = $14

ANSWER: 14<|eot_id|>



The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Generated Text: Let's break it down step by step!

Ines had $20 initially.
She bought 3 pounds of peaches at $2 per pound, so the total cost is:
3 pounds x $2 per pound = $6

Ines spent $6 on peaches, so she has:
$20 (initial amount) - $6 (spent) = $14

ANSWER: 14<|eot_id|>

Generated Text: Let's solve the problem step by step:

Ines starts with $20.

She buys 3 pounds of peaches, which cost $2 per pound. So, the total cost of the peaches is:

3 pounds x $2 per pound = $6

Ines pays for the peaches with the $20 she has, so she is left with:

$20 - $6 = $14

So, Ines has $14 left after buying the peaches.

ANSWER: 14<|eot_id|>

Mean: 14.0
Std: 0.0
True answer: 14
Correct answer: 24/25


### Analysis

**Accuracy**

Even with the randomness added, the model answered 24 of the 25 sampled generations correctly (5 samples for each of the 5 problems).
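The Mean/Std lines and the 24/25 tally above summarize five sampled answers per problem. Below is a minimal sketch of that bookkeeping. It assumes the parsed answers for one problem are collected in a list, and `summarize_samples` is a hypothetical helper name. The majority vote is an extra aggregation (often called self-consistency) that the notebook does not compute.

```python
import statistics
from collections import Counter

def summarize_samples(samples, true_answer):
    """Summarize the parsed answers sampled for a single problem (hypothetical helper)."""
    return {
        "mean": statistics.fmean(samples),   # comparable to the "Mean" line above
        "std": statistics.pstdev(samples),   # population std, same as np.std's default ddof=0
        "num_correct": sum(s == true_answer for s in samples),
        # Majority vote: the most frequent answer (ties go to the answer seen first).
        "majority": Counter(samples).most_common(1)[0][0],
    }

summary = summarize_samples([40, 40, 40, 40, 40], true_answer=40)
```

A single wrong sample shifts the mean but usually not the majority vote. That is why voting is a common way to turn several sampled generations into one final answer.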

**Reasoning**

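For each problem, the model gives a clear reasoning path, and the intermediate results are correct. A few responses take odd detours, though. One response to the books problem multiplies the 960 pages "by 1 (since the stack contains 1 of the books)", and another calls 12 x 80 = 960 pages a "thickness". Neither detour changes the final answer.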
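Checking intermediate results by eye does not scale to many generations. The rough sketch below re-computes simple binary equations found in a response, such as `16 + 20 = 36` or `10 x $4 = $40`. It is a heuristic regex assumed here for illustration, and it skips expressions that have units between the numbers, such as `12 inches × 80 pages/inch`.

```python
import re

# A number, optionally prefixed with "$" (e.g. "16", "$4", "0.50").
NUM = r"\$?(\d+(?:\.\d+)?)"
# One binary operation followed by "=", e.g. "16 + 20 = 36" or "10 x $4 = $40".
EQUATION = re.compile(NUM + r"\s*([-+*/×x÷])\s*" + NUM + r"\s*=\s*" + NUM)

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "x": lambda a, b: a * b,
    "×": lambda a, b: a * b,
    "/": lambda a, b: a / b,
    "÷": lambda a, b: a / b,
}

def check_arithmetic(text, tol=1e-9):
    """Return (equation_text, is_correct) for each simple equation found in text."""
    results = []
    for m in EQUATION.finditer(text):
        a, op, b, c = float(m.group(1)), m.group(2), float(m.group(3)), float(m.group(4))
        if op in "/÷" and b == 0:
            ok = False  # a division by zero can never be a correct step
        else:
            ok = abs(OPS[op](a, b) - c) <= tol
        results.append((m.group(0), ok))
    return results
```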

**Consistency**

The prompt specifies the format of the final answer (to make parsing easy) but not the format of the reasoning. Almost every response opens with "Let's ...", usually "Let's break down the problem step by step". With the randomness added, however, the model formats its reasoning differently from sample to sample, even for the same problem.
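The answer is parsed with `int(response.split()[-1].replace("<|eot_id|>", ""))`, so any deviation from the exact format breaks parsing. For example, an answer written as `12.00` or `$12` raises `ValueError`. Below is a sketch of a slightly more tolerant parser. It assumes the final answer is the number after the last `ANSWER:` marker. `extract_answer` is a hypothetical name and is not part of any library.

```python
import re

# "ANSWER:" followed by an optional "$" and a number such as 40, 12.00, 1,250 or -3.
ANSWER_RE = re.compile(r"ANSWER:\s*\$?\s*(-?[\d,]*\.?\d+)")

def extract_answer(response):
    """Return the number after the last "ANSWER:" marker, or None if there is none."""
    matches = ANSWER_RE.findall(response)
    if not matches:
        return None
    value = float(matches[-1].replace(",", ""))
    # Return whole numbers as int, matching the integer true answers used here.
    return int(value) if value.is_integer() else value
```

Returning `None` instead of raising lets the evaluation loop count a malformed response as wrong rather than crash partway through.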

**Conclusion**
This technique did not improve the accuracy (the baseline was already at 100%, so there was no room for improvement), but it added randomness. That may be inconvenient when a fixed response format is needed, but beneficial when the user wants creativity or variety in the responses.

## Zero-Shot and Few-Shot Learning

In [12]:
generation_params = {
    "max_new_tokens": 512,
    #"max_length": 50,
    "pad_token_id": tokenizer.pad_token_id,
    "num_return_sequences": 1,
}
correct = 0
for problem, answer in zip(problems, actual_answers):
    message = [
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            The price of buying a wooden toy at the new Craftee And Best store is $20, and the cost of buying a hat is $10. If Kendra went to the shop with a $100 bill and bought two wooden toys and three hats, calculate the change she received.
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
        {
        "role": "assistant",
        "content": "ANSWER: 30",
        },
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            James is trying to create a new breed of kittens with extra-long tails. Each generation of kittens he breeds has a tail 25% longer than the last generation. If the first generation has tails 16 cm long, how long are the third generation's tails?
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
        {
        "role": "assistant",
        "content": "ANSWER: 25",
        },
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            {problem}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    input_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=input_ids.to(model.device),
        **generation_params,
        eos_token_id=eot_token_id
    )
    
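    # Decode only the newly generated tokens; slicing the decoded string by len(prompt)
    # breaks if decoding does not reproduce the prompt character-for-character.
    response = tokenizer.decode(generated_ids[0][input_ids.shape[1]:])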
    predicted_answer = int(response.split()[-1].replace("<|eot_id|>",""))
    correct += int(predicted_answer == answer)
    
    print(f"Problem: {problem}")
    print(f"Generated Text: {response}\n")
    print(f"True answer: {answer}")

print(f"Correct answer: {correct}/5")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?
Generated Text: Let's calculate the amount of money you would receive:

Cans: 144 cans / 12 cans = 12 groups of 12 cans. You would receive 12 x $0.50 = $6.00

Newspapers: 20 kg / 5 kg = 4 groups of 5 kg. You would receive 4 x $1.50 = $6.00

Total amount of money received: $6.00 + $6.00 = $12.00

ANSWER: 12<|eot_id|>

True answer: 12


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?
Generated Text: Let's break down the problem step by step:

1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so Natalie picked 36 / 2 = 18 strawberries.
4. Total strawberries picked = 16 (Betty) + 36 (Matthew) + 18 (Natalie) = 70 strawberries.
5. One jar of jam uses 7 strawberries, so the total number of jars they can make is 70 / 7 = 10 jars.
6. Each jar is sold at $4, so the total money made is 10 jars x $4/jar = $40.

ANSWER: 40<|eot_id|>

True answer: 40


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?
Generated Text: Let's break it down step by step:

1. The stack of books is 12 inches thick, and each inch is 80 pages thick, so the total number of pages is:
12 inches x 80 pages/inch = 960 pages
2. Jack has 6 books, so to find the average number of pages per book, we divide the total number of pages by 6:
960 pages ÷ 6 books = 160 pages/book

ANSWER: 160<|eot_id|>

True answer: 160


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?
Generated Text: ANSWER: 245<|eot_id|>

True answer: 245
Problem: Ines had $20 in her purse. She bought 3 pounds of peaches, which are $2 per pound at the local farmers’ market. How much did she have left?
Generated Text: ANSWER: 10<|eot_id|>

True answer: 14
Correct answer: 4/5


### Analysis

**Accuracy**

The accuracy dropped to 4/5. The two examples contain only the final answer without any reasoning, and for the last two problems the model copied that style and answered directly. For the peaches problem, answering without reasoning led to an incorrect result (10 instead of 14).
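One way to catch this failure automatically is to check whether any text precedes the `ANSWER:` line. Below is a minimal sketch; `has_reasoning` is a hypothetical helper. It would flag the Legos and peaches responses above, which consist of the answer line alone.

```python
def has_reasoning(response, marker="ANSWER:"):
    """Return True if some non-whitespace text precedes the last answer marker."""
    head, sep, _ = response.rpartition(marker)
    # No marker at all (sep == "") or nothing before it both count as "no reasoning".
    return bool(sep) and bool(head.strip())
```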

**Reasoning**

For the first three problems, the model still gives a clear reasoning path. For the Legos and peaches problems, however, it does not reason at all, because the answer-only examples in the conversation show it to reply that way.

**Consistency**

The response format is inconsistent: the model gives step-by-step reasoning for the first three problems and a bare ANSWER line for the last two.

**Conclusion**
Providing few-shot examples that contain only the final answer made the model skip its reasoning, which only made the performance worse.

In [13]:
generation_params = {
    "max_new_tokens": 512,
    #"max_length": 50,
    "pad_token_id": tokenizer.pad_token_id,
    "num_return_sequences": 1,
}
correct = 0
for problem, answer in zip(problems, actual_answers):
    message = [
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            The price of buying a wooden toy at the new Craftee And Best store is $20, and the cost of buying a hat is $10. If Kendra went to the shop with a $100 bill and bought two wooden toys and three hats, calculate the change she received.
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
        {
        "role": "assistant",
        "content": """
1. The total price of hats bought is $10*3=$30
2. The total price of wooden toys bought is $20*2=$40
3. The total money spent is $30+$40=$70.
4. The change Kendra received is $100-$70=$30
ANSWER: 30
        """,
        },
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            James is trying to create a new breed of kittens with extra-long tails. Each generation of kittens he breeds has a tail 25% longer than the last generation. If the first generation has tails 16 cm long, how long are the third generation's tails?
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
        {
        "role": "assistant",
        "content": """
1. The second generation will have tails of length 16+0.25*16=20 cm
2. The third generation will have tails of length 20+0.25*20=25 cm
ANSWER: 25
        """,
        },
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            {problem}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    input_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=input_ids.to(model.device),
        **generation_params,
        eos_token_id=eot_token_id
    )
    
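    # Decode only the newly generated tokens; slicing the decoded string by len(prompt)
    # breaks if decoding does not reproduce the prompt character-for-character.
    response = tokenizer.decode(generated_ids[0][input_ids.shape[1]:])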
    predicted_answer = int(response.split()[-1].replace("<|eot_id|>",""))
    correct += int(predicted_answer == answer)
    
    print(f"Problem: {problem}")
    print(f"Generated Text: {response}\n")
    print(f"True answer: {answer}")

print(f"Correct answer: {correct}/5")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?
Generated Text: 1. The number of cans recycled is 144/12=12, so the money received from cans is 12*0.50=6
2. The number of kilograms of newspapers collected is 20/5=4, so the money received from newspapers is 4*1.50=6
3. The total money received is 6+6=12
ANSWER: 12<|eot_id|>

True answer: 12


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?
Generated Text: 1. Betty picked 16 strawberries.
2. Matthew picked 20 more strawberries than Betty, so Matthew picked 16 + 20 = 36 strawberries.
3. Matthew picked twice as many strawberries as Natalie, so Natalie picked 36/2 = 18 strawberries.
4. Total strawberries picked = 16 + 36 + 18 = 70 strawberries.
5. Jam jars used = 70/7 = 10 jars.
6. Money made = 10 jars * $4/jar = $40.
ANSWER: 40<|eot_id|>

True answer: 40


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?
Generated Text: 1. The total number of pages in the stack is 12 inches * 80 pages/inch = 960 pages
2. The average number of pages per book is 960 pages / 6 books = 160 pages
ANSWER: 160<|eot_id|>

True answer: 160


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?
Generated Text: 1. James uses half of the 500 Legos, which is 500/2=250 Legos.
2. The number of Legos left is 500-250=250 Legos.
3. He puts all of the leftover pieces back in the box, except for 5 missing pieces. So, the number of Legos in the box is 250-5=245 Legos.
ANSWER: 245<|eot_id|>

True answer: 245
Problem: Ines had $20 in her purse. She bought 3 pounds of peaches, which are $2 per pound at the local farmers’ market. How much did she have left?
Generated Text: 1. The cost of 3 pounds of peaches is 3*2=$6
2. Ines had $20 initially and spent $6, so she has $20-$6=$14 left
ANSWER: 14<|eot_id|>

True answer: 14
Correct answer: 5/5


### Analysis

**Accuracy**

When the two examples include step-by-step reasoning, the accuracy of the model returns to 100% (5/5).

**Reasoning**

For every problem, the model gives a clear reasoning path. All intermediate steps are logical and lead to correct intermediate results.

**Consistency**

Including reasoning in the examples made the response format consistent across all problems. Every response is a numbered list of steps followed by the ANSWER line, mirroring the examples.
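This kind of format consistency can also be checked mechanically. Below is a sketch that assumes the target format is the one used in the examples: numbered steps such as `1. ...`, then a final `ANSWER:` line. `follows_step_format` is a hypothetical helper. Each of the five responses above should pass this check.

```python
import re

STEP_RE = re.compile(r"^\d+\.\s")  # a numbered step such as "1. The cost of ..."

def follows_step_format(response, eos="<|eot_id|>"):
    """True if the response is one or more numbered steps followed by a final ANSWER line."""
    lines = [line.strip() for line in response.replace(eos, "").splitlines() if line.strip()]
    if len(lines) < 2 or not lines[-1].startswith("ANSWER:"):
        return False
    return all(STEP_RE.match(line) for line in lines[:-1])
```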

**Conclusion**
If you need a specific response format, a few examples written in that format (few-shot prompting) can be very beneficial.
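A small helper can keep few-shot prompts like the ones above consistent. The sketch below is an illustrative refactoring, not the notebook's code. `build_few_shot_messages` is hypothetical, and it builds the same alternating user/assistant structure from (problem, worked solution) pairs. Using `textwrap.dedent` also avoids sending the leading indentation that the indented triple-quoted strings in the cells above carry into the prompt. The returned list can be passed to `tokenizer.apply_chat_template` as in those cells.

```python
import textwrap

INSTRUCTIONS = textwrap.dedent("""\
    End your response with the final answer in the following form:
    ANSWER: VALUE
    Make sure that VALUE is the number without any measurement units""")

def build_few_shot_messages(examples, problem):
    """Alternate user/assistant turns for each (problem, worked_solution) pair,
    then append the new problem as the final user turn."""
    def user_turn(text):
        return {"role": "user",
                "content": f"Please solve the following math problem:\n{text}\n\n{INSTRUCTIONS}"}

    messages = []
    for example_problem, solution in examples:
        messages.append(user_turn(example_problem))
        messages.append({"role": "assistant", "content": solution})
    messages.append(user_turn(problem))
    return messages
```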

## Chain of Thought

In [18]:
reasoning_steps = [
    [
        "1. Calculate the total money earned from recycling cans",
        "2. Calculate the total money earned from recycling newspapers:",
        "3. Add the money earned from both recycling cans and newspapers to find the total amount.",
    ],
    [
        "1. Calculate how many strawberries Matthew picked",
        "2. Calculate how many strawberries Natalie picked",
        "3. Calculate the total number of strawberries picked",
        "4. Calculate the number of jars of jam produced",
        "5. Calculate the money earned"
    ],
    [
        "1. Calculate the number of pages in the stack of books",
        "2. Calculate the number of pages in each book on average",
    ],
    [
        "1. Calculate the number of legos that was put away",
        "2. Calculate the number of legos that was put in the box",
    ],
    [
        "1. Calculate the money spent on peaches",
        "2. Calculate the money left"
    ],
]
generation_params = {
    "max_new_tokens": 512,
    #"max_length": 50,
    "pad_token_id": tokenizer.pad_token_id,
    "num_return_sequences": 1,
}
correct = 0
for problem, reasoning, answer in zip(problems, reasoning_steps, actual_answers):
    reasoning_joined = '\n'.join(reasoning)
    message = [
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem:
            {problem}
            
            Here are the reasoning steps to solve this problem:
            {reasoning_joined}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    input_ids = tokenizer.encode(
        prompt, add_special_tokens=False, return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=input_ids.to(model.device),
        **generation_params,
        eos_token_id=eot_token_id
    )
    
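    # Decode only the newly generated tokens; slicing the decoded string by len(prompt)
    # breaks if decoding does not reproduce the prompt character-for-character.
    response = tokenizer.decode(generated_ids[0][input_ids.shape[1]:])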
    predicted_answer = int(response.split()[-1].replace("<|eot_id|>",""))
    correct += int(predicted_answer == answer)
    
    print(f"Problem: {problem}")
    print(f"Generated Text: {response}\n")
    print(f"True answer: {answer}")

print(f"Correct answer: {correct}/5")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?
Generated Text: Let's solve the problem step by step:

1. Calculate the total money earned from recycling cans:
You recycle 144 cans, and for every 12 cans, you receive $0.50. To find the total money earned, divide the number of cans by 12 and multiply by $0.50:

144 cans ÷ 12 = 12 groups of 12 cans
12 groups × $0.50 = $6.00

So, you earn $6.00 from recycling cans.

2. Calculate the total money earned from recycling newspapers:
You recycle 20 kilograms of newspapers, and for every 5 kilograms, you receive $1.50. To find the total money earned, divide the number of kilograms by 5 and multiply by $1.50:

20 kilograms ÷ 5 = 4 groups of 5 kilograms
4 groups × $1.50 = $6.00

So, you earn $6.00 from recycling newspapers.

3. Add the money earned from both recycling cans and ne

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?
Generated Text: Let's break down the problem step by step:

1. Matthew picked 20 more strawberries than Betty, who picked 16 strawberries. So, Matthew picked:
Matthew = Betty + 20
Matthew = 16 + 20
Matthew = 36

2. Matthew picked twice as many strawberries as Natalie. So, if Matthew picked 36 strawberries, Natalie picked:
Natalie = Matthew / 2
Natalie = 36 / 2
Natalie = 18

3. The total number of strawberries picked is the sum of the strawberries picked by Betty, Matthew, and Natalie:
Total strawberries = Betty + Matthew + Natalie
Total strawberries = 16 + 36 + 18
Total strawberries = 70

4. One jar of jam uses 7 strawberries. To find the number of jars of jam produced, divide the total number o


Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?
Generated Text: Let's break down the problem step by step:

1. Calculate the number of pages in the stack of books:
Since the stack is 12 inches thick, and 80 pages is one inch thick, we can multiply the thickness of the stack (12 inches) by the number of pages per inch (80) to get the total number of pages:

12 inches × 80 pages/inch = 960 pages

2. Calculate the number of pages in each book on average:
Since Jack has 6 books, we can divide the total number of pages (960) by the number of books (6) to get the average number of pages per book:

960 pages ÷ 6 books = 160 pages/book

ANSWER: 160<|eot_id|>

True answer: 160



Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?
Generated Text: Let's break down the problem step by step:

1. James uses half the pieces before finishing and is told to put the rest away. This means he uses 500/2 = 250 pieces to build the castle.

The number of pieces he puts away is the total number of pieces minus the number of pieces he used: 500 - 250 = 250 pieces.

2. He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find. This means he puts away 250 - 5 = 245 pieces.

The number of pieces he puts in the box is 245.

ANSWER: 245<|eot_id|>

True answer: 245
Problem: Ines had $20 in her purse. She bought 3 pounds of peaches, which a

### Analysis

**Accuracy**

The accuracy of the model is still 100%.
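Scoring with `int(response.split()[-1] ...)` breaks when the model writes `12.00`, or when a response is cut off at `max_new_tokens`. Here is a minimal sketch of a more forgiving score. It assumes the answers have already been extracted as numbers, with `None` standing in when no `ANSWER:` line was found. `accuracy` is a hypothetical helper, and the values below are illustrative, not results from this notebook:

```python
import math
from typing import Optional, Sequence


def accuracy(predicted: Sequence[Optional[float]], actual: Sequence[float]) -> float:
    """Fraction of problems whose parsed answer matches the reference answer.

    None (the answer could not be parsed) counts as wrong; math.isclose
    tolerates float noise, so 12.0 and 12 are treated as equal.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    hits = sum(
        p is not None and math.isclose(p, a)
        for p, a in zip(predicted, actual)
    )
    return hits / len(actual)


# Illustrative values: four matches and one unparsable response
print(accuracy([12.0, 40.0, 160, 245, None], [12, 40, 160, 245, 7]))  # 0.8
```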

**Reasoning**

For all problems, the model gives a clear reasoning path: every intermediate step is logical and produces a correct intermediate result.
The model also follows the reasoning steps given in the examples, when they are provided.
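Here, the correctness of intermediate results is checked by eye. As a rough heuristic, steps written as `a op b = c` can be recomputed automatically. The responses above use this form, for example `144 cans ÷ 12 = 12 groups of 12 cans`. The sketch below assumes single-operator steps with optional unit words and `$` signs. `check_steps` is a hypothetical helper, and it will miss steps written in any other form:

```python
import math
import operator
import re

# One arithmetic step such as "144 cans ÷ 12 = 12" or "12 groups × $0.50 = $6.00":
# number, optional unit words, operator, optional "$", number, optional unit words,
# "=", optional "$", number. Matches never cross a line break.
STEP_RE = re.compile(
    r"(\d+(?:\.\d+)?)[^\d=\n]*?([×*÷/+-])\s*\$?(\d+(?:\.\d+)?)[^\d=\n]*=\s*\$?(\d+(?:\.\d+)?)"
)
OPS = {
    "×": operator.mul, "*": operator.mul,
    "÷": operator.truediv, "/": operator.truediv,
    "+": operator.add, "-": operator.sub,
}


def check_steps(text: str) -> list[tuple[str, bool]]:
    """Recompute every 'a op b = c' step found in text and report whether c is right."""
    results = []
    for a, op, b, c in STEP_RE.findall(text):
        step = f"{a} {op} {b} = {c}"
        try:
            ok = math.isclose(OPS[op](float(a), float(b)), float(c))
        except ZeroDivisionError:
            ok = False
        results.append((step, ok))
    return results


sample = """144 cans ÷ 12 = 12 groups of 12 cans
12 groups × $0.50 = $6.00
500 - 250 = 250
250 - 5 = 240"""
for step, ok in check_steps(sample):
    print(step, "OK" if ok else "WRONG")
```

The last sample line is deliberately wrong (250 - 5 is 245), so it is the only step reported as `WRONG`.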

**Consistency**

Including worked reasoning in the examples made the response format consistent across all problems: each response numbers its steps and ends with the `ANSWER:` line.

**Conclusion**

If you want a specific reasoning path in the response, giving it to the model directly is beneficial. However, this approach requires writing the reasoning by hand for every example problem, which may not be convenient.
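The manual setup mentioned above amounts to writing one worked solution per example. Here is a minimal sketch that passes the worked examples as alternating user/assistant chat turns. This is one common way to give few-shot examples to a chat model, and it is an assumption: the earlier cells may embed the examples differently. `build_few_shot_messages` and `WorkedExample` are hypothetical names, and the example problems are made up:

```python
from typing import TypedDict


class WorkedExample(TypedDict):
    problem: str
    reasoning: str  # hand-written step-by-step solution
    answer: str


def build_few_shot_messages(
    examples: list[WorkedExample], problem: str
) -> list[dict[str, str]]:
    """Turn worked examples into alternating user/assistant turns, then append
    the new problem as the final user turn."""
    messages: list[dict[str, str]] = []
    for ex in examples:
        messages.append({"role": "user", "content": f"Problem:\n{ex['problem']}"})
        messages.append(
            {"role": "assistant", "content": f"{ex['reasoning']}\nANSWER: {ex['answer']}"}
        )
    messages.append({"role": "user", "content": f"Problem:\n{problem}"})
    return messages


examples: list[WorkedExample] = [
    {
        "problem": "A pen costs $2. How much do 5 pens cost?",
        "reasoning": "Step 1: One pen costs $2.\nStep 2: 5 pens × $2 = $10.",
        "answer": "10",
    },
]
messages = build_few_shot_messages(
    examples, "A box holds 6 eggs. How many eggs are in 4 boxes?"
)
print([m["role"] for m in messages])  # ['user', 'assistant', 'user']
```

You can pass the resulting list to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, the same way as the single-turn `message` in the zero-shot cell.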

## Zero-Shot Chain of Thought

In [20]:
import re

generation_params = {
    "max_new_tokens": 512,
    # Llama 3 has no pad token (tokenizer.pad_token_id is None), so reuse the
    # end-of-turn token explicitly instead of letting generate() fall back to it
    "pad_token_id": eot_token_id,
    "eos_token_id": eot_token_id,  # stop generation at <|eot_id|>
    "num_return_sequences": 1,
}
correct = 0
for problem, answer in zip(problems, actual_answers):
    message = [
        {
            "role": "user",
            "content": f"""
            Please solve the following math problem step by step. Break down your reasoning into small, logical steps and clearly show each calculation.
            Problem:
            {problem}
            
            End your response with the final answer in the following form:
            ANSWER: VALUE
            Make sure that VALUE is the number without any measurement units
            """,
        },
    ]
    prompt = tokenizer.apply_chat_template(
        message, tokenize=False, add_generation_prompt=True
    )
    # Calling the tokenizer returns both input_ids and attention_mask
    inputs = tokenizer(
        prompt, add_special_tokens=False, return_tensors="pt"
    ).to(model.device)
    generated_ids = model.generate(**inputs, **generation_params)

    # Decode only the newly generated tokens instead of slicing the decoded string
    prompt_length = inputs["input_ids"].shape[-1]
    response = tokenizer.decode(generated_ids[0][prompt_length:])
    # Read the number after "ANSWER:"; None if the model did not follow the format
    match = re.search(r"ANSWER:\s*\$?(-?[\d,]*\.?\d+)", response)
    predicted_answer = float(match.group(1).replace(",", "")) if match else None
    correct += int(predicted_answer == answer)

    print(f"Problem: {problem}")
    print(f"Generated Text: {response}\n")
    print(f"True answer: {answer}")

print(f"Correct answers: {correct}/{len(problems)}")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128009 for open-end generation.


Problem: For every 12 cans you recycle, you receive $0.50, and for every 5 kilograms of newspapers, you receive $1.50. If your family collected 144 cans and 20 kilograms of newspapers, how much money would you receive?
Generated Text: Let's break down the problem step by step:

Step 1: Calculate the number of times 12 cans can be recycled from the total number of cans collected.

144 cans ÷ 12 cans = 12 times

Step 2: Calculate the total amount of money received for recycling cans.

12 times × $0.50 per 12 cans = 12 × $0.50 = $6.00

Step 3: Calculate the number of times 5 kilograms of newspapers can be collected from the total amount of newspapers collected.

20 kilograms ÷ 5 kilograms = 4 times

Step 4: Calculate the total amount of money received for collecting newspapers.

4 times × $1.50 per 5 kilograms = 4 × $1.50 = $6.00

Step 5: Add the total amount of money received for recycling cans and collecting newspapers.

$6.00 (cans) + $6.00 (newspapers) = $12.00

ANSWER: 12<|eot_id|>




Problem: Betty picked 16 strawberries. Matthew picked 20 more strawberries than Betty and twice as many as Natalie. They used their strawberries to make jam. One jar of jam used 7 strawberries and they sold each jar at $4. How much money were they able to make from the strawberries they picked?
Generated Text: Let's break down the problem step by step:

Step 1: Find the number of strawberries Matthew picked.

Matthew picked 20 more strawberries than Betty, who picked 16 strawberries. So, Matthew picked:

16 (Betty's strawberries) + 20 = 36

Step 2: Find the number of strawberries Natalie picked.

Matthew picked twice as many strawberries as Natalie. So, if Matthew picked 36 strawberries, Natalie picked:

36 (Matthew's strawberries) ÷ 2 = 18

Step 3: Find the total number of strawberries picked by Betty, Matthew, and Natalie.

Betty picked 16 strawberries, Matthew picked 36 strawberries, and Natalie picked 18 strawberries. The total number of strawberries is:

16 (Betty's strawberries) 


Problem: Jack has a stack of books that is 12 inches thick. He knows from experience that 80 pages is one inch thick. If he has 6 books, how many pages is each one on average?
Generated Text: Let's break down the problem step by step:

Step 1: Convert the thickness of the stack of books from inches to pages.

We know that 80 pages is equal to 1 inch, so to find the number of pages in the stack, we can multiply the thickness of the stack (12 inches) by 80 pages/inch:

12 inches × 80 pages/inch = 960 pages

Step 2: Divide the total number of pages by the number of books to find the average number of pages per book.

We have 6 books, so we can divide the total number of pages (960) by 6:

960 pages ÷ 6 books = 160 pages/book

Step 3: Simplify the result to get the average number of pages per book.

The result is 160 pages/book, which is the average number of pages per book.

ANSWER: 160<|eot_id|>

True answer: 160



Problem: James dumps his whole collection of 500 Legos on the floor and starts building a castle out of them.  He uses half the pieces before finishing and is told to put the rest away.  He puts all of the leftover pieces back in the box they came from, except for 5 missing pieces that he can't find.  How many Legos are in the box at the end?
Generated Text: Let's break down the problem step by step:

Step 1: James starts with 500 Legos.

Step 2: He uses half of the pieces to build the castle. To find half of 500, we can divide 500 by 2:

500 ÷ 2 = 250

So, James uses 250 pieces to build the castle.

Step 3: James puts the remaining pieces back in the box, except for the 5 missing pieces. To find the number of remaining pieces, we subtract the 250 pieces used from the original 500:

500 - 250 = 250

So, James puts 250 pieces back in the box.

Step 4: However, 5 pieces are missing, so we subtract these from the total number of remaining pieces:

250 - 5 = 245

Therefore, there are 245 L

### Analysis

**Accuracy**

The accuracy of the model is still 100%.

**Reasoning**

For all problems, the model gives a clear reasoning path: every intermediate step is logical and produces a correct intermediate result.

**Consistency**

Even without examples, the instruction to break the reasoning into small steps made the response format consistent: every response uses the same `Step N:` structure.

**Conclusion**

If you want a consistent response structure that is easy to parse, this technique may help.
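Every zero-shot response follows the same `Step N:` layout and is asked to end with `ANSWER: VALUE`. As a result, a couple of regular expressions are enough to extract the step titles and the final number. Here is a minimal sketch that assumes responses keep exactly that layout. The few-shot responses earlier use `1.`-style numbering and would need a different pattern. `parse_response` is a hypothetical helper, and the sample is a shortened copy of the Jack response above:

```python
import re
from typing import Optional

# "Step N: title" at the start of a line (the layout the zero-shot responses use)
STEP_HEADER = re.compile(r"^Step (\d+):[ \t]*(.*)$", re.MULTILINE)
# Number after "ANSWER:", allowing an optional "$", thousands separators and decimals
ANSWER_VALUE = re.compile(r"ANSWER:\s*\$?(-?[\d,]*\.?\d+)")


def parse_response(response: str) -> tuple[list[tuple[int, str]], Optional[float]]:
    """Return (step number, step title) pairs and the final answer (None if absent)."""
    steps = [(int(n), title.strip()) for n, title in STEP_HEADER.findall(response)]
    match = ANSWER_VALUE.search(response)
    answer = float(match.group(1).replace(",", "")) if match else None
    return steps, answer


response = """Let's break down the problem step by step:

Step 1: Convert the thickness of the stack of books from inches to pages.

12 inches × 80 pages/inch = 960 pages

Step 2: Divide the total number of pages by the number of books to find the average number of pages per book.

960 pages ÷ 6 books = 160 pages/book

ANSWER: 160<|eot_id|>"""

steps, answer = parse_response(response)
print([n for n, _ in steps], answer)  # [1, 2] 160.0
```

Comparing the list of step numbers against `1, 2, …, n` also gives a cheap check that a response kept the expected structure.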