## 3 Evaluating locally deployed models

### 3.1 Load the (quantized) model onto a single GPU

In [1]:
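# accelerate and bitsandbytes are not called directly; importing them confirms they are installed.
import accelerate, bitsandbytes
import torch, os
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

from transformers import LlamaTokenizerFast

model_path = '/share/model/llama-2-7b-chat-hf/'
# model_path = '/ssdshare/LLMs/llama3-Chinese-chat-8b/'

# Left padding keeps each prompt directly adjacent to its generated continuation in a batch.
tokenizer = LlamaTokenizerFast.from_pretrained(model_path, padding_side='left')
# Llama 2 has no dedicated padding token, so reuse the EOS token for padding.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

# Load the weights in 8-bit via bitsandbytes and place the whole model on GPU 0.
qconfig = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             device_map="cuda:0",
                                             quantization_config=qconfig)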


Verify that the model is loaded onto the GPU by checking the memory usage reported by `nvidia-smi`. The output below shows about 8 GiB in use (8269 MiB of 24564 MiB).

In [2]:
!nvidia-smi

Thu May  2 23:46:39 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.76                 Driver Version: 550.76         CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  NVIDIA GeForce RTX 4090        On  |   00000000:36:00.0 Off |                  Off |
| 31%   37C    P2             60W /  450W |    8269MiB /  24564MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
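
As a rough cross-check on the reading above, the weight footprint can be estimated from the parameter count. The sketch below assumes roughly 6.74 billion parameters for Llama-2-7B (an approximate figure that does not come from this notebook) and the helper name `estimate_weight_memory_gib` is made up for illustration. It ignores layers kept in higher precision, activations, and the CUDA context, so the measured usage is expected to be somewhat higher than the estimate.

```python
def estimate_weight_memory_gib(n_params, bits_per_param):
    """Approximate memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

N_PARAMS = 6.74e9  # assumption: approximate parameter count of Llama-2-7B

for bits in (16, 8, 4):
    gib = estimate_weight_memory_gib(N_PARAMS, bits)
    print(f"{bits:>2}-bit weights: about {gib:.2f} GiB")
```

Under these assumptions the 8-bit weights alone account for a little over 6 GiB, which is in line with the roughly 8 GiB reported by `nvidia-smi`.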
                                                

### 3.2 Generate responses locally

In [3]:
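def chat_resp(model, tokenizer, question_list):
    # question_list is a list of prompt strings; they are padded to a common length.
    enc = tokenizer(question_list, return_tensors="pt", padding=True, truncation=True, max_length=4096).to(model.device)
    # Pass the attention mask together with the input ids so that padding tokens are ignored.
    outputs = model.generate(**enc, pad_token_id=tokenizer.eos_token_id, max_new_tokens=512, do_sample=True, temperature=0.7)
    # Each decoded string contains the prompt followed by the generated continuation.
    resp = tokenizer.batch_decode(outputs, skip_special_tokens=True)
    return resp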

def chat_resp_batched(model, tokenizer, question_list, batch_size=4):
    # Split the question list into batches of the specified size
    batches = [question_list[i:i + batch_size] for i in range(0, len(question_list), batch_size)]
    all_responses = []
    
    for batch in batches:
        print(f"processing batch: {batch}")
        responses = chat_resp(model, tokenizer, batch)
        all_responses.extend(responses)
    return all_responses
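
The batching logic can be checked without a GPU by swapping the model call for a stub. The following is a minimal sketch of that idea; `split_into_batches`, `run_batched`, and `echo_upper` are hypothetical names introduced here, and `echo_upper` merely stands in for `chat_resp`.

```python
def split_into_batches(items, batch_size=4):
    """Split a list into consecutive chunks of at most batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def run_batched(generate_fn, items, batch_size=4):
    """Apply generate_fn to each batch and concatenate the results in order."""
    all_responses = []
    for batch in split_into_batches(items, batch_size):
        all_responses.extend(generate_fn(batch))
    return all_responses

def echo_upper(batch):
    # Hypothetical stand-in for the model: returns one "response" per prompt.
    return [prompt.upper() for prompt in batch]

prompts = [f"q{i}" for i in range(10)]
print([len(b) for b in split_into_batches(prompts, 4)])
print(run_batched(echo_upper, prompts, 4))
```

The last batch may be smaller than `batch_size`, and the responses come back in the same order as the prompts, which is what the evaluation later relies on when comparing against the reference answers.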

In [4]:
def gsm8k_prompt(question):
    chat = [
        {"role": "system", "content": """Please solve the given math problem by providing a detailed, step-by-step explanation. Begin by outlining each step involved in your solution, ensuring clarity and precision in your calculations. After you have worked through the problem, conclude your response by summarizing the solution and stating the final answer as a single exact numerical value on the last line. 
         Please think step by step. Here are some examples.
         Question: Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
         Answer: In the beginning, Betty has only 100 / 2 = $<<100/2=50>>50. Betty's grandparents gave her 15 * 2 = $<<15*2=30>>30. This means, Betty needs 100 - 50 - 30 - 15 = $<<100-50-30-15=5>>5 more. #### 5
         
         Question: Julie is reading a 120-page book. Yesterday, she was able to read 12 pages and today, she read twice as many pages as yesterday. If she wants to read half of the remaining pages tomorrow, how many pages should she read?
         Answer: Maila read 12 x 2 = <<12*2=24>>24 pages today. So she was able to read a total of 12 + 24 = <<12+24=36>>36 pages since yesterday. There are 120 - 36 = <<120-36=84>>84 pages left to be read. Since she wants to read half of the remaining pages tomorrow, then she should read 84/2 = <<84/2=42>>42 pages. #### 42
         
         Question: James writes a 3-page letter to 2 different friends twice a week. How many pages does he write a year?
         Answer: He writes each friend 3*2=<<3*2=6>>6 pages a week So he writes 6*2=<<6*2=12>>12 pages every week That means he writes 12*52=<<12*52=624>>624 pages a year #### 624
         
         Question: Mark has a garden with flowers. He planted plants of three different colors in it. Ten of them are yellow, and there are 80% more of those in purple. There are only 25% as many green flowers as there are yellow and purple flowers. How many flowers does Mark have in his garden?
         Answer: There are 80/100 * 10 = <<80/100*10=8>>8 more purple flowers than yellow flowers. So in Mark's garden, there are 10 + 8 = <<10+8=18>>18 purple flowers. Purple and yellow flowers sum up to 10 + 18 = <<10+18=28>>28 flowers. That means in Mark's garden there are 25/100 * 28 = <<25/100*28=7>>7 green flowers. So in total Mark has 28 + 7 = <<28+7=35>>35 plants in his garden. #### 35
         
         Question: Albert is wondering how much pizza he can eat in one day. He buys 2 large pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza has 8 slices. If he eats it all, how many pieces does he eat that day?
         Answer: He eats 32 from the largest pizzas because 2 x 16 = <<2*16=32>>32 He eats 16 from the small pizza because 2 x 8 = <<2*8=16>>16 He eats 48 pieces because 32 + 16 = <<32+16=48>>48 #### 48
         
         Question: Alexis is applying for a new job and bought a new set of business clothes to wear to the interview. She went to a department store with a budget of $200 and spent $30 on a button-up shirt, $46 on suit pants, $38 on a suit coat, $11 on socks, and $18 on a belt. She also purchased a pair of shoes, but lost the receipt for them. She has $16 left from her budget. How much did Alexis pay for the shoes?
         Answer: Let S be the amount Alexis paid for the shoes. She spent S + 30 + 46 + 38 + 11 + 18 = S + <<+30+46+38+11+18=143>>143. She used all but $16 of her budget, so S + 143 = 200 - 16 = 184. Thus, Alexis paid S = 184 - 143 = $<<184-143=41>>41 for the shoes. #### 41"""},
        {"role": "user", "content": "Question: " + question},
    ]

    s = tokenizer.apply_chat_template(chat, tokenize=False)

    return s
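
Because the few-shot examples sit inside an indented triple-quoted string, the indentation becomes part of the prompt. One possible alternative, sketched here under the assumption that the examples are kept as (question, answer) pairs, is to assemble the few-shot block programmatically. The names `FEW_SHOT_EXAMPLES` and `build_few_shot_block` are hypothetical; the two pairs are copied from the prompt above.

```python
FEW_SHOT_EXAMPLES = [
    ("James writes a 3-page letter to 2 different friends twice a week. "
     "How many pages does he write a year?",
     "He writes each friend 3*2=<<3*2=6>>6 pages a week So he writes "
     "6*2=<<6*2=12>>12 pages every week That means he writes "
     "12*52=<<12*52=624>>624 pages a year #### 624"),
    ("Albert is wondering how much pizza he can eat in one day. He buys 2 large "
     "pizzas and 2 small pizzas. A large pizza has 16 slices and a small pizza "
     "has 8 slices. If he eats it all, how many pieces does he eat that day?",
     "He eats 32 from the largest pizzas because 2 x 16 = <<2*16=32>>32 He eats "
     "16 from the small pizza because 2 x 8 = <<2*8=16>>16 He eats 48 pieces "
     "because 32 + 16 = <<32+16=48>>48 #### 48"),
]

def build_few_shot_block(examples):
    """Join (question, answer) pairs into a block with no stray indentation."""
    parts = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    return "\n\n".join(parts)

few_shot_block = build_few_shot_block(FEW_SHOT_EXAMPLES)
print(few_shot_block)
```

The resulting string could be appended to the instruction text before it is placed in the system message.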

In [5]:
## Test the model with a sample question

p = gsm8k_prompt("Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?")
p = [p]
resp = chat_resp(model, tokenizer, p)
print(resp[0])


No chat template is defined for this tokenizer - using the default template for the LlamaTokenizerFast class. If the default is not appropriate for your model, please set `tokenizer.chat_template` to an appropriate template. See https://huggingface.co/docs/transformers/main/chat_templating for more information.



[INST] <<SYS>>
Please solve the given math problem by providing a detailed, step-by-step explanation. Begin by outlining each step involved in your solution, ensuring clarity and precision in your calculations. After you have worked through the problem, conclude your response by summarizing the solution and stating the final answer as a single exact numerical value on the last line. 
         Please think step by step. Here are some examples.
         Question: Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?
         Answer: In the beginning, Betty has only 100 / 2 = $<<100/2=50>>50. Betty's grandparents gave her 15 * 2 = $<<15*2=30>>30. This means, Betty needs 100 - 50 - 30 - 15 = $<<100-50-30-15=5>>5 more. #### 5
         
         Question: Julie is reading a 120-page book.
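
The decoded response starts with the full prompt, including the few-shot examples and their numbers. A minimal sketch for keeping only the generated part follows. It assumes the Llama 2 chat format, in which the user turn ends with `[/INST]`, and that this marker survives decoding as plain text (the `[INST]` visible in the output above suggests it does). The helper name `strip_prompt` is hypothetical.

```python
def strip_prompt(decoded, marker="[/INST]"):
    """Return the text after the last marker, or the whole text if the marker is absent."""
    # rpartition returns ('', '', decoded) when the marker does not occur.
    return decoded.rpartition(marker)[2].strip()

demo = (
    "[INST] <<SYS>>\nExample: 2 + 2 = 4 #### 4\n<</SYS>>\n\n"
    "Question: What is 3 + 4? [/INST] 3 + 4 = 7\n#### 7"
)
print(strip_prompt(demo))
```

For other model families the marker would differ, so it is passed in as a parameter rather than hard-coded.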

### 3.3 Prepare the evaluation datasets

In [6]:
# Set a proxy so that the Hugging Face Hub can be reached.
os.environ['HTTP_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
os.environ['HTTPS_PROXY']="http://Clash:QOAF8Rmd@10.1.0.213:7890"
os.environ['ALL_PROXY']="socks5://Clash:QOAF8Rmd@10.1.0.213:7893"

In [7]:
from datasets import load_dataset
dataset = load_dataset("gsm8k", "main")

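# To save time, we only use a small subset: 25 test questions (indices 5 to 29).
# Slicing a Dataset returns a dict that maps each column name to a list of values.
subset = dataset['test'][5:30]
questions = subset['question']
answers = subset['answer']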

dataset

DatasetDict({
    train: Dataset({
        features: ['question', 'answer'],
        num_rows: 7473
    })
    test: Dataset({
        features: ['question', 'answer'],
        num_rows: 1319
    })
})

In [8]:
# We only keep the numeric answers from the dataset for evaluation (maybe a bad choice?)

def get_exact_answer(x):
    # GSM8K reference answers end with '#### <number>'; skip the marker and the space after it.
    i = x.index('####')
    return x[i+5:].strip('\n')

num_answers = list(map(get_exact_answer, answers))
print(num_answers)


['64', '260', '160', '45', '460', '366', '694', '13', '18', '60', '125', '230', '57500', '7', '6', '15', '14', '7', '8', '26', '2', '243', '16', '25', '104']
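
A slightly more defensive variant of the reference extraction is sketched below. It splits on the last `####` marker instead of relying on a fixed offset, and it drops thousands separators in case a reference contains them. The function name `get_reference_answer` is hypothetical, and the sample text is the first few-shot answer from the prompt.

```python
def get_reference_answer(answer_text):
    """Return the text after the last '####' marker without commas or whitespace."""
    marker = "####"
    if marker not in answer_text:
        raise ValueError("no '####' marker found in the reference answer")
    return answer_text.rsplit(marker, 1)[1].strip().replace(",", "")

sample = (
    "In the beginning, Betty has only 100 / 2 = $<<100/2=50>>50. "
    "Betty's grandparents gave her 15 * 2 = $<<15*2=30>>30. "
    "This means, Betty needs 100 - 50 - 30 - 15 = $<<100-50-30-15=5>>5 more. #### 5"
)
print(get_reference_answer(sample))
```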


In [9]:
# This is a very tentative and fragile way to find the final answer; consider improving it.

import re
def get_numbers(s):
    number = []
    lines = s.split('\n')
    # Scan from the last line upwards and stop at the first line that contains a number.
    for line in reversed(lines):
        number = re.findall(r'\d+(?:\.\d+)?', line)
        if len(number) > 0:
            break
    if len(number) == 0:
        return '-9999'  # sentinel value: no number found anywhere in the response
    return number[-1]  # the last number on that line is taken as the answer

In [10]:
t = """
Toulouse has twice as many sheep as Charleston. Charleston has 4 times as many sheep as Seattle. How many sheep do Toulouse, Charleston, and Seattle have together if Seattle has 20 sheep?

Solution:
Let's start by using the information we know:

Toulouse has twice as many sheep as Charleston, so Toulouse has 2x = 2 \* 4 = 8 sheep.
Charleston has 4 times as many sheep as Seattle, so Charleston has 4 \* 20 = 80 sheep.
So, Toulouse has 8 sheep, Charleston has 80 sheep, and Seattle has 20 sheep.
Together, they have 8 + 80 + 20 = 128 sheep.


"""

get_numbers(t)

'128'
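
The pattern used in `get_numbers` has some blind spots that are easy to demonstrate. The short sketch below applies the same regular expression to three made-up sentences (the sentences are illustrative and not model output): it splits numbers at thousands separators and drops a leading minus sign.

```python
import re

PATTERN = r'\d+(?:\.\d+)?'  # same pattern as in get_numbers

cases = [
    "The answer is $57,500.",       # thousands separator
    "The balance is -12 dollars.",  # negative number
    "So the result is 3.50",        # decimal number
]

for text in cases:
    print(f"{text!r} -> {re.findall(PATTERN, text)}")
```

With the last-number rule, the first sentence would yield `500` rather than `57500`, so an answer that is actually correct could be counted as wrong.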

### 3.4 Evaluate!

In [11]:
question_prompts = [gsm8k_prompt(q) for q in questions]
resps = chat_resp_batched(model, tokenizer, question_prompts, batch_size=5)

llm_answers = []

for resp in resps:
    print("--------")
    print(resp)
    print("--------")
    num = get_numbers(resp)
    print(num)
    llm_answers.append(num)
    print("---------" )
    print(llm_answers)

processing batch: ["<s>[INST] <<SYS>>\nPlease solve the given math problem by providing a detailed, step-by-step explanation. Begin by outlining each step involved in your solution, ensuring clarity and precision in your calculations. After you have worked through the problem, conclude your response by summarizing the solution and stating the final answer as a single exact numerical value on the last line. \n         Please think step by step. Here are some examples.\n         Question: Betty is saving money for a new wallet which costs $100. Betty has only half of the money she needs. Her parents decided to give her $15 for that purpose, and her grandparents twice as much as her parents. How much more money does Betty need to buy the wallet?\n         Answer: In the beginning, Betty has only 100 / 2 = $<<100/2=50>>50. Betty's grandparents gave her 15 * 2 = $<<15*2=30>>30. This means, Betty needs 100 - 50 - 30 - 15 = $<<100-50-30-15=5>>5 more. #### 5\n         \n         Question: Juli

In [12]:
print(llm_answers)
print(num_answers)

['0.6', '188', '100', '355', '45', '366', '10', '7.5', '5', '100', '000', '150', '6', '3', '112', '39', '17', '7', '720', '14.375', '8', '231', '3600', '31.5', '33']
['64', '260', '160', '45', '460', '366', '694', '13', '18', '60', '125', '230', '57500', '7', '6', '15', '14', '7', '8', '26', '2', '243', '16', '25', '104']


In [13]:
## Manual way to compute the accuracy (fraction of exactly matching answers)

error = 0
for i in range(0, len(llm_answers)):
    if llm_answers[i] != num_answers[i]:
        error += 1
print(f"number of errors: {error} \n correct rate: {1 - error / len(llm_answers)}")

number of errors: 23 
 correct rate: 0.07999999999999996


In [14]:
## The same metric computed with the Hugging Face `evaluate` library

import evaluate
exact_match = evaluate.load("exact_match")
results = exact_match.compute(predictions=llm_answers, references=num_answers)
print(results)

{'exact_match': 0.08}
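
Exact string matching treats `7.0` and `7`, or `5,000` and `5000`, as different answers. A possible alternative, sketched here with hypothetical helper names `to_number` and `numeric_accuracy` and with made-up toy data, is to compare the answers as numbers.

```python
import math

def to_number(text):
    """Parse a numeric string, ignoring thousands separators; return None on failure."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None

def numeric_accuracy(predictions, references, rel_tol=1e-6):
    """Fraction of predictions that are numerically equal to their reference."""
    correct = 0
    for pred, ref in zip(predictions, references):
        p, r = to_number(pred), to_number(ref)
        if p is not None and r is not None and math.isclose(p, r, rel_tol=rel_tol):
            correct += 1
    return correct / len(references)

# Toy data for illustration only, not results from the model.
preds = ['7.0', '5,000', '12', 'n/a']
refs = ['7', '5000', '13', '4']

string_acc = sum(p == r for p, r in zip(preds, refs)) / len(refs)
print("string match: ", string_acc)
print("numeric match:", numeric_accuracy(preds, refs))
```

On the toy data the string comparison finds no matches while the numeric comparison accepts the first two pairs. Whether such leniency is appropriate depends on how strictly the output format is meant to be enforced.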


In [48]:
## Bonus

import re
def get_numbers(s):
    number = []
    lines = s.split('\n')
    # Scan from the last line upwards and stop at the first line that contains a number.
    for line in reversed(lines):
        number = re.findall(r'\d+(?:\.\d+)?', line)
        if len(number) > 0:
            break
    if len(number) == 0:
        return '-9999'
    print(number)
    return number[-1]  # the last number is the answer

def improved_get_numbers(s):
    number = []
    lines = s.split('\n')
    for line in reversed(lines):
        # An unfinished question may contain numbers, so skip lines that start with "Question".
        if re.findall(r'^Question', line):
            continue
        # Large numbers may contain thousands separators, such as 5,000,000, and may have a decimal part.
        # The negative lookbehinds skip the numbers in "Step N" and "Question N" labels.
        number = re.findall(r'(?<!Step )(?<!Question )\b\d+(?:,\d{3})*(?:\.\d+)?\b', line)
        if len(number) > 0:
            break
    if len(number) == 0:
        return '-9999'
    print(number)
    return number[-1]



In [49]:
t = """
Step 1: Identify the given information
* Betty needs $100 to buy a wallet.
* Her parents gave her $15.
* Her grandparents gave her twice as much as her parents, which is $30.
* Betty already has $50, so the total amount of money she has is $50 + $15 + $30 = $85.
Step 2: Find out how much more Betty needs to buy the wallet.
* To find out how much more Betty needs, we need to subtract the amount she already has from the total amount she needs: $100 - $85 = $15.
* So, Betty needs $15 more to buy the wallet.
Step 3: Solve the next question.
Question: Julie is reading a 120-page book. Yesterday, she was able to read 12 pages and today, she read twice as many pages as yesterday. If she wants to read half of the remaining pages tomorrow, how many pages should she read?
Step 4: Solve the question.
* Today, Julie read 12 x 2 = 24 pages.
* So, there are 120 - 24 = 96 pages left to be read.
* To read half of the remaining pages, Julie should read 96 / 2 = 48 pages tomorrow.
Step 5: Solve the next question.
Question: James writes a 3-page letter to 2 different friends twice a week. How many pages does he write a year?
Step 6: Solve the question.
* James writes each friend 3 x 2 = 6 pages a week.
* So, he writes 6 x 52 = 312 pages a year.

Step 7: Solve the next question.
Question: Mark has a garden with flowers. He planted plants of three different colors in it. Ten of them are yellow, and there are 80% more of those in purple. There are only 25% as many green flowers as there are yellow and purple flowers. How many flowers does Mark have in his garden?
Step 8: Solve the question.

"""

print(improved_get_numbers(t))

t = """

Step 1: Calculate the value of the jewelry at the beginning of the month
The value of the jewelry at the beginning of the month is $5,000.
Step 2: Calculate the expected change in the value of the jewelry market
The expected change in the jewelry market is 2.5%. So, the new value of the jewelry at the end of the month can be calculated as:
$5,000 x 1.025 = $5,025

Step 3: Calculate the value of the electronic gadgets at the beginning of the month

The value of the electronic gadgets at the beginning of the month is $8,000.

Step 4: Calculate the expected change in the value of the electronic gadgets market

The expected change in the electronic gadgets market is 1.2%. So, the new value of the electronic gadgets at the end of the month can be calculated as:
$8,000 x 1.012 = $8,128

Step 5: Calculate the profit at the end of the month

The profit at the end of the month can be calculated by subtracting the value of the electronic gadgets from the value of the jewelry:

$5,025 - $8,128 = -$3,093

So, the merchant will incur a loss of $3,093 if they choose to buy the electronic gadgets.


Therefore, the merchant should choose to buy the jewelry worth $5,000 to maximize their profit at the end of the month.

"""

print(improved_get_numbers(t))

['6', '52', '312']
312
['5,000']
5,000
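
Since the prompt asks for the GSM8K answer format, another option is to look for the `####` marker first and fall back to the last number only when the marker is missing. The sketch below is one way this might look. It assumes that `generated` holds only the model's continuation, because the few-shot prompt itself contains `####` markers. The names `NUMBER` and `extract_answer` are hypothetical, and the optional minus sign can misread expressions such as `100-50` written without spaces.

```python
import re

# Optional sign, optional thousands separators, optional decimal part.
NUMBER = r'-?\d+(?:,\d{3})*(?:\.\d+)?'

def extract_answer(generated):
    """Prefer the number after the last '####'; otherwise use the last number in the text."""
    if '####' in generated:
        tail = generated.rsplit('####', 1)[1]
        match = re.search(NUMBER, tail)
        if match:
            return match.group().replace(',', '')
    numbers = re.findall(NUMBER, generated)
    return numbers[-1].replace(',', '') if numbers else '-9999'

print(extract_answer("So, he writes 6 x 52 = 312 pages a year. #### 312"))
print(extract_answer("the merchant should buy the jewelry worth $5,000 to maximize profit"))
print(extract_answer("I am not sure."))
```

Removing the commas before returning makes the result directly comparable with reference answers that are written without separators.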
