## Llama13B Small on GSMK8 

The purpose of this is to measure the coherence of an LLM on the Cohence of solving math problems within GSMK8

# Model set up

In [1]:
# import the hugging face transformers library
import wandb
import os
from transformers import  Trainer, TrainingArguments, AutoModelForCausalLM, AutoTokenizer
from datasets import load_dataset

In [3]:
#Including code to get from token from environment
token= os.getenv('HF_TOKEN')

In [4]:


tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-chat-hf",token=token)
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-13b-chat-hf",token=token)



tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/587 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/33.4k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/3 [00:00<?, ?it/s]

model-00001-of-00003.safetensors:   0%|          | 0.00/9.95G [00:00<?, ?B/s]

KeyboardInterrupt: 

In [None]:
# Ensure the tokenizer has a padding token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

## Gather dataset

In [None]:

def preprocess_data(examples):
    # Combine the question and answer for training context
    inputs = [q + " Answer: " for q in examples['question']]
    targets = [a for a in examples['answer']]
    model_inputs = tokenizer(inputs, max_length=512, truncation=True, padding="max_length")
    # Setup the tokenizer outputs as labels for training
    with tokenizer.as_target_tokenizer():
        labels = tokenizer(targets, max_length=512, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

# Load the datasets, here I'm using the main version and ignoring the socratic dataset for now.
train_dataset = load_dataset("GSM8K", 'main', split='train')
test_dataset = load_dataset("GSM8K", 'main', split='test')

# Apply preprocessing
encoded_train_dataset = train_dataset.map(preprocess_data, batched=True)
encoded_test_dataset = test_dataset.map(preprocess_data, batched=True)

Map:   0%|          | 0/7473 [00:00<?, ? examples/s]



Map:   0%|          | 0/1319 [00:00<?, ? examples/s]

In [None]:
encoded_train_dataset

Dataset({
    features: ['question', 'answer', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 7473
})

## Model Fine-tuning

In [None]:
# Initialize wandb
wandb.init(project="Coherence", config={
    "num_train_epochs": 3,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 16,
    "warmup_steps": 500,
    "weight_decay": 0.01
})

In [None]:
# Set up training arguments, and now also logging to wandb
training_args = TrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # number of training epochs
    per_device_train_batch_size=8,   # batch size for training
    per_device_eval_batch_size=16,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
    evaluation_strategy="steps",     # evaluate during training to observe progress
    report_to="wandb",
    run_name="llama-13b-hf-chat-gsm8k-finetuning",
    save_strategy="epoch"            # save the model at the end of each epoch
)

# Initialize the Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=encoded_train_dataset,
    eval_dataset = encoded_test_dataset
)

Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.


In [None]:
trainer.train()
wandb.finish()

In [None]:
#trainer.train()

In [None]:
for index in range(50):
    i = encoded_test_dataset['question'][index]
    print(str(i) + "\n")
    # print(tokenizer.decode(i) + "\n")

Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?

A robe takes 2 bolts of blue fiber and half that much white fiber.  How many bolts in total does it take?

Josh decides to try flipping a house.  He buys a house for $80,000 and then puts in $50,000 in repairs.  This increased the value of the house by 150%.  How much profit did he make?

James decides to run 3 sprints 3 times a week.  He runs 60 meters each sprint.  How many total meters does he run a week?

Every day, Wendi feeds each of her chickens three cups of mixed chicken feed, containing seeds, mealworms and vegetables to help keep them healthy.  She gives the chickens their feed in three separate meals. In the morning, she gives her flock of chickens 15 cups of feed.  In the afternoon, she gives her

In [None]:
# Load your fine-tuned model and tokenizer from the specified output directory
model = AutoModelForCausalLM.from_pretrained('results/checkpoint-2805')  # replace 'xxxx' with the appropriate checkpoint
#tokenizer = AutoTokenizer.from_pretrained('results/checkpoint-2805')

# Example math question from your dataset
question = "Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?"  
# replace with a real question from the eval_dataset

# Encode the question using the tokenizer
inputs = tokenizer(question, return_tensors='pt', padding=True, truncation=True, max_length=512)

# Generate the answer using the model
outputs = model.generate(inputs['input_ids'], max_length=100, num_return_sequences=1, pad_token_id=tokenizer.eos_token_id)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Question: {question}")
print(f"Generated Answer: {answer}")

Question: Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
Generated Answer: Janet’s ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
