## Training a language model

This notebook contains the code I used to fine-tune an LLM (the Llama 3.1 8B base model) for recipe generation. The code is adapted from the Google Colab notebook used in [this](https://www.youtube.com/watch?v=Us5ZFp16PaU) tutorial.

In [None]:
# Mount Google Drive

from google.colab import drive
drive.mount('/content/gdrive')
nbdir = "/content/gdrive/My Drive/"

Mounted at /content/gdrive


In [None]:
# Log in to Hugging Face

from huggingface_hub import notebook_login

notebook_login()


In [None]:
# Install modules

!pip install -q bitsandbytes datasets accelerate loralib
!pip install -q git+https://github.com/huggingface/transformers.git@main git+https://github.com/huggingface/peft.git

In [None]:
# Import modules

import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM

In [None]:
# Set up the model and load in 8 bit

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    load_in_8bit=True,
    device_map='auto',
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


The `load_in_4bit` and `load_in_8bit` arguments are deprecated and will be removed in the future versions. Please, pass a `BitsAndBytesConfig` object in `quantization_config` argument instead.
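
The deprecation warning above recommends passing a `BitsAndBytesConfig` instead of `load_in_8bit=True`. A minimal sketch of the equivalent setup, assuming a recent `transformers` release with `bitsandbytes` installed. The helper name `load_8bit_model` is hypothetical and is not part of the original notebook:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Same 8-bit quantization as load_in_8bit=True, expressed as a config object
quantization_config = BitsAndBytesConfig(load_in_8bit=True)


def load_8bit_model(model_id="meta-llama/Meta-Llama-3.1-8B"):
    """Hypothetical helper: load the base model in 8-bit on the available device(s)."""
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quantization_config,
        device_map="auto",
    )
```

Calling `load_8bit_model()` would download the gated checkpoint, so it needs the Hugging Face login from the earlier cell.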



## Set up the model parameters and LoRA

In [None]:
# Freeze the original model weights

for param in model.parameters():
  param.requires_grad = False  # freeze the model - train adapters later
  if param.ndim == 1:
    # cast the small parameters (e.g. layernorm) to fp32 for stability
    param.data = param.data.to(torch.float32)

model.gradient_checkpointing_enable()  # reduce number of stored activations
model.enable_input_require_grads()

class CastOutputToFloat(nn.Sequential):
  def forward(self, x): return super().forward(x).to(torch.float32)
model.lm_head = CastOutputToFloat(model.lm_head)

In [None]:
# Helper to report how many parameters are trainable

def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
from peft import LoraConfig, get_peft_model

config = LoraConfig(
    r=16,  # rank of the low-rank update matrices
    lora_alpha=32,  # alpha scaling (updates are scaled by lora_alpha / r)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, config)
print_trainable_parameters(model)

trainable params: 6815744 || all params: 8037076992 || trainable%: 0.08480376642881861
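
The trainable parameter count above can be reproduced by hand. The sketch below assumes that PEFT's default LoRA targets for Llama are the `q_proj` and `v_proj` attention projections, and that Llama 3.1 8B has 32 layers, a hidden size of 4096, and a 1024-wide value projection (8 key/value heads of 128 dimensions). None of these dimensions are stated in this notebook, so treat them as assumptions:

```python
# Assumed Llama 3.1 8B dimensions (not stated in this notebook)
hidden_size = 4096   # model width; q_proj maps 4096 -> 4096
kv_size = 1024       # v_proj maps 4096 -> 1024 (grouped-query attention)
num_layers = 32
r = 16               # LoRA rank from the LoraConfig above


def lora_params(in_features, out_features, rank):
    # LoRA adds two matrices: A (rank x in_features) and B (out_features x rank)
    return rank * (in_features + out_features)


per_layer = lora_params(hidden_size, hidden_size, r) + lora_params(hidden_size, kv_size, r)
trainable = per_layer * num_layers
all_params = 8037076992  # total reported by print_trainable_parameters

print(trainable)                     # 6815744
print(100 * trainable / all_params)  # percentage of parameters that are trained
```

Under these assumptions the count matches the 6,815,744 trainable parameters reported above, which is under 0.1% of the model.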


## Preprocess the data for training

I am using the `rk404/recipe_short` dataset from Hugging Face, which contains 350,000 recipes. The first preprocessing step is to select a subset of the dataset for training (the first 1/20, or 17,500 rows). The next step is to discard unnecessary columns from the dataset.

In [None]:
# Data preprocessing

import transformers
from datasets import load_dataset
data = load_dataset("rk404/recipe_short")

Downloading data:   0%|          | 0.00/93.3M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/350000 [00:00<?, ? examples/s]

In [None]:
# Use a portion of the data for training

data = data['train'].select(range(len(data['train']) // 20))

data


Dataset({
    features: ['Unnamed: 0', 'title', 'ingredients', 'directions', 'link', 'source', 'NER'],
    num_rows: 17500
})

In [None]:
# Discard unnecessary columns from the dataset

data = data.select_columns(["title", "ingredients", "directions", "NER"])

data

Dataset({
    features: ['title', 'ingredients', 'directions', 'NER'],
    num_rows: 17500
})

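I then merge the columns into a single `prediction` string per recipe. The `NER` ingredient list acts as the prompt, followed by the ` ->: ` separator and then the title, ingredients, and directions that the model should learn to generate.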

In [None]:
def merge_columns(example):
    example["prediction"] = example["NER"] + " ->: " + "title: " + str(example["title"]) + "\n" + "ingredients: " + str(example["ingredients"]) + "\n" + "directions: " + str(example["directions"])
    return example

data = data.map(merge_columns)
data["prediction"][0]

Map:   0%|          | 0/17500 [00:00<?, ? examples/s]

'["brown sugar", "milk", "vanilla", "nuts", "butter", "bite size shredded rice biscuits"] ->: title: No-Bake Nut Cookies\ningredients: ["1 c. firmly packed brown sugar", "1/2 c. evaporated milk", "1/2 tsp. vanilla", "1/2 c. broken nuts (pecans)", "2 Tbsp. butter or margarine", "3 1/2 c. bite size shredded rice biscuits"]\ndirections: ["In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.", "Stir over medium heat until mixture bubbles all over top.", "Boil and stir 5 minutes more. Take off heat.", "Stir in vanilla and cereal; mix well.", "Using 2 teaspoons, drop and shape into 30 clusters on wax paper.", "Let stand until firm, about 30 minutes."]'
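
Because inference later has to reuse exactly the same ` ->: ` separator, it can help to build the training text and the inference prompt from one shared constant. A minimal sketch, where `SEPARATOR`, `build_prompt`, and `build_training_text` are hypothetical names that are not in the original notebook:

```python
SEPARATOR = " ->: "  # same separator string used in merge_columns


def build_prompt(ner):
    """Text given to the model at inference time."""
    return ner + SEPARATOR


def build_training_text(example):
    """Prompt followed by the target text the model should learn to produce."""
    target = (
        "title: " + str(example["title"]) + "\n"
        + "ingredients: " + str(example["ingredients"]) + "\n"
        + "directions: " + str(example["directions"])
    )
    return build_prompt(example["NER"]) + target


# Shortened, made-up record with the same column layout as the dataset
example = {
    "NER": '["milk", "vanilla"]',
    "title": "Vanilla Milk",
    "ingredients": '["1 c. milk", "1/2 tsp. vanilla"]',
    "directions": '["Stir vanilla into milk."]',
}
text = build_training_text(example)
print(text)
```

The string produced by `build_training_text` has the same layout as the `prediction` column shown above.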

In [None]:
data[0]

{'title': 'No-Bake Nut Cookies',
 'ingredients': '["1 c. firmly packed brown sugar", "1/2 c. evaporated milk", "1/2 tsp. vanilla", "1/2 c. broken nuts (pecans)", "2 Tbsp. butter or margarine", "3 1/2 c. bite size shredded rice biscuits"]',
 'directions': '["In a heavy 2-quart saucepan, mix brown sugar, nuts, evaporated milk and butter or margarine.", "Stir over medium heat until mixture bubbles all over top.", "Boil and stir 5 minutes more. Take off heat.", "Stir in vanilla and cereal; mix well.", "Using 2 teaspoons, drop and shape into 30 clusters on wax paper.", "Let stand until firm, about 30 minutes."]',
 'NER': '["brown sugar", "milk", "vanilla", "nuts", "butter", "bite size shredded rice biscuits"]',
 'prediction': '["brown sugar", "milk", "vanilla", "nuts", "butter", "bite size shredded rice biscuits"] ->: title: No-Bake Nut Cookies\ningredients: ["1 c. firmly packed brown sugar", "1/2 c. evaporated milk", "1/2 tsp. vanilla", "1/2 c. broken nuts (pecans)", "2 Tbsp. butter or mar

In [None]:
data = data.map(lambda samples: tokenizer(samples['prediction']), batched=True)

Map:   0%|          | 0/17500 [00:00<?, ? examples/s]

## Train the model

In [None]:
# Set the pad token

tokenizer.pad_token = tokenizer.eos_token


In [None]:
# Train the model

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=transformers.TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=2,
        warmup_steps=50,
        max_steps=500,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=20,
        output_dir="/content/gdrive/MyDrive/recipie-lora-llama-2"
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

max_steps is given, it will override any value given in num_train_epochs


| Step | Training Loss |
|------|---------------|
| 20 | 1.5776 |
| 40 | 1.2124 |
| 60 | 1.0997 |
| 80 | 1.0961 |
| 100 | 1.0499 |
| 120 | 1.0384 |
| 140 | 0.9881 |
| 160 | 1.0236 |
| 180 | 1.0004 |
| 200 | 0.9901 |


TrainOutput(global_step=500, training_loss=1.0259752540588378, metrics={'train_runtime': 3182.3773, 'train_samples_per_second': 0.628, 'train_steps_per_second': 0.157, 'total_flos': 1.881745058738995e+16, 'train_loss': 1.0259752540588378, 'epoch': 0.11428571428571428})
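
The `epoch` and throughput values in `TrainOutput` follow from the training arguments. A quick check, assuming a single GPU (the number of devices is not stated in the notebook, so `num_devices = 1` is an assumption):

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 2
max_steps = 500
num_rows = 17500           # size of the training subset
train_runtime = 3182.3773  # seconds, from TrainOutput
num_devices = 1            # assumption: one Colab GPU

# Each optimizer step consumes batch_size * accumulation_steps samples per device
samples_per_step = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = samples_per_step * max_steps
epoch = samples_seen / num_rows

print(samples_seen)                            # 2000
print(round(epoch, 4))                         # 0.1143
print(round(samples_seen / train_runtime, 3))  # 0.628
print(round(max_steps / train_runtime, 3))     # 0.157
```

In other words, the 500 steps covered 2,000 of the 17,500 examples, or roughly 11% of one pass over the subset.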

In [None]:
model.push_to_hub("yr82/recipie-lora-llama-2",
                  use_auth_token=True,
                  commit_message="lora training - llama 3.1",
                  private=True)



adapter_model.safetensors:   0%|          | 0.00/27.3M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/yr82/recipie-lora-llama-2/commit/00065e781c5ddcd1bfbe8b7734e56a33b8896f32', commit_message='lora training - llama 3.1', commit_description='', oid='00065e781c5ddcd1bfbe8b7734e56a33b8896f32', pr_url=None, pr_revision=None, pr_num=None)

## Inference

In [None]:
# Load the adapter

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "yr82/recipie-lora-llama-2"
config = PeftConfig.from_pretrained(peft_model_id)
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_8bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the Lora model
model = PeftModel.from_pretrained(model, peft_model_id)


In [None]:
# Test with something from the training set

batch = tokenizer("[\"brown sugar\", \"milk\", \"vanilla\", \"nuts\", \"butter\", \"bite size shredded rice biscuits\"]" + " ->: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch,  max_new_tokens=512, min_new_tokens=64, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=False))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.




 <|begin_of_text|>["brown sugar", "milk", "vanilla", "nuts", "butter", "bite size shredded rice biscuits"] ->:  title: Rice Krispie Treats
ingredients: ["1 c. brown sugar", "1 c. milk", "1 tsp. vanilla", "1 c. nuts", "1/2 c. butter", "bite size shredded rice biscuits"]
directions: ["Mix sugar, milk and butter in saucepan.", "Boil 1 minute.", "Add vanilla and nuts.", "Pour over rice Krispies.", "Mix well.", "Pour into buttered pan.", "Cool."]<|end_of_text|>


In [None]:
# Test with something random

batch = tokenizer("[\"chocolate\", \"flour\", \"eggs\"]" + " ->: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch,  max_new_tokens=512, min_new_tokens=64, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=False))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.




 <|begin_of_text|>["chocolate", "flour", "eggs"] ->:  title: Chocolate Cake
ingredients: ["1 c. Hershey's cocoa", "1 c. flour", "3 eggs"]
directions: ["Mix all ingredients together.", "Pour into a greased 9 x 13-inch pan.", "Bake at 350\u00b0 for 30 minutes."]<|end_of_text|>


In [None]:
def format_list(items):
    formatted_list = ', '.join([f'"{item}"' for item in items])
    return f'[{formatted_list}]'

In [None]:
food = ["tomatoes",
 "tomatoes",
 "oatmeal",
 "tomatoes",
 "mozzarella",
 "eggs",
 "beans",
 "garbanzo",
 "sprouted",
 "bag_jazz",
 "bananas",
 "creany",
 "peanut_butter",
 "pita_bread"
]

food_list = format_list(food)

In [None]:
print(food_list)

["tomatoes", "tomatoes", "oatmeal", "tomatoes", "mozzarella", "eggs", "beans", "garbanzo", "sprouted", "bag_jazz", "bananas", "creany", "peanut_butter", "pita_bread"]


In [None]:
# Test some data from receipts

batch = tokenizer(food_list + " ->: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=True))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.




 ["tomatoes", "tomatoes", "oatmeal", "tomatoes", "mozzarella", "eggs", "beans", "garbanzo", "sprouted", "bag_jazz", "bananas", "creany", "peanut_butter", "pita_bread"] ->:  title: The Best Breakfast
ingredients: ["tomatoes", "tomatoes", "oatmeal", "tomatoes", "mozzarella", "eggs", "beans", "garbanzo", "sprouted", "bag_jazz",


In [None]:
# Remove duplicates and use only the first 5 items from the list

unique_food = list(dict.fromkeys(food))

unique_food = unique_food[:5]

food_list = format_list(unique_food)

In [None]:
print(unique_food)

['tomatoes', 'oatmeal', 'mozzarella', 'eggs', 'beans']
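
The deduplicate, truncate, and format steps can be folded into one helper. A minimal sketch, where `items_to_prompt` is a hypothetical name and `json.dumps` is swapped in for `format_list` because it produces the same double-quoted list layout while also escaping any quotes inside an item:

```python
import json


def items_to_prompt(items, max_items=5, separator=" ->: "):
    """Deduplicate (keeping first-seen order), truncate, and format as a prompt."""
    unique_items = list(dict.fromkeys(items))[:max_items]
    return json.dumps(unique_items) + separator


food = ["tomatoes", "tomatoes", "oatmeal", "tomatoes", "mozzarella", "eggs", "beans", "garbanzo"]
prompt = items_to_prompt(food)
print(prompt)
```

The limit of five items mirrors the cell above. The long receipt list led the model to copy its input back, so a short prompt is closer to the ingredient lists seen in training.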


In [None]:
batch = tokenizer(food_list + " ->: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=50)

print('\n\n', tokenizer.decode(output_tokens[0], skip_special_tokens=False))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.




 <|begin_of_text|>["tomatoes", "oatmeal", "mozzarella", "eggs", "beans"] ->:  title: Baked Tomatoes
ingredients: ["2 c. tomatoes", "1/2 c. oatmeal", "1/2 c. mozzarella cheese", "2 eggs", "1 can beans"]
directions: ["Cut tomatoes in


In [None]:
# Generate recipe

batch = tokenizer(food_list + " ->: ", return_tensors='pt')

with torch.cuda.amp.autocast():
  output_tokens = model.generate(**batch, max_new_tokens=512, min_new_tokens=64, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id)

output_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [None]:
sections = output_text.split("\n")

for section in sections:
  print(section)

["tomatoes", "oatmeal", "mozzarella", "eggs", "beans"] ->:  title: Baked Tomatoes
ingredients: ["2 c. tomatoes", "1/2 c. oatmeal", "1/2 c. mozzarella cheese", "2 eggs", "1 can beans"]
directions: ["Cut tomatoes in half.", "Mix oatmeal, cheese and eggs.", "Fill tomatoes with mixture.", "Bake at 350\u00b0 for 30 minutes.", "Top with beans and bake 10 minutes more."]
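
Since the generated text follows the `title:` / `ingredients:` / `directions:` layout from training, it can be turned into structured data. A minimal sketch, assuming the model keeps to that layout. `parse_recipe` is a hypothetical helper that is not in the original notebook, and it falls back to the raw string when a list is cut off and is not valid JSON:

```python
import json


def parse_recipe(output_text):
    """Split generated text into title, ingredients, and directions."""
    # Everything after the first "->:" separator is the generated recipe
    _, _, body = output_text.partition("->:")
    recipe = {}
    for line in body.strip().split("\n"):
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip lines that are not "key: value"
        key, value = key.strip(), value.strip()
        if key in ("ingredients", "directions"):
            try:
                value = json.loads(value)  # also decodes escapes such as \u00b0
            except json.JSONDecodeError:
                pass  # truncated generation: keep the raw string
        recipe[key] = value
    return recipe


# The output printed above, written as raw strings to keep the literal \u00b0
sample = "\n".join([
    r'["tomatoes", "oatmeal", "mozzarella", "eggs", "beans"] ->:  title: Baked Tomatoes',
    r'ingredients: ["2 c. tomatoes", "1/2 c. oatmeal", "1/2 c. mozzarella cheese", "2 eggs", "1 can beans"]',
    r'directions: ["Cut tomatoes in half.", "Mix oatmeal, cheese and eggs.", "Fill tomatoes with mixture.", "Bake at 350\u00b0 for 30 minutes.", "Top with beans and bake 10 minutes more."]',
])
recipe = parse_recipe(sample)
print(recipe["title"])
for step in recipe["directions"]:
    print("-", step)
```

Parsing the lists with `json.loads` also converts the escaped `\u00b0` in the directions into a degree sign.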
