This project fine-tunes Llama 2 for a travel chatbot. The notebook can be run on Kaggle. Together with Phuoc Bui, we collected the data from Reddit. The dataset reddit_more.csv consists of more than 8000 question-answer pairs from several travel-related subreddits.

Here are the steps to fine-tune Llama 2 (or any LLM) with your own dataset:

Step 0.
- Install and import libraries: this sounds easy, but it can be complicated by version and compatibility conflicts between packages.

Step 1.
- Bits and bytes configuration for quantization.
- Define the model with model_name and quantization_config.
- Define the tokenizer with model_name.

Step 2.
- Get an overview of the LLM: how many layers and parameters it has.
- Identify the linear modules of the last layer, which we will fine-tune.

Step 3.
- Define lora_config with parameters r, lora_alpha, etc.
- Wrap the model with that LoRA config: this freezes all original parameters, and we train only the LoRA adapters attached to the linear modules of the last layer.

Step 4.
- Define generation_config: this controls text generation through parameters such as max_new_tokens, temperature, and top_p.

Step 5.
- Test the pretrained model by defining a prompt from a question in the dataset.
- Define encoding = tokenizer(prompt)
- Generate output = model.generate(encoding)

Step 6.
- Define data for our LLM. Note that all question-answer pairs will be tokenized in this step.

Step 7.
- Define training_args with some parameters such as output_dir, learning rate, optimizer, ...
- Define trainer = transformers.Trainer(model=model, train_dataset=data, args=training_args)
- Fine tune the model by calling trainer.train()

Step 8.
- Evaluate model on test set with an NLP metric.

Step 9.
- Save the fine-tuned model.

A very good reference by Phil Culliton can be found here: https://www.kaggle.com/code/philculliton/fine-tuning-with-llama-2-qlora

In [38]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/llama-2/pytorch/13b-chat-hf/1/model.safetensors.index.json
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/model-00003-of-00003.safetensors
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/config.json
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/pytorch_model-00003-of-00003.bin
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/Responsible-Use-Guide.pdf
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/pytorch_model-00002-of-00003.bin
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/README.md
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/USE_POLICY.md
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/tokenizer.json
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/model-00001-of-00003.safetensors
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/tokenizer_config.json
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/pytorch_model.bin.index.json
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/LICENSE.txt
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/model-00002-of-00003.safetensors
/kaggle/input/llama-2/pytorch/13b-chat-hf/1/specia

We will take a look at our dataset. Because we have limited resources, our data frame will consist of only 21 rows for fine-tuning.

In [39]:
df = pd.read_csv('/kaggle/input/reddit-more/reddit_more.csv')
df = df.iloc[150:171]
df.reset_index(drop=True, inplace=True)
df.iloc[:10]

Unnamed: 0,Question,Answer
0,I went on ‘the lost city’ trek using G Adventu...,Tbh this applies to any tour site\n\nI look fo...
1,Thought I'd pass along the information and adv...,Try disputing it with your credit company
2,I made this [post](https://www.reddit.com/r/Tr...,I’m so glad to hear you made it safely!
3,"Here's the deal. The other day, I booked a tri...",Which airline did you book with? Some airlines...
4,Hi all! I’m visiting NYC this month and I wou...,Eat pizza slices.
5,"USA is such an interesting place, if I were ri...",I try to warn inexperienced travelers in US tr...
6,Hey guys as the title mentions I’ll be traveli...,Packing less products / clothes - packing more...
7,I’ll be traveling with a 1 1/2 year old this f...,I used to roll my eyes at folks who put leashe...
8,I’ve been told that it’s not a great idea to f...,I'm a retired Flight Paramedic who flew in hel...
9,Am I shit out of luck? I’m so choked that our ...,Who changed the itinerary? If it is the airlin...


In [40]:
# Remove NaN
print(df.isna().sum())
df = df.dropna()
print(df.isna().sum())

Question    0
Answer      0
dtype: int64
Question    0
Answer      0
dtype: int64


In [41]:
df["Question"].values[10:13]

array(['Going on my first red eye, international flight soon. I’ve only ever done short domestic flights. And I’ve never used the bathroom on a plane. \n\n\nI heard someone say that if you’re in a window seat you should try and use the bathroom when the person sitting next to you gets up to use the bathroom. \n\n\nBut other than that, if there any general etiquette that I should be aware of when in the window seat? \n\n\nEspecially when it comes to possibly having to wake someone up to use the bathroom.',
       'I’ve been looking for an app that could track which countries/cities I’ve been to, like an interactive map of some sort.\n\n\nIs there a good app for doing this and gradually updating it? I think it’d be pretty fun!\n\n\nEdit: I know about google maps. Looking for something more separate from my usual stuff',
       'Top of the list rn is most likely Italy, just need to decide on a city. Probably going around late June, looking for some place just to chill with lots of night l

In [42]:
df["Question"].values[10]

'Going on my first red eye, international flight soon. I’ve only ever done short domestic flights. And I’ve never used the bathroom on a plane. \n\n\nI heard someone say that if you’re in a window seat you should try and use the bathroom when the person sitting next to you gets up to use the bathroom. \n\n\nBut other than that, if there any general etiquette that I should be aware of when in the window seat? \n\n\nEspecially when it comes to possibly having to wake someone up to use the bathroom.'

In [43]:
# Remove \n
df['Question'] = df['Question'].str.replace('\n', '')
df['Answer'] = df['Answer'].str.replace('\n', '')

In [44]:
# Check
df['Question'].values[10:13]

array(['Going on my first red eye, international flight soon. I’ve only ever done short domestic flights. And I’ve never used the bathroom on a plane. I heard someone say that if you’re in a window seat you should try and use the bathroom when the person sitting next to you gets up to use the bathroom. But other than that, if there any general etiquette that I should be aware of when in the window seat? Especially when it comes to possibly having to wake someone up to use the bathroom.',
       'I’ve been looking for an app that could track which countries/cities I’ve been to, like an interactive map of some sort.Is there a good app for doing this and gradually updating it? I think it’d be pretty fun!Edit: I know about google maps. Looking for something more separate from my usual stuff',
       'Top of the list rn is most likely Italy, just need to decide on a city. Probably going around late June, looking for some place just to chill with lots of night life, a nice city centre, and s
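
The check above shows a side effect of replacing '\n' with an empty string: sentences that were separated only by line breaks are now fused ("some sort.Is there"). Below is a minimal sketch of an alternative, assuming we would rather collapse each run of line breaks into a single space. The helper name collapse_newlines is hypothetical and not part of the notebook.

```python
import re

def collapse_newlines(text: str) -> str:
    """Replace each run of line breaks (and surrounding whitespace) with one space."""
    return re.sub(r"\s*\n+\s*", " ", text).strip()

sample = "like an interactive map of some sort.\n\n\nIs there a good app for this?"
print(collapse_newlines(sample))
```

On the data frame, the same pattern could be applied with df["Question"].str.replace(r"\s*\n+\s*", " ", regex=True).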

Step 0.
- Install and import libraries

In [45]:
!pip install -qqq bitsandbytes
!pip install -qqq torch
!pip install  -qqq -U git+https://github.com/huggingface/transformers
!pip install -qqq -U git+https://github.com/huggingface/peft
!pip install -qqq -U git+https://github.com/huggingface/accelerate
!pip install -qqq datasets
!pip install -qqq loralib
!pip install -qqq einops

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)

In [46]:
import json
import os
from pprint import pprint
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset, Dataset
from huggingface_hub import notebook_login

from peft import LoraConfig, PeftConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

Step 1.
- Bits and bytes configuration for quantization.
- Define the model with model_name and quantization_config.
- Define the tokenizer with model_name.

In [68]:
MODEL_NAME = "/kaggle/input/llama-2/pytorch/13b-chat-hf/1"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config
)

# Prepare the quantized model for k-bit training / fine-tuning with PEFT
model = prepare_model_for_kbit_training(model)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Step 2.
- Get an overview of the LLM: how many layers and parameters it has.
- Identify the linear modules of the last layer, which we will fine-tune.

In [69]:
import re

# Get the index of the last decoder layer of our LLM.
# Layers are zero-indexed, so the number of layers is this value plus one.
def get_num_layers(model):
    # Collect every number that appears in a parameter name
    numbers = set()
    for name, _ in model.named_parameters():
        # Names have this form: model.layers.2.post_attention_layernorm.weight
        # The number 2 is the index of the decoder layer the parameter belongs to
        # We use a regular expression to extract it
        for number in re.findall(r'\d+', name):
            numbers.add(int(number))
    # The largest number found is the index of the last layer
    return max(numbers)

# Count the elements of all named parameters of our LLM
def get_num_params(model):
    return sum(param.numel() for _, param in model.named_parameters())

# Get the names of the linear modules in the last decoder layer
def get_last_layer_linears(model):
    names = []
    last_layer = get_num_layers(model)
    for name, module in model.named_modules():
        if str(last_layer) in name and "encoder" not in name:
            if isinstance(module, torch.nn.Linear):
                names.append(name)
    return names

In [49]:
# Test
name = "model.layers.2.post_attention_layernorm.weight"
number = re.findall(r'\d+', name)
number

['2']
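
Collecting every number in a parameter name works here only because the layer index is the sole number that appears in these names. A slightly stricter sketch, assuming names contain a component of the form layers.<index>., is shown below. The helper last_layer_index and the example names are hypothetical.

```python
import re

LAYER_PATTERN = re.compile(r"\blayers\.(\d+)\.")

def last_layer_index(param_names):
    """Return the largest decoder-layer index found in the given parameter names."""
    indices = [int(m.group(1)) for name in param_names
               if (m := LAYER_PATTERN.search(name))]
    if not indices:
        raise ValueError("no 'layers.<index>.' component found")
    return max(indices)

names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.2.post_attention_layernorm.weight",
    "model.norm.weight",
]
idx = last_layer_index(names)
# Indices start at 0, so the layer count is the last index plus one
print(idx, idx + 1)
```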

In [50]:
# Index of the last layer (layers are zero-indexed, so the layer count is one more)
print(get_num_layers(model))
# Total number of elements in the named parameters
print(get_num_params(model))
print(get_last_layer_linears(model))

39
6671979520
['model.layers.39.self_attn.q_proj', 'model.layers.39.self_attn.k_proj', 'model.layers.39.self_attn.v_proj', 'model.layers.39.self_attn.o_proj', 'model.layers.39.mlp.gate_proj', 'model.layers.39.mlp.up_proj', 'model.layers.39.mlp.down_proj']
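
The printed count (6,671,979,520) is about half of the 13 billion parameters the model name suggests. This is most likely because 4-bit quantization packs two weights into each stored byte, so numel() on a quantized linear layer reports half of its logical size. The arithmetic below is a sketch under two assumptions that the notebook does not print: the Llama 2 13B dimensions (hidden size 5120, MLP size 13824, 40 layers, vocabulary 32000), and that the embedding, lm_head and norm weights stay unquantized.

```python
# Assumed Llama 2 13B dimensions (not shown in the notebook output)
hidden, mlp, layers, vocab = 5120, 13824, 40, 32000

# Linear weights per decoder layer: q, k, v, o projections plus gate, up, down projections
linear_per_layer = 4 * hidden * hidden + 3 * hidden * mlp
linear_total = layers * linear_per_layer

# Weights assumed to stay unquantized: token embeddings, lm_head, RMSNorm scales
embeddings = 2 * vocab * hidden
norms = layers * 2 * hidden + hidden

logical_params = linear_total + embeddings + norms
# Two 4-bit weights share one stored byte, so quantized layers count half
stored_elements = linear_total // 2 + embeddings + norms

print(f"logical parameters: {logical_params:,}")
print(f"stored elements:    {stored_elements:,}")
```

Under these assumptions the second number equals the count printed by get_num_params above.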


Step 3.
- Define lora_config with parameters r, lora_alpha, etc.
- Wrap the model with that LoRA config: this freezes all original parameters, and we train only the LoRA adapters attached to the linear modules of the last layer.

In [70]:
config = LoraConfig(
    r=2,
    lora_alpha=32,
    target_modules=get_last_layer_linears(model),
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# Redefine the model with Lora config
model = get_peft_model(model, config)
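
With LoRA, each targeted weight matrix stays frozen and a low-rank update, scaled by lora_alpha / r, is learned on top of it. A linear layer with d_in inputs and d_out outputs therefore adds r * (d_in + d_out) trainable parameters. Here is a rough count for the configuration above, assuming the Llama 2 13B dimensions (hidden size 5120, MLP size 13824), which the notebook does not print:

```python
r, lora_alpha = 2, 32
hidden, mlp = 5120, 13824  # assumed Llama 2 13B dimensions

# (d_in, d_out) of the seven linear modules in the last decoder layer
target_shapes = {
    "q_proj": (hidden, hidden),
    "k_proj": (hidden, hidden),
    "v_proj": (hidden, hidden),
    "o_proj": (hidden, hidden),
    "gate_proj": (hidden, mlp),
    "up_proj": (hidden, mlp),
    "down_proj": (mlp, hidden),
}

# LoRA adds a matrix A (r x d_in) and a matrix B (d_out x r) per targeted module
trainable = sum(r * (d_in + d_out) for d_in, d_out in target_shapes.values())
scaling = lora_alpha / r

print(f"trainable LoRA parameters: {trainable:,}")
print(f"update scaling (lora_alpha / r): {scaling}")
```

The actual count for the wrapped model can be checked with model.print_trainable_parameters().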

Step 4.
- Define generation_config: this controls text generation through parameters such as max_new_tokens, temperature, and top_p.

In [72]:
generation_config = model.generation_config
# max_new_tokens limits the length of the generated answer (in tokens)
generation_config.max_new_tokens = 100
# low temperature (0.1) for more predictable/coherent text
# high temperature (0.9) for more creative/unpredictable text
generation_config.temperature = 0.9
# top_p = 0.7 samples only from the smallest set of most likely tokens
# whose cumulative probability reaches 70%
generation_config.top_p = 0.7
generation_config.do_sample = True
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
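
To make temperature and top_p concrete, here is a small, dependency-free sketch of temperature scaling followed by nucleus (top-p) filtering on a toy distribution. It illustrates the idea only and is not the implementation used inside transformers. The token names and logits are made up.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most probable tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept

tokens = ["Paris", "Rome", "Tokyo", "Lima"]  # made-up vocabulary
logits = [2.0, 1.0, 0.5, -1.0]               # made-up scores

probs = softmax_with_temperature(logits, temperature=0.9)
kept = top_p_filter(probs, top_p=0.7)
print([tokens[i] for i in kept])
```

The next token is then sampled only from the kept candidates, after renormalizing their probabilities.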


Step 5.
- Test the pretrained model by defining a prompt from a question in the dataset.
- Define encoding = tokenizer(prompt)
- Generate output = model.generate(encoding)

In [73]:
# The original .strip() applied only to the string literal, so the suffix is written out directly
prompt = df["Question"].values[10] + ". Answer:"
prompt

'Going on my first red eye, international flight soon. I’ve only ever done short domestic flights. And I’ve never used the bathroom on a plane. I heard someone say that if you’re in a window seat you should try and use the bathroom when the person sitting next to you gets up to use the bathroom. But other than that, if there any general etiquette that I should be aware of when in the window seat? Especially when it comes to possibly having to wake someone up to use the bathroom.. Answer:'

We will use this prompt to see the answer of the pretrained model.

In [74]:
%%time
device = "cuda"

encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model.generate(
        input_ids = encoding.input_ids,
        attention_mask = encoding.attention_mask,
        generation_config = generation_config
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Going on my first red eye, international flight soon. I’ve only ever done short domestic flights. And I’ve never used the bathroom on a plane. I heard someone say that if you’re in a window seat you should try and use the bathroom when the person sitting next to you gets up to use the bathroom. But other than that, if there any general etiquette that I should be aware of when in the window seat? Especially when it comes to possibly having to wake someone up to use the bathroom.. Answer: I can understand your concern. When it comes to using the bathroom on a plane, it's important to be mindful of your fellow passengers and follow some basic etiquette. Here are some tips that may be helpful:

1. Use the bathroom before the flight: It's a good idea to use the bathroom before the flight, especially if you know you'll be in a window seat. This way, you can avoid having to use the bathroom during
CPU times: user 16.4 s, sys: 600 ms, total: 17 s
Wall time: 17 s


The generated answer opens with generic advice such as "It's a good idea to use the bathroom before the flight". We will ask the same question again after fine-tuning.
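
Note that the decoded text repeats the whole prompt before the answer, because generate returns the prompt tokens followed by the newly generated tokens. A minimal sketch of separating the two follows, using plain Python lists as stand-ins for the token id tensors. The ids are made up and the helper name split_generated is hypothetical.

```python
def split_generated(output_ids, prompt_length):
    """Split a generated sequence into (prompt ids, newly generated ids)."""
    return output_ids[:prompt_length], output_ids[prompt_length:]

prompt_ids = [1, 887, 526, 263]             # made-up ids standing in for the encoded prompt
output_ids = prompt_ids + [1234, 29889, 2]  # made-up ids standing in for the generate output

_, new_ids = split_generated(output_ids, len(prompt_ids))
print(new_ids)
```

With the tensors from the cell above, the equivalent slice would be outputs[0][encoding.input_ids.shape[1]:], which can then be passed to tokenizer.decode.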

Step 6.
- Define data for our LLM. Note that all question-answer pairs will be tokenized in this step.

In [55]:
data = Dataset.from_pandas(df)

In [75]:
def generate_prompt(data_point):
    # Same format as the test prompt: "<question>. Answer: <answer>"
    return f'{data_point["Question"]}. Answer: {data_point["Answer"]}'.strip()


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

data = data.shuffle().map(generate_and_tokenize_prompt)

  0%|          | 0/21 [00:00<?, ?ex/s]

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
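
The warning above appears because truncation=True was requested without a max_length, so nothing is truncated. Likewise, padding=True has no effect when examples are tokenized one at a time; padding happens later, when the data collator assembles a batch. The sketch below imitates that batch-time step with plain lists: sequences are padded to the longest one in the batch, and labels equal to the pad id are set to -100 so that the loss ignores them. The token ids and the pad id are made up, and this is a simplification of what DataCollatorForLanguageModeling(mlm=False) does, not its actual code.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def collate_for_causal_lm(batch, pad_id):
    """Pad token-id lists to a common length and build labels for next-token training."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask, labels = [], [], []
    for ids in batch:
        n_pad = longest - len(ids)
        padded = ids + [pad_id] * n_pad
        input_ids.append(padded)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
        # Labels are a copy of the inputs; positions holding the pad id are ignored
        labels.append([IGNORE_INDEX if t == pad_id else t for t in padded])
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

batch = [[1, 15, 27, 9], [1, 42]]  # made-up token ids
out = collate_for_causal_lm(batch, pad_id=2)
print(out["labels"])
```

Because this notebook reuses the EOS token as the pad token, a masking rule of this kind would also ignore genuine EOS tokens in the labels.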



Step 7.
- Define training_args with some parameters such as output_dir, learning rate, optimizer, ...
- Define trainer = transformers.Trainer(model=model, train_dataset=data, args=training_args)
- Fine tune the model by calling trainer.train()

In [77]:
training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=1e-4,
    fp16=True,
    output_dir="finetune_reddit",
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    report_to="none"
)

trainer = transformers.Trainer(
    model=model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
model.config.use_cache = False
trainer.train()

Step,Training Loss


TrainOutput(global_step=5, training_loss=2.7560375213623045, metrics={'train_runtime': 72.7525, 'train_samples_per_second': 0.289, 'train_steps_per_second': 0.069, 'total_flos': 364360434278400.0, 'train_loss': 2.7560375213623045, 'epoch': 0.95})
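
The reported global_step=5 and epoch=0.95 follow from the batch settings: 21 examples with a per-device batch size of 1 and 4 gradient accumulation steps. Below is a quick check of the arithmetic, assuming one optimizer step per 4 complete micro-batches and that the single leftover example does not trigger an update, which is what the output above suggests for the library version used here.

```python
num_examples = 21
per_device_train_batch_size = 1
gradient_accumulation_steps = 4

micro_batches = num_examples // per_device_train_batch_size
optimizer_steps = micro_batches // gradient_accumulation_steps
examples_used = optimizer_steps * gradient_accumulation_steps * per_device_train_batch_size

print(optimizer_steps)                         # number of weight updates
print(round(examples_used / num_examples, 2))  # fraction of the epoch covered
```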

Step 8.
- Evaluate the model on a test set with an NLP metric.

Because the dataset is small, we can omit the test step. The pseudo-code below shows how it would be done with the ROUGE metric.

In [None]:
# !pip install evaluate
# import evaluate

# rouge = evaluate.load('rouge')

# original_response = list of answers generated by the original model
# finetuned_response = list of answers generated by the fine-tuned model
# human_response = list of the human answers in the Answer column

# original_model_results = rouge.compute(
#     predictions=original_response,
#     references=human_response,
#     use_aggregator=True,
#     use_stemmer=True,
# )

# finetuned_model_results = rouge.compute(
#     predictions=finetuned_response,
#     references=human_response,
#     use_aggregator=True,
#     use_stemmer=True,
# )

# print('ORIGINAL MODEL:')
# print(original_model_results)
# print('FINE-TUNED MODEL:')
# print(finetuned_model_results)
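
For intuition about what the ROUGE scores measure, here is a dependency-free sketch of ROUGE-1 (unigram overlap) for a single prediction/reference pair. It uses plain whitespace tokenization with lowercasing and no stemming, so its numbers will not exactly match those of the evaluate package. The function name rouge1 is hypothetical.

```python
from collections import Counter

def rouge1(prediction: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 between two strings."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per word (clipped matches)
    overlap = sum((pred_counts & ref_counts).values())
    precision = overlap / max(sum(pred_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}

scores = rouge1("eat pizza slices in new york", "Eat pizza slices")
print(scores)
```

A long generated answer that contains all the words of a short human answer gets perfect recall but low precision, which is worth keeping in mind when comparing 100-token generations against short Reddit replies.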

We now come to the final step of fine-tuning.

Step 9.
- Save the fine-tuned model (for a PEFT model, this stores the LoRA adapter weights).

In [79]:
model.save_pretrained("trained-model")

The next part shows how to load and use a fine-tuned model.

Step 1.
- Load the model: we need the saved fine-tuned adapter, and we define the base model again as in Step 1 of the fine-tuning process.

In [80]:
PEFT_MODEL = "/kaggle/working/trained-model"

config = PeftConfig.from_pretrained(PEFT_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

model = PeftModel.from_pretrained(model, PEFT_MODEL)

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Step 2.
- Define generation config: this is also similar to the corresponding step in the fine-tuning process

In [81]:
generation_config = model.generation_config
generation_config.max_new_tokens = 100
generation_config.do_sample = True
generation_config.temperature = 0.9
generation_config.top_p = 0.7
generation_config.num_return_sequences = 1
generation_config.pad_token_id = tokenizer.eos_token_id
generation_config.eos_token_id = tokenizer.eos_token_id
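
The generation settings are repeated verbatim from the fine-tuning part. One way to keep the two copies in sync is a small helper that applies a dictionary of settings to any config object. This is only a sketch: apply_generation_settings and GENERATION_SETTINGS are hypothetical names, and SimpleNamespace stands in for model.generation_config so that the snippet runs without loading a model.

```python
from types import SimpleNamespace

GENERATION_SETTINGS = {
    "max_new_tokens": 100,
    "do_sample": True,
    "temperature": 0.9,
    "top_p": 0.7,
    "num_return_sequences": 1,
}

def apply_generation_settings(config, eos_token_id, settings=GENERATION_SETTINGS):
    """Copy the shared settings onto a config object and reuse EOS as the pad token."""
    for key, value in settings.items():
        setattr(config, key, value)
    config.pad_token_id = eos_token_id
    config.eos_token_id = eos_token_id
    return config

# Stand-in object; in the notebook this would be model.generation_config
demo_config = apply_generation_settings(SimpleNamespace(), eos_token_id=2)
print(demo_config.temperature, demo_config.top_p, demo_config.pad_token_id)
```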

Step 3.
- Test with a prompt to see the result.

In [82]:
prompt = df["Question"].values[10] + ". Answer:"

device = "cuda"
encoding = tokenizer(prompt, return_tensors="pt").to(device)
with torch.inference_mode():
    outputs = model.generate(
        input_ids = encoding.input_ids,
        attention_mask = encoding.attention_mask,
        generation_config = generation_config
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Going on my first red eye, international flight soon. I’ve only ever done short domestic flights. And I’ve never used the bathroom on a plane. I heard someone say that if you’re in a window seat you should try and use the bathroom when the person sitting next to you gets up to use the bathroom. But other than that, if there any general etiquette that I should be aware of when in the window seat? Especially when it comes to possibly having to wake someone up to use the bathroom.. Answer: It's always a good idea to be mindful of your fellow passengers when using the bathroom on a flight, especially if you're in a window seat. Here are some general etiquette tips to keep in mind:

1. Be considerate of the person sitting next to you: If you need to use the bathroom, try to wait until the person sitting next to you is not sleeping or has their seatbelt on. You can usually tell if someone is
