This is a Kaggle notebook that fine-tunes Flan-T5 small for a travel chatbot. Fine-tuning is much faster than with Llama 2, but the model is far less capable, and it may take a lot of fine-tuning to get useful answers.

In [53]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/reddit-more/reddit_more.csv


In [54]:
!pip install -qqq bitsandbytes
!pip install -qqq torch
!pip install -qqq -U git+https://github.com/huggingface/transformers
!pip install -qqq -U git+https://github.com/huggingface/peft
!pip install -qqq -U git+https://github.com/huggingface/accelerate
!pip install -qqq datasets
!pip install -qqq loralib
!pip install -qqq einops

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
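The warning above names its own fix. A minimal sketch, assuming it runs in a cell before any tokenizer is used: set the `TOKENIZERS_PARALLELISM` environment variable explicitly so that later forked subprocesses (such as the `!pip` calls) no longer trigger the message.

```python
import os

# Set this before the tokenizer is first used. "false" disables tokenizer
# parallelism, which is what the library falls back to after a fork anyway.
os.environ["TOKENIZERS_PARALLELISM"] = "false"
```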

In [55]:
import json
import os
from pprint import pprint
import bitsandbytes as bnb
import torch
import torch.nn as nn
import transformers
from datasets import load_dataset, Dataset
from huggingface_hub import notebook_login

from peft import LoraConfig, PeftConfig, PeftModel, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoConfig, AutoModelForSeq2SeqLM, AutoTokenizer, BitsAndBytesConfig

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

In [56]:
MODEL_NAME = 'google/flan-t5-small'

In [57]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

t5_model = AutoModelForSeq2SeqLM.from_pretrained(
    MODEL_NAME,
    device_map="auto",
    trust_remote_code=True,
    quantization_config=bnb_config
)

# Prepare the quantized model for PEFT training / fine-tuning
t5_model = prepare_model_for_kbit_training(t5_model)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token

We will try the chatbot before fine-tuning.

In [58]:
device = "cuda"

def chat_with_bot(model, input_text):
    input_text = "chatbot: " + input_text + " </s>"
    input_ids = tokenizer.encode(input_text, return_tensors="pt").to(device)

    # Generate bot's response
    output = model.generate(input_ids=input_ids, max_length=200, num_return_sequences=1)
    
    # Decode and return the response
    bot_response = tokenizer.decode(output[0], skip_special_tokens=True)
    return bot_response

Question from reddit: I am traveling to Japan in the beginning of May with 5 other friends. We're visiting Tokyo, Osaka and Kyoto for two weeks. What are some travel hacks that you guys could recommend or things you would have done different when you travelled to Japan.   

In [59]:
print("T5 Chatbot: Hi there! How can I help you?")
user_input = input("You: ")
while user_input.lower() not in ['exit', 'quit']:
    response = chat_with_bot(t5_model, user_input)
    print("Chatbot:", response)
    user_input = input("You: ")

T5 Chatbot: Hi there! How can I help you?


You:  I am travelling to Japan. Are there some travel hacks you recommend?


Chatbot: no


You:  exit


We can fine-tune the model and compare the answers. But first, let's take a look at the dataset.

In [60]:
df = pd.read_csv('/kaggle/input/reddit-more/reddit_more.csv')
# Remove NaN
print(df.isna().sum())
df = df.dropna()
print(df.isna().sum())
# We will fine-tune on a small subset
df = df.iloc[100:201]
df.reset_index(drop=True, inplace=True)

Question    119
Answer        0
dtype: int64
Question    0
Answer      0
dtype: int64


In [61]:
# Search for the question we asked the chatbot
search_text = 'visiting Tokyo, Osaka and Kyoto'
indices = df[df['Question'].str.contains(search_text)].index
print(indices)

Index([56], dtype='int64')


In [62]:
# This is the question we asked the chatbot
df["Question"].values[56]

"Hello m8s,  \n\n\nI am traveling to Japan in the beginning of May with 5 other friends. We're visiting Tokyo, Osaka and Kyoto for two weeks. What are some travel hacks that you guys could recommend or things you would have done different when you travelled to Japan.   \n\n\nThank you very much in advance any advice is greatly appreciated :)"

In [63]:
# Remove \n
df['Question'] = df['Question'].str.replace('\n', '')
df['Answer'] = df['Answer'].str.replace('\n', '')
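Replacing `\n` with an empty string can glue two words together when a line break is not surrounded by spaces. A possible alternative, sketched here on plain strings, collapses every run of whitespace into a single space. The helper name `normalize_whitespace` and the sample text are hypothetical and not part of the notebook.

```python
import re

def normalize_whitespace(text):
    """Collapse runs of spaces, tabs and newlines into one space."""
    return re.sub(r"\s+", " ", text).strip()

# Made-up sample in the style of the dataset
sample = "Hello m8s,  \n\n\nI am traveling to Japan.\nAny tips?"
print(normalize_whitespace(sample))
```

On the DataFrame, the same idea could be written as `df['Question'].str.replace(r'\s+', ' ', regex=True).str.strip()`.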

In [64]:
# Check
df["Question"].values[56]

"Hello m8s,  I am traveling to Japan in the beginning of May with 5 other friends. We're visiting Tokyo, Osaka and Kyoto for two weeks. What are some travel hacks that you guys could recommend or things you would have done different when you travelled to Japan.   Thank you very much in advance any advice is greatly appreciated :)"

We will take a closer look at the model and count the parameters in the linear layers of the last decoder block, which are the layers we will target for fine-tuning.

In [65]:
import re
# Get the index of the last transformer block of our LLM
def get_num_layers(model):
    # Collect every integer that appears in a parameter name
    numbers = set()
    for name, _ in model.named_parameters():
        # Names look like: decoder.block.7.layer.0.SelfAttention.q.weight
        # Here 7 is the block index and 0 is the sub-layer index inside the block
        # We use a regular expression to extract all such numbers
        for number in re.findall(r'\d+', name):
            numbers.add(int(number))
    # For this model the largest number is the index of the last block
    # (blocks are 0-indexed, so each stack has this value + 1 blocks)
    return max(numbers)

# This is to get the number of parameters of our LLM
def get_num_params(model):
    num_params = 0
    for _, param in model.named_parameters():
        num_params += param.numel()
    return num_params

def get_last_layer_linears(model):
    names = []
    
    num_layers = get_num_layers(model)
    for name, module in model.named_modules():
        if str(num_layers) in name and not "encoder" in name:
            if isinstance(module, torch.nn.Linear):
                names.append(name)
    return names

def get_n_last_layer_params(model):
    last_layer_params = 0

    num_layers = get_num_layers(model)
    for name, module in model.named_modules():
        if str(num_layers) in name and not "encoder" in name:
            if isinstance(module, torch.nn.Linear):
                last_layer_params += sum(p.numel() for p in module.parameters())

    return last_layer_params
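The helpers above collect every integer in a name and pick the layer with a substring test. That works for this model, but it would also match unrelated digits such as the `1` in `wi_1`. A stricter variant, sketched on plain name strings, reads the block index directly. It assumes the names follow the `decoder.block.<i>.` pattern used by T5; `last_block_index` and `in_block` are hypothetical helper names.

```python
import re

# Matches "decoder.block.<i>." at the start of a name or after a dot
BLOCK_RE = re.compile(r"(?:^|\.)decoder\.block\.(\d+)\.")

def block_index(name):
    """Return the decoder block index in a module name, or None."""
    match = BLOCK_RE.search(name)
    return int(match.group(1)) if match else None

def last_block_index(names):
    """Return the largest decoder block index found in the names."""
    return max(i for i in map(block_index, names) if i is not None)

def in_block(name, index):
    """True only if the name belongs to the given decoder block."""
    return block_index(name) == index

names = [
    "encoder.block.7.layer.0.SelfAttention.q",
    "decoder.block.0.layer.2.DenseReluDense.wi_1",
    "decoder.block.7.layer.2.DenseReluDense.wi_0",
]
print(last_block_index(names))
print([n for n in names if in_block(n, last_block_index(names))])
```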

In [66]:
# Index of the last block (blocks are 0-indexed)
print(get_num_layers(t5_model))
# Total number of parameters
print(get_num_params(t5_model))
# Names of the linear modules in the last decoder block
print(get_last_layer_linears(t5_model))
# Total number of parameters in those linear modules
print(get_n_last_layer_params(t5_model))

7
59135360
['decoder.block.7.layer.0.SelfAttention.q', 'decoder.block.7.layer.0.SelfAttention.k', 'decoder.block.7.layer.0.SelfAttention.v', 'decoder.block.7.layer.0.SelfAttention.o', 'decoder.block.7.layer.1.EncDecAttention.q', 'decoder.block.7.layer.1.EncDecAttention.k', 'decoder.block.7.layer.1.EncDecAttention.v', 'decoder.block.7.layer.1.EncDecAttention.o', 'decoder.block.7.layer.2.DenseReluDense.wi_0', 'decoder.block.7.layer.2.DenseReluDense.wi_1', 'decoder.block.7.layer.2.DenseReluDense.wo']
1835008
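The last number can be checked by hand. The sketch below is one accounting that reproduces it, under assumptions the notebook does not state: the usual `google/flan-t5-small` sizes (`d_model=512`, 6 attention heads of size 64, `d_ff=1024`), 4-bit weights packed two per stored element, and the `wo` projection left unquantized. If those assumptions hold, the 11 modules hold about 3.1M weights before packing.

```python
D_MODEL, INNER, D_FF = 512, 6 * 64, 1024  # assumed flan-t5-small sizes

# (in_features, out_features, stored_in_4bit) for each linear module
attention = [(D_MODEL, INNER, True)] * 3 + [(INNER, D_MODEL, True)]  # q, k, v, o
feed_forward = [
    (D_MODEL, D_FF, True),   # wi_0
    (D_MODEL, D_FF, True),   # wi_1
    (D_FF, D_MODEL, False),  # wo, assumed to stay unquantized
]
# Self-attention + cross-attention + feed-forward = 11 modules
modules = attention * 2 + feed_forward

true_weights = sum(i * o for i, o, _ in modules)
# Two 4-bit values share one stored element, so numel() sees half as many
stored_elements = sum(i * o // 2 if packed else i * o for i, o, packed in modules)

print(true_weights)     # 3145728
print(stored_elements)  # 1835008
```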


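So the linear layers of the last decoder block hold about 1.8M stored parameters (4-bit weights are packed, so `numel()` can undercount them). LoRA does not update these weights directly; it trains small low-rank adapters attached to them. Next, we configure LoRA.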

In [67]:
from peft import TaskType

# device = torch.device('cuda:1')
# Define lora config
lora_config = LoraConfig(
    r=8, # Rank
    lora_alpha=32,
    target_modules=get_last_layer_linears(t5_model),
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5
)
t5_model = get_peft_model(t5_model, lora_config)
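LoRA freezes each targeted weight matrix and trains two small matrices of shapes `(r, in_features)` and `(out_features, r)`, so each module adds `r * (in_features + out_features)` trainable parameters. A rough count for the 11 target modules follows. It assumes the usual flan-t5-small sizes (`d_model=512`, attention inner size 384, `d_ff=1024`), which the notebook does not print.

```python
R = 8  # LoRA rank from lora_config
D_MODEL, INNER, D_FF = 512, 384, 1024  # assumed flan-t5-small sizes

# (in_features, out_features) of each targeted linear module
shapes = (
    [(D_MODEL, INNER)] * 3 + [(INNER, D_MODEL)]    # self-attention q, k, v, o
    + [(D_MODEL, INNER)] * 3 + [(INNER, D_MODEL)]  # cross-attention q, k, v, o
    + [(D_MODEL, D_FF)] * 2 + [(D_FF, D_MODEL)]    # wi_0, wi_1, wo
)

lora_params = sum(R * (fan_in + fan_out) for fan_in, fan_out in shapes)
print(lora_params)  # 94208
```

After `get_peft_model`, calling `t5_model.print_trainable_parameters()` reports the actual figure for the wrapped model.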

Next, define the fine-tuning data from the question-answer pairs.

In [68]:
# Define data
data = Dataset.from_pandas(df)

In [69]:
def generate_prompt(data_point):
    # Build the prompt on one line so no source indentation leaks into the text
    return f'{data_point["Question"]}. Answer: {data_point["Answer"]}'.strip()


def generate_and_tokenize_prompt(data_point):
    full_prompt = generate_prompt(data_point)
    tokenized_full_prompt = tokenizer(full_prompt, padding=True, truncation=True)
    return tokenized_full_prompt

data = data.shuffle().map(generate_and_tokenize_prompt)

  0%|          | 0/101 [00:00<?, ?ex/s]
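The prompt above puts the question and the answer in one string, which is the usual layout for decoder-only models. For an encoder-decoder model such as T5, a common alternative is to feed the question to the encoder and use the answer as the target. A minimal sketch of that split on plain dictionaries follows; the helper `split_example` and the sample row are hypothetical, and the `"chatbot: "` prefix is the one used by `chat_with_bot`.

```python
def split_example(data_point, prefix="chatbot: "):
    """Return separate encoder input and decoder target strings."""
    return {
        "input_text": prefix + data_point["Question"].strip(),
        "target_text": data_point["Answer"].strip(),
    }

# Made-up row for illustration only, not taken from the dataset
row = {"Question": " Any travel hacks for Japan? ", "Answer": " Pack light. "}
example = split_example(row)
print(example["input_text"])
print(example["target_text"])
```

The two strings could then be tokenized together with `tokenizer(input_text, text_target=target_text, truncation=True)`, which stores the target token ids under `labels`. This assumes a `transformers` release that supports the `text_target` argument.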

Define the training arguments and the trainer, then train.

In [70]:
output_dir = 'finetune-t5'

training_args = transformers.TrainingArguments(
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=1e-3,
    fp16=True,
    output_dir=output_dir,
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.01,
    report_to="none"
)

trainer = transformers.Trainer(
    model=t5_model,
    train_dataset=data,
    args=training_args,
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)
)
t5_model.config.use_cache = False
trainer.train()

Step,Training Loss


TrainOutput(global_step=25, training_loss=0.0, metrics={'train_runtime': 20.0806, 'train_samples_per_second': 5.03, 'train_steps_per_second': 1.245, 'total_flos': 8859182008320.0, 'train_loss': 0.0, 'epoch': 0.99})
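The reported `global_step=25` follows from the batch settings: with 101 examples, a per-device batch size of 1 and 4 accumulation steps, there is one optimizer update per 4 batches. A small check, assuming a single device and that the leftover batch at the end of the epoch does not trigger an extra update:

```python
num_examples = 101
per_device_batch_size = 1
gradient_accumulation_steps = 4

# Ceiling division: number of batches the dataloader yields per epoch
batches_per_epoch = -(-num_examples // per_device_batch_size)
optimizer_steps = batches_per_epoch // gradient_accumulation_steps
examples_used = optimizer_steps * gradient_accumulation_steps * per_device_batch_size

print(optimizer_steps)                         # 25
print(round(examples_used / num_examples, 2))  # 0.99
```

A training loss of exactly 0.0, as reported above, is unusual for a real fit and is worth checking before trusting the fine-tuned model. T5 checkpoints are known to be numerically fragile under `fp16`, so trying `bf16=True` on hardware that supports it is one thing to test.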

In [None]:
print("T5 Chatbot: Hi there! How can I help you?")
user_input = input("You: ")
while user_input.lower() not in ['exit', 'quit']:
    response = chat_with_bot(t5_model, user_input)
    print("Chatbot:", response)
    user_input = input("You: ")

T5 Chatbot: Hi there! How can I help you?


You:  I am travelling to Japan. Are there some travel hacks you recommend?


Chatbot: if you are a traveler, you can use a travel hack
