<a href="https://colab.research.google.com/github/tanvigadgil/happify-chatbot/blob/main/happify_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install and import

In [1]:
!pip install trl
!pip install transformers
!pip install accelerate -U
!pip install peft
!pip install datasets
!pip install bitsandbytes
!pip install einops
!pip install wandb
!pip install --upgrade nvidia-pyindex



In [2]:
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments, GenerationConfig
from peft import LoraConfig, get_peft_model, PeftConfig, PeftModel, prepare_model_for_kbit_training
from trl import SFTTrainer
import wandb
import warnings
warnings.filterwarnings("ignore")

In [18]:
wandb.login()

[34m[1mwandb[0m: Currently logged in as: [33mtanvi-gadgil2000[0m ([33mhappify-chatbot[0m). Use [1m`wandb login --relogin`[0m to force relogin


True

## Import Dataset

In [3]:
from huggingface_hub import notebook_login
notebook_login()

> Dataset: https://huggingface.co/datasets/heliosbrahma/mental_health_chatbot_dataset

In [4]:
data = load_dataset("heliosbrahma/mental_health_chatbot_dataset")
data

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 172
    })
})

In [5]:
data["train"][0]["text"]

'<HUMAN>: What is a panic attack?\n<ASSISTANT>: Panic attacks come on suddenly and involve intense and often overwhelming fear. They’re accompanied by very challenging physical symptoms, like a racing heartbeat, shortness of breath, or nausea. Unexpected panic attacks occur without an obvious cause. Expected panic attacks are cued by external stressors, like phobias. Panic attacks can happen to anyone, but having more than one may be a sign of panic disorder, a mental health condition characterized by sudden and repeated panic attacks.'
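Each row is a single string with `<HUMAN>:` and `<ASSISTANT>:` role tags. The sketch below shows one way to split such a string into (role, utterance) pairs for inspection. The helper name `split_turns` and the regular expression are illustrative assumptions, not part of the dataset or of any library, and the sample text is a shortened copy of the row above.

```python
import re

# Role tags as they appear in the sample row; the pattern itself is an assumption.
TURN_PATTERN = re.compile(
    r"<(HUMAN|ASSISTANT)>:\s*(.*?)(?=\n?<(?:HUMAN|ASSISTANT)>:|\Z)",
    re.DOTALL,
)

def split_turns(text):
    """Return a list of (role, utterance) tuples from one dataset row."""
    return [(role, utterance.strip()) for role, utterance in TURN_PATTERN.findall(text)]

# Shortened copy of the first training row.
sample = (
    "<HUMAN>: What is a panic attack?\n"
    "<ASSISTANT>: Panic attacks come on suddenly and involve intense "
    "and often overwhelming fear."
)

turns = split_turns(sample)
for role, utterance in turns:
    print(f"{role}: {utterance}")
```

The training code below does not need this step, since `SFTTrainer` reads the raw `text` field directly.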

## Model Training

In [6]:
!nvidia-smi

Tue Jun 25 02:16:43 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   61C    P8              11W /  70W |      3MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

The base model is a sharded Falcon-7B checkpoint stored in BF16. It is loaded with 4-bit NormalFloat (NF4) quantization and double quantization, with bfloat16 as the compute dtype, and fine-tuned with QLoRA through PEFT.
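To see why 4-bit loading matters on a 15 GiB T4, here is a back-of-the-envelope estimate of weight memory. It is only a sketch: the parameter count of 7 billion is a round-number assumption, and the estimate ignores quantization constants, activations, optimizer state, and any layers kept in higher precision.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30

NUM_PARAMS = 7_000_000_000  # assumption: round figure for a 7B model

bf16_gib = weight_memory_gib(NUM_PARAMS, 16)  # 2 bytes per weight
nf4_gib = weight_memory_gib(NUM_PARAMS, 4)    # half a byte per weight

print(f"bf16 weights: {bf16_gib:.2f} GiB")
print(f"4-bit weights: {nf4_gib:.2f} GiB")
```

Under these assumptions the bf16 weights alone come close to the full memory of the GPU shown by `nvidia-smi`, while the 4-bit weights leave room for adapters and activations.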

In [7]:
model_name = "ybelkada/falcon-7b-sharded-bf16"

In [8]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit= True,
    bnb_4bit_quant_type= "nf4",
    bnb_4bit_use_double_quant= True,
    bnb_4bit_compute_dtype= torch.bfloat16
)

In [9]:
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config= bnb_config,
    device_map= "auto",
    trust_remote_code= True
)

In [10]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code= True)

tokenizer.pad_token = tokenizer.eos_token

In [11]:
model = prepare_model_for_kbit_training(model)

lora_alpha = 32 # scaling factor for weight matrices
lora_dropout = 0.05 # dropout probability
lora_rank = 32 # dimension of low-rank matrices

In [12]:
peft_config = LoraConfig(
    lora_alpha= lora_alpha,
    lora_dropout= lora_dropout,
    r= lora_rank,
    bias= "none",
    task_type= "CAUSAL_LM",
    target_modules= [
        "query_key_value",
        "dense",
        "dense_h_to_4h",
        "dense_4h_to_h"
    ]
)

peft_model = get_peft_model(model, peft_config)
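For intuition about how many weights LoRA actually trains: a rank-`r` adapter on a linear layer with `d_in` inputs and `d_out` outputs adds two matrices with `r * (d_in + d_out)` parameters in total, and the update is scaled by `lora_alpha / r`. The sketch below applies that formula to four targeted projections per block. The layer shapes and the layer count are assumptions about the Falcon-7B architecture rather than values read from the loaded model, and the dictionary keys are descriptive labels only. `peft_model.print_trainable_parameters()` reports the real numbers.

```python
def lora_params(d_in, d_out, rank):
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

lora_rank = 32
lora_alpha = 32
scaling = lora_alpha / lora_rank  # factor applied to the low-rank update

# Assumed (d_in, d_out) shapes of the targeted linear layers in one block.
HIDDEN = 4544
layer_shapes = {
    "attention qkv projection": (HIDDEN, 4672),
    "attention output projection": (HIDDEN, HIDDEN),
    "mlp up projection": (HIDDEN, 4 * HIDDEN),
    "mlp down projection": (4 * HIDDEN, HIDDEN),
}
NUM_LAYERS = 32  # assumption

per_layer = sum(lora_params(d_in, d_out, lora_rank) for d_in, d_out in layer_shapes.values())
total = per_layer * NUM_LAYERS

print(f"scaling factor: {scaling}")
print(f"LoRA parameters per block: {per_layer:,}")
print(f"LoRA parameters in total: {total:,}")
```

With equal `lora_alpha` and `lora_rank` the scaling factor is 1, so the adapter output is added to the frozen layer's output unscaled.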

In [32]:
output_dir = "./happify-chatbot"
batch_size = 8
gradient_steps = 8
optim = "paged_adamw_32bit" # activate paging
checkpoints = "steps"
num_of_steps = 10
logging_steps = 10
learning_rate = 2e-4
max_gradient_norm = 0.3 # For gradient clipping
max_steps = 100
warmup_ratio = 0.03 # linear warmup from 0 to learning_rate
scheduler_type = "cosine" # Learning rate scheduler

training_arguments = TrainingArguments(
    output_dir= output_dir,
    per_device_train_batch_size= batch_size,
    gradient_accumulation_steps= gradient_steps,
    optim= optim,
    save_strategy= checkpoints, # save a checkpoint every `save_steps` steps
    save_steps= num_of_steps,
    logging_steps= logging_steps,
    learning_rate= learning_rate,
    bf16= True,
    max_grad_norm= max_gradient_norm,
    max_steps= max_steps,
    warmup_ratio= warmup_ratio,
    group_by_length= True,
    lr_scheduler_type= scheduler_type,
    # push_to_hub= True
)
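The hyperparameters above imply an effective batch size and a learning-rate curve that are easy to check by hand. The following sketch works through the arithmetic and mimics a linear-warmup plus cosine-decay schedule. The function `lr_at_step` is a simplified stand-in written for illustration, not the scheduler object that `Trainer` builds, and the warmup length of 3 steps assumes 3% of 100 steps on a single GPU.

```python
import math

def lr_at_step(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

batch_size, gradient_steps, max_steps = 8, 8, 100
num_rows = 172  # size of the train split loaded earlier

effective_batch = batch_size * gradient_steps  # sequences per optimizer step
samples_seen = effective_batch * max_steps     # sequences processed in the whole run
approx_epochs = samples_seen / num_rows        # rough number of passes over the data

print(f"effective batch size: {effective_batch}")
print(f"approximate passes over the dataset: {approx_epochs:.1f}")

warmup_steps = 3  # assumption: warmup_ratio 0.03 applied to 100 steps
for step in (0, 3, 50, 100):
    print(step, lr_at_step(step, 2e-4, warmup_steps, max_steps))
```

On a dataset this small, 100 optimizer steps revisit every example many times, which is worth keeping in mind when reading the training loss.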

In [33]:
trainer = SFTTrainer(
    model = peft_model,
    train_dataset= data['train'],
    peft_config= peft_config,
    dataset_text_field= "text",
    max_seq_length= 1024,
    tokenizer= tokenizer,
    args= training_arguments
)

max_steps is given, it will override any value given in num_train_epochs


In [34]:
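# For stable training: cast normalization layers to bfloat16.
# nn.Module.to() converts the module in place, so no reassignment is needed.
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.bfloat16)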

In [None]:
peft_model.config.use_cache = False
trainer.train()

Step,Training Loss
10,1.6923


## Inference Pipeline

In [None]:
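# Push the trained LoRA adapter to the Hugging Face Hub.
# Trainer.push_to_hub() does not take a `repo_name` argument, so the adapter
# is pushed through the model's own push_to_hub(repo_id) instead.
trainer.model.push_to_hub("happify-chatbot")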

In [None]:
# Load the PEFT model
PEFT_MODEL = "tgadgil/happify-chatbot"

config = PeftConfig.from_pretrained(PEFT_MODEL)
