### Fine-Tuning a Mental Health Support Chatbot using Unsloth

In this notebook, I fine-tuned a language model using the **Unsloth** framework to develop a **mental health-focused chatbot**. The goal was to adapt a base model to generate empathetic and context-aware responses for users seeking emotional or psychological support. The workflow included:

- Installing all required packages like `unsloth`, `datasets`, `trl`, `bitsandbytes`, etc.
- Loading a base language model and tokenizer using Unsloth's optimized tools.
- Creating a small custom dataset with mental health-related dialogues.
- Formatting the dataset into instruction-response pairs for chat-style training.
- Tokenizing and preparing the dataset for fine-tuning.
- Setting up LoRA configuration for parameter-efficient fine-tuning.
- Training the model using `SFTTrainer`.
- Saving the trained chatbot for future inference and deployment.

This workflow shows how an LLM can be adapted for a sensitive task such as mental health support with 4-bit quantization and LoRA, which keep fine-tuning within the memory of a single GPU.


In [None]:
pip install -U unsloth accelerate datasets peft trl bitsandbytes

shell-init: error retrieving current directory: getcwd: cannot access parent directories: No such file or directory
Traceback (most recent call last):
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ec2-user/anaconda3/envs/python3/lib/python3.10/site-packages/pip/__main__.py", line 8, in <module>
    if sys.path[0] in ("", os.getcwd()):
FileNotFoundError: [Errno 2] No such file or directory
Note: you may need to restart the kernel to use updated packages.


In [None]:
pip install unsloth_zoo

Note: you may need to restart the kernel to use updated packages.


In [None]:
pip install bitsandbytes==0.43.1

Collecting bitsandbytes==0.43.1
  Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl.metadata (2.2 kB)
Downloading bitsandbytes-0.43.1-py3-none-manylinux_2_24_x86_64.whl (119.8 MB)
   119.8/119.8 MB 120.1 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
  Attempting uninstall: bitsandbytes
    Found existing installation: bitsandbytes 0.41.2
    Uninstalling bitsandbytes-0.41.2:
      Successfully uninstalled bitsandbytes-0.41.2
Successfully installed bitsandbytes-0.43.1
Note: you may need to restart the kernel to use updated packages.


In [None]:
pip install --upgrade bitsandbytes==0.43.2

Collecting bitsandbytes==0.43.2
  Downloading bitsandbytes-0.43.2-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Downloading bitsandbytes-0.43.2-py3-none-manylinux_2_24_x86_64.whl (137.5 MB)
   137.5/137.5 MB 33.3 MB/s eta 0:00:00
Installing collected packages: bitsandbytes
  Attempting uninstall: bitsandbytes
    Found existing installation: bitsandbytes 0.43.1
    Uninstalling bitsandbytes-0.43.1:
      Successfully uninstalled bitsandbytes-0.43.1
Successfully installed bitsandbytes-0.43.2
Note: you may need to restart the kernel to use updated packages.


In [None]:
import unsloth
from unsloth import FastLanguageModel

from transformers import TrainingArguments
from datasets import load_dataset
import torch

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [None]:
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = 2048,
    dtype = torch.float16,
    load_in_4bit = True,
)

==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.568 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.0+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.0.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.27.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


tokenizer_config.json:   0%|          | 0.00/1.02k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/438 [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

In [None]:
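import os  # Needed for os.path.exists; not imported in the earlier cells

file_path = "/home/ec2-user/SageMaker/mental_health_data.jsonl"
print("File exists?", os.path.exists(file_path))  # True only if the JSONL file is present at this path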

File exists? False
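
The check reports that the JSONL file is missing at that path. If the file were available, one way to read it with only the standard library might look like the sketch below. `load_jsonl` is a hypothetical helper, and it assumes each non-empty line holds one JSON object; the real file's layout is not shown in this notebook.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Return one dict per non-empty line, or an empty list if the file is missing."""
    p = Path(path)
    if not p.exists():
        return []
    records = []
    with p.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records


# Path taken from the cell above; yields an empty list when the file is absent.
records = load_jsonl("/home/ec2-user/SageMaker/mental_health_data.jsonl")
print(f"Loaded {len(records)} records")
```

Returning an empty list instead of raising keeps the notebook runnable and makes it easy to fall back to an inline dataset.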


In [None]:
from datasets import Dataset

data = [
    {
        "instruction": "How can I reduce daily stress?",
        "input": "I get overwhelmed with work and family tasks.",
        "output": "Try using time blocking and meditation for 10 minutes daily."
    },
    {
        "instruction": "Coping with anxiety before public speaking?",
        "input": "My heart races before presenting in class.",
        "output": "Practice deep breathing and positive visualization beforehand."
    },
]

dataset = Dataset.from_list(data)

In [None]:
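# Format for SFT (Supervised Fine-Tuning)
# Named format_example to avoid shadowing Python's built-in format().
def format_example(example):
    # Merge the instruction and its context into one prompt string;
    # this overwrites the original "input" column.
    return {
        "input": f"{example['instruction']}\n{example['input']}",
        "output": example["output"],
    }

dataset = dataset.map(format_example)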

Map:   0%|          | 0/2 [00:00<?, ? examples/s]
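
The formatting step and the `### Instruction:` / `### Response:` template used later at tokenization time can be folded into one plain-Python function, which makes the final training text easy to inspect. This is only a sketch: `build_prompt` is a hypothetical helper, and the default `eos_token="</s>"` is an assumption about the tokenizer's end-of-sequence string.

```python
def build_prompt(instruction, context="", response=None, eos_token="</s>"):
    """Build the prompt text; append the response and EOS only for training examples."""
    # Join instruction and optional context the same way the formatting cell does.
    question = f"{instruction}\n{context}" if context else instruction
    prompt = f"### Instruction:\n{question}\n\n### Response:\n"
    if response is not None:
        # Training text: the answer followed by an end-of-sequence marker (assumed string).
        prompt += response + eos_token
    return prompt


example = build_prompt(
    "How can I reduce daily stress?",
    "I get overwhelmed with work and family tasks.",
    "Try using time blocking and meditation for 10 minutes daily.",
)
print(example)
```

Calling `build_prompt(user_text)` without a response gives the matching inference prompt, so training and generation share one template.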

In [None]:
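from unsloth import FastLanguageModel  # Import unsloth before transformers/trl so its patches apply
import torch

# Reloads the same 4-bit base model as the earlier cell, so this cell can run on its own.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-bnb-4bit",
    max_seq_length = 2048,
    dtype = torch.float16,
    load_in_4bit = True,
)

# Attach LoRA adapters. No target_modules are passed, so Unsloth's defaults apply.
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,               # LoRA rank
    lora_alpha = 16,      # LoRA scaling factor
    lora_dropout = 0.05,
    bias = "none",        # Leave bias terms frozen
)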

==((====))==  Unsloth 2025.3.19: Fast Mistral patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.568 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.4.0+cu121. CUDA: 7.5. CUDA Toolkit: 12.1. Triton: 3.0.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.27.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Unsloth 2025.3.19 patched 32 layers with 0 QKV layers, 0 O layers and 0 MLP layers.
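
The size of the LoRA adapter can be estimated by hand: a rank-`r` adapter on a linear layer adds `r * (in_features + out_features)` weights. The sketch below applies that formula under two assumptions that do not come from this notebook: the Mistral-7B layer shapes from the published model config, and adapters on all seven attention and MLP projections. Under those assumptions the total matches the 41,943,040 trainable parameters that the trainer reports further down.

```python
# Assumed Mistral-7B shapes (from the published model config, not from this notebook).
HIDDEN = 4096
INTERMEDIATE = 14336
KV_DIM = 1024       # 8 key/value heads x 128 dims (grouped-query attention)
NUM_LAYERS = 32     # matches the "patched 32 layers" log line
RANK = 16           # r from the get_peft_model call

# (in_features, out_features) of each projection assumed to receive an adapter.
projections = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # LoRA adds two matrices: A (rank x in_features) and B (out_features x rank).
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in projections.values())
total = per_layer * NUM_LAYERS
print(f"{total:,} trainable LoRA parameters")
```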


In [None]:
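def tokenize(example):
    # Build the training text from the prompt template and append the EOS token
    # so the model learns where a response ends.
    text = (
        f"### Instruction:\n{example['input']}\n\n"
        f"### Response:\n{example['output']}{tokenizer.eos_token}"
    )
    return tokenizer(
        text,
        truncation=True,
        padding="max_length",  # Pads every example to 2048 tokens
        max_length=2048,
    )

dataset = dataset.map(tokenize)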

Map:   0%|          | 0/2 [00:00<?, ? examples/s]
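
Padding every example to 2048 tokens is simple, but for short dialogues most of each sequence is padding. A common alternative is to pad each batch only to its longest member. The sketch below illustrates that idea on plain lists; `pad_batch` is a hypothetical helper, the token IDs are made up, and this is not the tokenizer's own implementation.

```python
def pad_batch(sequences, pad_id):
    """Right-pad token ID lists to the longest sequence in the batch."""
    longest = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (longest - len(seq)) for seq in sequences]
    # 1 marks real tokens, 0 marks padding the model should ignore.
    attention_mask = [[1] * len(seq) + [0] * (longest - len(seq)) for seq in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs, for illustration only.
batch = pad_batch([[1, 733, 16289], [1, 733]], pad_id=0)
print(batch["input_ids"])
print(batch["attention_mask"])
```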

In [None]:
from transformers import TrainingArguments

args = TrainingArguments(
    per_device_train_batch_size = 2,
    gradient_accumulation_steps = 4,
    warmup_steps = 10,
    max_steps = 100,  # You can increase this when using a full dataset
    learning_rate = 2e-4,
    fp16 = True,
    logging_steps = 10,
    output_dir = "./mental_health_lora",
    optim = "paged_adamw_8bit",
    save_total_limit = 2,
)
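
With `warmup_steps = 10`, `max_steps = 100`, and `learning_rate = 2e-4`, the learning rate is not constant over the run. The cell does not set a scheduler, so the sketch below assumes the `TrainingArguments` default of a linear schedule: a linear ramp from 0 to the peak over the warmup steps, then a linear decay back to 0. `lr_at` is a hypothetical helper written for illustration, not a library function.

```python
def lr_at(step, base_lr=2e-4, warmup_steps=10, max_steps=100):
    """Learning rate at a given optimizer step under linear warmup and linear decay."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr to 0 at max_steps.
    remaining = max(0, max_steps - step)
    return base_lr * remaining / max(1, max_steps - warmup_steps)


for step in (0, 5, 10, 55, 100):
    print(step, lr_at(step))
```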

In [None]:
from trl import SFTTrainer

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = args,
    dataset_text_field = None,   # No text field: the dataset passed in is already tokenized
    max_seq_length = 2048,
    packing = False,
)

In [None]:
trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2 | Num Epochs = 100 | Total steps = 100
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/7,000,000,000 (0.60% trained)


Step,Training Loss
10,1.8779
20,0.1092
30,0.036
40,0.0202
50,0.02
60,0.0201
70,0.0178
80,0.017
90,0.0228
100,0.0181


Unsloth: Will smartly offload gradients to save VRAM!


TrainOutput(global_step=100, training_loss=0.21589711412787438, metrics={'train_runtime': 879.9538, 'train_samples_per_second': 0.909, 'train_steps_per_second': 0.114, 'total_flos': 1.75782374670336e+16, 'train_loss': 0.21589711412787438})
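
The figures in the training log can be cross-checked with a little arithmetic. The sketch below uses only values printed above and assumes that each logged loss is an average over its 10-step logging window, which would explain why their mean lands close to the reported `training_loss`.

```python
# Values copied from the TrainingArguments cell and the training output above.
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 100
runtime_s = 879.9538
logged_losses = [1.8779, 0.1092, 0.036, 0.0202, 0.02,
                 0.0201, 0.0178, 0.017, 0.0228, 0.0181]

# Nominal examples per optimizer step: batch size x accumulation steps x GPUs.
effective_batch = per_device_batch * grad_accum * num_gpus
samples_per_second = effective_batch * max_steps / runtime_s
steps_per_second = max_steps / runtime_s
mean_logged_loss = sum(logged_losses) / len(logged_losses)

print(effective_batch, round(samples_per_second, 3), round(steps_per_second, 3))
print(round(mean_logged_loss, 4))
```

Nearly all of the average loss comes from the first logging window; from step 30 onward the loss sits below 0.04, which is what memorizing a two-example dataset over 100 epochs looks like.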

In [None]:
trainer.model.save_pretrained("./mental_health_lora")
tokenizer.save_pretrained("./mental_health_lora")

('./mental_health_lora/tokenizer_config.json',
 './mental_health_lora/special_tokens_map.json',
 './mental_health_lora/tokenizer.model',
 './mental_health_lora/added_tokens.json',
 './mental_health_lora/tokenizer.json')
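
The output above lists only the tokenizer files. Before reusing the directory for inference, it can help to confirm that the adapter files were written too. The sketch below is a hypothetical check: the file names `adapter_config.json` and `adapter_model.safetensors` (or `adapter_model.bin`) are assumptions about what PEFT writes and may differ between versions.

```python
from pathlib import Path


def missing_adapter_files(adapter_dir):
    """Return the expected adapter files that are absent from adapter_dir."""
    d = Path(adapter_dir)
    missing = []
    # Assumed name of the LoRA configuration file.
    if not (d / "adapter_config.json").is_file():
        missing.append("adapter_config.json")
    # Assumed names of the weight file; either format is accepted.
    weight_names = ("adapter_model.safetensors", "adapter_model.bin")
    if not any((d / name).is_file() for name in weight_names):
        missing.append("adapter_model.safetensors or adapter_model.bin")
    return missing


print(missing_adapter_files("./mental_health_lora"))
```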

In [None]:
!pip install gradio

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Collecting gradio
  Downloading gradio-5.24.0-py3-none-any.whl.metadata (16 kB)
Collecting aiofiles<25.0,>=22.0 (from gradio)
  Downloading aiofiles-24.1.0-py3-none-any.whl.metadata (10 kB)
Collecting ffmpy (from gradio)
  Downloading ffmpy-0.5.0-py3-none-any.whl.metadata (3.0 kB)
Collecting gradio-client==1.8.0 (from gradio)
  Downloading gradio_client-1.8.0-py3-none-any.whl.metadata (7.1 kB)
Collecting groovy~=0.1 (from gradio)
  Downloading groovy-0.1.2-py3-none-any.whl.metadata (6.1 kB)
Collecting orjson~=3.0 (from gradio)
  Downloading orjson-3.10.16-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (41 kB)
Collecting pydub (from gradio)
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting python-multipart>=0.0.18 (from gradio)
  Downloading python_multipart-0.0.20-py3-none-any.whl.metadata (1.8 kB)
Collecting ruff>=0.9.3 (from gradio)
  Downloading ruff-0.11.4-py3-none-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (25 kB)
Collec

In [None]:
def chat_with_bot_terminal():
    print("🧠 Mental Health Chatbot (type 'exit' to quit)\n")

    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit"]:
            print("👋 Exiting chat.")
            break

        prompt = f"""### Instruction:
{user_input}

### Response:"""

        inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                max_new_tokens=256,
                do_sample=True,
                top_p=0.95,
                temperature=0.7,
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id,  # Stop generating at the EOS token
            )
        # Decode the full sequence, then keep only the text after the response marker.
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        response_text = response.split("### Response:")[-1].strip()
        print(f"Bot: {response_text}\n")
        print("EOS token ID:", tokenizer.eos_token_id)  # Debug output

# Run it
chat_with_bot_terminal()
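
If the model keeps generating past its answer and starts a new `### Instruction:` turn, splitting on the last `### Response:` marker returns the invented turn rather than the real reply. The sketch below is one possible post-processing step, using a hypothetical `extract_response` helper that keeps the text after the first response marker and cuts it at the next instruction marker. It assumes the user's message does not itself contain either marker.

```python
def extract_response(generated_text,
                     response_marker="### Response:",
                     next_turn_marker="### Instruction:"):
    """Return the first reply in the decoded text, without any follow-on turns."""
    if response_marker not in generated_text:
        # No marker found: return the text unchanged apart from whitespace.
        return generated_text.strip()
    # Keep everything after the first response marker.
    _, _, answer = generated_text.partition(response_marker)
    # Drop anything from the next instruction marker onward.
    answer, _, _ = answer.partition(next_turn_marker)
    return answer.strip()


decoded = (
    "### Instruction:\nHi\n\n### Response:\nHello there.\n\n"
    "### Instruction:\nMore\n\n### Response:\nExtra"
)
print(extract_response(decoded))
```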