# Fine-Tuning a Customer Support Chatbot

**Instructions:**
1. Runtime > Change runtime type > T4 GPU
2. Run all cells
3. Wait ~50 min (training alone took ~46 min on a T4 in the run below)

- **Model:** TinyLlama-1.1B (`TinyLlama/TinyLlama-1.1B-Chat-v1.0`)
- **Dataset:** Bitext Customer Support (26,872 samples)

In [1]:
# Cell 1: Install dependencies
!pip install -q torch transformers datasets accelerate peft trl


In [2]:
# Cell 2: Check GPU
import torch
print(f'PyTorch: {torch.__version__}')
print(f'CUDA available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    print(f'Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')

PyTorch: 2.9.0+cu126
CUDA available: True
GPU: Tesla T4
Memory: 15.8 GB
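Cell 5 loads the weights in float16 and Cell 7 sets `fp16=True, bf16=False`. The reason on this GPU: bfloat16 needs native hardware support, which we assume here starts at CUDA compute capability 8.0 (Ampere), while the Tesla T4 reported above is a Turing card with capability 7.5. A minimal sketch; `pick_mixed_precision` is a hypothetical helper, and in the notebook you would feed it the tuple returned by `torch.cuda.get_device_capability(0)`.

```python
# Hypothetical helper: choose the mixed-precision mode from the GPU's
# CUDA compute capability (major, minor).
# Assumption: native bf16 support starts at compute capability 8.0.


def pick_mixed_precision(major: int, minor: int) -> str:
    """Return 'bf16' on GPUs with native bfloat16 support, else 'fp16'."""
    if (major, minor) >= (8, 0):
        return "bf16"
    return "fp16"


if __name__ == "__main__":
    print(pick_mixed_precision(7, 5))  # Tesla T4 (Turing)
    print(pick_mixed_precision(8, 0))  # e.g. an A100 (Ampere)
```

On a T4 this selects fp16, matching the settings used in Cells 5 and 7.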


In [3]:
# Cell 3: Imports
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from trl import SFTTrainer, SFTConfig

BASE_MODEL = 'TinyLlama/TinyLlama-1.1B-Chat-v1.0'
OUTPUT_DIR = './customer-support-model'

print('Imports done!')

Imports done!


In [4]:
# Cell 4: Load and format dataset
print('Loading dataset...')
dataset = load_dataset('bitext/Bitext-customer-support-llm-chatbot-training-dataset')
print(f'Total samples: {len(dataset["train"])}')

def format_chat(example):
    text = f"""<|system|>
You are a helpful customer support assistant.</s>
<|user|>
{example['instruction']}</s>
<|assistant|>
{example['response']}</s>"""
    return {'text': text}

dataset = dataset['train'].map(format_chat)
dataset = dataset.train_test_split(test_size=0.1, seed=42)

print(f'Train: {len(dataset["train"])}')
print(f'Val: {len(dataset["test"])}')
print('\nExample:')
print(dataset['train'][0]['text'][:300])

Loading dataset...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Total samples: 26872


Train: 24184
Val: 2688

Example:
<|system|>
You are a helpful customer support assistant.</s>
<|user|>
could uhelp me close a freemium account</s>
<|assistant|>
Thank you for your message to us for assistance with closing your {{Account Category}} account. I understand that you're looking to terminate your account and I'm here to g
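Cell 4 (training) and Cell 9 (inference) each hard-code the same TinyLlama-style chat template, and any mismatch between them (a missing newline, a different system prompt) would likely degrade generations. A sketch of sharing one helper instead; `build_prompt` and `SYSTEM_PROMPT` are illustrative names, not part of any library. It reproduces exactly the strings written out in both cells.

```python
# Illustrative helper: one source of truth for the chat template used in
# training (Cell 4) and inference (Cell 9).
SYSTEM_PROMPT = "You are a helpful customer support assistant."


def build_prompt(instruction, response=None):
    """Return the chat-formatted text.

    With a response (training), the text ends with the answer and </s>.
    Without one (inference), it stops right after the <|assistant|> header
    so the model writes the reply.
    """
    prompt = (
        f"<|system|>\n{SYSTEM_PROMPT}</s>\n"
        f"<|user|>\n{instruction}</s>\n"
        f"<|assistant|>\n"
    )
    if response is None:
        return prompt
    return f"{prompt}{response}</s>"


def format_chat(example):
    # Drop-in replacement for the Cell 4 mapper passed to dataset.map().
    return {"text": build_prompt(example["instruction"], example["response"])}


if __name__ == "__main__":
    print(build_prompt("Where is my package?"))
```

Another option is `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`, which renders the template shipped with the tokenizer. Its whitespace may not match the hand-written string byte for byte, so whichever method you pick, use it for both training and inference.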


In [5]:
# Cell 5: Load model in float16, without quantization (avoids bfloat16 issues on the T4)
print(f'Loading model: {BASE_MODEL}')
print('This takes 1-2 minutes...')

model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL,
    dtype=torch.float16,  # `torch_dtype` is deprecated in recent transformers releases
    device_map='auto',    # TinyLlama uses the built-in Llama architecture, no remote code needed
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.pad_token = tokenizer.eos_token  # reuse </s> as the padding token
tokenizer.padding_side = 'right'           # right-padding for training

print('Model loaded!')
print(f'Parameters: {model.num_parameters():,}')

Loading model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
This takes 1-2 minutes...


`torch_dtype` is deprecated! Use `dtype` instead!


Model loaded!
Parameters: 1,100,048,384
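The `2.20G` download size above is easy to sanity-check: weight memory is parameter count times bytes per element. A rough sketch using the parameter count printed by this cell; it covers the frozen base weights only, and activations, LoRA gradients, optimizer state and CUDA overhead come on top and are not estimated here.

```python
# Worked arithmetic: memory needed just to store the base weights.
PARAMS = 1_100_048_384  # copied from model.num_parameters() above
BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_gb(params, dtype):
    """Decimal gigabytes (1e9 bytes, as in Cell 2) to hold `params` weights."""
    return params * BYTES_PER_ELEMENT[dtype] / 1e9


for dtype in BYTES_PER_ELEMENT:
    print(f"{dtype:>8}: {weight_gb(PARAMS, dtype):.2f} GB")
```

In float16 the weights take about 2.2 GB of the T4's 15.8 GB, which leaves ample headroom for LoRA training without quantization.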


In [6]:
# Cell 6: Apply LoRA
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias='none',
    task_type='CAUSAL_LM',
    target_modules=['q_proj', 'k_proj', 'v_proj', 'o_proj'],
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

trainable params: 4,505,600 || all params: 1,104,553,984 || trainable%: 0.4079
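The `trainable params: 4,505,600` line can be reproduced by hand. LoRA places two low-rank matrices, A (r × d_in) and B (d_out × r), next to each targeted projection, so each one adds r·(d_in + d_out) parameters. The sketch assumes TinyLlama's config values (hidden size 2048, 22 layers, 32 attention heads, 4 key/value heads); check `model.config` to confirm them. Because of grouped-query attention, `k_proj` and `v_proj` are narrower than `q_proj` and `o_proj`.

```python
# Worked arithmetic for Cell 6. Config values below are assumptions
# taken from TinyLlama-1.1B (verify against model.config).
HIDDEN = 2048
LAYERS = 22
HEADS = 32
KV_HEADS = 4
RANK = 16  # lora_config.r

head_dim = HIDDEN // HEADS    # 64
kv_dim = KV_HEADS * head_dim  # 256: output width of k_proj and v_proj


def lora_params(in_features, out_features, r):
    # A is (r x in_features), B is (out_features x r)
    return r * (in_features + out_features)


per_layer = (
    lora_params(HIDDEN, HIDDEN, RANK)    # q_proj
    + lora_params(HIDDEN, kv_dim, RANK)  # k_proj
    + lora_params(HIDDEN, kv_dim, RANK)  # v_proj
    + lora_params(HIDDEN, HIDDEN, RANK)  # o_proj
)
trainable = per_layer * LAYERS
total = 1_100_048_384 + trainable  # base parameter count printed in Cell 5

print(f"trainable params: {trainable:,}")
print(f"all params: {total:,}")
print(f"trainable%: {100 * trainable / total:.4f}")
```

Adding `gate_proj`, `up_proj` and `down_proj` to `target_modules` would raise this count, since the MLP projections are much wider than the attention ones.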


In [8]:
# Cell 7: Training
sft_config = SFTConfig(
    output_dir=OUTPUT_DIR,
    num_train_epochs=1,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    eval_strategy='steps',
    eval_steps=500,
    save_steps=500,
    logging_steps=50,
    learning_rate=2e-4,
    warmup_ratio=0.03,
    lr_scheduler_type='cosine',
    save_total_limit=2,
    fp16=True,
    bf16=False,
    report_to='none',
    gradient_checkpointing=False,
)

trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=dataset['train'],
    eval_dataset=dataset['test'],
    processing_class=tokenizer,
)

print('='*60)
print('STARTING TRAINING')
print('='*60)
print(f'Train samples: {len(dataset["train"])}')
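print(f'Epochs: {sft_config.num_train_epochs}')
effective_bs = sft_config.per_device_train_batch_size * sft_config.gradient_accumulation_steps
print(f'Batch size: {sft_config.per_device_train_batch_size} x {sft_config.gradient_accumulation_steps} = {effective_bs}')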
print('='*60)

trainer.train()

The model is already on multiple devices. Skipping the move to device specified in `args`.


STARTING TRAINING
Train samples: 24184
Epochs: 1
Batch size: 4 x 4 = 16


| Step | Training Loss | Validation Loss |
|-----:|--------------:|----------------:|
| 500  | 0.5502 | 0.559478 |
| 1000 | 0.5266 | 0.535463 |
| 1500 | 0.5276 | 0.52687 |


TrainOutput(global_step=1512, training_loss=0.5385636772428241, metrics={'train_runtime': 2773.2714, 'train_samples_per_second': 8.72, 'train_steps_per_second': 0.545, 'total_flos': 4.119643134266573e+16, 'train_loss': 0.5385636772428241})
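The step count in `TrainOutput` follows from the config: 24,184 samples at 4 per micro-batch gives 6,046 micro-batches, and accumulating 4 of them per update gives 1,512 optimizer steps at an effective batch size of 16. The sketch below redoes that arithmetic and evaluates a linear-warmup plus cosine-decay schedule. `cosine_lr` is our own re-implementation for illustration, modelled on what `lr_scheduler_type='cosine'` with `warmup_ratio=0.03` is expected to do; it is not the library code.

```python
import math

# Values copied from SFTConfig (Cell 7) and the training output above.
TRAIN_SAMPLES = 24_184
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 4
NUM_GPUS = 1  # single Tesla T4
WARMUP_RATIO = 0.03
PEAK_LR = 2e-4

micro_batches = math.ceil(TRAIN_SAMPLES / (PER_DEVICE_BATCH * NUM_GPUS))  # 6046
# Assumption: a trailing partial accumulation window still triggers an
# optimizer step, which is consistent with the logged global_step=1512.
optimizer_steps = math.ceil(micro_batches / GRAD_ACCUM)  # 1512
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM * NUM_GPUS  # 16
warmup_steps = math.ceil(WARMUP_RATIO * optimizer_steps)  # 46


def cosine_lr(step, total=optimizer_steps, warmup=warmup_steps, peak=PEAK_LR):
    """Linear warmup from 0 to `peak`, then cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(f"effective batch: {effective_batch}, "
      f"optimizer steps: {optimizer_steps}, warmup steps: {warmup_steps}")
for step in (0, warmup_steps, 500, 1000, 1500):
    print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")

# Throughput check: samples divided by the logged train_runtime.
print(f"samples/s: {TRAIN_SAMPLES / 2773.2714:.2f}")
```

By step 1500 the learning rate is close to zero, which fits the flat training loss between the 1000 and 1500 evaluations in the table above.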

In [9]:
# Cell 8: Save model
trainer.save_model(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f'Model saved to {OUTPUT_DIR}')

Model saved to ./customer-support-model


In [10]:
# Cell 9: Test the model
def generate_response(question):
    prompt = f"""<|system|>
You are a helpful customer support assistant.</s>
<|user|>
{question}</s>
<|assistant|>
"""
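    inputs = tokenizer(prompt, return_tensors='pt').to(model.device)

    model.eval()  # inference mode: turns off LoRA dropout
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=150,  # longer answers are cut off at this limit
            temperature=0.7,
            do_sample=True,
            top_p=0.9,
            repetition_penalty=1.2,
            pad_token_id=tokenizer.eos_token_id,
        )

    # Decode only the newly generated tokens, not the echoed prompt
    prompt_len = inputs['input_ids'].shape[1]
    response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
    # Stop if the model starts a new chat turn (e.g. '<|user|>'),
    # without truncating answers that merely contain a '<'
    response = response.split('<|')[0].strip()
    return response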

# Test questions
test_questions = [
    'I want to cancel my order',
    'Where is my package?',
    'How do I get a refund?',
    'I received a damaged product',
    'What is your return policy?',
]

print('='*60)
print('TEST RESULTS')
print('='*60)
for q in test_questions:
    print(f'\nCustomer: {q}')
    print(f'Assistant: {generate_response(q)}')
    print('-'*40)

TEST RESULTS

Customer: I want to cancel my order
Assistant: I've come to understand that you need assistance with canceling your order, and I'm here to help! To proceed with the cancellation process, please follow these steps:
1. Access Your Account: Log in to our platform using your credentials.
2. Navigate to Order History: Once logged in, locate the section labeled "Order History" or "My Orders." This is where you will find all of your past purchases.
3. Find the Specific Relevant Order: Look for an order associated with the number {{Order Number}} and click on it to access the details page.
4. Initiate Cancellation Process: On this page, you should see options such as "Cancel Order
----------------------------------------

Customer: Where is my package?
Assistant: We've received your message to us regarding the location of your package. To assist you further, could you please provide me with the order number or any other relevant details associated with your purchase? This will en
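The first answer (and the Cell 4 training example) contains Bitext template slots such as `{{Order Number}}` and `{{Account Category}}`: the model learned to emit them verbatim. In a real deployment you would likely substitute known values before showing the reply. A sketch of such a post-processing step; `fill_placeholders` and `find_placeholders` are hypothetical helpers, and unknown slots are left untouched so they can be flagged or handled separately.

```python
import re

# Matches Bitext-style slots such as "{{Order Number}}" or "{{ Account Category }}".
PLACEHOLDER = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def find_placeholders(text):
    """Return slot names in a generated reply, in order of appearance."""
    return [m.group(1) for m in PLACEHOLDER.finditer(text)]


def fill_placeholders(text, values):
    """Replace {{Slot Name}} with values[slot]; leave unknown slots as-is."""
    def _sub(match):
        return str(values.get(match.group(1), match.group(0)))
    return PLACEHOLDER.sub(_sub, text)


if __name__ == "__main__":
    reply = ("Look for an order associated with the number {{Order Number}} "
             "and check the {{Account Category}} settings.")
    print(find_placeholders(reply))
    print(fill_placeholders(reply, {"Order Number": "#A1234"}))
```

Applied to the output of `generate_response`, this keeps the fine-tuned model's phrasing while injecting real customer data from your own systems.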

In [11]:
# Cell 10: Download model (optional)
!zip -r customer-support-model.zip {OUTPUT_DIR}
from google.colab import files
files.download('customer-support-model.zip')

  adding: customer-support-model/ (stored 0%)
  adding: customer-support-model/README.md (deflated 45%)
  adding: customer-support-model/adapter_model.safetensors (deflated 8%)
  adding: customer-support-model/tokenizer_config.json (deflated 69%)
  adding: customer-support-model/special_tokens_map.json (deflated 73%)
  adding: customer-support-model/checkpoint-1500/ (stored 0%)
  adding: customer-support-model/checkpoint-1500/README.md (deflated 65%)
  adding: customer-support-model/checkpoint-1500/adapter_model.safetensors (deflated 8%)
  adding: customer-support-model/checkpoint-1500/tokenizer_config.json (deflated 69%)
  adding: customer-support-model/checkpoint-1500/special_tokens_map.json (deflated 73%)
  adding: customer-support-model/checkpoint-1500/tokenizer.model (deflated 55%)
  adding: customer-support-model/checkpoint-1500/adapter_config.json (deflated 57%)
  adding: customer-support-model/checkpoint-1500/chat_template.jinja (deflated 60%)
  adding: customer-support-model/c


In [12]:
# Cell 11: Training summary
print('='*60)
print('TRAINING COMPLETE')
print('='*60)
print(f'Model: {BASE_MODEL}')
print('Dataset: Bitext Customer Support')
print(f'Train samples: {len(dataset["train"])}')
print(f'LoRA rank: {lora_config.r}')
print(f'Epochs: {sft_config.num_train_epochs}')
print(f'Output: {OUTPUT_DIR}')
print('='*60)

TRAINING COMPLETE
Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Dataset: Bitext Customer Support
Train samples: 24184
LoRA rank: 16
Epochs: 1
Output: ./customer-support-model
