# Fine-Tuning Phi-4 for Text Summarization

This notebook fine-tunes the Phi-4 model for text summarization on the `abisee/cnn_dailymail` dataset. It covers downloading and preprocessing the dataset, fine-tuning with LoRA via Unsloth, evaluating with ROUGE, and plotting the training loss.

## Setup
- **Environment**: Google Colab with T4 GPU (16GB VRAM).
- **Libraries**: Unsloth for efficient fine-tuning, Hugging Face Transformers, Datasets, and Evaluate for metrics.
- **Dataset**: `abisee/cnn_dailymail`.
- **Output**: Fine-tuned model, ROUGE-L score, and training loss plot.




In [1]:
%%capture
!pip install evaluate
!pip install rouge_score
import os
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets>=3.4.1" huggingface_hub hf_transfer
    !pip install transformers==4.51.3
    !pip install --no-deps unsloth

## 1. Load Model and Tokenizer
Load the Phi-4 model with 4-bit quantization using Unsloth for memory efficiency.

In [2]:
from unsloth import FastLanguageModel
import torch
max_seq_length = 2048
dtype = None
load_in_4bit = True

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Phi-4",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    # token = "hf_...", # use one if using gated models like meta-llama/Llama-2-7b-hf
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.9: Fast Llama patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]
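To put the 4-bit loading above in perspective, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. It is only a sketch: the parameter count of roughly 14 billion for Phi-4 is an assumption not stated in this notebook, and real 4-bit checkpoints carry extra overhead (quantization constants, layers kept in higher precision, activations, optimizer state).

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

# Assumption: Phi-4 has roughly 14 billion parameters.
PHI4_PARAMS = 14e9
# Max memory reported by the Unsloth banner for the T4 in this run.
T4_MAX_MEMORY_GB = 14.741

fp16_gib = weight_memory_gib(PHI4_PARAMS, 16)
int4_gib = weight_memory_gib(PHI4_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {int4_gib:.1f} GiB")
print(f"4-bit weights fit under the reported limit: {int4_gib < T4_MAX_MEMORY_GB}")
```

Under these assumptions the 16-bit weights alone would not fit on the T4, while the 4-bit weights leave room for LoRA adapters and activations.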

## 2. Configure Model with LoRA adapter

Use LoRA for parameter-efficient fine-tuning.



In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None,
)

Unsloth 2025.5.9 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
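The number of trainable parameters that LoRA adds can be estimated by hand: each adapted linear layer of shape `in_features x out_features` gains two low-rank matrices with `r * (in_features + out_features)` parameters in total. The sketch below applies this to the seven target modules configured above. The layer dimensions (hidden size 5120, MLP size 17920, key/value projection size 1280) are assumptions about the Phi-4 architecture that do not appear in this notebook; the layer count of 40 and `r = 16` come from the cell and its output.

```python
# Assumed Phi-4 dimensions (not stated in this notebook).
HIDDEN = 5120
INTERMEDIATE = 17920
KV_DIM = 1280  # assumed: 10 key/value heads x 128 head dimension

NUM_LAYERS = 40  # from the "patched 40 layers" log line
RANK = 16        # r in get_peft_model

# (in_features, out_features) for each LoRA target module.
MODULE_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int) -> int:
    """Parameters in the A (r x in) and B (out x r) LoRA matrices."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")  # 65,536,000
```

With these assumed dimensions the total agrees with the trainable-parameter count that Unsloth reports when training starts.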


## 3. Load and Preprocess Dataset

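We load the `abisee/cnn_dailymail` dataset and format each article and its highlights as a chat-style summarization example. Articles longer than 800 words are truncated, and a 1% subset of the training split is divided 90/10 into training and validation sets.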




In [4]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template = "phi-4",
    mapping = {"role" : "from", "content" : "value", "user" : "human", "assistant" : "gpt"},
)

from datasets import load_dataset

cnn_dm_dataset = load_dataset('abisee/cnn_dailymail', '3.0.0')

def format_cnn_dm_for_chat(examples):
    texts = []
    for article, summary in zip(examples['article'], examples['highlights']):
        article_words = article.split()
        if len(article_words) > 800:
            article = " ".join(article_words[:800]) + "..."

        conversation = [
            {"from": "human", "value": f"Please summarize the following article:\n\n{article}"},
            {"from": "gpt", "value": summary}
        ]

        text = tokenizer.apply_chat_template(conversation, tokenize=False, add_generation_prompt=False)
        texts.append(text)

    return {"text": texts}

total_size = len(cnn_dm_dataset['train'])
sample_size = int(0.01 * total_size)
subset_dataset = cnn_dm_dataset['train'].select(range(sample_size))

formatted_dataset = subset_dataset.map(format_cnn_dm_for_chat, batched=True)

print("Formatted sample:")
print(formatted_dataset[0]['text'])
print("\n" + "="*50 + "\n")
print(formatted_dataset[1]['text'])

train_size = int(0.9 * len(formatted_dataset))
train_dataset = formatted_dataset.select(range(train_size))
val_dataset = formatted_dataset.select(range(train_size, len(formatted_dataset)))

print(f"\nTotal original dataset size: {total_size}")
print(f"Using 1% subset size: {sample_size}")
print(f"Train dataset size: {len(train_dataset)}")
print(f"Validation dataset size: {len(val_dataset)}")

Formatted sample:
<|im_start|>user<|im_sep|>Please summarize the following article:

LONDON, England (Reuters) -- Harry Potter star Daniel Radcliffe gains access to a reported £20 million ($41.1 million) fortune as he turns 18 on Monday, but he insists the money won't cast a spell on him. Daniel Radcliffe as Harry Potter in "Harry Potter and the Order of the Phoenix" To the disappointment of gossip columnists around the world, the young actor says he has no plans to fritter his cash away on fast cars, drink and celebrity parties. "I don't plan to be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar," he told an Australian interviewer earlier this month. "I don't think I'll be particularly extravagant. "The things I like buying are things that cost about 10 pounds -- books and CDs and DVDs." At 18, Radcliffe will be able to gamble in a casino, buy a drink in a pub or see the horror film "Hostel: Part II," curre
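The subset and split sizes used in the preprocessing cell can be checked without downloading anything. The sketch below repeats the same arithmetic and includes a standalone version of the word-level truncation. The training split size of 287,113 examples is an assumption about CNN/DailyMail 3.0.0, and `truncate_words` is a hypothetical helper name, not part of the notebook.

```python
# Assumption: size of the CNN/DailyMail 3.0.0 train split.
TRAIN_SPLIT_SIZE = 287_113

sample_size = int(0.01 * TRAIN_SPLIT_SIZE)  # 1% subset
train_size = int(0.9 * sample_size)         # 90% of the subset for training
val_size = sample_size - train_size         # remaining 10% for validation

print(f"Subset: {sample_size}, train: {train_size}, validation: {val_size}")

def truncate_words(text: str, max_words: int = 800) -> str:
    """Keep at most max_words whitespace-separated words, marking any cut with '...'."""
    words = text.split()
    if len(words) > max_words:
        return " ".join(words[:max_words]) + "..."
    return text

print(truncate_words("one two three four five", max_words=3))
```

Note that the 800-word cap is only a proxy for the 2048-token `max_seq_length`; the actual token count depends on the tokenizer.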

## 4. Set Up Training Arguments and Fine-Tune the Model

Use SFTTrainer with Unsloth to fine-tune the model.

In [5]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported
import os

os.environ["WANDB_DISABLED"] = "true"
os.environ["WANDB_MODE"] = "disabled"

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False,
    args = TrainingArguments(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 2,
        warmup_steps = 2,
        max_steps = 50,
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 5,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
        report_to = [],
        dataloader_drop_last = True,
    ),
)
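The combination of `warmup_steps = 2`, `max_steps = 50`, `learning_rate = 2e-4`, and `lr_scheduler_type = "linear"` describes a learning rate that ramps up linearly and then decays linearly to zero. The function below is a minimal sketch of that shape, written in plain Python for illustration; `linear_schedule_lr` is a hypothetical name, and the scheduler created by the trainer remains the source of truth.

```python
def linear_schedule_lr(step: int, base_lr: float = 2e-4,
                       warmup_steps: int = 2, total_steps: int = 50) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 1, 2, 26, 50):
    print(f"step {step:>2}: lr = {linear_schedule_lr(step):.2e}")
```

With only two warmup steps, the learning rate reaches its peak almost immediately and spends the rest of the run decaying.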

In [8]:
from evaluate import load
from tqdm import tqdm

predictions_before = []
references = []

# Use 10 samples for a quick evaluation
for sample in tqdm(cnn_dm_dataset["validation"].select(range(10))):
    article = sample["article"]
    reference_summary = sample["highlights"]

    prompt = f"Summarize this article: {article}"

    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to("cuda")

    # Greedy decoding for a deterministic baseline (temperature has no effect when do_sample=False)
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=150,
        do_sample=False,
    )

    pred_summary = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True).strip()

    predictions_before.append(pred_summary)
    references.append(reference_summary)

rouge = load("rouge")

results_before = rouge.compute(
    predictions=predictions_before,
    references=references,
    use_aggregator=True,
    use_stemmer=True
)


100%|██████████| 10/10 [02:59<00:00, 17.95s/it]
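ROUGE-L, one of the scores returned by `rouge.compute` above, is based on the longest common subsequence (LCS) between the predicted and reference summaries. The following is a simplified sketch of the F1 variant using lowercase whitespace tokens. It is not the implementation used by the `evaluate` metric, which applies its own tokenization, optional stemming (`use_stemmer=True`), and aggregation, so the numbers will differ.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def rouge_l_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-L F1 on lowercase, whitespace-split tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = rouge_l_f1("the cat sat on the mat", "the cat is on the mat")
print(f"ROUGE-L F1 (simplified): {score:.4f}")
```

In this toy pair, five of the six tokens form a common subsequence, so precision and recall are both 5/6.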


In [9]:
print("=== START TRAINING ===")
print(f"Dataset size: {len(train_dataset)}")
print(f"Batch size: {trainer.args.per_device_train_batch_size}")
print(f"Total steps: {trainer.args.max_steps}")
print("=" * 50)

trainer_stats = trainer.train()

print("\n=== END TRAINING ===")
print(f"Training loss: {trainer_stats.training_loss:.4f}")
print(f"Total training time: {trainer_stats.metrics.get('train_runtime', 'N/A')} seconds")

=== START TRAINING ===
Dataset size: 2583
Batch size: 4
Total steps: 50


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 2,583 | Num Epochs = 1 | Total steps = 50
O^O/ \_/ \    Batch size per device = 4 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (4 x 2 x 1) = 8
 "-____-"     Trainable parameters = 65,536,000/4,000,000,000 (1.64% trained)


Unsloth: Will smartly offload gradients to save VRAM!


| Step | Training Loss |
|-----:|--------------:|
| 5 | 2.0279 |
| 10 | 1.9244 |
| 15 | 1.9055 |
| 20 | 1.9442 |
| 25 | 1.8677 |
| 30 | 1.8286 |
| 35 | 1.8665 |
| 40 | 1.891 |
| 45 | 1.8913 |
| 50 | 1.8729 |



=== END TRAINING ===
Training loss: 1.9020
Total training time: 1887.4543 seconds
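The figures in the training log can be combined to see how much of the training set this run actually covered. The sketch below only uses numbers that appear in the cells and output above.

```python
# Values taken from the training arguments and the log output above.
per_device_batch_size = 4
gradient_accumulation_steps = 2
num_gpus = 1
max_steps = 50
train_examples = 2583
train_runtime_seconds = 1887.4543

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch_size * max_steps
epoch_fraction = examples_seen / train_examples
seconds_per_step = train_runtime_seconds / max_steps

print(f"Effective batch size: {effective_batch_size}")
print(f"Examples seen:        {examples_seen}")
print(f"Fraction of an epoch: {epoch_fraction:.1%}")
print(f"Seconds per step:     {seconds_per_step:.1f}")
```

Because `max_steps = 50` stops training well before a full pass over the 2,583 examples, raising `max_steps` or switching to `num_train_epochs` would be the natural way to train longer.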


## 6. Generate Example Outputs

In [10]:
# Test the fine-tuned model on summarization
# Try the model on a few samples
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# A few short sample articles for a quick qualitative test
test_articles = [
    "Scientists have discovered a new species of deep-sea fish in the Pacific Ocean. The fish, found at depths of over 3000 meters, has unique bioluminescent properties that help it navigate in the dark waters. Researchers believe this discovery could lead to new insights about marine biodiversity and evolution.",

    "The stock market experienced significant volatility today as investors reacted to new economic data. Technology stocks led the decline, with major companies seeing drops of up to 5%. Market analysts suggest the volatility is related to concerns about inflation and interest rate policies.",

    "A breakthrough in renewable energy technology has been announced by researchers at MIT. The new solar panel design can achieve 40% efficiency, significantly higher than current commercial panels. This advancement could make solar energy more cost-effective and accelerate the transition to clean energy."
]

print("=== TESTING SUMMARIZATION MODEL ===\n")

for i, article in enumerate(test_articles, 1):
    print(f"Test {i}:")
    print("Article:", article[:100] + "..." if len(article) > 100 else article)

    messages = [
        {"role": "user", "content": f"Please summarize the following article:\n\n{article}"}
    ]

    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize = True,
        add_generation_prompt = True,
        return_tensors = "pt"
    ).to("cuda")

    outputs = model.generate(
        input_ids = inputs,
        max_new_tokens = 100,
        use_cache = True,
        temperature = 0.3,
        do_sample = True,
        pad_token_id = tokenizer.eos_token_id
    )

    response = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True)
    print("Summary:", response.strip())
    print("-" * 80)
    print()

=== TESTING SUMMARIZATION MODEL ===

Test 1:
Article: Scientists have discovered a new species of deep-sea fish in the Pacific Ocean. The fish, found at d...
Summary: Scientists discover new species of deep-sea fish in the Pacific Ocean .
Fish found at depths of over 3,000 meters has unique bioluminescent properties .
Researchers believe discovery could lead to new insights about marine biodiversity and evolution .
--------------------------------------------------------------------------------

Test 2:
Article: The stock market experienced significant volatility today as investors reacted to new economic data....
Summary: Stock market experiences significant volatility today .
Technology stocks lead the decline, with major companies seeing drops of up to 5% .
Market analysts suggest the volatility is related to concerns about inflation and interest rate policies .
--------------------------------------------------------------------------------

Test 3:
Article: A breakthrough in renew
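The generation call in this section samples with `temperature = 0.3`. Temperature divides the logits before the softmax, so values below 1 concentrate probability on the most likely tokens. The sketch below illustrates the effect with made-up logits; the values in `logits` are hypothetical and do not come from the model.

```python
import math

def softmax_with_temperature(logits: list, temperature: float) -> list:
    """Softmax over logits divided by temperature (numerically stabilized)."""
    scaled = [x / temperature for x in logits]
    max_scaled = max(scaled)
    exps = [math.exp(x - max_scaled) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits

probs_default = softmax_with_temperature(logits, temperature=1.0)
probs_sharp = softmax_with_temperature(logits, temperature=0.3)

print("T = 1.0:", [round(p, 3) for p in probs_default])
print("T = 0.3:", [round(p, 3) for p in probs_sharp])
```

A low temperature keeps sampled summaries close to the greedy output while still allowing some variation between runs.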

## 7. Evaluate the Model

In [14]:
predictions_after = []
references_after = []

# Use 10 samples for a quick evaluation
for sample in tqdm(cnn_dm_dataset["validation"].select(range(10))):
    article = sample["article"]
    reference_summary = sample["highlights"]

    prompt = f"Summarize this article: {article}"

    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt"
    ).to("cuda")

    # Greedy decoding, matching the pre-training evaluation (temperature has no effect when do_sample=False)
    outputs = model.generate(
        input_ids=inputs,
        max_new_tokens=150,
        do_sample=False,
    )

    pred_summary = tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True).strip()

    predictions_after.append(pred_summary)
    references_after.append(reference_summary)

rouge = load("rouge")

results_after = rouge.compute(
    predictions=predictions_after,
    references=references_after,
    use_aggregator=True,
    use_stemmer=True
)
print(f"ROUGE-L F1: {results_after['rougeL']:.4f}")

100%|██████████| 10/10 [01:13<00:00,  7.34s/it]


ROUGE-L F1: 0.2129


In [1]:
import matplotlib.pyplot as plt

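# ROUGE values (requires results_before and results_after from the evaluation cells above;
# re-run those cells first if the kernel was restarted)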
rouge_names = ['ROUGE-1', 'ROUGE-2', 'ROUGE-L']
before_scores = [results_before['rouge1'], results_before['rouge2'], results_before['rougeL']]
after_scores  = [results_after['rouge1'], results_after['rouge2'], results_after['rougeL']]

x = range(len(rouge_names))
bar_width = 0.35

plt.figure(figsize=(8,6))
plt.bar(x, before_scores, width=bar_width, label='Before Fine-tuning', color='skyblue')
plt.bar([i + bar_width for i in x], after_scores, width=bar_width, label='After Fine-tuning', color='salmon')

plt.xlabel('ROUGE Score Type')
plt.ylabel('Score')
plt.title('ROUGE Score Comparison Before vs After Fine-Tuning')
plt.xticks([i + bar_width/2 for i in x], rouge_names)
plt.ylim(0, 1)
plt.legend()
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()


NameError: name 'results_before' is not defined

In [None]:
import matplotlib.pyplot as plt

def plot_training_loss(trainer):
    logs = trainer.state.log_history
    steps = []
    losses = []

    for log in logs:
        if "loss" in log and "step" in log:
            steps.append(log["step"])
            losses.append(log["loss"])

    plt.figure(figsize=(8, 5))
    plt.plot(steps, losses, marker="o", linestyle="-", color="blue")
    plt.xlabel("Training Step")
    plt.ylabel("Loss")
    plt.title("Training Loss Curve")
    plt.grid(True)
    plt.show()

plot_training_loss(trainer)
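The filter on `"loss"` in `plot_training_loss` matters because `trainer.state.log_history` mixes periodic logging entries with a final summary entry that has no `"loss"` key. The sketch below demonstrates this on a mock history built from the loss values printed during training. The mock omits other keys that real entries typically carry, and the shape of the final summary entry is an assumption.

```python
# Mock log history using the losses reported during training in this notebook.
mock_log_history = [
    {"loss": 2.0279, "step": 5},
    {"loss": 1.9244, "step": 10},
    {"loss": 1.9055, "step": 15},
    {"loss": 1.9442, "step": 20},
    {"loss": 1.8677, "step": 25},
    {"loss": 1.8286, "step": 30},
    {"loss": 1.8665, "step": 35},
    {"loss": 1.891, "step": 40},
    {"loss": 1.8913, "step": 45},
    {"loss": 1.8729, "step": 50},
    # Assumed shape of the final summary entry: no "loss" key.
    {"train_runtime": 1887.4543, "train_loss": 1.9020, "step": 50},
]

def extract_loss_curve(log_history: list) -> tuple:
    """Return (steps, losses) from entries that contain both keys."""
    steps, losses = [], []
    for entry in log_history:
        if "loss" in entry and "step" in entry:
            steps.append(entry["step"])
            losses.append(entry["loss"])
    return steps, losses

steps, losses = extract_loss_curve(mock_log_history)
mean_loss = sum(losses) / len(losses)
print(f"{len(steps)} logged points, mean logged loss = {mean_loss:.4f}")
```

The mean of the ten logged values matches the overall training loss printed at the end of the run.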


## 8. Log In and Push the Model to Hugging Face

In [None]:
!huggingface-cli login

In [None]:
model.push_to_hub("thanhle1702/phii4-finetuned-cnn-dailymail")
tokenizer.push_to_hub("thanhle1702/phii4-finetuned-cnn-dailymail")