<a href="https://colab.research.google.com/github/tomonari-masada/course2025-nlp/blob/main/10_SFT_with_TRL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

* The code below is a lightly modified version of the notebook linked from the LLM's model card.
  * https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP#%F0%9F%94%A7-fine-tuning

----

# 💧 LFM2 - SFT with TRL

This tutorial demonstrates how to fine-tune our LFM2 models, e.g. [`LiquidAI/LFM2.5-1.2B-Instruct`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct), using the TRL library.

Follow along if it's your first time using TRL, or lift individual code snippets for your own workflow.

## 🎯 What you'll find:
- **SFT** (Supervised Fine-Tuning) - Basic instruction following
- **LoRA + SFT** - Using LoRA (from PEFT) to SFT while on constrained hardware

## 📋 Prerequisites:
- **GPU Runtime**: Select GPU in `Runtime` → `Change runtime type`
- **Hugging Face Account**: For accessing models and datasets



# 📦 Installation & Setup

First, let's install all the required packages:


In [None]:
# Quote each requirement so the shell does not treat ">=" as an output redirect.
!pip install "transformers>=4.54.0" "trl>=0.18.2" "peft>=0.15.2"

Now let's verify that the packages are installed correctly:

In [None]:
import torch
import transformers
import trl
import os

os.environ["WANDB_DISABLED"] = "true"
transformers.set_seed(42)

print(f"📦 PyTorch version: {torch.__version__}")
print(f"🤗 Transformers version: {transformers.__version__}")
print(f"📊 TRL version: {trl.__version__}")

# Loading the model from Transformers 🤗



* The model used here is [`LiquidAI/LFM2.5-1.2B-JP`](https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP).

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "LiquidAI/LFM2.5-1.2B-JP"

print("📚 Loading tokenizer...")
tokenizer = AutoTokenizer.from_pretrained(model_id)

print("🧠 Loading model...")
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
#   attn_implementation="flash_attention_2" <- uncomment on compatible GPU
)

print("✅ Local model loaded successfully!")
print(f"🔢 Parameters: {model.num_parameters():,}")
print(f"📖 Vocab size: {len(tokenizer)}")
print(f"💾 Model size: ~{model.num_parameters() * 2 / 1e9:.1f} GB (bfloat16)")

# 🎯 Part 1: Supervised Fine-Tuning (SFT)

SFT teaches the model to follow instructions by training on input-output pairs (an instruction and its expected response). This is the foundation for creating instruction-following models.

## Load an SFT Dataset

We will use [HuggingFaceTB/smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smoltalk), limiting ourselves to the first 5k samples for brevity. Feel free to change the limit by changing the slicing index in the parameter `split`.

* Here we use a different dataset instead, following the Qiita article below.
  * https://qiita.com/t-hashiguchi/items/9f3b394ca0ae1c7e4d02

In [None]:
from datasets import load_dataset

print("📥 Loading SFT dataset...")
ds = load_dataset("bbz662bbz/databricks-dolly-15k-ja-gozaru")

In [None]:
ds

In [None]:
ds["train"][0]

* Convert the data to the same format as `HuggingFaceTB/smoltalk`, which the reference notebook uses.

In [None]:
def smoltalk_prompt_template(example, question_only=False):
    """Convert one Dolly-style record into smoltalk-style chat messages."""
    # Append the optional context ("input") to the instruction; it may be empty.
    user_message = {
        "content": example["instruction"] + (example["input"] or ""),
        "role": "user",
    }
    if question_only:
        # Return only the user turn (a single dict, not a list).
        return user_message
    return [user_message, {"content": example["output"], "role": "assistant"}]

def add_messages(example):
    # Store the conversation under the "messages" column, as smoltalk does.
    example["messages"] = smoltalk_prompt_template(example)
    return example

In [None]:
ds = ds.map(add_messages, remove_columns=ds["train"].column_names)

In [None]:
# Split 80/20, then halve the held-out part once so validation and test stay disjoint.
ds = ds["train"].train_test_split(test_size=0.2, seed=42)
held_out = ds["test"].train_test_split(test_size=0.5, seed=42)
ds["validation"] = held_out["test"]
ds["test"] = held_out["train"]

train_dataset_sft = ds["train"]
eval_dataset_sft = ds["validation"]

print("✅ SFT Dataset loaded:")
print(f"   📚 Train samples: {len(train_dataset_sft)}")
print(f"   🧪 Eval samples: {len(eval_dataset_sft)}")
print(f"\n📝 Single Sample: {train_dataset_sft[0]['messages']}")

## Launch Training

We are now ready to launch an SFT run with `SFTTrainer`. Feel free to modify `SFTConfig` to try out different configurations.
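
As a rough illustration of what `lr_scheduler_type="linear"` with warmup means, the sketch below computes the learning rate at a given optimizer step: it rises linearly from zero during warmup and then decays linearly back to zero. The values `5e-5` and `100` mirror `learning_rate` and `warmup_steps` in the configuration; the function name `linear_schedule_lr` and the total of 1000 steps are hypothetical, chosen only for illustration. This is a plain-Python sketch of the usual formula, not the library's own code.

```python
def linear_schedule_lr(step, base_lr, warmup_steps, total_steps):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    if step < warmup_steps:
        # Warmup phase: scale up from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: scale down from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


TOTAL_STEPS = 1000  # hypothetical; the real value depends on dataset and batch size

for step in (0, 50, 100, 550, 1000):
    lr = linear_schedule_lr(step, base_lr=5e-5, warmup_steps=100, total_steps=TOTAL_STEPS)
    print(f"step {step:4d}: lr = {lr:.2e}")
```

With `per_device_train_batch_size=1` and no gradient accumulation, one optimizer step corresponds to one training sample, so 100 warmup steps cover only the first 100 samples.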



* On the free tier of Google Colab, this step runs out of GPU memory.
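
The memory pressure of full fine-tuning can be estimated with a back-of-the-envelope calculation. The sketch below counts only weights, gradients, and optimizer state, assuming an Adam-style optimizer that keeps two moment estimates per parameter; activations and framework overhead come on top. The parameter count is an assumption rounded from the "1.2B" in the model name, and the helper `training_memory_gb` is hypothetical.

```python
def training_memory_gb(num_params, weight_bytes, grad_bytes, optimizer_bytes):
    """Approximate memory (GB) for weights + gradients + optimizer state."""
    bytes_per_param = weight_bytes + grad_bytes + optimizer_bytes
    return num_params * bytes_per_param / 1e9


NUM_PARAMS = 1.2e9  # assumption: rounded from the model name, not measured

# Everything kept in bfloat16 (2 bytes): weights, gradients, two Adam moments.
pure_bf16 = training_memory_gb(NUM_PARAMS, 2, 2, 2 * 2)

# Everything kept in float32 (4 bytes): weights, gradients, two Adam moments.
full_fp32 = training_memory_gb(NUM_PARAMS, 4, 4, 2 * 4)

print(f"bf16 state: ~{pure_bf16:.1f} GB before activations")
print(f"fp32 state: ~{full_fp32:.1f} GB before activations")
```

Even under the more favorable assumption, most of a small GPU's memory is consumed before a single activation is stored, which is the motivation for the LoRA variant in Part 2.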

In [None]:
from trl import SFTConfig, SFTTrainer

sft_config = SFTConfig(
    output_dir="./lfm2-sft",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=100,  # when > 0, this overrides warmup_ratio below
    warmup_ratio=0.2,
    logging_steps=10,
    save_strategy="steps",
    save_steps=1000,
    eval_strategy="steps",
    eval_steps=200,
    load_best_model_at_end=True,
    report_to="none",  # the string "none" disables all logging integrations
    bf16=False,  # <- not all Colab GPUs support bf16
)

print("🏗️  Creating SFT trainer...")
sft_trainer = SFTTrainer(
    model=model,
    args=sft_config,
    train_dataset=train_dataset_sft,
    eval_dataset=eval_dataset_sft,
    processing_class=tokenizer,
)

print("\n🚀 Starting SFT training...")
sft_trainer.train()

print("🎉 SFT training completed!")

sft_trainer.save_model()
print(f"💾 SFT model saved to: {sft_config.output_dir}")

* Let's generate answers for a few examples from the test set.

In [None]:
ds["test"][0]["messages"]

In [None]:
ds["test"][0]["messages"][0]

* Text generation follows the method shown in the model card.
  * https://huggingface.co/LiquidAI/LFM2.5-1.2B-JP
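
The generation cells in this notebook pass `temperature` and `min_p` to `model.generate`. As a rough, library-independent illustration of what min-p filtering does: after temperature scaling, tokens whose probability falls below `min_p` times the probability of the most likely token are dropped, and the remaining probabilities are renormalized. The function name `min_p_filter` and the toy logits are hypothetical, and this is a sketch of the idea rather than the Transformers implementation.

```python
import math


def min_p_filter(logits, temperature, min_p):
    """Return renormalized probabilities after temperature scaling and min-p filtering."""
    scaled = [value / temperature for value in logits]
    # Numerically stable softmax.
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]
    # Keep only tokens at least `min_p` times as likely as the top token.
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


toy_logits = [2.0, 1.0, -3.0]  # hypothetical scores for a 3-token vocabulary

# A low temperature sharpens the distribution, so fewer tokens survive the filter.
print(min_p_filter(toy_logits, temperature=0.3, min_p=0.15))
print(min_p_filter(toy_logits, temperature=1.0, min_p=0.15))
```

With the low temperature used in this notebook, the distribution is already sharp, so min-p mostly acts as a guard against sampling very unlikely tokens.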

In [None]:
streamer = transformers.TextStreamer(tokenizer, skip_special_tokens=True)

for i in range(3):
    example = ds["test"][i]
    input_ids = tokenizer.apply_chat_template(
        example["messages"][0:-1],
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
    ).to(model.device)
    output = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.3,
        min_p=0.15,
        repetition_penalty=1.05,
        max_new_tokens=256,
        streamer=streamer,
    )
    print(f"Reference answer: {example['messages'][-1]['content']}\n")

# 🎛️ Part 2: LoRA + SFT (Parameter-Efficient Fine-tuning)

LoRA (Low-Rank Adaptation) allows efficient fine-tuning by only training a small number of additional parameters. Perfect for limited compute resources!


## Wrap the model with PEFT

We specify the target modules to which LoRA adapters are attached, while the model's original weights remain frozen. Feel free to modify the `r` (rank) value:
- higher -> better approximation of full-finetuning
- lower -> needs even less compute resources
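
To make the effect of `r` concrete: for one linear layer with a weight of shape `d_out x d_in`, LoRA trains two small matrices, `A` (`r x d_in`) and `B` (`d_out x r`), instead of the full weight. The sketch below compares the trainable parameter counts. The layer size of 2048 is a hypothetical value for illustration, not taken from the LFM2 architecture, and the helper names are made up for this example.

```python
def lora_param_count(d_in, d_out, r):
    """Trainable parameters of one LoRA adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def full_param_count(d_in, d_out):
    """Parameters of the full weight matrix (bias ignored)."""
    return d_in * d_out


D_IN = D_OUT = 2048  # hypothetical layer size

for r in (4, 8, 16, 64):
    lora = lora_param_count(D_IN, D_OUT, r)
    full = full_param_count(D_IN, D_OUT)
    print(f"r={r:2d}: {lora:,} trainable vs {full:,} full ({100 * lora / full:.2f}%)")
```

The adapter size grows linearly with `r`, so doubling the rank doubles the number of trainable parameters per adapted layer.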

In [None]:
from peft import LoraConfig, get_peft_model, TaskType

GLU_MODULES = ["w1", "w2", "w3"]
MHA_MODULES = ["q_proj", "k_proj", "v_proj", "out_proj"]
CONV_MODULES = ["in_proj", "out_proj"]

lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    inference_mode=False,
    r=8,  # <- lower values = fewer parameters
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=GLU_MODULES + MHA_MODULES + CONV_MODULES,
    bias="none",
    modules_to_save=None,
)

lora_model = get_peft_model(model, lora_config)
lora_model.print_trainable_parameters()

print("✅ LoRA configuration applied!")
print(f"🎛️  LoRA rank: {lora_config.r}")
print(f"📊 LoRA alpha: {lora_config.lora_alpha}")
print(f"🎯 Target modules: {lora_config.target_modules}")

## Launch Training

We are now ready to launch SFT training again, this time with the LoRA-wrapped model.

In [None]:
from trl import SFTConfig, SFTTrainer

lora_sft_config = SFTConfig(
    output_dir="./lfm2-sft-lora",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=100,  # when > 0, this overrides warmup_ratio below
    warmup_ratio=0.2,
    logging_steps=10,
    save_strategy="steps",
    save_steps=1000,
    eval_strategy="steps",
    eval_steps=200,
    load_best_model_at_end=True,
    report_to="none",  # the string "none" disables all logging integrations
)

print("🏗️  Creating LoRA SFT trainer...")
lora_sft_trainer = SFTTrainer(
    model=lora_model,
    args=lora_sft_config,
    train_dataset=train_dataset_sft,
    eval_dataset=eval_dataset_sft,
    processing_class=tokenizer,
)

print("\n🚀 Starting LoRA + SFT training...")
lora_sft_trainer.train()

print("🎉 LoRA + SFT training completed!")

lora_sft_trainer.save_model()
print(f"💾 LoRA model saved to: {lora_sft_config.output_dir}")

## Save merged model

Merge the extra weights learned with LoRA back into the model to obtain a "normal" model checkpoint.
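
Conceptually, merging folds the low-rank update into the frozen weight: `W_merged = W + (alpha / r) * B @ A`. The toy sketch below uses plain Python lists to check that the merged weight gives the same output as running the frozen weight and the adapter side by side. All matrices and helper names are hypothetical illustration values; this is not how PEFT implements the merge internally.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add_scaled(base, delta, scale):
    """Return base + scale * delta, element-wise."""
    return [[b + scale * d for b, d in zip(b_row, d_row)]
            for b_row, d_row in zip(base, delta)]


r, alpha = 1, 2
scale = alpha / r

W = [[1.0, 0.0], [0.0, 1.0]]  # frozen weight, d_out x d_in
A = [[0.5, -1.0]]             # LoRA A, r x d_in
B = [[2.0], [1.0]]            # LoRA B, d_out x r
x = [[3.0], [4.0]]            # input as a column vector

# Adapter path: W x + scale * B (A x)
adapter_out = add_scaled(matmul(W, x), matmul(B, matmul(A, x)), scale)

# Merged path: (W + scale * B A) x
W_merged = add_scaled(W, matmul(B, A), scale)
merged_out = matmul(W_merged, x)

print(adapter_out)
print(merged_out)
```

Because the two paths agree, the merged checkpoint can be loaded like an ordinary model without the PEFT wrapper.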

* On the free tier of Google Colab, this step is very slow.

In [None]:
print("\n🔄 Merging LoRA weights...")
merged_model = lora_model.merge_and_unload()
merged_model.save_pretrained("./lfm2-lora-merged")
tokenizer.save_pretrained("./lfm2-lora-merged")
print("💾 Merged model saved to: ./lfm2-lora-merged")

* Load the merged model.

In [None]:
# load merged model for inference
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "./lfm2-lora-merged",
    device_map="auto",
    dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("./lfm2-lora-merged")

* Let's generate answers for a few examples from the test set.

In [None]:
streamer = transformers.TextStreamer(tokenizer, skip_special_tokens=True)

for i in range(3):
    example = ds["test"][i]
    input_ids = tokenizer.apply_chat_template(
        example["messages"][0:-1],
        add_generation_prompt=True,
        return_tensors="pt",
        tokenize=True,
    ).to(model.device)
    output = model.generate(
        input_ids,
        do_sample=True,
        temperature=0.3,
        min_p=0.15,
        repetition_penalty=1.05,
        max_new_tokens=256,
        streamer=streamer,
    )
    print(f"Reference answer: {example['messages'][-1]['content']}\n")