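LLM Classification Finetuning
-
**Task type: classification (A / B / Tie)**  
**Goal: train a new model that imitates how humans judge preference.**

* Model input: Prompt + Response A + Response B
* Model output: a judgment of A / B / tie, i.e. which response is better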


Errors with unsloth + llama-3-8b
-

In [None]:
!pip uninstall -y torch torchvision torchaudio
!pip install torch==2.6.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
!pip install -U unsloth unsloth_zoo bitsandbytes
!pip install --upgrade "transformers>=4.53.0"


In [None]:
# ❗ Do not import transformers or any other model library before this cell
from unsloth import FastLanguageModel

FastLanguageModel.PATCHING_MODE = "autograd"  # 🔥 Set this before anything else imports transformers

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-Instruct-bnb-4bit",
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)

FastLanguageModel.for_training(model)

# ✅ Check
print(type(model))                 # ⬅️ Expected to be FastLlamaForCausalLM
print(hasattr(model, "fit"))      # ⬅️ Expected to be True


So `fit()` cannot be used with this model.

deberta-v3-base
-
Trainer + TrainingArguments

In [None]:
#deberta-v3-base
!pip install -q transformers datasets accelerate  

import os
os.environ["WANDB_DISABLED"] = "true"  

import pandas as pd
from datasets import Dataset
from sklearn.preprocessing import LabelEncoder

# Load the data
train = pd.read_csv("/kaggle/input/llm-classification-finetuning/train.csv")

# Build the label column
def get_label(row):
    if row["winner_model_a"] == 1:
        return "A"
    elif row["winner_model_b"] == 1:
        return "B"
    else:
        return "tie"

train["label"] = train.apply(get_label, axis=1)

# Combine into a single input field (prompt + response A + response B)
def build_input(prompt, a, b):
    return f"[PROMPT]: {prompt}\n\n[RESPONSE A]: {a}\n\n[RESPONSE B]: {b}"

train["text"] = train.apply(lambda x: build_input(x["prompt"], x["response_a"], x["response_b"]), axis=1)

# Drop rows with missing values
train = train[["text", "label"]].dropna().reset_index(drop=True)

# Encode the labels
le = LabelEncoder()
train["label_id"] = le.fit_transform(train["label"])

# Convert to Dataset format
dataset = Dataset.from_pandas(train).train_test_split(test_size=0.05, seed=42)

from transformers import AutoTokenizer

model_name = "microsoft/deberta-v3-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenization function
def tokenize(example):
    return tokenizer(
        example["text"],
        padding="max_length",
        truncation=True,
        max_length=512
    )

tokenized_dataset = dataset.map(tokenize, batched=True)
tokenized_dataset = tokenized_dataset.remove_columns(["text", "label"])
tokenized_dataset = tokenized_dataset.rename_column("label_id", "labels")
tokenized_dataset.set_format("torch")

# Clean up artifacts from previous runs
!rm -rf ./results
!rm -rf ./logs

# metrics function
from sklearn.metrics import accuracy_score, f1_score


def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    acc = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions, average="macro")  # or 'weighted'
    return {
        "accuracy": acc,
        "f1": f1,
    }

    
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

small_train_dataset = tokenized_dataset["train"].select(range(3000))

training_args = TrainingArguments(
    output_dir="./results",
    save_total_limit=1,
    per_device_train_batch_size=8,      # ✅ Larger batch (depends on GPU memory)
    per_device_eval_batch_size=8,
    num_train_epochs=8,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,
    weight_decay=0.01,
    logging_steps=50,
    save_steps=2000,
    do_train=True,
    do_eval=True,
    logging_dir="./logs",
    fp16=True,
)


trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train_dataset,
    eval_dataset=tokenized_dataset["test"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,  # ✅ Report accuracy and macro F1 at evaluation
)

# Start training
trainer.train()

# Evaluate after training
metrics = trainer.evaluate()
print("📊 Evaluation metrics:", metrics)

model.save_pretrained("/kaggle/working/deberta-v3-pref-model")
tokenizer.save_pretrained("/kaggle/working/deberta-v3-pref-model")

test_text = "[PROMPT]: How is the weather today?\n\n[RESPONSE A]: Great!\n\n[RESPONSE B]: I'm not sure."

# Encode the input
inputs = tokenizer(test_text, return_tensors="pt", padding=True, truncation=True, max_length=512)
inputs = {k: v.to(model.device) for k, v in inputs.items()}  # ⬅️ Move tensors to the model's device (GPU)

# Model prediction
outputs = model(**inputs)
pred = outputs.logits.argmax(dim=1).item()

# Decode to A / B / tie
print("Prediction:", le.inverse_transform([pred])[0])

## Model training log (using 3,000 training examples)

> Note: every run uses the same 3,000 training examples. Training takes a fairly long time.

---

### Epoch 1
- Loss fluctuates between roughly **1.10 and 1.15**  
- Accuracy: about **38.4%**

---

### Epoch 3
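- Initial loss is about **1.10**
- Loss stays between **1.07 and 1.15** throughout, fluctuating without a clear decrease  
- The model is learning, but no stable downward trend has appeared yet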

---

### Epoch 5
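- In the first half, loss mostly stays around **1.10**  
- From step **1200** onward, loss shows a clear downward trend  
- Final loss lands in the **0.86 to 0.95** range

**Validation metrics:**
- `eval_accuracy`: **37.6%**
- `eval_f1 (macro)`: **0.370**

> Loss shows a converging trend, and the model starts to learn to distinguish A from B  
> Slight signs of overfitting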

---

### Epoch 10
- Loss decreases slowly from an initial **1.10** to **0.99 to 1.02**

**Validation metrics:**
- `eval_loss`: **1.119**
- `eval_accuracy`: **37.2%**
- `eval_f1 (macro)`: **0.370**

> Training loss decreases steadily, with no obvious signs of overfitting
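
As a reference point for reading the numbers above: with three classes, a model that always predicts a uniform distribution has a cross-entropy of ln 3 ≈ 1.0986 and, if the labels are roughly balanced, an expected accuracy of about 33.3%. The sketch below computes that baseline. The helper name `uniform_baseline` is made up for illustration, and the balanced-label assumption may not hold exactly for this dataset.

```python
import math

def uniform_baseline(num_classes: int) -> tuple[float, float]:
    """Cross-entropy and expected accuracy of a uniform guess (hypothetical helper)."""
    loss = -math.log(1.0 / num_classes)   # -log p(correct class) when every class gets 1/K
    accuracy = 1.0 / num_classes          # assumes labels are roughly balanced
    return loss, accuracy

baseline_loss, baseline_acc = uniform_baseline(3)
print(f"uniform-guess loss: {baseline_loss:.4f}")
print(f"uniform-guess accuracy: {baseline_acc:.3f}")

# Value reported in the training record above
reported_eval_loss = 1.119
print("eval_loss above uniform baseline:", reported_eval_loss > baseline_loss)
```

Read this way, a training loss hovering around 1.10 is close to the uniform baseline, and the reported `eval_loss` of 1.119 sits slightly above it.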


TinyLlama
---

In [None]:
# 🚩 1. Initialize the GPU environment & log in to Hugging Face
import torch
torch.cuda.empty_cache()
torch.cuda.ipc_collect()

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["WANDB_DISABLED"] = "true"

from kaggle_secrets import UserSecretsClient
from huggingface_hub import login
user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("kaggle-llama-token")
login(token=hf_token)

# 🚩 2. Load the dataset
import pandas as pd
from datasets import Dataset
from sklearn.preprocessing import LabelEncoder

train = pd.read_csv("/kaggle/input/llm-classification-finetuning/train.csv")

def get_label(row):
    if row["winner_model_a"] == 1:
        return "A"
    elif row["winner_model_b"] == 1:
        return "B"
    else:
        return "tie"

train["label"] = train.apply(get_label, axis=1)

def build_input(prompt, a, b):
    return f"[PROMPT]: {prompt}\n\n[RESPONSE A]: {a}\n\n[RESPONSE B]: {b}"

train["text"] = train.apply(lambda x: build_input(x["prompt"], x["response_a"], x["response_b"]), axis=1)
train = train[["text", "label"]].dropna().reset_index(drop=True)

le = LabelEncoder()
train["label_id"] = le.fit_transform(train["label"])

dataset = Dataset.from_pandas(train).train_test_split(test_size=0.05, seed=42)

# 🚩 3. Load the tokenizer and model; fix the pad token issue
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token           # ✅ Set the pad token
tokenizer.padding_side = "right"                    # ✅ Padding side matters

model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
model.config.pad_token_id = tokenizer.pad_token_id  # ✅ Set explicitly
model.gradient_checkpointing_enable()  # ✅ Reduce memory usage

# 🚩 4. Preprocess the data
def preprocess_function(example):
    return tokenizer(
        example["text"],
        truncation=True,
        padding='max_length',
        max_length=128
    )

tokenized_dataset = dataset.map(preprocess_function)
tokenized_dataset = tokenized_dataset.remove_columns(["text", "label"])
tokenized_dataset = tokenized_dataset.rename_column("label_id", "labels")
tokenized_dataset.set_format("torch")

# 🚩 5. Evaluation metrics
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = logits.argmax(axis=-1)
    acc = accuracy_score(labels, predictions)
    f1 = f1_score(labels, predictions, average="macro")
    return {
        "accuracy": acc,
        "f1": f1,
    }

# 🚩 6. Training arguments
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    save_total_limit=1,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=16,
    num_train_epochs=3,
    learning_rate=3e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.06,
    weight_decay=0.01,
    logging_steps=50,
    save_steps=2000,
    do_train=True,
    do_eval=True,
    logging_dir="./logs",
    fp16=True,
    report_to="none",
)

# 🚩 7. Set up the Trainer
from transformers import Trainer, DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer, pad_to_multiple_of=8)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"].select(range(1000)),  # Small subset for a quick run
    eval_dataset=tokenized_dataset["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

# 🚩 8. Start training
trainer.train()

# 🚩 9. Evaluate
metrics = trainer.evaluate()
print("📊 Evaluation results:", metrics)


This error keeps appearing:  
ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
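
For context, a sequence-classification head on a decoder-only model typically reads the logits at the last non-padding token of each sequence, so it needs a pad token id whenever a batch holds more than one sequence. The sketch below is a simplified pure-Python illustration of that idea, not the actual transformers implementation. `last_token_index` is a hypothetical helper, and pad id 2 is an arbitrary value chosen for the example.

```python
from typing import List, Optional

def last_token_index(batch: List[List[int]], pad_token_id: Optional[int]) -> List[int]:
    """Index of the last non-pad token per sequence (illustrative helper, right padding)."""
    if pad_token_id is None:
        if len(batch) > 1:
            # Without a pad id there is no way to tell real tokens from padding
            raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
        return [len(batch[0]) - 1]
    indices = []
    for ids in batch:
        non_pad = [i for i, tok in enumerate(ids) if tok != pad_token_id]
        indices.append(non_pad[-1] if non_pad else 0)
    return indices

batch = [
    [5, 6, 7, 2, 2],   # three real tokens, then padding (pad id 2 is arbitrary here)
    [8, 9, 2, 2, 2],   # two real tokens, then padding
]
print(last_token_index(batch, pad_token_id=2))

try:
    last_token_index(batch, pad_token_id=None)
except ValueError as err:
    print("raised:", err)
```

If the error persists, one thing worth checking is that `model.config.pad_token_id` is not `None` on the exact model object passed to `Trainer`, for example by printing it right before `trainer.train()`.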

LoRA + TinyLlama
-----------

In [None]:
# ⚙️ 0. Set up the environment
import os
os.environ["WANDB_DISABLED"] = "true"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
torch.cuda.empty_cache()
torch.cuda.ipc_collect()

# ✅ Log in to Hugging Face
from kaggle_secrets import UserSecretsClient
from huggingface_hub import login

user_secrets = UserSecretsClient()
hf_token = user_secrets.get_secret("kaggle-llama-token")
login(token=hf_token)

# ✅ Install peft (only needed on Colab)
# %pip install -q peft

# 📥 1. Load and preprocess the data
import pandas as pd
from datasets import Dataset
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("/kaggle/input/llm-classification-finetuning/train.csv")

def get_label(row):
    if row["winner_model_a"] == 1:
        return "A"
    elif row["winner_model_b"] == 1:
        return "B"
    else:
        return "tie"

df["label"] = df.apply(get_label, axis=1)
df["text"] = df.apply(lambda x: f"[PROMPT]: {x['prompt']}\n\n[RESPONSE A]: {x['response_a']}\n\n[RESPONSE B]: {x['response_b']}", axis=1)
df = df[["text", "label"]].dropna().reset_index(drop=True)

le = LabelEncoder()
df["label_id"] = le.fit_transform(df["label"])
dataset = Dataset.from_pandas(df).train_test_split(test_size=0.05, seed=42)

# 🧠 2. Tokenizer + Dataset
from transformers import AutoTokenizer

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

def preprocess(example):
    return tokenizer(example["text"], truncation=True, padding="max_length", max_length=256)

tokenized = dataset.map(preprocess)
tokenized = tokenized.remove_columns(["text", "label"])
tokenized = tokenized.rename_column("label_id", "labels")
tokenized.set_format("torch")

# 🔧 3. Load the model and apply LoRA
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model, TaskType

base_model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)
base_model.config.pad_token_id = tokenizer.pad_token_id
base_model.gradient_checkpointing_enable()  # ✅ Reduce memory usage

# ✅ LoRA configuration
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.SEQ_CLS
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# 🧪 4. metrics & collator
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = logits.argmax(axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="macro"),
    }

from transformers import DataCollatorWithPadding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, pad_to_multiple_of=8)

# 🏋️ 5. Training configuration
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    save_total_limit=1,
    per_device_train_batch_size=1,       # ✅ A small batch works with LoRA
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    learning_rate=3e-5,
    warmup_ratio=0.06,
    weight_decay=0.01,
    logging_steps=50,
    save_steps=500,
    fp16=True,
    report_to="none",
)

# ✅ Use fewer training examples for a test run (remove this line to train on the full set)
small_train = tokenized["train"].select(range(500))

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=small_train,
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

trainer.train()

metrics = trainer.evaluate()
print("📊 Evaluation metrics:", metrics)

pred = trainer.predict(tokenized["test"].select([0]))
print(pred.predictions.argmax(axis=-1))  # Predicted label id (0, 1, 2)


Map:   0%|          | 0/54603 [00:00<?, ? examples/s]

Map:   0%|          | 0/2874 [00:00<?, ? examples/s]

Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at TinyLlama/TinyLlama-1.1B-Chat-v1.0 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 1,132,544 || all params: 1,035,651,072 || trainable%: 0.1094
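
The trainable-parameter count printed above can be reproduced by hand. The sketch assumes TinyLlama-1.1B's published configuration (hidden size 2048, 22 layers, 4 key/value heads with head dimension 64, so `v_proj` maps 2048 to 256) and assumes that PEFT keeps the 3-class `score` head trainable for `TaskType.SEQ_CLS`. None of these figures is stated in the notebook itself.

```python
# Assumed TinyLlama-1.1B architecture values (not stated in the notebook)
hidden_size = 2048
num_layers = 22
kv_dim = 4 * 64          # 4 key/value heads x head dimension 64
num_labels = 3
r = 8                    # LoRA rank from the LoraConfig above

def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA adapter adds matrix A (rank x in) and matrix B (out x rank)."""
    return rank * in_features + out_features * rank

q_proj = lora_params(hidden_size, hidden_size, r)   # q_proj: 2048 -> 2048
v_proj = lora_params(hidden_size, kv_dim, r)        # v_proj: 2048 -> 256
lora_total = num_layers * (q_proj + v_proj)

score_head = hidden_size * num_labels               # bias-free classification head
trainable = lora_total + score_head

print("LoRA parameters:", lora_total)
print("score head parameters:", score_head)
print("trainable parameters:", trainable)
print(f"trainable %: {100 * trainable / 1_035_651_072:.4f}")
```

Under these assumptions the total comes to 1,132,544, which matches the `trainable params` figure in the log.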


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.




Training on 100 examples:
-------------------------
* TrainOutput(global_step=26, training_loss=1.682946278498723, metrics={'train_runtime': 1894.004, 'train_samples_per_second': 0.106, 'train_steps_per_second': 0.014, 'total_flos': 298019350118400.0, 'train_loss': 1.682946278498723, 'epoch': 2.0}) 

* Evaluation metrics: {'eval_loss': 1.6719685792922974, 'eval_accuracy': 0.3298538622129436, 'eval_f1': 0.32163270505624236, 'eval_runtime': 383.1717, 'eval_samples_per_second': 7.501, 'eval_steps_per_second': 0.94, 'epoch': 2.0} 

* [1]
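
The printed `[1]` is a class id, not a label. `LabelEncoder` assigns ids in sorted order of the class names, so the mapping can be sketched with plain `sorted`, used here as a stand-in for scikit-learn. The variable names are illustrative.

```python
# Stand-in for sklearn's LabelEncoder: classes are sorted, ids follow that order
labels = ["A", "B", "tie"]
classes = sorted(set(labels))                      # uppercase sorts before lowercase
label_to_id = {name: i for i, name in enumerate(classes)}
id_to_label = {i: name for name, i in label_to_id.items()}

print(label_to_id)
print("class id 1 decodes to:", id_to_label[1])
```

In the notebook itself, `le.inverse_transform([1])` performs the same decoding.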


Time: about 30 minutes

Training on 500 examples:
-------------------------
Time: about 2.5 hours  
Failure reason: the network connection dropped
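
The throughput reported for the 100-example run gives a rough way to estimate run time before starting a longer job. The sketch below uses that run's `train_samples_per_second` and `eval_runtime`. It assumes throughput stays about the same at 500 examples, which is only an approximation, and `estimate_hours` is a hypothetical helper.

```python
# Figures taken from the 100-example run logged above
train_samples_per_second = 0.106
eval_runtime_s = 383.1717
num_epochs = 2

def estimate_hours(num_train_examples: int) -> float:
    """Rough wall-clock estimate: training at the logged throughput plus one evaluation."""
    train_s = num_train_examples * num_epochs / train_samples_per_second
    return (train_s + eval_runtime_s) / 3600

for n in (100, 500):
    print(f"{n} examples: about {estimate_hours(n):.1f} hours")
```

For 500 examples this lands near the 2.5 hours observed before the connection dropped. Note also that with `save_steps=500` and at most 250 optimizer steps (500 examples × 2 epochs / 4 accumulation steps on a single device), no intermediate checkpoint would be written, so a lower `save_steps` may help such runs survive interruptions.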