## Model Fine-Tuning

### 1. Environment Setup

In [1]:
%%capture
import os

if "COLAB_" in "".join(os.environ.keys()):
    !pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
    !pip install --no-deps unsloth
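
The install cell runs only when an environment variable name contains `COLAB_`, and the same check appears again in the saving steps. A minimal sketch of that heuristic as a reusable helper follows; `running_in_colab` is a hypothetical name introduced here, not part of Unsloth or Colab, and it checks each variable name separately instead of joining them.

```python
import os


def running_in_colab(environ=None):
    """Return True when any environment variable name contains 'COLAB_'."""
    # `environ` is injectable so the check can be exercised outside Colab.
    env = os.environ if environ is None else environ
    return any("COLAB_" in key for key in env)


print(running_in_colab({"COLAB_GPU": "1"}))  # simulated Colab environment
print(running_in_colab({"HOME": "/root"}))   # simulated local environment
```

With such a helper, each `if "COLAB_" in "".join(os.environ.keys()):` line could be written as `if running_in_colab():`.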

### 2. Download the Model

In [2]:
from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE Scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True  # Use 4bit quantization to reduce memory usage. Can be False.

qwen_models = [
    "unsloth/Qwen2.5-Coder-32B-Instruct",  # Qwen 2.5 Coder 2x faster
    "unsloth/Qwen2.5-Coder-7B",
    "unsloth/Qwen2.5-14B-Instruct",  # 14B fits in a 16GB card
    "unsloth/Qwen2.5-7B",
    "unsloth/Qwen2.5-7B-Instruct",
    "unsloth/Qwen2.5-7B-Instruct-unsloth-bnb-4bit",
    "unsloth/Qwen2.5-72B-Instruct",  # 72B fits in a 48GB card
]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Unsloth: Failed to patch Gemma3ForConditionalGeneration.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.19: Fast Qwen2 patching. Transformers: 4.51.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


### 3. Configure LoRA Fine-Tuning Parameters

Set the PEFT (parameter-efficient fine-tuning) parameters, using Unsloth's defaults.

In [3]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", ],
    lora_alpha=16,
    lora_dropout=0,  # Supports any, but = 0 is optimized
    bias="none",  # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing="unsloth",  # True or "unsloth" for very long context
    random_state=3407,
    use_rslora=False,  # We support rank stabilized LoRA
    loftq_config=None,  # And LoftQ
)

Unsloth 2025.3.19 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
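
The number of weights that LoRA adds can be worked out by hand: each targeted linear layer gains two low-rank matrices with `r * (in_features + out_features)` parameters in total. The sketch below does that arithmetic for `r=16` and the seven target modules. The layer dimensions are assumptions taken from the published Qwen2.5-7B configuration (hidden size 3584, MLP size 18944, 4 key/value heads of dimension 128, 28 layers), not values read from the loaded model.

```python
# Assumed Qwen2.5-7B dimensions, taken from the published model config.
HIDDEN = 3584
INTERMEDIATE = 18944
KV_DIM = 4 * 128   # 4 key/value heads x head_dim 128
NUM_LAYERS = 28
RANK = 16          # r in get_peft_model

# (in_features, out_features) of each targeted linear layer
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    # A is (rank x in_features) and B is (out_features x rank); bias="none" adds nothing.
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
print(f"{per_layer:,} per layer, {total:,} total")
```

Under these assumptions the total is 40,370,176, which agrees with the trainable-parameter count that Unsloth prints when training starts in section 5.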


### 4. Prepare the Fine-Tuning Dataset

Get the chat template.

In [4]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template="qwen-2.5",
)


def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False) for convo in convos]
    return {"text": texts, }
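
To make the effect of the chat template concrete, here is a simplified stand-in for what `apply_chat_template` produces with the `qwen-2.5` template. It is a sketch only: `render_chatml` is a hypothetical helper, and it ignores tool calls and the default system prompt that the real template can insert. The example messages are invented for illustration.

```python
def render_chatml(messages, add_generation_prompt=False):
    """Simplified ChatML rendering: one <|im_start|>...<|im_end|> block per message."""
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here at inference time.
        text += "<|im_start|>assistant\n"
    return text


example = [
    {"role": "system", "content": "You are a naming assistant."},
    {"role": "user", "content": "Name a function that fetches sales forecasts."},
    {"role": "assistant", "content": "getSalesForecast|fetchSalesPrediction"},
]
print(render_chatml(example))
```

For training, the full conversation is rendered with `add_generation_prompt=False`; for inference, only the system and user turns are rendered and the generation prompt is appended.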

Load the fine-tuning dataset.

In [5]:
from datasets import load_dataset

dataset = load_dataset("MoChenYa/code-nomist-llm-dataset", name="default", split="train")

Check the number of rows and columns in the raw dataset.

In [6]:
dataset.shape

(1286, 2)

Convert the dataset to a multi-turn chat format.

In [7]:
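# English gloss of the system prompt (kept in Chinese to match the dataset):
# "Act as a code naming assistant. Based on the project information and requirements
# given by the user, generate several naming suggestions separated by |, and nothing else."
system_prompt_content = "请充当一个代码命名助手，请根据用户给出的项目信息和具体需求生成多个命名建议，名称之间使用 | 分隔，注意不要生成其他任何内容。"


def formatting_dataset_table2conv_func(example):
    # Called once per row (non-batched map), so `example` is a single record.
    system_prompt = {
        "role": "system",
        "content": system_prompt_content,
    }
    user_message = {
        "role": "user",
        "content": example["question"],
    }
    assistant_message = {
        "role": "assistant",
        "content": example["answer"],
    }
    conversations = [system_prompt, user_message, assistant_message]
    return {"conversations": conversations}


dataset = dataset.map(formatting_dataset_table2conv_func)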

Apply the chat template to the dataset.

In [8]:
from unsloth.chat_templates import standardize_sharegpt

dataset = standardize_sharegpt(dataset)
dataset = dataset.map(formatting_prompts_func, batched=True, )

Inspect the first record to confirm that formatting succeeded.

In [9]:
dataset[0]

{'question': '项目类型：销售管理系统；项目介绍：跟踪销售活动，提高销售业绩和预测准确性。；当前模块：销售预测模块；目标名称类型：函数名；格式化类型：驼峰命名（首字母小写）；目标描述：获取销售预测数据；生成数量：5；',
 'answer': 'getSalesForecast|fetchSalesPrediction|retrieveForecastData|obtainSalesForecast|acquirePredictionData',
 'conversations': [{'content': '请充当一个代码命名助手，请根据用户给出的项目信息和具体需求生成多个命名建议，名称之间使用 | 分隔，注意不要生成其他任何内容。',
   'role': 'system'},
  {'content': '项目类型：销售管理系统；项目介绍：跟踪销售活动，提高销售业绩和预测准确性。；当前模块：销售预测模块；目标名称类型：函数名；格式化类型：驼峰命名（首字母小写）；目标描述：获取销售预测数据；生成数量：5；',
   'role': 'user'},
  {'content': 'getSalesForecast|fetchSalesPrediction|retrieveForecastData|obtainSalesForecast|acquirePredictionData',
   'role': 'assistant'}],
 'text': '<|im_start|>system\n请充当一个代码命名助手，请根据用户给出的项目信息和具体需求生成多个命名建议，名称之间使用 | 分隔，注意不要生成其他任何内容。<|im_end|>\n<|im_start|>user\n项目类型：销售管理系统；项目介绍：跟踪销售活动，提高销售业绩和预测准确性。；当前模块：销售预测模块；目标名称类型：函数名；格式化类型：驼峰命名（首字母小写）；目标描述：获取销售预测数据；生成数量：5；<|im_end|>\n<|im_start|>assistant\ngetSalesForecast|fetchSalesPrediction|retrieveForecastData|obtainSalesForecast|acquirePredictionData<|im_end

### 5. Train the Model

Create the Hugging Face SFT trainer.

In [10]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc=1,
    packing=False,  # Can make training 5x faster for short sequences.
    args=TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,  # Fixed major bug in latest Unsloth
        warmup_steps=5,
        # num_train_epochs = 1, # Set this for 1 full training run.
        max_steps=30,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="paged_adamw_8bit",  # Save more memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  # Use this for WandB etc
    ),
)
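
With `lr_scheduler_type="linear"`, `warmup_steps=5`, `max_steps=30`, and `learning_rate=2e-4`, the learning rate ramps up linearly for 5 steps and then decays linearly to zero. The sketch below reproduces the shape of that schedule in plain Python. It is written from scratch for illustration and does not call the Transformers scheduler, so treat the exact step indexing as an assumption.

```python
def linear_lr(step, base_lr=2e-4, warmup_steps=5, total_steps=30):
    """Learning rate after `step` optimizer updates have completed."""
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


for step in (0, 1, 5, 10, 30):
    print(step, f"{linear_lr(step):.2e}")
```

Because the run is only 30 steps long, the warmup takes up one sixth of it; longer runs would normally keep the warmup a much smaller fraction.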

Compute the loss on the assistant responses only.

In [11]:
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|im_start|>user\n",
    response_part="<|im_start|>assistant\n",
)

Check the result of this setting.

In [12]:
tokenizer.decode(trainer.train_dataset[0]["input_ids"])

'<|im_start|>system\n请充当一个代码命名助手，请根据用户给出的项目信息和具体需求生成多个命名建议，名称之间使用 | 分隔，注意不要生成其他任何内容。<|im_end|>\n<|im_start|>user\n项目类型：销售管理系统；项目介绍：跟踪销售活动，提高销售业绩和预测准确性。；当前模块：销售预测模块；目标名称类型：函数名；格式化类型：驼峰命名（首字母小写）；目标描述：获取销售预测数据；生成数量：5；<|im_end|>\n<|im_start|>assistant\ngetSalesForecast|fetchSalesPrediction|retrieveForecastData|obtainSalesForecast|acquirePredictionData<|im_end|>\n'

In [13]:
space = tokenizer(" ", add_special_tokens=False).input_ids[0]
tokenizer.decode([space if x == -100 else x for x in trainer.train_dataset[0]["labels"]])

'                                                                                                              getSalesForecast|fetchSalesPrediction|retrieveForecastData|obtainSalesForecast|acquirePredictionData<|im_end|>\n'
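
The decoded labels above show that everything before the assistant reply has been replaced by the ignore index `-100`. A minimal sketch of that masking idea on toy token ids follows. It is an illustration under simplified assumptions, not Unsloth's actual implementation of `train_on_responses_only`; `find_all` and `mask_non_responses` are hypothetical helpers.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def find_all(seq, pattern):
    """Start indices of every occurrence of `pattern` in `seq`."""
    n = len(pattern)
    return [i for i in range(len(seq) - n + 1) if seq[i:i + n] == pattern]


def mask_non_responses(input_ids, instruction_ids, response_ids):
    labels = [IGNORE_INDEX] * len(input_ids)
    instruction_starts = find_all(input_ids, instruction_ids)
    for start in find_all(input_ids, response_ids):
        begin = start + len(response_ids)  # first token of the assistant reply
        # The reply ends where the next user turn begins, or at the end of the sequence.
        end = next((i for i in instruction_starts if i >= begin), len(input_ids))
        labels[begin:end] = input_ids[begin:end]
    return labels


# Toy token ids: 1 = user marker, 2 = assistant marker, everything else = content.
ids = [9, 9, 1, 5, 6, 2, 7, 8, 3]
print(mask_non_responses(ids, instruction_ids=[1], response_ids=[2]))
```

Only the tokens after the assistant marker keep their ids as labels, so the system prompt and the user request contribute nothing to the loss.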

Start training.

In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,286 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 40,370,176/7,000,000,000 (0.58% trained)


| Step | Training Loss |
|------|---------------|
| 1 | 0.9713 |
| 2 | 1.8048 |
| 3 | 1.5588 |
| 4 | 1.3394 |
| 5 | 1.0557 |
| 6 | 1.4145 |
| 7 | 1.4825 |
| 8 | 1.0985 |
| 9 | 0.9754 |
| 10 | 0.755 |
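
The training banner reports a total batch size of 4 and 30 total steps. The arithmetic below shows how much of the dataset such a run actually covers. It is a rough sketch that uses only the values from the configuration and the banner; the exact step count the trainer would use for a full epoch may differ slightly because of rounding.

```python
import math

num_examples = 1286     # rows in the training split
per_device_batch = 1
grad_accum_steps = 4
num_gpus = 1
max_steps = 30

effective_batch = per_device_batch * grad_accum_steps * num_gpus
examples_seen = effective_batch * max_steps
steps_per_epoch = math.ceil(num_examples / effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"examples seen in {max_steps} steps: {examples_seen} "
      f"({examples_seen / num_examples:.1%} of one epoch)")
print(f"steps for one full epoch: about {steps_per_epoch}")
```

With `max_steps=30` the model sees 120 examples, well under a tenth of the dataset, so this run is a quick smoke test. Setting `num_train_epochs=1` instead of `max_steps` would train on the whole split.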


### 6. Test the Model

#### 6.1. Load a Saved Adapter (Optional)

In [None]:
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",  # Directory or Hub repo that holds the saved LoRA adapter
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

#### 6.2. Define the Message Generation Functions

In [15]:
def create_user_content(name, introduce, module, target_type, format_type, target_desc, num):
    return f"项目类型：{name}；项目介绍：{introduce}；当前模块：{module}；目标名称类型：{target_type}；格式化类型：{format_type}；目标描述：{target_desc}；生成数量：{num}；"

In [20]:
from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(
    tokenizer,
    chat_template="qwen-2.5",
)
FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

def test_message(**args):
    messages = [
        {"role": "system", "content": system_prompt_content},
        {"role": "user", "content": create_user_content(**args)},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,  # Must add for generation
        return_tensors="pt",
    ).to("cuda")

    outputs = model.generate(input_ids=inputs, max_new_tokens=64, use_cache=True, temperature=1.7, min_p=0.1)
    print(tokenizer.batch_decode(outputs))

#### 6.3. Test Case

In [24]:
test_message(
    name="在线教育平台",
    introduce="在线教育平台是一个提供在线学习和教学服务的网站或应用程序，用户可以通过它访问各种课程、学习资源和教师支持。",
    module="课程管理",
    target_type="类名",
    format_type="驼峰命名（首字母大写）",
    target_desc="课程目录实体类 + DO",
    num=5,
)

['<|im_start|>system\n请充当一个代码命名助手，请根据用户给出的项目信息和具体需求生成多个命名建议，名称之间使用 | 分隔，注意不要生成其他任何内容。<|im_end|>\n<|im_start|>user\n项目类型：在线教育平台；项目介绍：在线教育平台是一个提供在线学习和教学服务的网站或应用程序，用户可以通过它访问各种课程、学习资源和教师支持。；当前模块：课程管理；目标名称类型：类名；格式化类型：驼峰命名（首字母大写）；目标描述：课程目录实体类 + DO；生成数量：5；<|im_end|>\n<|im_start|>assistant\nCourseDirectoryEntity|CourseDirectoryDO|CourseDirEntity|CourseDirDO|CourseDirInfoDO<|im_end|>']
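
`batch_decode` returns the whole conversation, prompt included, so the names still have to be pulled out of the final assistant turn. A minimal sketch of that post-processing follows. `extract_names` is a hypothetical helper, and the sample string is shortened by hand, with `...` standing in for the user message.

```python
def extract_names(decoded):
    """Pull the '|'-separated names out of one decoded ChatML string."""
    # Keep only the text after the last assistant marker, then cut at the end marker.
    reply = decoded.rsplit("<|im_start|>assistant\n", 1)[-1]
    reply = reply.split("<|im_end|>", 1)[0]
    return [name.strip() for name in reply.split("|") if name.strip()]


sample = (
    "<|im_start|>user\n...<|im_end|>\n"
    "<|im_start|>assistant\n"
    "CourseDirectoryEntity|CourseDirectoryDO|CourseDirEntity|CourseDirDO|CourseDirInfoDO<|im_end|>"
)
print(extract_names(sample))
```

Splitting on the last assistant marker matters here because the system prompt itself contains a `|` character, which would otherwise end up in the result.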


### 7. Save the Model

#### 7.1. Configuration

In [25]:
new_model_name = "CodeNomist-Qwen2.5-7B-Instruct-unsloth"
hf_repo = "MoChenYa/CodeNomist-Qwen2.5-7B-Instruct-unsloth"
hf_token = "hf_xxx"
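
Hard-coding a real token in a notebook makes it easy to leak when the notebook is shared. One common alternative, sketched here, is to read it from the `HF_TOKEN` environment variable. `load_hf_token` is a hypothetical helper introduced for illustration, and how the variable gets set (shell, Colab secrets, and so on) is left open.

```python
import os


def load_hf_token(environ=None):
    """Read the token from HF_TOKEN instead of hard-coding it in the notebook."""
    # `environ` is injectable so the function can be tested without touching os.environ.
    env = os.environ if environ is None else environ
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable before pushing to the Hub.")
    return token


print(load_hf_token({"HF_TOKEN": "hf_xxx"}))  # placeholder value, as in the cell above
```

In the notebook this would replace the literal assignment with `hf_token = load_hf_token()`.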

#### 7.2. Save the Adapter

In [26]:
if "COLAB_" in "".join(os.environ.keys()):
    model.push_to_hub(hf_repo, token = hf_token) # Online saving
    tokenizer.push_to_hub(hf_repo, token = hf_token) # Online saving
else:
    model.save_pretrained(new_model_name)
    tokenizer.save_pretrained(new_model_name)

Saved model to https://huggingface.co/MoChenYa/CodeNomist-Qwen2.5-7B-Instruct-unsloth

#### 7.3. Merge the Adapter and Save Locally

In [27]:
model.save_pretrained_merged(new_model_name, tokenizer, save_method = "merged_16bit")

if "COLAB_" in "".join(os.environ.keys()):
    model.push_to_hub_merged(hf_repo, tokenizer, save_method = "merged_16bit", token = hf_token)

Unsloth: Kaggle/Colab has limited disk space. We need to delete the downloaded
model which will save 4-16GB of disk space, allowing you to save on Kaggle/Colab.
Unsloth: Will remove a cached repo with size 7.1G


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 30.74 out of 50.99 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


 25%|██▌       | 7/28 [00:00<00:01, 12.84it/s]
We will save to Disk and not RAM now.
100%|██████████| 28/28 [00:26<00:00,  1.07it/s]


Unsloth: Saving tokenizer... Done.
Done.


Unsloth: You are pushing to hub, but you passed your HF username = MoChenYa.
We shall truncate MoChenYa/CodeNomist-Qwen2.5-7B-Instruct-unsloth to CodeNomist-Qwen2.5-7B-Instruct-unsloth


Unsloth: Merging 4bit and LoRA weights to 16bit...
Unsloth: Will use up to 30.74 out of 50.99 RAM for saving.
Unsloth: Saving model... This might take 5 minutes ...


100%|██████████| 28/28 [00:25<00:00,  1.10it/s]


Unsloth: Saving tokenizer... Done.


Done.
Saved merged model to https://huggingface.co/MoChenYa/CodeNomist-Qwen2.5-7B-Instruct-unsloth


#### 7.4. Save or Push in GGUF Format

In [None]:
if "COLAB_" in "".join(os.environ.keys()):
    model.push_to_hub_gguf(
        hf_repo,
        tokenizer,
        quantization_method = ["f16", "q8_0"],
        token = hf_token,
    )
else:
    model.save_pretrained_gguf(new_model_name, tokenizer)