In [None]:
### Environment setup
# Special installation flow for running in Google Colab
# First install the core libraries without resolving their dependencies (--no-deps)
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
# Install common NLP and model-hosting utilities
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
# Finally install unsloth itself, again without dependencies (to avoid version conflicts)
!pip install --no-deps unsloth



In [None]:
from unsloth import FastLanguageModel
import torch
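model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-0.6B",
    max_seq_length = 2048, # maximum context length used for training
    load_in_4bit = True, # 4-bit quantization: weights take roughly 1/4 of their 16-bit memory, so fine-tuning fits on a 16 GB GPU
)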

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.3: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
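The `load_in_4bit` comment above says 4-bit loading cuts memory to about a quarter. A back-of-the-envelope check for the weights alone, assuming a nominal 0.6 billion parameters (taken from the model name, not measured):

```python
# Rough weight-memory estimate at different precisions.
# Assumption: ~0.6e9 parameters (nominal size from the name "Qwen3-0.6B").
# Ignores activations, LoRA weights, optimizer state, KV cache and
# quantization metadata (e.g. per-block scale factors).

def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights, in gigabytes (1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

NUM_PARAMS = 0.6e9  # assumed nominal parameter count

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.2f} GB")
print(f" 4-bit weights: {nf4_gb:.2f} GB")
print(f"ratio: {fp16_gb / nf4_gb:.1f}x")
```

The factor of 4 applies to the stored weights only; everything else listed in the comments comes on top, so the real footprint on the T4 shown in the log is larger.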


In [None]:
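# Add LoRA adapters
# With LoRA, only about 1-10% of the parameters need to be updated for effective fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # LoRA rank; suggested values: 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32, # LoRA alpha; suggested to equal the rank or rank * 2
    lora_dropout = 0, # any value works, but 0 is the optimized setting
    bias = "none", # bias setting; "none" is optimized
    use_gradient_checkpointing = "unsloth", # gradient checkpointing, reduces memory for long contexts
    random_state = 3407, # random seed
    use_rslora = False, # whether to use rank-stabilized LoRA (rsLoRA)
    loftq_config = None, # LoftQ configuration
)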

Unsloth 2025.5.3 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.
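To see how small the trainable part is, the LoRA parameter count can be estimated by hand. For a frozen linear layer mapping `d_in` to `d_out`, LoRA trains two matrices with `r * (d_in + d_out)` parameters in total. The layer sizes below are assumptions for Qwen3-0.6B (verify them against `model.config`); only the 28 layers are confirmed by the log above.

```python
# Estimate trainable LoRA parameters for the get_peft_model call above.
# Assumed Qwen3-0.6B sizes: hidden_size=1024, 16 query heads and 8 key/value
# heads with head_dim=128, intermediate_size=3072, 28 decoder layers.
import math

HIDDEN = 1024
Q_DIM = 16 * 128      # query projection output size
KV_DIM = 8 * 128      # key/value projection output size
INTERMEDIATE = 3072
NUM_LAYERS = 28

# (d_in, d_out) for each entry in target_modules
shapes = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(d_in: int, d_out: int, r: int) -> int:
    """A is (r x d_in), B is (d_out x r); neither has a bias."""
    return r * (d_in + d_out)

r, alpha = 32, 32
per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * NUM_LAYERS
print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")

# Scale applied to the update B @ A: alpha / r normally,
# alpha / sqrt(r) when use_rslora=True
print(f"standard LoRA scale: {alpha / r}")
print(f"rsLoRA scale:        {alpha / math.sqrt(r):.3f}")
```

Under these assumptions the adapters add about 20 million parameters, a few percent of the nominal 0.6 billion. With `lora_alpha` equal to `r` the standard scale is 1, which is one reason the comment suggests tying alpha to the rank.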


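## Data preparation
Qwen3 has both a reasoning (thinking) mode and a non-reasoning mode, so we fine-tune on two datasets:
1. OpenMathReasoning dataset: for mathematical reasoning
2. FineTome-100k dataset: for general conversation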

In [None]:
!pip install -U datasets



In [None]:
from datasets import load_dataset
# Load the math reasoning dataset (chain-of-thought split)
reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split = "cot")
# Load the conversation dataset
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split = "train")

In [None]:
# Convert the reasoning dataset to conversation format:
# turn each math problem and its solution into a user/assistant exchange.
# Args:
#   examples: a batch of samples containing problems and solutions
# Returns:
#   a dict with a "conversations" column
def generate_conversation(examples):
    problems = examples["problem"]
    solutions = examples["generated_solution"]
    conversations = []
    for problem, solution in zip(problems, solutions):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}

In [None]:
# Apply the chat template to the converted reasoning dataset
reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched = True)["conversations"],
    tokenize = False, # don't tokenize, only apply the template
)

Map:   0%|          | 0/19252 [00:00<?, ? examples/s]
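The two steps above (batched conversion, then templating without tokenization) can be sketched in plain Python on a toy batch. This is an illustration under stated assumptions: the problem/solution text is made up, `to_conversations` is a hypothetical helper that mirrors the conversion function defined above, and `to_chatml` is a hypothetical, simplified formatter that only reproduces the `<|im_start|>`/`<|im_end|>` framing visible in the sample output. The real Qwen3 template (`tokenizer.chat_template`) also handles system prompts, tool calls, `<think>` blocks and generation prompts.

```python
# Illustrative sketch, not the tokenizer's actual template.

def to_conversations(examples):
    """Batched conversion: dict of column lists -> list of user/assistant pairs."""
    return {
        "conversations": [
            [
                {"role": "user", "content": problem},
                {"role": "assistant", "content": solution},
            ]
            for problem, solution in zip(examples["problem"], examples["generated_solution"])
        ]
    }

def to_chatml(messages):
    """Join messages into simplified ChatML-style turns."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )

# A batch shaped the way Dataset.map(batched=True) passes it: each column is a list.
toy_batch = {
    "problem": ["What is 2 + 3?"],
    "generated_solution": ["<think>\n2 + 3 = 5\n</think>\n\nThe answer is 5."],
}
conversations = to_conversations(toy_batch)["conversations"]
text = to_chatml(conversations[0])
print(text)
```

With `tokenize = False` the template returns strings like `text`, which is why the next cell can print a sample directly.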

In [None]:
# Inspect one sample
reasoning_conversations[0]

"<|im_start|>user\nGiven $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.<|im_end|>\n<|im_start|>assistant\n<think>\nOkay, let's see. I need to solve the equation √(x² + 165) - √(x² - 52) = 7, and find all positive values of x. Hmm, radicals can be tricky, but maybe if I can eliminate the square roots by squaring both sides. Let me try that.\n\nFirst, let me write down the equation again to make sure I have it right:\n\n√(x² + 165) - √(x² - 52) = 7.\n\nOkay, so the idea is to isolate one of the radicals and then square both sides. Let me try moving the second radical to the other side:\n\n√(x² + 165) = 7 + √(x² - 52).\n\nNow, if I square both sides, maybe I can get rid of the square roots. Let's do that:\n\n(√(x² + 165))² = (7 + √(x² - 52))².\n\nSimplifying the left side:\n\nx² + 165 = 49 + 14√(x² - 52) + (√(x² - 52))².\n\nThe right side is expanded using the formula (a + b)² = a² + 2ab + b². So the right side becomes 7² + 2*7*√(x² - 52) + (√(x² 

In [None]:
# Standardize the non-reasoning dataset (ShareGPT format) into the standard conversation format
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(non_reasoning_dataset)
# Apply the chat template to the standardized non-reasoning dataset
non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize = False,
)


In [None]:
# Inspect the first converted non-reasoning sample
non_reasoning_conversations[0]

'<|im_start|>user\nExplain what boolean operators are, what they do, and provide examples of how they can be used in programming. Additionally, describe the concept of operator precedence and provide examples of how it affects the evaluation of boolean expressions. Discuss the difference between short-circuit evaluation and normal evaluation in boolean expressions and demonstrate their usage in code. \n\nFurthermore, add the requirement that the code must be written in a language that does not support short-circuit evaluation natively, forcing the test taker to implement their own logic for short-circuit evaluation.\n\nFinally, delve into the concept of truthiness and falsiness in programming languages, explaining how it affects the evaluation of boolean expressions. Add the constraint that the test taker must write code that handles cases where truthiness and falsiness are implemented differently across different programming languages.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</th
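The core idea of the standardization step is a renaming of ShareGPT keys and speaker labels into the `role`/`content` form the chat template expects. The sketch below is a simplified, hypothetical version, not unsloth's `standardize_sharegpt` implementation; it assumes turns are stored as `{"from": ..., "value": ...}` with `human`/`gpt` speakers, and the example text is made up.

```python
# Simplified sketch of ShareGPT -> role/content standardization.
# Assumption: speakers are "system", "human" or "gpt"; any other label
# raises KeyError here, whereas a real implementation may handle more cases.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def standardize_turns(turns):
    """Convert [{'from', 'value'}, ...] into [{'role', 'content'}, ...]."""
    return [{"role": ROLE_MAP[t["from"]], "content": t["value"]} for t in turns]

sharegpt_example = [
    {"from": "human", "value": "What is a boolean operator?"},
    {"from": "gpt", "value": "An operator such as AND, OR or NOT."},
]
standardized = standardize_turns(sharegpt_example)
print(standardized)
```

Once every conversation uses `role`/`content`, both datasets go through the same `apply_chat_template` call and end up in one consistent text format.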

In [None]:
# Check the sizes of both datasets
print(len(reasoning_conversations))
print(len(non_reasoning_conversations))

19252
100000


In [None]:
# Set the share of chat data in the training mix:
# the combined data will be 25% reasoning and 75% chat
chat_percentage = 0.75
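`chat_percentage` describes a ratio, not a sample count. Assuming all 19,252 reasoning samples are kept, the number of chat samples follows from solving `n_chat / (n_chat + n_reasoning) = p` for `n_chat`. A minimal sketch (`num_chat_samples` is a hypothetical helper):

```python
# Chat samples needed so chat data is a fraction p of the combined set,
# given a fixed number of reasoning samples:
#   n_chat / (n_chat + n_reasoning) = p  =>  n_chat = n_reasoning * p / (1 - p)
def num_chat_samples(n_reasoning: int, chat_percentage: float) -> int:
    return int(n_reasoning * (chat_percentage / (1 - chat_percentage)))

n_reasoning = 19252        # len(reasoning_conversations) printed above
chat_percentage = 0.75
n_chat = num_chat_samples(n_reasoning, chat_percentage)
total = n_reasoning + n_chat
print(n_chat, total, n_chat / total)
```

With p = 0.75 this asks for three chat samples per reasoning sample, well within the 100,000 available in FineTome-100k.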

In [None]:
# Sample from the non-reasoning dataset so that chat data makes up chat_percentage (75%) of the combined mix