In [None]:
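### Environment setup
# Special installation procedure for running on Google Colab
# First install the dependency libraries without resolving their own dependencies (the --no-deps flag)
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl==0.15.2 triton cut_cross_entropy unsloth_zoo
# Install common NLP and model-hosting tools
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
# Finally install unsloth itself, again without dependencies (to avoid version conflicts)
!pip install --no-deps unsloth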



In [None]:
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen3-8B",
    max_seq_length = 2048, # Maximum context length
    load_in_4bit = True, # 4-bit quantization: cuts memory use during fine-tuning to roughly 1/4, suitable for a 16 GB GPU
)


Please restructure your imports with 'import unsloth' at the top of your file.
  from unsloth import FastLanguageModel


🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.5.6: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
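The "roughly 1/4" figure quoted for `load_in_4bit` can be sanity-checked with back-of-the-envelope arithmetic. The sketch below is a rough estimate under stated assumptions: it counts only the weights of a nominal 8-billion-parameter model, uses decimal gigabytes, and ignores quantization constants, activations, LoRA adapters, and optimizer state. `weight_memory_gb` is a hypothetical helper, not part of Unsloth.

```python
def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in decimal GB (assumption: weights only)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 8_000_000_000  # nominal parameter count of an 8B model (assumption)

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)
int4_gb = weight_memory_gb(NUM_PARAMS, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB ({int4_gb / fp16_gb:.0%} of 16-bit)")
```

Under these assumptions the 16-bit weights alone would exceed the 14.741 GB that the log reports for the Tesla T4, which is why the 4-bit path is used here.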

In [None]:
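# Add LoRA adapters
# With LoRA, only about 1-10% of the parameters need to be updated for effective fine-tuning
model = FastLanguageModel.get_peft_model(
    model,
    r = 32, # LoRA rank; suggested values are 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 32, # LoRA alpha; suggested value is rank or rank * 2
    lora_dropout = 0,
    bias = "none", # Bias setting; "none" is the optimized path
    use_gradient_checkpointing = "unsloth", # Gradient checkpointing, useful for long contexts
    random_state = 3407, # Random seed
    use_rslora = False, # Whether to use rank-stabilized LoRA
    loftq_config = None, # LoftQ configuration
)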

Unsloth 2025.5.6 patched 36 layers with 36 QKV layers, 36 O layers and 36 MLP layers.
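To make the "only a small fraction of parameters" claim concrete, the sketch below counts the weights that LoRA adds for `r = 32` across the seven target modules. The layer shapes in `MODULE_SHAPES` are assumptions about Qwen3-8B (hidden size 4096, 1024-wide key/value projections, MLP width 12288) and are not stated in this notebook; `lora_params` is a hypothetical helper.

```python
RANK = 32
NUM_LAYERS = 36  # matches "patched 36 layers" in the log above

# (in_features, out_features) per target module; assumed Qwen3-8B shapes
MODULE_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 12288),
    "up_proj": (4096, 12288),
    "down_proj": (12288, 4096),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds two matrices: A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in MODULE_SHAPES.values())
total = per_layer * NUM_LAYERS
print(f"LoRA parameters per layer: {per_layer:,}")
print(f"LoRA parameters in total:  {total:,}")
```

With these assumed shapes the total comes to 87,293,952, which agrees with the trainable-parameter count that Unsloth reports when training starts.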


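## Data preparation
Qwen3 has both a reasoning mode and a non-reasoning mode, so two datasets are needed:
1. The OpenMathReasoning dataset, for mathematical reasoning ability
2. The FineTome-100k dataset, for general conversational ability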

In [None]:
!pip install -U datasets



In [None]:
from datasets import load_dataset
# Load the math reasoning dataset
reasoning_dataset = load_dataset("unsloth/OpenMathReasoning-mini", split = "cot")
# Load the conversational dataset
non_reasoning_dataset = load_dataset("mlabonne/FineTome-100k", split = "train")

In [None]:
# Convert the reasoning dataset to conversation format:
# each math problem and its solution become one user/assistant exchange.
# Args:
#   examples: a batch of samples containing problems and solutions
# Returns:
#   a dict holding the conversations
def generate_conversation(examples):
  problems = examples["problem"]
  solutions = examples["generated_solution"]
  conversations = []
  for problem, solution in zip(problems, solutions):
    conversations.append([
        {"role": "user", "content": problem},
        {"role": "assistant", "content": solution},
    ])
  return {"conversations": conversations}

In [None]:
# Apply the chat template to the converted reasoning dataset
reasoning_conversations = tokenizer.apply_chat_template(
    reasoning_dataset.map(generate_conversation, batched = True)["conversations"],
    tokenize = False, # Do not tokenize; only apply the template
)
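The two steps above (batched conversion, then templating) can be tried without downloading a model. The sketch below is a simplified stand-in, not the tokenizer's actual chat template: `render_chatml` is a hypothetical helper that only wraps each turn in `<|im_start|>` / `<|im_end|>` markers and ignores system prompts, tools, and any special handling of `<think>` blocks. `to_conversations` and `toy_batch` are likewise made up for illustration.

```python
def to_conversations(batch):
    """Turn batched problem/solution columns into user/assistant turns."""
    return {
        "conversations": [
            [
                {"role": "user", "content": problem},
                {"role": "assistant", "content": solution},
            ]
            for problem, solution in zip(batch["problem"], batch["generated_solution"])
        ]
    }


def render_chatml(conversation):
    """Simplified ChatML-style rendering (assumption: plain turns only)."""
    return "".join(
        f"<|im_start|>{turn['role']}\n{turn['content']}<|im_end|>\n"
        for turn in conversation
    )


# A made-up batch in the same column layout as the reasoning dataset
toy_batch = {
    "problem": ["What is 1 + 1?"],
    "generated_solution": ["<think>\n1 + 1 = 2\n</think>\n\nThe answer is 2."],
}

rendered = [render_chatml(c) for c in to_conversations(toy_batch)["conversations"]]
print(rendered[0])
```

A batched mapping function receives a dict of columns (lists) and returns a dict of columns, which is why the conversion works on whole lists rather than on single rows.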

In [None]:
# Inspect one sample
reasoning_conversations[0]

"<|im_start|>user\nGiven $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.<|im_end|>\n<|im_start|>assistant\n<think>\nOkay, let's see. I need to solve the equation √(x² + 165) - √(x² - 52) = 7, and find all positive values of x. Hmm, radicals can be tricky, but maybe if I can eliminate the square roots by squaring both sides. Let me try that.\n\nFirst, let me write down the equation again to make sure I have it right:\n\n√(x² + 165) - √(x² - 52) = 7.\n\nOkay, so the idea is to isolate one of the radicals and then square both sides. Let me try moving the second radical to the other side:\n\n√(x² + 165) = 7 + √(x² - 52).\n\nNow, if I square both sides, maybe I can get rid of the square roots. Let's do that:\n\n(√(x² + 165))² = (7 + √(x² - 52))².\n\nSimplifying the left side:\n\nx² + 165 = 49 + 14√(x² - 52) + (√(x² - 52))².\n\nThe right side is expanded using the formula (a + b)² = a² + 2ab + b². So the right side becomes 7² + 2*7*√(x² - 52) + (√(x² 

In [None]:
# Process the non-reasoning dataset: convert it to the standard conversation format
from unsloth.chat_templates import standardize_sharegpt
dataset = standardize_sharegpt(non_reasoning_dataset)
# Apply the chat template to the standardized non-reasoning dataset
non_reasoning_conversations = tokenizer.apply_chat_template(
    dataset["conversations"],
    tokenize = False,
)

In [None]:
# Inspect the first converted non-reasoning sample
non_reasoning_conversations[0]

'<|im_start|>user\nExplain what boolean operators are, what they do, and provide examples of how they can be used in programming. Additionally, describe the concept of operator precedence and provide examples of how it affects the evaluation of boolean expressions. Discuss the difference between short-circuit evaluation and normal evaluation in boolean expressions and demonstrate their usage in code. \n\nFurthermore, add the requirement that the code must be written in a language that does not support short-circuit evaluation natively, forcing the test taker to implement their own logic for short-circuit evaluation.\n\nFinally, delve into the concept of truthiness and falsiness in programming languages, explaining how it affects the evaluation of boolean expressions. Add the constraint that the test taker must write code that handles cases where truthiness and falsiness are implemented differently across different programming languages.<|im_end|>\n<|im_start|>assistant\n<think>\n\n</th

In [None]:
# Check the sizes of the two datasets
print(len(reasoning_conversations))
print(len(non_reasoning_conversations))

19252
100000


In [None]:
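# Set the chat-data ratio
# Intended mix: 25% reasoning, 75% chat.
# Note: the sampling step as written draws (1 - chat_percentage) = 25% of the
# reasoning-set size from the chat data, so the resulting mix is about
# 80% reasoning and 20% chat.
chat_percentage = 0.75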

In [None]:
# Sample from the non-reasoning dataset; the sample size is 25% of the reasoning dataset's size
import pandas as pd
non_reasoning_subset = pd.Series(non_reasoning_conversations)
non_reasoning_subset = non_reasoning_subset.sample(
    int(len(reasoning_conversations)*(1.0-chat_percentage)), # Sample size: (1 - 0.75) = 25% of the reasoning dataset's size
    random_state = 2407,
)


In [None]:
# Merge the two datasets
data = pd.concat([
    pd.Series(reasoning_conversations),
    pd.Series(non_reasoning_subset)
    ])
data.name = "text" # Name the data column "text"

In [None]:
# Convert the merged data to a Hugging Face Dataset
from datasets import Dataset
combained_dataset = Dataset.from_pandas(pd.DataFrame(data))
# Shuffle the data
combained_dataset = combained_dataset.shuffle(seed = 3407)

In [None]:
# Show basic information about the dataset
print(combained_dataset)

Dataset({
    features: ['text', '__index_level_0__'],
    num_rows: 24065
})
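The row count above follows directly from the sampling expression, and the same arithmetic shows what mix it produces. The sketch below reproduces the numbers from the printed dataset sizes; `chat_rows_for_share` is a hypothetical helper showing one way to size the chat subset for a target share of the final mix. It is offered as an alternative for comparison, not as what this notebook ran.

```python
n_reasoning = 19252        # len(reasoning_conversations), from the printed sizes
n_chat_available = 100000  # len(non_reasoning_conversations)
chat_percentage = 0.75

# What the notebook's sampling expression evaluates to
n_chat_sampled = int(n_reasoning * (1.0 - chat_percentage))
total_rows = n_reasoning + n_chat_sampled
chat_share = n_chat_sampled / total_rows


def chat_rows_for_share(n_reasoning_rows: int, target_chat_share: float) -> int:
    """Chat rows needed so that chat makes up target_chat_share of the final mix."""
    return round(n_reasoning_rows * target_chat_share / (1.0 - target_chat_share))


print(n_chat_sampled, total_rows, round(chat_share, 3))

needed = chat_rows_for_share(n_reasoning, chat_percentage)
print(needed, needed <= n_chat_available)
```

The sampled subset is 4,813 rows, giving the 24,065 rows shown above, of which chat data makes up about 20%. Reaching a true 75% chat share would take 57,756 chat rows, which the 100,000-row dataset can supply.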


In [None]:
# Show the first 10 records as a DataFrame
import pandas as pd
df = pd.DataFrame(combained_dataset[:10])
display(df)

Unnamed: 0,text,__index_level_0__
0,<|im_start|>user\nCalculate the pH during a ti...,49038
1,<|im_start|>user\nFind the remainder when \(9 ...,17982
2,<|im_start|>user\nDetermine the surface area o...,18456
3,<|im_start|>user\nAn isosceles right triangle ...,57138
4,<|im_start|>user\nUse the Residue theorem to e...,10703
5,<|im_start|>user\nFind the minimum value of \(...,16248
6,<|im_start|>user\nFind \(\lim_{n\to+\infty}\in...,1475
7,<|im_start|>user\nWhat is the most formal defi...,57551
8,<|im_start|>user\nWhat is the greatest integer...,1226
9,<|im_start|>user\nLet $f(n)$ denote the n-th i...,16692


In [None]:
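# Train with the SFTTrainer from Hugging Face TRL
from trl import SFTTrainer, SFTConfig
trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = combained_dataset,
    eval_dataset = None,
    args = SFTConfig(
        dataset_text_field = "text", # Name of the text column in the dataset
        per_device_train_batch_size = 2, # Training batch size per device
        gradient_accumulation_steps = 4, # Gradient accumulation to simulate a larger batch
        warmup_steps = 30, # Warm-up steps (equal to max_steps here, so the whole run is warm-up)
        # num_train_epochs = 1, # Set to 1 (instead of max_steps) for a full training run
        max_steps = 30,
        learning_rate = 2e-4, # Learning rate (can be lowered to 2e-5 for long training runs)
        logging_steps = 1, # Logging interval
        optim = "adamw_8bit", # Optimizer
        weight_decay = 0.01, # Weight decay
        lr_scheduler_type = "linear", # Learning-rate scheduler
        seed = 3407, # Random seed
        report_to = "none", # Can be set to "wandb" etc. for experiment tracking
    ),
)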

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/24065 [00:00<?, ? examples/s]
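Because `warmup_steps` and `max_steps` are both 30 in the configuration above, it is worth checking what the learning rate actually does over the run. The sketch below models the common "linear warm-up, then linear decay" shape; it is an illustration of that shape, and the trainer's own scheduler may differ in detail. `linear_schedule_factor` is a hypothetical helper, and the `warmup_steps = 5` case is an arbitrary comparison value, not a setting from this notebook.

```python
def linear_schedule_factor(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 toward 1 during warm-up
        return step / max(1, warmup_steps)
    # Linear decay from 1 down to 0 over the remaining steps
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


BASE_LR = 2e-4
TOTAL_STEPS = 30

for warmup in (30, 5):
    factors = [linear_schedule_factor(s, warmup, TOTAL_STEPS) for s in range(TOTAL_STEPS)]
    print(f"warmup_steps={warmup}: peak lr = {BASE_LR * max(factors):.2e}")
```

Under this model, a warm-up as long as the whole run means the learning rate is still ramping up when training stops and never quite reaches the configured `learning_rate`.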

In [None]:
# Start training the model
# To resume training, set resume_from_checkpoint=True in trainer.train()
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 24,065 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 87,293,952/8,000,000,000 (1.09% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,0.5721
2,0.6485
3,0.7932
4,0.6766
5,0.5851
6,0.6085
7,0.6124
8,0.5893
9,0.5532
10,0.6603
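The figures in the training banner can be reproduced from the configuration, which also shows how small this demonstration run is. The sketch below uses only numbers printed in the log; the steps-per-epoch value is an approximation, since the trainer's own rounding may differ slightly.

```python
import math

num_examples = 24065   # "Num examples" in the banner
per_device_batch = 2
grad_accum = 4
num_gpus = 1
max_steps = 30

# Effective batch size: examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_gpus
examples_seen = effective_batch * max_steps
approx_steps_per_epoch = math.ceil(num_examples / effective_batch)

print(f"effective batch size: {effective_batch}")
print(f"examples seen in {max_steps} steps: {examples_seen}")
print(f"fraction of one epoch: {examples_seen / num_examples:.2%}")
print(f"approx. steps for a full epoch: {approx_steps_per_epoch}")
```

Thirty steps at an effective batch size of 8 cover 240 examples, about 1% of the dataset, so the losses logged above reflect a smoke test, not a full fine-tune.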


In [None]:
# Model inference
# Test the model with Unsloth's native inference
# Settings recommended by the Qwen3 team:
# - Reasoning mode: temperature=0.6, top_p=0.95, top_k=20
# - Plain chat mode: temperature=0.7, top_p=0.8, top_k=20

# Test a plain conversation with thinking mode disabled
messages = [
    {"role": "user", "content": "Solve (x+2)^2=0."}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # The generation prompt must be added
    enable_thinking = False, # Disable thinking mode
)
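Thinking-mode text wraps the reasoning trace in `<think>` ... `</think>` tags, and the non-reasoning training samples shown earlier carry an empty block, so downstream code often needs to separate the trace from the final answer. The sketch below is one simple way to do that with a regular expression. `split_thinking` is a hypothetical helper, and it assumes at most one well-formed think block per response.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str):
    """Return (reasoning, answer); reasoning is '' if the block is empty or absent."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Made-up responses for illustration
print(split_thinking("<think>\n\n</think>\n\nx = -2"))
print(split_thinking("<think>\nSquare is zero, so x + 2 = 0.\n</think>\n\nx = -2"))
```

A response that was cut off before the closing tag would not match and would be returned whole as the answer, so truncated generations need separate handling.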


In [None]:
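# Generate text for the plain chat prompt
# (no sampling parameters are passed here, so the model's default generation settings apply;
#  pass temperature, top_p, and top_k to model.generate to use the recommended chat values)
from transformers import TextStreamer
_ = model.generate(
    **tokenizer(
        text,
        return_tensors = "pt",
    ).to("cuda"),
    max_new_tokens = 256, # Increase for longer outputs
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)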

To solve the equation \((x + 2)^2 = 0\), follow these steps:

1. Recognize that \((x + 2)^2 = 0\) is a quadratic equation.
2. Take the square root of both sides to simplify the equation:
   \[
   x + 2 = \pm\sqrt{0}
   \]
3. Since the square root of 0 is 0, the equation simplifies to:
   \[
   x + 2 = 0
   \]
4. Solve for \(x\) by subtracting 2 from both sides:
   \[
   x = -2
   \]

Thus, the solution to the equation \((x + 2)^2 = 0\) is \(\boxed{-2}\).<|im_end|>
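The recommended `temperature`, `top_k`, and `top_p` values are easier to reason about with a toy example of what each one does to the next-token distribution. The sketch below is a simplified illustration over a made-up four-token vocabulary, not the implementation used by `model.generate`: it scales the logits by the temperature, keeps the `top_k` most likely tokens, then keeps the smallest prefix of those whose cumulative probability reaches `top_p`. `filter_distribution` and `toy_logits` are hypothetical.

```python
import math
import random


def filter_distribution(logits, temperature, top_k, top_p):
    """Return the renormalized token probabilities left after the three filters."""
    # 1. Temperature: divide logits, then softmax (shifted by the max for stability)
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    norm = sum(exps.values())
    probs = {tok: e / norm for tok, e in exps.items()}

    # 2. Top-k: keep the k most likely tokens and renormalize
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    k_norm = sum(p for _, p in ranked)
    ranked = [(tok, p / k_norm) for tok, p in ranked]

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p
    kept, cumulative = [], 0.0
    for tok, p in ranked:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    p_norm = sum(p for _, p in kept)
    return {tok: p / p_norm for tok, p in kept}


toy_logits = {"a": 2.0, "b": 1.0, "c": 0.0, "d": -1.0}  # made-up values

chat = filter_distribution(toy_logits, temperature=0.7, top_k=20, top_p=0.8)
thinking = filter_distribution(toy_logits, temperature=0.6, top_k=20, top_p=0.95)
print("chat settings keep:", sorted(chat))
print("thinking settings keep:", sorted(thinking))

# Draw one token from the filtered distribution (seeded for repeatability)
rng = random.Random(0)
print(rng.choices(list(chat), weights=list(chat.values()), k=1)[0])
```

Lower temperatures sharpen the distribution, while smaller `top_k` and `top_p` values cut off the unlikely tail before sampling.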


In [None]:
# Reasoning conversation with thinking mode enabled
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # The generation prompt must be added
    enable_thinking = True, # Enable thinking mode
)
from transformers import TextStreamer
_ = model.generate(
    **tokenizer(
        text,
        return_tensors = "pt",
    ).to("cuda"),
    max_new_tokens = 256*10, # Increase for longer outputs
    streamer = TextStreamer(tokenizer, skip_prompt = True),
)

<think>
Okay, let's see. I need to solve the equation (x + 2)^2 = 0. Hmm, so the equation is (x + 2) squared equals zero. Alright, how do I approach this?

First, I remember that when you have something squared equal to zero, that means the inside of the square must be zero because anything squared is zero only if the original number is zero. So, if (x + 2)^2 = 0, then x + 2 must be equal to zero. That makes sense because if you square any number, the only way the result is zero is if the number itself is zero.

So, setting the inside equal to zero: x + 2 = 0. Then, to solve for x, I just need to subtract 2 from both sides. That gives x = -2. Wait, is there another solution? Hmm, no, because if you have a quadratic equation, usually you can have two solutions, but in this case, since it's a square, both solutions are the same. So, x = -2 is the only solution, but it's a repeated root.

Let me double-check. If I plug x = -2 back into the original equation, it should satisfy (x + 2)^2 = 