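# LoRA in Practice with the PEFT Library: OPT-6.7B

In this tutorial, we show how to use the `peft` library together with `bitsandbytes` to load a large language model in 8-bit precision and fine-tune it efficiently.

The fine-tuning relies on low-rank adapters (LoRA): instead of fine-tuning the entire model, you train only the small adapter weights and load them into the model on top of the frozen base weights.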

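### Loading the model

The `facebook/opt-6.7b` model needs roughly 13 GB of GPU memory for its weights in half precision (float16).

Below we load it in 8-bit, which needs only about 7 GB of GPU memory.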
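
These figures follow from simple arithmetic on the parameter count. The sketch below is a rough estimate only: it assumes about 6.7 billion parameters and counts just the weights, ignoring activations, the CUDA context, and any layers kept in higher precision. The helper name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate memory needed to store the weights, in GB (10**9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 6.7e9  # assumed nominal size of OPT-6.7B

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)  # float16: 2 bytes per parameter
int8_gb = weight_memory_gb(NUM_PARAMS, 1)  # int8: 1 byte per parameter

print(f"fp16: {fp16_gb:.1f} GB, int8: {int8_gb:.1f} GB")
```

The estimate gives about 13.4 GB for float16 and 6.7 GB for int8, in line with the figures quoted above.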

In [1]:
import os

import torch
import torch.nn as nn
import bitsandbytes as bnb
from transformers import GPT2Tokenizer, AutoConfig, OPTForCausalLM

model_id = "facebook/opt-6.7b"

model = OPTForCausalLM.from_pretrained(model_id, load_in_8bit=True)

tokenizer = GPT2Tokenizer.from_pretrained(model_id)


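### Preparing the model for PEFT fine-tuning

Before training an int8 model with `peft`, some preprocessing is required:
- Cast all non-`int8` modules to full precision (`fp32`) for stability
- Add a `forward_hook` to the input embedding layer so that gradients are computed for the input hidden states
- Enable gradient checkpointing for more memory-efficient training

The utility function `prepare_model_for_int8_training` from the `peft` library performs all of these steps automatically (newer `peft` releases replace it with `prepare_model_for_kbit_training`).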
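
To make the first two steps concrete, here is a minimal sketch on a toy model. It is not the `peft` implementation: `prepare_toy_model` and the toy network are hypothetical stand-ins, and gradient checkpointing is left out because it is specific to Hugging Face models (where it is enabled with `model.gradient_checkpointing_enable()`).

```python
import torch
import torch.nn as nn

def prepare_toy_model(model: nn.Module, embedding: nn.Embedding) -> nn.Module:
    """Toy version of the preprocessing steps (not the peft implementation)."""
    for param in model.parameters():
        # Freeze the base weights; only adapters added later will be trained.
        param.requires_grad = False
        # Cast half-precision parameters to fp32 for numerical stability.
        if param.dtype in (torch.float16, torch.bfloat16):
            param.data = param.data.to(torch.float32)

    def make_output_require_grad(module, inputs, output):
        # Mark the embedding output as requiring grad so that gradients
        # can flow to later layers even though the embedding is frozen.
        output.requires_grad_(True)

    embedding.register_forward_hook(make_output_require_grad)
    return model

# Hypothetical stand-in for a real language model, stored in half precision
embedding = nn.Embedding(10, 4)
toy = nn.Sequential(embedding, nn.LayerNorm(4)).half()
toy = prepare_toy_model(toy, embedding)

hidden = embedding(torch.tensor([[1, 2, 3]]))
print(hidden.requires_grad, next(toy.parameters()).dtype)
```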

In [2]:
from peft import prepare_model_for_int8_training

model = prepare_model_for_int8_training(model)



In [3]:
# Memory footprint of the model weights (the GPU may report more, because PyTorch reserves additional memory)
memory_footprint_bytes = model.get_memory_footprint()
memory_footprint_gib = memory_footprint_bytes / (1024 ** 3)  # convert bytes to GiB

print(f"{memory_footprint_gib:.2f}GB")

6.80GB


In [4]:
model

OPTForCausalLM(
  (model): OPTModel(
    (decoder): OPTDecoder(
      (embed_tokens): Embedding(50272, 4096, padding_idx=1)
      (embed_positions): OPTLearnedPositionalEmbedding(2050, 4096)
      (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
      (layers): ModuleList(
        (0-31): 32 x OPTDecoderLayer(
          (self_attn): OPTAttention(
            (k_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
            (v_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
            (q_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
            (out_proj): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
          )
          (activation_fn): ReLU()
          (self_attn_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
          (fc1): Linear8bitLt(in_features=4096, out_features=16384, bias=True)
          (fc2): Linear8bitLt(in_features=16384, out_features=4096, bias=True

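### LoRA adapter configuration

Using `LoRA` in `peft` is straightforward: with the `PeftModel` abstraction, low-rank adapters (LoRA) can be applied quickly to any model.

This is done with the `get_peft_model` utility function from `peft`.

#### A note on LoRA hyperparameters:

Each adapted layer adds a low-rank update to its frozen weight, scaled by `lora_alpha / r`: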
```
MatMul(B,A) * Scaling
Scaling = LoRA_Alpha / Rank
```
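
The formula can be illustrated with a small NumPy sketch. The matrix sizes are toy values, and the initialization of `A` is an assumption made for illustration; `B` starts at zero so that the adapted layer initially behaves exactly like the frozen base layer.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 16, 16      # toy layer size (4096 x 4096 in OPT-6.7B)
r, lora_alpha = 8, 32     # rank and alpha, matching the config in this tutorial
scaling = lora_alpha / r  # 32 / 8 = 4.0

W = rng.normal(size=(d_out, d_in))     # frozen pretrained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init (assumed)
B = np.zeros((d_out, r))               # trainable, zero init so the update starts at 0

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = W x + scaling * B (A x); W itself is never modified."""
    return W @ x + scaling * (B @ (A @ x))

x = rng.normal(size=(d_in,))
print(scaling, np.allclose(lora_forward(x), W @ x))
```

Because the update is `B @ A` with rank at most `r`, only `r * (d_in + d_out)` extra parameters are trained per layer instead of `d_in * d_out`.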

In [5]:
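# Import LoraConfig and get_peft_model from the peft library
from peft import LoraConfig, get_peft_model

# Create a LoraConfig holding the LoRA (Low-Rank Adaptation) hyperparameters
config = LoraConfig(
    r=8,  # LoRA rank; determines the size of the low-rank matrices
    lora_alpha=32,  # scaling numerator: the update is scaled by lora_alpha / r
    # Modules to wrap with LoRA: here the four attention projections.
    # OPT names its MLP layers "fc1" and "fc2" (see the module printout above);
    # add them to also adapt the feed-forward layers, which raises the trainable parameter count.
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj"],
    lora_dropout=0.05,  # dropout rate applied inside the LoRA modules
    bias="none",  # do not train any bias parameters
    task_type="CAUSAL_LM"  # task type: causal (autoregressive) language modeling
)

# Wrap the model with LoRA adapters according to the config
model = get_peft_model(model, config)

# Print the number of trainable parameters
model.print_trainable_parameters()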

trainable params: 8,388,608 || all params: 6,666,862,592 || trainable%: 0.12582542214183376


For reference, the logic behind printing the trainable parameters:
```python
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        # Only parameters with requires_grad=True are updated during training
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )
```
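
The reported count of 8,388,608 trainable parameters can be checked by hand. The sketch below assumes that LoRA is attached to the four attention projections in each of the 32 decoder layers, which is the configuration that reproduces the printed number.

```python
hidden_size = 4096      # in/out features of each attention projection
num_layers = 32         # decoder layers in OPT-6.7B
adapted_per_layer = 4   # q_proj, k_proj, v_proj, out_proj
r = 8                   # LoRA rank

# Each adapted layer adds A (r x in_features) and B (out_features x r).
params_per_module = r * hidden_size + hidden_size * r
trainable = num_layers * adapted_per_layer * params_per_module

all_params = 6_666_862_592  # total reported by print_trainable_parameters()
print(f"{trainable:,} trainable, {100 * trainable / all_params:.4f}% of all parameters")
```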

### Data processing

In [6]:
from datasets import load_dataset

dataset = load_dataset("Abirate/english_quotes")



In [7]:
dataset["train"]

Dataset({
    features: ['quote', 'author', 'tags'],
    num_rows: 2508
})

In [8]:
from datasets import ClassLabel, Sequence
import random
import pandas as pd
from IPython.display import display, HTML

def show_random_elements(dataset, num_examples=10):
    assert num_examples <= len(dataset), "Can't pick more elements than there are in the dataset."
    picks = []
    for _ in range(num_examples):
        pick = random.randint(0, len(dataset)-1)
        while pick in picks:
            pick = random.randint(0, len(dataset)-1)
        picks.append(pick)
    
    df = pd.DataFrame(dataset[picks])
    for column, typ in dataset.features.items():
        if isinstance(typ, ClassLabel):
            df[column] = df[column].transform(lambda i: typ.names[i])
        elif isinstance(typ, Sequence) and isinstance(typ.feature, ClassLabel):
            df[column] = df[column].transform(lambda x: [typ.feature.names[i] for i in x])
    display(HTML(df.to_html()))

In [9]:
show_random_elements(dataset["train"])

Unnamed: 0,quote,author,tags
0,“I'm so glad I live in a world where there are Octobers.”,"L. M. Montgomery,","[autumn, october, thankfulness]"
1,"“You endure what is unbearable, and you bear it. That is all.”","Cassandra Clare,","[2013, goodbye, james-carstairs, life, magnus-bane, william-herondale]"
2,"“You know, Minister, I disagree with Dumbledore on many counts...but you cannot deny he's got style...”","J.K. Rowling,","[dumbledore, fudge, minister, style]"
3,"“When a man gives his opinion, he's a man. When a woman gives her opinion, she's a bitch.”",Bette Davis,"[clichés, double-standards, empowerment, feminism, gender, hypocrisy, misogyny, opinions, speaking-out, stereotypes, women]"
4,“Compassion is the basis of morality.”,Arthur Schopenhauer,"[compassion, morality, morals]"
5,"“Isn't it odd how much fatter a book gets when you've read it several times?"" Mo had said...""As if something were left between the pages every time you read it. Feelings, thoughts, sounds, smells...and then, when you look at the book again many years later, you find yourself there, too, a slightly younger self, slightly different, as if the book had preserved you like a pressed flower...both strange and familiar.”","Cornelia Funke,","[books, feelings, reading, thoughts]"
6,“This is a new year. A new beginning. And things will change.”,Taylor Swift,"[change, fresh-starts, new-beginnings, reinvention, time]"
7,"“Sonnet XVIII do not love you as if you were salt-rose, or topaz,or the arrow of carnations the fire shoots off.I love you as certain dark things are to be loved,in secret, between the shadow and the soul.I love you as the plant that never bloomsbut carries in itself the light of hidden flowers;thanks to your love a certain solid fragrance,risen from the earth, lives darkly in my body.I love you without knowing how, or when, or from where.I love you straightforwardly, without complexities or pride;so I love you because I know no other way than this: where I does not exist, nor you,so close that your hand on my chest is my hand,so close that your eyes close as I fall asleep. ”",Pablo Neruda,[sonnet-xvii]
8,"“Gus: ""It tastes like...""Me: ""Food.""Gus: ""Yes, precisely. It tastes like food, excellently prepared. But it does not taste, how do I put this delicately...?""Me: ""It does not taste like God Himself cooked heaven into a series of five dishes which were then served to you accompanied by several luminous balls of fermented, bubbly plasma while actual and literal flower petals floated down around your canal-side dinner table.""Gus: ""Nicely phrased.""Gus's father: ""Our children are weird.""My dad: ""Nicely phrased.”","John Green,","[food-discussion, tfios]"
9,"“Sometimes, you read a book and it fills you with this weird evangelical zeal, and you become convinced that the shattered world will never be put back together unless and until all living humans read the book. And then there are books like An Imperial Affliction, which you can't tell people about, books so special and rare and yours that advertising your affection feels like betrayal”","John Green,","[books, john-green, reading, the-fault-in-our-stars]"


In [11]:

tokenized_dataset = dataset.map(lambda samples: tokenizer(samples["quote"]), batched=True)

In [12]:
from transformers import DataCollatorForLanguageModeling

# Data collator for language modeling; mlm=False disables masked language modeling (MLM) and builds causal LM batches
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
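
With `mlm=False`, the collator pads each batch and copies `input_ids` into `labels`, replacing padding positions with `-100` so that the loss ignores them; the one-position shift for next-token prediction happens inside the model. The following pure-Python sketch mimics that behavior. `collate_causal_lm` is a hypothetical helper, not the library's implementation, and right padding is assumed.

```python
from typing import Dict, List

PAD_ID = 1           # OPT's padding token id (padding_idx=1 in the model printout)
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss

def collate_causal_lm(examples: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Pad to the longest example and build labels for causal LM training."""
    max_len = max(len(ids) for ids in examples)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for ids in examples:
        n_pad = max_len - len(ids)
        input_ids = ids + [PAD_ID] * n_pad  # right padding (assumed)
        batch["input_ids"].append(input_ids)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Labels copy the inputs; pad positions are excluded from the loss.
        batch["labels"].append([IGNORE_INDEX if t == PAD_ID else t for t in input_ids])
    return batch

batch = collate_causal_lm([[2, 9058, 383], [2, 100]])
print(batch["labels"])  # [[2, 9058, 383], [2, 100, -100]]
```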

### Fine-tuning the model

In [17]:
from transformers import TrainingArguments, Trainer

model_dir = "models"

training_args = TrainingArguments(
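        output_dir=f"{model_dir}/{model_id}-lora",  # directory for checkpoints and outputs
        per_device_train_batch_size=4,  # training batch size per device
        learning_rate=2e-4,  # learning rate
        fp16=True,  # mixed-precision training: faster and uses less memory
        logging_steps=20,  # log the training loss every 20 steps
        # max_steps=100,  # maximum number of training steps
        num_train_epochs=1  # total number of training epochs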
    )

In [14]:
# The model after adding the LoRA modules
model

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): OPTForCausalLM(
      (model): OPTModel(
        (decoder): OPTDecoder(
          (embed_tokens): Embedding(50272, 4096, padding_idx=1)
          (embed_positions): OPTLearnedPositionalEmbedding(2050, 4096)
          (final_layer_norm): LayerNorm((4096,), eps=1e-05, elementwise_affine=True)
          (layers): ModuleList(
            (0-31): 32 x OPTDecoderLayer(
              (self_attn): OPTAttention(
                (k_proj): lora.Linear8bitLt(
                  (base_layer): Linear8bitLt(in_features=4096, out_features=4096, bias=True)
                  (lora_dropout): ModuleDict(
                    (default): Dropout(p=0.05, inplace=False)
                  )
                  (lora_A): ModuleDict(
                    (default): Linear(in_features=4096, out_features=8, bias=False)
                  )
                  (lora_B): ModuleDict(
                    (default): Linear(in_features=8, out_features=4096, bias=Fals

In [18]:
trainer = Trainer(
    model=model,  # the model to train
    train_dataset=tokenized_dataset["train"],  # the training dataset
    args=training_args,
    data_collator=data_collator,
)

In [19]:
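# Disable the KV cache on the model config: it is incompatible with gradient checkpointing.
# (Assigning `model.use_cache` on the PEFT wrapper does not change the config,
# in which case the Trainer logs the warning shown in the output below.)
model.config.use_cache = False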

In [20]:
trainer.train()

`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...


Step,Training Loss
20,1.966
40,2.0173
60,1.8569
80,1.8669
100,2.0153
120,1.9663
140,2.0577
160,2.0246
180,1.8512
200,1.6914




TrainOutput(global_step=627, training_loss=1.8925303926118062, metrics={'train_runtime': 689.2613, 'train_samples_per_second': 3.639, 'train_steps_per_second': 0.91, 'total_flos': 8985823469568000.0, 'train_loss': 1.8925303926118062, 'epoch': 1.0})
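
The `global_step=627` in the output follows directly from the dataset size and batch size. A quick check, assuming a single GPU and the default of no gradient accumulation:

```python
import math

num_examples = 2508        # rows in the train split
per_device_batch_size = 4  # per_device_train_batch_size
num_devices = 1            # assumption: single GPU
grad_accum_steps = 1       # assumption: Trainer default
num_epochs = 1             # num_train_epochs

steps_per_epoch = math.ceil(
    num_examples / (per_device_batch_size * num_devices * grad_accum_steps)
)
total_steps = steps_per_epoch * num_epochs
print(total_steps)  # 627
```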

### Saving the LoRA model

In [21]:
model_path = f"{model_dir}/{model_id}-lora-int8"

#trainer.save_model(model_path)
model.save_pretrained(model_path)
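
Calling `save_pretrained` on a PEFT model writes only the adapter weights and config, not the base model. To use the adapter in a fresh session, the base model has to be loaded again and the adapter attached to it. The helper below is a sketch under that assumption; `load_lora_model` is a hypothetical name, and `load_in_8bit=True` mirrors the loading call used earlier in this tutorial.

```python
def load_lora_model(adapter_path: str):
    """Rebuild the 8-bit base model and attach a saved LoRA adapter."""
    # Imported lazily so that defining this helper does not require a GPU setup.
    from peft import PeftConfig, PeftModel
    from transformers import AutoModelForCausalLM

    # The adapter config records which base checkpoint it was trained on.
    peft_config = PeftConfig.from_pretrained(adapter_path)
    base_model = AutoModelForCausalLM.from_pretrained(
        peft_config.base_model_name_or_path,
        load_in_8bit=True,  # same loading mode as during training
        device_map="auto",
    )
    # Load the LoRA weights on top of the frozen base model.
    return PeftModel.from_pretrained(base_model, adapter_path)

# Usage (downloads the base model; needs a CUDA GPU and bitsandbytes):
# lora_model = load_lora_model("models/facebook/opt-6.7b-lora-int8")
```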

### Using the LoRA model

In [22]:
lora_model = trainer.model

In [23]:
text = "Two things are infinite: "
inputs = tokenizer(text, return_tensors="pt").to(0)

out = lora_model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))



Two things are infinite:  the universe and human stupidity; and I'm not sure about the universe.  But I'm sure of this:  I'm not sure about the universe.
I'm not sure about the universe either.


In [24]:
text = "Two things are infinite: "
inputs = tokenizer(text, return_tensors="pt").to(0)

out = lora_model.generate(**inputs, max_new_tokens=48)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Two things are infinite:  The universe and human stupidity.  And I'm not sure about the universe.  -Albert Einstein
I'm not sure about the universe either.


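After only a short fine-tuning run on the [english_quotes dataset](https://huggingface.co/datasets/Abirate/english_quotes) (a single epoch, 627 steps in the run logged above; uncommenting `max_steps=100` would stop after 100 steps, less than one epoch), the LoRA adapter reproduces the well-known Albert Einstein quote together with its attribution.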

In [25]:
out[0]

tensor([    2,  9058,   383,    32, 32952,    35,  1437,    20,  9468,     8,
         1050, 38821,     4,  1437,   178,    38,   437,    45,   686,    59,
            5,  9468,     4,  1437,   111, 36977, 27648, 50118,   100,   437,
           45,   686,    59,     5,  9468,  1169,     4,     2],
       device='cuda:0')