<a href="https://colab.research.google.com/github/weedge/doraemon-nb/blob/main/sft_peft_prompt_tuning_bloomz_560m.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# soft prompts - prompt tuning

- https://huggingface.co/docs/peft/package_reference/prompt_tuning
- [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf)


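**Note**:
- The number of trainable parameters is small, so training can run directly on a CPU. Memory must still be large enough to hold the full set of model parameters during training. Use a GPU for acceleration if one is available.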


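Prompt tuning is a technique for adapting a pretrained language model to a specific downstream task. Task-specific prompts are added to the input, and only the prompt parameters are updated; the parameters of the pretrained model stay frozen.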
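
As a rough illustration of the idea, the sketch below uses plain Python lists instead of tensors. The names `prepend_soft_prompt`, `soft_prompt`, and `token_embeddings` are made up for this example and are not part of PEFT. It only shows that the trainable virtual-token vectors are concatenated in front of the input embeddings produced by the frozen model.

```python
# Toy illustration of a soft prompt: plain lists stand in for embedding tensors.
# All names here are hypothetical and only serve the illustration.

def prepend_soft_prompt(soft_prompt, token_embeddings):
    """Concatenate trainable prompt vectors in front of the input embeddings."""
    if soft_prompt and token_embeddings:
        # Every vector must share the model's hidden dimension.
        assert len(soft_prompt[0]) == len(token_embeddings[0]), "dimension mismatch"
    return [list(v) for v in soft_prompt] + [list(v) for v in token_embeddings]


hidden_dim = 4
num_virtual_tokens = 2
# Trainable: updated by backpropagation in real prompt tuning.
soft_prompt = [[0.1] * hidden_dim for _ in range(num_virtual_tokens)]
# Frozen: looked up from the pretrained embedding table.
token_embeddings = [[1.0] * hidden_dim for _ in range(3)]

model_input = prepend_soft_prompt(soft_prompt, token_embeddings)
print(len(model_input))  # sequence length = virtual tokens + real tokens
```

In the real setup only `soft_prompt` receives gradient updates, which is why the number of trained parameters stays so small.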

Summary of the abstract of [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/pdf/2104.08691.pdf):

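The abstract introduces "prompt tuning", a method for conditioning a frozen language model to perform specific downstream tasks. Unlike the discrete text prompts used with GPT-3, soft prompts are learned through backpropagation and can be tuned to incorporate signal from any number of labeled examples. The paper shows that model performance can be improved by tuning only the soft prompt prepended to the input text, without changing the model weights.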

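The authors find that prompt tuning becomes more competitive as model size grows: once the model exceeds billions of parameters, it matches the performance of model tuning (where all model weights are tuned). This matters because large models are costly to share and serve, and reusing one frozen model for multiple downstream tasks eases that burden.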

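The paper also relates prompt tuning to the recently proposed "prefix tuning", of which it can be seen as a simplification, and compares it with similar approaches. It further reports that conditioning a frozen model with soft prompts improves robustness to domain transfer and enables efficient "prompt ensembling", in which several prompts are learned for the same task.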

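Overall, the paper proposes an effective, parameter-efficient way to adapt large pretrained language models and demonstrates its advantages in several respects.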

## Dataset

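- ought/raft `twitter_complaints`: tweets labeled by whether they are complaints. 50 examples are used for training and 3,399 for testing. A 20% validation split of the training set is prepared in the code below but commented out, so the evaluation loss is computed on the training examples themselves.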


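For Chinese-language data, look for Weibo datasets covering scenarios such as:
- Users @-mentioning an official account to complain or to suggest a product improvement. For example, this suggestion asks Kimi to save the reading session of each page so that it can be revisited without reloading and re-analyzing the page: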
```
类似其他页面文档也是， 但是存在一个问题，如果能把每页的阅读过的会话保存，能够直接查阅以往阅读会话就好了， 这样不用回到原来页面在加载分析一次，貌似这样会增加浏览器本地缓存。@Kimi智能助手
```
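- Flagging dangerous content for filtering. This is used in risk-control services, where the model is fine-tuned on risk-control data, for example in the post-publishing feature of a UGC product.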


In [3]:
!pip install -q transformers datasets peft tqdm torch



Configuration parameters:

In [4]:
import os

import torch
from datasets import load_dataset
from peft import get_peft_config, get_peft_model, PromptTuningInit, PromptTuningConfig, TaskType, PeftType
from torch.utils.data import DataLoader
from tqdm import tqdm
from transformers import AutoModelForCausalLM, AutoTokenizer, default_data_collator, get_linear_schedule_with_warmup

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

model_name_or_path = "bigscience/bloomz-560m"
tokenizer_name_or_path = "bigscience/bloomz-560m"
peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    num_virtual_tokens=8,
    prompt_tuning_init_text="Classify if the tweet is a complaint or not:",
    tokenizer_name_or_path=model_name_or_path,
)
#peft_config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=30)

print("peft_config",peft_config)

dataset_name = "twitter_complaints"
checkpoint_name = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}_v1.pt".replace(
    "/", "_"
)
text_column = "Tweet text"
label_column = "text_label"
max_length = 64
lr = 3e-2
num_epochs = 50

batch_size = 8

cpu
peft_config PromptTuningConfig(peft_type=<PeftType.PROMPT_TUNING: 'PROMPT_TUNING'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, num_virtual_tokens=8, token_dim=None, num_transformer_submodules=None, num_attention_heads=None, num_layers=None, prompt_tuning_init=<PromptTuningInit.TEXT: 'TEXT'>, prompt_tuning_init_text='Classify if the tweet is a complaint or not:', tokenizer_name_or_path='bigscience/bloomz-560m', tokenizer_kwargs=None)


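`PromptTuningConfig` holds the options for prompt tuning. Prompt tuning adapts a pretrained language model to a task without updating the model's own weights: it learns a small set of virtual-token embeddings (the soft prompt) that are prepended to the input. The parameters printed above mean the following:

1. `peft_type`: the PEFT (Parameter-Efficient Fine-Tuning) method. Here it is `PeftType.PROMPT_TUNING`.

2. `auto_mapping`: an optional mapping that helps PEFT retrieve the base model class when needed. It is `None` here.

3. `base_model_name_or_path`: the name or path of the base model to adapt. It is `None` in the printout above and is filled in once the config is attached to a model.

4. `revision`: the revision of the base model to load.

5. `task_type`: the task type. Here it is `TaskType.CAUSAL_LM`, that is, causal language modeling.

6. `inference_mode`: whether the adapter is used in inference mode. It is `False` here because the soft prompt is about to be trained.

7. `num_virtual_tokens`: the number of virtual tokens in the soft prompt (8 here).

8. `token_dim`: the hidden embedding dimension of the base model. It is left as `None` and filled in from the model.

9. `num_transformer_submodules`: the number of transformer submodules in the base model.

10. `num_attention_heads`: the number of attention heads in the base model.

11. `num_layers`: the number of transformer layers in the base model.

12. `prompt_tuning_init`: how the soft prompt is initialized. `PromptTuningInit.TEXT` initializes it from the embeddings of a piece of text.

13. `prompt_tuning_init_text`: the text used for that initialization.

14. `tokenizer_name_or_path`: the name or path of the tokenizer used to tokenize the initialization text.

15. `tokenizer_kwargs`: extra keyword arguments passed to the tokenizer when it is loaded.

Together these parameters cover the task type, the soft-prompt initialization, and the tokenizer setup, so prompt tuning can be adjusted to the task at hand.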
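
With `PromptTuningInit.TEXT`, the initialization text rarely tokenizes to exactly `num_virtual_tokens` tokens. As far as I understand, PEFT truncates or repeats the token ids so that they fit; the sketch below reproduces that behavior in plain Python as an assumption about the library, not as a copy of its code. The function name `fit_to_virtual_tokens` and the token ids are hypothetical.

```python
import math


def fit_to_virtual_tokens(init_token_ids, num_virtual_tokens):
    """Truncate or repeat token ids so exactly num_virtual_tokens remain.

    Assumed behavior, written for illustration only.
    """
    if not init_token_ids:
        raise ValueError("the initialization text produced no tokens")
    if len(init_token_ids) >= num_virtual_tokens:
        return init_token_ids[:num_virtual_tokens]
    # Too short: repeat the ids until there are enough, then cut to length.
    repetitions = math.ceil(num_virtual_tokens / len(init_token_ids))
    return (init_token_ids * repetitions)[:num_virtual_tokens]


# Hypothetical ids standing in for a tokenized initialization text.
short_ids = [11, 22, 33]
long_ids = list(range(100, 112))

print(fit_to_virtual_tokens(short_ids, 8))
print(fit_to_virtual_tokens(long_ids, 8))
```

Each resulting id selects one row of the frozen word-embedding table as the starting value of a virtual token.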

In [5]:
from datasets import load_dataset

dataset = load_dataset("ought/raft", dataset_name)
print(dataset)
print(dataset["train"][0])

classes = [k.replace("_", " ") for k in dataset["train"].features["Label"].names]
print(classes)
dataset = dataset.map(
    lambda x: {"text_label": [classes[label] for label in x["Label"]]},
    batched=True,
    num_proc=1,
)
print(dataset)
print(dataset["train"][0])


DatasetDict({
    train: Dataset({
        features: ['Tweet text', 'ID', 'Label'],
        num_rows: 50
    })
    test: Dataset({
        features: ['Tweet text', 'ID', 'Label'],
        num_rows: 3399
    })
})
{'Tweet text': '@HMRCcustomers No this is my first job', 'ID': 0, 'Label': 2}
['Unlabeled', 'complaint', 'no complaint']
DatasetDict({
    train: Dataset({
        features: ['Tweet text', 'ID', 'Label', 'text_label'],
        num_rows: 50
    })
    test: Dataset({
        features: ['Tweet text', 'ID', 'Label', 'text_label'],
        num_rows: 3399
    })
})
{'Tweet text': '@HMRCcustomers No this is my first job', 'ID': 0, 'Label': 2, 'text_label': 'no complaint'}


In [6]:
# data preprocessing
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
if tokenizer.pad_token_id is None:
    tokenizer.pad_token_id = tokenizer.eos_token_id
target_max_length = max([len(tokenizer(class_label)["input_ids"]) for class_label in classes])
print(target_max_length)

# Slightly different from the LoRA example: because this is prompt tuning, each input is wrapped in a prompt template, "{text_column} : {value} Label : "
def preprocess_function(examples):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    targets = [str(x) for x in examples[label_column]]
    model_inputs = tokenizer(inputs)
    labels = tokenizer(text_target=targets, add_special_tokens=False)  # don't add bos token because we concatenate with inputs
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i] + [tokenizer.eos_token_id]
        # print(i, sample_input_ids, label_input_ids)
        model_inputs["input_ids"][i] = sample_input_ids + label_input_ids
        labels["input_ids"][i] = [-100] * len(sample_input_ids) + label_input_ids
        model_inputs["attention_mask"][i] = [1] * len(model_inputs["input_ids"][i])
    # print(model_inputs)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        label_input_ids = labels["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        labels["input_ids"][i] = [-100] * (max_length - len(sample_input_ids)) + label_input_ids
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
        labels["input_ids"][i] = torch.tensor(labels["input_ids"][i][:max_length])
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


raw_processed_datasets = dataset.map(
    preprocess_function,
    batched=True,
    num_proc=1,
    # Drop the columns that are not needed so that model(**batch) works: only input_ids (input), attention_mask (self-attention mask), and labels (expected output) remain
    remove_columns=dataset["train"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)
#processed_datasets=raw_processed_datasets["train"].train_test_split(test_size=0.2)
processed_datasets=raw_processed_datasets
processed_datasets["validation"]=raw_processed_datasets["test"]
print("processed_datasets",processed_datasets)
print(processed_datasets["train"][0])

# train_dataset is the same as eval_dataset
train_dataset = processed_datasets["train"]
eval_dataset = processed_datasets["train"]
#eval_dataset = processed_datasets["test"]

# the training data is shuffled; the evaluation data is not
train_dataloader = DataLoader(
    train_dataset, shuffle=True, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True
)
eval_dataloader = DataLoader(eval_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)

3


Running tokenizer on dataset:   0%|          | 0/50 [00:00<?, ? examples/s]

Running tokenizer on dataset:   0%|          | 0/3399 [00:00<?, ? examples/s]

processed_datasets DatasetDict({
    train: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 50
    })
    test: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 3399
    })
    validation: Dataset({
        features: ['input_ids', 'attention_mask', 'labels'],
        num_rows: 3399
    })
})
{'input_ids': [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 227985, 5484, 915, 2566, 169403, 15296, 36272, 525, 3928, 1119, 632, 2670, 3968, 15270, 77658, 915, 210, 1936, 106863, 2], 'attention_mask': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'labels': [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -
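
The padding and label masking done by `preprocess_function` can be hard to read from the batched loops. The sketch below rebuilds the same steps for a single example using plain lists. The helper name `build_example` and the small token ids are invented for the illustration; the pad id 3 and EOS id 2 match what the printed `input_ids` suggest for this tokenizer.

```python
IGNORE_INDEX = -100  # label value that the loss ignores


def build_example(prompt_ids, label_ids, eos_id, pad_id, max_length):
    """Left-pad one prompt+label pair and mask everything except the label."""
    target = label_ids + [eos_id]
    input_ids = prompt_ids + target
    labels = [IGNORE_INDEX] * len(prompt_ids) + target
    attention_mask = [1] * len(input_ids)

    # Left padding: a negative count yields an empty list, so long inputs are
    # only truncated by the final slice, as in the original function.
    pad = max_length - len(input_ids)
    input_ids = ([pad_id] * pad + input_ids)[:max_length]
    attention_mask = ([0] * pad + attention_mask)[:max_length]
    labels = ([IGNORE_INDEX] * pad + labels)[:max_length]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


example = build_example(
    prompt_ids=[10, 11, 12], label_ids=[20, 21], eos_id=2, pad_id=3, max_length=8
)
for key, value in example.items():
    print(key, value)
```

Only the label tokens and the closing EOS contribute to the loss; the prompt and the padding are masked with -100.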

In [7]:
def test_preprocess_function(examples):
    batch_size = len(examples[text_column])
    inputs = [f"{text_column} : {x} Label : " for x in examples[text_column]]
    model_inputs = tokenizer(inputs)
    # print(model_inputs)
    for i in range(batch_size):
        sample_input_ids = model_inputs["input_ids"][i]
        model_inputs["input_ids"][i] = [tokenizer.pad_token_id] * (
            max_length - len(sample_input_ids)
        ) + sample_input_ids
        model_inputs["attention_mask"][i] = [0] * (max_length - len(sample_input_ids)) + model_inputs[
            "attention_mask"
        ][i]
        model_inputs["input_ids"][i] = torch.tensor(model_inputs["input_ids"][i][:max_length])
        model_inputs["attention_mask"][i] = torch.tensor(model_inputs["attention_mask"][i][:max_length])
    return model_inputs


test_dataset = dataset["test"].map(
    test_preprocess_function,
    batched=True,
    num_proc=1,
    # Drop the columns that are not needed so that model(**batch) works at inference time
    remove_columns=dataset["test"].column_names,
    load_from_cache_file=False,
    desc="Running tokenizer on dataset",
)

test_dataloader = DataLoader(test_dataset, collate_fn=default_data_collator, batch_size=batch_size, pin_memory=True)
next(iter(test_dataloader))

Running tokenizer on dataset:   0%|          | 0/3399 [00:00<?, ? examples/s]

{'input_ids': tensor([[     3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
          227985,   5484,    915,   2566,  74757,  64626,  12384,  44639,    613,
           52282,   2670,  79920,   3344,   1002,    368,  17646,  14472,   8348,
             664,    718,      4,  19036,     17,  31849,     17,   6312,     76,
              44,  62470,     56,     91,     50,  14839,     21,  77658,    915,
             210],
         [     3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3, 227985,   5484,    915,    405, 187059,
            2256,    664,   2550,  18833,  18607, 162467,      4, 

In [8]:
next(iter(train_dataloader))

{'input_ids': tensor([[     3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3, 227985,   5484,
             915,   2566,     44,    256,  67875,  21033,  86274,  79707,   2632,
            9999,    427,   2150,  54036,  98091,     34, 112164,  15971,  16154,
            5382,    861,   7220,     17,  77658,    915,    210,   1936, 106863,
               2],
         [     3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3,      3,      3,
               3,      3,      3,      3,      3,      3,      3, 227985,   5484,
             915,   2566,  88653,   2321, 144017, 138861,  59283,   1152,    613,
            2632,  12120,      4,   5673,   1152,  32153,    427, 

In [9]:
len(test_dataloader)

425
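
The value 425 follows from the dataset size and `batch_size`. A quick check, assuming the `DataLoader` keeps the last partial batch (its default); the helper name `num_batches` is made up for this note:

```python
import math


def num_batches(num_examples, batch_size):
    """Number of batches when the final, smaller batch is kept."""
    return math.ceil(num_examples / batch_size)


print(num_batches(3399, 8))  # test set
print(num_batches(50, 8))    # training set
```

The same calculation gives the 7 steps per epoch that appear in the training progress bars.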

In [11]:

# creating model
model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
print(model)
model = get_peft_model(model, peft_config)
print("peft_model",model)

model.print_trainable_parameters()

BloomForCausalLM(
  (transformer): BloomModel(
    (word_embeddings): Embedding(250880, 1024)
    (word_embeddings_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (h): ModuleList(
      (0-23): 24 x BloomBlock(
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): BloomAttention(
          (query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
          (dense): Linear(in_features=1024, out_features=1024, bias=True)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (mlp): BloomMLP(
          (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
          (gelu_impl): BloomGelu()
          (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
        )
      )
    )
    (ln_f): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (

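After wrapping, the PEFT model (`peft_model`) has the additional `PromptEmbedding` weight shown below at the embedding layer, and this is the only part that is trained. The trainable parameter count is just 8 * 1024 = 8192, so a CPU can handle the training; in the CPU run logged in this notebook, the training pass of each epoch takes about 3 minutes.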
```
  (prompt_encoder): ModuleDict(
    (default): PromptEmbedding(
      (embedding): Embedding(8, 1024)
    )
  )
  (word_embeddings): Embedding(250880, 1024)
```
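
To put 8192 in perspective, the parameter count of the base model can be worked out from the module printout. This is a back-of-the-envelope sketch: it assumes the output head shares its weights with `word_embeddings` (the printout is cut off before that layer), and the helper names are invented for the calculation.

```python
vocab_size, hidden, num_layers, num_virtual_tokens = 250880, 1024, 24, 8


def linear(in_features, out_features):
    """Weights plus bias of a Linear layer."""
    return in_features * out_features + out_features


def layer_norm(dim):
    """Scale plus shift of a LayerNorm."""
    return 2 * dim


# One BloomBlock, following the printed module structure.
block = (
    layer_norm(hidden)              # input_layernorm
    + linear(hidden, 3 * hidden)    # query_key_value
    + linear(hidden, hidden)        # dense
    + layer_norm(hidden)            # post_attention_layernorm
    + linear(hidden, 4 * hidden)    # dense_h_to_4h
    + linear(4 * hidden, hidden)    # dense_4h_to_h
)

base_params = (
    vocab_size * hidden             # word_embeddings (assumed tied with the output head)
    + layer_norm(hidden)            # word_embeddings_layernorm
    + num_layers * block
    + layer_norm(hidden)            # ln_f
)
prompt_params = num_virtual_tokens * hidden
trainable_percent = 100 * prompt_params / (base_params + prompt_params)

print(base_params, prompt_params, f"{trainable_percent:.5f}%")
```

Under these assumptions the soft prompt is a tiny fraction of a percent of all parameters, which is the number `print_trainable_parameters()` is meant to report.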

In [12]:
# model
# optimizer and lr scheduler
optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
lr_scheduler = get_linear_schedule_with_warmup(
    optimizer=optimizer,
    num_warmup_steps=0,
    num_training_steps=(len(train_dataloader) * num_epochs),
)
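
With `num_warmup_steps=0`, the schedule simply decays the learning rate linearly from `lr` to zero over all training steps. The function below is a plain-Python sketch of that rule, written from my understanding of `get_linear_schedule_with_warmup`; `linear_schedule_lr` is a hypothetical name, and the step count uses the 7 batches per epoch seen in the training logs.

```python
def linear_schedule_lr(step, base_lr, num_training_steps, num_warmup_steps=0):
    """Learning rate at a given optimizer step under linear warmup and decay."""
    if step < num_warmup_steps:
        # Warmup phase: ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, num_warmup_steps)
    # Decay phase: ramp down linearly to 0 at the last step.
    remaining = max(0, num_training_steps - step)
    return base_lr * remaining / max(1, num_training_steps - num_warmup_steps)


total_steps = 7 * 50  # batches per epoch * num_epochs
for step in (0, 175, 350):
    print(step, linear_schedule_lr(step, 3e-2, total_steps))
```

So halfway through the 50 epochs the learning rate has dropped to half of its initial value.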

In [13]:
# training and evaluation
model = model.to(device)

for epoch in range(num_epochs):
    model.train()
    total_loss = 0
    for step, batch in enumerate(tqdm(train_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        #         print(batch)
        #         print(batch["input_ids"].shape)
        outputs = model(**batch)
        loss = outputs.loss
        total_loss += loss.detach().float()
        loss.backward()
        optimizer.step()
        lr_scheduler.step()
        optimizer.zero_grad()

    model.eval()
    eval_loss = 0
    eval_preds = []
    for step, batch in enumerate(tqdm(eval_dataloader)):
        batch = {k: v.to(device) for k, v in batch.items()}
        with torch.no_grad():
            outputs = model(**batch)
        loss = outputs.loss
        eval_loss += loss.detach().float()
        eval_preds.extend(
            tokenizer.batch_decode(torch.argmax(outputs.logits, -1).detach().cpu().numpy(), skip_special_tokens=True)
        )

    eval_epoch_loss = eval_loss / len(eval_dataloader)
    eval_ppl = torch.exp(eval_epoch_loss)
    train_epoch_loss = total_loss / len(train_dataloader)
    train_ppl = torch.exp(train_epoch_loss)
    print(f"{epoch=}: {train_ppl=} {train_epoch_loss=} {eval_ppl=} {eval_epoch_loss=}")

100%|██████████| 7/7 [03:11<00:00, 27.32s/it]
100%|██████████| 7/7 [01:24<00:00, 12.06s/it]


epoch=0: train_ppl=tensor(166.0994) train_epoch_loss=tensor(5.1126) eval_ppl=tensor(14.0646) eval_epoch_loss=tensor(2.6437)


100%|██████████| 7/7 [04:02<00:00, 34.69s/it]
100%|██████████| 7/7 [01:28<00:00, 12.71s/it]


epoch=1: train_ppl=tensor(6.0815) train_epoch_loss=tensor(1.8052) eval_ppl=tensor(1.6809) eval_epoch_loss=tensor(0.5193)


100%|██████████| 7/7 [03:09<00:00, 27.10s/it]
100%|██████████| 7/7 [01:24<00:00, 12.12s/it]


epoch=2: train_ppl=tensor(1.4180) train_epoch_loss=tensor(0.3493) eval_ppl=tensor(1.2926) eval_epoch_loss=tensor(0.2567)


100%|██████████| 7/7 [03:15<00:00, 27.91s/it]
100%|██████████| 7/7 [01:23<00:00, 12.00s/it]


epoch=3: train_ppl=tensor(1.2551) train_epoch_loss=tensor(0.2272) eval_ppl=tensor(1.3150) eval_epoch_loss=tensor(0.2739)


100%|██████████| 7/7 [03:05<00:00, 26.56s/it]
100%|██████████| 7/7 [01:23<00:00, 11.95s/it]


epoch=4: train_ppl=tensor(1.2556) train_epoch_loss=tensor(0.2276) eval_ppl=tensor(1.2436) eval_epoch_loss=tensor(0.2180)


100%|██████████| 7/7 [03:05<00:00, 26.44s/it]
100%|██████████| 7/7 [01:30<00:00, 12.88s/it]


epoch=5: train_ppl=tensor(1.2180) train_epoch_loss=tensor(0.1972) eval_ppl=tensor(1.3320) eval_epoch_loss=tensor(0.2866)


100%|██████████| 7/7 [03:10<00:00, 27.19s/it]
100%|██████████| 7/7 [01:22<00:00, 11.85s/it]


epoch=6: train_ppl=tensor(1.2837) train_epoch_loss=tensor(0.2497) eval_ppl=tensor(1.2280) eval_epoch_loss=tensor(0.2054)


100%|██████████| 7/7 [03:04<00:00, 26.32s/it]
100%|██████████| 7/7 [01:27<00:00, 12.56s/it]


epoch=7: train_ppl=tensor(1.2562) train_epoch_loss=tensor(0.2281) eval_ppl=tensor(1.2512) eval_epoch_loss=tensor(0.2241)


100%|██████████| 7/7 [03:07<00:00, 26.80s/it]
100%|██████████| 7/7 [01:23<00:00, 11.98s/it]


epoch=8: train_ppl=tensor(1.2744) train_epoch_loss=tensor(0.2424) eval_ppl=tensor(1.2423) eval_epoch_loss=tensor(0.2169)


100%|██████████| 7/7 [03:05<00:00, 26.53s/it]
100%|██████████| 7/7 [01:23<00:00, 11.89s/it]


epoch=9: train_ppl=tensor(1.2327) train_epoch_loss=tensor(0.2092) eval_ppl=tensor(1.1897) eval_epoch_loss=tensor(0.1737)


100%|██████████| 7/7 [03:13<00:00, 27.58s/it]
100%|██████████| 7/7 [01:23<00:00, 11.91s/it]


epoch=10: train_ppl=tensor(1.1969) train_epoch_loss=tensor(0.1797) eval_ppl=tensor(1.2143) eval_epoch_loss=tensor(0.1941)


100%|██████████| 7/7 [03:05<00:00, 26.55s/it]
100%|██████████| 7/7 [01:23<00:00, 11.92s/it]


epoch=11: train_ppl=tensor(1.1437) train_epoch_loss=tensor(0.1343) eval_ppl=tensor(1.1764) eval_epoch_loss=tensor(0.1624)


100%|██████████| 7/7 [03:12<00:00, 27.47s/it]
100%|██████████| 7/7 [01:23<00:00, 11.98s/it]


epoch=12: train_ppl=tensor(1.2012) train_epoch_loss=tensor(0.1833) eval_ppl=tensor(1.1684) eval_epoch_loss=tensor(0.1556)


100%|██████████| 7/7 [03:05<00:00, 26.48s/it]
100%|██████████| 7/7 [01:25<00:00, 12.23s/it]


epoch=13: train_ppl=tensor(1.1420) train_epoch_loss=tensor(0.1328) eval_ppl=tensor(1.1498) eval_epoch_loss=tensor(0.1396)


100%|██████████| 7/7 [03:05<00:00, 26.44s/it]
100%|██████████| 7/7 [01:29<00:00, 12.85s/it]


epoch=14: train_ppl=tensor(1.1283) train_epoch_loss=tensor(0.1207) eval_ppl=tensor(1.1276) eval_epoch_loss=tensor(0.1201)


100%|██████████| 7/7 [03:04<00:00, 26.38s/it]
100%|██████████| 7/7 [01:22<00:00, 11.81s/it]


epoch=15: train_ppl=tensor(1.1305) train_epoch_loss=tensor(0.1227) eval_ppl=tensor(1.1180) eval_epoch_loss=tensor(0.1115)


100%|██████████| 7/7 [03:09<00:00, 27.03s/it]
100%|██████████| 7/7 [01:34<00:00, 13.52s/it]


epoch=16: train_ppl=tensor(1.1335) train_epoch_loss=tensor(0.1253) eval_ppl=tensor(1.0977) eval_epoch_loss=tensor(0.0932)


100%|██████████| 7/7 [03:03<00:00, 26.15s/it]
100%|██████████| 7/7 [01:27<00:00, 12.50s/it]


epoch=17: train_ppl=tensor(1.1269) train_epoch_loss=tensor(0.1195) eval_ppl=tensor(1.0938) eval_epoch_loss=tensor(0.0897)


100%|██████████| 7/7 [04:28<00:00, 38.34s/it]
100%|██████████| 7/7 [02:00<00:00, 17.27s/it]


epoch=18: train_ppl=tensor(1.1231) train_epoch_loss=tensor(0.1161) eval_ppl=tensor(1.1237) eval_epoch_loss=tensor(0.1166)


100%|██████████| 7/7 [03:11<00:00, 27.30s/it]
100%|██████████| 7/7 [01:23<00:00, 11.98s/it]


epoch=19: train_ppl=tensor(1.1131) train_epoch_loss=tensor(0.1071) eval_ppl=tensor(1.0837) eval_epoch_loss=tensor(0.0803)


100%|██████████| 7/7 [03:00<00:00, 25.81s/it]
100%|██████████| 7/7 [01:22<00:00, 11.83s/it]


epoch=20: train_ppl=tensor(1.0853) train_epoch_loss=tensor(0.0818) eval_ppl=tensor(1.0755) eval_epoch_loss=tensor(0.0728)


100%|██████████| 7/7 [03:00<00:00, 25.85s/it]
100%|██████████| 7/7 [01:22<00:00, 11.80s/it]


epoch=21: train_ppl=tensor(1.0846) train_epoch_loss=tensor(0.0812) eval_ppl=tensor(1.0633) eval_epoch_loss=tensor(0.0614)


100%|██████████| 7/7 [03:00<00:00, 25.83s/it]
100%|██████████| 7/7 [01:22<00:00, 11.83s/it]


epoch=22: train_ppl=tensor(1.0710) train_epoch_loss=tensor(0.0686) eval_ppl=tensor(1.0803) eval_epoch_loss=tensor(0.0772)


100%|██████████| 7/7 [03:01<00:00, 25.97s/it]
100%|██████████| 7/7 [01:23<00:00, 11.97s/it]


epoch=23: train_ppl=tensor(1.0711) train_epoch_loss=tensor(0.0687) eval_ppl=tensor(1.0740) eval_epoch_loss=tensor(0.0714)


100%|██████████| 7/7 [02:59<00:00, 25.61s/it]
100%|██████████| 7/7 [01:23<00:00, 11.91s/it]


epoch=24: train_ppl=tensor(1.0678) train_epoch_loss=tensor(0.0656) eval_ppl=tensor(1.0532) eval_epoch_loss=tensor(0.0518)


100%|██████████| 7/7 [02:58<00:00, 25.54s/it]
100%|██████████| 7/7 [01:22<00:00, 11.84s/it]


epoch=25: train_ppl=tensor(1.0832) train_epoch_loss=tensor(0.0800) eval_ppl=tensor(1.0544) eval_epoch_loss=tensor(0.0529)


100%|██████████| 7/7 [02:59<00:00, 25.69s/it]
100%|██████████| 7/7 [01:22<00:00, 11.74s/it]


epoch=26: train_ppl=tensor(1.1777) train_epoch_loss=tensor(0.1635) eval_ppl=tensor(1.1620) eval_epoch_loss=tensor(0.1502)


100%|██████████| 7/7 [03:01<00:00, 25.98s/it]
100%|██████████| 7/7 [01:23<00:00, 11.90s/it]


epoch=27: train_ppl=tensor(1.1276) train_epoch_loss=tensor(0.1201) eval_ppl=tensor(1.0934) eval_epoch_loss=tensor(0.0893)


100%|██████████| 7/7 [03:07<00:00, 26.82s/it]
100%|██████████| 7/7 [01:21<00:00, 11.70s/it]


epoch=28: train_ppl=tensor(1.0846) train_epoch_loss=tensor(0.0812) eval_ppl=tensor(1.0541) eval_epoch_loss=tensor(0.0527)


100%|██████████| 7/7 [03:01<00:00, 25.94s/it]
100%|██████████| 7/7 [01:22<00:00, 11.82s/it]


epoch=29: train_ppl=tensor(1.0565) train_epoch_loss=tensor(0.0549) eval_ppl=tensor(1.0509) eval_epoch_loss=tensor(0.0497)


100%|██████████| 7/7 [03:01<00:00, 25.90s/it]
100%|██████████| 7/7 [01:23<00:00, 11.91s/it]


epoch=30: train_ppl=tensor(1.0475) train_epoch_loss=tensor(0.0464) eval_ppl=tensor(1.0387) eval_epoch_loss=tensor(0.0380)


100%|██████████| 7/7 [02:58<00:00, 25.51s/it]
100%|██████████| 7/7 [01:23<00:00, 11.87s/it]


epoch=31: train_ppl=tensor(1.0346) train_epoch_loss=tensor(0.0340) eval_ppl=tensor(1.0306) eval_epoch_loss=tensor(0.0302)


100%|██████████| 7/7 [02:57<00:00, 25.37s/it]
100%|██████████| 7/7 [01:23<00:00, 11.87s/it]


epoch=32: train_ppl=tensor(1.0286) train_epoch_loss=tensor(0.0282) eval_ppl=tensor(1.0243) eval_epoch_loss=tensor(0.0240)


100%|██████████| 7/7 [02:57<00:00, 25.29s/it]
100%|██████████| 7/7 [01:22<00:00, 11.72s/it]


epoch=33: train_ppl=tensor(1.0251) train_epoch_loss=tensor(0.0247) eval_ppl=tensor(1.0206) eval_epoch_loss=tensor(0.0204)


100%|██████████| 7/7 [02:56<00:00, 25.17s/it]
100%|██████████| 7/7 [01:21<00:00, 11.65s/it]


epoch=34: train_ppl=tensor(1.0185) train_epoch_loss=tensor(0.0183) eval_ppl=tensor(1.0169) eval_epoch_loss=tensor(0.0167)


100%|██████████| 7/7 [02:54<00:00, 24.99s/it]
100%|██████████| 7/7 [01:21<00:00, 11.64s/it]


epoch=35: train_ppl=tensor(1.0159) train_epoch_loss=tensor(0.0158) eval_ppl=tensor(1.0151) eval_epoch_loss=tensor(0.0150)


100%|██████████| 7/7 [02:55<00:00, 25.10s/it]
100%|██████████| 7/7 [01:21<00:00, 11.66s/it]


epoch=36: train_ppl=tensor(1.0159) train_epoch_loss=tensor(0.0158) eval_ppl=tensor(1.0135) eval_epoch_loss=tensor(0.0134)


100%|██████████| 7/7 [02:54<00:00, 24.88s/it]
100%|██████████| 7/7 [01:22<00:00, 11.75s/it]


epoch=37: train_ppl=tensor(1.0146) train_epoch_loss=tensor(0.0145) eval_ppl=tensor(1.0120) eval_epoch_loss=tensor(0.0120)


100%|██████████| 7/7 [02:56<00:00, 25.23s/it]
100%|██████████| 7/7 [01:21<00:00, 11.66s/it]


epoch=38: train_ppl=tensor(1.0113) train_epoch_loss=tensor(0.0112) eval_ppl=tensor(1.0131) eval_epoch_loss=tensor(0.0130)


100%|██████████| 7/7 [02:55<00:00, 25.01s/it]
100%|██████████| 7/7 [01:22<00:00, 11.72s/it]


epoch=39: train_ppl=tensor(1.0117) train_epoch_loss=tensor(0.0116) eval_ppl=tensor(1.0108) eval_epoch_loss=tensor(0.0107)


100%|██████████| 7/7 [02:55<00:00, 25.02s/it]
100%|██████████| 7/7 [01:22<00:00, 11.75s/it]


epoch=40: train_ppl=tensor(1.0103) train_epoch_loss=tensor(0.0103) eval_ppl=tensor(1.0099) eval_epoch_loss=tensor(0.0099)


100%|██████████| 7/7 [02:55<00:00, 25.08s/it]
100%|██████████| 7/7 [01:23<00:00, 11.95s/it]


epoch=41: train_ppl=tensor(1.0105) train_epoch_loss=tensor(0.0104) eval_ppl=tensor(1.0092) eval_epoch_loss=tensor(0.0092)


100%|██████████| 7/7 [02:54<00:00, 24.94s/it]
100%|██████████| 7/7 [01:24<00:00, 12.00s/it]


epoch=42: train_ppl=tensor(1.0092) train_epoch_loss=tensor(0.0091) eval_ppl=tensor(1.0087) eval_epoch_loss=tensor(0.0086)


100%|██████████| 7/7 [02:55<00:00, 25.02s/it]
100%|██████████| 7/7 [01:22<00:00, 11.82s/it]


epoch=43: train_ppl=tensor(1.0082) train_epoch_loss=tensor(0.0082) eval_ppl=tensor(1.0083) eval_epoch_loss=tensor(0.0083)


100%|██████████| 7/7 [02:55<00:00, 25.12s/it]
100%|██████████| 7/7 [01:22<00:00, 11.84s/it]


epoch=44: train_ppl=tensor(1.0075) train_epoch_loss=tensor(0.0075) eval_ppl=tensor(1.0080) eval_epoch_loss=tensor(0.0080)


100%|██████████| 7/7 [02:55<00:00, 25.11s/it]
100%|██████████| 7/7 [01:22<00:00, 11.83s/it]


epoch=45: train_ppl=tensor(1.0080) train_epoch_loss=tensor(0.0080) eval_ppl=tensor(1.0078) eval_epoch_loss=tensor(0.0077)


100%|██████████| 7/7 [02:58<00:00, 25.55s/it]
100%|██████████| 7/7 [01:23<00:00, 11.95s/it]


epoch=46: train_ppl=tensor(1.0072) train_epoch_loss=tensor(0.0072) eval_ppl=tensor(1.0076) eval_epoch_loss=tensor(0.0076)


100%|██████████| 7/7 [02:55<00:00, 25.02s/it]
100%|██████████| 7/7 [01:23<00:00, 11.92s/it]


epoch=47: train_ppl=tensor(1.0070) train_epoch_loss=tensor(0.0070) eval_ppl=tensor(1.0075) eval_epoch_loss=tensor(0.0074)


100%|██████████| 7/7 [02:55<00:00, 25.01s/it]
100%|██████████| 7/7 [01:22<00:00, 11.81s/it]


epoch=48: train_ppl=tensor(1.0073) train_epoch_loss=tensor(0.0073) eval_ppl=tensor(1.0074) eval_epoch_loss=tensor(0.0074)


100%|██████████| 7/7 [02:55<00:00, 25.07s/it]
100%|██████████| 7/7 [01:22<00:00, 11.78s/it]

epoch=49: train_ppl=tensor(1.0073) train_epoch_loss=tensor(0.0073) eval_ppl=tensor(1.0074) eval_epoch_loss=tensor(0.0074)
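
The `*_ppl` values in the log are the exponential of the mean cross-entropy loss, as computed with `torch.exp` in the loop. A small check with the standard library, using the rounded losses printed for the first and last epoch (`perplexity` is a helper defined only for this check):

```python
import math


def perplexity(mean_loss):
    """Perplexity is the exponential of the mean cross-entropy loss."""
    return math.exp(mean_loss)


# Rounded losses copied from the log above; small deviations from the printed
# perplexities come from that rounding.
print(perplexity(5.1126))  # epoch 0, train
print(perplexity(0.0073))  # epoch 49, train
```

A perplexity close to 1 means the model assigns almost all probability to the expected label tokens. Because the evaluation set here is the training set, this shows fit to the training data, not generalization.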





In [18]:
model.eval()
i = 16
inputs = tokenizer(f'{text_column} : {dataset["test"][i]["Tweet text"]} Label : ', return_tensors="pt")
print(dataset["test"][i]["Tweet text"])
print(inputs)

with torch.no_grad():
    inputs = {k: v.to(device) for k, v in inputs.items()}
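    # NOTE: id 3 is this tokenizer's pad token (see the padded input_ids above), while the
    # training targets end with tokenizer.eos_token_id (2). With eos_token_id=3, generation
    # does not stop at the learned end-of-label token and runs on for max_new_tokens, as the
    # output below shows. Passing eos_token_id=tokenizer.eos_token_id would stop after the label.
    outputs = model.generate(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=10, eos_token_id=3
    )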
    print(outputs)
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))

Hey @nytimes your link to cancel my subscription isn't working and nobody is answering the chat. Please don't play that kind of stupid game.
{'input_ids': tensor([[227985,   5484,    915,  54078,   2566,   7782,  24502,   2632,   8989,
            427,  36992,   2670, 140711,  21994,  10789,    530,  88399,    632,
         183542,    368,  44799,     17,  29901,   5926,   7229,    861,  11596,
            461,  78851,  14775,     17,  77658,    915,    210]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
tensor([[227985,   5484,    915,  54078,   2566,   7782,  24502,   2632,   8989,
            427,  36992,   2670, 140711,  21994,  10789,    530,  88399,    632,
         183542,    368,  44799,     17,  29901,   5926,   7229,    861,  11596,
            461,  78851,  14775,     17,  77658,    915,    210,  16449,   5952,
              2,   2175,   3968,   3509,    473,  25338,    368,   88
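
Since the decoded output contains the prompt and may contain extra text after the label, a small post-processing step helps to recover the predicted class. The helper `extract_label` below is a hypothetical sketch, not part of the notebook; it assumes the `"Label :"` marker from the prompt template and the class names printed earlier, and the example strings are made up.

```python
def extract_label(decoded_text, classes, marker="Label :"):
    """Return the class name that follows the marker, or None if none matches."""
    _, found, tail = decoded_text.partition(marker)
    if not found:
        return None
    tail = tail.strip()
    # Check longer names first so "no complaint" is not mistaken for a prefix match.
    for name in sorted(classes, key=len, reverse=True):
        if tail.startswith(name):
            return name
    return None


classes = ["Unlabeled", "complaint", "no complaint"]
print(extract_label("Tweet text : my order never arrived Label : complaint and more text", classes))
print(extract_label("Tweet text : thanks for the help Label : no complaint", classes))
```

Matching by prefix keeps the helper usable even when generation runs past the end of the label.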


- Push the model to the Hugging Face Hub
```python
model.push_to_hub(
    f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_"),
    token = "hf_..."
)
```
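token (`bool` or `str`, *optional*):
    The token used for HTTP Bearer authorization when accessing remote files. If set to `True`, the token generated by running `huggingface-cli login` (stored in `~/.huggingface`) is used. Defaults to `True` if `repo_url` is not specified.
    You can also get your token from https://huggingface.co/settings/token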
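
Hard-coding a token in a notebook is easy to leak. One possible pattern is to read it from an environment variable and fall back to the cached login otherwise. The helper `resolve_hf_token` is a hypothetical sketch; it assumes the token is stored in a variable named `HF_TOKEN`, and its return value would be passed as `token=` to `push_to_hub`.

```python
import os


def resolve_hf_token(env=None):
    """Return the token from the environment, or True to use the cached login."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN")  # assumed variable name
    return token if token else True


# Examples with explicit mappings instead of the real environment:
print(resolve_hf_token({"HF_TOKEN": "hf_example"}))
print(resolve_hf_token({}))
```

Returning `True` matches the documented meaning of `token=True`: use the token saved by `huggingface-cli login`.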


- Or save the model locally
```python
peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace("/", "_")
model.save_pretrained(peft_model_id)
```

In [15]:
# saving model
peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace(
    "/", "_"
)
print(peft_model_id)
model.save_pretrained(peft_model_id)

twitter_complaints_bigscience_bloomz-560m_PROMPT_TUNING_CAUSAL_LM


In [16]:
ckpt = f"{peft_model_id}/adapter_model.safetensors"
print(ckpt)
!du -h $ckpt # du reports allocated disk blocks, so it rounds the file size up (36K here)
!ls -lh $peft_model_id

twitter_complaints_bigscience_bloomz-560m_PROMPT_TUNING_CAUSAL_LM/adapter_model.safetensors
36K	twitter_complaints_bigscience_bloomz-560m_PROMPT_TUNING_CAUSAL_LM/adapter_model.safetensors
total 48K
-rw-r--r-- 1 root root  510 Apr  3 18:04 adapter_config.json
-rw-r--r-- 1 root root  33K Apr  3 18:04 adapter_model.safetensors
-rw-r--r-- 1 root root 5.0K Apr  3 18:04 README.md
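
The adapter file is this small because it only has to hold the soft-prompt embedding matrix, not the base model. A rough check of the 33K figure, assuming 8 virtual tokens of dimension 1024 (the values stored in the saved `PromptTuningConfig`) kept as float32; the remaining bytes are assumed to be safetensors header metadata.

```python
num_virtual_tokens = 8   # from the saved adapter config
token_dim = 1024         # embedding size of bloomz-560m
bytes_per_param = 4      # assumption: weights are stored as float32

trainable_params = num_virtual_tokens * token_dim
payload_bytes = trainable_params * bytes_per_param

print(trainable_params)      # 8192
print(payload_bytes / 1024)  # 32.0 (KiB)
```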


In [17]:
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

peft_model_id = f"{dataset_name}_{model_name_or_path}_{peft_config.peft_type}_{peft_config.task_type}".replace(
    "/", "_"
)
print(peft_model_id)

#max_memory = {0: "1GIB", 1: "1GIB", 2: "2GIB", 3: "10GIB", "cpu": "30GB"}

# Read the adapter config to find out which base model it was trained on
config = PeftConfig.from_pretrained(peft_model_id)
print(config)
# Load the frozen base model
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path)
#model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, device_map="auto", max_memory=max_memory)

print(model)
# Wrap the base model with the saved soft-prompt adapter
model = PeftModel.from_pretrained(model, peft_model_id)
#model = PeftModel.from_pretrained(model, peft_model_id, device_map="auto", max_memory=max_memory)

print(model)

model.to(device)
#model.hf_device_map


twitter_complaints_bigscience_bloomz-560m_PROMPT_TUNING_CAUSAL_LM
PromptTuningConfig(peft_type=<PeftType.PROMPT_TUNING: 'PROMPT_TUNING'>, auto_mapping=None, base_model_name_or_path='bigscience/bloomz-560m', revision=None, task_type='CAUSAL_LM', inference_mode=True, num_virtual_tokens=8, token_dim=1024, num_transformer_submodules=1, num_attention_heads=16, num_layers=24, prompt_tuning_init='TEXT', prompt_tuning_init_text='Classify if the tweet is a complaint or not:', tokenizer_name_or_path='bigscience/bloomz-560m', tokenizer_kwargs=None)
BloomForCausalLM(
  (transformer): BloomModel(
    (word_embeddings): Embedding(250880, 1024)
    (word_embeddings_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (h): ModuleList(
      (0-23): 24 x BloomBlock(
        (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (self_attention): BloomAttention(
          (query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
          (den

PeftModelForCausalLM(
  (base_model): BloomForCausalLM(
    (transformer): BloomModel(
      (word_embeddings): Embedding(250880, 1024)
      (word_embeddings_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (h): ModuleList(
        (0-23): 24 x BloomBlock(
          (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (self_attention): BloomAttention(
            (query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
            (dense): Linear(in_features=1024, out_features=1024, bias=True)
            (attention_dropout): Dropout(p=0.0, inplace=False)
          )
          (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): BloomMLP(
            (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
            (gelu_impl): BloomGelu()
            (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
          )
        )
      

In [55]:
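model.eval()  # inference mode: disables dropout
i = 4
# Build the prompt in the "<text column> : <tweet> Label : " format and tokenize it
inputs = tokenizer(f'{text_column} : {dataset["test"][i]["Tweet text"]} Label : ', return_tensors="pt")
print(dataset["test"][i]["Tweet text"])
print(inputs)

with torch.no_grad():
    # Move the input tensors to the same device as the model
    inputs = {k: v.to(device) for k, v in inputs.items()}
    outputs = model.generate(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"], max_new_tokens=10, eos_token_id=3
    )
    print(outputs)
    # Decode the token ids (prompt plus up to 10 new tokens) back to text
    print(tokenizer.batch_decode(outputs.detach().cpu().numpy(), skip_special_tokens=True))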

@greateranglia Ok thanks...
{'input_ids': tensor([[227985,   5484,    915,   2566,  14173,   2960,  29906,    387,  20706,
          49337,   1369,  77658,    915,    210]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
tensor([[227985,   5484,    915,   2566,  14173,   2960,  29906,    387,  20706,
          49337,   1369,  77658,    915,    210,   1936, 106863,      2,     31,
          43907,  20321,  97547,     29,   1387,   6747]])
['Tweet text : @greateranglia Ok thanks... Label : no complaint<b>Note</b>: The following']
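
The decoded string still contains the prompt and whatever the model generated after the label. A minimal sketch of extracting just the label, assuming the label set is `complaint` / `no complaint` and that the prompt ends with `Label : ` as in the cell above. `extract_label` is a hypothetical helper, not part of any library.

```python
LABELS = ("no complaint", "complaint")  # assumed label set


def extract_label(decoded, marker="Label : "):
    """Return the first known label that follows the marker, or None."""
    _, found, completion = decoded.partition(marker)
    if not found:
        return None
    completion = completion.strip()
    for label in LABELS:
        if completion.startswith(label):
            return label
    return None


text = "Tweet text : @greateranglia Ok thanks... Label : no complaint<b>Note</b>: The following"
print(extract_label(text))  # no complaint
```

Matching against a fixed label set ignores the trailing text the model produces once it has emitted the label.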
