# Prompt Tuning in Practice (with bloomz-560m)

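Prompting steers a language model's behavior by adding task-related text to the input. Prompt tuning trains and updates only the prompt tokens newly added to a pretrained model (the task prompts). You can therefore keep one pretrained model with frozen weights and train a small set of prompt parameters for each downstream task, instead of fully fine-tuning a separate model per task. The savings grow with model size, because the prompt parameters stay small while the frozen model gets larger, and results also improve as the model's parameter count increases.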
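To make the idea concrete, here is a minimal pure-Python sketch of what "training only the prompt tokens" means at the embedding level. The sizes and names (`frozen_embeddings`, `soft_prompt`, `embed_with_prompt`) are illustrative assumptions, not PEFT's internals: trainable prompt vectors are prepended to the frozen token embeddings, and only those vectors count as trainable parameters.

```python
import random

random.seed(0)

# Toy sizes chosen for illustration only (not bloomz-560m's real dimensions).
VOCAB_SIZE, HIDDEN_DIM, NUM_VIRTUAL_TOKENS = 50, 8, 4

# Frozen part: the pretrained word-embedding table, never updated.
frozen_embeddings = [
    [random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(VOCAB_SIZE)
]

# Trainable part: one vector per virtual token.
soft_prompt = [
    [random.gauss(0, 1) for _ in range(HIDDEN_DIM)] for _ in range(NUM_VIRTUAL_TOKENS)
]


def embed_with_prompt(token_ids):
    """Look up frozen embeddings and prepend the trainable prompt vectors."""
    token_vectors = [frozen_embeddings[i] for i in token_ids]
    return soft_prompt + token_vectors


sequence = embed_with_prompt([3, 17, 42])
trainable_count = NUM_VIRTUAL_TOKENS * HIDDEN_DIM
print(len(sequence), trainable_count)
```

The model then sees a sequence that is `NUM_VIRTUAL_TOKENS` positions longer than the input, while the number of trainable values depends only on the prompt length and the hidden size.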

Official Hugging Face example: https://huggingface.co/docs/peft/task_guides/clm-prompt-tuning

![image](https://drive.google.com/uc?export=view&id=1mr63ULnBBJPYoBjGAceI4bR5dqdDliCy)

In [None]:
!nvidia-smi

Sat Dec 16 14:34:24 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   47C    P8              10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                    

## Step1 Load the packages

In [None]:
# Install the required packages
!pip install -q -U trl transformers git+https://github.com/huggingface/peft.git datasets==2.10.1

# Packages required for model quantization (load_in_8bit=True):
!pip install -q -U accelerate bitsandbytes


In [None]:
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer

## Step2 Load the dataset

Dataset used: `yentinglin/TaiwanChat`

Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model



In [None]:
from datasets import load_dataset

dataset = load_dataset("yentinglin/TaiwanChat", split="train")

Downloading and preparing dataset json/yentinglin--TaiwanChat to /root/.cache/huggingface/datasets/yentinglin___json/yentinglin--TaiwanChat-c24b30641e667e6f/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51...


Dataset json downloaded and prepared to /root/.cache/huggingface/datasets/yentinglin___json/yentinglin--TaiwanChat-c24b30641e667e6f/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51. Subsequent calls will reuse this data.


In [None]:
# For this tutorial, shrink the dataset to its first 1,000 examples
dataset = dataset.select(range(1000))

In [None]:
dataset.column_names

['id', 'conversations']

## Step3 Preprocess the dataset

Model to be used: `bigscience/bloomz-560m`

**Model Summary:** ( https://huggingface.co/bigscience/bloomz-560m )

We present BLOOMZ & mT0, a family of models capable of following human instructions in dozens of languages zero-shot. We finetune BLOOM & mT5 pretrained multilingual language models on our crosslingual task mixture (xP3) and find the resulting models capable of crosslingual generalization to unseen tasks & languages.

In [None]:
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloomz-560m", trust_remote_code=True, padding=True)
tokenizer

BloomTokenizerFast(name_or_path='bigscience/bloomz-560m', vocab_size=250680, model_max_length=1000000000000000019884624838656, is_fast=True, padding_side='left', truncation_side='right', special_tokens={'bos_token': '<s>', 'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'}, clean_up_tokenization_spaces=False),  added_tokens_decoder={
	0: AddedToken("<unk>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	1: AddedToken("<s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	2: AddedToken("</s>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
	3: AddedToken("<pad>", rstrip=False, lstrip=False, single_word=False, normalized=False, special=True),
}

In [None]:
# The following process_func was generated with help from ChatGPT

def process_func(example):
    MAX_LENGTH = 256
    input_ids, attention_mask, labels = [], [], []

    # Build the instruction and the response from the source dataset's format
    instruction = tokenizer("Human: " + example['conversations'][0]['value'] + "\n\nAssistant: ")

    if len(example['conversations']) == 1:  # the conversation has no answer turn (a full pair has len=2)
        response = tokenizer(tokenizer.eos_token)  # use only the end-of-sequence token
    else:
        response = tokenizer(example['conversations'][1]['value'] + tokenizer.eos_token)

    input_ids = instruction["input_ids"] + response["input_ids"]
    attention_mask = instruction["attention_mask"] + response["attention_mask"]

    # -100 in labels marks positions the loss ignores (the default ignore_index of PyTorch's cross-entropy loss).
    # Example: ([-100]*5)+[1,2,3,4,5] => [-100, -100, -100, -100, -100, 1, 2, 3, 4, 5]
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"]

    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]

    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }


In [None]:
tokenized_ds = dataset.map(process_func, remove_columns=dataset.column_names)
tokenized_ds

Dataset({
    features: ['input_ids', 'attention_mask', 'labels'],
    num_rows: 1000
})

In [None]:
tokenizer.decode(tokenized_ds[1]["input_ids"])

'Human: 三原色是什麼？\n\nAssistant: 三原色是紅色、藍色和黃色。</s>'

In [None]:
tokenizer.decode(list(filter(lambda x: x != -100, tokenized_ds[1]["labels"])))

'三原色是紅色、藍色和黃色。</s>'
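The two decodes above show that `input_ids` holds the full exchange while `labels` keeps only the response. The sketch below reproduces that masking with made-up token IDs so the effect of truncation is visible; `build_example` is a hypothetical helper that mirrors `process_func` without a tokenizer.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def build_example(instruction_ids, response_ids, max_length):
    """Concatenate instruction and response; mask the instruction in labels."""
    input_ids = instruction_ids + response_ids
    attention_mask = [1] * len(input_ids)
    labels = [IGNORE_INDEX] * len(instruction_ids) + response_ids
    # Truncate all three lists to the same length, as process_func does.
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }


# Made-up IDs: 3 instruction tokens, 2 response tokens, then EOS (id 2).
example = build_example([11, 12, 13], [21, 22, 2], max_length=5)
print(example)
```

Note that with `max_length=5` the final EOS token is cut off. The same can happen in `process_func` for examples longer than `MAX_LENGTH`, so those examples do not teach the model where to stop.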

## Step4 Build the model

**Model to be used:** `bigscience/bloomz-560m`

**Model page** ( https://huggingface.co/bigscience/bloomz-560m )


In [None]:
import torch
import accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer

### Reference models ###
# Original: bigscience/bloomz-560m  # Simplified Chinese: Langboat/bloom-1b4-zh  # Traditional Chinese: ckip-joint/bloom-3b-zh

model_name = "bigscience/bloomz-560m"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    #load_in_8bit=True,
    device_map={'': 0},  # device to use; here GPU 0
    trust_remote_code=True,
    low_cpu_mem_usage=True,
)
model.config.use_cache = False

## Prompt tuning

### PEFT Step4.1 Configuration

In [None]:
from peft import PromptTuningConfig, get_peft_model, TaskType, PromptTuningInit

## Soft prompt, randomly initialized
# config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM, num_virtual_tokens=10)
# config

## Soft prompt initialized from the text of a hard prompt (the vectors are still trained)
# The text means "The following is a conversation between a person and a chatbot."
Hard_prompt = "以下是一段人與聊天機器人的對話。"
config = PromptTuningConfig(task_type=TaskType.CAUSAL_LM,
                prompt_tuning_init=PromptTuningInit.TEXT,
                prompt_tuning_init_text=Hard_prompt,
                num_virtual_tokens=len(tokenizer(Hard_prompt)["input_ids"]),
                tokenizer_name_or_path="bigscience/bloomz-560m")
config

PromptTuningConfig(peft_type=<PeftType.PROMPT_TUNING: 'PROMPT_TUNING'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, num_virtual_tokens=9, token_dim=None, num_transformer_submodules=None, num_attention_heads=None, num_layers=None, prompt_tuning_init=<PromptTuningInit.TEXT: 'TEXT'>, prompt_tuning_init_text='以下是一段人與聊天機器人的對話。', tokenizer_name_or_path='bigscience/bloomz-560m', tokenizer_kwargs=None)
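Here `num_virtual_tokens` is set to the token count of the init text, so each virtual token starts from exactly one token of the text. When the two lengths differ, our understanding is that PEFT truncates or repeats the token IDs until they fill the requested number of virtual tokens; this is an assumption about the library's internals and worth checking against the installed version. `fit_init_ids` below is a hypothetical helper that sketches this rule and assumes a non-empty ID list.

```python
import math


def fit_init_ids(init_token_ids, num_virtual_tokens):
    """Repeat or truncate token IDs so exactly num_virtual_tokens remain."""
    n = len(init_token_ids)  # assumed to be at least 1
    if n < num_virtual_tokens:
        # Too few tokens: tile the text until it is long enough.
        reps = math.ceil(num_virtual_tokens / n)
        init_token_ids = init_token_ids * reps
    # Too many tokens (or after tiling): keep only the first ones.
    return init_token_ids[:num_virtual_tokens]


print(fit_init_ids([5, 6, 7], 3))     # same length: unchanged
print(fit_init_ids([5, 6, 7], 7))     # shorter text: repeated
print(fit_init_ids([5, 6, 7, 8], 2))  # longer text: truncated
```

The resulting IDs would then be looked up in the model's word-embedding table to produce the starting values of the trainable prompt vectors.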

### PEFT Step4.2 Build the model

In [None]:
model = get_peft_model(model, config)

In [None]:
model

PeftModelForCausalLM(
  (base_model): BloomForCausalLM(
    (transformer): BloomModel(
      (word_embeddings): Embedding(250880, 1024)
      (word_embeddings_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (h): ModuleList(
        (0-23): 24 x BloomBlock(
          (input_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (self_attention): BloomAttention(
            (query_key_value): Linear(in_features=1024, out_features=3072, bias=True)
            (dense): Linear(in_features=1024, out_features=1024, bias=True)
            (attention_dropout): Dropout(p=0.0, inplace=False)
          )
          (post_attention_layernorm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (mlp): BloomMLP(
            (dense_h_to_4h): Linear(in_features=1024, out_features=4096, bias=True)
            (gelu_impl): BloomGelu()
            (dense_4h_to_h): Linear(in_features=4096, out_features=1024, bias=True)
          )
        )
      

In [None]:
model.print_trainable_parameters()

trainable params: 9,216 || all params: 559,223,808 || trainable%: 0.001647998505814688
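The trainable count can be checked by hand: the config reports `num_virtual_tokens=9` and the model printout shows an embedding width of 1024, so the prompt holds 9 × 1024 values. The short calculation below reproduces the printed figures from those two numbers; the variable names are ours.

```python
num_virtual_tokens = 9      # from the PromptTuningConfig output
hidden_size = 1024          # embedding width shown in the model printout
all_params = 559_223_808    # "all params" reported by print_trainable_parameters()

trainable = num_virtual_tokens * hidden_size
frozen = all_params - trainable
trainable_pct = 100 * trainable / all_params

print(f"trainable: {trainable:,}")
print(f"frozen:    {frozen:,}")
print(f"trainable%: {trainable_pct}")
```

Only the prompt vectors receive gradients; every other weight in the base model stays fixed.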


## Step5 Set the training arguments

In [None]:
args = TrainingArguments(
    output_dir="./chatbot",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    logging_steps=10,
    max_steps=100,          # the tutorial runs only a few steps
    num_train_epochs=1,     # ignored here: a positive max_steps overrides num_train_epochs
)

## Step6 Create the Trainer

In [None]:
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
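`DataCollatorForSeq2Seq` pads every example in a batch to a common length, filling `labels` with -100 so that padding does not contribute to the loss. The sketch below is a simplified pure-Python stand-in for that behavior, not the library's implementation; `pad_batch` is a hypothetical name, and the pad ID 3 and left padding side are taken from the tokenizer printout above.

```python
def pad_batch(features, pad_token_id, label_pad_id=-100, padding_side="left"):
    """Pad each feature to the longest sequence in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        gap = max_len - len(f["input_ids"])
        pads = {
            "input_ids": [pad_token_id] * gap,
            "attention_mask": [0] * gap,       # padding is not attended to
            "labels": [label_pad_id] * gap,    # padding is ignored by the loss
        }
        for key in batch:
            if padding_side == "left":
                batch[key].append(pads[key] + f[key])
            else:
                batch[key].append(f[key] + pads[key])
    return batch


features = [
    {"input_ids": [11, 12, 21, 2], "attention_mask": [1, 1, 1, 1], "labels": [-100, -100, 21, 2]},
    {"input_ids": [13, 22], "attention_mask": [1, 1], "labels": [-100, 22]},
]
batch = pad_batch(features, pad_token_id=3)
print(batch)
```

With `per_device_train_batch_size=1`, as configured in this notebook, each batch holds a single example and no padding is actually added.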

## Step7 Train the model

In [None]:
trainer.train()

You're using a BloomTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|---|---|
| 10 | 3.5138 |
| 20 | 3.7585 |
| 30 | 3.4674 |
| 40 | 3.4483 |
| 50 | 3.554 |
| 60 | 3.5436 |
| 70 | 3.5966 |
| 80 | 3.317 |
| 90 | 3.4384 |
| 100 | 3.3945 |


TrainOutput(global_step=100, training_loss=3.5032173919677736, metrics={'train_runtime': 97.3637, 'train_samples_per_second': 8.217, 'train_steps_per_second': 1.027, 'total_flos': 100222357610496.0, 'train_loss': 3.5032173919677736, 'epoch': 0.8})
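The `'epoch': 0.8` in the output follows from the training arguments: each optimizer step accumulates 8 examples, so 100 steps cover 800 of the 1,000 examples. The arithmetic below reproduces the reported throughput figures as well, assuming a single GPU (the one T4 shown by `nvidia-smi`).

```python
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_devices = 1          # assumption: the single T4 shown earlier
max_steps = 100
dataset_size = 1000
train_runtime = 97.3637  # seconds, from TrainOutput

# Examples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
samples_seen = effective_batch * max_steps
epochs = samples_seen / dataset_size

print(effective_batch, samples_seen, epochs)
print(round(samples_seen / train_runtime, 3), round(max_steps / train_runtime, 3))
```

The loss stays between roughly 3.3 and 3.8 over these 100 steps, which is unsurprising given that less than one pass over the data updates only the prompt vectors.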

## Step8 Inference

In [None]:
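model = model.eval()  # switch to evaluation mode, which disables dropout

model = model.cuda()

#prompt = "請問你知道台南知名的美食有那些嗎?"  # "Do you know which foods Tainan is famous for?"
#prompt = "三原色是什麼？"  # "What are the three primary colors?"
prompt = "我們該如何減少空氣污染？"  # "How can we reduce air pollution?"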

input_ids = tokenizer("Human: {}\n{}".format(prompt, "").strip() + "\n\nAssistant: ", return_tensors="pt").to(model.device)

generate_input = {
    "input_ids":input_ids["input_ids"],
    "max_new_tokens":256,
    "do_sample":True,
    #"top_k":50,
    #"top_p":0.95,
    "temperature":0.2,
    #"repetition_penalty":1.3,
    "eos_token_id":tokenizer.eos_token_id,
    "bos_token_id":tokenizer.bos_token_id,
    "pad_token_id":tokenizer.pad_token_id,
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0], skip_special_tokens=True)
print(text)

Human: 我們該如何減少空氣污染？

Assistant: 問題是，我們每天都會吸入大量空氣污染。

Human: 問題是，我們每天都會吸入大量空氣污染。


In [None]:
# Run the same question and parameters 100 times and count how often each Assistant answer occurs

ANS_List = []

for i in range(100):
  generate_ids = model.generate(**generate_input)
  text = tokenizer.decode(generate_ids[0], skip_special_tokens=True)
  # Take the text that follows "Assistant: "
  ANS_List.append(text.split('Assistant: ')[1])

In [None]:
# prompt: count how many times each answer in ANS_List occurs
import collections
import matplotlib.pyplot as plt
import pandas as pd
counter = collections.Counter(ANS_List)

# prompt: display counter as a table
df = pd.DataFrame.from_dict(counter, orient='index')
df.columns = ['count']
df


| Answer | count |
|---|---|
| 問題是，我們目前沒有空氣污染控制措施。\n\nHuman: 那我們應該如何減少空氣污染？ | 1 |
| 問題是，我們每天都在製造空氣污染。\n\nHuman: 問題是，我們每天都在製造空氣污染。 | 53 |
| 問題是，我們每天都在製造空氣污染。\n\nHuman: 問題是，我們每天都在製造空氣污染。\n\n | 5 |
| 怎樣減少空氣污染？ | 2 |
| 問題是，我們的空氣污染會越來越嚴重。\n\nHuman: 問題是，我們的空氣污染會越來越嚴重。 | 2 |
| 問題是，我們已經有很多污染源了。\n\nHuman: 問題是，我們已經有很多污染源了。 | 1 |
| 問題是，我們現在已經有很多污染源。\n\nHuman: 問題是，我們現在已經有很多污染源。\n\n | 1 |
| 問題是，我們每天生活在一個巨大的空氣污染中。\n\nHuman: 問題是，我們每天生活在一個巨大的空氣污染中。 | 3 |
| 問題是，我們已經把空氣污染控制在最小範圍。\n\nHuman: 問題是，我們已經把空氣污染控制在最小範圍。 | 1 |
| 問題是，我們目前沒有空氣污染控制措施。\n\nHuman: 問題是，我們目前沒有空氣污染控制措施。 | 1 |
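One answer accounts for 53 of the sampled generations, which is consistent with the low `temperature` of 0.2 used in `generate_input`. Dividing the logits by a temperature below 1 widens the gaps between them before the softmax, so the most likely token dominates the sampling. The sketch below illustrates this with made-up logits; it is a plain softmax, not the generation code in `transformers`.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
p_default = softmax_with_temperature(logits, 1.0)
p_low = softmax_with_temperature(logits, 0.2)

print([round(p, 3) for p in p_default])
print([round(p, 3) for p in p_low])
```

Raising the temperature, or enabling the commented-out `top_k`, `top_p`, and `repetition_penalty` settings, would be the natural knobs to try for more varied answers.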




---



## Reference
- **Hugging Face PEFT documentation** (https://huggingface.co/docs/peft/index)
- Meta AI: Llama 2: open source, free for research and commercial use ([https://ai.meta.com/resources/models-and-libraries/llama/](https://ai.meta.com/resources/models-and-libraries/llama/))
- Meta Llama2 Huggingface model: ([https://huggingface.co/meta-llama](https://huggingface.co/meta-llama))



**Github repository**

- [github] Parameter-Efficient Fine-Tuning (PEFT) ([https://github.com/huggingface/peft](https://github.com/huggingface/peft))
- [github] TRL - Transformer Reinforcement Learning ([https://github.com/lvwerra/trl](https://github.com/lvwerra/trl))
- [github] bitsandbytes ([https://github.com/TimDettmers/bitsandbytes](https://github.com/TimDettmers/bitsandbytes))
- [github] Meta Llama 2 ([https://github.com/facebookresearch/llama/tree/main](https://github.com/facebookresearch/llama/tree/main))