# PEFT: Soft Prompts

In [1]:
from google.colab import userdata
HF_TOKEN = userdata.get('HF_TOKEN')

In [2]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"

### 1. Load the tokenizer and add special tokens

In [3]:
from transformers import AutoTokenizer, AutoModelForCausalLM

# Special tokens used as emotion tags for prompt tuning
emotion_tokens = ['<happy>', '<sad>']

tokenizer = AutoTokenizer.from_pretrained('gpt2', token=HF_TOKEN)
tokenizer.add_special_tokens({"additional_special_tokens": emotion_tokens})

2

In [4]:
train_data = [('<happy> Once upon a time, there was a dragon who,',
     'The dragon breathed colorful fireworks that lit up the sky.'),
    ('<happy> In a magical kingdom, children played in the meadow,',
     'Their laughter echoed through the valley, brighter than the sunshine.'),
    ('<happy> A small puppy ran across the garden,',
     'Chasing butterflies with endless joy and energy.'),
    ('<happy> At the festival, lanterns floated into the night sky,',
     'People held hands and sang together with warm hearts.'),
    ('<happy> On her birthday morning, she opened the gift,',
     'And her eyes sparkled with pure delight and surprise.'),
    ('<happy> By the beach, the waves danced to the rhythm of the wind,',
     'Children built castles while parents cheered with smiles.'),
    ('<happy> In the classroom, the teacher announced the good news,',
     'And the students clapped and cheered with bright eyes.'),
    ('<happy> The farmer harvested the golden wheat,',
     'His family celebrated with a hearty feast.'),
    ('<happy> Under the cherry blossoms, two friends laughed together,',
     'Their voices mingled with the gentle spring breeze.'),
    ('<happy> On the mountain peak, the climbers raised their flag,',
     'Shouting with pride as the sun lit their faces.'),
    ('<happy> In the quiet library, she found her favorite book,',
     'Her heart danced as stories opened new worlds.'),
    ('<happy> The orchestra played a lively tune,',
     'The audience swayed, their hearts full of rhythm.'),
    ('<happy> In the park, balloons floated high,',
     'Children tugged their strings and laughed in delight.'),
    ('<happy> The baker opened the oven door,',
     'And the smell of fresh bread filled the whole street.'),
    ('<happy> On the playground, a child tried the swing,',
     'He soared higher and shouted with joy.'),
    ('<happy> During the picnic, the family shared stories,',
     'The meadow rang with laughter and love.'),
    ('<happy> In the hospital, the baby cried for the first time,',
     'The parents smiled with tears of happiness.'),
    ('<happy> The artist painted with bold colors,',
     'Her canvas bloomed like a garden in spring.'),
    ('<happy> The cat curled up on her lap,',
     'And she felt warmth and peace in her heart.'),
    ('<happy> At sunrise, the village bells rang,',
     'Welcoming a day filled with hope and cheer.'),
    ('<sad> In a dark forest, a lonely knight',
     'The knight knelt by the withered tree, tears falling on his rusted armor.'),
    ('<sad> The old house stood silent on the hill,',
     'Its windows broken, and memories fading into the dust.'),
    ('<sad> Under the rainy sky, a child searched for her lost toy,',
     'Her sobs blended with the sound of raindrops on the ground.'),
    ('<sad> On the empty battlefield, banners lay torn,',
     'No voices remained, only the whisper of the cold wind.'),
    ('<sad> At the station, he waved goodbye,',
     'Watching the train disappear with a heavy heart he could not bear.'),
    ('<sad> She opened the letter slowly,',
     'And her tears blurred the words she had long feared to read.'),
    ('<sad> In the hospital room, silence grew heavy,',
     'The monitor beeped softly as hands gripped tightly.'),
    ('<sad> A wilted flower lay on the grave,',
     'The visitor whispered a name that no one else remembered.'),
    ('<sad> On the bench, an old man stared at the sunset,',
     'His eyes reflected the loneliness of time.'),
    ('<sad> The rain soaked her umbrella,',
     'And her heart felt heavier than the storm above.'),
    ('<sad> In the ruined village, the wind carried ashes,',
     'Children’s laughter had long vanished into silence.'),
    ('<sad> The violinist played a sorrowful tune,',
     'And tears rolled down the cheeks of the listeners.'),
    ('<sad> The bird sang its last song,',
     'Then fell silent on the cold branch.'),
    ('<sad> At the empty table, plates remained untouched,',
     'The family photo on the wall was the only smile left.'),
    ('<sad> The clock ticked in the silent room,',
     'Each second echoed with memories of loss.'),
    ('<sad> A broken toy lay in the corner,',
     'Once loved, now forgotten under the dust.'),
    ('<sad> The soldier folded the flag slowly,',
     'Hands trembling as he returned it to the grieving mother.'),
    ('<sad> In the theater, the lights went out,',
     'But no applause followed, only silence.'),
    ('<sad> She walked along the empty shore,',
     'Waves erased the footprints of someone who never returned.'),
    ('<sad> Beneath the withered tree, he closed his eyes,',
     'Dreaming of voices he would never hear again.')]

### 2. Load the model and prepare for tuning

In [5]:
model = AutoModelForCausalLM.from_pretrained('gpt2', token=HF_TOKEN).to(device)

In [6]:
model.resize_token_embeddings(len(tokenizer))

The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`


Embedding(50259, 768)

In [7]:
from peft import PromptTuningConfig, get_peft_model

peft_config = PromptTuningConfig(
    task_type='CAUSAL_LM',
    num_virtual_tokens=10,
    token_dim=model.config.hidden_size
)

peft_model = get_peft_model(model, peft_config)
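
As a rough check on what this configuration trains, the arithmetic below counts the soft prompt parameters. It is a minimal sketch, assuming PEFT's usual behaviour of freezing the base model so that only the virtual token embeddings are updated; the variable names are illustrative and the values are copied from the cells above.

```python
# Values taken from the notebook cells above.
num_virtual_tokens = 10  # PromptTuningConfig(num_virtual_tokens=10)
token_dim = 768          # GPT-2 hidden size, as shown in Embedding(50259, 768)

# Prompt tuning learns one embedding vector per virtual token.
trainable_params = num_virtual_tokens * token_dim
print(trainable_params)  # 7680
```

Calling `peft_model.print_trainable_parameters()` should report the actual count for comparison. Under the same assumption, the two embedding rows added for `<happy>` and `<sad>` belong to the frozen base model and are not updated during training.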

In [8]:
# Frozen base-model weights receive no gradients, so only the soft prompt is updated
optimizer = torch.optim.AdamW(peft_model.parameters(), lr=2e-5)

In [9]:
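peft_model.train()
for epoch in range(30):
    for prompt, continuation in train_data:
        # Tokenize the prompt and its continuation separately; the leading space
        # makes the continuation start as a new word instead of running into the prompt
        inputs = tokenizer(prompt, return_tensors='pt').to(device)
        labels = tokenizer(' ' + continuation, return_tensors='pt').input_ids.to(device)

        # Train on the concatenated sequence with the standard causal LM loss
        full_inputs = torch.cat([inputs.input_ids, labels], dim=1)
        outputs = peft_model(full_inputs, labels=full_inputs)

        loss = outputs.loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()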

`loss_type=None` was set in the config but it is unrecognized. Using the default loss: `ForCausalLMLoss`.
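
The training loop above passes the full sequence as labels, so the loss also covers the prompt tokens. A common alternative, which this notebook does not use, is to mask the prompt positions with `-100` so that only the continuation contributes to the loss. The sketch below shows the label construction on plain Python lists; `build_masked_labels` is a hypothetical helper and the token ids are made up for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch cross-entropy skips by default

def build_masked_labels(prompt_ids, continuation_ids, ignore_index=IGNORE_INDEX):
    """Return (input_ids, labels) with prompt positions excluded from the loss."""
    input_ids = list(prompt_ids) + list(continuation_ids)
    labels = [ignore_index] * len(prompt_ids) + list(continuation_ids)
    return input_ids, labels

# Toy token ids, made up for illustration.
input_ids, labels = build_masked_labels([50257, 4874, 2402], [464, 10441])
print(input_ids)  # [50257, 4874, 2402, 464, 10441]
print(labels)     # [-100, -100, -100, 464, 10441]
```

To use this in the loop, the two lists would be wrapped as tensors of shape `(1, seq_len)` and passed as `input_ids` and `labels`. Whether masking helps on a dataset this small has not been tested here.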


### 3. Model inference

In [10]:
def generate_story(emotion, prompt):
    inputs = tokenizer(f'{emotion} {prompt}', return_tensors='pt').to(device)

    output = peft_model.generate(
        **inputs,
        max_new_tokens=50,
        temperature=0.9,
        top_k=40,
        repetition_penalty=1.5,
        do_sample=True,
    )

    return tokenizer.decode(output[0], skip_special_tokens=False)
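
To make the `temperature` and `top_k` arguments more concrete, here is a minimal pure-Python sketch of temperature scaling followed by top-k filtering for a single decoding step. It is a simplified illustration, not the Transformers implementation: `sample_top_k` is a hypothetical helper, the logits are toy values, and `repetition_penalty` is left out.

```python
import math
import random

def sample_top_k(logits, temperature=0.9, top_k=40, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    # Keep only the k highest-scoring token ids.
    k = min(top_k, len(scaled))
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax over the surviving logits (subtract the max for numerical stability).
    m = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - m) for i in kept]
    total = sum(weights)
    probs = [w / total for w in weights]
    token_id = rng.choices(kept, weights=probs, k=1)[0]
    return token_id, dict(zip(kept, probs))

# Toy logits over a five-token vocabulary.
token_id, probs = sample_top_k([2.0, 1.0, 0.5, -1.0, -3.0], top_k=3)
print(token_id, probs)
```

With `top_k=3`, only token ids 0, 1, and 2 can be sampled; the remaining ids are excluded before the probabilities are renormalized.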

In [14]:
generate_story('<sad>', 'and')

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


'<sad> and in this people.\n is a " as was to the right, (the other) for that\'s head will be out of his/to with an on both ends can I don\'t do it all over or more than what he might have made'