[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tanikina/low-resource-nlp-lab/blob/main/notebooks/PEFT_Tutorial.ipynb)

## 🤗 PEFT
PEFT stands for Parameter-Efficient Fine-Tuning. The [PEFT library](https://github.com/huggingface/peft) supports several methods for adapting pre-trained language models (PLMs) that fine-tune only a small number of parameters instead of updating all of the model's parameters, which reduces computational and storage costs. For example, `PEFT` supports prompt tuning, LoRA and IA3. It is also integrated with the `Transformers` and `Accelerate` libraries to support distributed training and inference for very large models.

You can find additional tutorials and examples using PEFT here: [https://huggingface.co/docs/peft/index](https://huggingface.co/docs/peft/index)

In [None]:
!pip install transformers[torch]
!pip install datasets
!pip install peft
!pip install evaluate

In [1]:
import datasets
from transformers import AutoTokenizer

import torch
from torch import nn

import matplotlib.pyplot as plt

### 🗃️ Dataset Preparation
The **Robot-Assisted Disaster Response** dataset consists of conversations recorded during training sessions in the emergency response domain. The conversations typically take place between several operators controlling the robots, a team leader and a mission commander. Each dialogue turn is annotated with one of the following intent labels (the examples are in German, each followed by an English translation):

| label | meaning | train examples | share of train set | example |
| --- | --- | --- | --- | --- |
| 0 | disconfirm | 35 | 1.3% | `Ist negativ, noch nicht.` |
| | | | | Is negative, not yet. |
| 1 | order | 216 | 8.3% | `Für Sie Erkundungsauftrag: Gesamtüberblick über die Einsatzstelle. Kommen.` |
| | | | | For you, reconnaissance assignment: overall overview of the site. Come in. |
| 2 | info_provide | 979 | 37.5% | `Ich verlasse das Erdgeschoss und gehe ins erste Obergeschoss.` |
| | | | | I leave the ground floor and go to the first floor. |
| 3 | info_request | 238 | 9.1% | `Frage: Erkundungsergebnis aus der östlichen Seite des Gebäudes, kommen.` |
| | | | | Question: Exploration results from the eastern side of the building, come in. |
| 4 | call | 487 | 18.7% | `RobLW an Zugführer, kommen.` |
| | | | | RobLW to platoon commander, come in. |
| 5 | call_response | 370 | 14.2% | `Ja, hier ist Zugführer, kommen.` |
| | | | | Yes, here is the platoon commander, come in. |
| 6 | other | 43 | 1.7% | `Einen Augenblick, ich melde mich gleich.` |
| | | | | Just a moment, I will let you know soon. |
| 7 | confirm | 242 | 9.3% | `Ein Lagebild von oben, komplette Lage, und ein Lagebild zwischen den beiden Türen, verstanden.` |
| | | | | A picture from above, complete setting, and a picture between both doors, understood. |
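
The table shows a strongly imbalanced label distribution. A small sketch that uses only the counts listed above (the variable names are illustrative) makes the imbalance explicit:

```python
# Training-set counts per intent, copied from the table above
train_counts = {
    "disconfirm": 35,
    "order": 216,
    "info_provide": 979,
    "info_request": 238,
    "call": 487,
    "call_response": 370,
    "other": 43,
    "confirm": 242,
}

total = sum(train_counts.values())
shares = {label: count / total for label, count in train_counts.items()}

# Print the labels from most to least frequent
for label, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:<14} {train_counts[label]:>4}  {share:.1%}")

majority_label = max(train_counts, key=train_counts.get)
# How many times more frequent the largest class is than the smallest one
imbalance_ratio = max(train_counts.values()) / min(train_counts.values())
print(f"total: {total}, majority: {majority_label}, max/min ratio: {imbalance_ratio:.1f}")
```

With a distribution this skewed, accuracy alone may hide poor performance on rare intents such as `disconfirm` and `other`, so a per-class or macro-averaged metric is probably worth reporting as well.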

In [2]:
device = "cuda" if torch.cuda.is_available() else "cpu"

# Defining the task and hyperparameters
task = "intent_classification"
model_name = "xlm-roberta-base"
batch_size = 16
num_epochs = 20
encode_prev_turn = False

In [3]:
# Combine the speaker information with the input text
def add_speaker(example):
    example["text"] = example["speaker"] + " - " + example["text"]
    return example
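
To see what `add_speaker` does to a single record, here is a minimal sketch on a hand-written sample. The sample dictionary is illustrative and only contains the two fields the function reads; the function is repeated so that the snippet runs on its own.

```python
# Same function as in the cell above, repeated so that the example is self-contained
def add_speaker(example):
    example["text"] = example["speaker"] + " - " + example["text"]
    return example

# Illustrative sample with the two fields used by the function
sample = {"speaker": "TL", "text": "Ja, verstanden."}

# Pass a copy, because add_speaker modifies its argument in place
result = add_speaker(dict(sample))
print(result["text"])
```

The speaker prefix becomes part of the text that is tokenized, so the model can condition on who is talking.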

In [4]:
# Defining the tokenizer and pre-processing the data
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)

data = datasets.load_dataset("DFKI/radr_intents")
# Preparing the training data
train_task_dataset = data["train"]  # alternatively: datasets.Dataset.from_csv("radr_intents/train.csv")
train_task_dataset = train_task_dataset.map(add_speaker)
train_task_dataset = train_task_dataset.map(lambda samples: tokenizer(samples["text"]), batched=True)
train_task_dataset = train_task_dataset.rename_column("label","labels")
train_task_dataset = train_task_dataset.remove_columns(['id', 'speaker', 'text'])

# Preparing the validation data
dev_task_dataset = data["validation"]  # alternatively: datasets.Dataset.from_csv("radr_intents/dev.csv")
dev_task_dataset = dev_task_dataset.map(add_speaker)
dev_task_dataset = dev_task_dataset.map(lambda samples: tokenizer(samples["text"]), batched=True)
dev_task_dataset = dev_task_dataset.rename_column("label","labels")
dev_task_dataset = dev_task_dataset.remove_columns(['id', 'speaker', 'text'])

# Printing some examples
for sample_i, sample in enumerate(dev_task_dataset):
    if sample_i > 2:
        break
    print(sample)
    print(tokenizer.batch_decode([sample["input_ids"][:30]], skip_special_tokens=True))

{'labels': 2, 'input_ids': [0, 345, 27816, 20, 345, 27816, 1256, 23752, 38953, 13, 4, 1439, 108879, 198, 404, 186, 169846, 5, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
['UAV - UAV hat Softwareprobleme, wir versuchen es zu beheben.']
{'labels': 2, 'input_ids': [0, 20602, 20, 823, 4, 493, 43254, 5, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1]}
['TL - Ja, verstanden.']
{'labels': 1, 'input_ids': [0, 20602, 20, 40787, 111697, 4, 2964, 4077, 599, 38250, 87523, 5, 2], 'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
['TL - Bitte melden, wenn wieder einsatzbereit.']


### ⚙️ Model Preparation

In [5]:
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    TrainingArguments,
    Trainer,
)

from peft import (
    get_peft_config,
    get_peft_model,
    get_peft_model_state_dict,
    set_peft_model_state_dict,
    PeftType,
    PromptEncoderConfig,
    LoraConfig,
)

import evaluate
import torch

task = "radr_intents"
num_epochs = 20
lr = 1e-3
batch_size = 16

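Here we can choose between `PromptEncoderConfig` for P-tuning, introduced in [GPT Understands, Too (Liu et al., 2021)](https://www.semanticscholar.org/paper/GPT-Understands%2C-Too-Liu-Zheng/bc37c6bdb8f39929a58b30464f72d6aa46cddc17), and `LoraConfig` for LoRA, introduced in [LoRA: Low-Rank Adaptation of Large Language Models (Hu et al., 2021)](https://www.semanticscholar.org/paper/LoRA%3A-Low-Rank-Adaptation-of-Large-Language-Models-Hu-Shen/a8ca46b171467ceb2d7652fbfb67fe701ad86092). Run only one of the two configuration cells: if both are executed, the later one overwrites `peft_config`. The outputs shown in this notebook were produced with the P-tuning configuration (the LoRA cell was not executed).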
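
As a brief reminder of what LoRA trains (the notation follows the LoRA paper; $d$ and $k$ denote the output and input dimensions of the adapted layer): the pre-trained weight $W_0$ stays frozen and only the low-rank factors $B$ and $A$ are updated.

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad W_0 \in \mathbb{R}^{d \times k},\; B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k},\; r \ll \min(d, k)
$$

Here $r$ corresponds to `r` and $\alpha$ to `lora_alpha` in `LoraConfig`, so each adapted matrix contributes $r(d + k)$ trainable values instead of $dk$.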

In [6]:
peft_config = PromptEncoderConfig(task_type="SEQ_CLS", num_virtual_tokens=30, encoder_hidden_size=128)

In [None]:
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    bias="none",
    task_type="SEQ_CLS",
    target_modules=["key", "query", "value"],
)
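
A rough back-of-the-envelope sketch of how many adapter parameters this LoRA configuration would add. It assumes the usual `xlm-roberta-base` dimensions (hidden size 768, 12 layers), which the notebook does not state, and it counts only the low-rank matrices; the classification head, which is typically trained as well for this task type, is not included.

```python
# Assumed dimensions of xlm-roberta-base (not stated in the notebook)
hidden_size = 768
num_layers = 12

# Values taken from the LoraConfig above
r = 8
lora_alpha = 16
target_modules = ["key", "query", "value"]

# Each adapted (d_out x d_in) linear layer gets two small matrices:
# A with shape (r x d_in) and B with shape (d_out x r)
params_per_module = r * hidden_size + hidden_size * r
lora_params = params_per_module * len(target_modules) * num_layers

# The frozen weight matrices that these adapters sit next to
full_params = hidden_size * hidden_size * len(target_modules) * num_layers

# Factor by which the low-rank update is scaled
scaling = lora_alpha / r

print(f"LoRA adapter parameters:   {lora_params:,}")
print(f"Frozen weights they adapt: {full_params:,}")
print(f"Ratio: {lora_params / full_params:.2%}, scaling factor: {scaling}")
```

Increasing `r` grows the adapter linearly, so doubling the rank doubles the number of adapter parameters under these assumptions.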

In [7]:
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=8, return_dict=True)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.out_proj.bias', 'classifier.dense.weight', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


trainable params: 833,800 || all params: 278,883,600 || trainable%: 0.2989777814113128
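
The trainable parameter count reported above can be reproduced by hand for the P-tuning configuration. The sketch below assumes a hidden size of 768 for `xlm-roberta-base`, a prompt encoder made of an embedding table followed by a three-layer MLP, and a trainable classification head (dense layer plus output projection). These are assumptions about the model internals, not something the notebook states.

```python
hidden_size = 768          # assumed hidden size of xlm-roberta-base
num_labels = 8             # from the model definition above
num_virtual_tokens = 30    # from PromptEncoderConfig above
encoder_hidden_size = 128  # from PromptEncoderConfig above


def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


# Assumed prompt encoder: an embedding table followed by a three-layer MLP
prompt_embedding = num_virtual_tokens * hidden_size
prompt_mlp = (
    linear_params(hidden_size, encoder_hidden_size)
    + linear_params(encoder_hidden_size, encoder_hidden_size)
    + linear_params(encoder_hidden_size, hidden_size)
)

# Newly initialized classification head: dense layer plus output projection
classifier = linear_params(hidden_size, hidden_size) + linear_params(hidden_size, num_labels)

trainable = prompt_embedding + prompt_mlp + classifier
all_params = 278_883_600  # total reported in the output above

print(f"prompt encoder: {prompt_embedding + prompt_mlp:,}")
print(f"classifier:     {classifier:,}")
print(f"trainable:      {trainable:,} ({100 * trainable / all_params:.4f}%)")
```

Under these assumptions most of the trainable parameters belong to the classification head rather than to the prompt encoder itself.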


In [8]:
data_collator = DataCollatorWithPadding(tokenizer=tokenizer, padding="longest")
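
With `padding="longest"`, every batch is padded only up to its own longest sequence. The following pure-Python sketch mimics that behaviour for illustration. `pad_to_longest` is a hypothetical helper, not part of `Transformers`, and the padding id of 1 is an assumption about the XLM-RoBERTa vocabulary.

```python
PAD_TOKEN_ID = 1  # assumed <pad> id of the XLM-RoBERTa tokenizer


def pad_to_longest(features, pad_token_id=PAD_TOKEN_ID):
    """Pad every example to the length of the longest one in the batch."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "attention_mask": [], "labels": []}
    for f in features:
        n_pad = max_len - len(f["input_ids"])
        batch["input_ids"].append(f["input_ids"] + [pad_token_id] * n_pad)
        # Padding positions get a 0 so that attention ignores them
        batch["attention_mask"].append(f["attention_mask"] + [0] * n_pad)
        batch["labels"].append(f["labels"])
    return batch


# Two of the validation examples printed earlier in the notebook
features = [
    {"labels": 2, "input_ids": [0, 20602, 20, 823, 4, 493, 43254, 5, 2], "attention_mask": [1] * 9},
    {
        "labels": 1,
        "input_ids": [0, 20602, 20, 40787, 111697, 4, 2964, 4077, 599, 38250, 87523, 5, 2],
        "attention_mask": [1] * 13,
    },
]

batch = pad_to_longest(features)
print([len(ids) for ids in batch["input_ids"]])
print(batch["attention_mask"][0])
```

Padding per batch rather than to a fixed maximum length keeps the tensors small for short utterances like the ones in this dataset.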

### 🚀 Training

In [9]:
training_args = TrainingArguments(
    output_dir="roberta-base-peft",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # called `eval_strategy` in newer Transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    full_determinism=False,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_task_dataset,
    eval_dataset=dev_task_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
)
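
The `Trainer` above reports only the validation loss. Given the class imbalance, it may be useful to also track accuracy and macro-averaged F1. The sketch below is a plain-Python illustration of the two metrics on made-up labels; `accuracy_and_macro_f1` is a hypothetical helper, not a library function.

```python
def accuracy_and_macro_f1(y_true, y_pred, num_labels=8):
    """Accuracy and the unweighted mean of per-class F1 scores."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    f1_scores = []
    for label in range(num_labels):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        if tp + fp + fn == 0:
            continue  # label absent from both gold and predicted labels
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    return {
        "accuracy": correct / len(y_true),
        "macro_f1": sum(f1_scores) / len(f1_scores),
    }


# Toy gold labels and predictions (made up for illustration, not model output)
y_true = [2, 2, 2, 2, 4, 4, 0, 7]
y_pred = [2, 2, 2, 2, 4, 4, 2, 2]

metrics = accuracy_and_macro_f1(y_true, y_pred)
print(metrics)
```

In this toy case the accuracy looks acceptable while the macro F1 is much lower, because the two rare classes are never predicted. To use such a function during training, one could wrap it so that it takes the argmax of the logits and pass the wrapper to the `Trainer` through its `compute_metrics` argument.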

In [10]:
trainer.train()

You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 1 | No log | 1.769225 |
| 2 | No log | 1.761213 |
| 3 | No log | 1.809234 |
| 4 | 1.780500 | 1.827829 |
| 5 | 1.780500 | 1.549241 |
| 6 | 1.780500 | 1.346341 |
| 7 | 1.523500 | 1.10788 |
| 8 | 1.523500 | 1.147788 |
| 9 | 1.523500 | 1.077515 |
| 10 | 1.196300 | 1.099513 |


TrainOutput(global_step=3280, training_loss=1.2007528537657204, metrics={'train_runtime': 210.9168, 'train_samples_per_second': 247.491, 'train_steps_per_second': 15.551, 'total_flos': 885640102603776.0, 'train_loss': 1.2007528537657204, 'epoch': 20.0})
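
The `global_step` value in the training output can be checked against the dataset size. The sketch assumes a single device, no gradient accumulation, and a logging interval of 500 steps (assumed here to be the `Trainer` default, since the notebook does not set it). The training-set size is the sum of the per-label counts in the dataset table.

```python
import math

num_train_examples = 2610  # sum of the per-label counts in the dataset table
batch_size = 16
num_epochs = 20
logging_steps = 500        # assumed logging interval (not set in the notebook)

# Assumes a single device and no gradient accumulation
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs

# First epoch whose end lies at or beyond the first logging step
first_logged_epoch = math.ceil(logging_steps / steps_per_epoch)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total steps:     {total_steps}")
print(f"first epoch with a logged training loss: {first_logged_epoch}")
```

Under these assumptions the first training loss becomes available only in the fourth epoch, which would explain the `No log` entries for epochs 1 to 3.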

### ✅ Evaluation

In [13]:
id2label = {0: 'disconfirm', 1: 'order', 2: 'info_provide', 3: 'info_request', 4: 'call', 5: 'call_response', 6: 'other', 7: 'confirm'}
texts = ["UAV - UAV für Teamleader", "UGV 2 - Wir haben einen Kanister im Ergeschoss gefunden", "TL - Was ist deine aktuelle Position?"]
inputs = tokenizer(texts, truncation=True, padding="longest", return_tensors="pt")
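# Move the inputs to the device the model is on (works on CPU as well as GPU)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
model.eval()  # make sure dropout is disabled for inference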

with torch.no_grad():
    outputs = model(**inputs).logits
    print(outputs)
    labels = [id2label[label] for label in torch.argmax(outputs, dim=-1).tolist()]
    for text, label in zip(texts, labels):
        print(text, ">>>", label)

tensor([[-2.5557, -1.8112, -2.0377, -0.0248,  7.3617,  0.9375, -0.8120, -2.7155],
        [ 0.2110,  1.4479,  3.4419,  3.2218, -5.0600, -3.8530, -0.5526,  0.1037],
        [-0.8891,  1.6071,  1.1785,  5.2296, -3.2375, -3.1483, -0.1494, -1.5106]],
       device='cuda:0')
UAV - UAV für Teamleader >>> call
UGV 2 - Wir haben einen Kanister im Ergeschoss gefunden >>> info_provide
TL - Was ist deine aktuelle Position? >>> info_request
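
Raw logits are hard to read as confidence values. A minimal sketch, using only the logits printed above and the standard softmax function (the helper and variable names are illustrative), converts them into probabilities:

```python
import math

id2label = {0: "disconfirm", 1: "order", 2: "info_provide", 3: "info_request",
            4: "call", 5: "call_response", 6: "other", 7: "confirm"}

# Logits copied from the output above
logits = [
    [-2.5557, -1.8112, -2.0377, -0.0248, 7.3617, 0.9375, -0.8120, -2.7155],
    [0.2110, 1.4479, 3.4419, 3.2218, -5.0600, -3.8530, -0.5526, 0.1037],
    [-0.8891, 1.6071, 1.1785, 5.2296, -3.2375, -3.1483, -0.1494, -1.5106],
]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    shift = max(row)
    exps = [math.exp(x - shift) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


probabilities = [softmax(row) for row in logits]
predictions = []
for probs in probabilities:
    best = max(range(len(probs)), key=probs.__getitem__)
    predictions.append(id2label[best])
    print(f"{id2label[best]:<13} p={probs[best]:.3f}")
```

The second example is a much closer call than the other two: the logits for `info_provide` and `info_request` differ by only about 0.2, so the predicted label there should be treated with some caution.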
