[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tanikina/low-resource-nlp-lab/blob/main/notebooks/Adapters_Slot_Tagging_Tutorial.ipynb)

In [None]:
!pip install transformers[torch]
!pip install adapters
!pip install datasets
!pip install numpy
!pip install scikit-learn

In [2]:
import datasets
from transformers import AutoTokenizer
from adapters import AutoAdapterModel, AdapterConfig

import torch
from torch import nn
import numpy as np

from sklearn.metrics import confusion_matrix, f1_score

In [3]:
# Setting the seed for reproducibility
seed_num = 2024
torch.manual_seed(seed_num)
np.random.seed(seed_num)

### 🗃️ Dataset Preparation
The **Robot-Assisted Disaster Response** data also contain orders annotated with slots, in the following format:
<center>
<img src="images/slots_with_translations.png" alt="Slots in Disaster Response Domain" width="700"/>
</center>

For each of the slot types we have a BIO-style annotation.

In [4]:
tasks = ["unit", "task", "means", "goal", "way"]
labels = ["B", "I", "O"]
id2label = {id_: label for id_, label in enumerate(labels)}
label2id = {label: id_ for id_, label in enumerate(labels)}
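
To make the BIO scheme concrete, here is a minimal sketch that recovers slot spans from a tag sequence: `B` opens a span, `I` continues it, and `O` marks tokens outside any slot. The helper `bio_to_spans` and the placeholder tokens are illustrative assumptions and are not part of the dataset tooling.

```python
def bio_to_spans(tags):
    """Return (start, end) token index pairs (end exclusive) for each slot span."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            # A new span begins; close the previous one if it is still open
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Tolerate a span that starts with I instead of B
            if start is None:
                start = i
        else:  # "O"
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical example: the first two tokens form one slot, the last token another
tokens = ["t0", "t1", "t2", "t3"]
tags = ["B", "I", "O", "B"]
spans = bio_to_spans(tags)
print([tokens[s:e] for s, e in spans])
```

Because every slot type gets its own tag sequence, the same helper can be applied to each of the five tag columns independently.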

In [5]:
# Defining the hyperparameters
device = "cuda" if torch.cuda.is_available() else "cpu"
model_name = "bert-base-german-cased"
max_len_bio = 64
batch_size = 16
num_epochs = 12

In [6]:
# Loading training, development and test data
!wget https://raw.githubusercontent.com/tanikina/low-resource-nlp-lab/main/datasets/radr_slots/train.csv
!wget https://raw.githubusercontent.com/tanikina/low-resource-nlp-lab/main/datasets/radr_slots/dev.csv
!wget https://raw.githubusercontent.com/tanikina/low-resource-nlp-lab/main/datasets/radr_slots/test.csv

--2024-02-20 14:27:53--  https://raw.githubusercontent.com/tanikina/low-resource-nlp-lab/main/datasets/radr_slots/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.110.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 37203 (36K) [text/plain]
Saving to: ‘train.csv.1’


2024-02-20 14:27:53 (8.09 MB/s) - ‘train.csv.1’ saved [37203/37203]

--2024-02-20 14:27:53--  https://raw.githubusercontent.com/tanikina/low-resource-nlp-lab/main/datasets/radr_slots/dev.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4314 (4.2K) [text/plain]
Saving to: ‘dev.csv.1’


2024-02-20 

In [7]:
class DatasetPreprocessor:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def encode_labels(self, example):
        """Encodes the BIO labels and makes sure that they match the subtokens."""
        labels = dict()
        tag_columns = ["unit_tags", "task_tags", "means_tags", "goal_tags", "way_tags"]
        for tag_column in tag_columns:
            labels[tag_column] = []
        for tag_column in tag_columns:
            sample_tags = []
            for idx, token in enumerate(example["text"].split()):
                tokenized = self.tokenizer.tokenize(token)
                label = example[tag_column].split()[idx]
                for ti, t in enumerate(tokenized):
                    if ti!=0:
                        if label=="O":
                            sample_tags.append(label2id[label])
                        else:
                            sample_tags.append(label2id["I"])
                    else:
                        sample_tags.append(label2id[label])
            # Reserve the first and last positions for the [CLS] and [SEP] tokens
            sample_tags = [label2id["O"]] + sample_tags[:max_len_bio - 2] + [label2id["O"]]
            # Pad with "O" up to the maximum length
            sample_tags.extend([label2id["O"]] * (max_len_bio - len(sample_tags)))
            labels[tag_column].append(torch.tensor(sample_tags))
        return labels

    def encode_data(self, data):
        """Encodes the input data."""
        # padding="max_length" already pads every sequence to max_length,
        # so the deprecated pad_to_max_length flag is not needed
        return self.tokenizer([doc for doc in data["text"]], padding="max_length",
                              max_length=max_len_bio, truncation=True, add_special_tokens=True)

# Defining the tokenizer and pre-processing the data
tokenizer = AutoTokenizer.from_pretrained(model_name)
dp = DatasetPreprocessor(tokenizer)

# Preparing the training data
train_task_dataset = datasets.Dataset.from_csv("train.csv")
train_task_dataset = train_task_dataset.map(dp.encode_labels)
train_task_dataset = train_task_dataset.map(dp.encode_data, batched=True, batch_size=batch_size)

# Preparing the validation data
dev_task_dataset = datasets.Dataset.from_csv("dev.csv")
dev_task_dataset = dev_task_dataset.map(dp.encode_labels)
dev_task_dataset = dev_task_dataset.map(dp.encode_data, batched=True, batch_size=batch_size)

# Printing some examples
for sample_i, sample in enumerate(dev_task_dataset):
    if sample_i > 1:
        break
    print(sample)
    print(tokenizer.batch_decode([sample["input_ids"][:30]], skip_special_tokens=True))

# Defining the dataloaders and setting the correct format
train_task_dataset.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", \
                                                     "unit_tags", "task_tags", "means_tags", "goal_tags", "way_tags"])
dev_task_dataset.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", \
                                                     "unit_tags", "task_tags", "means_tags", "goal_tags", "way_tags"])
train_dataloader = torch.utils.data.DataLoader(train_task_dataset, shuffle=True)
evaluate_dataloader = torch.utils.data.DataLoader(dev_task_dataset)

Map:   0%|          | 0/207 [00:00<?, ? examples/s]

Map:   0%|          | 0/23 [00:00<?, ? examples/s]

{'id': 2339, 'text': 'Halt deine Position', 'unit_tags': [[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], 'task_tags': [[2, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], 'means_tags': [[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], 'goal_tags': [[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]], 'way_tags': [[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 
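
The label alignment performed by `encode_labels` can be illustrated without downloading a tokenizer. The sketch below assumes a toy function `toy_tokenize` (hypothetical, it simply cuts words into four-character pieces) in place of `tokenizer.tokenize`, and the words and labels are invented for illustration. It follows the same rules as the preprocessing code: the first subtoken keeps the word label, later subtokens of a `B` or `I` word become `I`, and `O` is used for the special tokens and the padding.

```python
def toy_tokenize(word, piece_len=4):
    """Hypothetical stand-in for a subword tokenizer: fixed-size character pieces."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


def align_labels(words, word_labels, max_len):
    """Expand word-level BIO labels to subtoken level and pad to max_len."""
    tags = []
    for word, label in zip(words, word_labels):
        for piece_index, _ in enumerate(toy_tokenize(word)):
            if piece_index == 0 or label == "O":
                tags.append(label)
            else:
                # Non-initial subtokens of a B/I word continue the span
                tags.append("I")
    # Truncate, then add "O" for the leading [CLS] and trailing [SEP] positions
    tags = ["O"] + tags[:max_len - 2] + ["O"]
    # Pad with "O" up to max_len
    tags += ["O"] * (max_len - len(tags))
    return tags


# Invented example: "entrance" is split into two pieces by the toy tokenizer
words = ["scan", "entrance", "now"]
word_labels = ["B", "I", "O"]
print(align_labels(words, word_labels, max_len=8))
```

With a real subword tokenizer the split points differ, but the resulting tag sequence always has exactly `max_len` entries, matching the padded `input_ids`.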

### 🔌 LoRA Recap
<center>
<img src="images/lora.png" alt="LoRA Adapter" width="300"/>
</center>
Illustration of the LoRA method within one Transformer layer. Trained components are colored in shades of magenta. <a href="https://docs.adapterhub.ml/methods.html#lora">https://docs.adapterhub.ml/methods.html#lora</a>
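
As a rough sketch of the idea, LoRA keeps a pretrained weight matrix `W` frozen and learns a low-rank update, so that the layer output becomes `W x + (alpha / r) * B (A x)`. The pure-Python code below follows the formulation of the original LoRA method under simplifying assumptions (tiny dimensions, plain lists instead of tensors); the helper names are hypothetical and the AdapterHub implementation may differ in details such as which matrices receive the update.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, r, alpha):
    """Compute W x + (alpha / r) * B (A x); W is frozen, A and B are trainable."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_out, d_in, r):
    """Trainable parameters of the low-rank pair: B is d_out x r, A is r x d_in."""
    return r * (d_in + d_out)


random.seed(0)
d, r, alpha = 6, 2, 4
W = [[random.gauss(0, 1) for _ in range(d)] for _ in range(d)]
A = [[random.gauss(0, 1) for _ in range(d)] for _ in range(r)]
B = [[0.0] * r for _ in range(d)]  # B starts at zero, so the update is initially zero
x = [1.0] * d

# At initialization the adapted layer reproduces the frozen layer exactly
same_at_init = lora_forward(W, A, B, x, r, alpha) == matvec(W, x)
print(same_at_init)
print(lora_param_count(768, 768, r=8), "trainable values vs", 768 * 768)
```

For a 768 × 768 weight matrix (the hidden size of BERT-base), rank 8 trains 12,288 values instead of 589,824, which is why one small adapter per slot type is cheap to train and store.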

### 🚀 Training

In [8]:
from adapters import LoRAConfig
adapter_config = LoRAConfig(r=8, alpha=16)
#adapter_config = AdapterConfig.load("pfeiffer")
model = AutoAdapterModel.from_pretrained(model_name)

Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-german-cased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [9]:
for task in tasks:
    print(">>> Current task:", task)

    # Setting the adapters
    model.add_adapter(task, config=adapter_config)
    model.add_tagging_head(task, num_labels=len(labels), id2label=id2label)
    model.set_active_adapters([[task]])
    model.train_adapter([task])

    class_weights = torch.FloatTensor([1.5, 1.5, 1.0]).to(device)

    # Defining the loss function and optimizer parameters
    loss_function = nn.CrossEntropyLoss(weight=class_weights)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [{"params": [p for n, p in model.named_parameters() if not any(nd in n for nd in no_decay)], \
                                     "weight_decay": 1e-4,}, {"params": [p for n, p in model.named_parameters() \
                                                                         if any(nd in n for nd in no_decay)], \
                                                              "weight_decay": 0.0,},]
    optimizer = torch.optim.AdamW(params=optimizer_grouped_parameters, lr=1e-3)

    model.to(device)
    
    prev_smallest_dev_loss = None
    best_epoch = None

    # Training loop
    for epoch in range(num_epochs):
        # training
        train_losses = []
        for i, batch in enumerate(train_dataloader):
            batch = {k: v.to(device) for k, v in batch.items()}
            outputs = model(batch["input_ids"], attention_mask=batch["attention_mask"], adapter_names=[task])
            predictions = torch.flatten(outputs[0], 0, 1)
            expected = torch.flatten(batch[task + "_tags"].long(), 0, 1).squeeze()

            loss = loss_function(predictions, expected)
            train_losses.append(loss.item())
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            
        if epoch % 5 == 0:
            cur_epoch_train_loss = round(sum(train_losses)/len(train_losses),3)
            print(f"Epoch: {epoch}")
            print(f"Training loss: {cur_epoch_train_loss}")

        # evaluation
        with torch.no_grad():
            predictions_list = []
            expected_list = []
            dev_losses = []
            for i, batch in enumerate(evaluate_dataloader):
                batch = {k: v.to(device) for k, v in batch.items()}
                outputs = model(batch["input_ids"], attention_mask=batch["attention_mask"], adapter_names=[task])
                predictions = torch.argmax(outputs[0], 2)
                expected = batch[task + "_tags"].float()

                mpredictions = torch.flatten(outputs[0], 0, 1)
                mexpected = torch.flatten(batch[task + "_tags"].long(), 0, 1).squeeze()
                loss = loss_function(mpredictions, mexpected)
                
                dev_losses.append(loss.item())
                predictions_list.append(predictions)
                expected_list.append(expected)
            
            cur_epoch_dev_loss = round(sum(dev_losses)/len(dev_losses),3)
            
            if epoch % 5 == 0:
                print(f"Development loss: {cur_epoch_dev_loss}")
                true_labels = torch.flatten(torch.cat(expected_list)).cpu().numpy()
                predicted_labels = torch.flatten(torch.cat(predictions_list)).cpu().numpy()
                print("Evaluation on the development data:")
                print("Confusion matrix:\n", confusion_matrix(true_labels, predicted_labels))
                print("Micro f1:", round(f1_score(true_labels, predicted_labels, average="micro"), 3))
                print("Macro f1:", round(f1_score(true_labels, predicted_labels, average="macro"), 3))
                print("Weighted f1:", round(f1_score(true_labels, predicted_labels, average="weighted"), 3))
                
            if prev_smallest_dev_loss is None or cur_epoch_dev_loss<=prev_smallest_dev_loss:
                # Saving adapter with the head
                model.save_adapter(task, task)
                best_epoch = epoch
                prev_smallest_dev_loss = cur_epoch_dev_loss

    # Report the epoch at which the adapter was last saved, not the final epoch
    print(f"Best epoch {best_epoch} with the smallest development loss {prev_smallest_dev_loss} for the task {task}")

>>> Current task: unit
Epoch: 0
Training loss: 0.075
Development loss: 0.023
Evaluation on the development data:
Confusion matrix:
 [[  13    1    2]
 [   1   17    1]
 [   5    1 1431]]
Micro f1: 0.993
Macro f1: 0.878
Weighted f1: 0.993
Epoch: 5
Training loss: 0.004
Development loss: 0.005
Evaluation on the development data:
Confusion matrix:
 [[  15    0    1]
 [   1   17    1]
 [   1    0 1436]]
Micro f1: 0.997
Macro f1: 0.951
Weighted f1: 0.997
Epoch: 10
Training loss: 0.016
Development loss: 0.026
Evaluation on the development data:
Confusion matrix:
 [[  11    0    5]
 [   0   17    2]
 [   1    0 1436]]
Micro f1: 0.995
Macro f1: 0.909
Weighted f1: 0.994
Best epoch 11 with the smallest development loss 0.002 for the task unit
>>> Current task: task
Epoch: 0
Training loss: 0.227
Development loss: 0.159
Evaluation on the development data:
Confusion matrix:
 [[  15   12    1]
 [   2  142   28]
 [   1   46 1225]]
Micro f1: 0.939
Macro f1: 0.795
Weighted f1: 0.94
Epoch: 5
Training loss:
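
The training loop weights the `B` and `I` classes by 1.5 and `O` by 1.0, which counteracts the dominance of `O` tokens visible in the confusion matrices. The following sketch shows, in plain Python, how a class-weighted cross-entropy with mean reduction can be computed: each token's loss is scaled by the weight of its gold class and the sum is divided by the total weight. The function name and the toy logits are assumptions for illustration only.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of per-token negative log-likelihoods."""
    total_loss = 0.0
    total_weight = 0.0
    for token_logits, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normalizer
        max_logit = max(token_logits)
        log_norm = max_logit + math.log(sum(math.exp(v - max_logit) for v in token_logits))
        nll = log_norm - token_logits[target]
        total_loss += class_weights[target] * nll
        total_weight += class_weights[target]
    return total_loss / total_weight


weights = [1.5, 1.5, 1.0]  # same order as the labels: B, I, O
targets = [0, 2]           # one B token and one O token

# Uniform logits: every token costs log(3), so the weights cancel out
loss_uniform = weighted_cross_entropy([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], targets, weights)

# A confident, correct B prediction: its small loss gets the larger weight
loss_mixed = weighted_cross_entropy([[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]], targets, weights)

print(round(loss_uniform, 4), round(loss_mixed, 4))
```

Errors on the rarer `B` and `I` tokens therefore contribute more to the loss than errors on `O` tokens, at the cost of slightly more false positives for slot labels.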

### ✅ Evaluation

In [10]:
# Preparing the test data
test_task_dataset = datasets.Dataset.from_csv("test.csv")
test_task_dataset = test_task_dataset.map(dp.encode_labels)
test_task_dataset = test_task_dataset.map(dp.encode_data, batched=True, batch_size=batch_size)

test_task_dataset.set_format(type="torch", columns=["input_ids", "token_type_ids", "attention_mask", \
                                                     "unit_tags", "task_tags", "means_tags", "goal_tags", "way_tags"])
test_dataloader = torch.utils.data.DataLoader(test_task_dataset)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

In [11]:
# Evaluation on the test set for each task
model = AutoAdapterModel.from_pretrained(model_name)
for task in tasks:
    print(f"Test set evaluation on the task {task}")

    task_adapter = model.load_adapter(task)
    model.active_adapters = task_adapter

    model.to(device)
    model.eval()
    predictions_list = []
    expected_list = []
    # Gradients are not needed at evaluation time
    with torch.no_grad():
        for i, batch in enumerate(test_dataloader):
            batch = {k: v.to(device) for k, v in batch.items()}
            # The adapter was saved and loaded under the task name
            outputs = model(batch["input_ids"], attention_mask=batch["attention_mask"], \
                            adapter_names=[task])
            predictions = torch.argmax(outputs[0], 2)
            expected = batch[task + "_tags"].float()
            predictions_list.append(predictions)
            expected_list.append(expected)

    true_labels = torch.flatten(torch.cat(expected_list)).cpu().numpy()
    predicted_labels = torch.flatten(torch.cat(predictions_list)).cpu().numpy()
    
    print("Confusion matrix:\n", confusion_matrix(true_labels,predicted_labels))
    print("Micro f1:", round(f1_score(true_labels, predicted_labels, average="micro"), 3))
    print("Macro f1:", round(f1_score(true_labels, predicted_labels, average="macro"), 3))
    print("Weighted f1:", round(f1_score(true_labels, predicted_labels, average="weighted"), 3))

Some weights of BertAdapterModel were not initialized from the model checkpoint at bert-base-german-cased and are newly initialized: ['heads.default.3.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Test set evaluation on the task unit
Confusion matrix:
 [[  71    0    4]
 [   0   60    1]
 [   7    4 6253]]
Micro f1: 0.998
Macro f1: 0.962
Weighted f1: 0.998
Test set evaluation on the task task
Confusion matrix:
 [[  81   24    8]
 [   5  560   33]
 [  28  207 5454]]
Micro f1: 0.952
Macro f1: 0.832
Weighted f1: 0.955
Test set evaluation on the task means
Confusion matrix:
 [[  25    1    1]
 [   0   93    4]
 [   4    8 6264]]
Micro f1: 0.997
Macro f1: 0.942
Weighted f1: 0.997
Test set evaluation on the task goal
Confusion matrix:
 [[  11    4    6]
 [   0   36   15]
 [   8   24 6296]]
Micro f1: 0.991
Macro f1: 0.724
Weighted f1: 0.991
Test set evaluation on the task way
Confusion matrix:
 [[   2    0    7]
 [   0    0   21]
 [   1    3 6366]]
Micro f1: 0.995
Macro f1: 0.444
Weighted f1: 0.993


LoRA evaluation:

```
Test set evaluation on the task unit
Confusion matrix:
 [[  71    0    4]
 [   0   60    1]
 [   7    4 6253]]
Micro f1: 0.998
Macro f1: 0.962
Weighted f1: 0.998
Test set evaluation on the task task
Confusion matrix:
 [[  81   24    8]
 [   5  560   33]
 [  28  207 5454]]
Micro f1: 0.952
Macro f1: 0.832
Weighted f1: 0.955
Test set evaluation on the task means
Confusion matrix:
 [[  25    1    1]
 [   0   93    4]
 [   4    8 6264]]
Micro f1: 0.997
Macro f1: 0.942
Weighted f1: 0.997
Test set evaluation on the task goal
Confusion matrix:
 [[  11    4    6]
 [   0   36   15]
 [   8   24 6296]]
Micro f1: 0.991
Macro f1: 0.724
Weighted f1: 0.991
Test set evaluation on the task way
Confusion matrix:
 [[   2    0    7]
 [   0    0   21]
 [   1    3 6366]]
Micro f1: 0.995
Macro f1: 0.444
Weighted f1: 0.993
```

No LoRA (Bottleneck):

```
Test set evaluation on the task unit
Confusion matrix:
 [[  67    0    8]
 [   0   59    2]
 [   8    6 6250]]
Micro f1: 0.996
Macro f1: 0.943
Weighted f1: 0.996
Test set evaluation on the task task
Confusion matrix:
 [[  78   27    8]
 [   2  579   17]
 [  23  235 5431]]
Micro f1: 0.951
Macro f1: 0.834
Weighted f1: 0.954
Test set evaluation on the task means
Confusion matrix:
 [[  25    1    1]
 [   0   90    7]
 [   7   10 6259]]
Micro f1: 0.996
Macro f1: 0.918
Weighted f1: 0.996
Test set evaluation on the task goal
Confusion matrix:
 [[  11    0   10]
 [   1   22   28]
 [  10   16 6302]]
Micro f1: 0.99
Macro f1: 0.667
Weighted f1: 0.989
Test set evaluation on the task way
Confusion matrix:
 [[   0    0    9]
 [   0    0   21]
 [   0    0 6370]]
Micro f1: 0.995
Macro f1: 0.333
Weighted f1: 0.993
```
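
The gap between micro and macro F1 in the results above follows directly from the confusion matrices. The sketch below recomputes both scores from a matrix whose rows are true classes and whose columns are predicted classes (the layout used by `confusion_matrix`); the helper name `f1_from_confusion` is an assumption for illustration. Micro F1 equals accuracy in this single-label setting and is dominated by the `O` class, whereas macro F1 averages the per-class scores and drops sharply when `B` or `I` are never predicted correctly.

```python
def f1_from_confusion(matrix):
    """Return (micro_f1, macro_f1) for a square confusion matrix."""
    n = len(matrix)
    per_class = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                              # true class c, predicted otherwise
        fp = sum(matrix[row][c] for row in range(n)) - tp     # predicted c, true class differs
        denom = 2 * tp + fp + fn
        per_class.append(2 * tp / denom if denom else 0.0)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[c][c] for c in range(n))
    micro = correct / total
    macro = sum(per_class) / n
    return micro, macro


# Confusion matrices for the task "way", copied from the results above
way_lora = [[2, 0, 7], [0, 0, 21], [1, 3, 6366]]
way_bottleneck = [[0, 0, 9], [0, 0, 21], [0, 0, 6370]]

for name, matrix in [("LoRA", way_lora), ("Bottleneck", way_bottleneck)]:
    micro, macro = f1_from_confusion(matrix)
    print(name, round(micro, 3), round(macro, 3))
```

For the bottleneck adapter on `way`, every token is predicted as `O`, so two of the three per-class scores are zero and the macro F1 is roughly one third, even though the micro F1 is 0.995.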