## Working with Transformers in the HuggingFace Ecosystem

In this laboratory exercise we will learn how to work with the HuggingFace ecosystem to adapt models to new tasks. As you will see, much of what is required is *investigation* into the inner workings of the HuggingFace abstractions. With a little work and a little trial and error, it is fairly easy to get a working adaptation pipeline up and running.

### Exercise 1: Sentiment Analysis (warm up)

In this first exercise we will start from a pre-trained BERT transformer and build up a model able to perform text sentiment analysis. Transformers are complex beasts, so we will build up our pipeline in several explorative and incremental steps.

#### Exercise 1.1: Dataset Splits and Pre-trained model
There are many sentiment analysis datasets, but we will use one of the smallest ones available: the [Cornell Rotten Tomatoes movie review dataset](cornell-movie-review-data/rotten_tomatoes), which consists of 5,331 positive and 5,331 negative processed sentences from the Rotten Tomatoes movie reviews.

**Your first task**: Load the dataset and figure out what splits are available and how to get them. Spend some time exploring the dataset to see how it is organized. Note that we will be using the [HuggingFace Datasets](https://huggingface.co/docs/datasets/en/index) library for downloading, accessing, splitting, and batching data for training and evaluation.

In [4]:
from datasets import load_dataset, get_dataset_split_names


dataset = load_dataset("rotten_tomatoes")



In [5]:
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})
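
As a quick sanity check on the printed splits, the row counts can be added up and compared with the dataset description above (5,331 positive plus 5,331 negative sentences). The snippet below is a small sketch that hard-codes the counts from the printout instead of downloading anything; the name `split_sizes` is only for illustration.

```python
# Row counts copied from the DatasetDict printout above.
split_sizes = {"train": 8530, "validation": 1066, "test": 1066}

total = sum(split_sizes.values())
expected = 5331 + 5331  # positive + negative sentences in the dataset description

for name, n in split_sizes.items():
    print(f"{name:<10} {n:>5}  ({n / total:.1%} of all rows)")

print("total rows:", total, "| matches description:", total == expected)
```

The three splits together account for every sentence in the dataset, in roughly an 80/10/10 proportion.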


In [3]:
# Look at the first sentences and their labels
print(dataset["train"][0])
print(dataset["train"][1])


{'text': 'the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .', 'label': 1}
{'text': 'the gorgeously elaborate continuation of " the lord of the rings " trilogy is so huge that a column of words cannot adequately describe co-writer/director peter jackson\'s expanded vision of j . r . r . tolkien\'s middle-earth .', 'label': 1}


#### Exercise 1.2: A Pre-trained BERT and Tokenizer

The model we will use is a *very* small BERT transformer called [Distilbert](https://huggingface.co/distilbert/distilbert-base-uncased). This model was trained (using self-supervised learning) on the same corpus as BERT, but using the full BERT base model as a *teacher*.

**Your next task**: Load the Distilbert model and corresponding tokenizer. Use the tokenizer on a few samples from the dataset and pass the tokens through the model to see what outputs are provided. I suggest you use the [`AutoModel`](https://huggingface.co/transformers/v3.0.2/model_doc/auto.html) class (and the `from_pretrained()` method) to load the model and `AutoTokenizer` to load the tokenizer.

In [3]:
from transformers import AutoTokenizer, AutoModel

# Load the pre-trained DistilBERT tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")




In [4]:
# Take an example sentence from the dataset
sample_text = dataset["train"][0]["text"]
print("Sentence:", sample_text)

# Tokenize with automatic padding and truncation
inputs = tokenizer(sample_text, return_tensors="pt", padding=True, truncation=True)
print("Token IDs:", inputs["input_ids"])


Sentence: the rock is destined to be the 21st century's new " conan " and that he's going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .
Token IDs: tensor([[  101,  1996,  2600,  2003, 16036,  2000,  2022,  1996,  7398,  2301,
          1005,  1055,  2047,  1000, 16608,  1000,  1998,  2008,  2002,  1005,
          1055,  2183,  2000,  2191,  1037, 17624,  2130,  3618,  2084,  7779,
         29058,  8625, 13327,  1010,  3744,  1011, 18856, 19513,  3158,  5477,
          4168,  2030,  7112, 16562,  2140,  1012,   102]])


In [5]:
import torch
# DistilBERT forward pass (no gradients are needed for inference)
with torch.no_grad():
    outputs = model(**inputs)

# The model returns a dict-like object that contains at least "last_hidden_state"
last_hidden_state = outputs.last_hidden_state
print("Output shape:", last_hidden_state.shape)


Output shape: torch.Size([1, 47, 768])
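
The shape `[1, 47, 768]` reads as (batch size, number of tokens, hidden size): one sentence, 47 token positions, and one 768-dimensional vector per position. The sketch below uses a random NumPy array as a stand-in for `last_hidden_state` (an assumption, so that it runs without loading the model) and shows how the vector at position 0, where the tokenizer placed the `[CLS]` token (ID 101 in the output above), would be selected.

```python
import numpy as np

# Stand-in for outputs.last_hidden_state: (batch, tokens, hidden size).
rng = np.random.default_rng(0)
last_hidden_state = rng.normal(size=(1, 47, 768))

# Position 0 along the token axis holds the [CLS] representation.
cls_embedding = last_hidden_state[:, 0, :]
print("CLS embedding shape:", cls_embedding.shape)  # (1, 768)
```

With the real model output, the same indexing applies to the PyTorch tensor returned in `outputs.last_hidden_state`.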


#### Exercise 1.3: A Stable Baseline

In this exercise I want you to:
1. Use Distilbert as a *feature extractor* to extract representations of the text strings from the dataset splits;
2. Train a classifier (your choice, but an SVM from Scikit-learn is an easy choice).
3. Evaluate performance on the validation and test splits.

These results are our *stable baseline* -- the **starting** point on which we will (hopefully) improve in the next exercise.

**Hint**: There are a number of ways to implement the feature extractor, but probably the best is to use a [feature extraction `pipeline`](https://huggingface.co/tasks/feature-extraction). You will need to interpret the output of the pipeline and extract only the `[CLS]` token from the *last* transformer layer. *How can you figure out which output that is?*

In [6]:
from transformers import pipeline

# Create a feature-extraction pipeline
extractor = pipeline(
    "feature-extraction",
    model="distilbert-base-uncased",
    tokenizer="distilbert-base-uncased",
    device=-1,  # -1 selects the CPU
)


Device set to use cpu


In [7]:
text = dataset["train"][0]["text"]
features = extractor(text)
print(len(features), len(features[0]), len(features[0][0]))


1 47 768


In [8]:
cls_vector = features[0][0]  # [CLS] embedding with 768 values


In [9]:
def extract_cls_features(dataset_split):
    features = []
    labels = []
    for example in dataset_split:
        text = example["text"]
        label = example["label"]
        output = extractor(text, truncation=True, padding=True)
        cls_vector = output[0][0]  # [CLS] embedding
        features.append(cls_vector)
        labels.append(label)
    return features, labels

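# The helper returns a (features, labels) tuple, so slicing its result with
# [:1000] would not subsample anything; the full training split is used.
X_train, y_train = extract_cls_features(dataset["train"])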
X_val, y_val = extract_cls_features(dataset["validation"])
X_test, y_test = extract_cls_features(dataset["test"])


In [10]:
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

clf = SVC(kernel="linear")
clf.fit(X_train, y_train)

val_preds = clf.predict(X_val)
test_preds = clf.predict(X_test)

print("Validation Accuracy:", accuracy_score(y_val, val_preds))
print("Test Accuracy:", accuracy_score(y_test, test_preds))


Validation Accuracy: 0.8189493433395872
Test Accuracy: 0.8067542213883677


### Results:

- **Validation accuracy**: about 81.9%  
- **Test accuracy**: about 80.7%

---

### Key points:

- Even **without any fine-tuning**, the pre-trained embeddings of a Transformer provide a **solid baseline**.
- These results would be expected to compare favorably with classic text-classification approaches (e.g. Bag-of-Words + Logistic Regression), although no such comparison was run in this notebook.
- This approach is a **lightweight and fast** alternative to full fine-tuning, well suited to prototyping or to systems with limited computational resources.
- These results serve as a **reference point** against which to compare any improvements obtained through fine-tuning or with different architectures.



-----
### Exercise 2: Fine-tuning Distilbert

In this exercise we will fine-tune the Distilbert model to (hopefully) improve sentiment analysis performance.

#### Exercise 2.1: Token Preprocessing

The first thing we need to do is *tokenize* our dataset splits. Our current datasets return a dictionary with *strings*, but we want *input token ids* (i.e. the output of the tokenizer). This is easy enough to do by hand, but the HuggingFace `Dataset` class provides convenient, efficient, and *lazy* methods. See the documentation for [`Dataset.map`](https://huggingface.co/docs/datasets/v3.5.0/en/package_reference/main_classes#datasets.Dataset.map).

**Tip**: Verify that your new datasets are returning for every element: `text`, `label`, `input_ids`, and `attention_mask`.

In [6]:
def tokenize_function(example):
    return tokenizer(
        example["text"],
        padding="max_length",  # pad every example to the same length
        truncation=True,
        max_length=128,  # maximum sequence length in tokens
    )

In [7]:
# Tokenize every element of the dataset efficiently, in batches
tokenized_datasets = dataset.map(tokenize_function, batched=True)


In [13]:
print(tokenized_datasets["train"].column_names)


['text', 'label', 'input_ids', 'attention_mask']


In [8]:
tokenized_datasets["train"][0]


{'text': 'the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',
 'label': 1,
 'input_ids': [101, 1996, 2600, 2003, 16036, 2000, 2022, 1996, 7398, 2301,
  1005, 1055, 2047, 1000, 16608, 1000, 1998, 2008, 2002, 1005,
  1055, 2183, 2000, 2191, 1037, 17624, 2130, 3618, 2084, 7779,
  29058, 8625, 13327, 1010, 3744, 1011, 18856, 19513, 3158, 5477,
  4168, 2030, 7112, 16562, 2140, 1012, 102,
  0, 0, 0, ...

[output truncated: the listing continues with padding zeros]


#### Exercise 2.2: Setting up the Model to be Fine-tuned

In this exercise we need to prepare the base Distilbert model for fine-tuning for a *sequence classification task*. This means, at the very least, appending a new, randomly-initialized classification head connected to the `[CLS]` token of the last transformer layer. Luckily, HuggingFace already provides an `AutoModel` for just this type of instantiation: [`AutoModelForSequenceClassification`](https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#automodelforsequenceclassification). You will want to instantiate one of these for fine-tuning.

In [15]:
from transformers import AutoModelForSequenceClassification

# 2 classes: positive and negative
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased",
    num_labels=2  # important for binary classification
)

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [16]:
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
model.to(device)


DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


#### Exercise 2.3: Fine-tuning Distilbert

Finally. In this exercise you should use a HuggingFace [`Trainer`](https://huggingface.co/docs/transformers/main/en/trainer) to fine-tune your model on the Rotten Tomatoes training split. Setting up the trainer will involve (at least):


1. Instantiating a [`DataCollatorWithPadding`](https://huggingface.co/docs/transformers/en/main_classes/data_collator) object which is what *actually* does your batch construction (by padding all sequences to the same length).
2. Writing an *evaluation function* that will measure the classification accuracy. This function takes a single argument which is a tuple containing `(logits, labels)` which you should use to compute classification accuracy (and maybe other metrics like F1 score, precision, recall) and return a `dict` with these metrics.  
3. Instantiating a [`TrainingArguments`](https://huggingface.co/docs/transformers/v4.51.1/en/main_classes/trainer#transformers.TrainingArguments) object using some reasonable defaults.
4. Instantiating a `Trainer` object using your train and validation splits, your data collator, and function to compute performance metrics.
5. Calling `trainer.train()`, waiting, waiting some more, and then calling `trainer.evaluate()` to see how it did.
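
To make item 1 concrete: dynamic padding means that each batch is padded only up to its own longest sequence, rather than to a global maximum length. The function below is a plain-Python sketch of that idea, not the `DataCollatorWithPadding` implementation; `pad_batch` and its arguments are hypothetical names, and the pad ID of 0 is taken from the `padding_idx=0` shown in the model printout above.

```python
def pad_batch(batch, pad_id=0):
    """Pad every sequence in `batch` to the length of the longest one."""
    longest = max(len(ids) for ids in batch)
    input_ids, attention_mask = [], []
    for ids in batch:
        n_pad = longest - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding that the model should ignore.
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Three toy sequences of different lengths (the IDs are illustrative).
batch = [[101, 2023, 102], [101, 2023, 3185, 2003, 102], [101, 102]]
padded = pad_batch(batch)
for ids, mask in zip(padded["input_ids"], padded["attention_mask"]):
    print(ids, mask)
```

Note that the tokenization cell above already pads every example to `max_length=128`, so with that setting the collator has nothing left to pad. Dropping `padding="max_length"` there would let the collator pad per batch, which usually means shorter batches.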

**Tip**: When prototyping this laboratory I discovered the HuggingFace [Evaluate library](https://huggingface.co/docs/evaluate/en/index), which provides evaluation metrics. However, I found that its layers of abstraction make it insufferably hard to get actual metrics computed. I suggest just using the Scikit-learn metrics...
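
Following the suggestion to use Scikit-learn, the evaluation function from item 2 could look roughly like the sketch below, which reports precision, recall, and F1 alongside accuracy. It assumes a binary task with class 1 as the positive label (as in Rotten Tomatoes); the toy logits at the end are made up purely to exercise the function.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def compute_metrics(eval_pred):
    """Turn a (logits, labels) tuple into a dict of classification metrics."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # most likely class per example
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, predictions, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, predictions),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy check with made-up logits for four examples and two classes.
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
toy_labels = np.array([0, 1, 1, 1])
metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)
```

Any key of the returned dict can then be named in `metric_for_best_model` when configuring the `TrainingArguments`.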

In [17]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)


In [18]:
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    acc = accuracy_score(labels, predictions)
    return {"accuracy": acc}

In [19]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy"
)


In [9]:
tokenized_datasets = tokenized_datasets.remove_columns(["text"])  # keep only label, input_ids, and attention_mask
tokenized_datasets.set_format("torch")


In [21]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,  # recent transformers releases deprecate this in favor of processing_class=tokenizer
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)




In [22]:
trainer.train()


Epoch,Training Loss,Validation Loss,Accuracy
1,0.4266,0.389991,0.827392
2,0.2525,0.39033,0.854597
3,0.1626,0.488452,0.857411


TrainOutput(global_step=1602, training_loss=0.271679728813981, metrics={'train_runtime': 843.3063, 'train_samples_per_second': 30.345, 'train_steps_per_second': 1.9, 'total_flos': 847460182901760.0, 'train_loss': 0.271679728813981, 'epoch': 3.0})
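
The `global_step=1602` in this output can be reproduced from the training settings, which is a handy way to confirm that the batch size and number of epochs were applied as intended. The arithmetic below assumes a single device and no gradient accumulation, which is my assumption about this run rather than something the notebook states.

```python
import math

num_train_examples = 8530  # rows in the training split
batch_size = 16            # per_device_train_batch_size
num_epochs = 3             # num_train_epochs

# The last batch of each epoch is smaller than the rest, so round up.
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
print(steps_per_epoch, total_steps)
```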

In [23]:
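# Only the last expression in a notebook cell is displayed, so the output
# below corresponds to the test split.
trainer.evaluate(tokenized_datasets["validation"])
trainer.evaluate(tokenized_datasets["test"])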


{'eval_loss': 0.5327609181404114,
 'eval_accuracy': 0.8424015009380863,
 'eval_runtime': 8.7536,
 'eval_samples_per_second': 121.778,
 'eval_steps_per_second': 7.654,
 'epoch': 3.0}

In [24]:
model.save_pretrained("distilbert-finetuned-rotten")
tokenizer.save_pretrained("distilbert-finetuned-rotten")


('distilbert-finetuned-rotten/tokenizer_config.json',
 'distilbert-finetuned-rotten/special_tokens_map.json',
 'distilbert-finetuned-rotten/vocab.txt',
 'distilbert-finetuned-rotten/added_tokens.json',
 'distilbert-finetuned-rotten/tokenizer.json')

### **Results:**

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1     | 0.4266        | 0.3900          | 82.74%              |
| 2     | 0.2525        | 0.3903          | 85.46%              |
| 3     | 0.1626        | 0.4885          | 85.74%              |

- **Final test accuracy**: **84.24%**
- **Test loss**: **0.5328**

---

### **Conclusions**

- The fine-tuned model **beat the baseline** obtained with the SVM on frozen DistilBERT features (80.7% test accuracy).
- **Fine-tuning DistilBERT** allowed the model to adapt its language representations to the specific task of sentiment analysis, improving test accuracy by about 3.5 percentage points (80.7% to 84.2%).
- The rise in validation loss during the third epoch (0.3903 to 0.4885), while the training loss kept falling, may indicate the onset of **overfitting**, even though validation accuracy still edged up; **2 epochs** might therefore be a better trade-off for generalization.
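
The choice of stopping point can be made explicit by checking which epoch each criterion would select. The sketch below copies the logged values from the training output above; with `metric_for_best_model="accuracy"` as configured, the checkpoint kept at the end should be the one with the highest validation accuracy, whereas validation loss points to an earlier epoch.

```python
# (epoch, training loss, validation loss, validation accuracy) from the log above.
history = [
    (1, 0.4266, 0.389991, 0.827392),
    (2, 0.2525, 0.390330, 0.854597),
    (3, 0.1626, 0.488452, 0.857411),
]

best_by_accuracy = max(history, key=lambda row: row[3])[0]
best_by_val_loss = min(history, key=lambda row: row[2])[0]

print("best epoch by validation accuracy:", best_by_accuracy)
print("best epoch by validation loss:", best_by_val_loss)
```

Epochs 1 and 2 are almost tied on validation loss while epoch 2 is clearly ahead on accuracy, which is consistent with treating two epochs as a reasonable stopping point.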



-----
### Exercise 3: Choose at Least One


#### Exercise 3.1: Efficient Fine-tuning for Sentiment Analysis (easy)

In Exercise 2 we fine-tuned the *entire* Distilbert model on Rotten Tomatoes. This is expensive, even for a small model. Find an *efficient* way to fine-tune Distilbert on the Rotten Tomatoes dataset (or some other dataset).

**Hint**: You could check out the [HuggingFace PEFT library](https://huggingface.co/docs/peft/en/index) for some state-of-the-art approaches that should "just work". How else might you go about making fine-tuning more efficient without having to change your training pipeline from above?

In [10]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "distilbert-base-uncased"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [12]:
from peft import get_peft_model, LoraConfig, TaskType

peft_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    inference_mode=False,
    r=8,  # rank of the LoRA update matrices
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["q_lin", "v_lin"]  # attention projection modules in DistilBERT
)

# Wrap the model with the LoRA adapters
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()


trainable params: 739,586 || all params: 67,694,596 || trainable%: 1.0925
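
The trainable-parameter count printed above can be reproduced by hand, which is a useful check on what is actually being trained. The arithmetic below assumes that LoRA adds two rank-`r` matrices to each targeted 768×768 projection, and that the `SEQ_CLS` task type also leaves the classification head (`pre_classifier` and `classifier`) trainable. The latter is my reading of the numbers rather than something stated in the notebook, but it reproduces the reported figure.

```python
hidden = 768      # DistilBERT hidden size (from the model printout)
n_layers = 6      # transformer blocks
r = 8             # LoRA rank from LoraConfig
n_targets = 2     # q_lin and v_lin in each block
num_labels = 2

# Each adapted Linear(768, 768) gains A (r x 768) and B (768 x r).
lora_per_module = r * hidden + hidden * r
lora_total = lora_per_module * n_targets * n_layers

# Classification head: weights plus biases of both linear layers.
pre_classifier = hidden * hidden + hidden
classifier = hidden * num_labels + num_labels

trainable = lora_total + pre_classifier + classifier
all_params = 67_694_596  # total reported by print_trainable_parameters()

print(f"LoRA adapters: {lora_total:,}")
print(f"head:          {pre_classifier + classifier:,}")
print(f"trainable:     {trainable:,} ({100 * trainable / all_params:.4f}%)")
```

Under these assumptions the adapters themselves account for only about a fifth of the trainable parameters; most of the rest is the `pre_classifier` layer.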


In [15]:
from transformers import DataCollatorWithPadding, TrainingArguments, Trainer
from sklearn.metrics import accuracy_score
import numpy as np

data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": accuracy_score(labels, predictions)}

training_args = TrainingArguments(
    output_dir="./lora-results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=5e-4,  # a higher learning rate can be used with LoRA
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    logging_dir="./lora-logs",
    load_best_model_at_end=True,
    metric_for_best_model="accuracy"
)


In [16]:
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["validation"],
    tokenizer=tokenizer,  # recent transformers releases deprecate this in favor of processing_class=tokenizer
    data_collator=data_collator,
    compute_metrics=compute_metrics
)

trainer.train()


No label_names provided for model class `PeftModelForSequenceClassification`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Epoch,Training Loss,Validation Loss,Accuracy
1,0.4175,0.407209,0.825516
2,0.3415,0.366561,0.835835
3,0.2918,0.364019,0.839587


TrainOutput(global_step=1602, training_loss=0.34584622019983263, metrics={'train_runtime': 545.7164, 'train_samples_per_second': 46.892, 'train_steps_per_second': 2.936, 'total_flos': 861995355310080.0, 'train_loss': 0.34584622019983263, 'epoch': 3.0})

In [17]:
trainer.evaluate(tokenized_datasets["test"])


{'eval_loss': 0.40298429131507874,
 'eval_accuracy': 0.8311444652908068,
 'eval_runtime': 12.6264,
 'eval_samples_per_second': 84.426,
 'eval_steps_per_second': 5.306,
 'epoch': 3.0}

### Results:

| Epoch | Training Loss | Validation Loss | Validation Accuracy |
|-------|---------------|-----------------|---------------------|
| 1     | 0.4175        | 0.4072          | 82.55%              |
| 2     | 0.3415        | 0.3666          | 83.58%              |
| 3     | 0.2918        | 0.3640          | 83.96%              |

- **Final test accuracy**: **83.11%**
- **Final test loss**: **0.4030**

---

### Conclusions:

- LoRA enabled **efficient fine-tuning** of DistilBERT, reaching a test accuracy close to that of full fine-tuning (83.1% vs. 84.2%) while training only about 1.1% of the parameters.
- Training time **dropped by roughly 35%** (about 546 s vs. 843 s for three epochs); memory usage was not measured in this notebook.
- This makes LoRA a powerful and practical approach for adapting large language models to new tasks, especially in settings with limited resources.
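
For intuition about why so few parameters suffice, a LoRA-adapted linear layer can be sketched in a few lines of NumPy. The frozen weight `W` is left untouched and a low-rank product `B @ A`, scaled by `lora_alpha / r`, is added to it. This is an illustrative sketch and not the PEFT implementation: dropout is omitted, the zero initialization of `B` follows the original LoRA formulation, and the variable names are my own.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, lora_alpha = 768, 8, 16            # sizes matching the LoraConfig above
scaling = lora_alpha / r

W = rng.normal(scale=0.02, size=(d, d))  # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, d))  # trainable, small random init
B = np.zeros((d, r))                     # trainable, starts at zero


def lora_linear(x):
    """Frozen projection plus the scaled low-rank update."""
    return x @ W.T + scaling * (x @ A.T @ B.T)


x = rng.normal(size=(4, d))              # a batch of 4 token vectors
# With B = 0 the adapted layer initially matches the frozen one.
same_at_init = np.allclose(lora_linear(x), x @ W.T)
print("identical at initialization:", same_at_init)
print("frozen:", W.size, "| trainable:", A.size + B.size)
```

Because `B` starts at zero, training begins from the behavior of the pre-trained layer, and only `A` and `B` receive gradient updates.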


### Performance comparison: SVM vs Full Fine-Tuning vs LoRA

| Approach               | Training Type                     | Accuracy (Test) | Notes |
|------------------------|-----------------------------------|-----------------|-------|
| **SVM (Exercise 1.3)** | Frozen DistilBERT + SVM           | ~80.7%          | Fast and lightweight, but less flexible |
| **Full fine-tuning**   | All DistilBERT weights updated    | ~84.2%          | Best accuracy, but the most resource-intensive |
| **LoRA (PEFT)**        | LoRA adapters (plus classification head) updated | ~83.1% | Close to full fine-tuning, at a lower training cost |

---

### In summary...

- The **SVM baseline** is an excellent starting point with minimal resources, but it does not let the model adapt deeply to the task.
- **Full fine-tuning** gives the best performance, but at a higher computational cost.
- **LoRA** is a **good compromise** between performance and efficiency: it comes close to the accuracy of full fine-tuning while updating only a **very small fraction of the parameters**.

This suggests that **parameter-efficient fine-tuning** methods such as LoRA are well suited to real applications, especially when working with limited computational resources or when models need to be adapted quickly to new tasks.



#### Exercise 3.2: Fine-tuning a CLIP Model (harder)

Use a (small) CLIP model like [`openai/clip-vit-base-patch16`](https://huggingface.co/openai/clip-vit-base-patch16) and evaluate its zero-shot performance on a small image classification dataset like ImageNette or TinyImageNet. Fine-tune (using a parameter-efficient method!) the CLIP model to see how much improvement you can squeeze out of it.

**Note**: There are several ways to adapt the CLIP model; you could fine-tune the image encoder, the text encoder, or both. Or, you could experiment with prompt learning.

**Tip**: CLIP probably already works very well on ImageNet and ImageNet-like images. For extra fun, look for an image classification dataset with different image types (e.g. *sketches*).
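
As a starting point for the zero-shot evaluation, the scoring step of CLIP-style classification can be sketched independently of any particular model: embed one text prompt per class and each image, normalize both, and pick the class whose prompt has the highest cosine similarity. The sketch below uses random NumPy vectors in place of real CLIP embeddings (an assumption made so that it runs standalone); `zero_shot_predict`, the 512-dimensional size, and the temperature value are illustrative choices, not taken from the model.

```python
import numpy as np


def zero_shot_predict(image_emb, text_emb, temperature=100.0):
    """Return predicted class indices and class probabilities per image."""
    # L2-normalize so that dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    logits = temperature * (image_emb @ text_emb.T)   # (images, classes)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
    return probs.argmax(axis=-1), probs


rng = np.random.default_rng(0)
class_prompts = ["a photo of a dog", "a photo of a truck", "a photo of a church"]
text_emb = rng.normal(size=(len(class_prompts), 512))  # stand-in text features

# Fake images: each one is a noisy copy of the text embedding of its class.
true_labels = np.array([0, 2, 1, 1])
image_emb = text_emb[true_labels] + 0.1 * rng.normal(size=(4, 512))

pred, probs = zero_shot_predict(image_emb, text_emb)
print("predicted:", pred, "| accuracy:", (pred == true_labels).mean())
```

In the actual exercise the two embedding matrices would come from the CLIP image and text encoders, and the same accuracy computation gives the zero-shot baseline to compare against after fine-tuning.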

In [4]:
# Your code here.

#### Exercise 3.3: Choose your Own Adventure

There are a *ton* of interesting and fun models on the HuggingFace hub. Pick one that does something interesting and adapt it in some way to a new task. Or, combine two or more models into something more interesting or fun. The sky's the limit.

**Note**: Reach out to me by email or on the Discord if you are unsure about anything.

In [5]:
# Your code here.