# Assignment 6: *Fine-tuning in production*

## **Due date: May 11, 2025**

### Cesar Cossio Guerrero

- Select a pre-trained model as a base and *fine-tune* it to solve an NLP task that you find relevant
  - Try to use small datasets so that the task is feasible
  - Remember the tasks available in HF `*For<task>`
- Develop a prototype of the model and put it in production
  - Include a public URL where we can see your project
  - We recommend using prototyping frameworks (*streamlit* or *gradio*) and the *free tier* of Hugging Face *Spaces*
    - https://huggingface.co/spaces/launch
    - https://huggingface.co/docs/hub/spaces-sdks-streamlit
    - https://huggingface.co/docs/hub/spaces-sdks-gradio
- Report how well the task was solved and how useful your app was
- Report the challenges and difficulties you faced during *fine-tuning* and when putting your model in production

## Extra

- Use [code carbon](https://codecarbon.io/#howitwork) to report the emissions of your app

# I decided to **fine-tune** the **BERT** model for **classification** in a **sentiment analysis** task

## Loading the **libraries**

In [35]:
import numpy as np
import pandas as pd
import torch
from transformers import BertTokenizerFast, AutoModel
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
from torch import nn
from torch.utils.data import DataLoader
from torch.nn import CrossEntropyLoss
from torch.optim import AdamW
from datasets import load_dataset

## Loading the **dataset** that we will use for **fine-tuning**

### If **loading** the **dataset** **fails**, please run the following command, which updates the datasets library

In [None]:
#pip install -U datasets   # In case loading the dataset from the datasets library fails

Collecting datasets
  Downloading datasets-3.6.0-py3-none-any.whl.metadata (19 kB)
Collecting fsspec<=2025.3.0,>=2023.1.0 (from fsspec[http]<=2025.3.0,>=2023.1.0->datasets)
  Downloading fsspec-2025.3.0-py3-none-any.whl.metadata (11 kB)
Downloading datasets-3.6.0-py3-none-any.whl (491 kB)
Downloading fsspec-2025.3.0-py3-none-any.whl (193 kB)
Installing collected packages: fsspec, datasets
  Attempting uninstall: fsspec
    Found existing installation: fsspec 2025.3.2
    Uninstalling fsspec-2025.3.2:
      Successfully uninstalled fsspec-2025.3.2
  Attempting uninstall: datasets
    Found existing installation: datasets 2.14.4
    Uninstalling datasets-2.14.4:
      Successfully uninstalled datasets-2.14.4
ERROR: pip's dependency re

In [36]:
ds = load_dataset("mteb/tweet_sentiment_extraction")

In [37]:
ds

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 27481
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 3534
    })
})

### There are 3 categories; I will use only two so that training is faster

In [38]:
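# Assumption worth checking: if the labels are 0 = negative, 1 = neutral, 2 = positive,
# this filter keeps negative and neutral; [0, 2] would keep negative and positive.
ds_filtered = ds.filter(lambda example: example['label'] in [0, 1])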

### I will take only **1%** of the whole dataset

In [39]:
from sklearn.model_selection import train_test_split

# Convert the Hugging Face dataset to a pandas DataFrame
ds_filtered_df = pd.DataFrame(ds_filtered['train'])

# Take a stratified 1% sample (train_size=0.01)
ds_sample, _ = train_test_split(
    ds_filtered_df,
    train_size=0.01,
    stratify=ds_filtered_df['label'],
    random_state=42  # For reproducibility
)

print("Shape of the original dataset:", ds_filtered_df.shape)
print("Shape of the sampled dataset:", ds_sample.shape)
print("\nValue counts in the original dataset label column:")
print(ds_filtered_df['label'].value_counts(normalize=True))
print("\nValue counts in the sampled dataset label column:")
print(ds_sample['label'].value_counts(normalize=True))

Shape of the original dataset: (18899, 2)
Shape of the sampled dataset: (188, 2)

Value counts in the original dataset label column:
label
1    0.588285
0    0.411715
Name: proportion, dtype: float64

Value counts in the sampled dataset label column:
label
1    0.590426
0    0.409574
Name: proportion, dtype: float64
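
The output above shows that the stratified sample keeps the label proportions close to those of the full set. The following is a minimal pure-Python sketch of the same idea on toy data. `stratified_sample` is a hypothetical helper written for illustration; it is not how scikit-learn implements `train_test_split(stratify=...)`, which is what the notebook actually uses.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(labels, fraction, seed=42):
    """Return sorted indices of a sample that keeps roughly the same label proportions.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)      # group example indices by their label
    chosen = []
    for indices in by_label.values():
        k = max(1, round(len(indices) * fraction))   # per-label sample size
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Toy data with a 60/40 split between labels 1 and 0
labels = [1] * 600 + [0] * 400
sample_idx = stratified_sample(labels, fraction=0.1)
counts = Counter(labels[i] for i in sample_idx)
print(counts)
```

Sampling each label group separately is what keeps the class balance stable even when the sample is as small as 1%.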


### Converting it back to its original form, a DatasetDict, to train the model

In [40]:
from datasets import Dataset, DatasetDict

# Convert the pandas DataFrame back to a Hugging Face Dataset object
ds_sample = Dataset.from_pandas(ds_sample)

# Create a DatasetDict (even though we only have the 'train' split)
dataset = DatasetDict({'train': ds_sample})

dataset

Dataset({
    features: ['text', 'label', '__index_level_0__'],
    num_rows: 188
})

In [58]:
train_df = dataset['train'].to_pandas()

In [59]:
train_df.value_counts('label')

| label | count |
|-------|-------|
| 1     | 111   |
| 0     | 77    |


### We will use only **two categories** from the dataset to improve **performance**: **positive** and **negative**

In [61]:
train_df.value_counts('label')

| label | count |
|-------|-------|
| 1     | 111   |
| 0     | 77    |


In [None]:
pad_len = 13

### Creating the training sets

In [21]:
# Load the base model and tokenizer
bert = AutoModel.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

### **Tokenizing** the text to train the model

In [41]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

Map:   0%|          | 0/188 [00:00<?, ? examples/s]
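
`padding="max_length"` pads every example to a fixed length and `truncation=True` cuts longer ones. When no `max_length` is passed, as in the cell above, the tokenizer falls back to the model's maximum input length (512 tokens for `bert-base-uncased`), which makes each training step much more expensive than padding to a short length such as the `pad_len = 13` defined earlier. The sketch below illustrates the mechanics on plain lists of token ids. `pad_or_truncate` is a hypothetical helper, not part of `transformers`; the ids are made up, and special-token handling is ignored for simplicity.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids[:max_length])   # truncation: drop anything past max_length
    mask = [1] * len(ids)                # 1 marks a real token
    n_pad = max_length - len(ids)
    # Padding: fill the remaining positions; 0 in the mask marks padding
    return ids + [pad_id] * n_pad, mask + [0] * n_pad

short = [101, 1045, 2572, 102]     # made-up ids for a 4-token input
long = list(range(101, 121))       # made-up ids for a 20-token input

ids_short, mask_short = pad_or_truncate(short, max_length=13)
ids_long, mask_long = pad_or_truncate(long, max_length=13)
print(len(ids_short), sum(mask_short))
print(len(ids_long), sum(mask_long))
```

In the notebook, passing `max_length=pad_len` to the tokenizer call would be one way to get the short, fixed length that the inference cells already use.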

### The data is split into training and validation sets. A separate test set could also be created, but I did not use one because of hardware limitations.

In [42]:
train_testvalid = tokenized_datasets['train'].train_test_split(test_size=0.2)
train_dataset = train_testvalid['train']
valid_dataset = train_testvalid['test']

In [43]:
train_dataset.shape

(150, 6)

In [44]:
train_dataloader = DataLoader(train_dataset, shuffle=True, batch_size=8)
valid_dataloader = DataLoader(valid_dataset, batch_size=8)

In [32]:
from torch.optim import AdamW

### Initializing the **BERT uncased** model

In [33]:
from transformers import BertForSequenceClassification #, AdamW

model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [45]:
from transformers import DataCollatorWithPadding

data_collator = DataCollatorWithPadding(tokenizer)

### Defining the training **parameters** for **fine-tuning**

In [46]:
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy="epoch",
    report_to=["tensorboard"],
    #learning_rate=2e-5,
    #per_device_train_batch_size=8,
    #per_device_eval_batch_size=8,
    #num_train_epochs=3,
    #weight_decay=0.01,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=valid_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
    # This is new:
    #compute_metrics=compute_metrics,
)
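
The `compute_metrics` argument is commented out above, so the trainer only reports the validation loss. The following is a minimal sketch of such a function, assuming accuracy is the metric of interest. It unpacks the logits and true labels that the `Trainer` passes in and returns a dict of named metrics. The sample arrays are made up for illustration.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Compute accuracy from the (logits, labels) pair the Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)   # class with the highest logit
    accuracy = float((predictions == labels).mean())
    return {"accuracy": accuracy}

# Made-up logits for four examples and two classes
example_logits = np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.5, 0.4]])
example_labels = np.array([0, 1, 1, 1])
metrics = compute_metrics((example_logits, example_labels))
print(metrics)
```

Passing `compute_metrics=compute_metrics` to `Trainer` would add an accuracy column next to the validation loss at each evaluation.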



### **Training** the model

In [47]:
trainer.train()

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 0.6703          |
| 2     | No log        | 0.680856        |
| 3     | No log        | 0.649012        |


TrainOutput(global_step=57, training_loss=0.5582241928368284, metrics={'train_runtime': 2331.083, 'train_samples_per_second': 0.193, 'train_steps_per_second': 0.024, 'total_flos': 118399974912000.0, 'train_loss': 0.5582241928368284, 'epoch': 3.0})
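
The numbers in `TrainOutput` can be cross-checked with a little arithmetic. The sketch assumes the `TrainingArguments` defaults of 3 epochs and a per-device batch size of 8 on a single device, since those lines are commented out in the configuration cell.

```python
import math

train_examples = 150        # from train_dataset.shape
batch_size = 8              # assumed default per_device_train_batch_size
epochs = 3                  # assumed default num_train_epochs
train_runtime = 2331.083    # seconds, from TrainOutput

steps_per_epoch = math.ceil(train_examples / batch_size)   # last batch is partial
total_steps = steps_per_epoch * epochs
samples_per_second = train_examples * epochs / train_runtime
steps_per_second = total_steps / train_runtime

print(total_steps)
print(round(samples_per_second, 3), round(steps_per_second, 3))
```

These values agree with the reported `global_step`, `train_samples_per_second` and `train_steps_per_second`, and they work out to about 41 seconds per optimizer step.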

### **Saving the model**

In [48]:
trainer.save_model("./my_bert_model")

### Here we can **test** the **model**

In [52]:
# Define the sentence you want to predict
sentence = "I am very sad"

# Tokenize the sentence
tokens_sentence = tokenizer(
    sentence,
    max_length=13,
    padding='max_length',
    truncation=True,
    return_tensors='pt'  # Return PyTorch tensors
)

# Prepare tensors for the model and move to the correct device
# Use 'cpu' if you are not using a GPU, otherwise use 'cuda'
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
input_ids = tokens_sentence['input_ids'].to(device)
attention_mask = tokens_sentence['attention_mask'].to(device)

# Get predictions using the loaded model
model.eval()  # Set model to evaluation mode
with torch.no_grad():
    preds = model(input_ids, attention_mask)

# Convert predictions (logits) to probabilities (if needed) and then to class labels
# The model outputs raw logits, so we need to find the index of the highest logit
predicted_class_index = torch.argmax(preds.logits, dim=1).cpu().numpy()[0]

# Map the predicted class index back to a meaningful label
# Assuming your model is trained for binary classification with labels 0 and 1
# You need to know what these labels represent in your specific model
sentiment_map = {0: 'negative', 1: 'positive'} # Adjust this based on your training
predicted_sentiment = sentiment_map.get(predicted_class_index, 'unknown')

print(f"The sentence '{sentence}' is classified as: {predicted_sentiment}")


The sentence 'I am very sad' is classified as: negative
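
The comments in the cell above mention converting logits to probabilities, but the code only takes the argmax. The following is a minimal sketch of that conversion with a softmax, written in plain Python so that it runs without a model. The logit values are made up; in the notebook they would come from `preds.logits`.

```python
import math

def softmax(logits):
    """Convert a list of raw logits into probabilities that sum to 1."""
    shift = max(logits)                             # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [1.2, -0.8]        # made-up logits for class 0 and class 1
probs = softmax(example_logits)
predicted_index = probs.index(max(probs))   # same class the argmax of the logits picks
print(probs, predicted_index)
```

With tensors, `torch.softmax(preds.logits, dim=1)` performs the same conversion, and the resulting probability gives a rough sense of how confident the prediction is.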


# **Loading the fine-tuned BERT model**

In [57]:
# Load the fine-tuned model
loaded_model = BertForSequenceClassification.from_pretrained("./my_bert_model")
loaded_tokenizer = BertTokenizerFast.from_pretrained("./my_bert_model")

# Move the loaded model to the appropriate device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loaded_model.to(device)

# Example usage with the loaded model (similar to the testing part)
sentence = "Today is horrible!"

# Tokenize the sentence using the loaded tokenizer
tokens_sentence = loaded_tokenizer(
    sentence,
    max_length=13,
    padding='max_length',
    truncation=True,
    return_tensors='pt'
)

# Prepare tensors for the loaded model and move to the correct device
input_ids = tokens_sentence['input_ids'].to(device)
attention_mask = tokens_sentence['attention_mask'].to(device)

# Get predictions using the loaded model
loaded_model.eval()  # Set model to evaluation mode
with torch.no_grad():
    preds = loaded_model(input_ids, attention_mask)

# Pick the class with the highest logit and map it to a label
predicted_class_index = torch.argmax(preds.logits, dim=1).cpu().numpy()[0]
sentiment_map = {0: 'negative', 1: 'positive'} # Adjust this based on your training
predicted_sentiment = sentiment_map.get(predicted_class_index, 'unknown')

print(f"Using the loaded model, the sentence '{sentence}' is classified as: {predicted_sentiment}")

Using the loaded model, the sentence 'Today is horrible!' is classified as: negative


### I found this assignment very interesting, but by far the most difficult. It was hard to find a good place to run the fine-tuning. Choosing the task and the dataset was also somewhat complicated, because I did not know very well how I could train the model. In the end I looked for a tutorial on Hugging Face.

### I think it is more complicated than it seems. Building the app was not difficult; I actually find it simple and very powerful.

### I could not train the model very well because of hardware problems, since training took a long time, but it seemed to be learning quite well.