## Training a text classifier
Models like DistilBERT are pretrained to predict masked words in a sequence of text, so we can't use these language models directly for text classification. We have two options to train a model on our Twitter dataset:
 - Feature extraction: use the hidden states as features and train a classifier on them, without modifying the pretrained model.
 - Fine-tuning: train the whole model end-to-end, which also updates the parameters of the pretrained model.

This notebook is about fine-tuning!
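In PyTorch terms, the difference between the two options comes down to which parameters receive gradient updates. The sketch below uses a tiny stand-in network (a hypothetical `body` playing the pretrained encoder and a `head` playing the new classifier, not DistilBERT itself): freezing the body leaves only the head trainable, which mirrors feature extraction, while fine-tuning keeps every parameter trainable. Training a separate classifier on extracted hidden states, as in feature extraction proper, has the same effect of leaving the pretrained weights untouched.

```python
import torch
from torch import nn


def build_toy_model(hidden_dim: int = 16, num_labels: int = 6) -> nn.Sequential:
    # Hypothetical stand-in: "body" plays the pretrained encoder,
    # "head" plays the newly added classification layer
    body = nn.Sequential(nn.Linear(32, hidden_dim), nn.ReLU())
    head = nn.Linear(hidden_dim, num_labels)
    return nn.Sequential(body, head)


def count_trainable(model: nn.Module) -> int:
    # Number of parameters the optimizer would actually update
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Feature extraction: freeze the pretrained body, train only the head
feature_model = build_toy_model()
for p in feature_model[0].parameters():
    p.requires_grad = False

# Fine-tuning: every parameter, body included, stays trainable
finetune_model = build_toy_model()

print(count_trainable(feature_model))   # head only: 16*6 + 6 = 102
print(count_trainable(finetune_model))  # body + head: (32*16 + 16) + 102 = 630
```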

### Set up the tokenizer (see previous notebook)

In [1]:
from transformers import AutoTokenizer
from datasets import load_dataset

model_ckpt = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)
emotions = load_dataset("emotion")

# Tokenize a batch of examples: padding=True pads each sequence with the padding token (id 0) up to the longest sequence in the batch, and truncation=True cuts sequences to the model's maximum context length.
def tokenize(batch):
    return tokenizer(batch["text"], padding=True, truncation=True)

emotions_encoded = emotions.map(tokenize, batched=True, batch_size=None)
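To make padding=True and truncation=True concrete, here is a plain-Python sketch of what they do to a batch of token ids. The helper `pad_and_truncate`, the token ids and `max_length=8` are hypothetical; the real tokenizer uses the model's maximum context length (512 tokens for DistilBERT) and also keeps special tokens such as [SEP] when truncating, which this sketch ignores. Because `batch_size=None` hands each whole split to `tokenize` as a single batch, every example ends up padded to the longest sequence in its split.

```python
from typing import List, Tuple


def pad_and_truncate(batch: List[List[int]], max_length: int = 8,
                     pad_id: int = 0) -> Tuple[List[List[int]], List[List[int]]]:
    # truncation=True: cut every sequence down to the maximum length
    truncated = [ids[:max_length] for ids in batch]
    # padding=True: pad to the longest sequence in *this batch*
    longest = max(len(ids) for ids in truncated)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in truncated]
    # attention_mask marks real tokens with 1 and padding with 0
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in truncated]
    return input_ids, attention_mask


# Made-up token ids for two short texts
ids, mask = pad_and_truncate([[101, 7592, 102], [101, 1045, 2293, 2023, 102]])
print(ids)   # [[101, 7592, 102, 0, 0], [101, 1045, 2293, 2023, 102]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```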

### Loading a pretrained model
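Here we use AutoModelForSequenceClassification, which puts a classification head on top of the pretrained DistilBERT body. This head has not been trained yet, which is why the warning below reports its weights as newly initialized.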

In [2]:
from transformers import AutoModelForSequenceClassification
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_labels = 6 # number of emotion labels
model = (AutoModelForSequenceClassification.from_pretrained(model_ckpt, num_labels=num_labels).to(device))

Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.weight', 'pre_classifier.weight', 'pre_classifier.bias', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
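The warning names the newly initialized weights: `pre_classifier` and `classifier`. As a rough sketch of what such a head computes (our reading of DistilBertForSequenceClassification: take the hidden state of the first token, apply a linear layer with ReLU and dropout, then project to `num_labels` logits), consider the hypothetical module below. The class name `ToyClassificationHead` and the dropout value of 0.2 are assumptions for illustration, and the random tensor stands in for real encoder output.

```python
import torch
from torch import nn


class ToyClassificationHead(nn.Module):
    """Hypothetical sketch of a sequence-classification head on top of an encoder."""

    def __init__(self, dim: int = 768, num_labels: int = 6, dropout: float = 0.2):
        super().__init__()
        self.pre_classifier = nn.Linear(dim, dim)
        self.dropout = nn.Dropout(dropout)
        self.classifier = nn.Linear(dim, num_labels)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, dim); use the first token ([CLS]) as the summary
        pooled = hidden_states[:, 0]
        pooled = torch.relu(self.pre_classifier(pooled))
        pooled = self.dropout(pooled)
        # One unnormalized score (logit) per emotion label
        return self.classifier(pooled)


head = ToyClassificationHead()
fake_hidden = torch.randn(4, 10, 768)  # random stand-in for the encoder's hidden states
logits = head(fake_hidden)
print(logits.shape)  # torch.Size([4, 6])
```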


### Defining the performance metrics
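To monitor metrics during training, we need to define a compute_metrics function for the Trainer. The Trainer passes it an EvalPrediction object holding the model's predictions (logits) and the true label_ids, and expects a dictionary that maps metric names to values.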

In [3]:
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(pred):
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    f1 = f1_score(labels, preds, average="weighted")
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc, "f1": f1}
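With average="weighted", f1_score computes one F1 score per class and averages them, weighting each class by its support (the number of true examples of that class), which accounts for the class imbalance in the emotion dataset. A NumPy-only sketch of that computation, on made-up labels and logits, is below; the helper `weighted_f1` is hypothetical and its result should agree with sklearn's f1_score(labels, preds, average="weighted").

```python
import numpy as np


def weighted_f1(labels: np.ndarray, preds: np.ndarray) -> float:
    total = 0.0
    for c in np.unique(labels):
        tp = np.sum((preds == c) & (labels == c))
        fp = np.sum((preds == c) & (labels != c))
        fn = np.sum((preds != c) & (labels == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
        # Weight each class's F1 by how many true examples it has
        total += f1 * np.sum(labels == c)
    return total / len(labels)


# Made-up logits for 4 samples and 3 classes
logits = np.array([[2.0, 0.1, 0.3], [0.2, 1.5, 0.1], [0.1, 0.3, 2.2], [0.4, 1.1, 0.9]])
labels = np.array([0, 1, 2, 2])
preds = logits.argmax(-1)  # same decoding as compute_metrics: [0, 1, 2, 1]
accuracy = (preds == labels).mean()
print(accuracy, weighted_f1(labels, preds))  # both approximately 0.75
```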

### Training the model

In [4]:
from huggingface_hub import notebook_login
notebook_login()


In [6]:
from transformers import TrainingArguments

batch_size = 64
logging_steps = len(emotions_encoded["train"]) // batch_size
model_name = "distilbert-base-uncased-finetuned-emotion"
training_args = TrainingArguments(output_dir=model_name,
                                  num_train_epochs=2,
                                  learning_rate=2e-5,
                                  per_device_train_batch_size=batch_size,
                                  per_device_eval_batch_size=batch_size,
                                  weight_decay=0.01,
                                  # evaluate at the end of every epoch (named eval_strategy in newer transformers releases)
                                  evaluation_strategy="epoch",
                                  disable_tqdm=False,
                                  logging_steps=logging_steps,
                                  push_to_hub=True,
                                  log_level="error")

In [9]:
from transformers import Trainer

trainer = Trainer(model=model,
                  args=training_args,
                  compute_metrics=compute_metrics,
                  train_dataset=emotions_encoded["train"],
                  eval_dataset=emotions_encoded["validation"],
                  tokenizer=tokenizer)
trainer.train()

Cloning https://huggingface.co/tobrun/distilbert-base-uncased-finetuned-emotion into local empty directory.


  0%|          | 0/500 [00:00<?, ?it/s]

KeyboardInterrupt: 
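The 500 steps in the progress bar follow from the training arguments. Assuming the train split of the emotion dataset holds 16,000 examples, training runs on a single device and gradient accumulation is left at its default of 1, the arithmetic is:

```python
import math

num_train_examples = 16_000  # assumed size of emotions_encoded["train"]
batch_size = 64
num_train_epochs = 2

# One optimizer step per batch; a final partial batch would still count as a step
steps_per_epoch = math.ceil(num_train_examples / batch_size)
total_steps = steps_per_epoch * num_train_epochs

# logging_steps uses floor division, so the training loss is logged about once per epoch
logging_steps = num_train_examples // batch_size
print(steps_per_epoch, total_steps, logging_steps)  # 250 500 250
```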

Looking at the logs, we can see that our model reaches an F1 score of about 0.92, which is better than the feature-based approach!

In [None]:
# trainer.predict runs the model on the validation set; .metrics holds the loss, accuracy and F1 (keys prefixed with "test_")
preds_output = trainer.predict(emotions_encoded["validation"])
preds_output.metrics

In [None]:
import numpy as np

# preds_output also contains the raw predictions (logits) for each class; np.argmax over the class axis decodes them into predicted labels.
y_preds = np.argmax(preds_output.predictions, axis=1)
y_preds
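The aggregate metrics hide which emotions get confused with which; a confusion matrix shows that. With scikit-learn this is sklearn.metrics.confusion_matrix(preds_output.label_ids, y_preds, normalize="true"). The NumPy sketch below shows the same computation on made-up labels; the helper name `confusion_matrix_normalized` is hypothetical. Rows are true labels, columns are predictions, and each row is normalized to sum to 1.

```python
import numpy as np


def confusion_matrix_normalized(y_true: np.ndarray, y_pred: np.ndarray,
                                num_classes: int) -> np.ndarray:
    cm = np.zeros((num_classes, num_classes), dtype=float)
    # Count each (true, predicted) pair: rows are true labels, columns are predictions
    np.add.at(cm, (y_true, y_pred), 1)
    # Normalize each row so it shows where the examples of one true class ended up
    row_sums = cm.sum(axis=1, keepdims=True)
    return np.divide(cm, row_sums, out=np.zeros_like(cm), where=row_sums > 0)


y_true = np.array([0, 0, 1, 1, 2, 2])  # made-up true labels
y_pred = np.array([0, 1, 1, 1, 2, 0])  # made-up predictions
cm = confusion_matrix_normalized(y_true, y_pred, num_classes=3)
print(cm)
```

Large off-diagonal entries in a row point to the classes that true class is most often mistaken for.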

### Error analysis

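Before moving on, let's dig a little deeper into the model's predictions on the validation set. A simple approach is to compute the loss for every validation sample and sort by it: the highest losses point to examples the model gets confidently wrong (or that are mislabeled), the lowest to examples it is most sure about.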

In [None]:
from torch.nn.functional import cross_entropy

def forward_pass_with_label(batch):
    # Keep only the tensors the model expects (input_ids, attention_mask) and move them to the device
    inputs = {k: v.to(device) for k, v in batch.items()
              if k in tokenizer.model_input_names}
    with torch.no_grad():
        output = model(**inputs)
        pred_label = torch.argmax(output.logits, dim=-1)
        # reduction="none" keeps one loss value per sample instead of averaging
        loss = cross_entropy(output.logits, batch["label"].to(device), reduction="none")
    # Move results back to the CPU as NumPy arrays so they can be stored as dataset columns
    return {"loss": loss.cpu().numpy(), "predicted_label": pred_label.cpu().numpy()}


# Convert the columns to PyTorch tensors, then apply the function to all validation samples with map
emotions_encoded.set_format(type="torch", columns=["input_ids", "attention_mask", "label"])
emotions_encoded["validation"] = emotions_encoded["validation"].map(
    forward_pass_with_label, batched=True, batch_size=16)

# Create a DataFrame with the text, true label, predicted label and loss
emotions_encoded.set_format(type="pandas")
cols = ["text", "label", "predicted_label", "loss"]
df_test = emotions_encoded["validation"][:][cols]

def label_int2str(label_id):
    # Map integer class ids back to their names (e.g. 0 -> "sadness")
    return emotions["train"].features["label"].int2str(label_id)

df_test["label"] = df_test["label"].apply(label_int2str)
df_test["predicted_label"] = df_test["predicted_label"].apply(label_int2str)

# Let's take a look at the 10 highest losses
print(df_test.sort_values(by="loss", ascending=False).head(10))
# -> some of these examples are clearly mislabeled

# Let's take a look at the 10 lowest losses
print(df_test.sort_values(by="loss", ascending=True).head(10))
# -> sadness is predicted with high confidence (in contrast to joy in the high-loss list)
# with this information we can make targeted improvements to our dataset
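Sorting by loss works because cross-entropy with reduction="none" gives one loss per sample, and that loss is large exactly when the model puts little probability on the true label. The NumPy sketch below uses made-up logits and a hypothetical helper `per_sample_cross_entropy` that mirrors the per-sample value torch.nn.functional.cross_entropy computes: a confident correct prediction gets a loss near 0, an undecided one about log(3), and a confident wrong one the largest loss.

```python
import numpy as np


def per_sample_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # Numerically stable log-softmax: subtract the row max before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Loss of each sample = negative log-probability of its true label
    return -log_probs[np.arange(len(labels)), labels]


logits = np.array([
    [4.0, 0.0, 0.0],  # confident and correct
    [1.0, 1.0, 1.0],  # undecided
    [0.0, 4.0, 0.0],  # confident but wrong (predicts class 1)
])
labels = np.array([0, 0, 0])  # all three samples truly belong to class 0
losses = per_sample_cross_entropy(logits, labels)
print(losses.round(3))
print(np.argsort(-losses))  # indices from highest to lowest loss: [2 1 0]
```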


### Saving and sharing the model

In [None]:
trainer.push_to_hub(commit_message="finetuned emotion classifier")

### Using the model


In [None]:
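from transformers import pipeline
import pandas as pd
import matplotlib.pyplot as plt

model_id = "tobrun/distilbert-base-uncased-finetuned-emotion"
classifier = pipeline("text-classification", model=model_id)

custom_tweet = "I saw a movie yesterday and I really liked it!"
# return_all_scores=True returns a score for every class in label-id order
# (newer transformers releases deprecate it in favor of top_k=None, which sorts the scores instead)
preds = classifier(custom_tweet, return_all_scores=True)

# Class names in label-id order, taken from the dataset
labels = emotions["train"].features["label"].names
preds_df = pd.DataFrame(preds[0])
plt.bar(labels, 100 * preds_df["score"], color="C0")
plt.title(f'"{custom_tweet}"')
plt.ylabel("Class probability (%)")
plt.show()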
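The probabilities in the plot come from a softmax over the model's six logits, which the text-classification pipeline applies for a single-label model like this one. The NumPy sketch below uses made-up logits (not real model output) to show how they become the percentages on the y-axis; the label order is the one from the emotion dataset.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtracting the max keeps exp() from overflowing; the result is mathematically unchanged
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


labels = ["sadness", "joy", "love", "anger", "fear", "surprise"]
logits = np.array([-1.2, 3.1, 0.8, -0.9, -1.5, 0.2])  # made-up logits for one tweet
probs = softmax(logits)
for label, p in zip(labels, probs):
    print(f"{label:>8}: {100 * p:5.1f}%")
```

Because softmax is monotonic, the tallest bar is always the class with the largest logit, i.e. the same label np.argmax picks.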