### Model Explainability

Explainability is an important aspect of MLOps. Understanding what your model predicts, and why, helps you detect flaws in the model's reasoning and uncover bias or fairness issues that could affect downstream use.

We will use a Python package called [transformers-interpret](https://github.com/cdpierse/transformers-interpret) to help explain our model's reasoning. It builds on [Captum](https://captum.ai/), PyTorch's model interpretability library, and lets us see how much each token contributes to the model's prediction, so we can judge whether the logic the model is using seems sound.
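
To build some intuition for what "the contribution of each token" means, here is a minimal sketch of integrated gradients, an attribution method that Captum provides and that, as far as we know, transformers-interpret relies on. It is a toy, not the library's code: the "model" is a hypothetical logistic function over a 3-dimensional input standing in for token embeddings, its gradient is written out by hand, the all-zero baseline is an assumed reference input, and the path integral is approximated with a midpoint Riemann sum.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy model: f(x) = sigmoid(w . x + b)
WEIGHTS = [1.5, -2.0, 0.5]
BIAS = 0.1

def model(x):
    return sigmoid(sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS)

def model_grad(x):
    # d/dx_i sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)) * w_i
    s = model(x)
    return [s * (1.0 - s) * w for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=200):
    """Approximate IG_i = (x_i - x'_i) * integral_0^1 df/dx_i(x' + a(x - x')) da."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each sub-interval
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(model_grad(point)):
            total[i] += g
    # Scale the averaged gradient by how far each input moved from the baseline
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]  # assumed "empty" reference input
attributions = integrated_gradients(x, baseline)
print([round(a, 3) for a in attributions])

# Completeness: the attributions sum (approximately) to f(x) - f(baseline)
print(round(sum(attributions), 4), round(model(x) - model(baseline), 4))
```

Positive attributions pushed the output up relative to the baseline and negative ones pulled it down, which is how the positive and negative token scores in the visualizations below can be read.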

First, let us load in our model.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Model ID on the Hugging Face Hub, in the form "<username>/<model name>"
model_name = "teglad/DistilRoBERTaEmotionClassifier"

# Load the model and its matching tokenizer
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
```

Now that the model is loaded, we can use the interpretability library to see how the model weighs different parts of the text when predicting the class label. Let's test it on an arbitrary sentence.

```python
from transformers_interpret import MultiLabelClassificationExplainer

# Load the multi-label classification explainer.
# For each label, it attributes that label's score back to the input tokens using
# gradient-based attribution (via Captum), showing which tokens had a positive or
# negative impact on that label's score.
cls_explainer = MultiLabelClassificationExplainer(model, tokenizer)

word_attributions = cls_explainer("Deep Learning models can be so difficult to understand, how do they even work?")

cls_explainer.visualize()
```

| Attribution Label | Prediction Score | Attribution Score | Word Importance |
|---|---|---|---|
| Sadness | 0.23 | -0.84 | `<s> Deep Learning models can be so difficult to understand , how do they even work ? </s>` |
| Joy | 0.23 | -0.0 | `<s> Deep Learning models can be so difficult to understand , how do they even work ? </s>` |
| Love | 0.01 | -1.04 | `<s> Deep Learning models can be so difficult to understand , how do they even work ? </s>` |
| Anger | 0.97 | -1.93 | `<s> Deep Learning models can be so difficult to understand , how do they even work ? </s>` |
| Fear | 0.99 | 0.34 | `<s> Deep Learning models can be so difficult to understand , how do they even work ? </s>` |
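
Notice that the prediction scores do not sum to 1: Anger (0.97) and Fear (0.99) are both close to 1 for the same sentence. That is consistent with each label being scored independently, for example with a sigmoid per label, rather than with a softmax that makes the labels compete for one unit of probability. The sketch below contrasts the two on a hypothetical vector of logits; the numbers are illustrative and are not the model's actual outputs.

```python
import math

def sigmoid_scores(logits):
    # Each label is scored on its own, so the values need not sum to 1
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax_scores(logits):
    # Labels compete for a shared probability mass that sums to 1
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for five labels (not real model outputs)
labels = ["sadness", "joy", "love", "anger", "fear"]
logits = [-1.0, 0.5, -3.0, 2.0, 3.0]

for name, sig, soft in zip(labels, sigmoid_scores(logits), softmax_scores(logits)):
    print(f"{name:8s} sigmoid={sig:.2f} softmax={soft:.2f}")
print("sums:", round(sum(sigmoid_scores(logits)), 2), round(sum(softmax_scores(logits)), 2))
```

With independent scores, a sentence can legitimately score highly for several emotions at once, so read each row of the table on its own rather than as a share of a whole.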


Now that the explainer is working, we can test it on some samples from our dataset. This helps us understand how the model reaches its predictions and make some sense of what is happening inside the black-box model.

Take a look at the examples below and feel free to try your own!
The model runs quickly on a local CPU, since inference is much faster than training.

Note: These examples were taken from the dataset. Given how the data was sampled, the surprise and possibly the fear examples were likely in the training or validation set, because those were the under-represented classes.
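
A quick way to check which classes are under-represented is to count the labels in each split. The sketch below assumes the labels are available as a plain list of strings; the `labels` list is hypothetical toy data, not the real dataset, and the 10% `threshold` is an arbitrary cut-off chosen for illustration.

```python
from collections import Counter

def class_shares(labels):
    """Return each class's fraction of the dataset, largest first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}

def under_represented(labels, threshold=0.1):
    """Classes whose share of the data falls below `threshold` (hypothetical cut-off)."""
    return sorted(label for label, share in class_shares(labels).items() if share < threshold)

# Hypothetical toy labels, for illustration only
labels = ["joy"] * 40 + ["sadness"] * 35 + ["anger"] * 12 + ["fear"] * 8 + ["surprise"] * 5
print(class_shares(labels))
print(under_represented(labels))
```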

```python
sadness = "i feel so devastated over someone i was skeptical about all along"
joy = "i feel strong confident intelligent and ready to step out into the real world"
love = "i would not have told him or even joined the company had i not had a feeling he would be supportive"
anger = "i feel very pissed annoyed and depressed at the same time about a whole lot of stuff"
fear = "i feel distraught and completely tormented every time my phone goes off i hope"
surprise = "i had a sleepless night where i kept waking up every now and then feeling dazed like where the heck am i"
```

```python
# Run the explainer with the given text
cls_explainer(surprise)

# Visualise the output.
# Keep in mind, the token
cls_explainer.visualize()
```

| Attribution Label | Prediction Score | Attribution Score | Word Importance |
|---|---|---|---|
| Sadness | 0.08 | -2.26 | `<s> i had a slee pless night where i kept waking up every now and then feeling d azed like where the heck am i </s>` |
| Joy | 0.34 | -0.14 | `<s> i had a slee pless night where i kept waking up every now and then feeling d azed like where the heck am i </s>` |
| Love | 0.27 | 1.35 | `<s> i had a slee pless night where i kept waking up every now and then feeling d azed like where the heck am i </s>` |
| Anger | 0.09 | -1.48 | `<s> i had a slee pless night where i kept waking up every now and then feeling d azed like where the heck am i </s>` |
| Fear | 0.14 | -1.24 | `<s> i had a slee pless night where i kept waking up every now and then feeling d azed like where the heck am i </s>` |
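
The word-importance column shows subword tokens rather than whole words: "sleepless" appears as "slee pless" and "dazed" as "d azed", because the tokenizer splits these words into pieces and attributions are computed per piece. If you want word-level scores, one common approach is to sum the attributions of a word's pieces. The sketch below assumes a list of `(token, score)` pairs in which, following RoBERTa's byte-level BPE convention, a token that starts a new word carries a leading `Ġ`; the exact token strings your version of transformers-interpret returns may differ, so check them first. The attribution values are hypothetical.

```python
def merge_subwords(token_scores, special_tokens=("<s>", "</s>")):
    """Sum subword attributions into word-level attributions.

    Assumes RoBERTa-style tokens, where a leading "Ġ" marks the start of a
    new word (the first word of a sequence has no marker).
    """
    words = []
    for token, score in token_scores:
        if token in special_tokens:
            continue  # drop sentence-boundary tokens
        if token.startswith("Ġ") or not words:
            words.append([token.lstrip("Ġ"), score])  # start a new word
        else:
            words[-1][0] += token  # continuation piece: append its text
            words[-1][1] += score  # and accumulate its attribution
    return [(word, score) for word, score in words]

# Hypothetical attributions, for illustration only
tokens = [("<s>", 0.0), ("i", 0.5), ("Ġhad", 0.25), ("Ġa", 0.0),
          ("Ġslee", 0.25), ("pless", 0.5), ("Ġnight", -0.25), ("</s>", 0.0)]
print(merge_subwords(tokens))
```

Summing is a natural choice for gradient-based attributions like integrated gradients, since the per-token scores add up to the change in the model's output.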


How does the model do? Are there any places it falls down?
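
One way to answer this more systematically than by eye is to compare, for each sample, the highest-scoring label with the label the sample was drawn from. The sketch assumes you have collected the per-label prediction scores for each sample into a dict keyed by the sample's true label; the `results` values below are hypothetical placeholders, not the model's real outputs.

```python
def top_label(scores):
    """Return the label with the highest score."""
    return max(scores, key=scores.get)

def find_misses(results):
    """Map each true label to (predicted label, score) whenever they disagree."""
    misses = {}
    for true_label, scores in results.items():
        predicted = top_label(scores)
        if predicted != true_label:
            misses[true_label] = (predicted, scores[predicted])
    return misses

# Hypothetical scores keyed by the label each sample was drawn from
results = {
    "joy": {"joy": 0.9, "sadness": 0.1, "surprise": 0.05},
    "surprise": {"joy": 0.6, "sadness": 0.2, "surprise": 0.3},
}
print(find_misses(results))
```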

### Strange Examples

While manually inspecting the dataset, I found some strangely labelled data points.
Have a look and see what you think!

+ i miss feeling am his only special girl hed take out on a date - Joy

+ i will never feel completely content i will always long for more - Joy

Unfortunately, conflicting samples in a dataset make it hard for the model to learn a consistent mapping from text to label. It is like being taught to answer the same question two different ways when one of them is evidently wrong! How do you figure out which is correct? Models face the same problem.
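
Conflicts like this are hard to spot by hand in a large dataset. A simple first check, sketched below, is to normalise each text and flag any text that appears with more than one label. The normalisation (lower-casing and collapsing whitespace) is an assumption made here, and the `samples` list is hypothetical. This only catches exact duplicates after normalisation; one-off mislabelled samples like the ones above still need manual review or a fuzzier comparison.

```python
from collections import defaultdict

def normalise(text):
    # Lower-case and collapse runs of whitespace (an assumed normalisation)
    return " ".join(text.lower().split())

def conflicting_labels(samples):
    """Return {normalised text: sorted labels} for texts seen with more than one label."""
    labels_by_text = defaultdict(set)
    for text, label in samples:
        labels_by_text[normalise(text)].add(label)
    return {text: sorted(labels) for text, labels in labels_by_text.items() if len(labels) > 1}

# Hypothetical (text, label) pairs, for illustration only
samples = [
    ("i feel so alone tonight", "sadness"),
    ("I feel so  alone tonight", "fear"),
    ("i feel great today", "joy"),
]
print(conflicting_labels(samples))
```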