# AIPI 590 - XAI | Assignment #05
### Explainable Techniques
### Yabei Zeng

#### Link to Colab: https://colab.research.google.com/github/yabeizeng1121/XAI/blob/main/Assignment5/Explainable_Techniques.ipynb

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/yabeizeng1121/XAI/blob/main/Assignment5/Explainable_Techniques.ipynb)


In [None]:
# Please use this to connect your GitHub repository to your Google Colab notebook
# Connects to any needed files from GitHub and Google Drive
import os

# Remove Colab default sample_data
!rm -r ./sample_data

# Clone GitHub files to colab workspace
repo_name = "XAI" # Change to your repo name
git_path = 'https://github.com/yabeizeng1121/XAI.git' #Change to your path
!git clone "{git_path}"

# Install dependencies from requirements.txt file
#!pip install -r "{os.path.join(repo_name,'requirements.txt')}" #Add if using requirements.txt

# Change working directory to location of notebook
notebook_dir = 'Assignment5'
path_to_notebook = os.path.join(repo_name,notebook_dir)
%cd "{path_to_notebook}"
%ls

## Pre-trained Black Box Model
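The pre-trained black box model I chose is `BERT` (Bidirectional Encoder Representations from Transformers). Specifically, the code below loads `distilbert/distilbert-base-uncased-finetuned-sst-2-english`, a DistilBERT checkpoint fine-tuned for sentiment classification on SST-2.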

- **Task**: Sentiment analysis on the commonly used IMDB movie review dataset (`imdb`), loaded through an NLP datasets package. The model classifies each review as positive or negative based on the sentiment expressed in the text.

## Pre-trained Black Box Model Explanations
For the black box model explanations, I used `SHAP` (SHapley Additive exPlanations), a method grounded in cooperative game theory that attributes a model's prediction to its input features. SHAP is model-agnostic, meaning it can be applied to any type of model, including complex neural networks like the BERT-based model used here for sentiment analysis. It explains an individual prediction by quantifying the impact of each input feature (a token, for text data) on the output, and these per-prediction attributions can be aggregated to give a global view of model behavior.
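To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values for a tiny, made-up scoring function over three tokens. The function `toy_score` and its numbers are hypothetical and exist only for illustration. The SHAP library does not enumerate every subset like this for long texts; it relies on approximations, so this is a picture of the underlying definition rather than of the library's implementation.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition of features."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [p for p in players if p != i]
        for size in range(len(others) + 1):
            # Weight of a coalition of this size in the Shapley formula
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy score over the tokens ["not", "good", "movie"] (indices 0, 1, 2)
def toy_score(present):
    score = 0.0
    if 1 in present:                    # "good" raises the score
        score += 2.0
    if 0 in present and 1 in present:   # "not" together with "good" lowers it
        score -= 3.0
    if 2 in present:                    # "movie" is mildly positive
        score += 0.5
    return score


phi = shapley_values(toy_score, 3)
base = toy_score(frozenset())           # score with every token absent
full = toy_score(frozenset({0, 1, 2}))  # score with every token present
print(phi, base, full)
```

The attributions sum to the difference between the full score and the base score, which is the additivity property that the SHAP text plot relies on. The interaction between "not" and "good" is split evenly between the two tokens.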


In [65]:
## downloading the necessary packages
!pip install datasets transformers shap nlp --quiet



In [66]:
import numpy as np
import scipy as sp
import scipy.special  # ensures sp.special.logit is available
import torch
import shap
import nlp
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and model

tokenizer = AutoTokenizer.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained("distilbert/distilbert-base-uncased-finetuned-sst-2-english")
model.eval()  # Set the model to evaluation mode

# Check if CUDA is available and move the model to GPU if it is
if torch.cuda.is_available():
    model.cuda()




In [67]:
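# define a prediction function
def f(x):
    # Tokenize each text to a fixed length and put the batch on the model's device (GPU or CPU)
    tv = torch.tensor([tokenizer.encode(v, padding="max_length", max_length=500, truncation=True) for v in x]).to(model.device)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(tv)[0].cpu().numpy()
    scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T  # softmax over the classes
    val = sp.special.logit(scores[:, 1])  # use one vs rest logit units
    return val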


# build an explainer using a token masker
explainer = shap.Explainer(f, tokenizer)

# explain the model's predictions on IMDB reviews
imdb_train = nlp.load_dataset("imdb")["train"]
shap_values = explainer(imdb_train[:10], fixed_context=1)

Downloading and preparing dataset imdb/plain_text (download: 80.23 MiB, generated: 127.06 MiB, post-processed: Unknown sizetotal: 207.28 MiB) to /root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/76cdbd7249ea3548c928bbf304258dab44d09cd3638d9da8d42480d1d1be3743...


We strongly recommend passing in an `attention_mask` since your input_ids may be padded. See https://huggingface.co/docs/transformers/troubleshooting#incorrect-output-when-padding-tokens-arent-masked.


Dataset imdb downloaded and prepared to /root/.cache/huggingface/datasets/imdb/plain_text/1.0.0/76cdbd7249ea3548c928bbf304258dab44d09cd3638d9da8d42480d1d1be3743. Subsequent calls will reuse this data.


Token indices sequence length is longer than the specified maximum sequence length for this model (559 > 512). Running this sequence through the model will result in indexing errors
PartitionExplainer explainer: 11it [00:27,  3.99s/it]
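The prediction function above turns raw model logits into class probabilities with a softmax and then takes the logit (log-odds) of the positive class. The following standalone sketch reproduces that transformation with NumPy only. The input values are made up, and the helper name `positive_class_logit` is hypothetical. For a two-class model, the result equals the difference between the two raw logits.

```python
import numpy as np


def positive_class_logit(logits):
    """Softmax over classes, then the log-odds of class index 1."""
    logits = np.asarray(logits, dtype=float)
    # Subtract the row-wise max before exponentiating for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted) / np.exp(shifted).sum(axis=-1, keepdims=True)
    p_pos = probs[:, 1]
    return np.log(p_pos / (1.0 - p_pos))


# Made-up logits for three inputs, in the order [negative, positive]
example_logits = np.array([[-1.0, 2.0], [0.5, 0.5], [3.0, -2.0]])
log_odds = positive_class_logit(example_logits)
print(log_odds)
```

Working in log-odds rather than probabilities keeps the explained quantity unbounded, so token contributions can add up without being squeezed near 0 or 1.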


In [68]:
shap.plots.text(shap_values[3])

## Analysis of the text plot
The SHAP text plot above visualizes how individual words in a text influence a model's prediction, specifically highlighting their contributions to sentiment analysis. In this visualization, words are color-coded: red for positive impact and blue for negative impact on the model's output. For example, words like "lovable", "impressive", "But", and "still" appear in red, indicating they positively affect the model’s prediction towards a more favorable sentiment. Conversely, the word "not" appears in blue, suggesting a negative influence, potentially diminishing the effect of nearby positive words.

This plot also includes a base value, which is what the model outputs when the entire input text is masked, and shows how each word's contribution shifts the prediction from this baseline to the final output on the right. Because the prediction function returns logit units, these quantities are log-odds of the positive class rather than probabilities. The length of each color block reflects the strength of each word's impact, providing a clear and immediate visual representation of their significance in the model's decision-making process.
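The relationship between the base value, the per-token contributions, and the final output can be sketched numerically. The numbers below are hypothetical and are not taken from the plot above. The helper names are also made up for illustration.

```python
import numpy as np


def reconstruct_output(base_value, token_contributions):
    """Model output implied by SHAP additivity: base value plus all contributions."""
    return float(base_value + np.sum(token_contributions))


def log_odds_to_probability(log_odds):
    """Convert a log-odds value back to a probability with the sigmoid function."""
    return 1.0 / (1.0 + np.exp(-log_odds))


# Hypothetical values in log-odds units
base_value = -0.4
contributions = np.array([1.2, 0.9, -0.7, 0.3])

output = reconstruct_output(base_value, contributions)
prob = log_odds_to_probability(output)
print(output, prob)
```

With a real explanation object, the same check could be run using the `base_values` and `values` attributes of a single explanation such as `shap_values[3]`, assuming a single-output model as in this notebook.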


## Reference
https://shap.readthedocs.io/en/latest/example_notebooks/api_examples/plots/text.html