## Interpreting a Text Classifier

As a data scientist at a movie streaming company, you've been tasked with finding ways to improve the platform's movie recommendation system. One of the biggest challenges is understanding how users feel about the movies they watch. Sure, you can look at the star ratings they give, but those aren't always the most informative. That's why you've decided to dig deeper and use interpretable AI to analyze movie reviews and understand the sentiment behind them.

<center><img src='https://media.tenor.com/VF5vI70hNv0AAAAC/film-izlemek.gif'></center>

You are given an NLP model trained on millions of tweets, which are not necessarily movie reviews. The model extracts features, such as the presence of certain words or phrases that might indicate a positive or negative opinion, and predicts the sentiment of a review from these features. But the model's predictions aren't enough for you. You want to understand why it makes the predictions it does. So you use interpretable AI techniques like LIME and SHAP to understand the most important factors that drive a positive or negative opinion.

With this information in hand, your aim is to create a visualization that clearly illustrates the key factors that drive positive or negative opinions about a film. This way, you and the team can easily identify which movies are likely to be well-received by users and which ones are likely to be overlooked. By using interpretable AI to understand movie reviews, your goal is to create a powerful tool that will help the company make better movie recommendations and keep users coming back for more awesome content. Good luck on this journey!

## Installation and Imports
We will use [Ferret](https://ferret.readthedocs.io/en/latest/readme.html), a Python package for benchmarking interpretability techniques on Transformers.

In [None]:
!pip install scikit-learn
!pip install -q ferret-xai



In [None]:
## The Usual Suspects
import pandas as pd
import numpy as np  
import torch

## Transformer Models
from transformers import AutoModelForSequenceClassification, AutoTokenizer
## Ferret Benchmarker
from ferret import Benchmark

## Sentiment Analysis Model
For this project, let's use a sentiment analysis model hosted on the Hugging Face 🤗 Hub. Specifically, we will use the pre-trained `cardiffnlp/twitter-xlm-roberta-base-sentiment` model, which is trained on over 190M tweets and fine-tuned for sentiment analysis. We will use the Hugging Face `transformers` library to load the pre-trained model and its tokenizer. See the documentation [here](https://huggingface.co/docs/transformers/v4.25.1/en/autoclass_tutorial#autotokenizer).

In [None]:
## Before we build our transformer, let's make sure to set up the device.
## To run this notebook on a GPU: Edit -> Notebook settings -> Hardware accelerator -> GPU
## If your GPU is working, device is "cuda"
device = "cuda" if torch.cuda.is_available() else "cpu"
device

'cuda'

In [None]:
name = "cardiffnlp/twitter-xlm-roberta-base-sentiment" 

##TODO: Build pre-trained model and tokenizer using  AutoModelForSequenceClassification, AutoTokenizer
## Make sure to load the model onto the device for gpu

model = AutoModelForSequenceClassification.from_pretrained(name).to(device)
tokenizer = AutoTokenizer.from_pretrained(name)

### Build Explainer
Using Ferret XAI, we will benchmark our pre-trained model from huggingface. See [Benchmark](https://ferret.readthedocs.io/en/latest/readme.html#visualization) documentation 

In [None]:
##TODO: Build explainer using `Benchmark` function 
explainer = Benchmark(model, tokenizer)

#### Use `score` method to predict the overall sentiment for a sample text

In [None]:
from transformers import TextClassificationPipeline
sample_text = "The movie had great narration and visuals despite a boring storyline."

##TODO: Use `score` method to obtain the class scores and print it
print(explainer.score(sample_text))

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


{'negative': 0.09874014556407928, 'neutral': 0.13105592131614685, 'positive': 0.7702038884162903}
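The three scores above sum to one, which is what a softmax over the model's three output logits would produce. Here is a minimal sketch of that step, assuming `score` works this way. The helper `softmax_scores` is hypothetical (it is not part of Ferret), and the logit values are made up for illustration.

```python
import math

def softmax_scores(logits, labels):
    """Turn raw logits into a {label: probability} dict via a numerically stable softmax."""
    shift = max(logits)  # subtracting the max avoids overflow in exp
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}

# Hypothetical logits for illustration only; they are NOT taken from the model.
labels = ["negative", "neutral", "positive"]
scores = softmax_scores([-1.0, -0.7, 1.1], labels)
print(scores)
print(max(scores, key=scores.get))
```

The label with the largest logit always ends up with the largest probability, so the predicted class can be read from either.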


### Generate Explanations using the Explainer
Notice that the overall sentiment for `sample_text` is positive, with small scores for the 'neutral' and 'negative' classes. Let us use Ferret XAI explainers to understand the model's predictions. Ferret has various built-in post-hoc explainers, which are variants of the ones we studied and used in [Week 1](https://corise.com/course/interpreting-machine-learning-models/v2/module/interpreting-an-image-classifier) for image classification models. Here, we will use the same sample movie review and generate explanations for the different sentiment classes (positive, negative and neutral).

#### Generate explanations for positive class. 



In [None]:
## TODO: Generate explanation for positive class and show the explanations in a table
## Hint: use the `target` argument to specify the class as an integer. Note the order of the three classes in the score above

explain_posclass = explainer.explain(text=sample_text, target=2)
explainer.show_table(explain_posclass)


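| Token | ▁The | ▁movie | ▁had | ▁great | ▁narra | tion | ▁and | ▁visual | s | ▁de | spite | ▁a | ▁bor | ing | ▁story | line | . |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Partition SHAP | 0.01 | 0.07 | -0.02 | 0.29 | 0.08 | 0.02 | 0.05 | 0.06 | 0.03 | 0.04 | 0.12 | 0.01 | -0.07 | -0.08 | 0.03 | 0.02 | -0.02 |
| LIME | -0.06 | 0.07 | -0.05 | 0.22 | 0.01 | 0.0 | 0.06 | 0.09 | 0.04 | 0.08 | 0.06 | -0.02 | -0.04 | -0.05 | 0.02 | 0.06 | 0.07 |
| Gradient | 0.03 | 0.07 | 0.05 | 0.07 | 0.06 | 0.03 | 0.03 | 0.08 | 0.02 | 0.03 | 0.12 | 0.02 | 0.13 | 0.04 | 0.06 | 0.05 | 0.02 |
| Gradient (x Input) | -0.06 | -0.1 | 0.0 | -0.0 | -0.01 | -0.03 | -0.03 | 0.07 | 0.04 | 0.05 | -0.04 | 0.01 | 0.19 | 0.06 | -0.03 | -0.1 | 0.03 |
| Integrated Gradient | -0.03 | 0.01 | -0.13 | -0.15 | -0.1 | 0.01 | 0.0 | -0.02 | 0.02 | -0.07 | 0.0 | -0.04 | -0.03 | 0.0 | -0.02 | 0.03 | -0.01 |
| Integrated Gradient (x Input) | -0.03 | 0.08 | 0.12 | 0.19 | 0.06 | 0.04 | 0.06 | 0.05 | 0.0 | -0.0 | -0.09 | 0.05 | -0.03 | 0.01 | 0.03 | 0.04 | 0.12 |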


#### Generate explanations for negative class. 

In [None]:
## TODO: Generate explanation for negative class and show the explanations in a table

explain_negclass = explainer.explain(text=sample_text, target=0)
explainer.show_table(explain_negclass)


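| Token | ▁The | ▁movie | ▁had | ▁great | ▁narra | tion | ▁and | ▁visual | s | ▁de | spite | ▁a | ▁bor | ing | ▁story | line | . |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Partition SHAP | -0.0 | -0.04 | 0.01 | -0.19 | -0.06 | 0.0 | -0.04 | -0.04 | -0.0 | -0.06 | -0.2 | -0.01 | 0.12 | 0.15 | -0.01 | 0.02 | 0.03 |
| LIME | -0.01 | -0.04 | 0.02 | -0.26 | -0.07 | 0.02 | -0.03 | -0.08 | -0.03 | -0.1 | -0.07 | 0.0 | 0.14 | 0.05 | -0.0 | 0.02 | -0.08 |
| Gradient | 0.03 | 0.06 | 0.04 | 0.06 | 0.06 | 0.03 | 0.03 | 0.07 | 0.02 | 0.03 | 0.11 | 0.02 | 0.16 | 0.05 | 0.06 | 0.05 | 0.02 |
| Gradient (x Input) | 0.07 | 0.07 | -0.0 | 0.06 | -0.02 | 0.02 | 0.04 | -0.06 | -0.03 | -0.02 | 0.05 | -0.01 | -0.15 | -0.07 | 0.03 | 0.09 | -0.02 |
| Integrated Gradient | -0.02 | -0.04 | 0.0 | 0.02 | 0.03 | -0.05 | -0.01 | -0.0 | 0.08 | 0.12 | 0.01 | 0.12 | -0.2 | 0.03 | 0.06 | -0.03 | 0.08 |
| Integrated Gradient (x Input) | 0.06 | -0.0 | -0.04 | -0.13 | -0.07 | -0.04 | -0.08 | -0.09 | 0.04 | 0.01 | 0.15 | -0.06 | 0.11 | 0.01 | -0.02 | -0.04 | -0.05 |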
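The explanations are reported per subword piece (`▁bor` + `ing`), which makes them harder to read than per-word scores. One way to compare them at the word level is to sum the scores of the pieces that belong to the same word. The sketch below does this for the Partition SHAP rows of the two tables above. It assumes that `▁` marks the start of a new word, and the helper `merge_subwords` is hypothetical, not a Ferret function.

```python
def merge_subwords(tokens, scores, marker="▁"):
    """Sum attribution scores of subword pieces that belong to the same word."""
    words, totals = [], []
    for token, score in zip(tokens, scores):
        if token.startswith(marker) or not words:
            # A piece starting with the marker opens a new word.
            words.append(token.lstrip(marker))
            totals.append(score)
        else:
            # Any other piece (including the final ".") is glued to the previous word.
            words[-1] += token
            totals[-1] += score
    return list(zip(words, totals))

tokens = ["▁The", "▁movie", "▁had", "▁great", "▁narra", "tion", "▁and", "▁visual",
          "s", "▁de", "spite", "▁a", "▁bor", "ing", "▁story", "line", "."]
# Partition SHAP rows copied from the positive-class and negative-class tables.
shap_pos = [0.01, 0.07, -0.02, 0.29, 0.08, 0.02, 0.05, 0.06, 0.03,
            0.04, 0.12, 0.01, -0.07, -0.08, 0.03, 0.02, -0.02]
shap_neg = [-0.0, -0.04, 0.01, -0.19, -0.06, 0.0, -0.04, -0.04, -0.0,
            -0.06, -0.2, -0.01, 0.12, 0.15, -0.01, 0.02, 0.03]

pos_words = merge_subwords(tokens, shap_pos)
neg_words = merge_subwords(tokens, shap_neg)
top_pos = max(pos_words, key=lambda pair: pair[1])
top_neg = max(neg_words, key=lambda pair: pair[1])
print(top_pos[0], round(top_pos[1], 2))
print(top_neg[0], round(top_neg[1], 2))
```

On these numbers, "great" is the strongest word for the positive class and "boring" the strongest for the negative class. Summing is only one possible aggregation; averaging or taking the maximum per word are alternatives.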


## Leave-one-out
Besides the explainers above, a standard technique is erasure, also called leave-one-out: we delete one word at a time from the text and measure the change in the predicted probabilities. Let us create our own whitespace tokenizer.

In [None]:
sample_text = "The movie had great narration and visuals despite a boring storyline"

## TODO: Tokenize the text by splitting it into words. Then generate each sentence by leaving one word out
## Make sure the generated sentences have no additional white spaces

tokenize_text = sample_text.split()
loo_texts = [' '.join(word for j, word in enumerate(tokenize_text) if i != j) for i, _ in enumerate(tokenize_text)]
print(loo_texts)

['movie had great narration and visuals despite a boring storyline', 'The had great narration and visuals despite a boring storyline', 'The movie great narration and visuals despite a boring storyline', 'The movie had narration and visuals despite a boring storyline', 'The movie had great and visuals despite a boring storyline', 'The movie had great narration visuals despite a boring storyline', 'The movie had great narration and despite a boring storyline', 'The movie had great narration and visuals a boring storyline', 'The movie had great narration and visuals despite boring storyline', 'The movie had great narration and visuals despite a storyline', 'The movie had great narration and visuals despite a boring']


In [None]:
## TODO: Generate scores for each of the leave one out sentences and tabulate the scores in a Dataframe corresponding to the word omitted
scores = [explainer.score(text) for text in loo_texts]
pd.DataFrame(scores, index=tokenize_text)

| Omitted word | negative | neutral | positive |
| --- | --- | --- | --- |
| The | 0.22385 | 0.206326 | 0.569824 |
| movie | 0.263907 | 0.219848 | 0.516245 |
| had | 0.120151 | 0.142724 | 0.737124 |
| great | 0.287786 | 0.29973 | 0.412484 |
| narration | 0.267983 | 0.214153 | 0.517864 |
| and | 0.188767 | 0.190426 | 0.620807 |
| visuals | 0.184359 | 0.182071 | 0.633569 |
| despite | 0.859677 | 0.100246 | 0.040077 |
| a | 0.112969 | 0.161386 | 0.725645 |
| boring | 0.355356 | 0.232932 | 0.411712 |
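The table lists raw probabilities, but leave-one-out is usually read as a difference: how far does a class score fall once a word is removed? The sketch below computes that drop against the score of the full sentence. The function `leave_one_out_importance` and the scorer `toy_score` are hypothetical; `toy_score` is a tiny lexicon counter that stands in for the real model so the example runs without downloading anything.

```python
def leave_one_out_importance(text, score_fn, label):
    """Return (word, drop) pairs: how much `label`'s score falls when each word is removed."""
    words = text.split()
    baseline = score_fn(text)[label]
    drops = []
    for i, word in enumerate(words):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append((word, baseline - score_fn(reduced)[label]))
    return drops

def toy_score(text):
    """Hypothetical stand-in scorer (NOT the real model): share of lexicon hits per class."""
    positive, negative = {"great", "good"}, {"boring", "bad"}
    words = text.lower().split()
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    total = pos + neg
    if total == 0:
        return {"negative": 0.5, "positive": 0.5}
    return {"negative": neg / total, "positive": pos / total}

drops = leave_one_out_importance("great visuals despite a boring storyline", toy_score, "positive")
for word, drop in sorted(drops, key=lambda pair: pair[1], reverse=True):
    print(f"{word:>10s} {drop:+.2f}")
```

A positive drop means the word supported the class, a negative drop means it worked against it. Since `explainer.score` returns a label-to-probability dict, as its output earlier shows, it could presumably be passed as `score_fn` in place of the toy scorer.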


## Open-ended explanations using Language Models (LM)

Besides the conventional methods of analyzing a given input, we will try open-ended language models to analyze why a review has a particular sentiment. Let us take the `sample_text` we used previously and modify it slightly to make it incomplete. Then we will use BLOOM to complete the sentence. BLOOM's architecture is broadly similar to GPT-3's, and the full model has 176B parameters. However, we will use a variant of BLOOM with 560M parameters, which generates text faster. For documentation of the generator, refer [here](https://huggingface.co/docs/transformers/main_classes/text_generation).

In [None]:
# Import BLOOM tokenizer and generator from transformers
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer_bloom = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
model_bloom = BloomForCausalLM.from_pretrained("bigscience/bloom-560m").to(device)

# Let's add text at the end of the sample text to see why the sentence has positive sentiment. Also define an output length
prompt_text = sample_text + " has a positive sentiment. This is because"
print(prompt_text)
output_length = 100 # Feel free to change this

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because


In [None]:
## TODO: Tokenize the sentence as tensors and then use the model to generate a complete sentence with a max length
inputs = tokenizer_bloom.encode(prompt_text, return_tensors="pt").to(device)
gen1 = model_bloom.generate(inputs, max_length=200)[0]
print(tokenizer_bloom.decode(gen1, skip_special_tokens=True))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because the movie is a comedy and the characters are not bad. The movie is a good movie for the youngsters. The movie is a good movie for the adults. The movie is a good movie for the seniors. The movie is a good movie for the parents. The movie is a good movie for the teachers. The movie is a good movie for the students. The movie is a good movie for the parents. The movie is a good movie for the teachers. The movie is a good movie for the students. The movie is a good movie for the parents. The movie is a good movie for the teachers. The movie is a good movie for the students. The movie is a good movie for the parents. The movie is a good movie for the teachers. The movie is a good movie for the students. The movie is a good movie for the parents. The movie is a good movie


Notice that the sentences are somewhat repetitive. Let us avoid this by adding a penalty for repetition. To understand the repetition penalty (penalized sampling), refer to section 4 in this [paper](https://arxiv.org/pdf/1909.05858.pdf).
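To make the idea concrete, here is a minimal pure-Python sketch of a repetition penalty. It rests on an assumption about the mechanics: the paper divides the score of every previously generated token by the penalty, and, as far as we understand, the `transformers` implementation multiplies negative scores instead so that the score always moves down. The function `apply_repetition_penalty` and the four-token vocabulary are hypothetical.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Return a copy of `logits` with already-generated token ids made less likely."""
    penalized = list(logits)
    for token_id in set(generated_ids):
        value = penalized[token_id]
        # Dividing a positive logit or multiplying a negative one both lower the score.
        penalized[token_id] = value / penalty if value > 0 else value * penalty
    return penalized

# Hypothetical 4-token vocabulary; token ids 0 and 2 were already generated.
logits = [2.4, 1.0, -0.6, 0.3]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 2, 0])
print(penalized)
```

A penalty of 1.0 leaves the logits unchanged; larger values push the generator away from tokens it has already produced.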

In [None]:
## TODO: Add a penalty for repetition
gen2 = model_bloom.generate(inputs, max_length=200, repetition_penalty = 1.2)[0]
print(tokenizer_bloom.decode(gen2, skip_special_tokens=True))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because the film was written by an actor who plays himself in his own life, which makes it more realistic.
In this case we have to say that there are some scenes where you can see how he feels about being alone with her (and not having any other girl). He also talks openly on what happened during their relationship but does so without making him feel guilty or ashamed of anything else as well. (He even says “It’s okay” when she asks for help).
This scene shows us exactly why they were together after all these years! They both love eachother very much; however it’s important here if someone wants them back then just ask yourself whether you’re willing enough!
I think I would like my friends too…..but I’m afraid I’ll never get around doing something similar again…
If your looking at me now please don’t tell anyone!! You’re going crazy!!! And that’s ok – I’ve got no


In [None]:
## TODO: Also sample the next word from the predicted distribution (instead of always taking the most likely one) to make the text more natural
gen3 = model_bloom.generate(inputs, max_length=200, repetition_penalty = 1.2, do_sample=True)[0]
print(tokenizer_bloom.decode(gen3, skip_special_tokens=True))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because an actor always brings lots of new faces to the blockbuster style movies, like that in Vibe 2 where they starred as two brothers named Alistair Blarney (the one who played John Cusp) & Michael Pinsky aka Jack Lang.
It wasn’t exactly fun or realistic nor memorable but I still love it for this reason :) (This made me wish there were some actual actors acting onscreen)
I have been following your work since you turned out Dustin Button’s “Dead By Deed” with him playing Rusty Reid before he got caught at high-school by his girlfriend Julia Kelley. (he just started starring again.) Then after all those years I’ve come back looking over so many photos… And really not any more!
Yea that’s what I’m saying though – i’ve never seen anyone else doing anything other than do my favorites
That would


In text generation, the model picks each next word from a huge vocabulary. Always selecting the most likely word tends to produce repetitions, while sampling from the full distribution may lead to vague or uncommon sentences. A common practice is to use top-k or top-p sampling (see this [article](https://docs.cohere.ai/docs/controlling-generation-with-top-k-top-p)) to restrict sampling to the most likely words and cut off the long tail. Now, explore the generator by tuning its hyperparameters. Refer to the [blog post](https://huggingface.co/blog/how-to-generate) on 'how to generate'.
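The two filters can be sketched in a few lines of plain Python. This is an illustration of the idea rather than the library's implementation: `top_k_filter`, `top_p_filter` and the next-word distribution are all hypothetical.

```python
def top_k_filter(probs, k):
    """Keep the k most likely tokens and renormalize."""
    kept = dict(sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k])
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

def top_p_filter(probs, p):
    """Keep the smallest set of most likely tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {token: prob / total for token, prob in kept.items()}

# Hypothetical next-word distribution, for illustration only.
next_token = {"good": 0.5, "great": 0.25, "fine": 0.125, "odd": 0.0625, "zebra": 0.0625}
top_k_result = top_k_filter(next_token, k=2)
top_p_result = top_p_filter(next_token, p=0.8)
print(top_k_result)
print(top_p_result)
```

Top-k always keeps a fixed number of candidates, whereas top-p keeps more candidates when the distribution is flat and fewer when it is peaked. Sampling then draws from the renormalized distribution.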

In [None]:
## TODO: Tweak the hyper parameters for our use case. 
## Use top_p and top_k to have better sampling of words
gen4 = model_bloom.generate(inputs, max_length=200, repetition_penalty = 1.2, do_sample=True, top_p=0.9)[0]
print(tokenizer_bloom.decode(gen4))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because the characters get their way when they are dealing with things like revenge, love or death.
A man who can deal easily will be able to overcome whatever adversity that comes his way: not being an idiot nor losing someone important for himself while at work just about every day in life (for more on this topic you have read my review of “Do Not Call Me God”).
Once he gets over everything there was happening during last time out date but now no one around him makes any effort
If you’re looking forward toward meeting some new people after all your relationships don’t seem working then here’s something worth considering as soon it finally happens: it’s been two years since I made such huge decisions regarding how much sex I’d want – although I’ve changed quite many times yet…
When we look back our thoughts surrounding each other were still pretty negative… We didn’t talk enough? Wel

In [None]:
## TODO: Tweak the hyper parameters for our use case. 
## Use top_p and top_k to have better sampling of words
gen5 = model_bloom.generate(inputs, max_length=200, repetition_penalty = 1.2, do_sample=True, top_k=50)[0]
print(tokenizer_bloom.decode(gen5))

The movie had great narration and visuals despite a boring storyline has a positive sentiment. This is because the characters (or at least me I suppose) are well built, strong in their humanity.
We know these times that we should only speak after having actually come across or have seen such an instance of how good life would be without this terrible tragedy to happen unforeseenly!
Amen – you said everything it means by human beings here–even death they never get what was happening so for us all our lives were taken as fatal! Well now it’s time? But when does everyone become better?
For those who can’t stop talking about sadness like your mum didn’t realise he’s getting one tomorrow…but i will write down every bit…
You tell her not yet she’ll grow old enough….I love being able read more than someone else but if anything makes anyone less happy today maybe reading fiction while watching Netflix brings peace into my heart too!!! x</s>


## Outro

Well done, Data Scientist! Now that we've seen different methods for analyzing the sentiment predictions, it's time to answer some questions!

1. What are your thoughts on Interpretable AI for Text Classification?
2. Compare the various explanations. Which method do you agree with most, why?
3. Do you think the language models (open-ended explanations) capture the sentiment well and explain it? Did tuning the generation parameters help, and what worked best for you?

1. Providing classification results without any explicit additional information won't convince people to trust your predictions. How certain are the predictions? What are the main drivers towards a specific decision? XAI definitely has its place in text classification.
2. I like the SHAP and LIME approaches here. Compared to the others, they not only seem to correctly identify the positive/negative words leading to a specific sentiment, but they also present their explanations in a contrastive and comprehensive way.
3. Language models (open-ended explanations) can be a good approach to get an even deeper understanding of the domain being classified. However, you need to be cautious. LMs tend to start hallucinating very quickly (e.g. without any film content having been specified, the model starts to talk about Darth Vader). That leads me to the conclusion that they might be helpful, but only as a complement to other explainability methods.
-> *penalizing repetition* and *sampling* definitely improved the results dramatically, as the text became far more natural; *top_k* and *top_p* also seemed helpful, even though it's difficult to generalize which of these two approaches works better.

## Bonus
Kudos👏! It is amazing you made it here. In the bonus section, let us apply our model to some real world data. 



*   You can use either the Hugging Face `imdb` dataset or a review of any movie from anywhere. Get the sentiment for the review and see which words are most important for the sentiment, using the methods we used earlier.
*   The `cardiffnlp/twitter-xlm-roberta-base-sentiment` model is not trained on the imdb dataset. Let us see if using a model trained on the imdb dataset can give us better results. You can try `distilbert-imdb` from [here](https://huggingface.co/lvwerra/distilbert-imdb) or other models trained/fine-tuned on imdb from [here](https://huggingface.co/datasets/imdb).



---


Answer the following questions once you complete the analysis:

1. Do you think XAI is useful to understand the sentiment behind real-world movie reviews? 
2. Based on your observations, would your recommendation change from what it was previously?
3. Does training/fine-tuning help a model to understand and interpret sentiment better?

1. Yes, I think XAI does increase transparency. However, the longer the text input, the more most of the methods hit their limits, as their explanatory power decreases.
2. LIME and IG once again achieve solid results; in contrast to the image case, SHAP also provides helpful insights. All of them attribute "great" to positive sentiment and "boring" to negative sentiment.
3. Fine-tuning helped to get more contrastive results for LIME and SHAP; however, IG results suffered from fine-tuning.

In [None]:
## Here is starter code to download the imdb dataset. You could also try any review for any movie.
from datasets import load_dataset
dataset = load_dataset("imdb")
dataset['test'][10]


{'text': 'This flick is a waste of time.I expect from an action movie to have more than 2 explosions and some shooting.Van Damme\'s acting is awful. He never was much of an actor, but here it is worse.He was definitely better in his earlier movies. His screenplay part for the whole movie was probably not more than one page of stupid nonsense one liners.The whole dialog in the film is a disaster, same as the plot.The title "The Shepherd" makes no sense. Why didn\'t they just call it "Border patrol"? The fighting scenes could have been better, but either they weren\'t able to afford it, or the fighting choreographer was suffering from lack of ideas.This is a cheap low type of action cinema.',
 'label': 0}

## XLM-RoBERTa model

In [None]:
# take review #10 as a test sample
sample = dataset['test'][10]['text']
print(explainer.score(sample))

{'negative': 0.9212380051612854, 'neutral': 0.0584341436624527, 'positive': 0.020327821373939514}


In [None]:
# highlight words being classified as negative
explain_neg_class = explainer.explain(text=sample, target=0)
explainer.show_table(explain_neg_class)


Unnamed: 0,▁This,▁f,lick,▁is,▁a,▁was,te,▁of,▁time,.,I,▁expect,▁from,▁an,▁action,▁movie,▁to,▁have,▁more,▁than,▁2,▁explo,sions,▁and,▁some,▁shooting,..1,Van,▁Da,mme,',s,▁ac,ting,▁is.1,▁a.1,w,ful,..2,▁He,▁never,▁was.1,▁much,▁of.1,▁an.1,▁actor,",",▁but,▁here,▁it,▁is.2,▁worse,..3,He,▁was.2,▁definitely,▁better,▁in,▁his,▁earlier,▁movies,..4,▁His,▁screen,play,▁part,▁for,▁the,▁whole,▁movie.1,▁was.3,▁probably,▁not,▁more.1,▁than.1,▁one,▁page,▁of.2,▁stupid,▁non,sense,▁one.1,▁li,ners,..5,The,▁whole.1,▁dialog,▁in.1,▁the.1,▁film,▁is.3,▁a.2,▁disa,ster,",.1",▁same,▁as,▁the.2,▁plot,..6,The.1,▁title,"▁""",The.2,▁She,pher,d,"""",▁makes,▁no,▁sense,..7,▁Why,▁didn,'.1,t,▁they,▁just,▁call,▁it.1,"▁"".1",B,order,▁patrol,"""?",▁The,▁fighting,▁scene,s.1,▁could,▁have.1,▁been,▁better.1,",.2",▁but.1,▁either,▁they.1,▁were,n,'.2,t.1,▁able,▁to.1,▁afford,▁it.2,",.3",▁or,▁the.3,▁fighting.1,▁cho,reo,graph,er,▁was.4,▁suffering,▁from.1,▁lack,▁of.3,▁ideas,..8,This,▁is.4,▁a.3,▁cheap,▁low,▁type,▁of.4,▁action.1,▁cinema,..9
Partition SHAP,0.01,0.01,0.01,0.02,0.02,0.02,0.02,0.01,0.01,0.01,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.03,0.03,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.02,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
LIME,0.01,0.01,0.0,0.01,0.01,0.04,0.02,0.01,0.0,0.01,-0.01,-0.01,-0.0,-0.0,-0.0,-0.0,0.0,0.0,-0.0,-0.0,-0.0,0.0,0.0,0.0,-0.01,0.01,0.01,-0.0,0.0,-0.0,0.0,0.0,0.01,-0.0,0.01,0.02,0.03,0.01,0.0,-0.0,-0.0,0.0,0.0,0.0,-0.0,0.01,-0.01,-0.0,-0.0,-0.0,0.01,0.03,0.0,-0.0,-0.0,-0.0,-0.01,-0.0,-0.0,-0.01,-0.01,0.0,0.0,0.0,-0.0,-0.0,-0.0,0.0,0.0,-0.0,-0.0,-0.0,-0.0,0.0,-0.0,-0.0,-0.01,0.0,0.05,0.03,0.03,-0.01,0.01,-0.0,0.0,0.01,-0.0,-0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.02,-0.0,-0.0,0.0,-0.0,-0.0,-0.0,0.0,-0.0,-0.01,0.0,-0.0,-0.0,0.0,-0.01,0.01,0.01,0.01,0.0,0.02,-0.0,0.01,-0.01,0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.01,0.0,-0.0,0.01,0.0,-0.0,-0.0,0.0,0.01,-0.0,-0.0,-0.0,-0.0,0.01,0.01,-0.0,0.0,-0.0,-0.0,-0.0,0.01,-0.0,-0.0,0.0,-0.0,0.01,0.0,-0.0,0.0,-0.01,-0.0,0.03,0.0,0.01,0.01,0.01,0.0,0.02,0.01,0.0,0.02,0.01,0.01,0.0,0.0,0.01,0.0
Gradient,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.03,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.01,0.01
Gradient (x Input),-0.0,-0.0,0.01,-0.0,-0.0,-0.01,-0.01,-0.01,-0.0,-0.01,-0.01,-0.0,-0.0,-0.0,-0.0,-0.01,0.0,-0.0,-0.0,-0.0,-0.0,0.0,0.0,-0.0,-0.01,-0.0,-0.01,0.0,-0.0,0.01,0.02,0.0,-0.0,-0.0,-0.0,0.0,-0.0,0.0,-0.01,-0.0,0.0,-0.0,0.0,-0.0,-0.01,0.02,-0.01,-0.01,0.0,-0.01,-0.02,0.01,-0.01,-0.01,0.01,0.0,0.01,-0.0,-0.0,0.01,0.0,-0.0,0.0,-0.01,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.01,-0.0,0.0,0.0,0.0,-0.0,-0.0,-0.0,0.0,0.0,-0.0,-0.0,-0.01,-0.01,-0.0,-0.0,-0.0,-0.01,-0.0,-0.0,0.0,-0.03,-0.01,-0.0,-0.0,0.0,-0.0,0.0,-0.0,-0.01,0.0,-0.01,-0.01,-0.01,-0.0,0.0,-0.0,0.01,-0.01,0.01,-0.01,0.02,-0.02,-0.0,-0.0,-0.01,0.01,0.03,-0.01,0.02,0.01,0.0,0.01,-0.09,-0.0,-0.02,0.0,0.0,-0.0,-0.0,0.0,0.0,0.0,0.0,0.01,-0.0,-0.0,0.0,-0.0,-0.0,0.01,-0.0,0.01,-0.01,0.0,-0.0,-0.0,0.0,-0.0,0.01,0.0,-0.0,-0.0,0.0,-0.0,-0.0,-0.0,-0.0,-0.01,-0.01,-0.0,-0.0,0.03,0.0,0.0,-0.0,-0.0,-0.0,-0.01
Integrated Gradient,0.01,-0.0,0.03,-0.01,-0.01,0.05,-0.01,-0.02,-0.0,0.0,-0.0,0.01,0.0,0.01,-0.01,0.0,0.0,0.01,-0.0,0.01,-0.01,-0.01,0.0,0.01,0.0,-0.0,0.01,0.01,0.0,-0.01,0.0,-0.01,-0.0,0.0,-0.01,0.01,0.04,-0.02,0.02,0.0,0.0,0.0,0.0,0.0,-0.0,0.0,-0.0,-0.0,-0.0,-0.0,0.0,-0.01,0.0,0.0,-0.01,0.0,0.0,0.0,-0.01,-0.0,-0.0,0.01,-0.0,-0.01,-0.0,0.0,-0.0,-0.01,0.01,0.0,-0.0,0.02,-0.0,0.01,-0.0,0.0,0.0,-0.0,-0.02,-0.02,0.02,0.0,0.0,-0.0,0.0,0.0,0.01,0.0,-0.0,-0.0,-0.0,-0.0,0.0,0.01,0.0,0.0,-0.01,0.0,-0.01,-0.0,-0.0,0.0,0.0,0.0,-0.0,-0.01,0.01,-0.0,0.0,-0.0,0.0,0.0,0.01,-0.01,-0.0,-0.0,-0.0,0.0,-0.01,-0.01,-0.0,-0.01,-0.0,0.0,-0.0,0.0,0.0,0.0,0.01,-0.0,0.0,0.0,0.0,-0.0,0.01,-0.0,-0.0,0.0,0.0,0.0,0.0,0.0,0.01,-0.0,-0.01,0.0,-0.0,-0.0,0.0,-0.0,-0.0,0.0,0.01,0.0,0.0,-0.01,-0.0,-0.01,0.01,0.0,0.01,-0.01,-0.0,0.0,-0.01,0.0,0.01,-0.01,-0.01,0.03,0.0
Integrated Gradient (x Input),0.0,0.01,-0.0,-0.0,-0.01,0.07,0.05,0.01,-0.0,-0.0,0.0,-0.0,0.0,0.0,-0.0,0.01,0.0,0.0,-0.0,-0.0,-0.0,0.01,0.0,0.0,-0.0,0.01,0.01,0.01,0.0,0.01,0.0,-0.0,0.0,0.01,-0.0,0.01,0.05,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,-0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,-0.0,-0.0,-0.01,-0.01,-0.02,0.0,0.0,0.0,-0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.05,0.01,0.03,0.0,0.0,0.0,0.01,0.01,0.01,-0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,-0.0,0.0,0.0,0.0,-0.0,0.01,0.01,0.0,0.0,0.0,0.0,-0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,-0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,-0.0,0.0,-0.0,-0.0,-0.01,0.0,0.0,-0.0,-0.0,-0.01,-0.01,-0.0,0.0,-0.0,-0.0,0.0,-0.0,-0.0,0.0,-0.01,-0.01,-0.0,-0.0,-0.0,0.0,0.0,0.0,-0.0,-0.0,0.0,0.0,0.02,0.0,0.01,0.01,-0.01,-0.0,-0.0,-0.0,-0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.01


#### Leave-one-out

In [None]:
# Helper function to remove HTML tags and special characters
import re

# Matches HTML tags such as <br />
PATTERN_TAGS = re.compile(r"<[^>]+>")
# Matches everything except ASCII/European letters and spaces (digits and punctuation are dropped)
PATTERN_NON_LETTERS = re.compile(r"[^A-Za-zÀ-ž ]")
# Matches runs of whitespace
PATTERN_WSPACE = re.compile(r"\s+")

def remove_special_characters(text):
    # Replace tags and non-letter characters with a space, then collapse repeated whitespace
    text = PATTERN_TAGS.sub(' ', text)
    text = PATTERN_NON_LETTERS.sub(' ', text)
    text = PATTERN_WSPACE.sub(' ', text)
    return text

In [None]:
## TODO: Tokenize the text by splitting it into words. Then generate each sentence by leaving one word out
## Make sure the generated sentences have no additional white spaces
tokenize_text = remove_special_characters(sample).split()
loo_texts = [' '.join(word for j, word in enumerate(tokenize_text) if i != j) for i, _ in enumerate(tokenize_text)]
print(loo_texts)

['flick is a waste of time I expect from an action movie to have more than explosions and some shooting Van Damme s acting is awful He never was much of an actor but here it is worse He was definitely better in his earlier movies His screenplay part for the whole movie was probably not more than one page of stupid nonsense one liners The whole dialog in the film is a disaster same as the plot The title The Shepherd makes no sense Why didn t they just call it Border patrol The fighting scenes could have been better but either they weren t able to afford it or the fighting choreographer was suffering from lack of ideas This is a cheap low type of action cinema', 'This is a waste of time I expect from an action movie to have more than explosions and some shooting Van Damme s acting is awful He never was much of an actor but here it is worse He was definitely better in his earlier movies His screenplay part for the whole movie was probably not more than one page of stupid nonsense one line

In [None]:
## TODO: Generate scores for each of the leave one out sentences and tabulate the scores in a Dataframe corresponding to the word omitted
scores = [explainer.score(text) for text in loo_texts]
pd.DataFrame(scores, index=tokenize_text)

Unnamed: 0,negative,neutral,positive
This,0.944181,0.041848,0.013971
flick,0.941131,0.044068,0.014801
is,0.944321,0.041441,0.014239
a,0.944105,0.041502,0.014393
waste,0.926557,0.056035,0.017408
...,...,...,...
low,0.941124,0.044057,0.014819
type,0.942300,0.043046,0.014654
of,0.942141,0.043197,0.014662
action,0.941951,0.043350,0.014699
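
The table above lists raw class scores, which are easier to read as a drop relative to the full sentence. The following is a minimal sketch of that step, not part of the original notebook. `loo_importance` and `toy_score` are hypothetical names, and `toy_score` is a made-up stand-in for a real scorer so the example runs without downloading a model.

```python
from typing import Callable, Dict, List, Tuple


def loo_importance(
    words: List[str],
    score_fn: Callable[[str], Dict[str, float]],
    label: str,
) -> List[Tuple[str, float]]:
    """Rank words by how much removing each one lowers the `label` score."""
    base = score_fn(" ".join(words))[label]
    drops = []
    for i, word in enumerate(words):
        # Rebuild the sentence without the i-th word
        reduced = " ".join(w for j, w in enumerate(words) if j != i)
        drops.append((word, base - score_fn(reduced)[label]))
    # Largest drop first: these words support the label the most
    return sorted(drops, key=lambda pair: pair[1], reverse=True)


def toy_score(text: str) -> Dict[str, float]:
    """Hypothetical scorer (an assumption): counts two hard-coded negative words."""
    negative_words = {"waste", "awful"}
    hits = sum(w in negative_words for w in text.split())
    negative = min(0.5 + 0.2 * hits, 1.0)
    return {"negative": negative, "positive": 1.0 - negative}


ranking = loo_importance("This flick is a waste of time".split(), toy_score, "negative")
print(ranking[0])
```

In the notebook, `explainer.score` could be passed as `score_fn` with `tokenize_text` as `words`, assuming it returns a dict of label to probability as in the outputs shown here. A positive drop means the omitted word was pushing the model toward the label.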


## DistilBERT model

In [None]:
name = "lvwerra/distilbert-imdb" 

## TODO: Build the pre-trained model and tokenizer using AutoModelForSequenceClassification and AutoTokenizer
## Make sure to move the model onto `device` so it can run on the GPU

model_imdb = AutoModelForSequenceClassification.from_pretrained(name).to(device)
tokenizer_imdb = AutoTokenizer.from_pretrained(name)

In [None]:
## TODO: Build the explainer using the `Benchmark` class
explainer_imdb = Benchmark(model_imdb, tokenizer_imdb)

In [None]:
## TODO: Use the `score` method to obtain the class scores and print them
print(explainer_imdb.score(sample))

{'NEGATIVE': 0.9958101511001587, 'POSITIVE': 0.004189860541373491}


In [None]:
# Highlight the tokens that contribute to the negative class (target=0)
explain_neg_class = explainer_imdb.explain(text=sample, target=0)
explainer_imdb.show_table(explain_neg_class)


Unnamed: 0,this,flick,is,a,waste,of,time,.,i,expect,from,an,action,movie,to,have,more,than,2,explosions,and,some,shooting,..1,van,dam,##me,',s,acting,is.1,awful,..2,he,never,was,much,of.1,an.1,actor,",",but,here,it,is.2,worse,..3,he.1,was.1,definitely,better,in,his,earlier,movies,..4,his.1,screenplay,part,for,the,whole,movie.1,was.2,probably,not,more.1,than.1,one,page,of.2,stupid,nonsense,one.1,liner,##s,..5,the.1,whole.1,dial,##og,in.1,the.2,film,is.3,a.1,disaster,",.1",same,as,the.3,plot,..6,the.4,title,"""",the.5,shepherd,""".1",makes,no,sense,..7,why,didn,'.1,t,they,just,call,it.1,""".2",border,patrol,""".3",?,the.6,fighting,scenes,could,have.1,been,better.1,",.2",but.1,either,they.1,weren,'.2,t.1,able,to.1,afford,it.2,",.3",or,the.7,fighting.1,choreographer,was.3,suffering,from.1,lack,of.3,ideas,..8,this.1,is.4,a.2,cheap,low,type,of.4,action.1,cinema,..9
Partition SHAP,0.02,0.01,0.01,0.01,0.11,0.02,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.03,0.06,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,-0.0,-0.0,-0.01,-0.01,-0.01,-0.01,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,-0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.01,-0.0,-0.0,-0.0,-0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.02,0.01,0.01,0.01,0.01,0.01,0.01
LIME,0.0,-0.0,-0.01,-0.01,0.06,0.01,-0.0,0.0,-0.0,-0.0,-0.01,-0.0,0.0,0.01,0.01,-0.0,-0.0,0.0,0.0,-0.0,0.0,0.01,0.01,0.01,-0.01,-0.0,0.0,0.0,0.01,0.01,0.01,0.05,0.01,-0.0,-0.0,0.0,-0.0,-0.01,0.0,0.01,0.0,-0.01,-0.0,-0.0,-0.0,0.04,-0.0,-0.0,-0.0,-0.03,-0.03,-0.01,-0.01,0.0,-0.01,-0.02,-0.0,0.0,0.0,-0.01,-0.0,0.01,-0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.02,0.01,0.0,-0.0,-0.0,0.01,0.0,0.01,0.0,-0.0,0.0,-0.0,0.01,0.01,0.0,0.01,0.01,-0.01,0.0,0.0,0.02,0.0,0.0,-0.0,0.0,-0.0,0.0,-0.0,0.01,0.01,0.01,0.01,-0.0,0.01,0.01,0.01,0.0,-0.0,0.01,-0.0,-0.01,0.0,-0.0,0.0,0.01,-0.01,0.0,-0.0,0.01,-0.01,0.0,0.01,-0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,-0.0,0.0,0.0,0.0,-0.0,0.0,0.01,0.0,0.01,-0.0,0.01,0.0,-0.0,0.02,0.01,0.01,0.0,0.0,-0.0,-0.01
Gradient,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
Gradient (x Input),-0.0,0.01,-0.0,0.0,0.01,-0.01,0.0,0.0,0.0,0.01,-0.0,0.0,0.01,0.02,-0.0,0.0,0.0,0.0,-0.0,0.01,-0.0,0.0,0.01,0.0,0.0,0.01,0.0,-0.0,-0.0,0.01,-0.0,0.01,0.0,-0.0,0.01,-0.01,0.0,-0.01,0.0,0.01,-0.0,0.0,0.0,-0.0,-0.0,0.01,0.0,-0.0,-0.01,0.01,0.01,-0.01,0.0,-0.0,0.02,0.0,0.0,0.02,-0.0,-0.0,-0.0,0.01,0.02,-0.01,0.0,-0.0,0.0,0.0,0.0,0.01,-0.01,0.01,0.02,0.0,0.01,-0.0,0.0,-0.0,0.01,0.02,0.01,-0.01,-0.0,0.02,-0.0,0.0,0.01,-0.0,-0.0,-0.0,-0.0,0.01,0.0,-0.0,-0.0,0.0,-0.0,0.0,0.0,0.01,-0.0,0.01,0.0,0.01,0.01,-0.0,0.01,0.0,0.0,0.01,-0.0,0.0,0.0,0.01,0.0,0.01,-0.0,0.01,0.01,0.01,0.0,0.0,0.01,-0.0,0.0,0.0,0.0,0.02,-0.0,0.01,0.01,-0.0,0.0,-0.0,-0.0,0.0,-0.0,0.01,0.02,-0.01,0.01,-0.0,0.0,-0.01,0.01,0.0,-0.0,-0.0,0.0,0.01,0.0,0.0,-0.01,0.01,0.01,0.0
Integrated Gradient,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01,0.01
Integrated Gradient (x Input),-0.01,-0.0,-0.01,-0.01,0.01,-0.01,-0.0,-0.01,-0.01,0.0,-0.01,-0.01,0.01,0.01,-0.01,-0.0,-0.0,-0.0,-0.0,0.0,-0.01,-0.0,-0.0,-0.01,-0.01,0.0,0.0,-0.0,-0.01,-0.0,-0.01,0.0,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,0.0,-0.01,-0.01,-0.01,-0.01,-0.01,0.0,-0.01,-0.01,-0.01,-0.0,0.0,-0.01,-0.01,-0.01,0.0,-0.01,-0.01,-0.0,-0.01,-0.01,-0.01,0.0,0.01,-0.01,-0.01,-0.01,-0.0,-0.0,-0.01,-0.0,-0.01,0.0,0.01,-0.01,-0.0,-0.0,-0.01,-0.01,0.0,0.0,-0.0,-0.01,-0.01,0.0,-0.01,-0.01,0.0,-0.01,-0.0,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,-0.0,-0.01,0.0,-0.0,-0.0,-0.01,0.0,-0.01,-0.0,-0.0,-0.0,0.0,-0.01,-0.0,-0.0,-0.01,-0.0,0.0,0.0,-0.0,-0.01,-0.01,0.0,-0.0,0.0,-0.0,-0.01,0.0,-0.01,-0.01,-0.01,-0.01,0.0,-0.0,0.0,0.0,-0.01,-0.01,-0.01,-0.01,-0.01,-0.01,0.0,0.0,-0.01,-0.0,-0.01,-0.0,-0.01,-0.0,-0.01,-0.01,-0.01,-0.01,-0.0,-0.0,-0.0,-0.01,0.01,-0.0,-0.01
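
With over 150 columns, the table above is hard to scan by eye. A minimal sketch of one way to summarize it is to keep only the tokens with the largest absolute attribution per method. `top_tokens` is a hypothetical helper, and the lists below copy only the first eight tokens and their Partition SHAP and LIME values from the table, so this is an illustration rather than a full analysis.

```python
from typing import List, Tuple


def top_tokens(tokens: List[str], scores: List[float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k (token, score) pairs with the largest absolute attribution."""
    pairs = sorted(zip(tokens, scores), key=lambda pair: abs(pair[1]), reverse=True)
    return pairs[:k]


# First eight columns of the table above
tokens = ["this", "flick", "is", "a", "waste", "of", "time", "."]
partition_shap = [0.02, 0.01, 0.01, 0.01, 0.11, 0.02, 0.01, 0.01]
lime = [0.0, -0.0, -0.01, -0.01, 0.06, 0.01, -0.0, 0.0]

print("Partition SHAP:", top_tokens(tokens, partition_shap, k=2))
print("LIME:", top_tokens(tokens, lime, k=1))
```

On this slice both methods rank "waste" first, while the Gradient and Integrated Gradient rows are a flat 0.01 at this rounding and would not separate any token.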


#### Leave-one-out

In [None]:
## TODO: Score each leave-one-out sentence and tabulate the scores in a DataFrame indexed by the omitted word
scores = [explainer_imdb.score(text) for text in loo_texts]
pd.DataFrame(scores, index=tokenize_text)

Unnamed: 0,NEGATIVE,POSITIVE
This,0.995793,0.004207
flick,0.996030,0.003970
is,0.996146,0.003854
a,0.995961,0.004039
waste,0.994383,0.005617
...,...,...
low,0.996088,0.003912
type,0.996207,0.003793
of,0.996161,0.003839
action,0.996164,0.003836


## Open-ended explanations

In [None]:
# Import BLOOM tokenizer and generator from transformers
from transformers import BloomTokenizerFast, BloomForCausalLM
tokenizer_bloom = BloomTokenizerFast.from_pretrained("bigscience/bloom-560m")
model_bloom = BloomForCausalLM.from_pretrained("bigscience/bloom-560m").to(device)

# Append text to the end of the sample to prompt the model to explain why the review is negative. Also define an output length
prompt_text = sample + " has a negative sentiment. This is because"
print(prompt_text)
output_length = 100 # Feel free to change this

This flick is a waste of time.I expect from an action movie to have more than 2 explosions and some shooting.Van Damme's acting is awful. He never was much of an actor, but here it is worse.He was definitely better in his earlier movies. His screenplay part for the whole movie was probably not more than one page of stupid nonsense one liners.The whole dialog in the film is a disaster, same as the plot.The title "The Shepherd" makes no sense. Why didn't they just call it "Border patrol"? The fighting scenes could have been better, but either they weren't able to afford it, or the fighting choreographer was suffering from lack of ideas.This is a cheap low type of action cinema. has a negative sentiment. This is because


In [None]:
## TODO: Tokenize the prompt as tensors, then use the model to generate a continuation of at most `output_length` new tokens
inputs = tokenizer_bloom.encode(prompt_text, return_tensors="pt").to(device)
# max_new_tokens counts only generated tokens, so the long prompt does not eat into the budget
generated = model_bloom.generate(inputs, max_new_tokens=output_length, repetition_penalty=1.2, do_sample=True, top_p=0.9)[0]
print(tokenizer_bloom.decode(generated))

This flick is a waste of time.I expect from an action movie to have more than 2 explosions and some shooting.Van Damme's acting is awful. He never was much of an actor, but here it is worse.He was definitely better in his earlier movies. His screenplay part for the whole movie was probably not more than one page of stupid nonsense one liners.The whole dialog in the film is a disaster, same as the plot.The title "The Shepherd" makes no sense. Why didn't they just call it "Border patrol"? The fighting scenes could have been better, but either they weren't able to afford it, or the fighting choreographer was suffering from lack of ideas.This is a cheap low type of action cinema. has a negative sentiment. This is because I don't want my family watching me with bad blood on their faces.
You should watch these two things first before you start buying them if your thinking that something like this will make up all its
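
The decoded output above repeats the whole prompt before the model's continuation, which buries the part we care about. A minimal sketch of separating the two follows. `continuation_only` is a hypothetical helper, and the strings are short made-up examples rather than real model output.

```python
def continuation_only(generated_text: str, prompt: str) -> str:
    """Return the text the model added after the prompt.

    Falls back to the full text if the decoded output does not start with
    the prompt (decoding does not always reproduce the input exactly).
    """
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


# Made-up strings for illustration only
prompt = "The plot is a disaster. has a negative sentiment. This is because"
decoded = prompt + " the story never makes sense."
print(continuation_only(decoded, prompt))
```

In the notebook, the decoded generation and `prompt_text` would take the place of `decoded` and `prompt`. Since the generation is sampled, the continuation changes from run to run and should be read as a loose hint rather than a faithful explanation of the classifier.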
