Movie Review Dataset. This dataset contains 5,331 positive and 5,331 negative processed sentences from Rotten Tomatoes movie reviews.

**default**

text: a string feature.

label: a classification label, with possible values neg (0) and pos (1).

**Data Distribution**

name = default

train = 8530

validation = 1066

test = 1066
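The three split sizes add up exactly to the 10,662 sentences in the description (5,331 per class), which works out to roughly an 80/10/10 train/validation/test split. A quick arithmetic check, using only the numbers quoted above:

```python
# Split sizes for the "default" config and the per-class counts from the description
splits = {"train": 8530, "validation": 1066, "test": 1066}
n_pos, n_neg = 5331, 5331

total = sum(splits.values())
assert total == n_pos + n_neg  # 10,662 sentences in all

for name, size in splits.items():
    # Prints each split's size and its share of the whole dataset
    print(f"{name:<10} {size:>5}  ({size / total:.1%})")
```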

In [None]:
!pip install -U datasets fsspec

In [1]:
from datasets import load_dataset

ds = load_dataset("rotten_tomatoes")



In [2]:
ds

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 1066
    })
})

In [4]:
# Index with (0, -1) to fetch the first and last training examples
ds["train"][0, -1]

{'text': ['the rock is destined to be the 21st century\'s new " conan " and that he\'s going to make a splash even greater than arnold schwarzenegger , jean-claud van damme or steven segal .',
  'things really get weird , though not particularly scary : the movie is all portent and no content .'],
 'label': [1, 0]}

Each short review is labeled either positive (1) or negative (0), so this is a binary sentiment classification task.
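The integer labels correspond to the names `neg` and `pos`. With `datasets`, `ds["train"].features["label"].names` should list them, and the `ClassLabel` methods `int2str` / `str2int` convert between ids and names. The dependency-free sketch below mimics that mapping with a plain list; `LABEL_NAMES` is an assumption that mirrors the feature description at the top of this notebook:

```python
# Mirrors the label feature described above: index 0 -> "neg", index 1 -> "pos"
LABEL_NAMES = ["neg", "pos"]

def int2str(label_id: int) -> str:
    """Return the class name for an integer label."""
    return LABEL_NAMES[label_id]

def str2int(name: str) -> int:
    """Return the integer label for a class name."""
    return LABEL_NAMES.index(name)

# The first and last training examples shown above have labels [1, 0]
print([int2str(i) for i in [1, 0]])  # ['pos', 'neg']
```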

In [10]:
import pandas as pd
import numpy as np

In [2]:
ds.column_names

{'train': ['text', 'label'],
 'validation': ['text', 'label'],
 'test': ['text', 'label']}

In [7]:
df = pd.DataFrame(ds['train'][:])
print(df.head())

                                                text  label
0  the rock is destined to be the 21st century's ...      1
1  the gorgeously elaborate continuation of " the...      1
2                     effective but too-tepid biopic      1
3  if you sometimes like to go to the movies to h...      1
4  emerges as something rare , an issue movie tha...      1


## Using Task-Specific Models

In [5]:
from transformers import pipeline
# Path to our HF model
model_path = "cardiffnlp/twitter-roberta-base-sentiment-latest"


In [6]:
# Load model into pipeline
pipe = pipeline(
    model=model_path,
    tokenizer=model_path,
    return_all_scores=True,  # return every class score, in the model's label order
    device="cuda:0"
)

Some weights of the model checkpoint at cardiffnlp/twitter-roberta-base-sentiment-latest were not used when initializing RobertaForSequenceClassification: ['roberta.pooler.dense.bias', 'roberta.pooler.dense.weight']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cuda:0


In [9]:
import numpy as np
from tqdm import tqdm
from transformers.pipelines.pt_utils import KeyDataset

# Run inference
y_pred = []

for output in tqdm(pipe(KeyDataset(ds["test"], "text")), total=len(ds["test"])):
    negative_score = output[0]["score"]
    positive_score = output[2]["score"]

    # Ignore the neutral score (output[1]) and pick the higher of negative vs. positive
    assignment = np.argmax([negative_score, positive_score])
    y_pred.append(assignment)


100%|██████████| 1066/1066 [00:12<00:00, 86.27it/s]
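The Cardiff model has three classes (negative, neutral, positive), while this dataset is binary, so the loop above drops the neutral score and compares the other two by position. Below is a dependency-free sketch of the same reduction that looks scores up by label name instead of by position, which keeps working if the pipeline returns scores sorted by confidence (as it can with `top_k=None`). The label strings `"negative"` and `"positive"` are an assumption about the model config; check `pipe.model.config.id2label` before relying on them.

```python
from typing import List

def to_binary(output: List[dict],
              neg_label: str = "negative",
              pos_label: str = "positive") -> int:
    """Map one pipeline output (a list of {"label": ..., "score": ...} dicts) to 0 or 1.

    The neutral score is ignored; ties go to negative, as with np.argmax.
    """
    scores = {item["label"]: item["score"] for item in output}
    return int(scores[pos_label] > scores[neg_label])

# Hypothetical output whose entries are sorted by score rather than by label id
example = [
    {"label": "neutral", "score": 0.55},
    {"label": "positive", "score": 0.30},
    {"label": "negative", "score": 0.15},
]
print(to_binary(example))  # 1: neutral is highest overall, but positive beats negative
```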


In [10]:
from sklearn.metrics import classification_report

def evaluate_performance(y_true, y_pred):
    """Create and print the classification report"""
    performance = classification_report(
        y_true,
        y_pred,
        target_names=["Negative Review", "Positive Review"]
    )
    print(performance)

In [12]:
evaluate_performance(ds["test"]["label"], y_pred)

                 precision    recall  f1-score   support

Negative Review       0.76      0.88      0.81       533
Positive Review       0.86      0.72      0.78       533

       accuracy                           0.80      1066
      macro avg       0.81      0.80      0.80      1066
   weighted avg       0.81      0.80      0.80      1066



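Using another model, this one fine-tuned for movie-review sentiment (SST-2). Note that SST-2 was built from the same Rotten Tomatoes sentences as this dataset, so some test sentences may have been seen during fine-tuning and the scores below may be optimistic.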

In [13]:
from transformers import pipeline
from datasets import load_dataset
from transformers.pipelines.pt_utils import KeyDataset
import numpy as np
from tqdm import tqdm
from sklearn.metrics import classification_report, confusion_matrix

ds = load_dataset("rotten_tomatoes")

# Load text classification pipeline with all class scores
pipe = pipeline(
    "text-classification",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
    top_k=None,
    device=0     # first CUDA GPU; use device=-1 to run on CPU
)

# Run inference
y_pred = []
for output in tqdm(pipe(KeyDataset(ds["test"], "text")), total=len(ds["test"])):
    # top_k=None returns a score for every class, so look scores up by label name
    scores = {item["label"]: item["score"] for item in output}

    assignment = np.argmax([scores["NEGATIVE"], scores["POSITIVE"]])
    y_pred.append(assignment)

# Ground truth labels
y_true = ds["test"]["label"]


def evaluate_performance(y_true, y_pred):
    """Create and print the classification report and confusion matrix"""
    print("Classification Report:")
    print(classification_report(y_true, y_pred, target_names=["Negative Review", "Positive Review"]))

    print("\nConfusion Matrix:")
    print(confusion_matrix(y_true, y_pred))


evaluate_performance(y_true, y_pred)


Device set to use cuda:0
100%|██████████| 1066/1066 [00:05<00:00, 202.46it/s]

Classification Report:
                 precision    recall  f1-score   support

Negative Review       0.89      0.90      0.90       533
Positive Review       0.90      0.89      0.90       533

       accuracy                           0.90      1066
      macro avg       0.90      0.90      0.90      1066
   weighted avg       0.90      0.90      0.90      1066


Confusion Matrix:
[[481  52]
 [ 58 475]]
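The classification report can be reproduced by hand from the confusion matrix, whose rows are true labels and columns are predictions (scikit-learn's convention). Precision for a class divides its diagonal entry by its column sum, recall divides it by its row sum, and F1 is their harmonic mean. A small pure-Python sketch using the DistilBERT matrix above:

```python
def per_class_metrics(cm):
    """Precision, recall and F1 per class from a square confusion matrix.

    cm[i][j] counts examples whose true class is i and predicted class is j.
    """
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                           # row sum
        precision = tp / predicted_k if predicted_k else 0.0
        recall = tp / actual_k if actual_k else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results.append((precision, recall, f1))
    return results

cm = [[481, 52],
      [58, 475]]
accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
for name, (p, r, f) in zip(["Negative Review", "Positive Review"], per_class_metrics(cm)):
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Rounded to two decimals, these match the report above (for example, negative precision is 481 / 539 ≈ 0.89 and accuracy is 956 / 1066 ≈ 0.90).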





## Using Embedding Models

In [5]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

In [6]:
# Convert text into embeddings
train_embeddings = model.encode(ds["train"]["text"], show_progress_bar=True)
test_embeddings = model.encode(ds["test"]["text"], show_progress_bar=True)


In [7]:
# 8,530 training sentences, each encoded as a 768-dimensional vector
train_embeddings.shape

(8530, 768)

Now that we have embeddings as features, we can train a classifier on them.

In [24]:
from sklearn.linear_model import LogisticRegression
# Train a logistic regression classifier on the training embeddings
clf = LogisticRegression(random_state=42)  # seed; only used by the sag, saga and liblinear solvers
clf.fit(train_embeddings, ds["train"]["label"])
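Conceptually, `clf.fit` learns one weight per embedding dimension plus a bias, and `clf.predict` labels an example positive when the sigmoid of its weighted sum is at least 0.5. The sketch below shows that idea with plain batch gradient descent on tiny made-up 2-D "embeddings". It is a simplified stand-in, not scikit-learn's implementation, which by default uses the L-BFGS solver with L2 regularization (`C=1.0`); the learning rate and epoch count here are arbitrary assumptions.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(X: Sequence[Sequence[float]], y: Sequence[int],
               lr: float = 0.5, epochs: int = 500):
    """Unregularized binary logistic regression via batch gradient descent."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss for one example is (p - y) * x
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict(w, b, X) -> List[int]:
    """Label 1 when the predicted probability is at least 0.5."""
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5) for xi in X]

# Toy 2-D "embeddings": negatives cluster near (-1, -1), positives near (1, 1)
X_train = [[-1.0, -0.8], [-0.9, -1.2], [-1.1, -1.0], [1.0, 0.9], [0.8, 1.1], [1.2, 1.0]]
y_train = [0, 0, 0, 1, 1, 1]
w, b = fit_logreg(X_train, y_train)
print(predict(w, b, [[-0.7, -0.9], [0.9, 0.6]]))  # [0, 1]
```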

In [26]:
# Predict previously unseen instances
y_pred = clf.predict(test_embeddings)
evaluate_performance(ds["test"]["label"], y_pred)

Classification Report:
                 precision    recall  f1-score   support

Negative Review       0.85      0.86      0.85       533
Positive Review       0.86      0.85      0.85       533

       accuracy                           0.85      1066
      macro avg       0.85      0.85      0.85      1066
   weighted avg       0.85      0.85      0.85      1066


Confusion Matrix:
[[457  76]
 [ 82 451]]


Zero-Shot Classification

In [8]:
# Create embeddings for our labels
label_embeddings = model.encode(["A negative movie review", "A positive movie review"])

In [11]:
from sklearn.metrics.pairwise import cosine_similarity
# Find the best matching label for each document
sim_matrix = cosine_similarity(test_embeddings, label_embeddings)
y_pred = np.argmax(sim_matrix, axis=1)
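Zero-shot here means no training labels are used: each review is assigned whichever label description its embedding is closest to by cosine similarity. A pure-Python sketch of the same computation as `cosine_similarity` followed by `np.argmax(..., axis=1)`, with made-up 3-D vectors standing in for the 768-dimensional embeddings:

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def zero_shot_predict(doc_embs, label_embs) -> List[int]:
    """Index of the most similar label embedding for every document (first wins on ties)."""
    preds = []
    for d in doc_embs:
        sims = [cosine(d, l) for l in label_embs]
        preds.append(max(range(len(sims)), key=sims.__getitem__))
    return preds

# Hypothetical vectors for "A negative movie review" and "A positive movie review"
label_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.0]]
doc_embs = [[0.9, 0.2, 0.3], [0.0, 0.8, 0.5]]
print(zero_shot_predict(doc_embs, label_embs))  # [0, 1]
```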

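Caution: the report below is identical to the DistilBERT report above, confusion matrix included, which suggests `y_pred` still held the DistilBERT predictions when this cell ran (that cell also assigns `y_pred`). Re-run the cosine-similarity cell right before this one to get the actual zero-shot scores.

In [14]: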
evaluate_performance(ds["test"]["label"], y_pred)

Classification Report:
                 precision    recall  f1-score   support

Negative Review       0.89      0.90      0.90       533
Positive Review       0.90      0.89      0.90       533

       accuracy                           0.90      1066
      macro avg       0.90      0.90      0.90      1066
   weighted avg       0.90      0.90      0.90      1066


Confusion Matrix:
[[481  52]
 [ 58 475]]


Using a Flan-T5 Model (Text-to-Text Transfer Transformer)

In [15]:
# Load our model
pipe = pipeline(
    "text2text-generation",
    model="google/flan-t5-small",
    device="cuda:0"
)


Device set to use cuda:0


In [17]:
# Prepare prompt
prompt = "Is the following sentence positive or negative? "
data = ds.map(lambda example: {"t5": prompt + example['text']})
data


DatasetDict({
    train: Dataset({
        features: ['text', 'label', 't5'],
        num_rows: 8530
    })
    validation: Dataset({
        features: ['text', 'label', 't5'],
        num_rows: 1066
    })
    test: Dataset({
        features: ['text', 'label', 't5'],
        num_rows: 1066
    })
})
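`ds.map` calls the lambda on every row and merges the returned dictionary into that row, which is why each split gains a `t5` column alongside `text` and `label`. A plain-Python analogue over a list of row dictionaries (the two sample rows are training examples shown earlier; `add_prompt_column` is a hypothetical helper, not a `datasets` function):

```python
from typing import Dict, List

PROMPT = "Is the following sentence positive or negative? "

def add_prompt_column(rows: List[Dict]) -> List[Dict]:
    """Return new rows with a 't5' field holding the prompt plus the review text."""
    return [{**row, "t5": PROMPT + row["text"]} for row in rows]

rows = [
    {"text": "effective but too-tepid biopic", "label": 1},
    {"text": "things really get weird , though not particularly scary : "
             "the movie is all portent and no content .", "label": 0},
]
mapped = add_prompt_column(rows)
print(sorted(mapped[0]))  # ['label', 't5', 'text']
print(mapped[0]["t5"])    # Is the following sentence positive or negative? effective but too-tepid biopic
```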

In [18]:
# Run inference
y_pred = []

for output in tqdm(pipe(KeyDataset(data["test"], "t5")), total=len(data["test"])):
    text = output[0]["generated_text"].strip().lower()
    # Any answer other than exactly "negative" is counted as positive
    y_pred.append(0 if text == "negative" else 1)

100%|██████████| 1066/1066 [00:41<00:00, 25.79it/s]
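The loop above counts any generated text other than exactly `negative` as positive, so an empty or off-format answer silently becomes a positive prediction. A slightly stricter sketch (the prefix matching and the fallback policy are assumptions, not part of the original notebook) records how many outputs matched neither label, so that bias can be measured:

```python
from typing import List, Tuple

def parse_generation(text: str) -> int:
    """Return 0 for 'negative', 1 for 'positive', -1 if neither word leads the answer."""
    answer = text.strip().lower()
    if answer.startswith("negative"):
        return 0
    if answer.startswith("positive"):
        return 1
    return -1

def parse_all(texts: List[str], fallback: int = 1) -> Tuple[List[int], int]:
    """Parse every generation; unparseable ones get `fallback` and are counted."""
    preds, n_unparsed = [], 0
    for t in texts:
        label = parse_generation(t)
        if label == -1:
            n_unparsed += 1
            label = fallback
        preds.append(label)
    return preds, n_unparsed

preds, n_bad = parse_all(["negative", " Positive ", "neutral", ""])
print(preds, n_bad)  # [0, 1, 1, 1] 2
```

The default `fallback=1` reproduces the original behavior; a large `n_unparsed` would indicate that the positive class is being inflated by off-format answers.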


In [19]:
evaluate_performance(data["test"]["label"], y_pred)

Classification Report:
                 precision    recall  f1-score   support

Negative Review       0.83      0.85      0.84       533
Positive Review       0.85      0.83      0.84       533

       accuracy                           0.84      1066
      macro avg       0.84      0.84      0.84      1066
   weighted avg       0.84      0.84      0.84      1066


Confusion Matrix:
[[453  80]
 [ 90 443]]
