# Introduction

We have previously discussed sentiment classification, the task of automatically determining the sentiment expressed in a piece of text. Sentiment classification plays an important role in natural language processing and has practical applications in areas such as social media analysis, customer feedback analysis, and market sentiment prediction. In earlier assignments, you implemented sentiment classifiers using Naive Bayes, a feed-forward neural network, and an RNN-based model. In this assignment, you will instead evaluate a Transformer-based sentiment classifier provided by Hugging Face through its `pipeline` feature. The evaluation is conducted on the IMDB movie reviews dataset.

The IMDB dataset, a popular benchmark in sentiment analysis, consists of movie reviews labeled with their sentiment (positive or negative). Each review is provided as raw text, and the task is to predict its sentiment polarity from that text.

Transformer models, as accessed through Hugging Face's `pipeline('text-classification')`, are well suited to text classification tasks, including sentiment analysis. Unlike Long Short-Term Memory networks (LSTMs), which read a sequence one token at a time, Transformers rely on self-attention. This mechanism lets them process all tokens of a sequence in parallel and relate any two positions directly, which helps in capturing context and long-range dependencies in text.
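To make the idea of self-attention more concrete, here is a minimal, dependency-free sketch of scaled dot-product self-attention. It is a simplification for illustration only: it assumes the queries, keys, and values are the token vectors themselves, whereas real Transformers use learned projections, multiple heads, and positional information. The token vectors and the function names `softmax` and `self_attention` are made up for this example.

```python
import math

def softmax(row):
    """Turn a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (no learned projections)."""
    d = len(x[0])
    # Every token is compared with every other token in one pass.
    scores = [
        [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        for q in x
    ]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted average of all token vectors.
    outputs = [
        [sum(w * v[j] for w, v in zip(w_row, x)) for j in range(d)]
        for w_row in weights
    ]
    return weights, outputs

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three made-up 2-d token vectors
weights, outputs = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how strongly one token attends to every token in the sequence, including itself; no token has to wait for the previous one to be processed, which is the contrast with an LSTM drawn above.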

By the end of this assignment, you will have used a Hugging Face pipeline for sentiment classification and evaluated it with standard classification metrics.

In [3]:
from transformers import pipeline
from datasets import load_dataset
from tqdm import tqdm
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix
from sklearn.model_selection import train_test_split


# Load the dataset 

In [9]:
imdb = load_dataset("imdb")
# We use a stratified 5% sample of the test set (1,250 reviews), as running inference on more data would take much longer
x_test = [instance['text'] for instance in imdb["test"]]
y_test = [instance['label'] for instance in imdb["test"]]

_, x_test, _, y_test = train_test_split(
    x_test,  # Features
    y_test,  # Labels
    test_size=0.05,  # 5% for testing
    random_state=42,  # For reproducibility
    stratify=y_test       # Stratify split based on the labels to maintain balance
)


Downloading readme: 100%|██████████| 7.81k/7.81k [00:00<00:00, 11.5MB/s]
Downloading data: 100%|██████████| 21.0M/21.0M [00:02<00:00, 8.92MB/s]
Downloading data: 100%|██████████| 20.5M/20.5M [00:02<00:00, 9.48MB/s]
Downloading data: 100%|██████████| 42.0M/42.0M [00:04<00:00, 10.4MB/s]
Generating train split: 100%|██████████| 25000/25000 [00:00<00:00, 174472.46 examples/s]
Generating test split: 100%|██████████| 25000/25000 [00:00<00:00, 209159.25 examples/s]
Generating unsupervised split: 100%|██████████| 50000/50000 [00:00<00:00, 217959.07 examples/s]
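As a quick sanity check on the split above: the test split holds 25,000 reviews (see the loading log), so a 5% stratified sample should contain 1,250 reviews divided evenly between the two classes. The sketch below shows one way to verify this with the standard library. The helper name `describe_split` is hypothetical, and the label list is a stand-in for `y_test` so that the snippet runs on its own.

```python
from collections import Counter

def describe_split(labels):
    """Return {label: (count, fraction)} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: (n, n / total) for label, n in sorted(counts.items())}

# Expected sample size: 5% of the 25,000 test reviews.
expected_size = 25_000 * 5 // 100

# Stand-in for y_test (assumption): a balanced sample of 625 negatives and 625 positives.
sample_labels = [0] * 625 + [1] * 625

summary = describe_split(sample_labels)
print(expected_size)
print(summary)
```

In the notebook itself, calling `describe_split(y_test)` after the split should report the same balance if stratification worked as intended.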


# Loading the model

## <span style="color:red"><b>Task 1</b></span>
Load the `pipeline('text-classification')` to create a classifier, and then use it to predict sentiments on the test set:

In [5]:
### START CODE HERE ###
classifier = pipeline('text-classification')
### END CODE HERE ###


No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
2024-06-12 17:41:39.267981: I metal_plugin/src/device/metal_device.cc:1154] Metal device set to: Apple M2
2024-06-12 17:41:39.268015: I metal_plugin/src/device/metal_device.cc:296] systemMemory: 16.00 GB
2024-06-12 17:41:39.268028: I metal_plugin/src/device/metal_device.cc:313] maxCacheSize: 5.33 GB
2024-06-12 17:41:39.268050: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:305] Could not identify NUMA node of platform GPU ID 0, defaulting to 0. Your kernel may not have been built with NUMA support.
2024-06-12 17:41:39.268063: I tensorflow/core/common_runtime/pluggable_device/pluggable_device_factory.cc:271] Created TensorFlow device (/job:localhost/replica:0/task:

# Evaluation

## <span style="color:red"><b>Task 2</b></span>

Calculate precision, recall, F1-score, and accuracy.
How does the performance obtained compare to the previous sentiment classifiers you may have built in this unit?


In [10]:
### START CODE HERE ###
predictions = []
for text in tqdm(x_test):
    result = classifier(text)[0]
    predictions.append(1 if result['label'] == 'POSITIVE' else 0)

# Evaluate the model
accuracy = accuracy_score(y_test, predictions)
precision = precision_score(y_test, predictions)
recall = recall_score(y_test, predictions)
f1 = f1_score(y_test, predictions)
conf_matrix = confusion_matrix(y_test, predictions)

# Display the results
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")
print("Confusion Matrix:")
print(conf_matrix)
### END CODE HERE ###



  0%|          | 3/1250 [00:00<04:02,  5.14it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (963 > 512). Running this sequence through the model will result in indexing errors
100%|██████████| 1250/1250 [05:00<00:00,  4.15it/s]

Accuracy: 0.8880
Precision: 0.9217
Recall: 0.8480
F1 Score: 0.8833
Confusion Matrix:
[[580  45]
 [ 95 530]]
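The warning in the output above indicates that some reviews are longer than the model's 512-token limit. One common way to handle this is to ask the pipeline to truncate long inputs, which also makes it possible to pass the whole list of reviews in one call. The helper below is a sketch under assumptions, not the assignment's reference solution: `predict_labels` is a hypothetical name, and it assumes the installed `transformers` version forwards `truncation=True` (and optional arguments such as `batch_size`) from the pipeline call to the tokenizer. Scores obtained with truncation may differ from the figures reported above.

```python
def predict_labels(classifier, texts, positive_label="POSITIVE", **pipeline_kwargs):
    """Map pipeline outputs to 0/1 labels, truncating over-long inputs.

    `classifier` is any callable that behaves like a Hugging Face
    text-classification pipeline: it takes a list of strings and returns
    one dict per input with a 'label' key.
    """
    # truncation=True asks the tokenizer to cut inputs at the model's maximum length.
    results = classifier(list(texts), truncation=True, **pipeline_kwargs)
    return [1 if result["label"] == positive_label else 0 for result in results]
```

With the classifier from Task 1, a call such as `predictions = predict_labels(classifier, x_test, batch_size=16)` would replace the explicit loop; the batch size of 16 is an arbitrary illustrative value.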





### Hugging Face pipeline('text-classification'):
- **Accuracy**: 0.8880
- **Precision**: 0.9217
- **Recall**: 0.8480
- **F1-score**: 0.8833
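These four figures follow directly from the confusion matrix printed earlier. As a worked check, the snippet below recomputes them by hand, assuming scikit-learn's layout for binary labels, where rows are true classes and columns are predicted classes, giving `[[TN, FP], [FN, TP]]`.

```python
# Values read from the confusion matrix [[580, 45], [95, 530]].
tn, fp, fn, tp = 580, 45, 95, 530

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions over all predictions
precision = tp / (tp + fp)                  # how many predicted positives are truly positive
recall = tp / (tp + fn)                     # how many true positives were found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.4f}")   # 0.8880
print(f"Precision: {precision:.4f}")  # 0.9217
print(f"Recall:    {recall:.4f}")     # 0.8480
print(f"F1 Score:  {f1:.4f}")         # 0.8833
```

The 45 false positives against 95 false negatives explain why precision is noticeably higher than recall for this model.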

### RNN:
- **Accuracy**: 0.8658
- **Precision**: 0.8423
- **Recall**: 0.9000
- **F1-score**: 0.8702

### Comparison:

1. **Accuracy**:
   - Hugging Face: 0.8880
   - RNN: 0.8658
   - The Hugging Face pipeline achieves higher accuracy compared to the RNN model.

2. **Precision**:
   - Hugging Face: 0.9217
   - RNN: 0.8423
   - The Hugging Face pipeline has significantly higher precision, indicating fewer false positives compared to the RNN model.

3. **Recall**:
   - Hugging Face: 0.8480
   - RNN: 0.9000
   - The RNN model achieves higher recall, indicating it correctly identifies more positive instances compared to the Hugging Face pipeline.

4. **F1-score**:
   - Hugging Face: 0.8833
   - RNN: 0.8702
   - The Hugging Face pipeline slightly outperforms the RNN model in terms of the F1-score, showing a better balance between precision and recall.
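
The per-metric gaps in the list above can be tabulated with a few lines of Python. The values are copied from the two result lists; the dictionary names `hf` and `rnn` are illustrative only.

```python
hf = {"accuracy": 0.8880, "precision": 0.9217, "recall": 0.8480, "f1": 0.8833}
rnn = {"accuracy": 0.8658, "precision": 0.8423, "recall": 0.9000, "f1": 0.8702}

# A positive difference means the Hugging Face pipeline scored higher.
differences = {name: round(hf[name] - rnn[name], 4) for name in hf}

for name, diff in differences.items():
    winner = "Hugging Face" if diff > 0 else "RNN"
    print(f"{name:<10} {diff:+.4f}  ({winner} higher)")
```

The largest gap is in precision (about 0.08 in favour of the pipeline), while recall is the only metric on which the RNN leads (by about 0.05).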

### Conclusion:

The Hugging Face `pipeline('text-classification')` outperforms the RNN model on accuracy, precision, and F1-score, while the RNN model has higher recall. In other words, the pipeline makes fewer false-positive errors but misses more positive reviews, and its higher F1-score indicates a better overall balance between precision and recall. Note that the default model loaded by the pipeline is a DistilBERT model fine-tuned on SST-2 rather than on IMDB (see the loading message in Task 1). Its pre-trained Transformer representations likely capture contextual dependencies more effectively than the RNN, which would explain the better overall performance on the IMDB reviews evaluated here.

# Congratulations!


Congratulations on completing the assignment! Your dedication and effort are commendable. By working through this coding exercise, you should have gained practical experience in applying a pre-trained Transformer-based classifier through the Hugging Face pipeline and in evaluating it on a real-world dataset.






# Acknowledgement

## About the Author

This notebook was authored by Mohamed Reda Bouadjenek. He is a Senior Lecturer (Assistant Professor) of Applied Artificial Intelligence in the School of Information Technology at Deakin University, Australia.



## Disclaimer 

Even though your code passes all unit test cases, it does not guarantee absolute correctness. The complexity of real-world scenarios can sometimes lead to unforeseen edge cases that may not have been covered by the test suite. As a result, it's essential to exercise caution and conduct thorough testing to ensure the robustness and reliability of the code in all possible cases.

## Version History
- Version 1.0 (Initial Release): Released on 06/05/2024.

## Contact Information

- **Email:** reda.bouadjenek@deakin.edu.au
- **GitHub:** https://github.com/rbouadjenek/

---
