# III - Comparison with a State-of-the-Art LLM

In this notebook, we aim to compare the performance obtained in `2_prediction_benchmark.ipynb` with that of a recent large language model (LLM) for sentiment classification on our IMDB dataset.

We will use a pretrained deep learning model available on the Hugging Face platform. Among the available models, we choose **DistilBERT base uncased fine-tuned on SST-2** (`distilbert-base-uncased-finetuned-sst-2-english`). This is the base **DistilBERT** model, fine-tuned on the **Stanford Sentiment Treebank v2 (SST-2)** dataset.

The **Stanford Sentiment Treebank** consists of 11,855 individual sentences extracted from movie reviews. These sentences were parsed with the Stanford Parser into 215,154 unique phrases, each annotated for sentiment by three human judges. The sentences come from the movie-review corpus of Pang and Lee (2005), and the phrase-level annotation allows for fine-grained sentiment analysis based on syntactic structure. **SST-2** is the binary (positive or negative) version of this corpus.

This model is particularly well-suited for **binary sentiment classification** in English, which is exactly the task we aim to perform here.

> **Note:** The original **DistilBERT** model is a *distilled* version of BERT, introduced in the paper:
>
> **Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2020)**. *DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter*. arXiv preprint [arXiv:1910.01108](https://arxiv.org/abs/1910.01108)
>
> The purpose of model distillation is to compress large models like BERT into smaller ones while preserving most of their performance. This makes the model faster and more lightweight, significantly reducing training and inference time.

In summary, the DistilBERT SST-2 model is a lightweight yet powerful model specifically optimized for sentiment classification on movie reviews, making it an ideal benchmark for our task.
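
To make the distillation idea from the note above concrete, here is a minimal sketch of a soft-target loss, in which a student model is penalized for diverging from the teacher's temperature-softened output distribution. It is an illustration only, not the DistilBERT training code: the paper combines a loss of this kind with a masked language modeling loss and a cosine embedding loss. The function names, the toy logits, and the temperature value are assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Softmax over a list of logits, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between the teacher's and the student's softened distributions."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


# Toy two-class logits (hypothetical values, e.g. [positive, negative]).
teacher = [2.0, -1.0]
close_student = [1.8, -0.9]  # roughly agrees with the teacher
far_student = [-1.0, 2.0]    # disagrees with the teacher

loss_close = soft_target_loss(teacher, close_student)
loss_far = soft_target_loss(teacher, far_student)
print(f"close student: {loss_close:.4f}, far student: {loss_far:.4f}")
```

A student whose logits track the teacher's gets a lower loss than one that contradicts it, which is the signal the smaller model learns from.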

----

In [2]:
!pip install huggingface_hub[hf_xet]

Collecting hf-xet>=0.1.4 (from huggingface_hub[hf_xet])
  Using cached hf_xet-1.0.3-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (494 bytes)
Using cached hf_xet-1.0.3-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (53.8 MB)
Installing collected packages: hf-xet
Successfully installed hf-xet-1.0.3


In [11]:
import torch
import transformers
import pandas as pd
import numpy as np
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification


In [3]:
# Load the training and test data
df_train = pd.read_parquet("data/df_train.parquet")
df_test = pd.read_parquet("data/df_test.parquet")

In [4]:
df_test.head()

Unnamed: 0,texte,label
0,I remember seeing this movie 34 years ago and ...,1
1,Rocketship X-M should be viewed by any serious...,0
2,"""When I die, someone will bury me. And if they...",1
3,"S l o w, l o n g, d u l l. . .<br /><br />Oh m...",0
4,Leonard Rossiter and Frances de la Tour carry ...,0


In [5]:
df_test_sample = df_test.iloc[:1].copy()
df_test_sample.head()

Unnamed: 0,texte,label
0,I remember seeing this movie 34 years ago and ...,1


In [12]:
# Load the tokenizer and the model (model weights read from the TensorFlow checkpoint)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    from_tf=True  # Use this option to load from the TensorFlow weights
)


Exception ignored in: <function tqdm.__del__ at 0x7ff2f4a2db40>
Traceback (most recent call last):
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/tqdm/std.py", line 1147, in __del__
    self.close()
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/tqdm/notebook.py", line 286, in close
    self.disp(bar_style='danger', check_delay=False)
AttributeError: 'tqdm' object has no attribute 'disp'

OSError: Can't load tokenizer for 'distilbert-base-uncased-finetuned-sst-2-english'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'distilbert-base-uncased-finetuned-sst-2-english' is the correct path to a directory containing all relevant files for a DistilBertTokenizerFast tokenizer.

In [None]:


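# Build the sentiment pipeline (the traceback below shows the model being loaded by name)
sentiment_classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

# Example sentence
phrase = "I really enjoyed this movie, it was fantastic!"

# Predict the sentiment
resultat = sentiment_classifier(phrase)

# Display the result
print(resultat)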

Exception ignored in: <function tqdm.__del__ at 0x7ff2f4a2db40>
Traceback (most recent call last):
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/tqdm/std.py", line 1147, in __del__
    self.close()
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/tqdm/notebook.py", line 286, in close
    self.disp(bar_style='danger', check_delay=False)
AttributeError: 'tqdm' object has no attribute 'disp'

ValueError: Could not load model distilbert-base-uncased-finetuned-sst-2-english with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForSequenceClassification'>,). See the original errors:

while loading with AutoModelForSequenceClassification, an error is thrown:
Traceback (most recent call last):
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/transformers/pipelines/base.py", line 291, in infer_framework_load_model
    model = model_class.from_pretrained(model, **kwargs)
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 571, in from_pretrained
    return model_class.from_pretrained(
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 279, in _wrapper
    return func(*args, **kwargs)
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4260, in from_pretrained
    checkpoint_files, sharded_metadata = _get_resolved_checkpoint_files(
  File "/home/onyxia/work/ensae_3A_S2_NLP/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1080, in _get_resolved_checkpoint_files
    raise EnvironmentError(
OSError: distilbert-base-uncased-finetuned-sst-2-english does not appear to have a file named pytorch_model.bin but there is a file for TensorFlow weights. Use `from_tf=True` to load this model from those weights.




In [None]:
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# 1. Random sample from the test set
#    (n=1 is only a smoke test; raise n to 1000 for a meaningful evaluation)
df_sample = df_test.sample(n=1, random_state=42).copy()

# 2. Run sentiment analysis; the pipeline returns 'POSITIVE'/'NEGATIVE', lowercased here
df_sample["predicted"] = [
    sentiment_classifier(x, truncation=True, max_length=128)[0]["label"].lower()
    for x in df_sample["texte"]
]

# 3. Convert the 1/0 labels to "positive"/"negative"
df_sample["true"] = df_sample["label"].apply(lambda x: "positive" if x == 1 else "negative")

# 4. Display the metrics
print("Accuracy Score:", accuracy_score(df_sample["true"], df_sample["predicted"]))
print("\nClassification Report:\n", classification_report(df_sample["true"], df_sample["predicted"]))
print("\nConfusion Matrix:\n", confusion_matrix(df_sample["true"], df_sample["predicted"]))

KeyboardInterrupt: 
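
The evaluation cell above comes down to two steps: mapping the pipeline's string labels onto the dataset's 1/0 labels, then counting agreements. The sketch below walks through that logic without downloading any model. `fake_classifier` is a hypothetical stand-in that only mimics the list-of-dicts output shape of a text-classification pipeline, and the toy reviews and labels are invented for illustration.

```python
def fake_classifier(text):
    """Hypothetical stand-in for the pipeline: returns [{'label': ..., 'score': ...}]."""
    label = "POSITIVE" if "good" in text.lower() else "NEGATIVE"
    return [{"label": label, "score": 1.0}]  # dummy score, not a real probability


def to_binary(pipeline_output):
    """Map 'POSITIVE'/'NEGATIVE' to the 1/0 convention used in the dataset."""
    return 1 if pipeline_output[0]["label"].upper() == "POSITIVE" else 0


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tn": 0, "fp": 0, "fn": 0, "tp": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            counts["tp" if pred == 1 else "fn"] += 1
        else:
            counts["fp" if pred == 1 else "tn"] += 1
    return counts


# Toy data, invented for the example.
texts = ["A good film", "Dull and slow", "Not good at all", "Good fun"]
labels = [1, 0, 0, 1]

predictions = [to_binary(fake_classifier(t)) for t in texts]
counts = confusion_counts(labels, predictions)
accuracy = (counts["tp"] + counts["tn"]) / len(labels)
print(counts, accuracy)
```

Working with integer labels on both sides avoids string-casing mismatches between the model's output and the dataset, and the same counts can be cross-checked against `confusion_matrix` from scikit-learn once the real classifier loads.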