# Sentiment analysis 

## 1. TextBlob-FR

Documentation: https://textblob.readthedocs.io/en/dev/
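Roughly speaking, textblob-fr's `PatternAnalyzer` is lexicon-based. It builds on the Pattern library's French sentiment lexicon, in which words (mostly adjectives) carry hand-annotated polarity and subjectivity scores. A text's scores are approximately the average of the scores of the words it contains, adjusted for intensifiers and negations. The sketch below is a deliberately simplified toy version of that idea, not textblob-fr's actual algorithm. The hypothetical `TOY_LEXICON` holds made-up scores, `TOY_NEGATIONS` is a minimal negation rule, and intensifiers such as "vraiment" are ignored.

```python
from statistics import mean

# Made-up (polarity, subjectivity) scores, for illustration only;
# this is NOT the lexicon bundled with textblob-fr / Pattern.
TOY_LEXICON = {
    "super": (0.4, 0.7),
    "intéressant": (0.5, 0.6),
    "content": (0.6, 0.8),
    "négative": (-0.5, 0.6),
}
# Hypothetical negation words: they flip the polarity of the next scored word
TOY_NEGATIONS = {"pas", "jamais"}


def toy_sentiment(text):
    """Return (polarity, subjectivity) averaged over the lexicon words found in text."""
    polarities, subjectivities = [], []
    negate = False
    for token in text.lower().split():
        word = token.strip(".,;:!?")  # drop surrounding punctuation
        if word in TOY_NEGATIONS:
            negate = True
        elif word in TOY_LEXICON:
            polarity, subjectivity = TOY_LEXICON[word]
            polarities.append(-polarity if negate else polarity)
            subjectivities.append(subjectivity)
            negate = False  # negation only applies to one scored word
    if not polarities:
        return 0.0, 0.0  # no known word: neutral and objective
    return mean(polarities), mean(subjectivities)
```

With these made-up scores, `toy_sentiment("Ce journal est vraiment super intéressant.")` returns roughly `(0.45, 0.65)`. Because "pas" negates "content", the second example sentence comes out negative.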

### Imports

```python
# Requires: pip install textblob textblob-fr
from textblob import Blobber
from textblob_fr import PatternTagger, PatternAnalyzer
```

### Creating a `get_sentiment` function

```python
# A Blobber creates TextBlobs that share the same French tagger and analyzer
tb = Blobber(pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())

def get_sentiment(input_text):
    """Print the polarity and subjectivity of a French text as percentages."""
    blob = tb(input_text)
    # polarity is in [-1, 1], subjectivity in [0, 1]
    polarity, subjectivity = blob.sentiment
    polarity_perc = f"{100*abs(polarity):.0f}"
    subjectivity_perc = f"{100*subjectivity:.0f}"
    if polarity > 0:
        polarity_str = f"{polarity_perc}% positive"
    elif polarity < 0:
        polarity_str = f"{polarity_perc}% negative"
    else:
        polarity_str = "neutral"
    if subjectivity > 0:
        # Use the percentage, not the raw 0-1 value
        subjectivity_str = f"{subjectivity_perc}% subjective"
    else:
        subjectivity_str = "perfectly objective"
    print(f"This text is {polarity_str} and {subjectivity_str}.")
```

### Analyzing the sentiment of a sentence

```python
# "This newspaper is really super interesting."
get_sentiment("Ce journal est vraiment super intéressant.")
```

```
This text is 65% positive and 75% subjective.
```

```python
# "This sentence is negative and I am not happy!"
get_sentiment("Cette phrase est négative et je ne suis pas content !")
```

```
This text is 41% negative and 60% subjective.
```
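`get_sentiment` prints its result, which makes it hard to reuse or test. A minimal sketch of one alternative is to split the formatting into a pure function that returns the sentence. `describe_sentiment` is a hypothetical helper, not part of TextBlob. It only needs the `(polarity, subjectivity)` pair, so it can be tested without loading the French analyzer.

```python
def describe_sentiment(polarity, subjectivity):
    """Format a (polarity, subjectivity) pair as an English sentence."""
    if polarity > 0:
        polarity_str = f"{100 * polarity:.0f}% positive"
    elif polarity < 0:
        polarity_str = f"{100 * -polarity:.0f}% negative"
    else:
        polarity_str = "neutral"
    if subjectivity > 0:
        subjectivity_str = f"{100 * subjectivity:.0f}% subjective"
    else:
        subjectivity_str = "perfectly objective"
    return f"This text is {polarity_str} and {subjectivity_str}."
```

In the notebook, it would be called as `describe_sentiment(*tb(text).sentiment)`. The returned string can then be stored, for example in a list of results, instead of only being printed.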


## 2. Using transformers

Documentation: https://github.com/TheophileBlard/french-sentiment-analysis-with-bert

**!!** If the code does not run on your machine, you can test it directly on Google Colab using [this link](https://colab.research.google.com/github/TheophileBlard/french-sentiment-analysis-with-bert/blob/master/colab/french_sentiment_analysis_with_bert.ipynb) **!!**

The model can also be tried online on [Hugging Face](https://huggingface.co/tblard/tf-allocine).

### Installing the libraries and imports

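```python
# Notebook shell command; from a terminal, run the same pip command without "!".
# The TF* model classes need a transformers release that still supports TensorFlow;
# recent TensorFlow versions (Keras 3) may also require the tf-keras package.
!pip install tensorflow sentencepiece transformers

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline
```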

### Loading the model

```python
# CamemBERT model fine-tuned on French Allociné movie reviews (TensorFlow weights)
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine", use_fast=True)
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")

sentiment_analyser = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)
```

```
All model checkpoint layers were used when initializing TFCamembertForSequenceClassification.

All the layers of TFCamembertForSequenceClassification were initialized from the model checkpoint at tblard/tf-allocine.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFCamembertForSequenceClassification for predictions without further training.
```
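Under the hood, the text-classification pipeline (`'sentiment-analysis'` is an alias for it) runs the model to get one raw logit per class. For a single-label model like this one, it applies a softmax and returns the most probable label, with its probability as `score`. The pure-Python sketch below illustrates only that last step. `logits_to_prediction` is a hypothetical helper, and `ID2LABEL` is an assumed mapping; the model's real mapping is in `model.config.id2label`.

```python
import math

# Assumed mapping; read model.config.id2label for the model's actual one.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def logits_to_prediction(logits, id2label=ID2LABEL):
    """Softmax the logits and return the top label with its probability."""
    # Subtract the max logit before exponentiating for numerical stability
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=lambda i: probs[i])
    return {"label": id2label[best], "score": probs[best]}
```

With only two classes, the returned `score` is always at least 0.5. It measures the model's confidence in the label, not how strongly positive or negative the text is.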


### Analyzing the sentiment of a sentence

```python
# "This newspaper is really super interesting."
sentiment_analyser("Ce journal est vraiment super intéressant.")
```

```
[{'label': 'POSITIVE', 'score': 0.9936434030532837}]
```

```python
# "This sentence is negative and I am not happy!"
sentiment_analyser("Cette phrase est négative et je ne suis pas content !")
```

```
[{'label': 'NEGATIVE', 'score': 0.9664189219474792}]
```
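To compare the two approaches on the same sentences, one option is to map the sign of TextBlob's polarity onto the pipeline's label names and count how often the two agree. `polarity_to_label` and `agreement_rate` are hypothetical helpers, not part of either library. The "NEUTRAL" label is an assumption of this sketch; the Allociné model itself only predicts POSITIVE or NEGATIVE.

```python
def polarity_to_label(polarity):
    """Map a TextBlob-style polarity in [-1, 1] to the pipeline's label names."""
    if polarity > 0:
        return "POSITIVE"
    if polarity < 0:
        return "NEGATIVE"
    return "NEUTRAL"  # never matches this two-class model's labels


def agreement_rate(polarities, predictions):
    """Fraction of texts where TextBlob's label matches the pipeline's top label."""
    if not polarities or len(polarities) != len(predictions):
        raise ValueError("need one pipeline prediction per polarity, and at least one")
    matches = sum(
        polarity_to_label(p) == pred["label"]
        for p, pred in zip(polarities, predictions)
    )
    return matches / len(polarities)
```

The pipeline also accepts a list of texts and returns one `{'label', 'score'}` dict per text. In the notebook, `polarities` could therefore come from `[tb(t).sentiment[0] for t in texts]` and `predictions` from `sentiment_analyser(texts)`. On the two example sentences above, both approaches agree.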