# Sentiment analysis 

## 1. Textblob-FR

Documentation: https://textblob.readthedocs.io/en/dev/

### Imports

In [23]:
# Import the required libraries
import os
import sys
from textblob import Blobber
from textblob_fr import PatternTagger, PatternAnalyzer

# Path to the files for the year 1969
year = 1969
data_path = '../data/txt'
txts = [f for f in os.listdir(data_path) if os.path.isfile(os.path.join(data_path, f)) and str(year) in f]

# Arbitrarily select sentences from the 1969 articles:
# the first 5 sentences of each of the first 10 files
selected_phrases = []
for txt in txts[:10]:
    with open(os.path.join(data_path, txt), 'r', encoding='utf-8') as f:
        content = f.read()
    # Naive sentence split on full stops
    sentences = content.split('.')
    for sentence in sentences[:5]:
        selected_phrases.append(sentence.strip())

# Analyze the sentiment of the selected sentences
# (get_sentiment and sentiment_analyser are defined in the cells below and must be run first)
for phrase in selected_phrases:
    print(f"Analyse de la phrase : {phrase}")
    get_sentiment(phrase)
    sentiment_analyser(phrase)




Analyse de la phrase : Ets VANDEN BOUDE s
This text is neutral and perfectly objective.
Analyse de la phrase : a
This text is neutral and perfectly objective.
Analyse de la phrase : *700, chaussée de Mons Anderlectit * Bruxelles 7 demandent : ELECTRICIENS QUALIFIES (installation basse tension) PLOMBIERS
This text is 20% negative and perfectly objective.
Analyse de la phrase : sachant conduire si possible MAGASINIERS (pour dépôt rue de la Roue) Place stable et d’avenir dans une firme en » continuelle expansion
This text is 0% negative and 42% subjective.
Analyse de la phrase : Bonne rémunération
This text is 70% positive and 70% subjective.
Analyse de la phrase : 13 SAMEDI 22 NOVEMBRE 1969 LE SOIE itt Devenir eccountant chez ITT, c'est être, en mesure, professionnellement, de prendre ses responsabilités
This text is neutral and perfectly objective.
Analyse de la phrase : C'est aussi savoir travailler en équipe
This text is neutral and perfectly objective.
Analyse de la phrase : Comm

### Creating a `get_sentiment` function

In [25]:
# Blobber configured with the French part-of-speech tagger and sentiment analyzer
tb = Blobber(pos_tagger=PatternTagger(), analyzer=PatternAnalyzer())

def get_sentiment(input_text):
    """Print the polarity and subjectivity of a French text as percentages."""
    blob = tb(input_text)
    polarity, subjectivity = blob.sentiment
    polarity_perc = f"{100*abs(polarity):.0f}"
    subjectivity_perc = f"{100*subjectivity:.0f}"
    if polarity > 0:
        polarity_str = f"{polarity_perc}% positive"
    elif polarity < 0:
        polarity_str = f"{polarity_perc}% negative"
    else:
        polarity_str = "neutral"
    if subjectivity > 0:
        # Use the percentage string, not the raw 0-1 value
        subjectivity_str = f"{subjectivity_perc}% subjective"
    else:
        subjectivity_str = "perfectly objective"
    print(f"This text is {polarity_str} and {subjectivity_str}.")


### Analyzing the sentiment of a sentence

In [26]:
get_sentiment("Ce journal est vraiment super intéressant.")

This text is 65% positive and 75% subjective.


In [18]:
get_sentiment("Cette phrase est négative et je ne suis pas content !")

This text is 41% negative and 60% subjective.
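
The formatting step can be separated from the TextBlob call, which makes it testable with plain numbers. The sketch below is one way to do that; `describe_sentiment` is a hypothetical helper (it is not part of TextBlob), and it assumes polarity lies in [-1, 1] and subjectivity in [0, 1], with both shown as rounded percentages.

```python
def describe_sentiment(polarity: float, subjectivity: float) -> str:
    """Format a (polarity, subjectivity) pair as a readable sentence.

    Hypothetical helper: assumes polarity in [-1, 1] and subjectivity in [0, 1].
    """
    if polarity > 0:
        polarity_str = f"{100 * polarity:.0f}% positive"
    elif polarity < 0:
        polarity_str = f"{100 * -polarity:.0f}% negative"
    else:
        polarity_str = "neutral"
    if subjectivity > 0:
        subjectivity_str = f"{100 * subjectivity:.0f}% subjective"
    else:
        subjectivity_str = "perfectly objective"
    return f"This text is {polarity_str} and {subjectivity_str}."


# Example values chosen by hand, not produced by a model
print(describe_sentiment(0.65, 0.75))
print(describe_sentiment(0.0, 0.0))
```

Returning the string instead of printing it lets the caller store the result, for example next to the sentence it describes.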


## 2. Using transformers

Documentation: https://github.com/TheophileBlard/french-sentiment-analysis-with-bert

**!!** If the code does not run on your machine, you can test it directly on Google Colab using [this link](https://colab.research.google.com/github/TheophileBlard/french-sentiment-analysis-with-bert/blob/master/colab/french_sentiment_analysis_with_bert.ipynb) **!!**

The model can also be tested online on [HuggingFace](https://huggingface.co/tblard/tf-allocine).

### Installing libraries and imports

In [19]:
!pip install tensorflow
!pip install sentencepiece
!pip install transformers

from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


DEPRECATION: textract 1.6.5 has a non-standard dependency specifier extract-msg<=0.29.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of textract or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063

[notice] A new release of pip is available: 23.2.1 -> 23.3.1
[notice] To update, run: pip install --upgrade pip

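The `huggingface/tokenizers` warning in the output above names its own remedy: setting the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch, assuming it runs in a cell before `transformers` is first used and before the `!pip` commands that fork the process; choosing `"false"` here is an assumption that favours a quiet log over tokenization speed.

```python
import os

# Set this before `transformers` / `tokenizers` is first used in the process,
# so that later forks (e.g. `!pip install ...`) no longer trigger the warning.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```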

### Loading the model

In [20]:
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine", use_pt=True)
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")

sentiment_analyser = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)


All model checkpoint layers were used when initializing TFCamembertForSequenceClassification.

All the layers of TFCamembertForSequenceClassification were initialized from the model checkpoint at tblard/tf-allocine.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFCamembertForSequenceClassification for predictions without further training.


### Analyzing the sentiment of a sentence

In [21]:
sentiment_analyser("Ce journal est vraiment super intéressant.")

[{'label': 'POSITIVE', 'score': 0.9936434030532837}]

In [22]:
sentiment_analyser("Cette phrase est négative et je ne suis pas content !")

[{'label': 'NEGATIVE', 'score': 0.9664189219474792}]
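
The pipeline returns a list of dictionaries with a `label` and a `score`, whereas TextBlob returns a signed polarity. To put the two side by side, one option is to fold the label into the sign of the score. The sketch below does this on the two results shown above; `signed_score` is a hypothetical helper, and it assumes the only labels are `POSITIVE` and `NEGATIVE`.

```python
from typing import Dict, Union


def signed_score(result: Dict[str, Union[str, float]]) -> float:
    """Map a pipeline result to a value in [-1, 1] (hypothetical helper).

    Assumes the label is either 'POSITIVE' or 'NEGATIVE'.
    """
    label = result["label"]
    score = float(result["score"])
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        return -score
    raise ValueError(f"Unexpected label: {label!r}")


# The two results copied from the cells above
results = [
    {"label": "POSITIVE", "score": 0.9936434030532837},
    {"label": "NEGATIVE", "score": 0.9664189219474792},
]
print([round(signed_score(r), 2) for r in results])
```

Note that the score is the model's confidence in its label rather than a measure of how strong the sentiment is, so the signed value is only loosely comparable to TextBlob's polarity.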