# Sentiment analysis for reviews

In this project, we collected reviews from various locations using a Google Maps scraper. After cleaning the data, we translated the reviews from French to English and chose BERT for sentiment analysis.

We chose BERT because it is a pre-trained transformer language model that performs well across natural language processing tasks, including sentiment analysis. Because it reads each word in the context of the whole sentence, it can pick up sentiment that word-level lexicon methods tend to miss.

Translating the reviews to English offers advantages such as access to a wider range of NLP resources and a broader audience. English is widely used in NLP, providing a rich ecosystem of tools and pre-trained models. Additionally, analyzing sentiment in English ensures our results can be easily understood and shared globally.

In the upcoming sections, we will implement sentiment analysis using BERT. This will help us gain valuable insights into customer sentiments towards different locations, enabling businesses to make data-driven decisions and enhance user experiences.

For further reference, you can explore our repository on sentiment analysis of Ryanair airline reviews using VADER: [**Sentiment Analysis of Ryanair Airline Reviews**](https://github.com/yasirech-chammakhy/Sentiment-Analysis-of-Ryanair-Airline-Reviews). It showcases VADER's implementation and provides insights into sentiment expressed in Ryanair reviews.


In [1]:
import pandas as pd

In [2]:
df = pd.read_csv('../data/all_cities_cleaned_english.csv')

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17206 entries, 0 to 17205
Data columns (total 16 columns):
 #   Column                   Non-Null Count  Dtype  
---  ------                   --------------  -----  
 0   bank                     17206 non-null  object 
 1   categoryName             17205 non-null  object 
 2   city                     17206 non-null  object 
 3   totalScore               17206 non-null  float64
 4   rank                     17206 non-null  int64  
 5   cid                      17206 non-null  float64
 6   publishedAtDate          17206 non-null  object 
 7   reviewsCount             17206 non-null  int64  
 8   reviewsDistribution      17206 non-null  object 
 9   textTranslated           9257 non-null   object 
 10  reviewId                 17206 non-null  object 
 11  reviewerId               17206 non-null  float64
 12  reviewerNumberOfReviews  17206 non-null  float64
 13  stars                    17206 non-null  float64
 14  lat                   
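The `df.info()` output above reports 9257 non-null values in textTranslated out of 17206 rows, so 7949 rows (about 46%) carry a rating but no review text. A minimal sketch of one way to measure and filter these rows before any text processing is shown below. The toy DataFrame and the names `toy`, `n_missing`, and `with_text` are assumptions made for illustration; the notebook itself may handle missing text differently.

```python
import pandas as pd

# Toy stand-in for the real DataFrame (hypothetical values).
toy = pd.DataFrame(
    {
        "textTranslated": ["Zero service", None, "Very good service"],
        "stars": [1.0, 4.0, 5.0],
    }
)

# Count rows that have a rating but no review text.
n_missing = int(toy["textTranslated"].isna().sum())

# Keep only rows with text, so downstream text functions never receive NaN.
with_text = toy.dropna(subset=["textTranslated"]).reset_index(drop=True)

print(n_missing, len(with_text))
```

Filtering early keeps the row count honest: any sentiment score computed later then corresponds to an actual review rather than to an empty string.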

## Clean the text

In [4]:
from utils import clean_review

Using region Rabat-Sale-Kenitra server backend.



In [5]:
# Cleaning the text in the textTranslated column
df['cleaned_text'] = df['textTranslated'].apply(clean_review)

In [6]:
df.head()

| | bank | categoryName | city | totalScore | rank | cid | publishedAtDate | reviewsCount | reviewsDistribution | textTranslated | reviewId | reviewerId | reviewerNumberOfReviews | stars | lat | lng | cleaned_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Attijariwafa Bank | Banque | Agadir | 1.8 | 119 | 1.589799e+19 | 2023-02-24 | 5 | {'oneStar': 4, 'twoStar': 0, 'threeStar': 0, '... | Outstanding service, especially if you ask for... | ChdDSUhNMG9nS0VJQ0FnSUNodDhhd3pRRRAB | 1.052275e+20 | 1.0 | 1.0 | 30.41016 | -9.559908 | outstanding service especially ask assistance ... |
| 1 | Attijariwafa Bank | Banque | Agadir | 1.8 | 119 | 1.589799e+19 | 2023-02-21 | 5 | {'oneStar': 4, 'twoStar': 0, 'threeStar': 0, '... | Deplorable service, every time I come to this ... | ChZDSUhNMG9nS0VJQ0FnSUNoazVqWmVREAE | 1.023462e+20 | 1.0 | 1.0 | 30.41016 | -9.559908 | deplorable service every time come agency prep... |
| 2 | Attijariwafa Bank | Banque | Agadir | 1.8 | 119 | 1.589799e+19 | 2023-01-06 | 5 | {'oneStar': 4, 'twoStar': 0, 'threeStar': 0, '... | Very good service by the director who helped m... | ChZDSUhNMG9nS0VJQ0FnSUNCcDdPeUdnEAE | 1.110533e+20 | 2.0 | 5.0 | 30.41016 | -9.559908 | good service director helped hour fairly diffi... |
| 3 | Attijariwafa Bank | Banque | Agadir | 1.8 | 119 | 1.589799e+19 | 2020-11-13 | 5 | {'oneStar': 4, 'twoStar': 0, 'threeStar': 0, '... | Zero is bad telephone service | ChZDSUhNMG9nS0VJQ0FnSUNpdzdhWk1REAE | 1.031901e+20 | 1.0 | 1.0 | 30.41016 | -9.559908 | zero bad telephone service |
| 4 | Attijariwafa Bank | Banque | Agadir | 1.8 | 119 | 1.589799e+19 | 2020-10-05 | 5 | {'oneStar': 4, 'twoStar': 0, 'threeStar': 0, '... | Zero service | ChdDSUhNMG9nS0VJQ0FnSURDbGMtcDZnRRAB | 1.0161e+20 | 45.0 | 1.0 | 30.41016 | -9.559908 | zero service |


In this step, we cleaned the text in the textTranslated column: we removed special characters and digits, converted all characters to lowercase, tokenized each review, removed stopwords, and lemmatized each remaining word. The result is stored in a new column called "cleaned_text", which is a prerequisite for the sentiment analysis.
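The body of `clean_review` lives in `utils` and is not shown here, so the following is only a rough sketch of the steps listed above, not the project's actual implementation. The function name `clean_review_sketch`, the short `STOPWORDS` set, and the `crude_lemma` helper are all hypothetical. In particular, `crude_lemma` only strips simple plural endings and stands in for a real lemmatizer, and a full stopword list (for example from an NLP library) would replace the handful of words used here.

```python
import re

# Small illustrative stopword list (an assumption); a real pipeline would
# more likely use a complete English stopword list from an NLP library.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "were", "i", "you", "it", "this",
    "to", "of", "for", "by", "in", "if", "and", "who", "me", "very",
}


def crude_lemma(token):
    """Crude stand-in for lemmatization: strips simple plural endings only."""
    if len(token) > 4 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def clean_review_sketch(text):
    """Hypothetical approximation of the cleaning steps described above."""
    if not isinstance(text, str):  # missing reviews arrive as NaN (a float)
        return ""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # drop digits and special characters
    tokens = text.split()  # simple whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(crude_lemma(t) for t in tokens)


print(clean_review_sketch("Zero is bad telephone service"))
```

The `isinstance` guard matters for this dataset because many rows have no translated text, and `Series.apply` passes those missing values to the function as floats rather than strings.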

## Generating sentiment scores for cleaned text