### Support Vector Classifier (SVC) Algorithm

The goal of this project is to predict the sentiment expressed in McDonald's store reviews and to gain insight into how customers perceive the stores.

I convert the review text into TF-IDF vectors, a numerical representation that the SVC algorithm can be trained on.

In [19]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**For Ignoring Warnings**

In [20]:
import warnings
warnings.filterwarnings("ignore")

**For Sentiment Analysis**

In [21]:
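import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# SentimentIntensityAnalyzer needs the VADER lexicon; download it once if missing.
nltk.download("vader_lexicon", quiet=True)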

**For Building Model**

In [22]:
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report

In [23]:
mcd = pd.read_csv("./McDonald_s_Reviews.csv", encoding="latin-1")

* <h2>Data Exploration:</h2>
To understand the dataset, I listed its columns, inspected the first ten rows with head(), drew a random sample of ten records with sample(), and checked column types and non-null counts with info(). The info() output shows 33,396 reviews, with latitude and longitude missing for 660 of them.

In [24]:
mcd.columns

Index(['reviewer_id', 'store_name', 'category', 'store_address', 'latitude ',
       'longitude', 'rating_count', 'review_time', 'review', 'rating'],
      dtype='object')

<h3>Short description of each column</h3>

* reviewer_id: Unique identifier for each reviewer (anonymized)
* store_name: Name of the McDonald's store
* category: Category or type of the store
* store_address: Address of the store
* latitude: Latitude coordinate of the store's location (in the raw file the column name has a trailing space: 'latitude ')
* longitude: Longitude coordinate of the store's location
* rating_count: Number of ratings/reviews for the store
* review_time: Timestamp of the review
* review: Textual content of the review
* rating: Rating provided by the reviewer
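
The `mcd.columns` output above shows the latitude column stored as `'latitude '`, with a trailing space, so `mcd['latitude']` would raise a `KeyError`. A minimal sketch of normalizing the names, using a small stand-in frame (`demo` is a hypothetical name, not the real dataset):

```python
import pandas as pd

# Hypothetical stand-in with the same quirk as the real file:
# the latitude column name carries a trailing space.
demo = pd.DataFrame({"latitude ": [30.460718], "longitude": [-97.792874]})

# Strip surrounding whitespace from every column name.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```

Applied to the real data, this would be `mcd.columns = mcd.columns.str.strip()` right after `read_csv`.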

In [25]:
mcd.head(10)

Unnamed: 0,reviewer_id,store_name,category,store_address,latitude,longitude,rating_count,review_time,review,rating
0,1,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,Why does it look like someone spit on my food?...,1 star
1,2,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,It'd McDonalds. It is what it is as far as the...,4 stars
2,3,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,5 days ago,Made a mobile order got to the speaker and che...,1 star
3,4,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,a month ago,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,5 stars
4,5,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,2 months ago,"I repeat my order 3 times in the drive thru, a...",1 star
5,6,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 weeks ago,I work for door dash and they locked us all ou...,1 star
6,7,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,If I could give this location a zero on custo...,1 star
7,8,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,a year ago,Came in and ordered a Large coffee w/no ice. T...,1 star
8,9,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,Went thru drive thru. Ordered. Getting home no...,1 star
9,10,McDonald's,Fast food restaurant,"13749 US-183 Hwy, Austin, TX 78750, United States",30.460718,-97.792874,1240,3 months ago,"I'm not really a huge fan of fast food, but I ...",4 stars


In [26]:
mcd.sample(10)

Unnamed: 0,reviewer_id,store_name,category,store_address,latitude,longitude,rating_count,review_time,review,rating
8649,8650,McDonald's,Fast food restaurant,"210 5th S, Salt Lake City, UT 84106, United St...",40.759057,-111.897383,3243,a year ago,I love nothing more than McDonalds fries. Itï¿...,1 star
6154,6155,McDonald's,Fast food restaurant,"490 8th Ave, New York, NY 10001, United States",40.752529,-73.992876,3902,a year ago,This is one of the busiest stores in the city....,2 stars
22922,22923,McDonald's,Fast food restaurant,"5725 W Irlo Bronson Memorial Hwy, Kissimmee, F...",28.333508,-81.513738,5566,10 months ago,Food was really fresh and tasty. Staff was ve...,5 stars
18439,18440,McDonald's,Fast food restaurant,"1100 N US Hwy 377, Roanoke, TX 76262, United S...",33.009318,-97.222925,998,4 years ago,"My food was fresh, right on time.",5 stars
28200,28201,McDonald's,Fast food restaurant,"5725 W Irlo Bronson Memorial Hwy, Kissimmee, F...",28.333508,-81.513738,5567,a year ago,DOOR IS LOCKED EVERY SINGLE DAY!\nHIGLY RUDE A...,1 star
30931,30932,McDonald's,Fast food restaurant,"9814 International Dr, Orlando, FL 32819, Unit...",28.423814,-81.461242,5468,3 years ago,"Great service, went the extra mile without pro...",5 stars
6828,6829,McDonald's,Fast food restaurant,"490 8th Ave, New York, NY 10001, United States",40.752529,-73.992876,3902,2 years ago,Finely a place that uses Hot milk for my coffee,5 stars
14409,14410,McDonald's,Fast food restaurant,"25200 I-10 Lot 2, San Antonio, TX 78257, Unite...",29.676267,-98.63458,1460,3 years ago,At lunch time fast and great service,5 stars
12499,12500,McDonald's,Fast food restaurant,"1044 US-11, Champlain, NY 12919, United States",44.98141,-73.45982,1306,3 years ago,Tookos with impact na ideology,5 stars
23337,23338,McDonald's,Fast food restaurant,"5725 W Irlo Bronson Memorial Hwy, Kissimmee, F...",28.333508,-81.513738,5566,5 years ago,"Having kiosks, instead of human cashier, reall...",2 stars


In [27]:
mcd.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33396 entries, 0 to 33395
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   reviewer_id    33396 non-null  int64  
 1   store_name     33396 non-null  object 
 2   category       33396 non-null  object 
 3   store_address  33396 non-null  object 
 4   latitude       32736 non-null  float64
 5   longitude      32736 non-null  float64
 6   rating_count   33396 non-null  object 
 7   review_time    33396 non-null  object 
 8   review         33396 non-null  object 
 9   rating         33396 non-null  object 
dtypes: float64(2), int64(1), object(7)
memory usage: 2.5+ MB
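
The `info()` output lists `rating` as an `object` column because it holds strings such as `1 star` and `4 stars`. If a numeric rating is needed later, one possible conversion is sketched below. The `rating_num` column and the `demo` frame are illustrative names, not part of the original notebook.

```python
import pandas as pd

demo = pd.DataFrame({"rating": ["1 star", "4 stars", "5 stars"]})

# Pull out the leading digits and cast them to integers.
demo["rating_num"] = demo["rating"].str.extract(r"(\d+)", expand=False).astype(int)

print(demo["rating_num"].tolist())
```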


* <h2>Sentiment Score Calculation:</h2>
I used NLTK's SentimentIntensityAnalyzer, which implements the lexicon-based VADER method, to score each review. Its polarity_scores() method returns the negative, neutral, and positive proportions of the text ('neg', 'neu', 'pos') together with a normalized 'compound' score between -1 and 1.

In [28]:
sia = SentimentIntensityAnalyzer()

In [30]:
# Performing sentiment analysis on each review
sentiments = []
for review in mcd['review']:
    sentiment = sia.polarity_scores(review)
    sentiments.append(sentiment)

* <h2>Sentiment Classification:</h2>
Based on the compound score, I classified each review into one of three sentiment categories. A review is labeled Positive if its compound score is at least 0.05, Negative if it is at most -0.05, and Neutral if it falls between those two thresholds.


In [31]:
sentiment_labels = []
for sentiment in sentiments:
    compound_score = sentiment['compound']
    if compound_score >= 0.05:
        sentiment_labels.append('Positive')
    elif compound_score <= -0.05:
        sentiment_labels.append('Negative')
    else:
        sentiment_labels.append('Neutral')
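
The same thresholds can be wrapped in a small function, which makes the boundary behaviour easy to check. This is only a sketch; `label_from_compound` and its `threshold` parameter are names introduced here, not part of the notebook.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"


# Scores exactly on a threshold fall into the Positive/Negative class.
for score in (0.6, 0.05, 0.0, -0.05, -0.6):
    print(score, label_from_compound(score))
```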

In [32]:
# Add the sentiment labels to the DataFrame
mcd['sentiment'] = sentiment_labels

In [33]:
mcd[['review', 'sentiment']]

Unnamed: 0,review,sentiment
0,Why does it look like someone spit on my food?...,Positive
1,It'd McDonalds. It is what it is as far as the...,Positive
2,Made a mobile order got to the speaker and che...,Negative
3,My mc. Crispy chicken sandwich was ï¿½ï¿½ï¿½ï¿...,Neutral
4,"I repeat my order 3 times in the drive thru, a...",Negative
...,...,...
33391,They treated me very badly.,Negative
33392,The service is very good,Positive
33393,To remove hunger is enough,Negative
33394,"It's good, but lately it has become very expen...",Positive


In [34]:
X = mcd['review']
y = mcd['sentiment']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

2. <h2>Vectorization:</h2>
I converted the review text into TF-IDF vectors with TfidfVectorizer, which weights each term by how often it appears in a review and how rare it is across all reviews. The vectorizer is fitted on the training reviews only and then reused to transform the test reviews, so the test set does not influence the learned vocabulary or weights.
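
To make the idea concrete, here is a small sketch of TF-IDF on a three-document toy corpus (the corpus is made up for illustration). Each row of the resulting matrix is one document and each column one vocabulary term.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

toy_corpus = ["great food", "bad service", "great service"]

toy_vectorizer = TfidfVectorizer()
toy_matrix = toy_vectorizer.fit_transform(toy_corpus)

# Vocabulary learned from the corpus, in column order.
terms = sorted(toy_vectorizer.vocabulary_, key=toy_vectorizer.vocabulary_.get)
print(terms)
print(toy_matrix.shape)

# A word never seen during fit is simply ignored at transform time.
unseen = toy_vectorizer.transform(["terrible"])
print(unseen.nnz)
```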

In [35]:
vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

3. <h2>Model Training (Support Vector Classifier):</h2>
I trained a Support Vector Classifier (SVC) on the TF-IDF features of the training reviews. SVC() is called with its default settings, which in scikit-learn means an RBF kernel with C=1.0. Because the training labels come from the VADER compound scores rather than from human annotation, the model learns to reproduce VADER's labels from the review text.
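
One optional variation, not used in this notebook, is to wrap the vectorizer and classifier in a scikit-learn `Pipeline` so both are fitted and applied together. The sketch below uses a made-up six-review training set and a linear kernel purely for illustration; the notebook itself calls `SVC()` with default settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Tiny made-up training set, for illustration only.
toy_reviews = [
    "great food and friendly staff",
    "delicious fries, great service",
    "rude staff and cold food",
    "terrible service, cold fries",
    "great burgers",
    "terrible and rude",
]
toy_labels = ["Positive", "Positive", "Negative", "Negative", "Positive", "Negative"]

sentiment_pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("svc", SVC(kernel="linear")),
])
sentiment_pipeline.fit(toy_reviews, toy_labels)

# The pipeline accepts raw strings; vectorization happens internally.
toy_prediction = sentiment_pipeline.predict(["great staff"])[0]
print(toy_prediction)
```

With a pipeline, a single object would need to be saved and loaded instead of a separate model and vectorizer.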

In [36]:
model = SVC()
model.fit(X_train_tfidf, y_train)

In [37]:
y_pred = model.predict(X_test_tfidf)

In [38]:
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:")
print(classification_report(y_test, y_pred))

Accuracy: 0.8868263473053892
Classification Report:
              precision    recall  f1-score   support

    Negative       0.81      0.88      0.84      1937
     Neutral       0.88      0.82      0.85      1244
    Positive       0.93      0.91      0.92      3499

    accuracy                           0.89      6680
   macro avg       0.88      0.87      0.87      6680
weighted avg       0.89      0.89      0.89      6680
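
A few figures in the report can be sanity-checked by hand. The short calculation below uses only numbers shown above: the dataset size from `info()`, `test_size=0.2`, and the per-class support. It also derives the accuracy a classifier would get by always predicting the most frequent class, which is a useful reference point for the reported 0.89.

```python
import math

n_rows = 33396          # from mcd.info()
support = {"Negative": 1937, "Neutral": 1244, "Positive": 3499}

# train_test_split rounds the test fraction up to a whole number of rows.
n_test = math.ceil(0.2 * n_rows)
print(n_test, sum(support.values()))

# Accuracy of always predicting the majority class on this test split.
majority_baseline = max(support.values()) / sum(support.values())
print(round(majority_baseline, 3))
```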



4. <h2>Sentiment Prediction Function:</h2>
To enhance usability, I created a function that takes a review as input and predicts its sentiment. This function utilizes the trained SVC model to analyze the input review's features and classify it as positive, negative, or neutral. The function provides the sentiment prediction as the output.


In [39]:
def predict_sentiment(review):
    review_tfidf = vectorizer.transform([review])
    sentiment = model.predict(review_tfidf)
    return sentiment[0]

5. <h2>Sample Testing:</h2>
As a quick spot check alongside the test-set metrics, I ran the prediction function on a few short, hand-written reviews.

In [40]:
new_review = "This restaurant has excellent service and delicious food."
predicted_sentiment = predict_sentiment(new_review)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Positive


In [41]:
new_review2 = "This restaurant sucks."
predicted_sentiment = predict_sentiment(new_review2)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Negative


In [42]:
new_review3 = "This is fine"
predicted_sentiment = predict_sentiment(new_review3)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Positive


In [43]:
new_review4 = "This is dull"
predicted_sentiment = predict_sentiment(new_review4)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Neutral


In [44]:
new_review5 = "its bad"
predicted_sentiment = predict_sentiment(new_review5)
print("Predicted sentiment:", predicted_sentiment)

Predicted sentiment: Negative


In [None]:
import pickle

with open('mcdmodel.pkl', 'wb') as file:
    pickle.dump(model, file)


In [None]:
with open('vectorizer.pkl', 'wb') as file:
    pickle.dump(vectorizer, file)
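
For completeness, here is a sketch of the reverse step: loading the two pickled objects and predicting with them. To keep the example self-contained, it first trains and saves a tiny stand-in model (made-up data) in a temporary directory; with the real files, only the loading and prediction part would be needed. Pickles should be loaded only from trusted sources, ideally with the same scikit-learn version that created them.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Stand-in objects so the example runs on its own (made-up data).
texts = ["great food", "great service", "bad food", "bad service"]
labels = ["Positive", "Positive", "Negative", "Negative"]
demo_vectorizer = TfidfVectorizer()
demo_model = SVC().fit(demo_vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "mcdmodel.pkl")
    vectorizer_path = os.path.join(tmp_dir, "vectorizer.pkl")
    with open(model_path, "wb") as file:
        pickle.dump(demo_model, file)
    with open(vectorizer_path, "wb") as file:
        pickle.dump(demo_vectorizer, file)

    # Loading mirrors saving: open in binary read mode and unpickle.
    with open(model_path, "rb") as file:
        loaded_model = pickle.load(file)
    with open(vectorizer_path, "rb") as file:
        loaded_vectorizer = pickle.load(file)

sample = ["great food"]
original = demo_model.predict(demo_vectorizer.transform(sample))[0]
restored = loaded_model.predict(loaded_vectorizer.transform(sample))[0]
print(original == restored)
```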