# Best models performance

This is the last part of the project, in which the best models for binary and multiclass classification are evaluated in detail. The __SupportVectorClassifier__ algorithm turned out to be the best model in both cases.

In [3]:
import numpy as np
import pandas as pd
import pickle

from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

In [4]:
pd.set_option("max_colwidth", 1000)

__Loading dataset__

In [70]:
data = pd.read_csv('Data_ML.csv', index_col='Unnamed: 0')

## 1. Binary model evaluation

In [11]:
# Keeping only positive and negative reviews
data_binary = data[data.rating != 0]

X = data_binary.loc[:, 'review']
y = data_binary.loc[:, 'rating']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=100)

### 1.1. Binary model performance

In [14]:
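# load the pickled binary classifier and close the file afterwards
with open("models/113SVM.pkl", 'rb') as f:
    model_binary = pickle.load(f)

threshold = 0.8

# predicted probability that each review is positive
y_pred_proba_pos = model_binary.predict_proba(X_test)[:, 1]

# apply the previously calculated threshold:
# a review is labelled positive only if its probability exceeds it
y_pred = np.where(y_pred_proba_pos > threshold, 1, -1)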

# confusion matrix
cm = confusion_matrix(y_test, y_pred) 

print(cm)
print(classification_report(y_test, y_pred))

print(
    f'number of reviews that are positive and are predicted as positive: {cm[1][1]}')
print(
    f'number of reviews that are negative and are predicted as negative: {cm[0][0]}')
print(
    f'number of reviews that are negative and are predicted as positive: {cm[0][1]}')
print(
    f'number of reviews that are positive and are predicted as negative: {cm[1][0]}')

[[ 387   18]
 [  40 8417]]
              precision    recall  f1-score   support

          -1       0.91      0.96      0.93       405
           1       1.00      1.00      1.00      8457

    accuracy                           0.99      8862
   macro avg       0.95      0.98      0.96      8862
weighted avg       0.99      0.99      0.99      8862

number of reviews that are positive and are predicted as positive: 8417
number of reviews that are negative and are predicted as negative: 387
number of reviews that are negative and are predicted as positive: 18
number of reviews that are positive and are predicted as negative: 40
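
Because the test set is heavily skewed towards positive reviews, accuracy alone says little. As a rough sanity check, the sketch below recomputes a few figures directly from the confusion matrix printed above and compares them with a trivial baseline that always predicts the positive class. The variable names and the baseline are illustrative assumptions, not part of the original notebook.

```python
# Confusion matrix copied from the output above (rows: true, columns: predicted),
# label order: -1 (negative), 1 (positive).
cm = [[387, 18],
      [40, 8417]]

total = sum(sum(row) for row in cm)
accuracy = (cm[0][0] + cm[1][1]) / total

# Hypothetical baseline that labels every review as positive.
n_positive = sum(cm[1])
baseline_accuracy = n_positive / total

# Recall of the minority (negative) class: share of negative reviews recovered.
negative_recall = cm[0][0] / sum(cm[0])

print(f"model accuracy:    {accuracy:.4f}")
print(f"baseline accuracy: {baseline_accuracy:.4f}")
print(f"negative recall:   {negative_recall:.4f}")
```

The baseline already reaches roughly 95% accuracy, so the recall on the negative class is the more informative number here.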


### 1.2. Misclassified (preprocessed) reviews for binary model

In [20]:
data_evaluation_binary=pd.DataFrame(
    np.hstack([y_test.values.reshape(-1, 1), y_pred.reshape(-1, 1)]),
    columns=['true', 'predicted'])

data_evaluation_binary['review'] = X_test.values

# indexes of misclassified reviews
idx = data_evaluation_binary['true'] != data_evaluation_binary['predicted']

data_misclassified_binary = data_evaluation_binary[idx]

In [111]:
# false positives: negative reviews predicted as positive
data_misclassified_binary[data_misclassified_binary.true == -1].sample(5)

| | true | predicted | review |
| --- | --- | --- | --- |
| 7985 | -1 | 1 | great everything worked bought canon powershot s1 32 mp digital camera silver new problem camera month warranty ran went use couldnt get lcd show wanted photograph fact lcd screen totally black trying get work gave sent canon fixed cost fixed 2 12 years later problem camera camera took good photos movies little awkward carry maybe bought lemonread full review |
| 7147 | -1 | 1 | unsatisfying didnt come charger no instructions use long last long takes charge |
| 8737 | -1 | 1 | bad not worknice camera give one star unable open battery cover looks like glued something product self good old one canon a1000is 10 years no problem a1000is older model 2009 a1100is 2013 model not much diference cosmetic higher mp 121 mp a1000is 10mread full review |
| 4819 | -1 | 1 | particular unit purchased repair repair end very expensive could not afford fixed still sitting shelfmy mistake there’s good ratings |
| 5340 | -1 | 1 | no conection inside |


In [112]:
# false negatives: positive reviews predicted as negative
data_misclassified_binary[data_misclassified_binary.true == 1].sample(5)

| | true | predicted | review |
| --- | --- | --- | --- |
| 7296 | 1 | -1 | best case hands case literally keeps phone charged all day use work school socially it’s awesome i’ve tried brands don’t waste money get real deal |
| 3522 | 1 | -1 | nice little helper tendency forget turn timer helps talk dont stop |
| 1340 | 1 | -1 | norelco 7500 not good shaving short hair go spot 5 6 7 times buy norelco 30 shave better blades worn new blades blades cost favor h v e rread full review |
| 1056 | 1 | -1 | love needs driver update 3 various computers beware change standard line keyboard though layout give fewer mistakes cause problems standared keyboardthe one thing bad drivers come limited compatiibility many potential features cant usedplug play not let get still love itread full review |
| 2841 | 1 | -1 | heat 23 hours usage heats ears really irritates |


### 1.3. Predicting neutral reviews by binary model

In [58]:
neutral = data[data.rating == 0].review.values
# probability that a review is positive
neutral_pred_prob_pos = model_binary.predict_proba(neutral)[:,1]
neutral_df = pd.DataFrame(
    neutral_pred_prob_pos.reshape(-1, 1),
    columns=['positive probability'])

neutral_df['neutral review'] = neutral

In [114]:
neutral_df.sample(10)

| | positive probability | neutral review |
| --- | --- | --- |
| 927 | 0.890738 | not going buy stainless steel kitchen scale actually surface scale 01 mm thick metal sticker 40 mm thick glass far away stainless steel mininum capacity 5 g capacitive buttons great doesnt always respond touch otherwise well built no additional sounds heard turning itread full review |
| 716 | 0.56245 | bad quality sound first beats studio came not wireless sounded much better studio 3 version very unhappy product expected much beats studio1 turned better studio3 😔read full review |
| 1059 | 0.872771 | not good phone works well battery speaker little bad think ineed change |
| 1006 | 0.999999 | cracks easy cracks way easy |
| 231 | 0.098892 | phone doesn’t work described put verizon sim card right reset phone showed signal bars less 10 seconds disappeared even took verizon store didn’t even know wasting time mineread full review |
| 1370 | 0.971011 | good product works expected product good seemed bit powered used without blade place used 2x1 piece wood went like hot knife butter heavier expect without battery not much corded version going prosumer work areas without power good option especially brought rigid 18v product line already wish purchased brushless version long term maintenance issues may come upread full review |
| 578 | 0.598874 | not right model monitor model monitor suppose receive mg28uq received mg28u similar features still would nice receive model listed |
| 1137 | 0.83663 | hairstyck good hair stuck kit 🙁we remove always |
| 1247 | 0.970591 | it’s must takes little get used it’s definitely way go little bit tough get used not impossible |
| 1304 | 0.138677 | syncros not impressed earpiece staying walking even sitting falls all time |
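
To make the effect of the decision threshold concrete, the sketch below applies two thresholds to the ten sampled probabilities shown above. It is only an illustration: the sample is random and small, so the counts say nothing about the full set of neutral reviews, and the helper `label_with_threshold` is a hypothetical name, not part of the original notebook.

```python
# Positive-class probabilities of the ten sampled neutral reviews shown above.
sampled_probs = [0.890738, 0.56245, 0.872771, 0.999999, 0.098892,
                 0.971011, 0.598874, 0.83663, 0.970591, 0.138677]


def label_with_threshold(prob_positive, threshold):
    """Return 1 (positive) if the probability exceeds the threshold, else -1."""
    return 1 if prob_positive > threshold else -1


# Number of reviews labelled positive for each threshold.
positive_counts = {}
for threshold in (0.5, 0.8):
    labels = [label_with_threshold(p, threshold) for p in sampled_probs]
    positive_counts[threshold] = labels.count(1)
    n_negative = len(labels) - positive_counts[threshold]
    print(f"threshold {threshold}: "
          f"{positive_counts[threshold]} positive, {n_negative} negative")
```

Raising the threshold from 0.5 to 0.8 moves the two borderline reviews (probabilities 0.56 and 0.60) from the positive to the negative side.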


### 1.4. Binary model test

Predicting sentiment of a given review

In [115]:
# review taken from eBay
review = """
Lightweight with great features Its cool its nice its
convenient i needed it for work im always on the go
I love it because i use it all day and it rarely dies
I didnt think i needed it til i got it haven't stop using it since
"""

print(f'review: {review}\n')
review = np.array([review.lower()])
sentiment = model_binary.predict(review)[0]
if sentiment == 1:
    print('the review is positive')
else:
    print('the review is negative')

review: Lightweight with great features Its cool its nice its convenient i needed it for work im always on the go I love it because i use it all day and it rarely dies I didnt think i needed it til i got it haven't stop using it since

the review is positive


## 2. Multiclass model evaluation

In [71]:
# load the pickled multiclass classifier and close the file afterwards
with open("models/213SVM.pkl", 'rb') as f:
    model_multiclass = pickle.load(f)

X = data.loc[:, 'review']
y = data.loc[:, 'rating']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=100)

### 2.1. Multiclass model performance

In [84]:
y_pred = model_multiclass.predict(X_test)
cm = confusion_matrix(y_test, y_pred) 

print(cm)
print(classification_report(y_test, y_pred))

# confusion_matrix sorts the labels, so index 0 is negative (-1),
# index 1 is neutral (0) and index 2 is positive (1)
print(f'number of reviews that are positive and are predicted as positive: {cm[2][2]}')
print(f'number of reviews that are negative and are predicted as negative: {cm[0][0]}')
print(f'number of reviews that are neutral and are predicted as neutral: {cm[1][1]}')

print(f'number of reviews that are positive and are predicted as negative: {cm[2][0]}')
print(f'number of reviews that are positive and are predicted as neutral: {cm[2][1]}')

print(f'number of reviews that are negative and are predicted as positive: {cm[0][2]}')
print(f'number of reviews that are negative and are predicted as neutral: {cm[0][1]}')

print(f'number of reviews that are neutral and are predicted as positive: {cm[1][2]}')
print(f'number of reviews that are neutral and are predicted as negative: {cm[1][0]}')

[[ 360    2   43]
 [  13  235   41]
 [   8    4 8445]]
              precision    recall  f1-score   support

          -1       0.94      0.89      0.92       405
           0       0.98      0.81      0.89       289
           1       0.99      1.00      0.99      8457

    accuracy                           0.99      9151
   macro avg       0.97      0.90      0.93      9151
weighted avg       0.99      0.99      0.99      9151

number of reviews that are positive and are predicted as positive: 8445
number of reviews that are negative and are predicted as negative: 360
number of reviews that are neutral and are predicted as neutral: 235
number of reviews that are positive and are predicted as negative: 8
number of reviews that are positive and are predicted as neutral: 4
number of reviews that are negative and are predicted as positive: 43
number of reviews that are negative and are predicted as neutral: 2
number of reviews that are neutral and are predicted as positive: 41
number of reviews that are neutral and are predicted as negative: 13
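
The per-class figures in the classification report can be recomputed by hand from the confusion matrix, which also makes the row and column order explicit. The following is a minimal sketch in plain Python; the helper `per_class_metrics` is a hypothetical name introduced for illustration and is not part of scikit-learn or of the original notebook.

```python
# Confusion matrix copied from the output above (rows: true, columns: predicted).
# scikit-learn sorts the labels, so index 0 is -1 (negative),
# index 1 is 0 (neutral) and index 2 is 1 (positive).
labels = [-1, 0, 1]
cm = [[360, 2, 43],
      [13, 235, 41],
      [8, 4, 8445]]


def per_class_metrics(cm, i):
    """Return precision, recall and F1 for the class at row/column index i."""
    true_positive = cm[i][i]
    predicted_as_i = sum(row[i] for row in cm)  # column sum
    actually_i = sum(cm[i])                     # row sum
    precision = true_positive / predicted_as_i
    recall = true_positive / actually_i
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


metrics = {label: per_class_metrics(cm, i) for i, label in enumerate(labels)}
for label, (precision, recall, f1) in metrics.items():
    print(f"{label:>2}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

# Macro average: unweighted mean over classes, so small classes count equally.
macro_recall = sum(m[1] for m in metrics.values()) / len(metrics)
print(f"macro recall: {macro_recall:.2f}")
```

The neutral class has the lowest recall, which is what pulls the macro-averaged recall well below the overall accuracy.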

In [8]:
def model_summary(model, X_test, y_test):
    """
    Function that prints classification report and confusion matrix for a given model
    """
    
    y_pred = model.predict(X_test)
    print('\nconfusion matrix\n', confusion_matrix(y_test, y_pred))

    print('\nclassification report\n',classification_report(y_test, y_pred))

### 2.2. Misclassified (preprocessed) reviews for multiclass model

In [86]:
data_evaluation_mc=pd.DataFrame(
    np.hstack([y_test.values.reshape(-1, 1),
               y_pred.reshape(-1,1)]),
    columns=['true','predicted'])

data_evaluation_mc['review'] = X_test.values
idx = data_evaluation_mc['true'] != data_evaluation_mc['predicted']
data_misclassified_mc = data_evaluation_mc[idx]

In [117]:
# this cell shows misclassified reviews for a given true and predicted sentiment
# 1 - positive
# 0 - neutral
# -1 - negative

true_label = 0
predicted_label = 1

data_misclassified_mc[
    (data_misclassified_mc.true == true_label) 
    & (data_misclassified_mc.predicted == predicted_label)
].head()

Unnamed: 0,true,predicted,review
271,0,1,ok ok
379,0,1,fine locked few times shut reboot fine
506,0,1,good mouse bad software pros excellent feel responsiveness upped potential immediatelycons software called ghub logitechs proprietary software not user friendly tried many days get downloaded onto computer unsuccessful far way access device program software answer mouse good software interface notread full review
540,0,1,isnt bose quality expected all bose products owned pro easy pair use multiple devices time battery lifecon not fit well get quality sound expected even optional earpieces therefore doesnt fit comfortable especially running workouts jabra elite active 65t much better fit sound also betterread full review
562,0,1,never new color look like like anyway uses battery fast


In [108]:
# review taken from eBay
review ="""
Apple AirPod Pros It is all good but the quality on them is a 
little wacky isnt really working the best in between the songs"""

print(f'review: {review}')
review = np.array([review])
# predict() returns an array with one label per input, so take the first element
sentiment = model_multiclass.predict(review)[0]
if sentiment == 1:
    print('the review is positive')
elif sentiment == 0:
    print('the review is neutral')   
else:
    print('the review is negative')

review: Apple AirPod Pros It is all good but the quality on them is a little wacky isnt really working the best in between the songs
the review is positive
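
The if/elif chains used in the two test cells could be replaced by a small lookup table that works for both models. The sketch below is one possible way to do this; `describe_sentiment` and `KeywordModel` are hypothetical names, and the stand-in model exists only so that the example runs without the pickled files. It assumes the real models accept a list of raw strings in `predict()`, in the same way they accept the arrays passed above.

```python
# Mapping from the numeric labels used in this notebook to readable names.
SENTIMENT_NAMES = {-1: "negative", 0: "neutral", 1: "positive"}


def describe_sentiment(model, review):
    """Predict the sentiment of one review and return its name.

    `model` is assumed to expose a scikit-learn style predict() method
    that takes a sequence of raw review strings.
    """
    label = model.predict([review])[0]  # one label per input review
    return SENTIMENT_NAMES[int(label)]


class KeywordModel:
    """Hypothetical stand-in so the sketch runs without the pickled model."""

    def predict(self, reviews):
        return [1 if "good" in text.lower() else -1 for text in reviews]


print(describe_sentiment(KeywordModel(), "It is all good"))
```

With the real models, the call would be `describe_sentiment(model_multiclass, review)` or `describe_sentiment(model_binary, review)`.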


# Final summary and conclusion

1. SupportVectorClassifier turned out to be the best model in both cases. XGBoost was the second best, performing slightly worse than SVC.
2. Looking at the neutral reviews, building a binary model seems reasonable: very often these reviews can be classified as either positive or negative.
3. Looking at the misclassified reviews, one can infer that the model sometimes captures the sentiment better than the rating given by the customer. E-commerce websites could use such models to determine review sentiment regardless of customer ratings.