In [11]:
import pandas as pd
import numpy as np
df = pd.read_csv('\\Stars Predition\\dataset\\Amazon_Unlocked_Mobile.csv')
df = df.head(500)
df = df[["Reviews","Rating","Product Name"]]
df = df.reset_index()
df

Unnamed: 0,index,Reviews,Rating,Product Name
0,0,I feel so LUCKY to have found this used (phone...,5,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
1,1,"nice phone, nice up grade from my pantach revu...",4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
2,2,Very pleased,5,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
3,3,It works good but it goes slow sometimes but i...,4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
4,4,Great phone to replace my lost phone. The only...,4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
...,...,...,...,...
495,495,good phone with a top hardware and relative lo...,5,4G-Unlocked Huawei Honor 6 5.0 TFT LTPS Screen...
496,496,Like phone its nice for price sometimes it get...,5,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."
497,497,So to be put to use by the beneficiary,4,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."
498,498,Attractive phone but did not work upon arrival...,1,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."


The data we'll use consists of three columns: the review text, the rating, and the product name (which is not essential to the analysis). We keep only the first 500 rows of the full dataset. Next, we import the VADER analyzer; VADER stands for Valence Aware Dictionary and sEntiment Reasoner. For each text it returns four values, {'neg','neu','pos','compound'}: 'neg', 'neu' and 'pos' are the proportions of the text rated negative, neutral and positive (each between 0 and 1), while 'compound' is a single overall score normalized to lie between -1 and 1.
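To see why 'compound' always lands between -1 and 1, here is a minimal sketch of the normalization step used in the reference VADER implementation, where the summed word valences x are squashed as x / sqrt(x^2 + 15). The function name `normalize_compound` and the sample sums are illustrative only and do not come from this notebook.

```python
import math

def normalize_compound(valence_sum, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)

# Illustrative valence sums: larger magnitudes approach -1 or 1 without reaching them.
for total in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(f"{total:+.1f} -> {normalize_compound(total):+.4f}")
```

A text with no sentiment-bearing words has a valence sum of 0 and therefore a compound score of exactly 0.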

In [3]:
from nltk.sentiment import SentimentIntensityAnalyzer
from tqdm.notebook import tqdm

# The analyzer needs the VADER lexicon; if it is missing, run nltk.download('vader_lexicon') once.
sia = SentimentIntensityAnalyzer()

In [4]:
res = {}
for _, row in tqdm(df.iterrows(),total=len(df)):
    text = row['Reviews']
    index = row['index']
    # polarity_scores() returns a dict with the keys 'neg', 'neu', 'pos' and 'compound'
    res[index] = sia.polarity_scores(text)



In [5]:
vaders = pd.DataFrame(res).T
vaders = vaders.reset_index()
vaders = vaders.merge(df,how='left').drop(['neg','neu','pos'],axis=1)
vaders

Unnamed: 0,index,compound,Reviews,Rating,Product Name
0,0,0.8783,I feel so LUCKY to have found this used (phone...,5,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
1,1,0.9231,"nice phone, nice up grade from my pantach revu...",4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
2,2,0.4927,Very pleased,5,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
3,3,0.9185,It works good but it goes slow sometimes but i...,4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
4,4,0.2942,Great phone to replace my lost phone. The only...,4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
...,...,...,...,...,...
495,495,0.3818,good phone with a top hardware and relative lo...,5,4G-Unlocked Huawei Honor 6 5.0 TFT LTPS Screen...
496,496,0.8069,Like phone its nice for price sometimes it get...,5,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."
497,497,0.4767,So to be put to use by the beneficiary,4,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."
498,498,-0.1815,Attractive phone but did not work upon arrival...,1,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."


We drop the 'neg', 'neu', and 'pos' scores and keep only 'compound'. Each compound score is then mapped to a star rating using the following intervals:

* [-1, -0.5) : 1 Star

* [-0.5, 0) : 2 Stars

* exactly 0 : 3 Stars

* (0, 0.5) : 4 Stars

* [0.5, 1] : 5 Stars
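As a quick check of the interval boundaries, the mapping can be sketched as a standalone function. The name `compound_to_stars` is hypothetical; the sample scores include the interval edges plus two compound values from the table above (0.4927 and -0.1815).

```python
def compound_to_stars(score):
    """Map a VADER compound score in [-1, 1] to a star label."""
    if score >= 0.5:
        return '5 Stars'
    if score > 0:
        return '4 Stars'
    if score == 0:
        return '3 Stars'
    if score >= -0.5:
        return '2 Stars'
    return '1 Star'

# Edge values and two scores taken from the table above.
for score in (-1.0, -0.5, -0.1815, 0.0, 0.4927, 0.5, 1.0):
    print(score, '->', compound_to_stars(score))
```

Note that the 3-star bucket is a single point, so a review only receives 3 stars when its compound score is exactly 0.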

In [6]:
# Map each compound score to a star label using the intervals above.
rate_value = []
for rvalue in vaders['compound']:
    if 0.5 <= rvalue <= 1:
        rate_value.append('5 Stars')
    elif 0 < rvalue < 0.5:
        rate_value.append('4 Stars')
    elif rvalue == 0:
        rate_value.append('3 Stars')
    elif -0.5 <= rvalue < 0:
        rate_value.append('2 Stars')
    else:
        rate_value.append('1 Star')  # remaining scores fall in [-1, -0.5)
rate_value = pd.DataFrame(rate_value, columns=['Predicted Stars'])
rate_value

Unnamed: 0,Predicted Stars
0,5 Stars
1,5 Stars
2,4 Stars
3,5 Stars
4,4 Stars
...,...
495,4 Stars
496,5 Stars
497,4 Stars
498,2 Stars


In [7]:
# reset_index() exposes the row number as an 'index' column, which serves as the join key.
results = rate_value.reset_index()
results = results.merge(vaders,how='left',on='index').drop(['compound'],axis=1)
results

Unnamed: 0,index,Predicted Stars,Reviews,Rating,Product Name
0,0,5 Stars,I feel so LUCKY to have found this used (phone...,5,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
1,1,5 Stars,"nice phone, nice up grade from my pantach revu...",4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
2,2,4 Stars,Very pleased,5,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
3,3,5 Stars,It works good but it goes slow sometimes but i...,4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
4,4,4 Stars,Great phone to replace my lost phone. The only...,4,"""CLEAR CLEAN ESN"" Sprint EPIC 4G Galaxy SPH-D7..."
...,...,...,...,...,...
495,495,4 Stars,good phone with a top hardware and relative lo...,5,4G-Unlocked Huawei Honor 6 5.0 TFT LTPS Screen...
496,496,5 Stars,Like phone its nice for price sometimes it get...,5,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."
497,497,4 Stars,So to be put to use by the beneficiary,4,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."
498,498,2 Stars,Attractive phone but did not work upon arrival...,1,"5.0"" Cell Phones Unlocked Android 5.1 MTK6580 ..."


In the final results we can see our predictions next to the real ratings. The accuracy may not be the best, but we consider it very good for such a simple approach.
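The notebook does not compute an accuracy figure, so here is a minimal sketch of how one could be measured. The helper `star_accuracy` is hypothetical, and the sample lists are only the first five rows of the results table shown above, not the full 500 rows, so the numbers it prints say nothing about the overall accuracy.

```python
def star_accuracy(predicted_labels, actual_ratings):
    """Return (exact-match share, within-one-star share)."""
    # Labels look like '5 Stars' or '1 Star'; keep the leading number.
    predicted = [int(label.split()[0]) for label in predicted_labels]
    actual = [int(rating) for rating in actual_ratings]
    pairs = list(zip(predicted, actual))
    exact = sum(p == a for p, a in pairs) / len(pairs)
    near = sum(abs(p - a) <= 1 for p, a in pairs) / len(pairs)
    return exact, near

# First five rows of the results table shown above (not the full 500 rows).
predicted = ['5 Stars', '5 Stars', '4 Stars', '5 Stars', '4 Stars']
actual = [5, 4, 5, 4, 4]
exact, near = star_accuracy(predicted, actual)
print(f"exact: {exact:.2f}, within one star: {near:.2f}")
```

Assuming no missing values, the same function should also accept the notebook's columns directly, for example `star_accuracy(results['Predicted Stars'], results['Rating'])`. Reporting the within-one-star share alongside exact matches is a design choice: neighbouring star ratings are often hard to tell apart from the text alone.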