## Sentiment Analysis Using Support Vector Machine

#### Data Class

In [1]:
import random
# A plain class used as a simple enumeration of the sentiment labels.
class Sentiment:
    Negative = 'NEGATIVE'
    Neutral = 'NEUTRAL'
    Positive = 'POSITIVE' 
        
class Review:
    def __init__(self, text, score):
        self.text = text
        self.score = score
        self.sentiment = self.get_sentiment()
    
    def get_sentiment(self):
        if self.score <= 2:
            return Sentiment.Negative
        elif self.score == 3:
            return Sentiment.Neutral
        else:
            return Sentiment.Positive
        
class ReviewContainer:
    def __init__(self,reviews):
        self.reviews = reviews
    
    def evenly_distribute(self):
        # Keep only negative and positive reviews; neutral reviews are dropped.
        negative = list(filter(lambda x : x.sentiment == Sentiment.Negative, self.reviews))
        positive = list(filter(lambda x : x.sentiment == Sentiment.Positive, self.reviews))
        # Downsample the positives to match the number of negatives
        # (this assumes there are at least as many positives as negatives).
        positive_shrunk = positive[:len(negative)]
        self.reviews = negative + positive_shrunk
        random.shuffle(self.reviews)
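
The balancing step can be tried in isolation on a few made-up records. The sketch below is not the notebook's `ReviewContainer`; `balance_by_label` and `label_of` are hypothetical helpers, and the `(text, score)` tuples stand in for `Review` objects. Unlike `evenly_distribute`, it shrinks every class to the size of the smallest one, so it does not assume that positives outnumber negatives.

```python
import random

def label_of(pair):
    """Map a (text, score) pair to a label using the same thresholds as Review."""
    score = pair[1]
    if score <= 2:
        return "NEGATIVE"
    if score == 3:
        return "NEUTRAL"
    return "POSITIVE"

def balance_by_label(items, labels, seed=None):
    """Downsample each class in `labels` to the size of the smallest one."""
    groups = {label: [x for x in items if label_of(x) == label] for label in labels}
    smallest = min(len(group) for group in groups.values())
    balanced = [x for group in groups.values() for x in group[:smallest]]
    random.Random(seed).shuffle(balanced)
    return balanced

# Made-up records: 2 negative, 1 neutral, 3 positive.
toy = [("bad", 1), ("poor", 2), ("fine", 3), ("good", 4), ("great", 5), ("superb", 5)]

balanced = balance_by_label(toy, ["NEGATIVE", "POSITIVE"], seed=0)
labels = [label_of(pair) for pair in balanced]
print(labels.count("NEGATIVE"), labels.count("POSITIVE"))  # prints: 2 2
```

Neutral records never enter the result because only the two listed labels are grouped.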

#### Load the file

In [2]:
import json
# productReview.json contains 10261 Amazon product reviews, stored as one JSON object per line.
file_name='./productReview.json'

reviews=[]

with open(file_name,encoding='utf-8-sig') as f:
    for line in f:
        review = json.loads(line)
        # Wrap each record in a Review object so its text, score and sentiment stay together.
        reviews.append(Review(review['reviewText'],review['overall']))
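
The loader reads the file line by line because each line is a separate JSON document. A minimal sketch of that pattern, using an in-memory `io.StringIO` with two invented records in place of the real file, and adding a blank-line guard that the cell above does not have:

```python
import io
import json

# Two made-up records in the one-object-per-line layout the loader expects.
sample = io.StringIO(
    '{"reviewText": "Great strings.", "overall": 5.0}\n'
    '\n'
    '{"reviewText": "Broke in a week.", "overall": 1.0}\n'
)

records = []
for line in sample:
    line = line.strip()
    if not line:  # skip blank lines instead of letting json.loads raise
        continue
    row = json.loads(line)
    records.append((row["reviewText"], row["overall"]))

print(records)
```

When reading from disk, `encoding='utf-8-sig'` additionally drops a leading byte-order mark if the file has one.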

#### Prepare the datasets

In [3]:
# Split the data into training and test sets with the train_test_split function.
from sklearn.model_selection import train_test_split

# test_size sets the fraction of the data held out as the test set; here it is 0.33, i.e. 33%.
# random_state=42 fixes the shuffle seed, so the same train/test split is produced on every run.

training, test = train_test_split(reviews,test_size=0.33,random_state=42)

train_data= ReviewContainer(training)
test_data= ReviewContainer(test)
train_data.evenly_distribute()
test_data.evenly_distribute()

print(len(train_data.reviews))
print(len(test_data.reviews))


624
310
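
The two numbers printed above are the sizes after balancing, not the raw split sizes. The effect of `test_size` and `random_state` on the split itself can be checked on placeholder data; in this sketch the list of integers is only a stand-in for the `Review` objects.

```python
from sklearn.model_selection import train_test_split

items = list(range(1000))  # stand-in for the list of Review objects

train_a, test_a = train_test_split(items, test_size=0.33, random_state=42)
train_b, test_b = train_test_split(items, test_size=0.33, random_state=42)

print(len(train_a), len(test_a))  # roughly a 67% / 33% split
print(test_a == test_b)           # same seed, so the same split
```

Changing or omitting `random_state` would shuffle differently, so the balanced set sizes and the scores reported later could change from run to run.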


In [4]:
# Training features (review text) and labels (sentiment)
train_x=[x.text for x in train_data.reviews]
train_y=[x.sentiment for x in train_data.reviews]
# Test features and labels
test_x=[x.text for x in test_data.reviews]
test_y=[x.sentiment for x in test_data.reviews]

#### Bag-of-Words Vectorization

In [5]:
# Convert the text documents into a matrix of TF-IDF features.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer()

# Learn the vocabulary and idf weights from the training text and return the document-term matrix.
train_x_vec=vectorizer.fit_transform(train_x) 

# Transform the test documents with the vocabulary learned from the training set (no refitting).
test_x_vec=vectorizer.transform(test_x)
                                        
print(train_x[0])
train_x_vec.toarray()

With drum pads, I basically either get 0 velocity, or max.  And I have to really hammer on them to trigger.  Hurts after a while.Likes:+ Size+ Assignable knobs.+ Works good w/ Ableton LiveDislikes:- Drum pads!Keys are pretty cheesy but workable and expected for this price and size.I'm looking to upgrade after a month.. Want better drum pads.  :-/


array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
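
The array is mostly zeros because each review uses only a small part of the vocabulary. To see where the non-zero weights come from, the sketch below recomputes TF-IDF by hand on three invented documents. It assumes the `TfidfVectorizer` defaults (raw term counts, smoothed idf `ln((1 + n) / (1 + df)) + 1`, L2 row normalisation) and uses whitespace splitting as a simplification of the library's tokenizer.

```python
import math
from collections import Counter

docs = ["good pads", "bad pads", "good good keys"]  # made-up documents
tokenized = [doc.split() for doc in docs]
vocab = sorted({tok for toks in tokenized for tok in toks})
n_docs = len(docs)

# Document frequency: the number of documents each term appears in.
df = Counter(tok for toks in tokenized for tok in set(toks))
# Smoothed inverse document frequency; rarer terms get larger weights.
idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}

def tfidf_row(tokens):
    """Return the L2-normalised TF-IDF vector of one tokenised document."""
    counts = Counter(tokens)
    raw = [counts[tok] * idf[tok] for tok in vocab]
    norm = math.sqrt(sum(v * v for v in raw))
    return [v / norm for v in raw]

rows = [tfidf_row(toks) for toks in tokenized]
print(vocab)
for doc, row in zip(docs, rows):
    print(doc, [round(v, 3) for v in row])
```

Each row has one column per vocabulary term, and a term that does not occur in a document contributes exactly 0 to that row.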

### Classification

#### SVM

In [6]:
from sklearn import svm
from sklearn.metrics import accuracy_score

clf_svm=svm.SVC(kernel='linear')
clf_svm.fit(train_x_vec,train_y)
predicted_svm = clf_svm.predict(test_x_vec)
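
The vectorizer and the classifier always have to be applied in the same order, so one option is to bundle them. The following is a sketch of that alternative rather than what the notebook does: it uses scikit-learn's `make_pipeline` and four invented training texts in place of the balanced review set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Made-up training texts; the notebook trains on the balanced reviews instead.
texts = ["love it, works great", "great sound, love it",
         "terrible, broke fast", "broke after a day, terrible"]
labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]

# The pipeline fits the vectorizer and the linear SVM in one call.
model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(texts, labels)

# Raw strings go in; vectorization happens inside the pipeline.
prediction = model.predict(["great, love the sound"])
print(prediction)
```

With a pipeline there is a single fitted object to save and reload, which removes the risk of pairing a classifier with the wrong vectorizer.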

### Evaluation

#### Accuracy

In [7]:
print('Accuracy of Linear SVM is :' , accuracy_score(test_y,predicted_svm))

Accuracy of Linear SVM is : 0.8032258064516129


#### F1 Score

In [8]:
from sklearn.metrics import f1_score
print('F1 score of Linear SVM is :',f1_score(test_y,predicted_svm,average=None,labels=[Sentiment.Positive,Sentiment.Negative]))

F1 score of Linear SVM is : [0.79322034 0.81230769]
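
With `average=None`, `f1_score` returns one value per class in the order given by `labels`, so the first number above is the F1 score for POSITIVE and the second for NEGATIVE. Both metrics can be reproduced by hand; the sketch below does so on six invented labels, with `accuracy` and `f1_for_label` as illustrative helpers rather than library functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_label(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN), treating `label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Made-up labels: one POSITIVE review is misclassified as NEGATIVE.
y_true = ["POSITIVE", "POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]
y_pred = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE", "NEGATIVE"]

acc = accuracy(y_true, y_pred)                       # 5 of 6 correct
f1_pos = f1_for_label(y_true, y_pred, "POSITIVE")    # 2*2 / (2*2 + 0 + 1) = 0.8
f1_neg = f1_for_label(y_true, y_pred, "NEGATIVE")    # 2*3 / (2*3 + 1 + 0) = 6/7
print(acc, [f1_pos, f1_neg])
```

A single mistake lowers the two per-class scores by different amounts, which is why reporting them separately is more informative than accuracy alone.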


### Realtime Input

In [18]:
#checking_input = input('Enter the string :')
#checking_input=[checking_input]
#check_transform= vectorizer.transform(checking_input)
#print('The input statement is :',clf_svm.predict(check_transform))
#clf_svm.predict(vectorizer.transform(["happy"]))

In [19]:
import pickle
# Persist the trained classifier; the with block closes the file once the dump finishes.
with open('SentimentAnalysis_predict.pkl','wb') as file:
    pickle.dump(clf_svm,file)

In [20]:
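# Persist the fitted vectorizer as well; predictions need the same vocabulary and idf weights.
with open('SentimentAnalysis_transform.pkl','wb') as file:
    pickle.dump(vectorizer,file)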
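
To use the saved objects later, both files have to be loaded and applied in the same order as during training: vectorize first, then predict. The sketch below shows the round trip on a tiny invented model, writing to a temporary directory; `model.pkl` and `vectorizer.pkl` are placeholder file names, not the ones used above.

```python
import os
import pickle
import tempfile

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Tiny made-up model so the example runs on its own.
texts = ["love it", "works great", "broke fast", "terrible sound"]
labels = ["POSITIVE", "POSITIVE", "NEGATIVE", "NEGATIVE"]
vectorizer = TfidfVectorizer()
clf = SVC(kernel="linear").fit(vectorizer.fit_transform(texts), labels)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "model.pkl")     # placeholder file names
    vec_path = os.path.join(tmp, "vectorizer.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(clf, f)
    with open(vec_path, "wb") as f:
        pickle.dump(vectorizer, f)

    # Later, possibly in another process: load both objects back.
    with open(model_path, "rb") as f:
        loaded_clf = pickle.load(f)
    with open(vec_path, "rb") as f:
        loaded_vec = pickle.load(f)

new_text = ["works great, love it"]
restored = loaded_clf.predict(loaded_vec.transform(new_text))
original = clf.predict(vectorizer.transform(new_text))
print(restored, original)  # the reloaded objects give the same prediction
```

Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.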