# File 11: Final Error Correction Model

### Input Files:
- 09-phase-1-prediction.csv
- 10-positive-word-score.csv
- 10-negative-word-score.csv

### Output Files:
- 11-final-table.csv
- clf (trained classifier, saved in ../classification-model/)

### Steps:
1. Load required libraries
1. Load required dataframes
1. Build a dictionary of word-score pairs
1. Calculate sentence scores
1. Create the dataset
1. Create X and Y to train the model
1. Split the dataset for training and testing
1. Train the model
1. Report accuracy and save the model
1. Reload the model and print a classification report

In [1]:
# loading required libraries
import pandas as pd
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score
from joblib import dump, load

In [2]:
# loading required dataframes
df = pd.read_csv("../../db/09-phase-1-prediction.csv")
pos = pd.read_csv("../../db/10-positive-word-score.csv")
neg = pd.read_csv("../../db/10-negative-word-score.csv")
df.TEXT = df.TEXT.astype('str')

In [3]:
df

| | USER | RATING | TEXT | ORIGINAL | SENTIMENT | OUTPUT | PRED1 |
|---|---|---|---|---|---|---|---|
| 0 | TLeC | 1 | bear watch thought ua loss embarrass | @caregiving I couldn't bear to watch it. And ... | 0 | 0.384 | 0 |
| 1 | robrobbierobert | 1 | count idk either never talk anymor | @octolinz16 It it counts, idk why I did either... | 0 | 0.137 | 0 |
| 2 | lovesongwriter | 1 | holli death scene hurt sever watch film wri di... | Hollis' death scene will hurt me severely to w... | 0 | 0.130 | 0 |
| 3 | starkissed | 1 | ahh ive alway want see rent love soundtrack | @LettyA ahh ive always wanted to see rent lov... | 0 | 0.661 | 1 |
| 4 | Ljelli3166 | 1 | blagh class tomorrow | blagh class at 8 tomorrow | 0 | 0.275 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 6416 | sstallard81992 | 1 | boyfriend tom birthday night | http://twitpic.com/2yf0y - Me and my boyfriend... | 1 | 0.861 | 1 |
| 6417 | maricelarios | 1 | morn slack two day twitter finish good run rea... | Morning! I have slacked for two days in twitte... | 1 | 0.795 | 1 |
| 6418 | Codepope | 1 | sweet altruism finest | @bensummers Isn't that sweet of them.... Altru... | 1 | 0.975 | 1 |
| 6419 | christyku | 1 | um milk father udder quot milk mother quot rin... | @jakrose Um, milk \*fathers\* don't have udders.... | 1 | 0.818 | 1 |


In [4]:
# making a dictionary of word-value pairs
# each row is a [word, value] pair; a word listed in both files keeps the value from neg
lst = pos.values.tolist() + neg.values.tolist()
dictionary = dict(lst)
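
To make the behaviour of this cell concrete, here is a minimal sketch using plain lists in place of `pos.values.tolist()` and `neg.values.tolist()`. The words and scores are made up for illustration; the real CSV contents are not shown in this notebook. The point to notice is that a word appearing in both lists ends up with the score from the second list, because `dict` keeps the last value seen for a repeated key.

```python
# Hypothetical rows, shaped like the output of DataFrame.values.tolist()
# on a two-column (word, score) table. Values are illustrative only.
toy_pos_rows = [["love", 0.9], ["sweet", 0.7], ["watch", 0.1]]
toy_neg_rows = [["hurt", -0.8], ["watch", -0.2]]

# Same construction as the cell above: stack the rows, then build a dict.
toy_dictionary = dict(toy_pos_rows + toy_neg_rows)

print(toy_dictionary["watch"])   # the negative-list score replaces the positive one
print(len(toy_dictionary))       # four distinct words
```

If overlapping words matter for the model, it may be worth checking for them before merging, for example with a set intersection of the two word columns.
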

In [5]:
# calculate sentence score
score = []

for text in df.TEXT:
    # sum the values of known words; unknown words contribute 0
    total = sum(dictionary.get(word, 0) * 10 for word in text.split())
    # 0 = negative total, 1 = otherwise (a total of exactly 0 also maps to 1)
    score.append(0 if total < 0 else 1)
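
The scoring rule in this cell can be sketched as a small standalone function. The helper name `sentence_score` and the toy scores are assumptions for illustration and are not part of the notebook. The sketch omits the `* 10` factor because a positive scale factor does not change the sign of the total, and only the sign is used.

```python
def sentence_score(text, word_scores):
    """Return 0 if the summed word scores are negative, otherwise 1."""
    total = sum(word_scores.get(word, 0) for word in text.split())
    return 0 if total < 0 else 1


# Hypothetical word scores, for illustration only.
toy_scores = {"love": 0.9, "sweet": 0.7, "hurt": -0.8, "death": -0.9}

print(sentence_score("love sweet hurt", toy_scores))        # positive total
print(sentence_score("death hurt love", toy_scores))        # negative total
print(sentence_score("blagh class tomorrow", toy_scores))   # no known words, total is 0
```

Under this rule a sentence with no scored words is labelled 1, the same as a positive sentence, which is worth keeping in mind when reading the `SENT_SCORE` column.
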

In [6]:
# creating a dataset
dataset = pd.DataFrame(
    list(zip(
        range(len(df)),
        df.USER.values.tolist(),
        df.ORIGINAL.values.tolist(),
        df.TEXT.values.tolist(),
        df.RATING.values.tolist(),
        df.OUTPUT.values.tolist(),
        df.SENTIMENT.values.tolist(),
        df.PRED1.values.tolist(),
        score)),
    columns = [ 'INDEX', 'USER', 'ORIGINAL', 'TEXT', 'RATING', 'OUTPUT', 'SENTIMENT', 'PRED1', 'SENT_SCORE'],
)
dataset.to_csv('../../db/11-final-table.csv', index=False)

In [7]:
dataset

| | INDEX | USER | ORIGINAL | TEXT | RATING | OUTPUT | SENTIMENT | PRED1 | SENT_SCORE |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | TLeC | @caregiving I couldn't bear to watch it. And ... | bear watch thought ua loss embarrass | 1 | 0.384 | 0 | 0 | 0 |
| 1 | 1 | robrobbierobert | @octolinz16 It it counts, idk why I did either... | count idk either never talk anymor | 1 | 0.137 | 0 | 0 | 0 |
| 2 | 2 | lovesongwriter | Hollis' death scene will hurt me severely to w... | holli death scene hurt sever watch film wri di... | 1 | 0.130 | 0 | 0 | 0 |
| 3 | 3 | starkissed | @LettyA ahh ive always wanted to see rent lov... | ahh ive alway want see rent love soundtrack | 1 | 0.661 | 0 | 1 | 0 |
| 4 | 4 | Ljelli3166 | blagh class at 8 tomorrow | blagh class tomorrow | 1 | 0.275 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 6416 | 6416 | sstallard81992 | http://twitpic.com/2yf0y - Me and my boyfriend... | boyfriend tom birthday night | 1 | 0.861 | 1 | 1 | 0 |
| 6417 | 6417 | maricelarios | Morning! I have slacked for two days in twitte... | morn slack two day twitter finish good run rea... | 1 | 0.795 | 1 | 1 | 0 |
| 6418 | 6418 | Codepope | @bensummers Isn't that sweet of them.... Altru... | sweet altruism finest | 1 | 0.975 | 1 | 1 | 1 |
| 6419 | 6419 | christyku | @jakrose Um, milk \*fathers\* don't have udders.... | um milk father udder quot milk mother quot rin... | 1 | 0.818 | 1 | 1 | 0 |


In [8]:
# create X and Y to train model
dataset = dataset[['RATING', 'OUTPUT', 'SENT_SCORE', 'PRED1', 'SENTIMENT']]
X = dataset[['OUTPUT', 'RATING', 'SENT_SCORE']]
Y = dataset['SENTIMENT']

In [9]:
# splitting dataset for training and testing
# no random_state is set, so the split and the reported scores vary between runs
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
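
If a repeatable split is wanted, one option is to pass `random_state` and `stratify` to `train_test_split`. The sketch below uses synthetic data, and the seed value 42 is an arbitrary assumption; it is not what the notebook ran, so the accuracy reported further down would not be reproduced exactly by it.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 features, balanced binary labels.
rng = np.random.default_rng(0)
X_toy = rng.random((100, 3))
y_toy = np.array([0, 1] * 50)

# random_state fixes the shuffle; stratify keeps the class ratio in both parts.
split_a = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy)
split_b = train_test_split(X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy)

X_tr, X_te, y_tr, y_te = split_a
print(X_tr.shape, X_te.shape)             # 80 training rows, 20 test rows
print(np.array_equal(X_te, split_b[1]))   # same seed gives the same test rows
print(int(y_te.sum()))                    # number of class-1 rows in the test part
```
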

In [10]:
# training model
clf = SVC()
clf.fit(X_train, Y_train)
pred = clf.predict(X_test)

In [11]:
# reporting test accuracy (%) and saving model
print(accuracy_score(Y_test, pred)*100)
dump(clf, '../classification-model/clf')

78.28793774319067


['../classification-model/clf']

In [12]:
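# reloading the saved model and predicting on the full dataset
# (X includes the training rows, so pred1 is not a held-out estimate)
classifier = load('../classification-model/clf')
pred1 = classifier.predict(X)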
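
The save-and-reload step can be checked with a small round trip: a model restored with `joblib.load` should give the same predictions as the one that was dumped. The sketch below uses a tiny synthetic dataset and a temporary directory, both of which are assumptions for illustration rather than the notebook's data or paths.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.svm import SVC

# Tiny synthetic dataset (not the notebook's data).
X_toy = np.array([[0.1, 0.0], [0.2, 0.1], [0.8, 0.9], [0.9, 1.0]])
y_toy = np.array([0, 0, 1, 1])

model = SVC().fit(X_toy, y_toy)

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "clf")
    dump(model, path)
    restored = load(path)

# The restored model should reproduce the original model's predictions.
same = np.array_equal(model.predict(X_toy), restored.predict(X_toy))
print(same)
```
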

In [14]:
# confusion_matrix was already imported in cell 1
from sklearn.metrics import classification_report

In [15]:
# per-class precision, recall and F1 on the held-out test set
print(classification_report(Y_test, pred))

              precision    recall  f1-score   support

           0       0.82      0.72      0.77       637
           1       0.75      0.85      0.80       648

    accuracy                           0.78      1285
   macro avg       0.79      0.78      0.78      1285
weighted avg       0.79      0.78      0.78      1285
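
The notebook imports `confusion_matrix` but never calls it. As a minimal sketch of how the precision and recall figures in a report like the one above relate to a confusion matrix, the example below uses a handful of made-up labels; the values are illustrative only and are not taken from this dataset.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical true labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_hat = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_hat)
tn, fp, fn, tp = cm.ravel()

precision_1 = tp / (tp + fp)   # share of predicted 1s that are truly 1
recall_1 = tp / (tp + fn)      # share of true 1s that were predicted 1

print(cm)
print(precision_1, precision_score(y_true, y_hat))
print(recall_1, recall_score(y_true, y_hat))
```

Applied to this notebook, the same call would be `confusion_matrix(Y_test, pred)`.
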

