# Lawnmower assignment

We will predict riding mower ownership using the riding mowers dataset.

## 1. Setup

In [1]:
# Common imports
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.svm import SVC
from sklearn import preprocessing

np.random.seed(1)

## 2. Load the data

We will use the riding mowers data in `RidingMowers.csv`: 24 rows with two features, `Income` and `Lot_Size`, and the target column `Ownership`.

In [2]:
# Uncomment the following snippet of code to debug problems with finding the .csv file path.
# It prints the current working directory, which is where the relative path is resolved from.
#import os
#print(os.getcwd())

In [14]:
from sklearn.model_selection import train_test_split

df = pd.read_csv('RidingMowers.csv')
print(df.shape)
# Show the first five rows of the data
print(df.head())

# Encode the Ownership labels as integers (classes are numbered in sorted order)
labelencoder = preprocessing.LabelEncoder()
df['Ownership'] = labelencoder.fit_transform(df['Ownership'])

# Features: every column except the target (equivalently df[['Income', 'Lot_Size']])
X = df.drop(['Ownership'], axis=1)
y = df['Ownership']

# test_size=0.66 holds out about two-thirds of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    random_state=104,
                                                    test_size=0.66,
                                                    shuffle=True)

(24, 3)
   Income  Lot_Size Ownership
0    60.0      18.4     Owner
1    85.5      16.8     Owner
2    64.8      21.6     Owner
3    61.5      20.8     Owner
4    87.0      23.6     Owner

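`LabelEncoder` numbers the classes in sorted order of their names, which is why rows 0 and 2 ("Owner" in `df.head()`) show up as 1 in `y_test`. The non-owner label does not appear in the first five rows, so the sketch below assumes it is spelled "Nonowner"; the labels are toy values, not the notebook's data.

```python
from sklearn.preprocessing import LabelEncoder

# Toy labels; assumes the non-owner class is spelled "Nonowner"
labels = ["Owner", "Nonowner", "Owner", "Nonowner"]

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, so each label's position in it is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)          # {'Nonowner': 0, 'Owner': 1}
print(codes.tolist())   # [1, 0, 1, 0]

# inverse_transform turns codes back into the original strings
print(encoder.inverse_transform([0, 1]).tolist())  # ['Nonowner', 'Owner']
```

Keeping the fitted `labelencoder` around lets us translate model predictions back to readable labels later.
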

In [15]:
y_test

11    1
21    0
18    0
23    0
22    0
10    1
20    0
2     1
15    0
19    0
0     1
17    0
12    0
13    0
9     1
7     1
Name: Ownership, dtype: int64

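The `y_test` output above has 16 rows. With a float `test_size`, scikit-learn rounds the test count up, so 0.66 of 24 rows gives 16 test rows and leaves only 8 rows to train each model on. The sketch below checks that arithmetic on placeholder data of the same length (an assumption standing in for `X`).

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_rows = 24       # df.shape[0] in this notebook
test_size = 0.66

# For a float test_size, the test count is rounded up
n_test = math.ceil(test_size * n_rows)
n_train = n_rows - n_test
print(n_test, n_train)  # 16 8

# Confirm with train_test_split on placeholder data of the same length
X_dummy = np.arange(n_rows).reshape(-1, 1)
X_tr, X_te = train_test_split(X_dummy, test_size=test_size,
                              random_state=104, shuffle=True)
print(len(X_tr), len(X_te))  # 8 16
```
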
## 3. Model the data

First, we will create a dataframe to hold all the results of our models.

In [16]:
performance = pd.DataFrame({"model": [], "Accuracy": [], "Precision": [], "Recall": [], "F1": []})

### 3.1 Fit and test Support Vector Machine (SVM) models

In [17]:
#LINEAR KERNEL
svm_lin_model = SVC(kernel="linear",probability=True)
_ = svm_lin_model.fit(X_train, np.ravel(y_train))
model_preds = svm_lin_model.predict(X_test)
# confusion_matrix rows are true labels and columns are predictions: [[TN, FP], [FN, TP]]
c_matrix = confusion_matrix(y_test, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"linear svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])
performance

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | linear svm | 0.8125 | 0.714286 | 0.833333 | 0.769231 |

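The `y_test` output contains 6 owners (1) and 10 non-owners (0). Working back from the reported scores, the linear SVM's confusion matrix must be TP = 5, FN = 1, FP = 2, TN = 8, and the formulas in the cell reproduce each score:

$$
\begin{aligned}
\text{Accuracy} &= \frac{TP+TN}{TP+TN+FP+FN} = \frac{5+8}{16} = 0.8125 \\
\text{Precision} &= \frac{TP}{TP+FP} = \frac{5}{5+2} \approx 0.714286 \\
\text{Recall} &= \frac{TP}{TP+FN} = \frac{5}{5+1} \approx 0.833333 \\
F_1 &= \frac{2\,TP}{2\,TP+FP+FN} = \frac{10}{10+2+1} \approx 0.769231
\end{aligned}
$$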

In [18]:
#RBF KERNEL
svm_rbf_model = SVC(kernel="rbf", C=10, gamma='scale',probability=True)
_ = svm_rbf_model.fit(X_train, np.ravel(y_train))
model_preds = svm_rbf_model.predict(X_test)
c_matrix = confusion_matrix(y_test, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"rbf svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

In [19]:
#Polynomial kernel
svm_poly_model = SVC(kernel="poly", degree=3,probability=True, coef0=1, C=10)
_ = svm_poly_model.fit(X_train, np.ravel(y_train))
model_preds = svm_poly_model.predict(X_test)
c_matrix = confusion_matrix(y_test, model_preds)
TP = c_matrix[1][1]
TN = c_matrix[0][0]
FP = c_matrix[0][1]
FN = c_matrix[1][0]
performance = pd.concat([performance, pd.DataFrame({'model':"poly svm", 
                                                    'Accuracy': [(TP+TN)/(TP+TN+FP+FN)], 
                                                    'Precision': [TP/(TP+FP)], 
                                                    'Recall': [TP/(TP+FN)], 
                                                    'F1': [2*TP/(2*TP+FP+FN)]
                                                     }, index=[0])])

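The three cells above repeat the same confusion-matrix and metric code. One way to avoid that is a small helper; `evaluate_model` below is a hypothetical name, not part of scikit-learn, and it is exercised on toy labels rather than the notebook's data. It also checks the hand-written formulas against scikit-learn's own metric functions.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)


def evaluate_model(name, y_true, y_pred):
    """Hypothetical helper: one-row DataFrame with the `performance` columns.

    Assumes a binary target where 1 is the positive class.
    """
    # labels=[0, 1] fixes the 2x2 layout even if a class is never predicted;
    # ravel() then yields the cells in the order TN, FP, FN, TP
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return pd.DataFrame({
        "model": [name],
        "Accuracy": [(tp + tn) / (tp + tn + fp + fn)],
        "Precision": [tp / (tp + fp)],
        "Recall": [tp / (tp + fn)],
        "F1": [2 * tp / (2 * tp + fp + fn)],
    })


# Toy labels (not the notebook's data): TP=3, FN=1, FP=2, TN=2
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 1, 0])

row = evaluate_model("toy model", y_true, y_pred)
print(row)

# The hand-written formulas agree with scikit-learn's metric functions
assert np.isclose(row.loc[0, "Accuracy"], accuracy_score(y_true, y_pred))
assert np.isclose(row.loc[0, "Precision"], precision_score(y_true, y_pred))
assert np.isclose(row.loc[0, "Recall"], recall_score(y_true, y_pred))
assert np.isclose(row.loc[0, "F1"], f1_score(y_true, y_pred))
```

In the notebook, each cell could then end with `performance = pd.concat([performance, evaluate_model("linear svm", y_test, model_preds)], ignore_index=True)`, where `ignore_index=True` gives each row its own index instead of repeating 0.
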
## 4. Summary

Sorted by accuracy (`sort_values` sorts in ascending order, so the best model is listed last):

In [20]:
performance.sort_values(by=['Accuracy'])

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | rbf svm | 0.375 | 0.375 | 1.0 | 0.545455 |
| 0 | poly svm | 0.625 | 0.5 | 1.0 | 0.666667 |
| 0 | linear svm | 0.8125 | 0.714286 | 0.833333 | 0.769231 |

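To list the best model first instead, `sort_values` accepts `ascending=False`. The sketch below rebuilds a small frame from the accuracy column of the table above so that it runs on its own.

```python
import pandas as pd

# Accuracy scores copied from the performance table above
performance = pd.DataFrame({
    "model": ["linear svm", "rbf svm", "poly svm"],
    "Accuracy": [0.8125, 0.375, 0.625],
})

# ascending=False puts the highest accuracy at the top
ranked = performance.sort_values(by="Accuracy", ascending=False)
print(ranked["model"].tolist())  # ['linear svm', 'poly svm', 'rbf svm']
```
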

Sorted by precision (ascending, so the best model is listed last):

In [21]:
performance.sort_values(by=['Precision'])

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | rbf svm | 0.375 | 0.375 | 1.0 | 0.545455 |
| 0 | poly svm | 0.625 | 0.5 | 1.0 | 0.666667 |
| 0 | linear svm | 0.8125 | 0.714286 | 0.833333 | 0.769231 |


Sorted by recall (ascending, so the best models are listed last):

In [22]:
performance.sort_values(by=['Recall'])

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | linear svm | 0.8125 | 0.714286 | 0.833333 | 0.769231 |
| 0 | rbf svm | 0.375 | 0.375 | 1.0 | 0.545455 |
| 0 | poly svm | 0.625 | 0.5 | 1.0 | 0.666667 |


Sorted by F1 (ascending, so the best model is listed last):

In [23]:
performance.sort_values(by=['F1'])

|   | model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|---|
| 0 | rbf svm | 0.375 | 0.375 | 1.0 | 0.545455 |
| 0 | poly svm | 0.625 | 0.5 | 1.0 | 0.666667 |
| 0 | linear svm | 0.8125 | 0.714286 | 0.833333 | 0.769231 |


### So which model is the 'best' and the one you wish to choose?

This depends heavily on the profit or loss associated with each FP, FN, TP and TN. We will discuss this in the next class.

Across the tables sorted by accuracy, precision, recall and F1, the linear SVM has the highest accuracy (0.8125), precision (0.714286) and F1 (0.769231). The RBF and polynomial SVMs reach a perfect recall of 1.0, but only by labelling most test rows as owners: working back from its scores, the RBF SVM predicts "owner" for all 16 test rows. For a full analysis, we need to weigh the TP, TN, FP and FN counts against what each outcome costs or earns.

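A minimal sketch of that weighing: multiply each confusion-matrix count by a profit per outcome and add them up. The counts below are derived from the reported precision and recall on the 16 test rows (6 owners, 10 non-owners); the payoff values are made up purely for illustration and are not from any real business case.

```python
# Confusion-matrix counts on the 16 test rows, derived from the reported scores
counts = {
    "linear svm": {"TP": 5, "FN": 1, "FP": 2, "TN": 8},
    "rbf svm": {"TP": 6, "FN": 0, "FP": 10, "TN": 0},
    "poly svm": {"TP": 6, "FN": 0, "FP": 6, "TN": 4},
}

# Hypothetical profit per outcome under two made-up scenarios
scenarios = {
    "missed owners are expensive": {"TP": 100, "FN": -100, "FP": -20, "TN": 0},
    "false alarms are expensive": {"TP": 100, "FN": -100, "FP": -60, "TN": 0},
}


def total_profit(model_counts, payoff):
    """Sum over outcomes of (number of outcomes) x (profit per outcome)."""
    return sum(model_counts[outcome] * payoff[outcome] for outcome in payoff)


best_by_scenario = {}
for scenario, payoff in scenarios.items():
    profits = {name: total_profit(c, payoff) for name, c in counts.items()}
    best_by_scenario[scenario] = max(profits, key=profits.get)
    print(scenario, profits, "best:", best_by_scenario[scenario])
```

Under the first set of payoffs the polynomial SVM comes out ahead (480 versus 360 for the linear SVM), while under the second the linear SVM wins (280 versus 240), so the ranking can flip depending on the costs attached to each kind of error.
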
In [25]:
import pickle

# Save the chosen (linear SVM) model in pickle format; the with-block closes the file
with open('/Users/yeswanthkumarlekkala/Desktop/SEM-2/DSP/25 feb/RidingMowersassignment_pickle.pkl', 'wb') as f:
    pickle.dump(svm_lin_model, f)

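A saved model is read back with `pickle.load` on a file opened in binary read mode. The sketch below round-trips a linear SVC trained on toy two-feature data (not the riding mowers data) through a temporary file and checks that the restored model predicts the same labels; the file name `svm_model.pkl` is hypothetical.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.svm import SVC

# Toy two-feature data with two well-separated classes
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]])
y = np.array([0, 0, 0, 1, 1, 1])

model = SVC(kernel="linear").fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_model.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored model behaves exactly like the original
print(np.array_equal(model.predict(X), restored.predict(X)))  # True
print(restored.predict([[2.0, 2.0], [6.0, 6.5]]).tolist())    # [0, 1]
```
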
Since the linear SVM has the highest accuracy (0.8125), as well as the highest precision and F1, we conclude that it is the best of the three models.
