# Linear SVC Assignment

In [2]:
import numpy as np
import pandas as pd

from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

### Import the admissions data set (admissions.csv).

In [12]:
data = pd.read_csv('https://tf-assets-prod.s3.amazonaws.com/tf-curric/data-science/admissions.csv')
data.head()

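| Unnamed: 0 | GRE | TOEFL | SchoolRank | SOP | LOR | GPA | Research | Admitted |
|---|---|---|---|---|---|---|---|---|
| 0 | 337 | 118 | 4 | 4.5 | 4.5 | 9.65 | 1 | 1 |
| 1 | 324 | 107 | 4 | 4.0 | 4.5 | 8.87 | 1 | 1 |
| 2 | 316 | 104 | 3 | 3.0 | 3.5 | 8.0 | 1 | 1 |
| 3 | 322 | 110 | 3 | 3.5 | 2.5 | 8.67 | 1 | 1 |
| 4 | 314 | 103 | 2 | 2.0 | 3.0 | 8.21 | 0 | 0 |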


### Split the data into training and test sets, with the test set comprising 30% of the data. Use `'Admitted'` as the target.

In [13]:
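# Features are every column except the target.
# Note: this keeps 'Unnamed: 0', the row index saved in the CSV, as a feature.
X = data.drop('Admitted', axis=1)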
y = data['Admitted']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
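The split above is unseeded, so the scores change on every run, and `X` still carries the `Unnamed: 0` index column. A minimal sketch of one possible alternative follows, assuming a small synthetic frame (`demo`, with made-up values) in place of the real CSV. The `random_state=42` seed and the `stratify` option are illustrative choices, not part of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for admissions.csv; the values are made up for illustration.
rng = np.random.default_rng(0)
n = 100
demo = pd.DataFrame({
    "Unnamed: 0": np.arange(n),  # leftover row index from the CSV export
    "GRE": rng.integers(290, 341, size=n),
    "GPA": rng.uniform(6.5, 10.0, size=n).round(2),
    "Admitted": rng.integers(0, 2, size=n),
})

# Drop the target and the leftover index column, which is just a row counter.
X_demo = demo.drop(columns=["Admitted", "Unnamed: 0"])
y_demo = demo["Admitted"]

# A fixed seed makes the split repeatable; stratify keeps the class balance
# roughly equal in the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)
print(X_tr.shape, X_te.shape)
```

With a fixed seed, differences between models reflect the model settings rather than a different random split.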

### Generate an SVC model with a linear kernel. Set the regularization parameter `C` to 10. Check the scores for both the train and test sets.

In [14]:
svc = SVC(kernel='linear', C=10)
fit = svc.fit(X_train, y_train)
print('train score for c=10:', svc.score(X_train, y_train))
print('test score for c=10:', svc.score(X_test, y_test))

train score for c=10: 0.8642857142857143
test score for c=10: 0.8916666666666667


### Choose some other values for C and show the difference between the scores for the train and test sets.

In [15]:
# Refit the same linear SVC for several values of C and report both scores.
for C in (100, 1, 0.1):
    svc = SVC(kernel='linear', C=C)
    fit = svc.fit(X_train, y_train)
    print(f'train score for c={C}:', svc.score(X_train, y_train))
    print(f'test score for c={C}:', svc.score(X_test, y_test))
    print()

train score for c=100: 0.8535714285714285
test score for c=100: 0.8916666666666667

train score for c=1: 0.8678571428571429
test score for c=1: 0.8833333333333333

train score for c=0.1: 0.8428571428571429
test score for c=0.1: 0.85
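To compare the C values directly, it can help to record the train-minus-test gap for each one. The sketch below is one way to do that. It assumes synthetic data from `make_classification` in place of the admissions features, and the names `X_syn`, `y_syn`, and `gaps` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary data standing in for the admissions features (an assumption).
X_syn, y_syn = make_classification(n_samples=300, n_features=6, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0
)

gaps = {}
for C in (0.1, 1, 10, 100):
    model = SVC(kernel="linear", C=C).fit(X_tr, y_tr)
    train_acc = model.score(X_tr, y_tr)
    test_acc = model.score(X_te, y_te)
    # A positive gap means the model scores better on data it was fit on.
    gaps[C] = train_acc - test_acc
    print(f"C={C}: train={train_acc:.3f} test={test_acc:.3f} gap={gaps[C]:+.3f}")
```

A gap that grows as C increases would suggest overfitting. In the scores reported above the test score is at or above the train score for every C, so there is no sign of that on this split.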


### What if we switched up the target variable? Let's assume that we know whether a student was admitted, and try to predict what their SchoolRank was.

Create an SVC model with a linear kernel with the SchoolRank field as the target variable. Report both the train and the test scores.

In [17]:
X = data.drop('SchoolRank', axis=1)
y = data['SchoolRank']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

svc = SVC(kernel='linear', C=1)
fit = svc.fit(X_train, y_train)
print('train score for c=1:', svc.score(X_train, y_train))
print('test score for c=1:', svc.score(X_test, y_test))

train score for c=1: 0.625
test score for c=1: 0.5166666666666667
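Moving from a binary target to five ranks changes what the model has to fit. scikit-learn's `SVC` handles multi-class targets with a one-vs-one scheme, so five classes mean 5 × 4 / 2 = 10 pairwise linear separators. The sketch below shows this on synthetic five-class data. The data and the names `X_multi`, `y_multi`, and `pairwise` are assumptions for illustration, not the admissions set.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic five-class data; a stand-in for the SchoolRank problem.
X_multi, y_multi = make_classification(
    n_samples=250,
    n_features=8,
    n_informative=4,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# decision_function_shape='ovo' exposes the raw one-vs-one decision values.
model = SVC(kernel="linear", C=1, decision_function_shape="ovo")
model.fit(X_multi, y_multi)

n_classes = len(model.classes_)
pairwise = model.decision_function(X_multi[:3])

# One column per pair of classes, and one weight vector per pair.
print(n_classes, pairwise.shape, model.coef_.shape)
```

Each of the ten separators only has to distinguish two ranks, but the final prediction depends on all of them agreeing, which is one reason accuracy tends to drop as classes are added.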


### Show confusion matrices for the training and test sets, and a classification report for the test set. What trends do you notice?

In [23]:
train_pred = fit.predict(X_train)
test_pred = fit.predict(X_test)
print(confusion_matrix(y_train, train_pred), '\n')
print(confusion_matrix(y_test, test_pred), '\n')
print(classification_report(y_test, test_pred))

[[ 7  8  0  0  0]
 [ 6 38 23  1  1]
 [ 0 15 86  1  2]
 [ 0  2 17 12 18]
 [ 0  0  4  7 32]] 

[[ 2  8  1  0  0]
 [ 4 18 15  1  0]
 [ 0  5 24  0  0]
 [ 0  2  7  8  8]
 [ 0  0  4  3 10]] 

              precision    recall  f1-score   support

           1       0.33      0.18      0.24        11
           2       0.55      0.47      0.51        38
           3       0.47      0.83      0.60        29
           4       0.67      0.32      0.43        25
           5       0.56      0.59      0.57        17

    accuracy                           0.52       120
   macro avg       0.51      0.48      0.47       120
weighted avg       0.53      0.52      0.50       120



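Test accuracy falls from roughly 0.85 to 0.89 in the binary `Admitted` models to 0.52 here, and precision and recall fall with it. That makes sense: with five classes there are more decision boundaries to fit, and neighboring ranks overlap. In both confusion matrices most errors sit right next to the diagonal (50 of the 58 test misclassifications are off by exactly one rank), so the model usually confuses a school with an adjacent rank rather than a distant one. Rank 3 absorbs many of those mistakes (recall 0.83 but precision 0.47), while rank 1, with only 11 test samples, is rarely recovered (recall 0.18). On this split the gap between the train score (0.625) and the test score (0.517) is also wider than it was for the binary models.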
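The trends in the test confusion matrix can be checked numerically. The sketch below copies that matrix from the output above and recomputes accuracy, per-class recall and precision, and the number of errors that land exactly one rank away from the true rank. The variable names are hypothetical.

```python
import numpy as np

# Test-set confusion matrix copied from the output above
# (rows = true SchoolRank 1..5, columns = predicted SchoolRank 1..5).
cm = np.array([
    [2, 8, 1, 0, 0],
    [4, 18, 15, 1, 0],
    [0, 5, 24, 0, 0],
    [0, 2, 7, 8, 8],
    [0, 0, 4, 3, 10],
])

accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)     # correct / all samples of the true class
precision = np.diag(cm) / cm.sum(axis=0)  # correct / all predictions of that class

errors = cm.sum() - np.trace(cm)
# Errors exactly one rank away sit on the first off-diagonals.
adjacent = np.trace(cm, offset=1) + np.trace(cm, offset=-1)

print(f"accuracy: {accuracy:.2f}")
print("recall:   ", recall.round(2))
print("precision:", precision.round(2))
print(f"adjacent-rank errors: {adjacent} of {errors}")
```

The recall and precision values reproduce the classification report, which confirms that the rows of the matrix are the true ranks.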