### Comparing with sklearn's SVC model

In this demo, we fit sklearn's SVC model with an RBF kernel and compare it to the implementation
in this repo using the same kernel.
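
For reference, sklearn defines the RBF kernel as K(x, z) = exp(-gamma * ||x - z||^2). The following is a minimal plain-Python sketch of that formula. It assumes the `rbf` option of this repo's model follows the same definition, which the notebook does not confirm. The function name `rbf_kernel_value` and the toy vectors are illustrative only.

```python
import math


def rbf_kernel_value(x, z, gamma):
    """RBF kernel between two vectors: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


# Toy vectors (not from the vowel dataset); their squared distance is 2.0.
a = [1.0, 0.0]
b = [0.0, 1.0]

k_same = rbf_kernel_value(a, a, gamma=0.1)   # identical points give 1.0
k_small = rbf_kernel_value(a, b, gamma=0.1)  # exp(-0.2)
k_large = rbf_kernel_value(a, b, gamma=1.0)  # exp(-2.0)

print(k_same, k_small, k_large)
```

A larger `gamma` makes the kernel value fall off faster with distance, so each training point influences a smaller neighbourhood.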

The dataset used is the vowel dataset from https://statweb.stanford.edu/~tibs/ElemStatLearn/.

In [1]:
import pandas as pd
import numpy as np

train = pd.read_csv('https://web.stanford.edu/~hastie/ElemStatLearn/datasets/vowel.train', index_col=0)
test = pd.read_csv('https://web.stanford.edu/~hastie/ElemStatLearn/datasets/vowel.test', index_col=0)
x_train = train.drop('y', axis=1).values
y_train = train['y'].values
x_test = test.drop('y', axis=1).values
y_test = test['y'].values

display(train.head())
display(test.head())

| row.names | y | x.1 | x.2 | x.3 | x.4 | x.5 | x.6 | x.7 | x.8 | x.9 | x.10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | -3.639 | 0.418 | -0.67 | 1.779 | -0.168 | 1.627 | -0.388 | 0.529 | -0.874 | -0.814 |
| 2 | 2 | -3.327 | 0.496 | -0.694 | 1.365 | -0.265 | 1.933 | -0.363 | 0.51 | -0.621 | -0.488 |
| 3 | 3 | -2.12 | 0.894 | -1.576 | 0.147 | -0.707 | 1.559 | -0.579 | 0.676 | -0.809 | -0.049 |
| 4 | 4 | -2.287 | 1.809 | -1.498 | 1.012 | -1.053 | 1.06 | -0.567 | 0.235 | -0.091 | -0.795 |
| 5 | 5 | -2.598 | 1.938 | -0.846 | 1.062 | -1.633 | 0.764 | 0.394 | -0.15 | 0.277 | -0.396 |


| row.names | y | x.1 | x.2 | x.3 | x.4 | x.5 | x.6 | x.7 | x.8 | x.9 | x.10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | -1.149 | -0.904 | -1.988 | 0.739 | -0.06 | 1.206 | 0.864 | 1.196 | -0.3 | -0.467 |
| 2 | 2 | -2.613 | -0.092 | -0.54 | 0.484 | 0.389 | 1.741 | 0.198 | 0.257 | -0.375 | -0.604 |
| 3 | 3 | -2.505 | 0.632 | -0.593 | 0.304 | 0.496 | 0.824 | -0.162 | 0.181 | -0.363 | -0.764 |
| 4 | 4 | -1.768 | 1.769 | -1.142 | -0.739 | -0.086 | 0.12 | -0.23 | 0.217 | -0.009 | -0.279 |
| 5 | 5 | -2.671 | 3.155 | -0.514 | 0.133 | -0.964 | 0.234 | -0.071 | 1.192 | 0.254 | -0.471 |


##### Fitting a sklearn kernel SVM

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.multiclass import OneVsOneClassifier

scaler = StandardScaler().fit(x_train)
x_train_scaled = scaler.transform(x_train)
x_test_scaled = scaler.transform(x_test)

model = SVC(kernel='rbf', gamma=0.1)
%time model.fit(x_train_scaled, y_train)

CPU times: user 10.1 ms, sys: 251 µs, total: 10.4 ms
Wall time: 10.8 ms


SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=0.1, kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
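
The cell above fits `StandardScaler` on the training set only and reuses the resulting statistics to transform the test set. The following dependency-free sketch illustrates that idea on toy data. The helper names `fit_scaler` and `transform` are hypothetical and are not part of sklearn or this repo.

```python
import statistics


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    # StandardScaler uses the population std (ddof=0); a zero std is
    # replaced by 1.0 here so constant columns do not divide by zero.
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return means, stds


def transform(rows, means, stds):
    """Standardize each value with the given per-column statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Toy data, not the vowel dataset.
train_rows = [[1.0, 10.0], [3.0, 30.0]]
test_rows = [[2.0, 40.0]]

means, stds = fit_scaler(train_rows)              # statistics from train only
train_scaled = transform(train_rows, means, stds)
test_scaled = transform(test_rows, means, stds)   # reuse the train statistics

print(train_scaled, test_scaled)
```

Reusing the training statistics keeps information about the test set out of the preprocessing step.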

In [3]:
from sklearn.metrics import accuracy_score

yhat = model.predict(x_test_scaled)
print('Accuracy on test', accuracy_score(y_test, yhat))  # signature is (y_true, y_pred)

Accuracy on test 0.6125541125541125


##### Fitting the model from this library

In [4]:
from lib.model import Model
from sklearn.metrics import accuracy_score

options = dict(
    l2_lambda = 1,
    objective = 'huber_hinge',
    kernel = dict(
        fn = 'rbf',
        gamma = 1
    )
)

model = Model(options)
%time model.fit(x_train, y_train)

100%|██████████| 55/55 [00:00<00:00, 113.30it/s]

CPU times: user 475 ms, sys: 7.98 ms, total: 483 ms
Wall time: 487 ms
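
The `huber_hinge` objective is not defined in this notebook. One common form of a Huberized (smoothed) hinge loss is sketched below, next to the plain hinge for comparison. The smoothing width `h`, its default value, and the exact formula are assumptions, and they may differ from what `lib.model` actually implements.

```python
def hinge(margin):
    """Plain hinge loss on the margin t = y * f(x), with y in {-1, +1}."""
    return max(0.0, 1.0 - margin)


def huber_hinge(margin, h=0.5):
    """Huberized hinge: quadratic within h of the kink at t = 1, hinge elsewhere.

    Assumed form, not taken from lib.model.
    """
    if margin >= 1.0 + h:
        return 0.0                           # confidently correct: no loss
    if margin <= 1.0 - h:
        return 1.0 - margin                  # linear region, same as plain hinge
    return (1.0 + h - margin) ** 2 / (4.0 * h)  # smooth quadratic bridge


for t in (-1.0, 0.5, 1.0, 1.5, 2.0):
    print(t, hinge(t), huber_hinge(t))
```

Unlike the plain hinge, this smoothed version is differentiable everywhere, which is convenient for gradient-based solvers.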





In [5]:
yhat = model.predict(x_test)
print('Accuracy on test', accuracy_score(y_test, yhat))  # signature is (y_true, y_pred)

Accuracy on test 0.6147186147186147


The test accuracies are nearly identical (0.613 for sklearn versus 0.615 for this library), but sklearn's model fits much faster: about 11 ms versus 487 ms of wall time.
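
To put the reported numbers in perspective, the short calculation below converts the two accuracies into counts of correctly classified test examples and computes the ratio of the wall times. The accuracies and timings are copied from the outputs above. The test-set size `n_test = 462` is an assumption about `vowel.test` that the notebook does not print.

```python
# Values copied from the notebook outputs above.
sklearn_acc = 0.6125541125541125
lib_acc = 0.6147186147186147
sklearn_wall_ms = 10.8
lib_wall_ms = 487.0

# Assumption: vowel.test contains 462 rows (not shown in the notebook).
n_test = 462

sklearn_correct = round(sklearn_acc * n_test)
lib_correct = round(lib_acc * n_test)
slowdown = lib_wall_ms / sklearn_wall_ms

print('sklearn correct:', sklearn_correct)
print('library correct:', lib_correct)
print('wall-time ratio: %.1f' % slowdown)
```

Under that assumption, the accuracy gap corresponds to a single test example, while the difference in fitting time is well over an order of magnitude.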