8. Train a LinearSVC on a linearly separable dataset. Then train an SVC and a
SGDClassifier on the same dataset. See if you can get them to produce roughly
the same model.

In [2]:
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import load_iris

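iris = load_iris()
# Note: only setosa is linearly separable from the other two species, so the
# full three-class problem used here is not strictly linearly separable.
X, y = iris['data'], iris['target']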

In [3]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = scaler.fit_transform(X)

In [4]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, stratify=y)

In [5]:
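from sklearn.multiclass import OneVsOneClassifier

# LinearSVC and SGDClassifier use one-vs-rest (OvR) for multiclass problems by
# default, while SVC uses one-vs-one (OvO). Wrapping the first two in
# OneVsOneClassifier puts all three models on the same strategy.
# Observed here: about 0.866 accuracy with OvR versus about 0.966 with OvO.

svc_1 = OneVsOneClassifier(LinearSVC(loss='hinge', tol=0.001))  # OvR unless wrapped
svc_2 = SVC(kernel='linear', max_iter=1000)  # OvO by default
svc_3 = OneVsOneClassifier(SGDClassifier(loss='hinge'))  # OvR unless wrapped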

svc_1.fit(X_train, y_train)

In [6]:
from sklearn.metrics import accuracy_score

svc_2.fit(X_train, y_train)
svc_3.fit(X_train, y_train)

tuple(
    accuracy_score(y_test, clf.predict(X_test))
    for clf in (svc_1, svc_2, svc_3)
)

(0.9666666666666667, 0.9666666666666667, 0.9666666666666667)
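Equal test accuracy does not by itself show that the three estimators learned roughly the same model. A more direct check is to compare the fitted hyperplanes. The following is a minimal sketch, not part of the original solution: it assumes a binary, linearly separable subset (setosa vs. versicolor on the two petal features), an illustrative `C = 5`, and sets `alpha = 1 / (C * m)` so that the `SGDClassifier` regularization roughly corresponds to `C` in the other two models. `LinearSVC` also regularizes its intercept, so an exact match should not be expected.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC

iris = load_iris()
# Assumption: restrict to setosa (0) vs. versicolor (1) on petal length and
# petal width, a subset that is linearly separable.
mask = iris.target < 2
X_bin = StandardScaler().fit_transform(iris.data[mask][:, 2:])
y_bin = iris.target[mask]

C = 5.0                          # illustrative value, not tuned
alpha = 1.0 / (C * len(X_bin))   # corresponding regularization for SGD

models = {
    "LinearSVC": LinearSVC(loss="hinge", C=C, dual=True, max_iter=100_000,
                           random_state=42),
    "SVC": SVC(kernel="linear", C=C),
    "SGDClassifier": SGDClassifier(loss="hinge", alpha=alpha, random_state=42),
}

directions = {}
train_acc = {}
for name, clf in models.items():
    clf.fit(X_bin, y_bin)
    w, b = clf.coef_[0], clf.intercept_[0]
    directions[name] = w / np.linalg.norm(w)   # unit normal of the boundary
    train_acc[name] = clf.score(X_bin, y_bin)
    print(f"{name:14s} w={w} b={b:.3f}")

# Cosine similarity of each boundary's normal vector to the SVC one;
# values near 1 mean the hyperplanes are nearly parallel.
cosines = {name: float(d @ directions["SVC"]) for name, d in directions.items()}
print(cosines)
```

If the cosine similarities are close to 1 and the intercepts are comparable once the weights are normalized, the three decision boundaries nearly coincide. The SGD result varies somewhat with the learning-rate schedule and the random seed.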

9. Train an SVM classifier on the MNIST dataset. Since SVM classifiers are binary
classifiers, you will need to use one-versus-the-rest to classify all 10 digits. You
may want to tune the hyperparameters using small validation sets to speed up the
process. What accuracy can you reach?

In [7]:
from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
X, y = mnist

In [8]:
# Scale pixel intensities from [0, 255] to [0, 1].
X = X / 255

In [9]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True, stratify=y)

In [10]:
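# SGDClassifier with hinge loss is a linear SVM trained by stochastic gradient
# descent; it handles the 10 classes with one-vs-rest by default.
svc_mnist = SGDClassifier(loss='hinge')
svc_mnist.fit(X_train, y_train)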

In [11]:
y_pred = svc_mnist.predict(X_test)
accuracy_score(y_test, y_pred)

0.9161428571428571
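The exercise suggests tuning hyperparameters on small validation sets, which the cells above skip. One possible approach is a randomized search on a small training subset, followed by retraining the best configuration on the full training set. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's bundled 8x8 digits dataset as an offline stand-in for MNIST, an RBF-kernel `SVC` (which handles multiclass internally with one-vs-one rather than one-vs-rest), and search ranges that are guesses rather than recommendations.

```python
from scipy.stats import loguniform, uniform
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: the bundled 8x8 digits dataset stands in for MNIST so the
# sketch runs quickly and offline; its pixel values range from 0 to 16.
X_digits, y_digits = load_digits(return_X_y=True)
X_digits = X_digits / 16.0

X_tr, X_te, y_tr, y_te = train_test_split(
    X_digits, y_digits, test_size=0.2, stratify=y_digits, random_state=42
)

# Search ranges are illustrative guesses, not tuned recommendations.
param_distributions = {
    "C": uniform(1, 10),             # uniform on [1, 11]
    "gamma": loguniform(1e-2, 1.0),  # log-uniform on [0.01, 1]
}
search = RandomizedSearchCV(
    SVC(kernel="rbf"), param_distributions, n_iter=10, cv=3, random_state=42
)
# Tune on a small subset only, to keep the search cheap.
search.fit(X_tr[:500], y_tr[:500])
print(search.best_params_, search.best_score_)

# Retrain the best configuration on the full training set, then evaluate once.
best_svc = search.best_estimator_.fit(X_tr, y_tr)
test_acc = accuracy_score(y_te, best_svc.predict(X_te))
print(test_acc)
```

On MNIST itself the same pattern applies, with a larger subset and more search iterations as time allows. Kernel SVMs scale poorly with the number of training samples, so the final fit on the full training set can be slow.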

10. Train an SVM regressor on the California housing dataset.

In [24]:
from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing()
X, y = housing['data'], housing['target']

In [17]:
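# Run this after loading X: the execution counts (17, then 24) suggest it ran
# before the reload, so the RMSE below may reflect unscaled features.
scaler = StandardScaler()
X = scaler.fit_transform(X)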

In [25]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=True)

In [31]:
from sklearn.svm import SVR

svr = SVR()
svr.fit(X_train, y_train)

In [34]:
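# root_mean_squared_error requires scikit-learn 1.4 or later.
from sklearn.metrics import root_mean_squared_error

y_pred = svr.predict(X_test)
# Mean target (in units of $100,000) shown next to the test RMSE for scale.
print(f'{y.mean()} +- {root_mean_squared_error(y_test, y_pred)}')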

2.068558169089147 +- 1.155414893040421
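An RBF-kernel SVR is sensitive to feature scales, and placing the scaler inside a pipeline ensures it is fit on the training split only and applied on every run. The following is a rough sketch rather than the original solution: it assumes synthetic data with very different feature scales as an offline stand-in for the housing features, and `holdout_rmse` is a hypothetical helper introduced here for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Assumption: synthetic features on very different scales stand in for the
# housing data so the sketch runs offline.
rng = np.random.default_rng(42)
X_syn = rng.normal(size=(400, 3)) * np.array([1.0, 100.0, 10_000.0])
y_syn = X_syn[:, 0] + X_syn[:, 1] / 100.0 + rng.normal(scale=0.1, size=400)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42
)


def holdout_rmse(model):
    """Fit on the training split and return the RMSE on the held-out split."""
    model.fit(X_tr, y_tr)
    return float(np.sqrt(mean_squared_error(y_te, model.predict(X_te))))


# Same default SVR, with and without standardization in front of it.
rmse_unscaled = holdout_rmse(SVR())
rmse_scaled = holdout_rmse(make_pipeline(StandardScaler(), SVR()))
print(f"unscaled: {rmse_unscaled:.3f}, scaled: {rmse_scaled:.3f}")
```

With the housing data, the same pipeline could be passed to `RandomizedSearchCV` to tune `C`, `gamma`, and `epsilon`, with the scaler refit inside each cross-validation fold.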
