In [None]:
%load_ext nb_black

In [None]:
import numpy as np
import pandas as pd

from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split, GridSearchCV
from mlxtend.plotting import plot_decision_regions

import seaborn as sns
import matplotlib.pyplot as plt

%matplotlib inline

X, y = make_moons(n_samples=5000, noise=0.3)

Build a dataframe from our `X` and `y` components.  Name the columns `x1`, `x2`, and `y`.
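One possible way to build the dataframe is sketched below. It regenerates a small sample so that it runs on its own; the names `X_demo`, `y_demo`, and the fixed `random_state` are assumptions for illustration and are not part of the notebook.

```python
import pandas as pd
from sklearn.datasets import make_moons

# Assumption: a smaller, seeded sample so the sketch is quick and reproducible.
X_demo, y_demo = make_moons(n_samples=200, noise=0.3, random_state=0)

# The two feature columns come from X; the labels become a third column.
df = pd.DataFrame(X_demo, columns=["x1", "x2"])
df["y"] = y_demo

print(df.head())
```

In the notebook itself you would pass the original `X` and `y` instead of the demo arrays.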

Make a scatterplot of `x1` vs. `x2`, colored by `y`.
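A minimal sketch of such a scatterplot with `seaborn`, assuming a dataframe shaped like the one described above. The demo data, the `alpha` value, and the title are illustrative choices only.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Assumption: a small seeded sample, rebuilt here so the block runs on its own.
X_demo, y_demo = make_moons(n_samples=200, noise=0.3, random_state=0)
df = pd.DataFrame(X_demo, columns=["x1", "x2"])
df["y"] = y_demo

# `hue` colors each point by its class label.
ax = sns.scatterplot(data=df, x="x1", y="x2", hue="y", alpha=0.5)
ax.set_title("make_moons sample colored by class")
plt.show()
```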

Perform a train/test split (use the original `X` and `y` for simplicity)
* Use 20% of the data for testing
* Stratify the split by your class labels to ensure equal proportions of both labels in train/test

In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
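To see what `stratify` buys us, the sketch below compares the class proportions in the two splits. It uses its own seeded demo data (`X_demo`, `y_demo`, and the `_tr`/`_te` names are assumptions) so it does not overwrite the notebook's variables.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

# Assumption: seeded demo data, independent of the notebook's X and y.
X_demo, y_demo = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Fraction of each class label in the train and test splits.
train_props = np.bincount(y_tr) / len(y_tr)
test_props = np.bincount(y_te) / len(y_te)

print(f"Train class proportions: {train_props}")
print(f"Test class proportions:  {test_props}")
```

With stratification the two sets of proportions should match closely; without it they can drift apart by chance.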

## Linear SVM

* Build an SVM classifier with a linear kernel
* Print the train and test accuracy

* Plot the decision boundary
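One way the first two bullets might look is sketched here on a small seeded sample. The names `linear_svm`, `X_tr`, `X_te`, and so on are assumptions chosen to avoid clobbering the notebook's own variables; plotting is left to the cell that follows.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: seeded demo data so the sketch runs on its own.
X_demo, y_demo = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# A linear kernel can only draw a straight decision boundary.
linear_svm = SVC(kernel="linear")
linear_svm.fit(X_tr, y_tr)

train_acc = linear_svm.score(X_tr, y_tr)
test_acc = linear_svm.score(X_te, y_te)
print(f"Train score: {train_acc}")
print(f"Test score: {test_acc}")
```

In the notebook, fitting on `X_train`/`y_train` and assigning the classifier to `model` lets the plotting cell work unchanged.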

In [None]:
plot_decision_regions(X_train, y_train, clf=model, scatter_kwargs={"alpha": 0.05})
plt.show()

## Polynomial SVM

* Build an SVM classifier with a poly kernel
* Print the train and test accuracy
* Plot the decision boundary

In [None]:
model = _____
model.fit(X_train, y_train)

print(f"Train score: {model.score(X_train, y_train)}")
print(f"Test score: {model.score(X_test, y_test)}")

plot_decision_regions(X_train, y_train, clf=model, scatter_kwargs={"alpha": 0.05})
plt.show()

Loop through polynomials of varying degree and show the accuracy and plot for each.  Because we want to repeat the same process several times, this is a good use case for a function.

An aside about `kwargs`.  This abbreviation stands for **K**ey **W**ord **ARG**ument**S**.  If you want your function to accept many different arguments without listing them all in advance, you might use `**kwargs`.  In our function, we want to pass keyword arguments through to the model.  Inside the function, we'll print them out and see that they arrive as a dictionary.  We then use a double asterisk to unpack that dictionary and pass the arguments to our model.
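The round trip described above can be seen with plain Python and no model at all. Both `show_kwargs` and `describe` are hypothetical helpers made up for this illustration.

```python
def show_kwargs(**kwargs):
    # Inside the function, the keyword arguments arrive as an ordinary dict.
    print(type(kwargs), kwargs)
    return kwargs


def describe(kernel="rbf", degree=3, C=1.0):
    # Stand-in for a model constructor that takes named parameters.
    return f"kernel={kernel}, degree={degree}, C={C}"


received = show_kwargs(kernel="poly", degree=2)

# The double asterisk unpacks the dict back into keyword arguments.
summary = describe(**received)
print(summary)
```

Any argument that is not supplied (here `C`) simply keeps the default of the function being called.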

In [None]:
def svm_fit_score_plot(**kwargs):
    print(kwargs)

    model = SVC(**kwargs)
    model.fit(X_train, y_train)

    print(f"\nTrain score: {model.score(X_train, y_train)}")
    print(f"Test score: {model.score(X_test, y_test)}")

    plot_decision_regions(X_train, y_train, clf=model, scatter_kwargs={"alpha": 0.05})
    plt.show()

Complete the `for` loop to pass each value of `degrees` to our custom function.

In [None]:
degrees = [1, 2, 5, 10]
for ______:
    svm_fit_score_plot(kernel="poly", degree=_____)
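The same loop pattern is sketched below without the plotting step, so it runs without `mlxtend`. The demo data and the `scores` dictionary are assumptions; in the notebook, the loop body is just the call to `svm_fit_score_plot`.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: a small seeded sample keeps the high-degree fits quick.
X_demo, y_demo = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

degrees = [1, 2, 5, 10]
scores = {}
for degree in degrees:
    # Each pass fits a fresh polynomial-kernel SVM with the current degree.
    clf = SVC(kernel="poly", degree=degree)
    clf.fit(X_tr, y_tr)
    scores[degree] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
    print(f"degree={degree}: train={scores[degree][0]:.3f}, test={scores[degree][1]:.3f}")
```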

If we didn't care about plotting, then a better solution than our function would be to use [`GridSearchCV`](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html) from `sklearn`.

* Define a dictionary named `grid` with the key as `"degree"` and the value as a list of values: 1, 2, & 5.
* Create a `GridSearchCV` object with `SVC` using the `"poly"` `kernel`
* Fit the model to your training data

In [None]:
grid = _____
____
____
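A minimal sketch of what such a grid search could look like, run on a small seeded sample. The name `search` and the demo arrays are assumptions; the notebook's later cells expect the fitted object to be called `model`.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: seeded demo data so the sketch runs on its own.
X_demo, y_demo = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Keys are parameter names of the estimator; values are the options to try.
grid = {"degree": [1, 2, 5]}

# verbose=1 prints a short summary of how many fits will be run.
search = GridSearchCV(SVC(kernel="poly"), grid, verbose=1)
search.fit(X_tr, y_tr)

print(search.best_params_)
```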

With the `verbose` option we see that a total of `15` models were fit.  This number comes from the `Fitting 5 folds for each of 3 candidates` message.  It means we tried 3 different candidate models (one per degree option we supplied) and fit each of them 5 times (once per fold, i.e. a new train/validation split of the training data each time).
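The arithmetic can be checked without fitting anything by enumerating the candidates with `ParameterGrid`. The `n_folds` value below assumes the default `cv` of 5 used by recent scikit-learn versions.

```python
from sklearn.model_selection import ParameterGrid

grid = {"degree": [1, 2, 5]}
n_folds = 5  # assumption: GridSearchCV's default number of folds

# ParameterGrid expands the dictionary into every parameter combination.
candidates = list(ParameterGrid(grid))
n_fits = len(candidates) * n_folds

print(candidates)
print(f"{len(candidates)} candidates x {n_folds} folds = {n_fits} fits")
```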

The parameters that led to the highest average accuracy across these held-out folds are stored in the `best_params_` attribute.  In this case, we see that a polynomial of degree 1 is the most accurate. A polynomial of degree one is a line, so all we did was fit a fancy linear kernel.
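To see where `best_params_` comes from, the per-candidate averages in `cv_results_` can be put into a dataframe. This is a sketch on seeded demo data, so the winning degree here need not match the one reported in the notebook.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Assumption: seeded demo data so the sketch runs on its own.
X_demo, y_demo = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

search = GridSearchCV(SVC(kernel="poly"), {"degree": [1, 2, 5]})
search.fit(X_tr, y_tr)

# One row per candidate; mean_test_score is the average over the folds.
results = pd.DataFrame(search.cv_results_)[
    ["param_degree", "mean_test_score", "rank_test_score"]
]
print(results.sort_values("rank_test_score"))

# The best parameters are those of the row with the highest mean score.
best_row = results.loc[results["mean_test_score"].idxmax()]
print(search.best_params_)
```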

In [None]:
model.best_params_

We know of another hyperparameter though, `C`.  We can add this to our `GridSearchCV` to evaluate the best combination of `C` and `degree`.  This adds a lot more models to build, which is something to consider when grid searching.  We added 3 values for `C`, so this triples the number of models to be fit (1 new model for each combination of `degree` and `C`).

* Add `C` to our parameters dictionary with the values `0.1`, `1`, `10`
* Re-run the grid search with these parameter options
* Print out the best parameters.

In [None]:
grid = {"degree": [1, 2, 5], "C": [0.1, 1, 10]}

model = GridSearchCV(SVC(kernel="poly"), grid, verbose=1)
model.fit(X_train, y_train)

model.best_params_

Hey, we're not linear anymore.  Let's view this new decision boundary with our custom function.

In [None]:
# `model` is the fitted GridSearchCV object from the previous cell
C = model.best_params_["C"]
degree = model.best_params_["degree"]

svm_fit_score_plot(kernel="poly", C=C, degree=degree)

## RBF SVM (radial basis function)

* Build an SVM classifier with an RBF kernel
* Print the train and test accuracy
* Plot the decision boundary

In [None]:
model = SVC(kernel="rbf")
model.fit(X_train, y_train)

print(f"Train score: {model.score(X_train, y_train)}")
print(f"Test score: {model.score(X_test, y_test)}")

plot_decision_regions(X_train, y_train, clf=model, scatter_kwargs={"alpha": 0.05})
plt.show()

Oooh, that's pretty, and it works pretty well compared to the other kernels.  Let's vary the value of `C` and see what happens.

* Define a list of `C` values using `0.1`, `1`, `10`, `100`
* Write a `for` loop to pass each of these values to our custom function (use the rbf kernel in each iteration).
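A sketch of that loop without the plotting step follows. As a rough, assumed proxy for model complexity it also records the number of support vectors; the demo data and the `results` dictionary are not part of the notebook.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: seeded demo data so the sketch runs on its own.
X_demo, y_demo = make_moons(n_samples=400, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

c_values = [0.1, 1, 10, 100]
results = {}
for c in c_values:
    clf = SVC(kernel="rbf", C=c).fit(X_tr, y_tr)
    results[c] = {
        "train": clf.score(X_tr, y_tr),
        "test": clf.score(X_te, y_te),
        # n_support_ holds the support vector count for each class.
        "n_support": int(clf.n_support_.sum()),
    }
    print(f"C={c}: {results[c]}")
```

In the notebook the loop body would instead be `svm_fit_score_plot(kernel="rbf", C=c)` so that each decision region is drawn.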

The shape of our decision regions changes pretty drastically, but our accuracy doesn't.  This is where the principle of parsimony should come into play: the simpler the model, the better.  `C` controls how heavily misclassified training points are penalized, so a larger `C` means weaker regularization and a more complex decision boundary.  If you have models with similar accuracy, in general, you should choose the simpler one (in this case we would choose the lower value of `C` unless a higher value performs much better).

## Grid Search all the things

What's the best `kernel`, `degree`, `C`?  We've seen a grid search for `degree` and `C`.  We can add `kernel` to this search as well.

* Add `kernel` to our grid search parameter dictionary; use every kernel we've looked at in this notebook
* Perform the grid search
* Print the best parameters
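One possible version of this search is sketched below on a small seeded sample. It assumes the kernels covered so far are `"linear"`, `"poly"`, and `"rbf"`; the name `search` and the demo arrays are likewise assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV, ParameterGrid, train_test_split
from sklearn.svm import SVC

# Assumption: seeded demo data so the sketch runs on its own and finishes quickly.
X_demo, y_demo = make_moons(n_samples=300, noise=0.3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

grid = {
    "kernel": ["linear", "poly", "rbf"],
    "degree": [1, 2, 5],
    "C": [0.1, 1, 10],
}

# Every combination of kernel, degree, and C is one candidate.
n_candidates = len(ParameterGrid(grid))
print(f"{n_candidates} candidates")

search = GridSearchCV(SVC(), grid, verbose=1)
search.fit(X_tr, y_tr)

print(search.best_params_)
```

Note that `degree` only affects the `"poly"` kernel, so some of these candidates are effectively duplicates. `GridSearchCV` also accepts a list of dictionaries if you want a separate grid per kernel.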

135 models! That's a lot of work being done for us.  What's the best model with these parameter options?

* Pass in the best parameters to our custom function to `score` it and view the decision boundary