**<span style="color:#448844">Note</span>** This notebook is meant to be interactive. Launch this notebook in Jupyter to see its full potential.

Name:

Section:

# Support Vector Machines Exercise
This exercise will guide you in implementing Support Vector Machines (SVM). At the end, you will also see the effect of hyperparameters on your model.

## Instructions
* Read each cell and implement the TODOs sequentially. The markdown/text cells also contain instructions which you need to follow to get the whole notebook working.
* Do not change the variable names unless the instructor allows you to.
* Answer all the markdown/text cells with "A: " on them. The answer must strictly consume one line only.
* You are expected to search for how some functions work on the Internet or via the docs.
* There are commented markdown cells that have crumbs. Do not delete them or separate them from the cell originally directly below it.  
* You may add new cells for "scrap work" as long as the crumbs are not separated from the cell below it.
* The notebooks will undergo a "Restart and Run All" command, so make sure that your code is working properly.
* You are expected to understand the data set loading and processing separately from this class.
* You may not reproduce this notebook or share it with anyone.

In [None]:
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')

plt.rcParams['figure.figsize'] = (12.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'

%load_ext autoreload
%autoreload 2

__This notebook will follow this format:__
* We will create toy datasets to try out SVM models with different kernel types
* We will see how we can regularize our SVM model
* We will train, validate, and test our SVM model with the breast cancer dataset.

# Creating an SVM model with different kernels

## Generating a linearly separable dataset
Let's create a linearly separable dataset before we get to more difficult datasets.

In [None]:
from sklearn.datasets import make_blobs
np.random.seed(42)

centers = [[1, 2], [-1, -4]]

X_linear, y_linear = make_blobs(n_samples=100, centers=centers)
y_linear[y_linear == 0] = -1

plt.scatter(X_linear[:,0], X_linear[:,1],c=y_linear)
print("Shape of X_linear", X_linear.shape)
print("Shape of y_linear", y_linear.shape)

## Train an SVM model
We will use `sklearn`'s `SVC` model from the `svm` package. To start, create an SVM model with a __linear kernel.__

In [None]:
from sklearn import svm
# svm.SVC?

In [None]:
# write code here
svc_linear = None

Train it with our `X_linear` and `y_linear`

In [None]:
# write code here


Get the predictions

In [None]:
# write code here
predictions = None

predictions

We will be computing the accuracy multiple times in this notebook, so let's create a function for this.

`compute_accuracy()` will compute the accuracy given two vectors of equal length

__Inputs:__
- `predictions`: A numpy array of shape `(N,)` consisting of `N` samples representing the predicted values
- `actual`: A numpy array of shape `(N,)` consisting of `N` samples representing the actual (target) values

__Outputs:__
- `accuracy`: A scalar representing the percentage of elements where `predictions` and `actual` match out of the total number of elements
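
As a small worked example of the definition above (a sketch with made-up arrays; `pred_demo` and `actual_demo` are hypothetical names, not part of the notebook), three of four entries match here, so the accuracy is 75%:

```python
import numpy as np

# Hypothetical predictions and targets, for illustration only
pred_demo = np.array([1, -1, 1, 1])
actual_demo = np.array([1, -1, -1, 1])

# Fraction of matching entries, expressed as a percentage
accuracy_demo = np.mean(pred_demo == actual_demo) * 100
print(accuracy_demo)  # 75.0
```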

In [None]:
def compute_accuracy(predictions, actual):
    # write code here
    return None

In [None]:
print("Linear SVC accuracy: ", compute_accuracy(predictions, y_linear),"%")

**Sanity Check:** This data is linearly separable, so a linear kernel should get 100% accuracy here.

## Visualize our model
Let's try to visualize the decision boundary and the margin. To do this, we need to get the __weights__ and __bias/y-intercept__ of our model.

Get the weights of our model

In [None]:
# write code here
W = None

W

Get the bias from the model

In [None]:
# write code here
b = None

b

The following code will plot our decision boundary (using `W` and `b`) and the fat margin
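
The plotting code relies on a short rearrangement, assuming `W` holds the two weights $(W_0, W_1)$ and `b` is the bias. Points on the decision boundary satisfy $W^Tx + b = 0$, and points on the two margins satisfy $W^Tx + b = \pm 1$. Solving each equation for the second coordinate $x_2$ gives:

$$
x_2 = \frac{-b - W_0 x_1}{W_1} \quad \text{(decision boundary)}, \qquad
x_2 = \frac{\pm 1 - b - W_0 x_1}{W_1} \quad \text{(margins)}
$$

This form only works when $W_1 \neq 0$, that is, when the boundary is not a vertical line.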

In [None]:
plt.figure(figsize=(15,12))

# plotting our data
plt.scatter(X_linear[:,0], X_linear[:,1],c=y_linear)

# plot decision boundary for 2D case
x_1 = np.min(X_linear[:,0])
y_1 = (-b - W[0]*x_1) / W[1]
x_2 = np.max(X_linear[:,0])
y_2 = (-b - W[0]*x_2) / W[1]

plt.plot([x_1, x_2], [y_1, y_2],'g',label="Decision Boundary")

# plot margins
x_1_a = np.min(X_linear[:,0])
y_1_a = (1 - b - W[0]*x_1) / W[1]
x_2_a = np.max(X_linear[:,0])
y_2_a = (1 - b - W[0]*x_2) / W[1]
plt.plot([x_1_a, x_2_a], [y_1_a, y_2_a],'--m')

x_1_b = np.min(X_linear[:,0])
y_1_b = (-1 - b - W[0]*x_1) / W[1]
x_2_b = np.max(X_linear[:,0])
y_2_b = (-1 - b - W[0]*x_2) / W[1]
plt.plot([x_1_b, x_2_b], [y_1_b, y_2_b],'--m', label="Margins")
plt.legend()

plt.title("Linear SVM")

plt.savefig("SVM linear with decision boundary.png")

**Sanity Check:** You should see the boundary (green line) clearly cutting the data with a large margin on either side.

## Support vectors
From the lecture, we learned that SVM retains the training instances which "define" the boundary and uses them to predict the new instance's class. Let's see those training instances which SVMs refer to as **support vectors**.
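
To make the idea concrete, here is a minimal sketch on a separate six-point toy dataset (`X_toy`, `y_toy`, and `toy_model` are hypothetical names used only for this illustration). It reads the support-vector information from attributes that a fitted `SVC` exposes:

```python
import numpy as np
from sklearn import svm

# Hypothetical toy data: two well-separated groups of three points each
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = np.array([-1, -1, -1, 1, 1, 1])

toy_model = svm.SVC(kernel="linear").fit(X_toy, y_toy)

print(toy_model.n_support_)        # number of support vectors per class
print(toy_model.support_)          # indices of the support vectors in X_toy
print(toy_model.support_vectors_)  # coordinates of the support vectors
```

Only the points closest to the other class end up as support vectors; the remaining training points could be removed without changing the boundary.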

Get the number of support vectors in the model

In [None]:
# write code here


**Sanity Check:** You should see an array of two numbers. The numbers will tell you the number of support vectors for both classes.

__Question #1:__ How many support vectors did the model need in total?

<!--crumb;qna;Question: How many support vectors did the model need in total?-->

A: 

We can also get the features of the actual support vectors

In [None]:
# write code here


**Sanity Check:** You should see the coordinates/feature values of the chosen support vectors. Verify their positions in the visualization above. They should be the nearest points to the margin.

For the questions below, feel free to modify the visualization code above to plot the support vectors.

__Question #2:__ How many support vectors does the class in `yellow` need?

<!--crumb;qna;Question: How many support vectors does the class in yellow need?-->

A: 

__Question #3:__ How many support vectors does the class in `violet` need?

<!--crumb;qna;Question: How many support vectors does the class in violet need?-->

A: 

____

# Different kernels
We can extend SVMs to produce non-linear decision boundaries through kernels. This is similar to the feature transform that you did for polynomial regression. The difference is that kernels give you a way to get the same output without explicitly performing the feature transform (which may be expensive to compute, especially for very high-dimensional transforms). A kernel can even represent an infinite-dimensional transform, such as the Gaussian / Radial Basis Function (RBF) kernel, which _in theory_ can linearly separate any data. However, without proper tuning and regularization, we risk overfitting to the training data, which makes the classifier useless since it cannot generalize to unseen data.

To apply kernels, we simply replace all instances of the inner product $\langle \cdot,\cdot \rangle$ with the kernel $K(\cdot,\cdot)$. 

Note that $$W = \sum_{i=1}^N \alpha_i y_i x_i$$

which implies that $$f(z) = W^Tz+b = \sum_{i=1}^N \alpha_i y_i \langle x_i, z\rangle +b$$ 

So you will need to modify some of the functions to apply the kernel.
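
The two identities above can be checked numerically. The sketch below is an illustration on a separate toy dataset (`X_demo`, `demo_model`, and the other names are hypothetical). It assumes the linear kernel, where $K$ is the plain inner product, and uses the fact that a fitted binary `SVC` stores the products $\alpha_i y_i$ of its support vectors in `dual_coef_`:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Hypothetical toy data, separate from the notebook's datasets
X_demo, y_demo = make_blobs(n_samples=40, centers=[[2, 2], [-2, -2]], random_state=0)
demo_model = svm.SVC(kernel="linear").fit(X_demo, y_demo)

alpha_y = demo_model.dual_coef_   # shape (1, n_SV): alpha_i * y_i for each support vector
sv = demo_model.support_vectors_  # shape (n_SV, 2): the x_i with alpha_i > 0
b_demo = demo_model.intercept_

# W = sum_i alpha_i y_i x_i
W_demo = alpha_y @ sv

# f(z) = sum_i alpha_i y_i <x_i, z> + b, evaluated for two new points
z = np.array([[0.5, -1.0], [3.0, 1.0]])
f_dual = (alpha_y @ (sv @ z.T) + b_demo).ravel()

print(np.allclose(W_demo, demo_model.coef_, atol=1e-6))
print(np.allclose(f_dual, demo_model.decision_function(z), atol=1e-6))
```

Swapping `sv @ z.T` for a kernel matrix is all it takes to move from the linear case to a kernelized one.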

Some commonly used kernels:

- __Gaussian / Radial Basis Function (RBF) kernel__: $K(x,z) = \exp \bigl( -\frac{\Vert x-z \Vert^2}{2\sigma^2}\bigr)$
- __Polynomial kernel__: $K(x,z) = (x^Tz+c)^d$, where $d$ is the degree of the polynomial and $c$ is a hyperparameter set by the user
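
As a sketch of what "no explicit feature transform" means, the code below evaluates both kernels by hand and compares the degree-2 polynomial kernel (with $c = 1$) against an explicit six-dimensional feature map. The function names are hypothetical and written only for this illustration:

```python
import numpy as np

def rbf_kernel_value(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

def poly_kernel_value(x, z, c=1.0, d=2):
    # K(x, z) = (x^T z + c)^d
    return (x @ z + c) ** d

def explicit_quadratic_features(x):
    # Feature map whose inner product equals (x^T z + 1)^2 for 2D inputs
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([x1 ** 2, x2 ** 2, s * x1 * x2, s * x1, s * x2, 1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_implicit = poly_kernel_value(x, z, c=1.0, d=2)
k_explicit = explicit_quadratic_features(x) @ explicit_quadratic_features(z)

print(k_implicit, k_explicit)     # both equal 4 up to floating-point rounding
print(rbf_kernel_value(x, x))     # a point compared with itself gives 1.0
```

The kernel needs one inner product in two dimensions, while the explicit route needs an inner product in six. The gap grows quickly with the degree and the input dimension.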

## Radial basis function kernel
### Generating a non-linearly separable dataset

In [None]:
def generate_dummy_circle_data(num_points):
    np.random.seed(42)
    
    r = np.random.uniform(0,0.2,num_points)
    theta = np.random.uniform(0,2*np.pi,num_points)
    inner_circle = np.array([r*np.sin(theta), r*np.cos(theta)]).T
    
    r = np.random.uniform(0.5,0.7,num_points)
    theta = 2*np.pi*np.arange(num_points)/num_points
    outer_circle = np.array([r*np.sin(theta), r*np.cos(theta)]).T

    X = np.concatenate((inner_circle,outer_circle),axis=0)
    y = np.concatenate((np.ones(num_points), np.zeros(num_points)),axis=0)
    
    randIdx = np.arange(X.shape[0])
    np.random.shuffle(randIdx)
    
    X = X[randIdx]
    y = y[randIdx].astype(int)
    
    return X, y

X_circle,y_circle = generate_dummy_circle_data(100)

print("Shape of X_circle", X_circle.shape)
print("Shape of y_circle", y_circle.shape)

plt.scatter(X_circle[:,0],X_circle[:,1],c=y_circle)

### Train model
This data can be easily separated using an `rbf` kernel. Create an `rbf`-kernel SVM, and keep the other parameters to their default values for now. Train it on the circle dataset (`X_circle` and `y_circle`).

In [None]:
# write code here
svc_rbf = None


Get the model predictions on `X_circle`

In [None]:
# write code here
predictions = None

predictions

Then get the accuracy

In [None]:
print("RBF SVC accuracy: ", compute_accuracy(predictions, y_circle),"%")

**Sanity check:** Using `rbf` here will give you a perfect accuracy.

Get the number of support vectors per class

In [None]:
# write code here


__Question #4:__ How many support vectors does the model need?

<!--crumb;qna;Question: How many support vectors does the model need?-->

A: 

### Visualize our model

The code below will visualize the SVM's (with an `rbf` kernel) decision boundary.

In [None]:
plt.figure(figsize=(15,12))

# visualize the decision boundary
x_min, x_max = X_circle[:, 0].min() - 0.2, X_circle[:, 0].max() + 0.2
y_min, y_max = X_circle[:, 1].min() - 0.2, X_circle[:, 1].max() + 0.2
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

x_test = np.squeeze(np.stack((xx.ravel(),yy.ravel()))).T

Z = svc_rbf.predict(x_test)
Z = Z.reshape(xx.shape)

plt.contourf(xx, yy, Z)
plt.scatter(X_circle[:, 0], X_circle[:, 1], c=y_circle, edgecolors='black')
plt.xlim([x_min,x_max])
plt.ylim([y_min,y_max])

plt.title("RBF kernel SVM")
plt.savefig("SVM rbf kernel")

**Sanity Check:** You should see a region carved out in the middle of the plot that encloses the inner class, separating the classes from each other.

## Polynomial kernel

### Generating a polynomial dataset
The following cell just creates a dataset that we know is non-linearly separable.

In [None]:
np.random.seed(1)

y1 = [val**2 + np.random.random()*5 for val in np.arange(0,10,1)]
y2 = [val**2 + np.random.random()*5 + 30 for val in np.arange(0,10,1)]

X_poly = np.zeros((2*10,2))
X_poly[:,0] = np.append(np.arange(0,10,1),np.arange(0,10,1))
X_poly[:,1] = np.append(y1,y2)

y_poly = np.zeros((2*10,))
y_poly[0:10] = -1
y_poly[10:20] = 1

plt.scatter(X_poly[:,0],X_poly[:,1],c=y_poly)

### Using a polynomial kernel
The visualization above is a clear indicator that the data cannot be (properly) separated by a line.

Create an SVM model with a `poly` kernel. Set the polynomial degree to `3` for now, $\gamma$ to 2. Train it with our polynomial data.

In [None]:
# svm.SVC?

In [None]:
# write code here
svc_poly = None


Get the predictions

In [None]:
# write code here
predictions = None

predictions

Let's compute the accuracy

In [None]:
print("Polynomial SVC accuracy: ", compute_accuracy(predictions, y_poly),"%")

### Visualize our model

The code below will visualize the polynomial SVM's decision boundary.

In [None]:
plt.figure(figsize=(15,12))

x_min, x_max = X_poly[:, 0].min() - 0.2, X_poly[:, 0].max() + 0.2
y_min, y_max = X_poly[:, 1].min() - 0.2, X_poly[:, 1].max() + 0.2

idxPlus=y_poly[y_poly<0]
idxMin=y_poly[y_poly>0]
plt.scatter(X_poly[:,0],X_poly[:,1],c=y_poly)


X2,Y2 = np.mgrid[x_min:x_max:100j,y_min:y_max:100j]
Z = svc_poly.decision_function(np.c_[X2.ravel(),Y2.ravel()])
Z = Z.reshape(X2.shape)
plt.contourf(X2,Y2,Z > 0,alpha=0.4)

plt.contour(X2,Y2,Z,colors=['k','k','k'], linestyles=['--','-','--'],levels=[-1,0,1])
plt.scatter(svc_poly.support_vectors_[:,0],svc_poly.support_vectors_[:,1],s=120,facecolors='none')
plt.scatter(X_poly[:,0],X_poly[:,1],c=y_poly,s=50,alpha=0.95);

plt.title("Polynomial kernel SVM")
plt.savefig("SVM polynomial kernel")

___

# Regularizing our SVM models

## Tuning the hyperparameter `degree`
We have two regularizers called $\gamma$ and $C$. But before we get to that, let's try to fit our polynomial `SVC` to our circle dataset with varying values for `degree`.

The degrees will be as follows:

In [None]:
degree_vals = [1, 3, 5, 10, 15, 20]

degree_vals

### Visualizing the decision boundary as `degree` increases

In the code below, initialize an `SVC` with a `poly` kernel. Set the hyperparameter `degree` to match the `degree` for that iteration.

Then fit it to our data `X_circle` and `y_circle`

In [None]:
x_min, x_max = X_circle[:, 0].min() - 1, X_circle[:, 0].max() + 1
y_min, y_max = X_circle[:, 1].min() - 1, X_circle[:, 1].max() + 1
xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                     np.linspace(y_min, y_max, 200))

# We will let our classifier predict this set of values
x_test = np.squeeze(np.stack((xx.ravel(),yy.ravel()))).T

plt.figure(figsize=(15,10))
plt_ctr = 1

for degree in degree_vals:

    # write code here
    clf = None
    
    
    Z = clf.predict(x_test)
    Z = Z.reshape(xx.shape)
    
    plt.subplot(3,3,plt_ctr)
    plt.contourf(xx, yy, Z)
    plt.scatter(X_circle[:, 0], X_circle[:, 1], c=y_circle, edgecolors='black')
    
    plt.xlim([x_min,x_max])
    plt.ylim([y_min,y_max])
    
    plt.title("poly kernel, degree=" + str(degree))
    
    plt_ctr += 1

__Question #5:__ Was the `SVC` with a `poly` kernel able to separate the classes of the circle dataset?

<!--crumb;qna;Question: Was the SVC with a poly kernel able to separate the classes of the circle dataset?-->

A: 

## Preparing our datasets for experiments on $\gamma$ and $C$ 

Before we proceed to tuning our $\gamma$ and $C$ values, let's create a dataset for these experiments 

__Mixed dataset.__ This dataset will have a large overlap between the two classes.

In [None]:
np.random.seed(1)
centers = [[0, 0], [1, 0]]
X_mixed, y_mixed = make_blobs(n_samples=30, centers=centers)
y_mixed[y_mixed == 0] = -1
plt.scatter(X_mixed[:,0], X_mixed[:,1],c=y_mixed)

print("Shape of X_mixed",X_mixed.shape)
print("Shape of y_mixed", y_mixed.shape)

This dataset is intentionally made to have a large overlap so we can see how our model will change its boundary when we tune our hyperparameters

__Outlier dataset.__ This dataset is the same as the linear dataset (`X_linear`, `y_linear`) but will also contain two outliers

In [None]:
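X_outlier = np.concatenate((X_linear, [[-1,-2],[-0.5, -1.75]]), axis=0)
# y_linear uses the labels -1 and 1, so the outliers are labeled -1 (a label of 0 would add a third class)
y_outlier = np.concatenate((y_linear, [-1, -1]))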

plt.scatter(X_outlier[:,0], X_outlier[:,1], c=y_outlier)

## Tuning Hyperparameter $\gamma$ (gamma)

From the docs:

> The behavior of the model is very sensitive to the gamma parameter. If gamma is too large, the radius of the area of influence of the support vectors only includes the support vector itself and no amount of regularization with C will be able to prevent overfitting.

> When gamma is very small, the model is too constrained and cannot capture the complexity or “shape” of the data. The region of influence of any selected support vector would include the whole training set. The resulting model will behave similarly to a linear model with a set of hyperplanes that separate the centers of high density of any pair of two classes.

For the following section we will try out the following $\sigma$ values, with $\gamma = 1/\sigma$.
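
The sketch below lists the $\gamma$ values this produces, using a hypothetical array `sigma_demo` built the same way as the $\sigma$ values in this section. For comparison it also computes $1/(2\sigma^2)$: `sklearn` writes the `rbf` kernel as $\exp(-\gamma \Vert x-z \Vert^2)$, so matching it term by term with the Gaussian kernel formula given earlier would give $\gamma = 1/(2\sigma^2)$. The $1/\sigma$ used in this notebook is a simpler stand-in with the same qualitative behavior (small $\sigma$ means large $\gamma$).

```python
import numpy as np

# Hypothetical copy of the sigma grid used in this section
sigma_demo = np.arange(0.0001, 0.9, 0.1)

gamma_notebook = 1 / sigma_demo            # convention used in this notebook
gamma_matched = 1 / (2 * sigma_demo ** 2)  # matches exp(-||x-z||^2 / (2 sigma^2))

for s, g1, g2 in zip(sigma_demo, gamma_notebook, gamma_matched):
    print("sigma = {:.4f}  1/sigma = {:.2f}  1/(2 sigma^2) = {:.2f}".format(s, g1, g2))
```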

In [None]:
sigma_vals = np.arange(0.0001, 0.9, 0.1)
sigma_vals

### Visualizing the decision boundary as $\gamma$ increases

`visualize_gamma_boundary()` will visualize the decision boundary of an `SVC` with varying values of $\sigma$

__Inputs:__
- `kernel`: could either be `poly`, or `rbf`
- `sigma_vals`: A numpy array of shape `(S,)` consisting of `S` numbers representing the $\sigma$ values we want to try out
- `X`: A numpy array of shape `(N, 2)` consisting of `N` samples and `2` dimensions representing the data features `X`
- `y`: A numpy array of shape `(N,)` consisting of `N` samples representing the class of each sample


The code has been filled up except for the part where the `SVC` is initialized and trained on `X` and `y`. __Your tasks are:__
- Initialize an `SVC` with the `kernel`. Do not forget to set `gamma` hyperparameter to the inverse of that iteration's `sigma` value. 
- Fit the model to the input data `X` and `y`

In [None]:
def visualize_gamma_boundary(kernel, sigma_vals, X, y):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))

    # We will let our classifier predict this set of values
    x_test = np.squeeze(np.stack((xx.ravel(),yy.ravel()))).T

    plt.figure(figsize=(15,10))
    plt_ctr = 1

    for sigma in sigma_vals:

        # write code here
        clf = None
        

        Z = clf.predict(x_test)
        Z = Z.reshape(xx.shape)

        plt.subplot(3,3,plt_ctr)
        plt.contourf(xx, yy, Z)
        plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='black')

        plt.xlim([x_min,x_max])
        plt.ylim([y_min,y_max])

        plt.title(kernel + " kernel, gamma = " + "{:.2f}".format(1/sigma))

        plt_ctr += 1

__Mixed dataset__ Let's try to run an SVM with an `rbf` kernel on the mixed dataset.

In [None]:
visualize_gamma_boundary("rbf", sigma_vals, X_mixed, y_mixed)

**Sanity Check:** $\gamma = 1/\sigma$

As $\gamma$ increases, the standard deviation decreases. What you should see are blobs with small sizes (small standard deviation/$\sigma$) when $\gamma$ is large. As $\gamma$ decreases (and $\sigma$ increases), then the blobs look like they are clustering together.

__Outlier dataset__ We can also check out how $\gamma$ will handle the outlier dataset with an `rbf` kernel.

In [None]:
visualize_gamma_boundary("rbf", sigma_vals, X_outlier, y_outlier)

__Question #6:__ What range of `gamma` values better fit our outlier data? Smaller or bigger `gamma` values?

<!--crumb;qna;Question: What range of gamma values better fit our outlier data? Smaller or bigger gamma values?-->

A: 

## Tuning Hyperparameter `C`


Recall: The dual optimization problem for support vector machines that we want to solve.<br />

$$
\begin{array}{lll}
    \max_\alpha & \quad \sum_{i=1}^N \alpha_i - \frac{1}{2}\sum_{i=1}^N\sum_{j=1}^N y_i y_j \alpha_i \alpha_j \langle x_i, x_j \rangle \\
    \text{such that} & \quad 0 \leq \alpha_i \leq C, \forall_i \\
    \quad & \quad \sum_{i=1}^N \alpha_i y_i = 0
\end{array}
$$
Here, we see that each $\alpha_i$ must not only be greater than or equal to 0 but is also bounded above by $C$, so $0 \leq \alpha_i \leq C$.

From sklearn's documentation:

>The `C` parameter trades off misclassification of training examples against simplicity of the decision surface. A low `C` makes the decision surface smooth, while a high `C` aims at classifying all training examples correctly by giving the model freedom to select more samples as support vectors.

Here, you can think of `C` as the inverse of our previous regularization parameter $\lambda$, so $C = 1/\lambda$


$$
\begin{align}
 \alpha_i = 0 & \implies y_i (W^Tx_i + b) \geq 1 \\
 \alpha_i = C & \implies y_i (W^Tx_i + b) \leq 1 \\
 0 < \alpha_i < C & \implies y_i (W^Tx_i + b) = 1
\end{align}
$$
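
The box constraint $0 \leq \alpha_i \leq C$ can be observed on a fitted model. The sketch below is an illustration on a separate overlapping toy dataset (`X_box`, `y_box`, and the other names are hypothetical). It relies on `dual_coef_` holding $\alpha_i y_i$ for the support vectors, so its absolute values are the $\alpha_i$:

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Hypothetical overlapping blobs, so some samples must violate the margin
X_box, y_box = make_blobs(n_samples=60, centers=[[0, 0], [1, 0]], random_state=1)

max_alpha = {}
for C_demo in [0.01, 1.0, 100.0]:
    box_model = svm.SVC(kernel="linear", C=C_demo).fit(X_box, y_box)
    alphas = np.abs(box_model.dual_coef_).ravel()  # alpha_i of the support vectors
    at_bound = int(np.sum(np.isclose(alphas, C_demo)))
    max_alpha[C_demo] = alphas.max()
    print("C =", C_demo, "| support vectors:", len(alphas),
          "| at the bound alpha_i = C:", at_bound,
          "| largest alpha:", alphas.max())
```

Support vectors with $\alpha_i = C$ are the ones lying inside the margin or on the wrong side of the boundary, as in the second condition above.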

For the following section we will try out the following `C` values

In [None]:
C_vals = [0.0001, 0.001, 0.01 , 0.1, 1.0, 1.5, 2.0, 30, 50, 1000, 2000, 5000]

### Visualizing the decision boundary as `C` increases

`visualize_C_boundary()` will visualize the decision boundary of an `SVC` with varying values of `C`

__Inputs:__
- `kernel`: could either be `poly`, or `rbf`
- `C_vals`: A numpy array of shape `(S,)` consisting of `S` numbers representing the `C` values we want to try out
- `X`: A numpy array of shape `(N, 2)` consisting of `N` samples and `2` dimensions representing the data features `X`
- `y`: A numpy array of shape `(N,)` consisting of `N` samples representing the class of each sample


The code has been filled up except for the part where the `SVC` is initialized and trained on `X` and `y`. __Your tasks are:__
- Initialize an `SVC` with the `kernel`. Do not forget to set the `C` hyperparameter to that iteration's `C` value. 
- Then fit it to the input data `X` and `y`.


In [None]:
def visualize_C_boundary(kernel, C_vals, X, y):
    # visualize the decision boundary as C varies
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 200),
                         np.linspace(y_min, y_max, 200))

    # This will be our test set
    x_test = np.squeeze(np.stack((xx.ravel(),yy.ravel()))).T

    plt.figure(figsize=(15,10))
    plt_ctr = 1

    for C in C_vals:

        # write code here
        clf = None
        

        Z = clf.predict(x_test)
        Z = Z.reshape(xx.shape)

        plt.subplot(4,3,plt_ctr)
        plt.contourf(xx, yy, Z)
        plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors='black')

        plt.xlim([x_min,x_max])
        plt.ylim([y_min,y_max])

        plt.title(kernel + " kernel, C = " + str(C))

        plt_ctr += 1

__Mixed dataset__ Let's try to run an SVM with an `rbf` kernel on the mixed dataset.

In [None]:
visualize_C_boundary("rbf", C_vals, X_mixed, y_mixed)

**Sanity Check:**

You should see a more complex model as you increase `C`. In the case of SVMs with an `rbf` kernel, you should see more blobs created to section off the classes (yellows from the violets).

__Outlier dataset__ Let's try to run an SVM with a `poly` kernel on the outlier dataset.

In [None]:
visualize_C_boundary("poly", C_vals, X_outlier, y_outlier)

____

# Hyperparameter tuning our models 

For this section, we will use the __Wisconsin breast cancer dataset__. It has `569 instances` and `30 features`. The features are characteristics measured from images of cell nuclei of aspirated breast masses:

* radius (mean)
* texture (mean)
* perimeter (mean)
* area (mean)
* smoothness (mean)
* compactness (mean)
* concavity (mean)
* concave points (mean)
* symmetry (mean)
* fractal dimension (mean)
* radius (standard error)
* texture (standard error)
* perimeter (standard error)
* area (standard error)
* smoothness (standard error)
* compactness (standard error)
* concavity (standard error)
* concave points (standard error)
* symmetry (standard error)
* fractal dimension (standard error)
* radius (worst)
* texture (worst)
* perimeter (worst)
* area (worst)
* smoothness (worst)
* compactness (worst)
* concavity (worst)
* concave points (worst)
* symmetry (worst)
* fractal dimension (worst)

Our goal is to determine whether the cell nuclei of the aspirated breast mass is __malignant__ or __benign__.

There are `212` malignant, and `357` benign samples. We will be stratifying our dataset when we split our test data to make sure we get the proper representation for our training, validation, and test sets.
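
To see what stratifying does, the sketch below splits a stand-in label vector with the same class counts (212 and 357) and compares the class proportions before and after the split. The arrays `labels_demo` and `features_demo` are hypothetical placeholders, not the actual dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with the same class counts as quoted above
labels_demo = np.array([0] * 212 + [1] * 357)
features_demo = np.arange(len(labels_demo)).reshape(-1, 1)  # placeholder features

_, _, lab_train, lab_test = train_test_split(
    features_demo, labels_demo,
    test_size=0.2, stratify=labels_demo, random_state=0)

# The share of class 1 stays nearly identical across all three sets
print("full :", labels_demo.mean())
print("train:", lab_train.mean())
print("test :", lab_test.mean())
```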

You can learn more about the dataset here: https://scikit-learn.org/stable/datasets/toy_dataset.html#breast-cancer-dataset

In [None]:
from sklearn import datasets

In [None]:
data_breast_cancer = datasets.load_breast_cancer()
data_breast_cancer.keys()

The features `X` are stored in `data_breast_cancer.data`, and the labels `y` are stored in `data_breast_cancer.target`.

Accessing `data_breast_cancer.target_names` will return the mapping of the labels, while `data_breast_cancer.feature_names` will return the names of the features.

In [None]:
data_breast_cancer.target_names

In [None]:
data_breast_cancer.feature_names

## Preparing our dataset
Load our `X` and `y` before splitting our train and test sets

In [None]:
# write code here
X_cancer = None

X_cancer.shape

In [None]:
# write code here
y_cancer = None

y_cancer.shape

Split the data into train and test sets. The test set should be `20%` of the original data, and we have to make sure that the data is stratified. Set the random state to `42` as well.

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
# write code here
X_train, X_test, y_train, y_test = None

print("X_train shape : ", X_train.shape)
print("y_train shape : ", y_train.shape)
print("X_test shape : ", X_test.shape)
print("y_test shape : ", y_test.shape)

Before we proceed, let's have a quick look at our data

In [None]:
import pandas as pd

temp_df = pd.DataFrame(X_train, columns=data_breast_cancer.feature_names)
temp_df 

In [None]:
temp_df.describe()

From the results above, we can see that the features are scaled differently. We don't want the model to think that one feature is important simply because of the scale of the features, so we will standardize our data before we start modelling.

## Scaling our features
We will first scale our features before modelling. We will use `StandardScaler` to standardize our features. 

Standardizing will make sure that each of our features has $\mu=0$ and $\sigma^2=1$.
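
As a sketch of what standardizing computes, the code below applies `StandardScaler` to made-up data and compares the result with the manual formula $(x - \mu)/\sigma$. The arrays `train_demo` and `test_demo` are hypothetical and generated only for this illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two features on very different scales
rng = np.random.default_rng(0)
train_demo = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))
test_demo = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(20, 2))

# The mean and variance are learned from the training data only
demo_scaler = StandardScaler().fit(train_demo)
train_scaled = demo_scaler.transform(train_demo)
test_scaled = demo_scaler.transform(test_demo)  # reuses the training statistics

# Same result computed by hand (population standard deviation)
manual = (train_demo - train_demo.mean(axis=0)) / train_demo.std(axis=0)

print(np.allclose(train_scaled, manual))
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```

The scaled test set will generally not have a mean of exactly 0 and a variance of exactly 1, because it is transformed with the training statistics.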

In [None]:
from sklearn.preprocessing import StandardScaler

Create a `StandardScaler` object

In [None]:
# write code here
scaler = None

Get the `scaler` to fit our train data. Then, assign the scaled training data to `X_train_scaled`.

In [None]:
# write code here

X_train_scaled = None

X_train_scaled

__Note:__ We should fit the `scaler` only on our training data and not on our test data. The test data is transformed later with the already-fitted `scaler`.

__Question #7:__ Why should we not scale our data before splitting our data into train and test sets?

<!--crumb;qna;Question: Why should we not scale our data before splitting our data into train and test sets?-->

A: 

You could call `scaler.mean_` and `scaler.var_` to get the $\mu$ and $\sigma^2$ that were used by our `scaler`.

In [None]:
scaler.mean_

In [None]:
scaler.var_

Great! Now we can proceed to modelling

## Randomized search of hyperparameters
For this section we will use `RandomizedSearchCV` to tune our hyperparameters. 

In [None]:
from sklearn.model_selection import RandomizedSearchCV

We will also create our base model. Set the model's `max_iter` to `10`.

In [None]:
# write code here
svc = None

Let's define the hyperparameters.

__Hyperparameters__:
- C could be 0.0001, 0.001, 0.01, 0.1, 1.0, 5, 30, 50
- kernel could be linear, poly, rbf
- degree could be 1, 3, 5, 10, 25, 50
- gamma could be scale, auto, 1000, 10, 5, 2.5, 1.5, 1.0

In [None]:
# write code here
hyperparameters= None

Initialize your `RandomizedSearchCV` with these additional parameters:
- `30` random models
- `5`-fold cross-validation
- also set the random state to `42` so we'll get the same results

Then fit it to the training data
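
For orientation, here is a minimal sketch of how a randomized search is typically wired together, on a small synthetic dataset and a deliberately smaller search space than the one in this exercise. The names `X_rs`, `y_rs`, `search_space`, and `search` are hypothetical, and the settings (5 candidates, 3 folds) are chosen only to keep the example fast:

```python
from sklearn import svm
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV

# Hypothetical synthetic data, unrelated to the breast cancer dataset
X_rs, y_rs = make_classification(n_samples=120, n_features=5, random_state=0)

# Each key is an SVC hyperparameter; each list holds the values to sample from
search_space = {
    "C": [0.1, 1.0, 10.0],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", "auto"],
}

search = RandomizedSearchCV(
    svm.SVC(),
    param_distributions=search_space,
    n_iter=5,          # number of random hyperparameter combinations to try
    cv=3,              # number of cross-validation folds
    random_state=0,
)
search.fit(X_rs, y_rs)

print(search.best_params_)  # best combination found
print(search.best_score_)   # its mean cross-validated accuracy
```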

In [None]:
# write code here
rssvc = None


Get the best parameters found by `RandomizedSearchCV`

In [None]:
# write code here


__Question #8:__ What are the best parameters?

<!--crumb;qna;Question: What are the best parameters?-->

A: 

You could also get the results of each randomized model using `rssvc.cv_results_`

In [None]:
pd.DataFrame(rssvc.cv_results_)

Get the training performance

In [None]:
# write code here
predictions_train = None

predictions_train

Compute the training accuracy

In [None]:
# `predictions_train` was computed in the previous cell

acc = compute_accuracy(predictions_train, y_train)
print("Best model train accuracy:",acc,"%")

## Testing phase

Now that we have our best model, let's see how it performs on our test dataset. Before we start predicting, make sure we scale our test data set. Use the `scaler` to do this.

In [None]:
# write code here
X_test_scaled = None

X_test_scaled

Now, let's get the test predictions and test results!

In [None]:
# write code here
predictions_test = None

acc = compute_accuracy(predictions_test, y_test)
print("Best model test accuracy:",acc,"%")

__Question #9:__ What is the test accuracy of the best model found?

<!--crumb;qna;Question: What is the test accuracy of the best model found?-->

A: 

**Question #10**: Congratulations on making it this far. In your own words (no need to be formal), summarize the biggest learnings, insights, and takeaways that you have for STINTSY. *(You get a point as long as you answer this question)*

A: 

# Summary

* Support vector machines come in different kernels: linear, polynomial, and rbf. There are other kernels that you can also try out or you can also make your own.
* Kernels make feature transformation cheap because the SVM optimization only needs inner products, so the transform never has to be computed explicitly
* While we didn't cover it, support vector machines can also handle regression tasks
* SVMs have two regularizers: $\gamma$, which controls the standard deviation of our rbf kernels, and `C`, which controls the penalty for a sample being placed on the wrong side of the boundary.


## <center>fin</center>


<!-- DO NOT MODIFY OR DELETE THIS -->

<sup>made/compiled by daniel stanley tan & courtney anne ngo 🐰 & thomas james tiam-lee</sup> <br>
<sup>for comments, corrections, suggestions, please email:</sup><sup> danieltan07@gmail.com & courtneyngo@gmail.com & thomasjamestiamlee@gmail.com</sup><br>
<sup>please cc your instructor, too</sup>
<!-- DO NOT MODIFY OR DELETE THIS -->