## Notes on distance measures and basis functions
We briefly review a few distance and similarity measures that we will need over the next few slides.

The Euclidean distance in D dimensions is

$$d( {\bf x_i},{\bf x_k} ) \equiv \sqrt{\left(x^{1}_{i} - x^{1}_{k}\right)^2 + \left(x^{2}_{i} - x^{2}_{k}\right)^2 + \ldots + \left(x^{D}_{i} - x^{D}_{k}\right)^2} = \sqrt{\sum_{j=1}^D \left( x^{j}_{i} - x^{j}_{k} \right)^2 }$$
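
As a minimal sketch, the formula above can be written directly in plain Python. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the course repo.

```python
import math

def euclidean_distance(x_i, x_k):
    """Euclidean distance between two points given as equal-length sequences."""
    if len(x_i) != len(x_k):
        raise ValueError("both points must have the same number of dimensions")
    # Sum the squared coordinate differences over all D dimensions, then take the root.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_k)))

# Squared differences are 1 and 4, so the distance is sqrt(5).
d = euclidean_distance((1, 1), (2, 3))
print(d)
```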

The cosine similarity measure in D dimensions is

$$s({\bf x_i},{\bf x_k}) \equiv \cos(\angle ({\bf x_i},{\bf x_k})) = \frac{\sum_{j=1}^{D} x_{i}^{j} x_{k}^{j}}{\sqrt{\sum_{j=1}^{D} \left( x_{i}^{j} \right)^2} \sqrt{\sum_{j=1}^{D} \left( x_{k}^{j} \right)^2 }} = \frac{{\bf x_i} \cdot {\bf x_k}}{\| {\bf x_i} \| \, \| {\bf x_k} \|}$$
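
A small sketch of the cosine similarity in plain Python may help make the sums concrete. The function name `cosine_similarity` and the test vectors are assumptions chosen for illustration.

```python
import math

def cosine_similarity(x_i, x_k):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(x_i, x_k))
    norm_i = math.sqrt(sum(a * a for a in x_i))
    norm_k = math.sqrt(sum(b * b for b in x_k))
    if norm_i == 0 or norm_k == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_i * norm_k)

# Parallel vectors have similarity close to 1, orthogonal vectors have 0.
s_parallel = cosine_similarity((1, 2), (2, 4))
s_orthogonal = cosine_similarity((1, 0), (0, 1))
print(s_parallel, s_orthogonal)
```

Note that, unlike the Euclidean distance, this measure ignores vector length: `(1, 2)` and `(2, 4)` point in the same direction and therefore count as maximally similar.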

The standard Radial Basis Function (RBF) is

$$ k({\bf x},{\bf x'}) \equiv \textrm{exp} \left( - \frac{\| {\bf x} -{\bf x'} \|^2}{2\sigma^2} \right) $$
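
The RBF can be sketched the same way. The name `rbf_kernel` and the default `sigma=1.0` are illustrative assumptions; the slides do not fix a value for $\sigma$.

```python
import math

def rbf_kernel(x, x_prime, sigma=1.0):
    """Gaussian RBF: exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

# Identical points give 1; the value decays towards 0 as the points move apart.
k_same = rbf_kernel((1, 1), (1, 1))
k_far = rbf_kernel((1, 1), (2, 3))             # squared distance is 5
k_far_wide = rbf_kernel((1, 1), (2, 3), sigma=3.0)
print(k_same, k_far, k_far_wide)
```

A larger $\sigma$ makes the kernel decay more slowly, so distant points still count as "similar".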

# Where Are We At....?
![MLPaths](images/MLPaths.png)

# Support Vector Machines

## Linear SVM Classification -- 'Hard Margin'

### Example of Determining Decision Boundary
The simplest two-class example we can come up with... is easily separable.
Fit the function $g(\vec x) = \vec w \cdot \vec x + w_0$ to the points, requiring $g(1,1) = -1$ and $g(2,3) = 1$.
In this simple case we know the weight vector is parallel to the line joining the two points, $(2,3) - (1,1) = (1,2)$, so $\vec w = a(1,2) = (a,2a)$.
Two equations in two unknowns:
$a + 2a + w_0 = -1$, and
$2a + 6a + w_0 = +1$, yielding $a=\frac{2}{5}$ and $w_0 = -\frac{11}{5}$.
Finally, our decision boundary is the line $g(\vec x) = 0$, where $$g(\vec x) = \frac{2}{5}x_1 + \frac{4}{5}x_2 - \frac{11}{5}.$$
Our weight vector is $\vec w = \left(\frac{2}{5},\frac{4}{5}\right)$; it has only two components (why?). Our **support vectors** are the two training points $(1,1)$ and $(2,3)$, which sit exactly on the margins $g = \mp 1$.
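
The two equations in two unknowns can be checked with a few lines of Python. This is only a sketch using exact fractions; the helper name `g` mirrors the notation above, and the midpoint check is an added illustration.

```python
from fractions import Fraction

# Conditions from the two points:
#   g(1,1) = a + 2a + w0  = 3a + w0 = -1
#   g(2,3) = 2a + 6a + w0 = 8a + w0 = +1
# Subtracting the first from the second gives 5a = 2.
a = Fraction(1 - (-1), 8 - 3)
w0 = Fraction(-1) - 3 * a
w = (a, 2 * a)

def g(x1, x2):
    """Decision function g(x) = w . x + w0 for the toy example."""
    return w[0] * x1 + w[1] * x2 + w0

print(a, w0)                 # 2/5 -11/5
print(g(1, 1), g(2, 3))      # -1 1
print(g(Fraction(3, 2), 2))  # 0: the midpoint lies on the decision boundary
```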

![LinearSVM](images/LinearSVM.png)

### Let's Go To "Hands-On" Code Repo Resources, Chapter 5 
https://github.com/datsoftlyngby/soft2019spring-ai/blob/master/Week%2009/resources/LlinearSVM.ipynb

## Linear SVM Classification -- 'Soft Margin'

**What happened here?** There is now "noise" in the training data. We have two choices:
  1. Find a new decision boundary that respects the new data point.
    - Not the right choice, we argue, but can you tell us why?
  2. Make our current procedure "sloppy".
    - The better choice when only a single rogue data point is involved.

![LinearRegularizedSVM](images/LinearRegularizedSVM.png)

We need to handle stray data points so that we do not overfit and lose the classifier's ability to generalize.

We introduce a hyperparameter to soften the blow.


### The "Soft Margin" Classifier -- One (only so far....) Hyperparameter (regularization)
From the 'Hard Margin' case we have:
1. ${\bf w} \cdot {\bf x}_i - b \geq 1$, if $y_i = + 1$ and 
2. ${\bf w} \cdot {\bf x}_i - b \leq -1$, if $y_i = - 1$.


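Both conditions combine into $y_i({\bf w} \cdot {\bf x}_i - b) - 1 \geq 0$ for every training point. Subject to this constraint, we minimize $\|{\bf w}\|$ (or equivalently $\frac{1}{2}\|{\bf w}\|^2$) to get the maximum separation to the support vectors, since the margin width is $\frac{2}{\|{\bf w}\|}$.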

**This does not work for "noisy" data**. We need to do something.


## The Regularization Parameter, _C_
Introduce the 'hinge' function into the loss, $\textrm{max}(0, 1 - y_i({\bf w} \cdot {\bf x}_i-b))$, which is zero when conditions 1 and 2 above are satisfied. If a point lies on the wrong side of its margin, the hinge term grows in proportion to the point's distance from that margin.

The total cost (to minimize) becomes
$$C \|{\bf w}\|^2 + \frac{1}{N}\sum_{i=1}^{N} \textrm{max}(0, 1 - y_i({\bf w} \cdot {\bf x}_i-b)).$$
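
The cost above can be sketched in plain Python. The function names `hinge` and `soft_margin_cost`, the value `C=1.0`, and the stray point `(3, 3)` are illustrative assumptions. The weights reuse the hard-margin toy example, with $b = -w_0 = \frac{11}{5}$ because this slide writes the score as ${\bf w} \cdot {\bf x} - b$.

```python
def hinge(w, b, x, y):
    """Hinge term max(0, 1 - y (w . x - b)) for one labelled point."""
    score = sum(w_j * x_j for w_j, x_j in zip(w, x)) - b
    return max(0.0, 1.0 - y * score)

def soft_margin_cost(w, b, X, y, C):
    """C * ||w||^2 plus the mean hinge term, as in the formula above."""
    regularization = C * sum(w_j ** 2 for w_j in w)
    mean_hinge = sum(hinge(w, b, x_i, y_i) for x_i, y_i in zip(X, y)) / len(X)
    return regularization + mean_hinge

w, b = (0.4, 0.8), 2.2

# Clean data: both points sit on their margins, so only the C * ||w||^2 term remains.
X_clean, y_clean = [(1, 1), (2, 3)], [-1, +1]
cost_clean = soft_margin_cost(w, b, X_clean, y_clean, C=1.0)

# Noisy data: a hypothetical stray point labelled -1 deep inside the +1 region.
X_noisy, y_noisy = X_clean + [(3, 3)], y_clean + [-1]
cost_noisy = soft_margin_cost(w, b, X_noisy, y_noisy, C=1.0)

print(cost_clean, cost_noisy)
```

In this formulation a larger $C$ puts more weight on a wide margin. Be aware that Scikit-Learn's `C` parameter multiplies the loss term instead, so there a larger `C` means less regularization.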

### Let's Go To "Hands-On" Code Repo Resources, Chapter 5 
Here we briefly explore the idea of the soft margin (the regularization hyperparameter).
Many other hyperparameters come into play later, as we dive into non-linear SVMs, but for now let's keep it simple:
https://github.com/datsoftlyngby/soft2019spring-ai/blob/master/Week%2009/resources/SoftMargin.ipynb

## Non-Linear SVM Classification, the 'kernel trick', hyper-parameters

## Data are not linearly separable in feature space
When this happens we need to create different, often complex, decision boundaries (hypersurfaces), other than a simple line (in 2D feature space). 

The process of introducing a new dimension (or more) leads to separability:
![NonlinearData1](images/nonlinear/Slide1.png)

## Data are not linearly separable in feature space
When this happens we need to create different, often complex, decision boundaries (hypersurfaces), other than a simple line (in 2D feature space). 

The process of introducing a new dimension (or more) leads to separability:
![NonlinearData3](images/nonlinear/Slide3.png)

## Data are not linearly separable in feature space
When this happens we need to create different, often complex, decision boundaries (hypersurfaces), other than a simple line (in 2D feature space). 

The process of introducing a new dimension (or more) leads to separability:
![NonlinearData2](images/nonlinear/Slide2.png)
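
To make the "new dimension" idea concrete, here is a minimal sketch with made-up data: two concentric rings that no straight line in 2D can separate. The helper names `make_ring` and `lift`, the radii, and the threshold are all assumptions for illustration and are not taken from the slides' figures.

```python
import math

def make_ring(radius, n):
    """n points evenly spaced on a circle of the given radius around the origin."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = make_ring(1.0, 12)   # class -1, surrounded by the other class
outer = make_ring(3.0, 12)   # class +1

def lift(point):
    """Map (x1, x2) to (x1, x2, x1^2 + x2^2): one extra feature dimension."""
    x1, x2 = point
    return (x1, x2, x1 ** 2 + x2 ** 2)

# In the lifted space the third coordinate alone separates the classes:
# the plane z = threshold is a linear decision boundary.
inner_z = [lift(p)[2] for p in inner]
outer_z = [lift(p)[2] for p in outer]
threshold = 5.0
separable = max(inner_z) < threshold < min(outer_z)
print(separable)
```

Projected back to the original 2D feature space, the plane $z = 5$ becomes the circle $x_1^2 + x_2^2 = 5$, which is exactly the kind of curved decision boundary the slides show.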

### Let's Go To "Hands-On" Code Repo Resources, Chapter 5 
Demonstration of non-linearly separable data, the kernel trick, and a few hyperparameters.
https://github.com/datsoftlyngby/soft2019spring-ai/blob/master/Week%2009/resources/NonLinearSVM.ipynb

## Summing up on SVMs
There are many, many more aspects of SVMs that we won't have time to cover. But if you are "hungry" for more, take a look at the Scikit-Learn example gallery, which has a nicely organized section specifically for SVMs:
https://scikit-learn.org/stable/auto_examples/index.html#support-vector-machines


# K Nearest Neighbour

## KNN as a Classifier
Let's check the KNN on a set of points, and talk about things _qualitatively_.

![KNNTest](images/knn/KNN_test_data.png)
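
Qualitatively, KNN labels a new point by a majority vote among its k closest training points. The sketch below shows that idea in plain Python; the function name `knn_predict`, the toy points, and the colour labels are illustrative assumptions and are not the data in the figure.

```python
import math
from collections import Counter

def knn_predict(X_train, y_train, query, k=3):
    """Label a query point by majority vote among its k nearest training points."""
    # Sort the training points by Euclidean distance to the query.
    by_distance = sorted(zip(X_train, y_train),
                         key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # most_common(1) returns the label with the highest vote count.
    return Counter(nearest_labels).most_common(1)[0][0]

X_train = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
y_train = ["red", "red", "red", "blue", "blue", "blue"]

label_a = knn_predict(X_train, y_train, (2, 2), k=3)
label_b = knn_predict(X_train, y_train, (5, 5), k=3)
print(label_a, label_b)
```

There is no training step here: the "model" is simply the stored data, and the only hyperparameters in this sketch are `k` and the choice of distance measure.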

Scikit-Learn has some great demonstrations of the KNN algorithm. Let's go there for a hands-on tutorial. The demo again uses the Iris flower classification data, so we are at least familiar with the data: https://scikit-learn.org/stable/auto_examples/neighbors/plot_classification.html#sphx-glr-auto-examples-neighbors-plot-classification-py