# Support Vector Machine Algorithm (SVM)

## Model Summary

Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression analysis. SVMs support linear, non-linear, and multi-class classification.
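Here is a minimal sketch of what this looks like in scikit-learn, the library used in the example below. The classification and regression variants are exposed as `SVC` and `SVR`, and `SVC` handles more than two classes internally with a one-vs-one scheme. The bundled iris dataset and the choice of petal width as a regression target are illustrative assumptions, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.svm import SVC, SVR

# iris: 150 flowers, 4 measurements, 3 species (illustrative data only)
X, y = load_iris(return_X_y=True)

# multi-class classification: three species, handled internally with one-vs-one
clf = SVC().fit(X, y)
print("classes:", clf.classes_, "training accuracy:", round(clf.score(X, y), 3))

# regression: predict petal width (column 3) from the other three measurements
reg = SVR().fit(X[:, :3], X[:, 3])
print("training R^2:", round(reg.score(X[:, :3], X[:, 3]), 3))
```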

A few key terms:
 - Hyperplane: a flat subspace with one dimension fewer than its ambient space. In a two-dimensional space, this is a line that separates the two classes.
 - Support vectors: the observations nearest to the hyperplane; they alone determine its position.
 - Margin: the distance between the support vectors and the hyperplane.
 
Succinctly stated, SVM seeks to construct the hyperplane that maximizes the margin, i.e. its distance to the support vectors on either side. A larger margin generally corresponds to a lower generalization error (also known as the out-of-sample error).
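To make these definitions concrete, here is a minimal pure-Python sketch. The toy points, the two candidate lines, and the helper names (`decision_value`, `distance_to_hyperplane`, `margin_and_support_vectors`) are all hypothetical. For a candidate hyperplane `w·x + b = 0` it computes each observation's distance, picks out the nearest observations (the support vectors), and reports the margin. It only compares two hand-picked lines. An actual SVM solver optimizes over all separating hyperplanes.

```python
import math


def decision_value(x, w, b):
    """Value of w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(x, w, b):
    """Euclidean distance from point x to the hyperplane w.x + b = 0."""
    return abs(decision_value(x, w, b)) / math.sqrt(sum(wi * wi for wi in w))


def margin_and_support_vectors(points, labels, w, b):
    """Return (margin, support vectors) for a candidate hyperplane.

    Labels are +1 / -1. If the hyperplane misclassifies any point it is not a
    valid separating hyperplane, and (0.0, []) is returned.
    """
    if any(y * decision_value(x, w, b) <= 0 for x, y in zip(points, labels)):
        return 0.0, []
    distances = [distance_to_hyperplane(x, w, b) for x in points]
    margin = min(distances)
    # the support vectors are the points sitting exactly at the minimum distance
    support = [x for x, d in zip(points, distances) if math.isclose(d, margin)]
    return margin, support


# toy, linearly separable data: class -1 on the left, class +1 on the right
points = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
labels = [-1, -1, 1, 1]

# two hand-picked separating lines, written as (w, b)
vertical = ((1.0, 0.0), -1.5)  # x1 = 1.5, halfway between the classes
tilted = ((1.0, 1.0), -2.0)    # x1 + x2 = 2

margin_v, support_v = margin_and_support_vectors(points, labels, *vertical)
margin_t, support_t = margin_and_support_vectors(points, labels, *tilted)

print("vertical:", round(margin_v, 3), support_v)
print("tilted:  ", round(margin_t, 3), support_t)
```

Both lines separate the classes. The vertical line halfway between them has the larger margin (1.5 versus about 0.71), so it is the one an SVM would prefer here.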

![Support vector machine](extras/SVM.png)

## Key Parameters

 - Kernel: the kernel parameter determines the shape of the decision boundary. scikit-learn's `SVC` accepts `'linear'`, `'poly'`, `'rbf'`, and `'sigmoid'`, among others. The default value is `'rbf'`. Like the other non-linear kernels, it relies on the [kernel trick](https://github.com/inside-track/analytics/blob/master/Cheat_Sheets/The%20Kernel%20Trick.ipynb) to find a linear hyperplane in a higher-dimensional feature space without computing that space explicitly.
 - Gamma: this parameter controls how far the influence of a single training observation reaches in the non-linear kernels (`'rbf'`, `'poly'`, `'sigmoid'`). The higher the gamma, the more local that influence. The decision boundary therefore becomes more complex, the classification areas more isolated, and the risk of overfitting higher.
 - C: a 'hard margin' SVM requires every training observation to be classified correctly while it maximizes the margin. This can result in poor models (or no solution at all) when there are outliers. The C parameter allows a 'soft margin' that tolerates some misclassified observations in favor of a wider margin. A high C makes misclassification costly, resulting in low bias and high variance. A low C makes misclassification cheap, resulting in high bias and low variance.
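Here is a small sketch of how these parameters change a fitted model. It assumes scikit-learn is installed and uses the synthetic `make_circles` dataset (one class forms a ring around the other, so no straight line separates them) rather than the data from the example below. The specific values of `gamma` and `C` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# synthetic, non-linearly separable data: an inner circle surrounded by an outer ring
X, y = make_circles(n_samples=200, factor=0.5, noise=0.05, random_state=0)

# Kernel: compare the training accuracy of a linear and an RBF kernel
kernel_scores = {}
for kernel in ["linear", "rbf"]:
    kernel_scores[kernel] = SVC(kernel=kernel).fit(X, y).score(X, y)

# Gamma: larger values shrink each observation's radius of influence,
# so the RBF boundary can bend around smaller groups of points
gamma_results = {}
for gamma in [0.1, 1, 100]:
    clf = SVC(kernel="rbf", gamma=gamma).fit(X, y)
    gamma_results[gamma] = (clf.score(X, y), int(clf.n_support_.sum()))

# C: a low C makes misclassification cheap (softer margin, usually more support vectors);
# a high C makes it expensive (harder margin, usually fewer support vectors)
c_results = {}
for C in [0.01, 1, 100]:
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    c_results[C] = (clf.score(X, y), int(clf.n_support_.sum()))

print("kernel -> training accuracy:", kernel_scores)
print("gamma -> (training accuracy, support vectors):", gamma_results)
print("C -> (training accuracy, support vectors):", c_results)
```

Exact numbers depend on the random seed and the scikit-learn version. The point is the direction of each change: the RBF kernel fits the rings that the linear kernel cannot, and lowering C increases the number of support vectors.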

## Pros and Cons

Pros:
 - Works well in high-dimensional spaces.
 - Memory efficient, since it relies only on the support vectors in the decision function.
 - Versatile (works with non-linear functions using kernels).

Cons:
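 - SVMs do not produce probability estimates directly. In scikit-learn, enabling them (`probability=True`) triggers an expensive internal five-fold cross-validation.
 - Prone to overfitting when the number of features is much greater than the number of samples, unless the kernel and the regularization term (C) are chosen carefully.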
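The memory-efficiency point can be checked directly. The sketch below assumes scikit-learn and uses synthetic `make_moons` data with `gamma=0.5`, both chosen arbitrarily. It rebuilds a binary RBF `SVC`'s decision function from only the stored support vectors (`support_vectors_`), their label-weighted dual coefficients (`dual_coef_`), and the intercept (`intercept_`). It then compares the result with `decision_function`.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# synthetic two-class data (illustrative only)
X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

gamma = 0.5  # fixed explicitly so the kernel can be recomputed by hand
clf = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# Only the support vectors are needed to score points:
# f(x) = sum_i dual_coef_i * K(sv_i, x) + intercept
K = rbf_kernel(clf.support_vectors_, X, gamma=gamma)  # shape (n_support_vectors, n_samples)
manual = (clf.dual_coef_ @ K + clf.intercept_).ravel()

print(f"{len(clf.support_vectors_)} of {len(X)} training points are support vectors")
print("matches decision_function:", np.allclose(manual, clf.decision_function(X)))
```

Because prediction only touches the support vectors, the rest of the training set is not needed once the model is fitted.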

## Example

In this example, we will use a support vector classifier (scikit-learn's `SVC` with its default parameters) to predict the Gender of an ASU PSC applicant using the Military and Ethnicity fields.

```python
import os
import sys

import pandas as pd
from sklearn.svm import SVC

# set the working directory to the folder that holds the data file
os.chdir('C:\\Users\\zlatan.kremonic\\Dropbox\\Documents\\Analytics\\InsideTrack\\Analytics\\Cheat_Sheets\\data')

# set the python path so the internal `oa` package can be imported
sys.path.append('C:\\Users\\zlatan.kremonic\\Dropbox\\Documents\\Analytics\\InsideTrack\\Analytics')
from oa import stats
```

```python
# load the data file
asu_study = pd.read_csv('asu_data_NB.csv')

# convert all columns to dummy variables
asu_study = stats.dummy_vars(asu_study)

# create our train, validation, and test data sets
train, vald, test = stats.train_val_test(asu_study, .7)

# column 1 holds the target; columns 2 onward hold the predictors
X_train = train.iloc[:, 2:]
Y_train = train.iloc[:, 1]

X_test = test.iloc[:, 2:]
Y_test = test.iloc[:, 1]

# fit a support vector classifier with default parameters on the training data set
# (the default gamma changed from 'auto' to 'scale' in scikit-learn 0.22, so newer
# versions may not reproduce the output below exactly)
clf = SVC()
clf.fit(X_train, Y_train)

# generate predicted values and evaluate the model on the training data set
Y_pred = clf.predict(X_train)

print('Training data model evaluation\n')
stats.confusion_matrix_summary(Y_train, Y_pred)

# generate predicted values and evaluate the model on the test data set
Y_pred = clf.predict(X_test)

print('\nTest data model evaluation\n')
stats.confusion_matrix_summary(Y_test, Y_pred)
```

```
Training data model evaluation

[205, 98, 2257, 924]
Accuracy: 0.706659012629
Precision: 0.676567656766
Recall: 0.181576616475
Mean Actual: 0.324052812859

Test data model evaluation

[78, 43, 935, 386]
Accuracy: 0.702496532594
Precision: 0.644628099174
Recall: 0.168103448276
Mean Actual: 0.321775312067
```
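The four counts printed by the internal `stats.confusion_matrix_summary` helper are consistent with the order [true positives, false positives, true negatives, false negatives]. That order is inferred from the arithmetic, not from the helper's documentation. The minimal sketch below recomputes the reported metrics from those counts; `summarize` is a hypothetical helper. It also adds the accuracy of always predicting the negative class, which puts the results in context.

```python
def summarize(tp, fp, tn, fn):
    """Recompute accuracy, precision, recall, and mean actual from confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "mean_actual": (tp + fn) / total,        # share of observations that are actually positive
        "baseline_accuracy": (tn + fp) / total,  # accuracy of always predicting the negative class
    }


# counts copied from the output above, assumed to be ordered [TP, FP, TN, FN]
train_metrics = summarize(205, 98, 2257, 924)
test_metrics = summarize(78, 43, 935, 386)

for name, metrics in [("train", train_metrics), ("test", test_metrics)]:
    print(name, {k: round(v, 4) for k, v in metrics.items()})
```

The low recall (about 0.17 to 0.18) shows that the classifier rarely predicts the positive class. Its accuracy of about 0.70 is only a few points above the roughly 0.68 achieved by always predicting the negative class.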


## Additional Resources

 - http://scikit-learn.org/stable/modules/svm.html
 - http://www.analyticsvidhya.com/blog/2015/10/understaing-support-vector-machine-example-code/
 - https://en.wikipedia.org/wiki/Support_vector_machine
 - https://www.quora.com/What-are-C-and-gamma-with-regards-to-a-support-vector-machine

## Further Improvements

 - Create a more comprehensive example with a better dataset (use visuals, show the effects of each of the parameters).
