# POLI 175 - Lecture 17

## Support Vector Machines

# Support Vector Machines II

## Support Vector Machines

Today let's talk about how to fit SVMs to data using Julia.

Let's get started:

## Support Vector Machines

In [None]:
## Loading the packages (make sure you have them installed)
using DataFrames
using MLJ, MLJIteration
using MLJModels
import MLJLinearModels, MLJBase
import MultivariateStats, MLJMultivariateStatsInterface
import CSV, Plots, GLM, StatsBase, Random
import LaTeXStrings, StatsPlots, Lowess, Gadfly, RegressionTables
import CovarianceMatrices, Econometrics, LinearAlgebra, MixedModelsExtras
import Missings, StatsAPI, FreqTables, EvalMetrics
import DecisionTree, MLJDecisionTreeInterface
import XGBoost, MLJXGBoostInterface
import LIBSVM, MLJLIBSVMInterface

# Solver (just in case)
solver = MLJLinearModels.NewtonCG()

## Loading the data
chile = CSV.read(
    download("https://raw.githubusercontent.com/umbertomig/POLI175julia/main/data/chilesurvey.csv"), 
    DataFrame,
    missingstring = ["NA"]
); dropmissing!(chile)

## Process target variable
chile.voteyes = ifelse.(chile.vote .== "Y", "Favor", "Against")

## Dichotomize statusquo to lower its predictive power (making things fun...)
chile.statusquo = ifelse.(chile.statusquo .> 0, 1, 0)

# Pre-process numeric variables (log them)
chile.income_log = log.(chile.income);
chile.pop_log = log.(chile.population);

select!(chile, Not(:vote, :income, :population));

## Support Vector Machines

- Recall from the previous class:
    1. With perfect separation, we can use the Maximal Margin Classifier.
    2. Since perfect separation is rare, we usually turn to the Support Vector Classifier, which tolerates some misclassification.
    3. Since the Support Vector Classifier only produces a linear decision boundary, we extend it by replacing the Euclidean inner product with a kernel function, which allows nonlinear boundaries.
    
- Today we will fit the second and third cases, since the first requires a very particular structure: a linearly separable classification problem.
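
As a reminder of how the third case extends the second, here is a sketch of the decision function in common textbook notation. The symbols ($\beta_0$, $\alpha_i$, $\mathcal{S}$ for the set of support vectors, $K$ for the kernel) are assumptions chosen for illustration and may differ from the notation used in the previous class.

```latex
f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i \, K(x, x_i),
\qquad
\hat{y}(x) = \operatorname{sign}\bigl(f(x)\bigr)
```

With $K(x, x_i) = \langle x, x_i \rangle$ (the Euclidean inner product), this is the Support Vector Classifier with a linear boundary. Swapping in a different $K$ gives the kernel SVM.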

## Support Vector Machines

Let us use SVM to classify the vote for Pinochet.

In [None]:
# Adapted from @xiaodaigh: https://github.com/xiaodaigh/DataConvenience.jl
function onehot!(df::AbstractDataFrame, 
        col, cate = sort(unique(df[!, col])); 
        outnames = Symbol.(col, :_, cate))
    transform!(df, @. col => ByRow(isequal(cate)) .=> outnames)
end

# One-hot encoding (we will learn a better way to do it later)
onehot!(chile, :region);
onehot!(chile, :education);
onehot!(chile, :sex);

# Convert the Boolean dummies to 0/1 integers
chile.region_M = ifelse.(chile.region_M .== true, 1, 0)
chile.region_N = ifelse.(chile.region_N .== true, 1, 0)
chile.region_S = ifelse.(chile.region_S .== true, 1, 0)
chile.region_SA = ifelse.(chile.region_SA .== true, 1, 0)
chile.sex_F = ifelse.(chile.sex_F .== true, 1, 0)
chile.education_S = ifelse.(chile.education_S .== true, 1, 0)
chile.education_PS = ifelse.(chile.education_PS .== true, 1, 0)

# Drop reference categories
select!(chile, Not(:region, :sex, :education, :region_C, :education_P, :sex_M));
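
To see what the one-hot step does without the DataFrames machinery, here is a minimal sketch in base Julia. The toy vector `region` is hypothetical (it only mimics the labels in `chile.region`), and dropping the first level as the reference category is an illustrative choice.

```julia
# Hypothetical toy column standing in for chile.region
region = ["C", "M", "N", "M", "SA"]

# Sorted unique categories, as in the onehot! helper
cats = sort(unique(region))

# One 0/1 indicator vector per category
dummies = Dict(c => Int.(region .== c) for c in cats)

# Drop a reference category to avoid perfectly collinear dummies
reference = first(cats)
delete!(dummies, reference)

println(dummies)
```

Each remaining indicator is then read relative to the dropped category.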

## Support Vector Machines

In [None]:
# Unpacking data
y, X = unpack(
    chile,
    ==(:voteyes),
    c -> true;
    :voteyes      => Multiclass,
    :income_log   => Continuous,
    :statusquo    => Continuous,
    :pop_log      => Continuous,
    :age          => Continuous,
    :region_M     => Continuous,
    :region_N     => Continuous,
    :region_S     => Continuous,
    :region_SA    => Continuous,
    :sex_F        => Continuous,
    :education_S  => Continuous,
    :education_PS => Continuous,
);

## Support Vector Machines

In [None]:
# Train-test split
train, test = partition(
    eachindex(y),   ## Index with the eachindex(.) method
    0.80,           ## Proportion in the training set
    shuffle = true, ## Shuffle the data
    stratify = y,   ## Stratify on the voting variable
    rng = 74593     ## Random seed (ensure same results; not necessary)
);
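
To make the idea of a stratified split concrete, here is a rough sketch in base Julia. It is not MLJ's implementation of `partition`. The function name `stratified_split` and the toy labels are hypothetical, and the sketch only shows the general idea of splitting within each class so that both sets keep similar class proportions.

```julia
using Random

# Split indices class by class, so each class contributes `frac` of its rows to training
function stratified_split(labels, frac, rng)
    train_idx, test_idx = Int[], Int[]
    for cls in unique(labels)
        idx = shuffle(rng, findall(==(cls), labels))  # shuffled rows of this class
        n_train = round(Int, frac * length(idx))
        append!(train_idx, idx[1:n_train])
        append!(test_idx, idx[n_train+1:end])
    end
    return train_idx, test_idx
end

# Hypothetical toy labels: five of each class
labels = vcat(fill("Favor", 5), fill("Against", 5))
train_idx, test_idx = stratified_split(labels, 0.8, MersenneTwister(74593))

println(length(train_idx), " training rows, ", length(test_idx), " test rows")
```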

In [None]:
# Target
FreqTables.freqtable(y)

## Support Vector Machines

In [None]:
# Features
first(X, 3)

## Support Vector Machines

In [None]:
## Instantiate the model
# Support Vector Classifier with Linear Kernel
svc_lin_ker = MLJLIBSVMInterface.SVC(kernel = LIBSVM.Kernel.Linear)

In [None]:
## Build and fit our machine
mach = machine(svc_lin_ker, X, y, scitype_check_level = 0);
fit!(mach, rows = train);
y_pred_mode = predict(mach, rows = test);

## Support Vector Machines

In [None]:
accuracy(y_pred_mode, y[test])

In [None]:
f1score(y_pred_mode, y[test])

In [None]:
confusion_matrix(y_pred_mode, y[test])
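
To unpack what these three measures report, here is a small sketch in base Julia that computes accuracy and the F1 score by hand from the four cells of a confusion matrix. The toy vectors are hypothetical, and treating "Favor" as the positive class is an assumption made for illustration.

```julia
# Hypothetical toy labels and predictions
y_true = ["Favor", "Favor", "Against", "Against", "Favor", "Against"]
y_hat  = ["Favor", "Against", "Against", "Against", "Favor", "Favor"]
positive = "Favor"  # assumed positive class

# Cells of the confusion matrix
tp = sum((y_hat .== positive) .& (y_true .== positive))  # true positives
fp = sum((y_hat .== positive) .& (y_true .!= positive))  # false positives
fn = sum((y_hat .!= positive) .& (y_true .== positive))  # false negatives
tn = sum((y_hat .!= positive) .& (y_true .!= positive))  # true negatives

acc  = (tp + tn) / length(y_true)      # share of correct predictions
prec = tp / (tp + fp)                  # precision
rec  = tp / (tp + fn)                  # recall
f1   = 2 * prec * rec / (prec + rec)   # harmonic mean of precision and recall

println((accuracy = acc, f1 = f1))
```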

## Support Vector Machines

**Your turn**: Fit a polynomial kernel SVM. I will instantiate it below.

Is it better? Explain.

In [None]:
# Support Vector Classifier with Polynomial Kernel
svc_poly_ker = MLJLIBSVMInterface.SVC(kernel = LIBSVM.Kernel.Polynomial, degree = Int32(3))

In [None]:
## Your answers here

## Support Vector Machines

How can we improve on these results?

There are many hyperparameters we could tune.

A good first step is to check a model's default hyperparameters, which are listed when we instantiate and display it.

In [None]:
# Support Vector Classifier with Sigmoid Kernel (similar to a logistic link, but using the hyperbolic tangent function)
svc_sigm_ker = MLJLIBSVMInterface.SVC(kernel = LIBSVM.Kernel.Sigmoid)
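
For reference, here is a sketch of the four kernels named in this lecture (linear, polynomial, radial, sigmoid), written as plain Julia functions. The formulas follow the forms given in the LIBSVM documentation. The function names are hypothetical, and the default values for `gamma` and `coef0` are illustrative, not necessarily the library's defaults.

```julia
using LinearAlgebra

# Kernel formulas as documented by LIBSVM; names and defaults here are illustrative
linear_kernel(x, z) = dot(x, z)
poly_kernel(x, z; gamma = 1.0, coef0 = 0.0, degree = 3) = (gamma * dot(x, z) + coef0)^degree
radial_kernel(x, z; gamma = 1.0) = exp(-gamma * sum(abs2, x .- z))
sigmoid_kernel(x, z; gamma = 1.0, coef0 = 0.0) = tanh(gamma * dot(x, z) + coef0)

# Two hypothetical feature vectors
x = [1.0, 2.0]
z = [3.0, 0.5]

println(linear_kernel(x, z))
println(poly_kernel(x, z))
println(radial_kernel(x, z))
println(sigmoid_kernel(x, z))
```

Each function takes two observations and returns a single similarity score, which is what replaces the inner product in the classifier.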

## Support Vector Machines

We usually focus on the cost parameter.

Here, we will use a *grid search* to find the best cost parameter.
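
To see where the cost parameter enters, here is a sketch of the soft-margin problem in the form the LIBSVM documentation uses for C-SVC. The symbols ($w$, $b$, $\xi_i$, $\phi$, and labels $y_i \in \{-1, +1\}$) are assumptions for illustration and may not match the notation from the previous class.

```latex
\min_{w,\, b,\, \xi} \;\; \frac{1}{2} w^\top w + C \sum_{i=1}^{n} \xi_i
\quad \text{subject to} \quad
y_i \bigl( w^\top \phi(x_i) + b \bigr) \ge 1 - \xi_i,
\qquad \xi_i \ge 0, \quad i = 1, \dots, n
```

In this form $C$ is the `cost` hyperparameter: a larger value penalizes margin violations more heavily. Note that some textbooks instead define the tuning parameter as a budget for violations, which runs in the opposite direction.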

In [None]:
# Costs (in log scale!)
cost_tune = range(svc_sigm_ker, :cost, lower=1, upper=10, scale = :log);
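
To illustrate what searching on a log scale means, here is a sketch in base Julia comparing 20 log-spaced and 20 evenly spaced candidate values between 1 and 10. This only illustrates the idea and is not a reproduction of MLJ's internal grid. The variable names are hypothetical.

```julia
lower, upper, n = 1.0, 10.0, 20

# Evenly spaced in log(cost), then mapped back to the original scale
log_grid = exp.(range(log(lower), stop = log(upper), length = n))

# Evenly spaced in cost itself, for comparison
lin_grid = collect(range(lower, stop = upper, length = n))

println(round.(log_grid, digits = 2))
println(round.(lin_grid, digits = 2))
```

The log-spaced grid places more candidates near the lower end of the range, and consecutive values differ by a constant ratio and not by a constant step.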

In [None]:
# Self Tuning Model
self_tuning_sigmsvc = TunedModel(
    model = svc_sigm_ker,
    resampling = Holdout(fraction_train=0.75, shuffle = true, rng = 987123),
    tuning = Grid(resolution = 20),
    range = [cost_tune],
    measure = accuracy
)

## Support Vector Machines

In [None]:
# Fitting the models
mach = machine(self_tuning_sigmsvc, X, y, scitype_check_level = 0);
MLJ.fit!(mach, rows = train);
y_pred_mode = predict(mach, rows = test);

## Support Vector Machines

In [None]:
# Report
report(mach)

## Support Vector Machines

In [None]:
# Best Model Specs
fitted_params(mach).best_model

## Support Vector Machines

In [None]:
accuracy(y_pred_mode, y[test])

In [None]:
f1score(y_pred_mode, y[test])

In [None]:
confusion_matrix(y_pred_mode, y[test])

## Support Vector Machines

**Your turn**: Fit a Radial Kernel SVM. I instantiated it below. Search for the best cost parameter. Is it better? Explain.

In [None]:
# Support Vector Classifier with Radial Kernel (the default kernel for SVC)
svc_radial_ker = MLJLIBSVMInterface.SVC()

In [None]:
## Your answers here

## Support Vector Machines

This is pretty much it.

There is also **Support Vector Regression**, which extends this method to regression problems.

Suggestion: Try this technique with the civil conflict dataset.

# Questions?

# See you next class
