# POLI 175 - Quiz 07

In this quiz, you will run Support Vector Machines to classify votes for Pinochet.

Due date: Mar 17, 2024

Again: The grading for the quiz is:

$$ 0.7 \times \text{TRY} + 0.3 \times \text{CORRECT} $$

The points below refer to the correctness part.

**Note:** This quiz is optional. It will replace your lowest quiz grade only if your score on it is higher than that grade.

## Running Dataset

### [Chile Survey](https://en.wikipedia.org/wiki/Chile)

In 1988, the [Chilean Dictator](https://en.wikipedia.org/wiki/Military_dictatorship_of_Chile) [Augusto Pinochet](https://en.wikipedia.org/wiki/Augusto_Pinochet) held a [referendum on whether he should step down](https://en.wikipedia.org/wiki/1988_Chilean_presidential_referendum).

The [FLACSO](https://en.wikipedia.org/wiki/Latin_American_Faculty_of_Social_Sciences) in Chile conducted a survey of 2700 respondents. We are going to build a model to predict their voting intentions.

| **Variable** | **Meaning** |
|:---:|---|
| region | A factor with levels:<br>- `C`, Central; <br>- `M`, Metropolitan Santiago area; <br>- `N`, North; <br>- `S`, South; <br>- `SA`, city of Santiago. |
| population | The population size of respondent's community. |
| sex | A factor with levels: <br>- `F`, female; <br>- `M`, male. |
| age | The respondent's age in years. |
| education | A factor with levels: <br>- `P`, Primary; <br>- `S`, Secondary; <br>- `PS`, Post-secondary. |
| income | The respondent's monthly income, in Pesos. |
| statusquo | A scale of support for the status-quo. |
| voteyes | A binary indicator: <br>- `Favor`, a vote in favor of Pinochet; <br>- `Against`, anything else. |

Let me pre-process the data a bit for you.

In [1]:
## Loading the packages (make sure you have those installed)
using DataFrames
using MLJ, MLJIteration
using MLJModels
import MLJLinearModels, MLJBase
import MultivariateStats, MLJMultivariateStatsInterface
import CSV, Plots, GLM, StatsBase, Random
import LaTeXStrings, StatsPlots, Lowess, Gadfly, RegressionTables
import CovarianceMatrices, Econometrics, LinearAlgebra, MixedModelsExtras
import Missings, StatsAPI, FreqTables, EvalMetrics
import DecisionTree, MLJDecisionTreeInterface
import XGBoost, MLJXGBoostInterface
import LIBSVM, MLJLIBSVMInterface

# Solver (just in case)
solver = MLJLinearModels.NewtonCG()

## Loading the data
chile = CSV.read(
    download("https://raw.githubusercontent.com/umbertomig/POLI175julia/main/data/chilesurvey.csv"), 
    DataFrame,
    missingstring = ["NA"]
); dropmissing!(chile)

## Process target variable
chile.voteyes = ifelse.(chile.vote .== "Y", "Favor", "Against")

## Process statusquo a bit to lower prediction power (making things fun...)
chile.statusquo = ifelse.(chile.statusquo .> 0, 1, 0)

# Pre-process numeric variables (log them)
chile.income_log = log.(chile.income);
chile.pop_log = log.(chile.population);

select!(chile, Not(:vote, :income, :population));

In [2]:
# Adapted from @xiaodaigh: https://github.com/xiaodaigh/DataConvenience.jl
function onehot!(df::AbstractDataFrame, 
        col, cate = sort(unique(df[!, col])); 
        outnames = Symbol.(col, :_, cate))
    transform!(df, @. col => ByRow(isequal(cate)) .=> outnames)
end

# One-hot encoding (we will learn a better way to do it later)
onehot!(chile, :region);
onehot!(chile, :education);
onehot!(chile, :sex);

# Little bit more
chile.region_M = ifelse.(chile.region_M .== true, 1, 0)
chile.region_N = ifelse.(chile.region_N .== true, 1, 0)
chile.region_S = ifelse.(chile.region_S .== true, 1, 0)
chile.region_SA = ifelse.(chile.region_SA .== true, 1, 0)
chile.sex_F = ifelse.(chile.sex_F .== true, 1, 0)
chile.education_S = ifelse.(chile.education_S .== true, 1, 0)
chile.education_PS = ifelse.(chile.education_PS .== true, 1, 0)

# Drop reference categories
select!(chile, Not(:region, :sex, :education, :region_C, :education_P, :sex_M));
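The encoding step above can be sketched on a toy column. This is only an illustration under stated assumptions: the values in `region_toy` are made up, and the names `region_toy`, `cats`, `reference`, and `dummies` are hypothetical and not part of the quiz code. It shows the same idea as the cells above: one 0/1 column per category, with the first category (`C`) dropped as the reference.

```julia
# Toy column standing in for chile.region (values are made up for illustration)
region_toy = ["C", "M", "N", "S", "SA", "M"]

# Sorted categories; the first one plays the role of the reference category
cats = sort(unique(region_toy))
reference = first(cats)

# One 0/1 vector per non-reference category, keyed like the quiz's column names
dummies = Dict(
    Symbol("region_", c) => Int.(region_toy .== c)
    for c in cats if c != reference
)

for c in cats[2:end]
    println("region_", c, " = ", dummies[Symbol("region_", c)])
end
```

A respondent from the reference category has zeros in every dummy column, which is why no `region_C` column is needed.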

To make things easier, I will unpack the target and the full set of features for you. All questions use this same specification.

In [3]:
# Full Specification
y, X = unpack(
    chile,
    ==(:voteyes),
    c -> true;
    :voteyes      => Multiclass,
    :income_log   => Continuous,
    :statusquo    => Continuous,
    :pop_log      => Continuous,
    :age          => Continuous,
    :region_M     => Continuous,
    :region_N     => Continuous,
    :region_S     => Continuous,
    :region_SA    => Continuous,
    :sex_F        => Continuous,
    :education_S  => Continuous,
    :education_PS => Continuous,
);

In [4]:
# Train-test split
train, test = partition(
    eachindex(y),   ## Index with the eachindex(.) method
    0.75,           ## Proportion in the training set
    shuffle = true, ## Shuffle the data
    stratify = y,   ## Stratify on the voting variable
    rng = 74593     ## Random seed (ensure same results; not necessary)
);

In [5]:
# Target
FreqTables.freqtable(y)

2-element Named Vector{Int64}
Dim1    │ 
────────┼─────
Against │ 1595
Favor   │  836
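As a point of reference, a classifier that always predicts the majority class `Against` would be correct for the following share of respondents in the full sample:

$$ \frac{1595}{1595 + 836} = \frac{1595}{2431} \approx 0.656 $$

Any accuracy close to this value should be read with care, since it can be reached without identifying a single `Favor` respondent.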

In [6]:
# Features
first(X, 3)

| Row | age | statusquo | income_log | pop_log | region_M | region_N | region_S | region_SA | education_PS | education_S | sex_F |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 | Float64 |
| 1 | 65.0 | 1.0 | 10.4631 | 12.0725 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | 29.0 | 0.0 | 8.92266 | 12.0725 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
| 3 | 38.0 | 1.0 | 9.61581 | 12.0725 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |


### Helpers

To save you time, I am instantiating below:

1. Support Vector Classifier using a Linear Kernel (`svc_lin_ker`).
1. Support Vector Classifier using a Cubic Polynomial Kernel (`svc_poly_ker`).
1. Support Vector Classifier using a Sigmoid Kernel (`svc_sigm_ker`).
1. Support Vector Classifier using a Radial Basis Kernel (`svc_radial_ker`).
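For reference, the four kernels in the list above are commonly written as follows. This is the parameterization given in the LIBSVM documentation; the symbols $\gamma$, $r$, and $d$ are my shorthand for the `gamma`, `coef0`, and `degree` fields that appear when a model is printed.

$$
\begin{aligned}
K_{\text{linear}}(x, x') &= x^\top x' \\
K_{\text{polynomial}}(x, x') &= (\gamma \, x^\top x' + r)^{d} \\
K_{\text{sigmoid}}(x, x') &= \tanh(\gamma \, x^\top x' + r) \\
K_{\text{radial}}(x, x') &= \exp\left(-\gamma \, \lVert x - x' \rVert^2\right)
\end{aligned}
$$

With $d = 3$ the polynomial kernel is the cubic kernel used here. My understanding is that `gamma = 0.0` in the printed models is a placeholder that makes the interface pick a data-dependent value rather than a literal zero; check the `SVC` docstring to confirm how it is set in your installed version.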

In [7]:
# Support Vector Classifier with Linear Kernel
svc_lin_ker = MLJLIBSVMInterface.SVC(kernel = LIBSVM.Kernel.Linear)

SVC(
  kernel = LIBSVM.Kernel.Linear, 
  gamma = 0.0, 
  cost = 1.0, 
  cachesize = 200.0, 
  degree = 3, 
  coef0 = 0.0, 
  tolerance = 0.001, 
  shrinking = true)

In [8]:
# Support Vector Classifier with Polynomial Kernel
svc_poly_ker = MLJLIBSVMInterface.SVC(kernel = LIBSVM.Kernel.Polynomial, degree = Int32(3))

SVC(
  kernel = LIBSVM.Kernel.Polynomial, 
  gamma = 0.0, 
  cost = 1.0, 
  cachesize = 200.0, 
  degree = 3, 
  coef0 = 0.0, 
  tolerance = 0.001, 
  shrinking = true)

In [9]:
# Support Vector Classifier with Sigmoid Kernel (similar to logistic link, but with hyperbolic tangent function)
svc_sigm_ker = MLJLIBSVMInterface.SVC(kernel = LIBSVM.Kernel.Sigmoid)

SVC(
  kernel = LIBSVM.Kernel.Sigmoid, 
  gamma = 0.0, 
  cost = 1.0, 
  cachesize = 200.0, 
  degree = 3, 
  coef0 = 0.0, 
  tolerance = 0.001, 
  shrinking = true)

In [10]:
# Support Vector Classifier with Radial Basis Kernel (the default kernel, so no `kernel` argument is needed)
svc_radial_ker = MLJLIBSVMInterface.SVC()

SVC(
  kernel = LIBSVM.Kernel.RadialBasis, 
  gamma = 0.0, 
  cost = 1.0, 
  cachesize = 200.0, 
  degree = 3, 
  coef0 = 0.0, 
  tolerance = 0.001, 
  shrinking = true)

## Question 01: Run a Support Vector Classifier with a Linear Kernel (2 pts)

1. The model has been instantiated for you, with cost $= 1$. Use it. (0.5 pts)

1. Fit the model on the training set (1 pt)

1. Compute the out-of-sample (testing set) `accuracy`, `confusion_matrix`, and `f1score` (0.5 pts)

In [11]:
# Your answers here

In [12]:
mach_q1 = machine(svc_lin_ker, X, y, scitype_check_level = 0);
fit!(mach_q1, rows = train);

[ Info: Training machine(SVC(kernel = Linear, …), …).



In [13]:
# Predictions
y_pred_mode_q1 = predict(mach_q1, rows = test);

In [14]:
accuracy(y_pred_mode_q1, y[test])

0.7582236842105263

In [15]:
f1score(y_pred_mode_q1, y[test])

└ @ StatisticalMeasures.ConfusionMatrices ~/.julia/packages/StatisticalMeasures/hPDX2/src/confusion_matrices.jl:339


0.7167630057803468

In [16]:
confusion_matrix(y_pred_mode_q1, y[test])

          ┌───────────────┐
          │ Ground Truth  │
┌─────────┼───────┬───────┤
│Predicted│Against│ Favor │
├─────────┼───────┼───────┤
│ Against │  275  │  23   │
├─────────┼───────┼───────┤
│  Favor  │  124  │  186  │
└─────────┴───────┴───────┘
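The accuracy and F1 score reported above can be recovered by hand from the confusion matrix. The sketch below assumes `Favor` is treated as the positive class, which is consistent with the reported F1; the variable names (`tp`, `fp`, `fn`, `tn`, `acc`, `prec`, `rec`, `f1_q1`) are mine and not part of MLJ.

```julia
# Counts read off the Question 01 confusion matrix, with "Favor" as the positive class
tp = 186  # predicted Favor,   truly Favor
fp = 124  # predicted Favor,   truly Against
fn = 23   # predicted Against, truly Favor
tn = 275  # predicted Against, truly Against

acc   = (tp + tn) / (tp + fp + fn + tn)  # share of correct predictions
prec  = tp / (tp + fp)                   # of those predicted Favor, share truly Favor
rec   = tp / (tp + fn)                   # of those truly Favor, share predicted Favor
f1_q1 = 2 * prec * rec / (prec + rec)    # harmonic mean of precision and recall

println("accuracy  = ", round(acc, digits = 4))
println("precision = ", round(prec, digits = 4))
println("recall    = ", round(rec, digits = 4))
println("F1        = ", round(f1_q1, digits = 4))
```

The accuracy and F1 values match the `accuracy` and `f1score` outputs above. The gap between recall (about 0.89) and precision (0.6) shows where the 124 false positives bite.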


In [17]:
#= Interpretation:

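The linear-kernel SVC does well at classifying people in favor of Pinochet:
it recovers 186 of the 209 Favor respondents in the test set, at the cost of
labeling 124 Against respondents as Favor.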

=#

## Question 02: Fit a Support Vector Classifier with Polynomial Kernel (2 pts)

1. The model has been instantiated for you, with cost $= 1$. Use it. (0.5 pts)

1. Fit the model on the training set (1 pt)

1. Compute the out-of-sample (testing set) `accuracy`, `confusion_matrix`, and `f1score` (0.5 pts)

In [18]:
# Your answers here

In [19]:
mach_q2 = machine(svc_poly_ker, X, y, scitype_check_level = 0);
fit!(mach_q2, rows = train);

[ Info: Training machine(SVC(kernel = Polynomial, …), …).


In [20]:
# Predictions
y_pred_mode_q2 = predict(mach_q2, rows = test);

In [21]:
accuracy(y_pred_mode_q2, y[test])

0.7023026315789473

In [22]:
f1score(y_pred_mode_q2, y[test])

└ @ StatisticalMeasures.ConfusionMatrices ~/.julia/packages/StatisticalMeasures/hPDX2/src/confusion_matrices.jl:339


0.39464882943143814

In [23]:
confusion_matrix(y_pred_mode_q2, y[test])

          ┌───────────────┐
          │ Ground Truth  │
┌─────────┼───────┬───────┤
│Predicted│Against│ Favor │
├─────────┼───────┼───────┤
│ Against │  368  │  150  │
├─────────┼───────┼───────┤
│  Favor  │  31   │  59   │
└─────────┴───────┴───────┘


In [24]:
#= Interpretation:

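The cubic-kernel SVC does not do that well: it identifies only 59 of the 209
Favor respondents in the test set, which is why its F1 score is about 0.39.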

=#

## Question 03: Fit a Support Vector Classifier with Sigmoid Kernel (2 pts)

1. The model has been instantiated for you, with cost $= 1$. Use it. (0.5 pts)

1. Fit the model on the training set (1 pt)

1. Compute the out-of-sample (testing set) `accuracy`, `confusion_matrix`, and `f1score` (0.5 pts)

In [25]:
# Your answers here

In [26]:
mach_q3 = machine(svc_sigm_ker, X, y, scitype_check_level = 0);
fit!(mach_q3, rows = train);

[ Info: Training machine(SVC(kernel = Sigmoid, …), …).


In [27]:
# Predictions
y_pred_mode_q3 = predict(mach_q3, rows = test);

In [28]:
accuracy(y_pred_mode_q3, y[test])

0.555921052631579

In [29]:
f1score(y_pred_mode_q3, y[test])

└ @ StatisticalMeasures.ConfusionMatrices ~/.julia/packages/StatisticalMeasures/hPDX2/src/confusion_matrices.jl:339


0.36018957345971564

In [30]:
confusion_matrix(y_pred_mode_q3, y[test])

          ┌───────────────┐
          │ Ground Truth  │
┌─────────┼───────┬───────┤
│Predicted│Against│ Favor │
├─────────┼───────┼───────┤
│ Against │  262  │  133  │
├─────────┼───────┼───────┤
│  Favor  │  137  │  76   │
└─────────┴───────┴───────┘


In [31]:
#= Interpretation:

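The sigmoid-kernel SVC also performs poorly: its accuracy (about 0.56) is below
the 0.656 one would get by always predicting Against (399 of 608 test respondents).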

=#

## Question 04: Fit a Support Vector Classifier with Radial Basis Kernel (2 pts)

1. The model has been instantiated for you, with cost $= 1$. Use it. (0.5 pts)

1. Fit the model on the training set (1 pt)

1. Compute the out-of-sample (testing set) `accuracy`, `confusion_matrix`, and `f1score` (0.5 pts)

**Interesting:** This is usually the default kernel choice for SVCs.

In [32]:
# Your answers here

In [33]:
mach_q4 = machine(svc_radial_ker, X, y, scitype_check_level = 0);
fit!(mach_q4, rows = train);

[ Info: Training machine(SVC(kernel = RadialBasis, …), …).


In [34]:
# Predictions
y_pred_mode_q4 = predict(mach_q4, rows = test);

In [35]:
accuracy(y_pred_mode_q4, y[test])

0.65625

In [36]:
f1score(y_pred_mode_q4, y[test])

└ @ StatisticalMeasures.ConfusionMatrices ~/.julia/packages/StatisticalMeasures/hPDX2/src/confusion_matrices.jl:339


0.0

In [37]:
confusion_matrix(y_pred_mode_q4, y[test])

          ┌───────────────┐
          │ Ground Truth  │
┌─────────┼───────┬───────┤
│Predicted│Against│ Favor │
├─────────┼───────┼───────┤
│ Against │  399  │  209  │
├─────────┼───────┼───────┤
│  Favor  │   0   │   0   │
└─────────┴───────┴───────┘
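The two numbers reported for this model follow directly from the matrix. Taking `Favor` as the positive class (my reading of the output, consistent with the F1 values in the earlier questions), there are no true positives and no false positives:

$$ F_1 = \frac{2\,TP}{2\,TP + FP + FN} = \frac{2 \times 0}{2 \times 0 + 0 + 209} = 0, \qquad \text{accuracy} = \frac{399 + 0}{608} = 0.65625 $$

So the accuracy here is simply the share of `Against` respondents in the test set.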


In [38]:
#= Interpretation:

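The radial-basis-kernel SVC does even worse: it predicts Against for every
respondent in the test set, so it never identifies a Pinochet supporter.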

=#

## Question 05: If you were to choose among the four models above, which one would you choose? Explain. (2 pts)

In [39]:
# Your answers here

In [40]:
#= Best model:

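(best)  Linear SVC: highest accuracy (0.758) and highest F1 (0.717)
        Polynomial SVC (accuracy 0.702, F1 0.395) and Sigmoid SVC (accuracy 0.556, F1 0.360):
        close on F1, with Polynomial ahead on accuracy
(worst) Radial Basis SVC: never predicts Favor, so F1 = 0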

=#
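One way to make the comparison explicit is to put the four confusion matrices side by side and sort by F1. The sketch below is illustrative only: the counts are copied from the matrices above with `Favor` as the positive class, and the names `counts`, `acc_from_counts`, `f1_from_counts`, and `ranked` are hypothetical helpers, not MLJ functions.

```julia
# (tp, fp, fn, tn) read off the four confusion matrices, with "Favor" as the positive class
counts = [
    ("Linear",       (tp = 186, fp = 124, fn = 23,  tn = 275)),
    ("Polynomial",   (tp = 59,  fp = 31,  fn = 150, tn = 368)),
    ("Sigmoid",      (tp = 76,  fp = 137, fn = 133, tn = 262)),
    ("Radial Basis", (tp = 0,   fp = 0,   fn = 209, tn = 399)),
]

# Accuracy: share of correct predictions
acc_from_counts(c) = (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)

# F1 in its count form; it is 0 whenever there are no true positives
f1_from_counts(c) = 2 * c.tp / (2 * c.tp + c.fp + c.fn)

# Sort models from highest to lowest F1
ranked = sort(counts; by = m -> f1_from_counts(m[2]), rev = true)

for (name, c) in ranked
    println(rpad(name, 13),
            " accuracy = ", round(acc_from_counts(c), digits = 3),
            "  F1 = ", round(f1_from_counts(c), digits = 3))
end
```

Sorting by accuracy instead would move the Radial Basis model above the Sigmoid one, which is a reminder that the choice of metric matters when the classes are unbalanced.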

**Great work!**