# L5c: Support Vector Machine (SVM) Classification
In this lecture, we introduce our last (for now) classification approach, namely [support vector machines (SVMs)](https://en.wikipedia.org/wiki/Support_vector_machine). Support vector machines are a _supervised_ learning approach to learn the best possible separating hyperplane. The key ideas of this lecture are:

* A __support vector machine__ is a _supervised_ machine learning algorithm that finds an optimal (linear) hyperplane in an $N$-dimensional feature space separating data points with binary labels, chosen to maximize the _margin_ between the two classes. The _margin_ is the distance from the separating hyperplane to the closest data points of either class.
* A __hard margin support vector machine__ is a binary linear classifier that finds the optimal hyperplane to separate two classes of data points with the maximum possible _margin_, allowing no misclassifications and requiring the data to be linearly separable.
* A __soft margin support vector machine__ is a variant of the SVM algorithm that allows for some misclassification of training data points, enabling it to handle non-linearly separable datasets and reduce overfitting by finding a balance between maximizing the decision boundary margin and minimizing classification errors.

Lecture notes for today can be found: [here!](https://github.com/varnerlab/CHEME-5820-Lectures-Spring-2025/blob/main/lectures/week-5/L5c/docs/Notes.pdf)

## Setup, Data, and Prerequisites
We set up the computational environment by including the `Include.jl` file, loading any needed resources, such as sample datasets, and setting up any required constants. The `Include.jl` file loads external packages, various functions that we will use in the exercise, and custom types to model the components of our problem.

In [3]:
include("Include.jl")


### Data
This lecture will look at a [banknote authentication dataset](https://archive.ics.uci.edu/dataset/267/banknote+authentication) for classification tasks. We'll load the banknote dataset and split it into `training` and `test` data subsets (randomly).
* __Training data__: Training datasets are collections of labeled data used to teach machine learning models, allowing these tools to learn patterns and relationships within the data.
* __Test data__: Test datasets, on the other hand, are separate sets of labeled data used to evaluate the performance of trained models on unseen examples, providing an unbiased assessment of the _model's generalization capabilities_.

#### Banknote Authentication Dataset
The dataset we will explore is the [banknote authentication dataset from the UCI archive](https://archive.ics.uci.edu/dataset/267/banknote+authentication). This dataset has `1372` instances, each with 4 continuous features and an integer $\{-1,1\}$ class variable.
* __Description__: Data were extracted from images taken from genuine and forged banknote-like specimens. An industrial camera, usually used for print inspection, was used for digitization. The final images have 400 x 400 pixels. Due to the object lens and distance to the investigated object, gray-scale pictures with a resolution of about 660 dpi were obtained. Wavelet transform tools were used to extract features from the images.
* __Features__: Each image has four continuous features: the `variance`, `skewness`, `kurtosis` (stored in the data file under the column name `curtosis`), and `entropy` of the wavelet transformed image. The class is $\{-1,1\}$, where a class value of `-1` indicates a genuine banknote and `1` indicates a forged banknote.

In [6]:
df_banknote = CSV.read(joinpath(_PATH_TO_DATA, "data-banknote-authentication.csv"), DataFrame)

| Row | variance | skewness | curtosis | entropy | class |
|-----|----------|----------|----------|---------|-------|
|     | Float64 | Float64 | Float64 | Float64 | Int64 |
| 1 | 3.6216 | 8.6661 | -2.8073 | -0.44699 | -1 |
| 2 | 4.5459 | 8.1674 | -2.4586 | -1.4621 | -1 |
| 3 | 3.866 | -2.6383 | 1.9242 | 0.10645 | -1 |
| 4 | 3.4566 | 9.5228 | -4.0112 | -3.5944 | -1 |
| 5 | 0.32924 | -4.4552 | 4.5718 | -0.9888 | -1 |
| 6 | 4.3684 | 9.6718 | -3.9606 | -3.1625 | -1 |
| 7 | 3.5912 | 3.0129 | 0.72888 | 0.56421 | -1 |
| 8 | 2.0922 | -6.81 | 8.4636 | -0.60216 | -1 |
| 9 | 3.2032 | 5.7588 | -0.75345 | -0.61251 | -1 |
| 10 | 1.5356 | 9.1772 | -2.2718 | -0.73535 | -1 |


In [7]:
D_banknote = Matrix(df_banknote); # get the data as a Matrix (alias for Array{Float64,2})
number_of_training_examples_banknote = 1000; # how many training points for the banknote dataset?

In [8]:
banknote_training, banknote_test = let

    number_of_features = size(D_banknote,2); # number of cols of the banknote data (features + class label)
    number_of_examples = size(D_banknote,1); # number of rows of the banknote data
    full_index_set = range(1,stop=number_of_examples,step=1) |> collect |> Set;
    
    # build index sets for training and testing
    training_index_set = Set{Int64}();
    should_stop_loop = false;
    while (should_stop_loop == false)
        i = rand(1:number_of_examples);
        push!(training_index_set,i);

        if (length(training_index_set) == number_of_training_examples_banknote)
            should_stop_loop = true;
        end
    end
    test_index_set = setdiff(full_index_set,training_index_set);

    # build the test and train datasets -
    banknote_training = D_banknote[training_index_set |> collect,:];
    banknote_test = D_banknote[test_index_set |> collect,:];

    # return
    banknote_training,banknote_test
end;
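
The same kind of random split can also be written with a random permutation of the row indices instead of a sampling loop. The block below is a minimal sketch, not the lecture's implementation: the `split_rows` helper and the small demo matrix are hypothetical names introduced for illustration, and only the `Random` standard library is assumed.

```julia
using Random

# Hypothetical helper: randomly split the rows of D into training and test subsets.
function split_rows(D::AbstractMatrix, number_of_training_examples::Int; rng = Random.default_rng())
    number_of_examples = size(D, 1)
    shuffled = randperm(rng, number_of_examples)               # random ordering of the row indices
    training_index = shuffled[1:number_of_training_examples]
    test_index = shuffled[(number_of_training_examples + 1):end]
    return D[training_index, :], D[test_index, :]
end

# Small demo matrix (10 rows, 5 columns) standing in for the banknote data.
D_demo = reshape(collect(1.0:50.0), 10, 5)
train_demo, test_demo = split_rows(D_demo, 7; rng = MersenneTwister(1234))
```

Because every row index appears exactly once in the permutation, the training and test subsets are disjoint and together cover the full dataset.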

## Theory: Support Vector Machine (SVM)
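A support vector machine classifies a feature vector by the side of a separating hyperplane on which it falls, and it chooses the hyperplane that maximizes the _margin_, i.e., the distance to the closest training points of either class. The hard margin variant requires linearly separable data and allows no misclassifications, while the soft margin variant tolerates some misclassified training points in exchange for a wider margin. The full derivation is in the lecture notes linked at the top of this page.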
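
As a hedged sketch of the optimization problem behind these ideas, the standard textbook soft margin formulation is shown below. The symbols $\mathbf{w}$ (normal vector of the hyperplane), $b$ (offset), $\xi_{i}$ (slack variables), and $C$ (penalty weight) are our notation and may differ from the lecture notes. Given training pairs $(\mathbf{x}_{i}, y_{i})$ with $y_{i}\in\{-1,1\}$ for $i = 1,\dots,m$:

$$
\begin{aligned}
\min_{\mathbf{w},\,b,\,\xi}\quad & \frac{1}{2}\lVert\mathbf{w}\rVert_{2}^{2} + C\sum_{i=1}^{m}\xi_{i}\\
\text{subject to}\quad & y_{i}\left(\mathbf{w}^{\top}\mathbf{x}_{i} + b\right) \geq 1 - \xi_{i},\quad \xi_{i}\geq 0,\quad i = 1,\dots,m
\end{aligned}
$$

Setting every $\xi_{i} = 0$ recovers the hard margin problem, which is feasible only when the data are linearly separable. With this scaling, the closest points sit at a distance $1/\lVert\mathbf{w}\rVert_{2}$ from the hyperplane, so minimizing $\lVert\mathbf{w}\rVert_{2}$ maximizes the margin. A new point $\mathbf{x}$ is then classified as $\hat{y} = \text{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right)$.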

## Banknote Classification Problem using a SVM
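We train a support vector machine on the `banknote_training` data using the `svmtrain(...)` function from LIBSVM. The features are passed as an array with one column per training example, and the labels are passed as a vector.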

In [11]:
model = let

    # Setup the data that we are using
    D = banknote_training; # what dataset are we looking at?
    X = D[:,1:end-1] |> transpose |> Matrix; # features (number of features x number of examples, one column per example)
    y = D[:,end]; # label

    # Train the model -
    model = svmtrain(X, y); # train a support vector classifier using LIBSVM (default settings)

    # return
    model
end;
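
The call above relies on the `svmtrain` defaults, which in LIBSVM.jl (to our knowledge) means a C-support vector classifier with a radial basis function kernel rather than a linear hyperplane. The sketch below, under that assumption, shows how a linear soft margin classifier could be requested explicitly. The synthetic two-cluster data and the `_demo` names are hypothetical, and the `kernel` and `cost` keyword arguments should be checked against the installed LIBSVM.jl version.

```julia
using LIBSVM, Random

# Hypothetical, well-separated two-cluster data (2 features x 40 examples).
rng = MersenneTwister(42)
n_per_class = 20
X_demo = hcat(randn(rng, 2, n_per_class) .+ 3.0, randn(rng, 2, n_per_class) .- 3.0)
y_demo = vcat(fill(1.0, n_per_class), fill(-1.0, n_per_class))

# Assumed keyword arguments: `kernel` selects the kernel, `cost` is the soft margin penalty.
model_demo = svmtrain(X_demo, y_demo; kernel = Kernel.Linear, cost = 1.0)

# Predict on the training data and compute the fraction of correct labels.
ŷ_demo, d_demo = svmpredict(model_demo, X_demo)
training_accuracy = sum(ŷ_demo .== y_demo) / length(y_demo)
```

A larger `cost` penalizes margin violations more heavily and moves the classifier toward hard margin behavior, while a smaller `cost` tolerates more violations in exchange for a wider margin.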

__Inference__: Now that we have parameters estimated from the `training` data, we can use those parameters on the `test` dataset to see how well the model can differentiate between an actual banknote and a forgery on data it has never seen. We run the classification operation on the (unseen) test data using the `svmpredict(...)` function. This function takes the (trained) model instance and a feature array `X`. It returns the estimated labels and the decision values.
* We store the actual (correct) labels in the `y` vector, the model-predicted labels in the `ŷ` vector, and the decision values in the `d` array.

In [13]:
ŷ,y,d = let

     # Setup the data that we are using
    D = banknote_test; # what dataset are we looking at?
    X = D[:,1:end-1] |> transpose |> Matrix; # features (number of features x number of examples, one column per example)
    y = D[:,end]; # label
    
    # Test the model on the held-out test data (the examples not used for training).
    ŷ, decision_values = svmpredict(model, X);

    # return -
    ŷ,y,decision_values
end;

In [27]:
d

2×372 Matrix{Float64}:
 1.04159  1.14707  -1.09286  1.08968  …  -0.932714  -1.05356  1.21496
 0.0      0.0       0.0      0.0          0.0        0.0      0.0
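
The first row of `d` holds one decision value per test example, and the second row is zero in the output above. As a hedged illustration of how such values relate to predicted labels, the sketch below thresholds hypothetical decision values at zero. The `labels_from_decision_values` helper, the demo values, and the label order are assumptions; in particular, which label a positive value maps to depends on the label order stored in the trained model, so check that order before relying on this.

```julia
# Hypothetical decision values laid out like `d`: one column per test example.
d_demo = [1.04 1.15 -1.09 1.09;
          0.0  0.0   0.0  0.0]

# Assumption: a positive decision value maps to the first entry of `label_order`.
label_order = (-1.0, 1.0)

# Hypothetical helper: threshold the first row of the decision values at zero.
function labels_from_decision_values(dvals::AbstractMatrix, order)
    return [v > 0 ? order[1] : order[2] for v in dvals[1, :]]
end

ŷ_from_d = labels_from_decision_values(d_demo, label_order)
```

The magnitude of a decision value indicates how far the example lies from the decision boundary, so values near zero correspond to the least certain predictions.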

### Confusion Matrix
The confusion matrix is a $2\times{2}$ matrix that contains four entries: true positive (TP), false positive (FP), true negative (TN), and false negative (FN). [Click me for a confusion matrix schematic!](https://github.com/varnerlab/CHEME-5820-Labs-Spring-2025/blob/main/labs/week-3/L3b/figs/Fig-BinaryConfusionMatrix.pdf). Let's compute these four values [using the `confusion(...)` method](src/Compute.jl) and store them in the `CM::Array{Int64,2}` variable:

In [16]:
CM = confusion(y, ŷ) # confusion matrix for the SVM predictions on the test data

2×2 Matrix{Int64}:
 172    0
   0  200
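
To make the four entries concrete, here is a minimal sketch of how such counts could be computed. The `confusion_counts` helper is hypothetical and is not the `confusion(...)` method from `src/Compute.jl`; the choice of `1` (forged) as the positive class and the `[TP FN; FP TN]` layout are assumptions made for illustration.

```julia
# Hypothetical helper: count TP, FN, FP, TN for a binary classification problem.
function confusion_counts(y::AbstractVector, ŷ::AbstractVector; positive = 1)
    length(y) == length(ŷ) || throw(ArgumentError("y and ŷ must have the same length"))
    TP = FN = FP = TN = 0
    for (actual, predicted) in zip(y, ŷ)
        if actual == positive
            predicted == positive ? (TP += 1) : (FN += 1)
        else
            predicted == positive ? (FP += 1) : (TN += 1)
        end
    end
    return [TP FN; FP TN] # assumed layout: rows are actual classes, columns are predicted classes
end

# Small demo with two deliberate mistakes (positions 2 and 4).
y_demo = [1, 1, -1, -1, 1, -1]
ŷ_demo = [1, -1, -1, 1, 1, -1]
CM_demo = confusion_counts(y_demo, ŷ_demo)

# Accuracy is the fraction of examples on the diagonal.
accuracy_demo = (CM_demo[1, 1] + CM_demo[2, 2]) / sum(CM_demo)
```

In the matrix computed above for the banknote test set, both off-diagonal entries are zero and the diagonal sums to 372, so all 372 test examples were classified correctly.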