In [1]:
%matplotlib inline

In [48]:
import numpy as np
import pandas as pd
import matplotlib_inline as plt

from sklearn.datasets import load_iris, make_circles

from sklearn.preprocessing import PolynomialFeatures

from sklearn.model_selection import train_test_split

# from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC, LinearSVR, SVC, OneClassSVM
from sklearn.neighbors import KNeighborsClassifier

# Support Vector Machines

## Misc notes

Look up:  


Late fusion - https://medium.com/@raj.pulapakura/multimodal-models-and-fusion-a-complete-guide-225ca91f6861

Early fusion

Gradient boosting vs. Adaptive boosting

Label propagation

Data drift

Observed minus estimated

Data partitioning / Data shattering

Convex hull

Squared hinge loss function

Elastic Net Regression - combines the features of both Lasso (L1) and Ridge (L2) Regression  

https://medium.com/@shruti.dhumne/elastic-net-regression-detailed-guide-99dce30b8e6e

OVR - One vs. rest  

Crammer-Singer method for LinearSVM - https://github.com/scikit-learn/scikit-learn/issues/13556  
- similar to Softmax in Logistic Regression - https://medium.com/@tpreethi/softmax-regression-93808c02e6ac

### **SVR explained:**  

SVR tries to find a function that fits the data points as closely as possible, but within a certain tolerance. The SVR approach introduces a margin of tolerance 
$\epsilon$ around the regression line. The idea is to ignore errors that are smaller than 
$\epsilon$ and only penalize errors that are larger than this margin.

#### Logic of Linear SVR:
- **Goal:** Fit a linear function 
$f(x) = w^T x + b$ to the data while allowing some flexibility (errors) within the margin $\epsilon$.
- **Support Vectors:** In SVR, the support vectors are the data points that fall outside the $\epsilon$-margin of the regression line. These points are used to determine the final regression line.
- **Loss Function:** SVR uses a special type of loss function called **epsilon-insensitive loss**. This loss function doesn't penalize errors as long as they are within the margin $\epsilon$. For errors outside the margin, the penalty grows linearly with the error size.
- **Regression:** The final model attempts to minimize the error while keeping most of the data within the $\epsilon$-margin. The choice of $\epsilon$ controls how sensitive the model is to small errors.

In short, SVR fits the best possible line (or hyperplane) that keeps most of the data within the $\epsilon$-distance from the line, treating errors within that margin as acceptable and only focusing on larger deviations. The support vectors in SVR are the data points outside this margin that influence the regression line.


**Cover's theorem** - https://en.wikipedia.org/wiki/Cover%27s_theorem  

Cover's theorem states that a complex, non-linearly separable problem in a low-dimensional space is more likely to become linearly separable when mapped to a higher-dimensional space using a nonlinear transformation.

Radial Basis Function kernel 

Instance based learning

Voronoi tessellation

Anomaly detection - https://medium.com/@venujkvenk/anomaly-detection-techniques-a-comprehensive-guide-with-supervised-and-unsupervised-learning-67671cdc9680

Novely detection (and outliers) - https://scikit-learn.org/1.5/modules/outlier_detection.html

## Demo

In [4]:
iris_dataset = load_iris()

In [5]:
attributes, targets = iris_dataset["data"], iris_dataset["target"]

In [25]:
poly_attributes = PolynomialFeatures(degree = 3).fit_transform(attributes)

In [19]:
simple_svm = LinearSVC(C = 1e-3)

In [20]:
simple_svm.fit(attributes, targets)

In [21]:
simple_svm.coef_

array([[ 0.01439037,  0.10003511, -0.1860645 , -0.08267264],
       [-0.03772127, -0.07470283,  0.03918079,  0.00299687],
       [-0.07744765, -0.09388869,  0.11991649,  0.07658286]])

In [22]:
simple_svm.predict(attributes)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [23]:
simple_svm.score(attributes, targets)

0.6666666666666666

In [26]:
poly_svm = LinearSVC(C = 1e-3)

In [27]:
poly_svm.fit(poly_attributes, targets)



In [28]:
poly_svm.score(poly_attributes, targets)

0.98

In [33]:
circles_attributes, circles_targets = make_circles(noise = 0.1, factor = 0.5)

In [32]:
circles_svm_with_kernel = SVC(kernel = "poly", degree = 2)

In [35]:
circles_svm_with_kernel.fit(circles_attributes, circles_targets)

In [36]:
circles_svm_with_kernel.score(circles_attributes, circles_targets)

0.98

In [40]:
pulsars = pd.read_csv("data/pulsar_stars.csv")

In [41]:
pulsars

Unnamed: 0,Mean of the integrated profile,Standard deviation of the integrated profile,Excess kurtosis of the integrated profile,Skewness of the integrated profile,Mean of the DM-SNR curve,Standard deviation of the DM-SNR curve,Excess kurtosis of the DM-SNR curve,Skewness of the DM-SNR curve,target_class
0,140.562500,55.683782,-0.234571,-0.699648,3.199833,19.110426,7.975532,74.242225,0
1,102.507812,58.882430,0.465318,-0.515088,1.677258,14.860146,10.576487,127.393580,0
2,103.015625,39.341649,0.323328,1.051164,3.121237,21.744669,7.735822,63.171909,0
3,136.750000,57.178449,-0.068415,-0.636238,3.642977,20.959280,6.896499,53.593661,0
4,88.726562,40.672225,0.600866,1.123492,1.178930,11.468720,14.269573,252.567306,0
...,...,...,...,...,...,...,...,...,...
17893,136.429688,59.847421,-0.187846,-0.738123,1.296823,12.166062,15.450260,285.931022,0
17894,122.554688,49.485605,0.127978,0.323061,16.409699,44.626893,2.945244,8.297092,0
17895,119.335938,59.935939,0.159363,-0.743025,21.430602,58.872000,2.499517,4.595173,0
17896,114.507812,53.902400,0.201161,-0.024789,1.946488,13.381731,10.007967,134.238910,0


In [42]:
pulsars.target_class.value_counts(normalize = True)

target_class
0    0.908426
1    0.091574
Name: proportion, dtype: float64

In [43]:
pulsar_attributes_train, pulsar_attributes_test, pulsar_targets_train, pulsar_targets_test = train_test_split(
    pulsars.drop(columns = "target_class"),
    pulsars.target_class,
    test_size = 0.25,
    stratify = pulsars.target_class
)

In [44]:
svm = SVC(kernel = "rbf", gamma = 2)

In [45]:
svm.fit(pulsar_attributes_train, pulsar_targets_train)

In [46]:
svm.score(pulsar_attributes_train, pulsar_targets_train)

0.9999255010057364

In [47]:
svm.score(pulsar_attributes_test, pulsar_targets_test)

0.9083798882681564

In [49]:
detector = OneClassSVM()

In [50]:
detector.fit(pulsar_attributes_train)

In [51]:
detector.predict(pulsar_attributes_train)

array([ 1, -1,  1, ..., -1, -1,  1])