# OVERVIEW

![title](images/overview.png)

# DETAILED COMPARISONS

# NEAREST NEIGHBOURS

In kNN the decision boundary is the dividing line or surface that separates the feature space into regions classified as one class versus another based on the proximity of a query point to the training points. The boundary can be highly non-linear and jagged, especially for low values of k, because each region is determined by the immediate nearest training points. Every data point essentially influences the shape of the boundary.

For 1-NN **(k=1)**, the decision boundary is very sensitive to noise and may overfit the training data. The 1-NN rule partitions the feature space into the Voronoi cells of the training points, where each cell contains the locations closest to one training point. The decision boundary runs along the cell edges that separate training points of different classes.

As **k increases**, the boundary becomes smoother because the classifier takes a majority vote, and it becomes less sensitive to individual noisy points.

The shape of the regions can also be influenced by the choice of **distance metric**.

kNN can also handle data that is not linearly separable and more complex shapes, because the boundary follows the borders between the class distributions in the feature space.
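
The effect of k on noise sensitivity can be sketched in a few lines. This is a minimal illustration using only the standard library; the toy data and the `knn_predict` helper are hypothetical and not part of the original notes.

```python
from collections import Counter
import math

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort training points by Euclidean distance to the query, keep the k closest.
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class "a" near the origin, class "b" near (5, 5),
# plus one mislabelled "b" point sitting inside the "a" cluster.
train = [
    ((0.0, 0.0), "a"), ((1.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 1.0), "a"),
    ((5.0, 5.0), "b"), ((6.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 6.0), "b"),
    ((0.5, 0.5), "b"),  # noisy point
]

query = (0.6, 0.6)
pred_k1 = knn_predict(train, query, k=1)  # follows the single noisy neighbour
pred_k5 = knn_predict(train, query, k=5)  # majority vote outweighs the noise
print(pred_k1, pred_k5)
```

With k=1 the noisy point carves its own small region out of the "a" cluster; with k=5 that region disappears.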

![title](images/nearest_neighbors.png)

# DECISION TREE

The decision boundary for a decision tree classifier represents the regions in the feature space where the model assigns the same class label.

A decision tree splits the feature space into regions using axis-aligned splits (e.g., "feature > 5", "feature ≤ 3", or a test on a binary 0/1 category).
The resulting regions are rectangles (in 2D) or hyper-rectangles (in higher dimensions), each an intersection of half-spaces bounded by axis-aligned hyperplanes.

In 2D, the decision boundary is therefore made up of axis-aligned segments. Each rectangle corresponds to a leaf node of the tree, and all points within it are assigned the same label.

**Shallow Tree:** Fewer splits -> Large rectangular regions.

**Deep Tree:** More splits -> Smaller, finely divided regions -> Captures finer details of the data distribution but can overfit.
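
To make the link between leaves and rectangles concrete, here is a small sketch with a hand-written tree. The tree structure, thresholds, and helper names (`predict`, `leaf_boxes`) are assumptions chosen for illustration, not a fitted model.

```python
# A hypothetical depth-2 tree over two features (x0, x1). Each internal node
# tests one feature against one threshold, so every split is axis-aligned.
tree = {
    "feature": 0, "threshold": 5.0,
    "left": {"label": "red"},            # x0 <= 5
    "right": {                           # x0 > 5
        "feature": 1, "threshold": 3.0,
        "left": {"label": "blue"},       # x1 <= 3
        "right": {"label": "red"},       # x1 > 3
    },
}

def predict(node, point):
    """Walk from the root to a leaf and return the leaf's label."""
    while "label" not in node:
        branch = "left" if point[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def leaf_boxes(node, bounds):
    """Return (bounds, label) per leaf; bounds holds one (low, high) pair per feature."""
    if "label" in node:
        return [(bounds, node["label"])]
    f, t = node["feature"], node["threshold"]
    low, high = bounds[f]
    left_bounds = bounds[:f] + [(low, t)] + bounds[f + 1:]
    right_bounds = bounds[:f] + [(t, high)] + bounds[f + 1:]
    return leaf_boxes(node["left"], left_bounds) + leaf_boxes(node["right"], right_bounds)

# Three leaves give three rectangles inside the square [0, 10] x [0, 10].
boxes = leaf_boxes(tree, [(0.0, 10.0), (0.0, 10.0)])
for box, label in boxes:
    print(box, label)
```

Every additional split cuts one existing rectangle in two, which is why deeper trees produce smaller regions.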

![title](images/decision_tree.png)

# RANDOM FOREST

Each tree generates its own decision boundary, which, as described in the decision tree section, is axis-aligned and piecewise-rectangular. The forest combines the predictions of its trees, so its boundary is more detailed in regions where the training data is dense and less so in sparse regions. This reflects how trees adapt to local patterns.

**More trees:** Smoother and more stable boundary, as more trees decrease the influence of outliers and noise.

**Fewer trees:** Slightly noisier, jagged boundaries resembling individual tree splits.

**Shallow trees:** Coarser boundaries, unable to capture small patterns.

**Deep trees:** Finer boundaries, and overfitting risk for individual trees is mitigated by averaging.
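
The combination step can be sketched as a simple majority vote. The three "trees" below are hypothetical single-rule stand-ins; a real forest would grow each tree on a bootstrap sample and usually with random feature subsets.

```python
from collections import Counter

# Three hypothetical trees, each reduced to one axis-aligned rule for brevity.
def tree_a(p):
    return "blue" if p[0] > 4.0 else "red"

def tree_b(p):
    return "blue" if p[0] > 5.0 else "red"

def tree_c(p):
    return "blue" if p[1] > 2.0 else "red"

def forest_predict(trees, point):
    """Majority vote over the individual tree predictions."""
    votes = Counter(tree(point) for tree in trees)
    return votes.most_common(1)[0][0]

trees = [tree_a, tree_b, tree_c]
upper = forest_predict(trees, (4.5, 3.0))  # votes: blue, red, blue
lower = forest_predict(trees, (4.5, 1.0))  # votes: blue, red, red
print(upper, lower)
```

The forest's boundary does not coincide with any single tree's boundary: near x0 = 4.5 the outcome depends on x1, even though two of the three trees ignore x1.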

![title](images/random_forest.png)

# NAIVE BAYES

The decision boundary separates regions where one class has a higher probability than the others.
Mathematically, it’s where P(y1∣X)=P(y2∣X), meaning the probabilities of two classes are equal.

Boundaries, when assuming a Gaussian distribution, are often linear or moderately curved, which reflects the simplicity of the model (they are linear when the classes share the same per-feature variances and quadratic otherwise).
<!-- The type of feature distributions we assume (Gaussian, multinomial, etc.) can directly influence the boundary. -->
If one class has a much higher prior probability P(y), the decision boundary skews in favor of that class.
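
The effect of the prior can be checked on a one-feature example. This is a sketch under stated assumptions: a single Gaussian feature, a standard deviation shared by both classes, and hypothetical helper names `log_joint` and `boundary`.

```python
import math

def log_joint(x, mean, std, prior):
    """log P(y) + log N(x | mean, std) for a single feature."""
    log_likelihood = -0.5 * math.log(2 * math.pi * std**2) - (x - mean) ** 2 / (2 * std**2)
    return math.log(prior) + log_likelihood

def boundary(mean1, mean2, std, prior1, prior2):
    """Point where both classes have equal posterior (1-D, shared std)."""
    return (mean1 + mean2) / 2 + std**2 * math.log(prior2 / prior1) / (mean1 - mean2)

# Equal priors: the boundary sits halfway between the class means.
balanced = boundary(0.0, 4.0, 1.0, 0.5, 0.5)
# Class 1 is much more common: the boundary moves towards class 2,
# so class 1 claims a larger share of the feature space.
skewed = boundary(0.0, 4.0, 1.0, 0.9, 0.1)
print(balanced, skewed)
```

With several features, the naive independence assumption means the per-feature log-likelihoods are simply summed before comparing classes.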


![title](images/naive_bayes01.png) 
![title](images/naive_bayes02.png)


# LOGISTIC REGRESSION
The decision boundary is the region where the predicted probability of one class equals that of the other (typically at 50%).
Logistic regression produces a linear decision boundary because it is based on a linear combination of input features passed through a sigmoid function.
Using logistic regression to classify two classes in a 2D plane produces a straight line as the decision boundary. In higher dimensions, it becomes a hyperplane. Because the boundary is linear, it cannot effectively separate non-linear patterns such as moons or concentric circles.
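
The 50% contour is exactly the set of points where the linear combination is zero, which the sketch below verifies. The coefficients are hypothetical, not taken from a fitted model.

```python
import math

def predict_proba(weights, bias, point):
    """Sigmoid of the linear combination w . x + b."""
    z = sum(w * x for w, x in zip(weights, point)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def boundary_x1(weights, bias, x0):
    """Solve w0*x0 + w1*x1 + b = 0 for x1 (requires w1 != 0)."""
    return -(weights[0] * x0 + bias) / weights[1]

weights, bias = (2.0, -1.0), 0.5   # hypothetical coefficients
x0 = 1.0
x1 = boundary_x1(weights, bias, x0)            # point on the boundary line
p_on_line = predict_proba(weights, bias, (x0, x1))
p_off_line = predict_proba(weights, bias, (x0, x1 - 1.0))
print(x1, p_on_line, p_off_line)
```

Because `boundary_x1` is linear in `x0`, sweeping `x0` traces a straight line in the 2D plane.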

<!-- Regularization can shrink the coefficients leading to less steepness until we reach the horizontal line -> which gives us the average for the dependent variable. -->
With too much regularization, the model may oversimplify, leading to underfitting and a less effective boundary.

![title](images/logistic_regression.png)

# LINEAR DISCRIMINANT ANALYSIS

LDA tries to find the linear combination of features that best separates the classes. This results in a linear decision boundary.

The decision boundary is a straight line (in 2D) or a hyperplane (in higher dimensions) that separates the two classes.
Its orientation is determined by the difference of the class means and the shared covariance matrix: it is perpendicular to the line connecting the means when the covariance is spherical, but generally not otherwise. Its position is shifted by the prior probabilities. The method tries to separate the projected class means as much as possible relative to the within-class spread.
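
For two classes with a shared covariance matrix Σ, the boundary is the set of points with w · x = c, where w = Σ⁻¹(μ₁ − μ₀). The sketch below computes w and c for a hypothetical 2D example; the helper names are assumptions for illustration. Note that with a non-spherical covariance, w is generally not parallel to the line connecting the means.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def lda_boundary(mean0, mean1, cov, prior0=0.5, prior1=0.5):
    """Return (w, c) such that class 1 is predicted when w . x > c."""
    inv = inv2(cov)
    diff = [mean1[0] - mean0[0], mean1[1] - mean0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    total = [mean1[0] + mean0[0], mean1[1] + mean0[1]]
    c = 0.5 * (w[0] * total[0] + w[1] * total[1]) - math.log(prior1 / prior0)
    return w, c

mean0, mean1 = (0.0, 0.0), (3.0, 0.0)
cov = [[2.0, 1.0], [1.0, 2.0]]          # hypothetical shared covariance
w, c = lda_boundary(mean0, mean1, cov)  # equal priors
w_p, c_p = lda_boundary(mean0, mean1, cov, prior0=0.2, prior1=0.8)
print(w, c, c_p)
```

The means differ only along the first axis, yet w has a non-zero second component because the features are correlated. A larger prior for class 1 lowers the threshold c, enlarging the region assigned to class 1.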

Since it produces a linear decision boundary, it struggles with classes that are not linearly separable, such as moons or heavily overlapping distributions.

![title](images/lda01.png)

![title](images/lda02.png)

# QUADRATIC DISCRIMINANT ANALYSIS

QDA allows each class to have its own covariance matrix instead of assuming a shared covariance matrix for all classes.
This results in quadratic decision boundaries rather than linear ones. 
QDA is more flexible than LDA and can model some non-linear relationships between features and classes, so it can also handle more complex datasets.
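
A one-feature sketch shows why class-specific variances bend the boundary. Both hypothetical classes share the same mean and differ only in spread, so a single linear threshold cannot separate them, while the quadratic discriminant can. The class names and parameters are assumptions for illustration.

```python
import math

def log_gaussian(x, mean, std):
    """Log density of a 1-D Gaussian."""
    return -math.log(std) - 0.5 * math.log(2 * math.pi) - (x - mean) ** 2 / (2 * std**2)

def qda_predict(x, params):
    """params maps label -> (mean, std, prior); pick the largest log posterior."""
    return max(
        params,
        key=lambda label: math.log(params[label][2])
        + log_gaussian(x, params[label][0], params[label][1]),
    )

params = {
    "narrow": (0.0, 1.0, 0.5),  # tight cluster around 0
    "wide": (0.0, 3.0, 0.5),    # same mean, three times the spread
}

centre = qda_predict(0.0, params)
right_tail = qda_predict(3.0, params)
left_tail = qda_predict(-3.0, params)
print(centre, right_tail, left_tail)
```

The "narrow" class wins on an interval around zero and the "wide" class wins on both sides of it, so the boundary consists of two points, the roots of a quadratic equation in x.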

![title](images/lda01.png)

![title](images/lda02.png)

# MLP

The hidden layer's non-linearity allows the MLP to model non-linear decision boundaries. 



For a single neuron, the decision boundary is linear: the neuron computes a weighted sum of its inputs, which splits the space into two half-spaces. A hidden layer introduces as many of these splits as it has neurons, and the output neuron combines them.
The hidden neurons therefore collectively partition the feature space into regions, which are combined to form a non-linear (curved or piecewise-linear) boundary.
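
The classic XOR layout illustrates how combining half-spaces produces a non-linear boundary. The weights below are set by hand for illustration rather than learned, and the function names are hypothetical.

```python
def relu(z):
    return max(0.0, z)

def single_neuron(x):
    """One linear unit: thresholded weighted sum, so the boundary is a line."""
    return 1 if x[0] + x[1] - 0.5 > 0 else 0

def tiny_mlp(x):
    """Two hidden ReLU neurons combined by one output neuron (hand-set weights)."""
    h1 = relu(x[0] + x[1])         # active once x0 + x1 > 0
    h2 = relu(x[0] + x[1] - 1.0)   # active once x0 + x1 > 1
    score = h1 - 2.0 * h2          # output neuron combines both half-spaces
    return 1 if score > 0.5 else 0

xor_points = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
mlp_correct = all(tiny_mlp(p) == label for p, label in xor_points.items())
neuron_correct = all(single_neuron(p) == label for p, label in xor_points.items())
print(mlp_correct, neuron_correct)
```

The hand-picked single neuron misclassifies (1, 1), and no other choice of weights for one neuron fixes all four points. The two-neuron network labels the band 0.5 < x0 + x1 < 1.5 as class 1, a region bounded by two lines instead of one.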

For linearly separable data (e.g., points separable by a straight line), the decision boundary of an MLP can resemble that of logistic regression: a straight line or hyperplane. If the hidden layer has enough neurons, it can still introduce some slight curvature, but the problem doesn’t require it.

With enough neurons in the hidden layer, an MLP can approximate any decision boundary (this is a result of the Universal Approximation Theorem). However, the number of neurons and training data must be sufficient to represent the complexity of the data.


### How hyperparameters affect the decision boundary

#### Number of neurons in the hidden layer:

**Few neurons:** The model may underfit and produce overly simple decision boundaries (e.g., straight or slightly curved lines).

**More neurons:** Allows the model to capture more complex decision boundaries.

For example, with moons data, adding neurons enables the network to curve and adapt to the crescent shapes.

**Too many neurons:** Increases the risk of overfitting, where the decision boundary becomes unnecessarily complex and starts fitting noise.

![title](images/mlp_overview.png)

#### Regularization
Alpha is the strength of the regularization (penalty) term, which combats overfitting by constraining the size of the weights. Increasing alpha may fix high variance (a sign of overfitting) by encouraging smaller weights, resulting in a decision boundary with less curvature. Similarly, decreasing alpha may fix high bias (a sign of underfitting) by allowing larger weights, potentially resulting in a more complicated decision boundary.

![title](images/mlp_alpha.png)

#### Learning Rate:

A low learning rate can make training converge so slowly that the decision boundary is still a poor fit when training stops.
A high learning rate might lead to instability, causing the boundary to oscillate without finding a good fit.
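
Both failure modes show up even on a one-parameter problem. The sketch below is a stand-in, not an MLP: it runs plain gradient descent on the hypothetical loss w², whose minimum is at w = 0, with three learning rates chosen for illustration.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimise loss(w) = w**2 (gradient 2*w) with a fixed learning rate."""
    for _ in range(steps):
        w -= lr * 2.0 * w
    return w

w_low = gradient_descent(lr=0.01)   # still far from 0 after 20 steps
w_ok = gradient_descent(lr=0.4)     # essentially at the minimum
w_high = gradient_descent(lr=1.1)   # overshoots further on every step
print(w_low, w_ok, w_high)
```

With `lr=1.1` each update flips the sign of `w` and increases its magnitude, which is the one-dimensional analogue of a boundary that keeps oscillating instead of settling.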

![title](images/mlp_lr.png)

#### Number of Training Epochs:

Training for too few epochs can lead to underfitting, with an overly simple boundary.
Too many epochs can lead to overfitting, where the boundary becomes unnecessarily detailed and complex.

### What decision boundary does an MLP with more than one hidden layer generate?

![title](images/mlp_multilayer.png)



# SVM

For linearly separable data, the SVM constructs a straight line (in 2D) or a hyperplane (in higher dimensions) as the decision boundary.

When data is not linearly separable, we can use non-linear kernels to implicitly map the data into a higher-dimensional feature space where it becomes linearly separable (or closer to it). The decision boundary is then a non-linear curve in the original input space but linear in that feature space.
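
The idea can be shown with an explicit feature map. This is only an illustration: a kernel SVM never computes the map explicitly, and the data, the map `phi`, and the weights of `linear_rule` are hypothetical and hand-picked rather than fitted.

```python
def phi(x):
    """Explicit feature map from 1-D input to a 2-D feature space."""
    return (x, x * x)

def linear_rule(features, w=(0.0, 1.0), b=-4.0):
    """A linear classifier in feature space: sign of w . features + b."""
    return 1 if w[0] * features[0] + w[1] * features[1] + b > 0 else 0

# Class 1 lies on both sides of class 0, so no single threshold on x works.
data = [(-3.0, 1), (-1.0, 0), (0.0, 0), (1.0, 0), (3.0, 1)]

def separable_by_threshold(points):
    """Check whether any single threshold on x classifies every point correctly."""
    xs = sorted(x for x, _ in points)
    candidates = [xs[0] - 1.0] + [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]
    for t in candidates:
        for positive_side in (1, 0):
            if all((positive_side if x > t else 1 - positive_side) == y for x, y in points):
                return True
    return False

separable_in_input_space = separable_by_threshold(data)
separable_in_feature_space = all(linear_rule(phi(x)) == y for x, y in data)
print(separable_in_input_space, separable_in_feature_space)
```

The linear rule x² > 4 in feature space corresponds to the two thresholds x = −2 and x = 2 in the original space, which is a non-linear boundary there.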

In the case of a **polynomial kernel**, with the increasing degree of the polynomial, the SVM can fit more detailed/complex (e.g., twisting more) boundaries between classes, but we also risk overfitting to the data.

**Margin Softness (Slack Variable):**
For datasets with overlapping classes, SVM can use slack variables to allow some points to violate the margin.
A softer margin leads to smoother boundaries that tolerate noise, while a harder margin enforces stricter separation.

![title](images/svm_overview.png)


#### Regularization Parameter (C):
C controls the trade-off between maximizing the margin and minimizing misclassification:

**Small C:** Increases margin size by allowing some misclassifications. This leads to a simpler, smoother decision boundary that often generalizes better.

**Large C:** Reduces the margin to classify all training points correctly. This can result in a complex boundary that overfits the training data.

![title](images/svm_reg_c.png)




#### Gamma (for RBF Kernel):
Gamma controls the influence of individual data points on the decision boundary:

**Low Gamma:** Each data point has a far-reaching influence, leading to smoother, more generalized decision boundaries.

**High Gamma:** Each data point has a limited influence, resulting in a highly flexible decision boundary that can overfit the data.
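
The reach of a single point follows directly from the RBF kernel formula k(a, b) = exp(−gamma · ‖a − b‖²). The sketch below evaluates it for hypothetical points and gamma values chosen for illustration.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel value: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

support_point = (0.0, 0.0)
near, far = (0.5, 0.0), (2.0, 0.0)

low_gamma_far = rbf_kernel(support_point, far, 0.1)      # exp(-0.4)
high_gamma_far = rbf_kernel(support_point, far, 10.0)    # exp(-40)
high_gamma_near = rbf_kernel(support_point, near, 10.0)  # exp(-2.5)
print(low_gamma_far, high_gamma_far, high_gamma_near)
```

With a low gamma the point at distance 2 still receives a substantial kernel value, while with a high gamma its value is practically zero and only very close points are affected, which lets the boundary wrap tightly around individual training points.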


![title](images/svm_gamma.png)

#### How does regularization parameter C affect the margins?

A larger value of C results in a narrower margin and stricter classification, while a smaller value of C allows for a larger margin and more misclassification.
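
The trade-off can be read off the soft-margin objective, ½‖w‖² + C · Σ max(0, 1 − yᵢ(w · xᵢ + b)), where the margin width is 2/‖w‖. The sketch below compares two hand-picked candidate boundaries on hypothetical 1D data; neither is claimed to be the exact optimum an SVM solver would return.

```python
def soft_margin_objective(w, b, data, C):
    """0.5 * w^2 plus C times the total hinge loss (1-D inputs)."""
    hinge = sum(max(0.0, 1.0 - y * (w * x + b)) for x, y in data)
    return 0.5 * w * w + C * hinge

def margin_width(w):
    return 2.0 / abs(w)

# Hypothetical data: the positive point at x = -1 sits close to the negatives.
data = [(-3.0, -1), (-2.0, -1), (-1.0, 1), (2.0, 1), (3.0, 1)]

narrow = (2.0, 3.0)  # separates every point, margin width 1
wide = (0.5, 0.0)    # margin width 4, but x = -1 ends up on the wrong side

def preferred(C):
    """Name of the candidate with the lower objective for this C."""
    scores = {
        "narrow": soft_margin_objective(*narrow, data, C),
        "wide": soft_margin_objective(*wide, data, C),
    }
    return min(scores, key=scores.get)

small_c_choice = preferred(0.1)
large_c_choice = preferred(100.0)
print(small_c_choice, large_c_choice)
```

For small C the cost of the single violation is cheap compared with the gain in margin, so the wide candidate scores better; for large C the violation dominates and the narrow, strictly separating candidate is preferred.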


![title](images/svm_margins_C.png)
