# Adaptive Boosting from Scratch
***

In [92]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from numpy.typing import NDArray
from typing import Tuple

## 1. Introduction
Adaptive Boosting (AdaBoost) is a foundational ensemble learning algorithm that improves accuracy by combining multiple **weak classifiers** (often decision stumps, i.e. decision trees with a single split) into a single **strong classifier**. Although AdaBoost was designed primarily for binary classification, some variants extend it to multiclass problems and regression tasks. Its core mechanism and main use case, however, remain binary classification.

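### Advantages:
- Turns weak models into a strong classifier.
- Often resists overfitting in practice, although noisy data can still cause it.
- Few hyperparameters to tune (mainly the number of boosting rounds).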

### Limitations:
- Sensitive to outliers and noisy labels, because misclassified samples receive higher weights in every round.
- Primarily for binary classification.

### Steps:
1. Initialise weights.
2. For each boosting round (up to $M$ iterations):
    - Train a weak learner (decision stump) on the weighted samples.
    - Compute its weighted error.
    - Calculate the learner weight $\alpha$.
    - Update the sample weights.
    - Repeat until the maximum number of iterations is reached or the weighted error is sufficiently low.
3. Predict.
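
To make step 2 concrete, here is a minimal sketch of a single boosting round on five toy samples. It assumes the standard discrete AdaBoost formulas: the weighted error $\epsilon$ is the sum of the weights of the misclassified samples, $\alpha = \frac{1}{2}\ln\frac{1-\epsilon}{\epsilon}$, and the weights are updated exponentially and then renormalised. The labels, the weak learner's predictions, and the names `y_true` and `y_pred` are made up for illustration.

```python
import numpy as np

# Toy labels and the predictions of a hypothetical weak learner
# that gets the last sample wrong.
y_true = np.array([1, 1, -1, -1, 1])
y_pred = np.array([1, 1, -1, -1, -1])

# Step 1: every sample starts with the same weight 1/N.
w = np.full(len(y_true), 1 / len(y_true))

# Weighted error: total weight of the misclassified samples.
miss = y_pred != y_true
error = np.sum(w[miss])

# Learner weight: larger when the weighted error is smaller.
alpha = 0.5 * np.log((1 - error) / error)

# Weight update: y_true * y_pred is -1 for mistakes, so their
# weights grow, while correctly classified samples shrink.
w = w * np.exp(-alpha * y_true * y_pred)
w = w / w.sum()  # renormalise so the weights sum to 1

print(f'error = {error:.3f}, alpha = {alpha:.3f}')
print(f'updated weights = {w}')
```

After this round the single misclassified sample carries a weight of 0.5, as much as the four correctly classified samples combined, which is what pushes the next weak learner to focus on it.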

## 2. Loading Data

In [81]:
data = load_breast_cancer()
X, y = data.data, data.target
y = np.where(y == 0, -1, 1)     # AdaBoost expects labels as -1 and +1
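
The $\pm 1$ encoding is convenient because the product of a label and a prediction is $+1$ when they agree and $-1$ when they disagree, a property commonly used in AdaBoost's weight update. The following is a small sketch with made-up labels and predictions; all names in it are hypothetical.

```python
import numpy as np

labels_01 = np.array([0, 1, 1, 0])            # hypothetical 0/1 labels
labels_pm = np.where(labels_01 == 0, -1, 1)   # map 0 -> -1 and 1 -> +1

preds = np.array([-1, 1, -1, 1])              # hypothetical weak-learner predictions
agreement = labels_pm * preds                 # +1 where correct, -1 where wrong

print(labels_pm)
print(agreement)
```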

## 3. Initialising Weights
All training samples are initialised with equal weight:

\begin{align*}
    w_i = \dfrac{1}{N}
\end{align*}

where $N$ is the number of samples. For $N = 5$, each sample's initial weight is:

\begin{align*}
    w_i = \dfrac{1}{5} = 0.2
\end{align*}

NumPy's `np.full` function generates an array of a given length with every entry set to the same value.

In [None]:
def initialise_weights(n_samples: int) -> NDArray[np.float64]:
    """
    Initialise all sample weights equally.
    """
    return np.full(n_samples, 1 / n_samples)

In [91]:
print(f'For N = 5: {initialise_weights(5)}')

For N = 5: [0.2 0.2 0.2 0.2 0.2]


## 4. Finding the Best Stump