# Adaptive Boosting from Scratch
***
## Table of Contents
1. [Introduction](#1-introduction)
    - [Advantages](#advantages)
    - [Limitations](#limitations)
    - [Steps](#steps)
1. [Loading Data](#2-loading-data)
1. [Initialising Weights](#3-initialising-weights)
1. [Finding the Best Stump](#4-finding-the-best-stump)
1. [Learner Weights](#5-learner-weights)
1. [Updating Sample Weights](#6-updating-sample-weights)
1. [Training Loop](#7-training-loop)
1. [Prediction](#8-prediction)
1. [Encapsulation](#9-encapsulation)
1. [Comparison with Scikit-Learn](#10-comparison-with-scikit-learn)
1. [References](#11-references)
***

In [1]:
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from numpy.typing import NDArray
from typing import Tuple, Dict, Any, List, Optional

## 1. Introduction
Adaptive Boosting (AdaBoost) is a foundational ensemble learning algorithm that improves accuracy by combining multiple **weak classifiers** (often decision stumps, i.e. decision trees with a single split) into a single **strong classifier**. Although AdaBoost was designed primarily for binary classification, some variants extend it to multiclass problems and regression tasks. Its core mechanism and main use case remain binary classification.

### Advantages
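- Combines weak models, each only slightly better than random guessing, into a strong classifier.
- Often resistant to overfitting in practice, although noisy data can still cause it.
- Few hyperparameters to tune: mainly the number of weak learners.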

### Limitations
- Sensitive to outliers as misclassified samples get higher weights.
- Primarily for binary classification.

### Steps
1. Initialise weights.
2. For each boosting round ($M$ iterations):
    - Train a weak learner (decision stump).
    - Compute the weighted error.
    - Calculate the learner weight $\alpha$.
    - Update the sample weights.
    - Repeat for the maximum number of iterations or until the weighted error is sufficiently low.
3. Predict.

## 2. Loading Data

In [2]:
data = load_breast_cancer()
X, y = data.data, data.target
y = np.where(y == 0, -1, 1)     # AdaBoost expects labels as -1 and +1

## 3. Initialising Weights
All training samples are initialised with equal weight:

\begin{align*}
    w_i = \dfrac{1}{N}
\end{align*}

where $N$ is the number of samples. For $N = 5$, each sample's initial weight is:

\begin{align*}
    w_i = \dfrac{1}{5} = 0.2
\end{align*}

The `np.full` function from the NumPy library generates an array of the specified length with every entry set to the same value.

In [3]:
def initialise_weights(n_samples: int) -> NDArray[np.float64]:
    """
    Initialise sample weights equally.

    Parameters:
        n_samples: Number of samples.

    Returns:
        Initialised sample weights of shape (n_samples,).
    """
    return np.full(n_samples, 1 / n_samples)

In [4]:
print(f'For N = 5: {initialise_weights(5)}')

For N = 5: [0.2 0.2 0.2 0.2 0.2]


## 4. Finding the Best Stump
The following `find_best_stump` function implements the decision stump. It exhaustively searches for the best one-level split across all features and candidate thresholds, considering both directions (polarities), and selects the split that minimises the weighted classification error.

1. Initialise variables.
2. Loop over all features and thresholds (unique values).
3. Loop over both polarities: $[1, -1]$.
4. Make predictions.
    - Initialise all predictions to $+1$.
    - For polarity $1$: set to $-1$ where $\text{value} < \text{threshold}$.
    - For polarity $-1$: set to $-1$ where $\text{value} > \text{threshold}$.
5. Calculate weighted error.
\begin{align*}
    \epsilon_m = \dfrac{\sum^{N}_{i=1} w_i \cdot \mathbb{I}(h_m(x_i) \neq y_i)}{\sum^{N}_{i=1}w_i}
\end{align*}

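    where:
    - $w_i$: Weight of sample $i$.
    - $h_m$: $m$-th weak learner.
    - $y_i$: True label.
    - $\mathbb{I}$: Indicator function.

    Because the sample weights are normalised to sum to 1, the denominator equals 1, so the weighted error is simply the sum of the weights of the misclassified samples.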
6. If the error rate is smaller than `min_error`, update the value (`min_error = error`), best stump and best prediction.
7. Return `best_stump`, `min_error`, and `best_predictions` with the least error.
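
As a small illustration of steps 4 and 5, the sketch below evaluates a single candidate threshold with both polarities. The five feature values, labels, and threshold are made-up assumptions chosen for the example, not taken from the breast cancer data, and `stump_error` is a hypothetical helper name.

```python
import numpy as np

# Toy data (hypothetical): one feature, five samples, labels in {-1, +1}
feature_vals = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-1, -1, 1, -1, 1])
sample_weights = np.full(5, 0.2)  # uniform weights that sum to 1
threshold = 3.0


def stump_error(polarity: int) -> float:
    """Weighted error of one stump at the chosen threshold."""
    predictions = np.ones(len(feature_vals))
    if polarity == 1:
        predictions[feature_vals < threshold] = -1
    else:
        predictions[feature_vals > threshold] = -1
    misclassified = predictions != y
    # With normalised weights, the error is the sum of misclassified weights
    return float(np.sum(sample_weights[misclassified]))


error_pos = stump_error(1)
error_neg = stump_error(-1)
print(f"polarity  1: {error_pos:.1f}")
print(f"polarity -1: {error_neg:.1f}")
```

Here polarity $1$ misclassifies one sample (error $0.2$) while polarity $-1$ misclassifies three (error $0.6$), so the search would keep polarity $1$ for this threshold.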

In [5]:
def find_best_stump(X: NDArray[np.float64], y: NDArray[np.int8],
                    sample_weights: NDArray[np.float64]) -> Tuple[Dict[str, Any], float, NDArray[np.int8]]:
    """
    Find the best decision stump that minimises weighted classification error.

    Parameters:
        X: Feature matrix of shape (n_samples, n_features).
        y: Labels array of shape (n_samples,), with values -1 or 1.
        sample_weights: Sample weights of shape (n_samples,).

    Returns:
        Tuple containing:
            - best_stump: Dictionary with keys 'feature_index', 'threshold', and 'polarity'.
            - min_error: Minimum weighted classification error.
            - best_predictions: Predictions of the best stump on X.
    """

    n_samples, n_features = X.shape
    min_error = float('inf')
    best_stump = {}
    best_predictions = None

    for feature_i in range(n_features):  # Each feature
        feature_vals = X[:, feature_i]  # All values of the selected feature
        thresholds = np.unique(feature_vals)  # Unique values in feature_vals
        for threshold in thresholds:
            for polarity in [1, -1]:
                # Predict -1 if (polarity * feature) < (polarity * threshold), else +1
                predictions = np.ones(n_samples)
                if polarity == 1:
                    predictions[feature_vals < threshold] = -1
                else:
                    predictions[feature_vals > threshold] = -1

                # Calculate weighted error
                misclassified = predictions != y
                error = np.sum(sample_weights[misclassified])

                if error < min_error:
                    min_error = error
                    best_stump = {
                        "feature_index": feature_i,
                        "threshold": threshold,
                        "polarity": polarity
                    }
                    best_predictions = predictions.copy()
    return best_stump, min_error, best_predictions

## 5. Learner Weights
For the current learner $m$, the learner weight $\alpha_m$ is:

\begin{align*}
    \alpha_m = \dfrac{1}{2} \ln \left( \dfrac{1 - \epsilon_m + c}{\epsilon_m + c} \right)
\end{align*}

where:
- $\epsilon_m$: Weighted error returned by the `find_best_stump()` function.
- $c$: Small constant that avoids division by zero (and a logarithm of zero) when the error is exactly 0 or 1. Set to $1 \times 10^{-10}$.

In [6]:
def compute_alpha(error: float) -> float:
    """
    Compute the weight of the weak learner (alpha).

    Parameters:
        error: Weighted classification error of the weak learner.

    Returns:
        Weight of the weak learner.
    """
    c = 1e-10  # constant
    return 0.5 * np.log((1 - error + c) / (error + c))
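
To get a feel for the formula, the sketch below evaluates it for a few error rates. The error values are illustrative assumptions, and the formula is repeated under the hypothetical name `alpha_for` only so that the snippet runs on its own.

```python
import numpy as np


def alpha_for(error: float, c: float = 1e-10) -> float:
    """Same formula as compute_alpha, repeated to keep the snippet standalone."""
    return 0.5 * np.log((1 - error + c) / (error + c))


# Illustrative error rates: good learner, fair learner, coin flip, bad learner
for error in (0.1, 0.3, 0.5, 0.9):
    print(f"error = {error:.1f} -> alpha = {alpha_for(error):+.4f}")
```

A learner with error below $0.5$ receives a positive weight, one at exactly $0.5$ receives a weight of zero, and one above $0.5$ receives a negative weight, which effectively inverts its vote.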

## 6. Updating Sample Weights
After calculating the learner weight $\alpha_m$, we update each sample weight $w_i$ such that:

\begin{align*}
    w_i \leftarrow w_i \cdot \text{e}^{-\alpha_m y_i h_m(x_i)}
\end{align*}

where:
- $w_i$: Current weight of sample $i$.
- $\alpha_m$: Weight of the weak learner $m$.
- $y_i$: True label of sample $i$ ($-1$ or $+1$).
- $h_m(x_i)$: Prediction for sample $i$ ($-1$ or $+1$).

Assuming $\alpha_m > 0$ (i.e. the weak learner's error is below $0.5$), the weights are increased for misclassified samples and decreased for correctly classified ones:
- If the prediction is **correct** $(y_i = h_m(x_i))$, then $y_i \cdot h_m(x_i) = 1$, so the weight is **decreased**:
\begin{align*}
    w_i \leftarrow w_i \cdot \text{e}^{-\alpha_m}
\end{align*}

- If the prediction is **incorrect** $(y_i \neq h_m(x_i))$, then $y_i \cdot h_m(x_i) = -1$, so the weight is **increased**:
\begin{align*}
    w_i \leftarrow w_i \cdot \text{e}^{\alpha_m}
\end{align*}

The function returns the normalised weights (all sample weights sum to 1) for the next AdaBoost iteration.

In [7]:
def update_weights(sample_weights: NDArray[np.float64], alpha: float,
                   y: NDArray[np.int8], predictions: NDArray[np.int8]) -> NDArray[np.float64]:
    """
    Update sample weights: increase for misclassified, decrease for correct.

    Parameters:
        sample_weights: Current sample weights.
        alpha: Weight of the weak learner.
        y: True labels.
        predictions: Predictions from the weak learner.

    Returns:
        Updated and normalised sample weights.
    """
    # Build a new array so the caller's weights are not modified in place
    updated = sample_weights * np.exp(-alpha * y * predictions)
    return updated / np.sum(updated)  # Normalisation
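
As a worked example under assumed values (five samples with uniform weights, of which only the last is misclassified), the sketch below applies one update step. The constant $c$ is left out here to keep the arithmetic clean.

```python
import numpy as np

# Hypothetical labels and stump predictions; only the last sample is wrong
y = np.array([1, 1, -1, -1, 1])
predictions = np.array([1, 1, -1, -1, -1])
weights = np.full(5, 0.2)

error = np.sum(weights[predictions != y])   # weighted error of this stump
alpha = 0.5 * np.log((1 - error) / error)   # learner weight, without c

# Correct samples are scaled by exp(-alpha), the misclassified one by exp(+alpha)
new_weights = weights * np.exp(-alpha * y * predictions)
new_weights = new_weights / np.sum(new_weights)  # normalise to sum to 1

print(np.round(new_weights, 3))
```

After the update, the four correctly classified samples sit at roughly $0.125$ each and the misclassified sample at roughly $0.5$, so the next stump is pushed to get that sample right.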

## 7. Training Loop
The training loop runs for the specified number of weak learners `n_weak_learners`. After all iterations, it returns a list of all trained stumps `stumps` with their parameters, and a list of the corresponding weights for each stump `alphas`. 

In [8]:
def adaboost_train(X: NDArray[np.float64], y: NDArray[np.int8],
                   n_weak_learners: int) -> Tuple[List[Dict[str, Any]], List[float]]:
    """
    Train AdaBoost ensemble with decision stumps.

    Parameters:
        X: Feature matrix of shape (n_samples, n_features).
        y: Labels array of shape (n_samples,), with values -1 or 1.
        n_weak_learners: Number of weak learners to train.

    Returns:
        Tuple containing:
            - stumps: List of decision stump dictionaries.
            - alphas: List of weak learner weights.
    """
    n_samples = X.shape[0]
    sample_weights = initialise_weights(n_samples)
    stumps = []
    alphas = []

    for _ in range(n_weak_learners):
        stump, error, predictions = find_best_stump(X, y, sample_weights)
        alpha = compute_alpha(error)
        sample_weights = update_weights(sample_weights, alpha, y, predictions)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

## 8. Prediction
The following function makes predictions on input data $X$ using a single decision stump.

In [9]:
def stump_predict(X: NDArray[np.float64], stump: Dict[str, Any]) -> NDArray[np.int8]:
    """
    Predict labels for X using a given decision stump.

    Parameters:
        X: Feature matrix of shape (n_samples, n_features).
        stump: Decision stump parameters.

    Returns:
        Predicted labels (-1 or 1) of shape (n_samples,).
    """
    feature_values = X[:, stump["feature_index"]]
    predictions = np.ones(X.shape[0])
    if stump["polarity"] == 1:
        predictions[feature_values < stump["threshold"]] = -1
    else:
        predictions[feature_values > stump["threshold"]] = -1
    return predictions

Then the `predict()` function combines the predictions from all decision stumps in the AdaBoost ensemble using their respective weights $\alpha$ to produce the final prediction for each sample.
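
In symbols, using the notation from the previous sections, this weighted vote can be written as below. $H(x)$ is a label introduced here for the ensemble prediction, and $M$ is the number of weak learners:

\begin{align*}
    H(x) = \text{sign} \left( \sum^{M}_{m=1} \alpha_m h_m(x) \right)
\end{align*}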

In [10]:
def predict(X: NDArray[np.float64], stumps: List[Dict[str, Any]],
            alphas: List[float]) -> NDArray[np.int8]:
    """
    Aggregate predictions from all stumps using their alphas.

    Parameters:
        X: Feature matrix of shape (n_samples, n_features).
        stumps: List of decision stump dictionaries.
        alphas: List of weak learner weights.

    Returns:
        Final predicted labels (-1 or 1) of shape (n_samples,).
    """
    final_pred = np.zeros(X.shape[0])
    for stump, alpha in zip(stumps, alphas):
        pred = stump_predict(X, stump)
        final_pred += alpha * pred
    return np.sign(final_pred)
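
One edge case worth knowing: `np.sign` returns 0 when the weighted votes cancel exactly, which is neither of the two labels. The sketch below shows one possible convention, assuming ties should go to the positive class. That choice, and the vote totals used, are assumptions for illustration rather than part of the function above.

```python
import numpy as np

# Hypothetical weighted vote totals for three samples; the middle one is a tie
final_pred = np.array([0.7, 0.0, -1.2])

labels = np.sign(final_pred)                      # the tie yields 0
labels_no_tie = np.where(final_pred >= 0, 1, -1)  # the tie is mapped to +1

print(labels)
print(labels_no_tie)
```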

In [11]:
# Train AdaBoost
n_weak_learners = 10
stumps, alphas = adaboost_train(X, y, n_weak_learners)

# Predict
y_pred = predict(X, stumps, alphas)
accuracy = np.mean(y_pred == y)
print(f"Accuracy (Training): {accuracy:.4f}")

Accuracy (Training): 0.9736


## 9. Encapsulation

In [None]:
class DecisionStump:
    """
    A simple decision stump (one-level decision tree) used as a weak learner.

    Attributes:
        polarity: The direction of the inequality for the split.
        feature_index: The index of the feature used for splitting.
        threshold: The threshold value for the split.
        alpha: The weight of this stump in the ensemble.
    """

    def __init__(self) -> None:
        """
        Initialise the decision stump with default values.
        """
        self.polarity: int = 1
        self.feature_index: Optional[int] = None
        self.threshold: Optional[float] = None
        self.alpha: Optional[float] = None

    def predict(self, X: NDArray[np.float64]) -> NDArray[np.int8]:
        """
        Predicts class labels for samples in X using the decision stump.

        Args:
            X: Feature matrix of shape (n_samples, n_features).

        Returns:
            Predicted class labels (+1 or -1) of shape (n_samples,).
        """
        n_samples = X.shape[0]
        feature_column = X[:, self.feature_index]
        predictions = np.ones(n_samples)
        if self.polarity == 1:
            predictions[feature_column < self.threshold] = -1
        else:
            predictions[feature_column > self.threshold] = -1
        return predictions


class CustomAdaBoost:
    """
    AdaBoost ensemble classifier using decision stumps.

    Attributes:
        n_weak_learners: Number of weak learners (decision stumps) to use.
        classifiers: List of fitted decision stumps.
    """

    def __init__(self, n_weak_learners: int = 5) -> None:
        """
        Initialise the AdaBoost classifier.

        Args:
            n_weak_learners: Number of weak learners (decision stumps) to use. Defaults to 5.
        """
        self.n_weak_learners = n_weak_learners
        self.classifiers = []

    def fit(self, X: NDArray[np.float64], y: NDArray[np.int8]) -> None:
        """
        Fit the AdaBoost classifier on the training data.

        Args:
            X: Training feature matrix of shape (n_samples, n_features).
            y: Training labels (+1 or -1) of shape (n_samples,).
        """
        n_samples, n_features = X.shape
        # Initialise weights to 1/N
        sample_weights = np.full(n_samples, 1 / n_samples)
        self.classifiers = []

        for _ in range(self.n_weak_learners):
            stump = DecisionStump()
            min_error = float('inf')

            # Find the best decision stump
            for feature_index in range(n_features):
                feature_column = X[:, feature_index]
                thresholds = np.unique(feature_column)
                for threshold in thresholds:
                    polarity = 1
                    predictions = np.ones(n_samples)
                    predictions[feature_column < threshold] = -1

                    # Calculate weighted error
                    error = np.sum(sample_weights[y != predictions])

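                    # If error > 0.5, flip polarity.
                    # Note: 1 - error is the error of the exact complement of this
                    # stump. DecisionStump.predict uses `>` for polarity -1, so
                    # samples equal to the threshold are classified differently
                    # and the stored error is only an approximation.
                    if error > 0.5:
                        error = 1 - error
                        polarity = -1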

                    if error < min_error:
                        stump.polarity = polarity
                        stump.threshold = threshold
                        stump.feature_index = feature_index
                        min_error = error

            # Compute alpha (learner weight)
            c = 1e-10  # to avoid division by zero
            stump.alpha = 0.5 * np.log((1.0 - min_error + c) / (min_error + c))

            # Update weights
            predictions = stump.predict(X)
            sample_weights *= np.exp(-stump.alpha * y * predictions)
            sample_weights /= np.sum(sample_weights)  # Normalise

            self.classifiers.append(stump)

    def predict(self, X: NDArray[np.float64]) -> NDArray[np.int8]:
        """
        Predict class labels for samples in X using the trained AdaBoost ensemble.

        Args:
            X: Feature matrix of shape (n_samples, n_features).

        Returns:
            Predicted class labels (+1 or -1) of shape (n_samples,).
        """
        weighted_preds = [clf.alpha *
                          clf.predict(X) for clf in self.classifiers]
        y_pred = np.sum(weighted_preds, axis=0)
        return np.sign(y_pred)

In [13]:
# Split into train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Train AdaBoost
model = CustomAdaBoost(n_weak_learners=10)
model.fit(X_train, y_train)

# Predict and evaluate
y_pred = model.predict(X_test)
accuracy = np.mean(y_test == y_pred)
print(f'Test Accuracy (Custom): {accuracy:.4f}')

Test Accuracy (Custom): 0.9912


## 10. Comparison with Scikit-Learn

In [14]:
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Initialise AdaBoost with decision stumps
base_estimator = DecisionTreeClassifier(max_depth=1)
ada = AdaBoostClassifier(estimator=base_estimator,
                         n_estimators=10, random_state=42)
ada.fit(X_train, y_train)

# Predict and evaluate
y_pred = ada.predict(X_test)
print(f'Test Accuracy (SK): {accuracy_score(y_test, y_pred):.4f}')

Test Accuracy (SK): 0.9649


## 11. References
1. Data Science Wizards. (2023). *Understanding the AdaBoost Algorithm.* <br>
https://medium.com/@datasciencewizards/understanding-the-adaboost-algorithm-2e9344d83d9b

1. GeeksforGeeks. (2025). *Boosting in Machine Learning | Boosting and AdaBoost.* <br>
https://www.geeksforgeeks.org/machine-learning/boosting-in-machine-learning-boosting-and-adaboost/

1. GeeksforGeeks. (2025). *Implementing the AdaBoost Algorithm From Scratch.* <br>
https://www.geeksforgeeks.org/machine-learning/implementing-the-adaboost-algorithm-from-scratch/

1. Patrick Loeber. (2020). *AdaBoost in Python - Machine Learning From Scratch 13 - Python Tutorial.* <br>
https://youtu.be/wF5t4Mmv5us

1. scikit-learn. (n.d.). *AdaBoostClassifier — scikit-learn API Reference.* <br>
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html

1. scikit-learn. (n.d.). *1.11.7. AdaBoost — scikit-learn User Guide.* <br>
https://scikit-learn.org/stable/modules/ensemble.html#adaboost

1. StatQuest with Josh Starmer. (2019). *AdaBoost, Clearly Explained.* <br>
https://youtu.be/LsK-xG1cLYA