## AdaBoost
Adaptive Boosting, implemented with classification trees as weak learners.

**Mathematical Background:**  
The goal is to estimate the parameters of the weak models and the weights that combine the models' decisions into a final prediction:  
  
<center style="margin: 20px;">$F(x) = \sum_{k=1}^{N}a_k\phi(x;\theta_k)$</center>

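where $N$ is the number of weak classifiers and $\phi(x;\theta)$ is a weak classifier. The parameters are found by minimizing the exponential cost function $\sum_{i=1}^{n}\exp(-y_iF(x_i))$ over the $n$ training samples $(x_i, y_i)$, with $y_i \in \{-1, +1\}$, with respect to $a_k$ and $\theta_k$.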
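As a rough illustration of the two formulas above, the sketch below evaluates $F(x)$ and the exponential cost for a toy ensemble. The threshold "stumps", the coefficients `alphas`, and the data are hypothetical values chosen for illustration; they are not part of the implementation that follows.

```python
import numpy as np

# Hypothetical 1-D toy data with labels in {-1, +1}
X = np.array([0.5, 1.5, 2.5, 3.5])
y = np.array([-1.0, -1.0, 1.0, 1.0])


def stump(x, threshold):
    """Weak classifier phi(x; theta): +1 if x > threshold, else -1."""
    return np.where(x > threshold, 1.0, -1.0)


thresholds = [1.0, 2.0, 3.0]  # theta_k (illustrative)
alphas = [0.3, 0.8, 0.3]      # a_k (illustrative)

# F(x) = sum_k a_k * phi(x; theta_k)
F = sum(a * stump(X, t) for a, t in zip(alphas, thresholds))

# Exponential cost: sum_i exp(-y_i * F(x_i))
cost = np.exp(-y * F).sum()

print(F)           # ensemble scores
print(np.sign(F))  # final prediction is the sign of F
print(cost)
```

Every sample here has $y_iF(x_i) > 0$, so each term of the cost is below 1; a misclassified sample would contribute a term above 1.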

Since jointly optimizing all terms is generally hard, we proceed stagewise: at step $m$ we optimize only the newest term of the partial sum $F_m(x)$, keeping the previously fitted terms fixed:

<center style="margin: 20px;">$F_{m}(x) = F_{m-1}(x) + a_m\phi(x;\theta_m)$</center>

The key point is that when optimizing $\phi(x;\theta_m)$ with respect to $\theta_m$, each sample $x_i$ is weighted according to how well the classifier from the previous step classifies it. The weights are $w_i = \exp(-y_iF_{m-1}(x_i))$.  
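To make the weighting concrete, the sketch below computes $w_i = \exp(-y_iF_{m-1}(x_i))$ for a few samples. The scores in `F_prev` are made up for illustration. Samples that the previous partial sum misclassifies ($y_iF_{m-1}(x_i) < 0$) receive weights above 1, while correctly classified samples receive weights below 1.

```python
import numpy as np

y = np.array([1.0, 1.0, -1.0, -1.0])
F_prev = np.array([1.2, -0.4, -0.9, 0.3])  # hypothetical F_{m-1}(x_i)

# Unnormalized sample weights for step m
w = np.exp(-y * F_prev)

# A sample is misclassified when its label and score disagree in sign
misclassified = y * F_prev < 0

# Normalizing does not change the relative importance of the samples
w_norm = w / w.sum()

print(misclassified)
print(w_norm)
```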

Given these weights, the current classifier's objective is to minimize the weighted classification error. For example, a classification tree can be used as a weak learner by using the sample weights to calculate class probabilities.
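One way a tree might use the weights is sketched below, under the assumption that a node's class probabilities are the normalized per-class sums of the sample weights. The helper `weighted_class_probs` is hypothetical, and the `ClassificationTree` used later may differ in detail.

```python
import numpy as np


def weighted_class_probs(y, weights, classes=(-1.0, 1.0)):
    """Class probabilities in a node as normalized per-class weight sums."""
    totals = np.array([weights[y == c].sum() for c in classes])
    return totals / totals.sum()


y_leaf = np.array([1.0, 1.0, 1.0, -1.0])
uniform = np.full(4, 0.25)
boosted = np.array([0.1, 0.1, 0.1, 0.7])  # the single -1 sample was up-weighted

p_uniform = weighted_class_probs(y_leaf, uniform)
p_boosted = weighted_class_probs(y_leaf, boosted)

print(p_uniform)  # approximately [0.25, 0.75]: the node predicts +1
print(p_boosted)  # approximately [0.7, 0.3]: the node now predicts -1
```

With uniform weights the majority class wins; once the minority sample carries most of the weight, the same node flips its prediction.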

In [2]:
%load_ext autoreload
%autoreload 2

In [121]:
from __future__ import annotations
from functools import reduce

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.base import BaseEstimator, ClassifierMixin

import sys
sys.path.insert(0, "../")
from classification_tree import ClassificationTree

In [122]:
class AdaBoost(BaseEstimator, ClassifierMixin):
    """
    Adaptive Boosting learner based on classification trees

    Methods
    -------
    fit(X, y)
        Iteratively adds and fits weak learners on data by updating sample weights
        according to classification error of previous model

    predict(X)
        Returns predictions for the input samples
    """

    def __init__(self):
        self.alphas = None  # weights used to combine weak learners
        self.learners = None
        self.p_list = None

    def fit(self, X: np.ndarray, y: np.ndarray, max_iters: int = 10, **kwargs) -> AdaBoost:
        """
        Iteratively adds and fits weak learners on data by updating sample weights
        according to classification error of previous model

        Parameters
        ----------
        X : numpy.ndarray
            Array of training samples with shape (n_samples, n_features)

        y : numpy.ndarray
            Array of training targets with shape (n_samples,)

        max_iters : int
            Number of boosting iterations
        """

        weights = np.ones(X.shape[0]) / X.shape[0]
        m = 0
        self.learners = [None] * max_iters  # list holding all weak learners
        self.alphas = [None] * max_iters  # list holding the weight (alpha) of each learner
        self.p_list = [None] * max_iters  # list holding the weighted classification error of each iteration

        while True:
            clf = ClassificationTree(max_depth=4, min_leaf_samples=1, min_delta_impurity=0.0)
            clf = clf.fit(X, y, sample_weights=weights, **kwargs)
            self.learners[m] = clf

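            y_pred = clf.predict(X)
            # Weighted error: total weight of misclassified samples (where y * y_pred == -1)
            P_m = (((1 - y * y_pred) > 0).astype(int) * weights).sum()
            # Clip to avoid log(0) or division by zero when the learner is perfect or always wrong
            P_m = np.clip(P_m, 1e-10, 1 - 1e-10)
            self.p_list[m] = P_m

            # Learner weight: closed-form minimizer of the weighted exponential loss
            a_m = (1 / 2) * np.log((1 - P_m) / P_m)
            self.alphas[m] = a_m

            # Up-weight misclassified samples, down-weight correct ones, then renormalize
            weights = weights * np.exp(-y * a_m * y_pred)
            weights = weights / weights.sum()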

            m += 1
            if m == max_iters:
                break

        return self

    def predict(self, X: np.ndarray):
        """
        Returns predictions for the input samples

        Parameters
        ----------
        X : numpy.ndarray
            Array of testing samples with shape (n_samples, n_features)
        """

        if self.alphas is None:
            raise ValueError("Model not fitted. Call fit() method first")

        # Weighted vote: sign of sum_k a_k * phi(x; theta_k)
        votes = np.array([a * clf.predict(X) for a, clf in zip(self.alphas, self.learners)])
        return np.sign(votes.sum(axis=0))

    def score(self, X, y, **kwargs):
        return accuracy_score(y, self.predict(X))
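
The coefficient `a_m` computed in `fit` is the closed-form minimizer of the weighted exponential loss for a fixed weak learner, $a_m = \frac{1}{2}\ln\frac{1-P_m}{P_m}$, where $P_m$ is the weighted error. The sketch below checks this numerically with a simple grid search. The labels, predictions, and weights are hypothetical, and the check is an illustration rather than part of the class.

```python
import numpy as np

y = np.array([1.0, 1.0, -1.0, -1.0, 1.0])
y_pred = np.array([1.0, -1.0, -1.0, -1.0, 1.0])  # hypothetical weak-learner output
w = np.array([0.1, 0.2, 0.3, 0.2, 0.2])          # normalized sample weights

# Weighted error: total weight of the misclassified samples
P_m = w[y_pred != y].sum()
a_closed = 0.5 * np.log((1 - P_m) / P_m)


def loss(a):
    """Weighted exponential loss as a function of the learner weight a."""
    return (w * np.exp(-y * a * y_pred)).sum()


# Brute-force minimizer on a fine grid, for comparison
grid = np.linspace(0.0, 2.0, 2001)
a_grid = grid[np.argmin([loss(a) for a in grid])]

# Same weight update as in fit()
w_new = w * np.exp(-y * a_closed * y_pred)
w_new = w_new / w_new.sum()
err_after = w_new[y_pred != y].sum()

print(a_closed, a_grid)
print(err_after)
```

After the update, the misclassified samples carry half of the total weight, so the learner just fitted is no better than chance on the reweighted data and the next learner has to focus on different samples.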

### Use a simple dataset with 2 classes

In [123]:
data = datasets.load_breast_cancer()
X = data["data"]
y = data["target"].astype(np.float64)
y[y == 0] = -1
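
The exponential cost $\exp(-y_iF(x_i))$ assumes labels in $\{-1, +1\}$, which is why class 0 is remapped to -1. With this encoding the product `y * y_pred` is +1 for a correct prediction and -1 for an error, which is what the weighted error in `fit` relies on. A small sketch with hypothetical labels and predictions:

```python
import numpy as np

labels01 = np.array([0, 1, 1, 0])           # hypothetical {0, 1} labels
y_pm = np.where(labels01 == 0, -1.0, 1.0)   # remap to {-1, +1}
y_hat = np.array([-1.0, 1.0, -1.0, -1.0])   # hypothetical predictions

agreement = y_pm * y_hat        # +1 where correct, -1 where wrong
errors = (1 - agreement) > 0    # same misclassification indicator as in fit()

print(y_pm)
print(errors)
```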

### (Over)fit the model

In [135]:
ada = AdaBoost()
ada.fit(X, y, max_iters=15)

AdaBoost()

In [140]:
y_pred = ada.predict(X)
confusion_matrix(y, y_pred)

array([[212,   0],
       [  0, 357]], dtype=int64)

In [133]:
np.mean(cross_val_score(ada, X, y, cv=5))

0.9578481602235677

#### Compare with a simple decision tree
The tree uses the same parameters as the weak learner of AdaBoost.

In [131]:
tree = ClassificationTree(max_depth=4, min_leaf_samples=1, min_delta_impurity=0.0)

In [132]:
np.mean(cross_val_score(tree, X, y, cv=5))

0.9192206179164726

The boosted model clearly outperforms the single tree: mean 5-fold cross-validation accuracy is about 0.958 versus about 0.919.