## AdaBoost
Adaptive Boosting, implemented with classification trees as weak learners.

**Mathematical Background:**  
The goal is to estimate the parameters of the weak models, together with the weights that combine their decisions into a final prediction:  
  
<center style="margin: 20px;">$F(x) = \sum_{k=1}^{N}a_k\phi(x;\theta_k)$</center>

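where $N$ is the number of weak classifiers and $\phi(x;\theta)$ is a weak classifier. The parameters are found by minimizing the exponential cost $\sum_{i=1}^{n}\exp(-y_iF(x_i))$ over the $n$ training samples $(x_i, y_i)$, with labels $y_i \in \{-1, +1\}$, with respect to $a_k$ and $\theta_k$.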

Since this joint problem is generally hard, we optimize greedily, one term at a time: at step $m$ we fit only the new term of the partial sum $F_m(x)$, keeping the previously fitted terms fixed:

<center style="margin: 20px;">$F_{m}(x) = F_{m-1}(x) + a_m\phi(x;\theta_m)$</center>

The key point is that, when optimizing $\phi(x;\theta_m)$ with respect to $\theta_m$, each sample $x_i$ is weighted according to how well the classifier from the previous step handles it: samples that $F_{m-1}$ misclassifies receive larger weights. The weights are $w_i = \exp(-y_iF_{m-1}(x_i))$.  
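
The form of these weights follows directly from the cost function. As a short derivation sketch, assuming labels $y_i \in \{-1, +1\}$ and weak classifiers that output $\pm 1$ (assumptions that are not stated explicitly above), the cost at step $m$ factors as:

<center style="margin: 20px;">$\sum_{i}\exp\big(-y_i[F_{m-1}(x_i) + a\,\phi(x_i;\theta)]\big) = \sum_{i} w_i \exp\big(-y_i a\,\phi(x_i;\theta)\big) = e^{-a}\sum_{i:\,\phi(x_i;\theta) = y_i} w_i + e^{a}\sum_{i:\,\phi(x_i;\theta) \neq y_i} w_i$</center>

For any $a > 0$, this is minimized over $\theta$ by minimizing the weighted error $\epsilon = \sum_{i:\,\phi(x_i;\theta) \neq y_i} w_i \,/\, \sum_{i} w_i$. Setting the derivative with respect to $a$ to zero then gives:

<center style="margin: 20px;">$a_m = \frac{1}{2}\ln\frac{1-\epsilon_m}{\epsilon_m}$</center>

where $\epsilon_m$ is the weighted error of the fitted classifier $\phi(x;\theta_m)$. The weights for the next step follow from the multiplicative update $w_i \leftarrow w_i \exp(-y_i a_m \phi(x_i;\theta_m))$.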

Given these weights, the current classifier's objective is to minimize the weighted classification error. For example, a classification tree can serve as the weak learner by using the sample weights when computing class probabilities.
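
The stagewise procedure described above can be sketched in a few lines. This is a minimal illustration, not the `fromscratch` implementation used below: it swaps in scikit-learn's `DecisionTreeClassifier` (which accepts `sample_weight` in `fit`) as the weak learner, assumes labels in $\{-1, +1\}$, and uses the standard discrete AdaBoost coefficient $a_m = \frac{1}{2}\ln\frac{1-\epsilon_m}{\epsilon_m}$. The function names `fit_adaboost_sketch` and `predict_adaboost_sketch` and their parameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier


def fit_adaboost_sketch(X, y, n_rounds=15, max_depth=1):
    """Discrete AdaBoost sketch; y must take values in {-1, +1}."""
    n_samples = X.shape[0]
    # F_0 = 0, so w_i = exp(0) = 1 for every sample (normalized here).
    w = np.full(n_samples, 1.0 / n_samples)
    learners, alphas = [], []
    for _ in range(n_rounds):
        # Fit the weak learner on the weighted samples.
        clf = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
        clf.fit(X, y, sample_weight=w)
        pred = clf.predict(X)
        # Weighted error, clipped to keep the logarithm finite.
        eps = np.clip(np.sum(w[pred != y]) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        learners.append(clf)
        alphas.append(alpha)
        # Multiplicative update: w_i <- w_i * exp(-y_i * a_m * phi(x_i)).
        w = w * np.exp(-alpha * y * pred)
        w = w / np.sum(w)
    return learners, alphas


def predict_adaboost_sketch(learners, alphas, X):
    """Sign of the weighted vote F(x) = sum_k a_k * phi(x; theta_k)."""
    F = sum(a * clf.predict(X) for clf, a in zip(learners, alphas))
    return np.where(F >= 0, 1.0, -1.0)


# Usage on a two-class dataset, with labels remapped to {-1, +1}.
data = load_breast_cancer()
X_demo = data["data"]
y_demo = np.where(data["target"] == 0, -1.0, 1.0)
learners, alphas = fit_adaboost_sketch(X_demo, y_demo, n_rounds=15)
y_hat = predict_adaboost_sketch(learners, alphas, X_demo)
print("training accuracy:", np.mean(y_hat == y_demo))
```

Normalizing the weights after each round does not change which weak learner is selected; it only keeps the numbers in a stable range. The tree depth is a free choice in this sketch: depth 1 (a stump) is the classic weak learner, while deeper trees fit each round more aggressively.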

In [1]:
%%html
<style>.container {width: 98%}</style>

In [2]:
%load_ext autoreload
%autoreload 2

In [12]:
from __future__ import annotations
from functools import reduce

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_score
from sklearn.base import BaseEstimator, ClassifierMixin

import sys
sys.path.insert(0, "../")
from fromscratch.supervised.adaboost import AdaBoost
from fromscratch.supervised.classification_tree import ClassificationTree

### Use a simple dataset with 2 classes

In [7]:
data = datasets.load_breast_cancer()
X = data["data"]
y = data["target"].astype(np.float64)
y[y == 0] = -1

#### (Over)fit the model

In [8]:
ada = AdaBoost()
ada.fit(X, y, max_iters=15)

AdaBoost()

In [9]:
y_pred = ada.predict(X)
confusion_matrix(y, y_pred)

array([[212,   0],
       [  0, 357]], dtype=int64)

#### Cross-validated score

In [10]:
np.mean(cross_val_score(ada, X, y, cv=5))

0.9578481602235677

#### Compare with a simple decision tree
With the same parameters as the weak learner of AdaBoost

In [13]:
tree = ClassificationTree(max_depth=4, min_leaf_samples=1, min_delta_impurity=0.0)

In [14]:
np.mean(cross_val_score(tree, X, y, cv=5))

0.9192206179164726

The boosted model performs better: its mean 5-fold cross-validated score is about 0.958, compared with about 0.919 for the single tree.