### Introduction
AdaBoost, short for Adaptive Boosting, is a machine learning algorithm that is commonly used for classification tasks. It belongs to the family of ensemble learning methods, which combine the predictions of multiple individual models (called weak learners) to create a more accurate and robust final prediction.

### Algorithm
The basic idea behind AdaBoost is to train a series of weak learners sequentially, each on the same training data but with different example weights. After each iteration, the algorithm increases the weights of the examples that the current weak learner misclassified and decreases the weights of those it classified correctly, so the next weak learner focuses on correcting the mistakes of its predecessors. Each weak learner also receives a coefficient $\alpha_t$ that grows as its weighted error shrinks, and the final prediction is a weighted vote of all weak learners.

#### Training AdaBoost
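0. Initialize the sample weights uniformly: $w_{1,i} = 1 / N$ for $i = 1, \dots, N$, where $N$ is the number of training samples. Labels are assumed to be $y_i \in \{-1, +1\}$.
1. Train the weak learner $h_t$ on the training data using the current sample weights $w_{t,i}$ (time step $t$).
2. Calculate the weighted error $\epsilon_t = \sum_{i:\, h_t(x_i) \ne y_i} w_{t,i}$.
3. Compute the coefficient $\alpha_t$ of $h_t$: $\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$.
4. Update the weights: $w_{t+1,i} = w_{t,i} \cdot e^{-\alpha_t h_t(x_i) y_i}$. Since $h_t(x_i) y_i = -1$ for a misclassified example and $+1$ for a correct one, misclassified examples gain weight.
5. Normalize the weights so that they sum to 1, $w_{t+1,i} \leftarrow w_{t+1,i} / \sum_j w_{t+1,j}$, then repeat from step 1 for the next time step.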
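
To make the steps above concrete, here is a minimal sketch of a single boosting round on toy data. The five labels and the predictions of the weak learner are made up for illustration; no real classifier is trained. Note the negative exponent in the weight update, as in the `ada_boost` implementation: it shrinks the weights of correctly classified examples and grows the weights of misclassified ones.

```python
import numpy as np

# Toy data (hypothetical): five labels in {-1, +1} and the predictions of
# one weak learner that gets only the last example wrong.
y = np.array([1, 1, -1, -1, 1])
h = np.array([1, 1, -1, -1, -1])

# Step 0: uniform initial weights.
w = np.full(len(y), 1 / len(y))

# Step 2: weighted error is the total weight of the misclassified examples.
eps = np.sum(w[h != y])

# Step 3: coefficient (vote strength) of this weak learner.
alpha = 0.5 * np.log((1 - eps) / eps)

# Step 4: h * y is +1 for correct examples and -1 for misclassified ones,
# so correct examples are scaled down and misclassified ones scaled up.
w_new = w * np.exp(-alpha * h * y)

# Step 5: renormalize so the weights sum to 1.
w_new /= w_new.sum()

print(f"error = {eps:.3f}, alpha = {alpha:.3f}")
print("new weights:", w_new)
```

With one mistake out of five equally weighted examples, the error is 0.2 and $\alpha = \frac{1}{2}\ln 4 = \ln 2$. After normalization the single misclassified example carries weight 0.5 and each correct example 0.125, so the next weak learner sees the previous mistakes and the previous successes as equally heavy.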

#### Predict
Given an AdaBoost ensemble of $K$ trained weak learners, the final prediction is the sign of their $\alpha$-weighted vote:
$H(x) = \operatorname{sign}\left(\sum_{t=1}^{K} \alpha_t h_t(x)\right)$
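
As a small illustration of the weighted vote, the sketch below combines three weak learners on four inputs. The coefficients and predictions are invented for the example and do not come from a trained model.

```python
import numpy as np

# Hypothetical coefficients of three trained weak learners.
alphas = np.array([0.9, 0.4, 0.3])

# Each row holds one weak learner's {-1, +1} predictions on four inputs.
preds = np.array([
    [ 1, -1,  1, -1],
    [-1, -1,  1,  1],
    [-1,  1,  1,  1],
])

# Weighted sum over the learners for every input, then take the sign.
scores = alphas @ preds
H = np.sign(scores)

print("scores:", scores)
print("H(x):  ", H)
```

On the first input the strongest learner votes $+1$ and the two weaker ones vote $-1$; because $0.9 > 0.4 + 0.3$, the ensemble still predicts $+1$. A plain majority vote would have predicted $-1$ there.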

### Implementation

In [1]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_hastie_10_2

In [2]:
X, Y = make_hastie_10_2()

In [3]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.5)

In [5]:
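def ada_boost(X_train, Y_train, n_estimators: int):
    """Train AdaBoost on labels in {-1, +1}; return (weak learners, alphas)."""
    n_samples = len(Y_train)
    # Step 0: start from uniform sample weights.
    weights = np.full(n_samples, 1.0 / n_samples)
    alphas = np.zeros(n_estimators)
    clfs = []

    for i in range(n_estimators):
        # Step 1: fit a shallow tree on the current sample weights.
        wl = DecisionTreeClassifier(max_depth=2, max_features="log2")
        wl.fit(X_train, Y_train, sample_weight=weights)

        # Step 2: weighted error = total weight of the misclassified samples.
        Y_pred = wl.predict(X_train)
        error = np.sum(weights[Y_pred != Y_train])

        # A weak learner no better than chance cannot help, so stop early.
        if error >= 0.5:
            print(f"No improvement, stop at iteration [{i}]!")
            return clfs, alphas[:i]

        # Guard against division by zero when a weak learner is perfect.
        error = max(error, 1e-10)

        # Steps 3-5: compute alpha, reweight the samples, and renormalize.
        alphas[i] = 0.5 * np.log((1 - error) / error)
        weights *= np.exp(-alphas[i] * Y_pred * Y_train)
        weights /= np.sum(weights)
        clfs.append(wl)

    return clfs, alphas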

In [6]:
def ada_predict(clfs: list, alphas, X):
    """Return the sign of the alpha-weighted vote of all weak learners."""
    # One row of weighted {-1, +1} votes per weak learner.
    votes = np.array([alpha * h.predict(X) for alpha, h in zip(alphas, clfs)])
    return np.sign(votes.sum(axis=0))


In [7]:
clfs, alphas = ada_boost(X_train, Y_train, 50)

Y_pred = ada_predict(clfs, alphas, X_train)
print(f"Accuracy on train: {np.sum(Y_pred == Y_train) / len(Y_train)}")

Y_pred = ada_predict(clfs, alphas, X_test)
print(f"Accuracy on test: {np.sum(Y_pred == Y_test) / len(Y_test)}")

Accuracy on train: 0.8703333333333333
Accuracy on test: 0.8446666666666667


In [8]:
clfs, alphas = ada_boost(X_train, Y_train, 200)

Y_pred = ada_predict(clfs, alphas, X_train)
print(f"Accuracy on train: {np.sum(Y_pred == Y_train) / len(Y_train)}")

Y_pred = ada_predict(clfs, alphas, X_test)
print(f"Accuracy on test: {np.sum(Y_pred == Y_test) / len(Y_test)}")

Accuracy on train: 0.952
Accuracy on test: 0.9233333333333333


In [9]:
clfs, alphas = ada_boost(X_train, Y_train, 400)

Y_pred = ada_predict(clfs, alphas, X_train)
print(f"Accuracy on train: {np.sum(Y_pred == Y_train) / len(Y_train)}")

Y_pred = ada_predict(clfs, alphas, X_test)
print(f"Accuracy on test: {np.sum(Y_pred == Y_test) / len(Y_test)}")

Accuracy on train: 0.974
Accuracy on test: 0.9398333333333333
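
The cells above retrain the whole ensemble for each value of `n_estimators`. Because the weak learners are added one at a time, the first $k$ learners of a larger ensemble already form a valid ensemble of size $k$, so accuracy can also be tracked from a single training run. The helper below is a sketch of that idea; `staged_accuracy` is a hypothetical name, not part of the notebook or of scikit-learn, and the toy inputs are made up.

```python
import numpy as np

def staged_accuracy(alphas, preds, y):
    """Accuracy of the first k weak learners, for k = 1..K.

    alphas: shape (K,)   coefficients of the weak learners.
    preds:  shape (K, N) weak-learner predictions in {-1, +1}.
    y:      shape (N,)   true labels in {-1, +1}.
    """
    alphas = np.asarray(alphas, dtype=float)
    preds = np.asarray(preds, dtype=float)
    # Row k holds the running weighted vote after k + 1 weak learners.
    running = np.cumsum(alphas[:, None] * preds, axis=0)
    return (np.sign(running) == np.asarray(y)).mean(axis=1)

# Toy inputs (hypothetical): three weak learners, four examples,
# each learner wrong on a different example.
toy_alphas = [0.5, 0.4, 0.3]
toy_preds = [
    [ 1, -1,  1, -1],
    [ 1, -1, -1,  1],
    [-1, -1,  1,  1],
]
toy_y = [1, -1, 1, 1]

acc = staged_accuracy(toy_alphas, toy_preds, toy_y)
print(acc)
```

On the toy inputs the accuracy is 0.75 after one and two learners and 1.0 after three. With the notebook's objects, the prediction matrix could be built as `np.array([h.predict(X_test) for h in clfs])` and passed together with `alphas` and `Y_test`.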
