# AdaBoost Algorithm

- Adaptive Boosting
- Ensemble Learning algorithm
- Combines many weak learners into a strong learner by reweighting training samples to focus on hard examples.
- Learners are trained sequentially, and each one pays more attention to examples the previous learners got wrong.
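- Weak learner: a model that performs slightly better than random guessing.
- The final model combines all learners through a weighted vote.
- Greedy error correction: each round fits the best learner for the current sample weights, and earlier learners are never revisited.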

$$
\begin{align*}
\text{Training data} &= \{(x_1,y_1),\dots,(x_n,y_n)\}, \quad y_i \in \{-1,+1\}\\
F(x) &= \sum_{t=1}^{T} \alpha_t h_t(x)\\
\text{Final prediction} &= \text{sign}(F(x))
\end{align*}
$$

- $h_t(x)$ is the weak learner
- $\alpha_t$ is the weight (vote strength) of learner $h_t$
- $T$ is the number of rounds
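
As a small illustration of the weighted vote $F(x) = \sum_t \alpha_t h_t(x)$, here is a minimal sketch. The learner outputs `H` and the weights `alphas` are made-up values chosen for the example, not the result of any training.

```python
import numpy as np

# Hypothetical outputs of T = 3 weak learners on 4 points (one row per learner).
H = np.array([[ 1,  1, -1, -1],
              [ 1, -1, -1,  1],
              [-1,  1,  1,  1]])
alphas = np.array([0.9, 0.4, 0.3])  # assumed learner weights

F = alphas @ H       # weighted sum of votes for each point
pred = np.sign(F)    # final prediction

print("F    =", F)
print("pred =", pred)
```

Because the first learner's weight (0.9) exceeds the other two combined (0.7), the final prediction follows the first learner on every point, even where the other two outvote it by count.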

Algorithm

- Initialise the sample weights uniformly: $w_i^{(1)} = \frac{1}{n}$ for every $i$.
- In each round $t = 1,\dots,T$, train $h_t$ to minimize the weighted error

$$
\epsilon_t = \sum_{i=1}^{n} w_i^{(t)} \, \mathbf{1}\left[h_t(x_i) \neq y_i\right]
$$

- Compute the learner weight

$$
\alpha_t = \frac{1}{2} \ln \left( \frac{1-\epsilon_t}{\epsilon_t}\right)
$$

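- The lower the error, the higher the learner's influence. If $\epsilon_t = 0.5$ the learner is no better than random and $\alpha_t = 0$. If $\epsilon_t > 0.5$ the learner is worse than random ($\alpha_t$ becomes negative), so it is discarded.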
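
To make the relationship between error and influence concrete, the sketch below evaluates the learner weight formula at a few error rates. The helper name `learner_weight` and the sample error values are chosen for this illustration only.

```python
import math

def learner_weight(eps):
    """alpha = 0.5 * ln((1 - eps) / eps), valid for 0 < eps < 1."""
    return 0.5 * math.log((1 - eps) / eps)

for eps in (0.1, 0.3, 0.5, 0.7):
    print(f"eps = {eps:.1f}  alpha = {learner_weight(eps):+.4f}")
```

The weight grows as the error shrinks, is zero at an error of 0.5, and turns negative above 0.5.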

- We update the sample weights

$$
w_i^{(t+1)} = w_i^{(t)} \cdot e^{-\alpha_t y_i h_t(x_i)}
$$

- If the prediction is correct ($y_i h_t(x_i) = +1$), the weight decreases.
- If the prediction is incorrect ($y_i h_t(x_i) = -1$), the weight increases.
- Normalize the weights afterwards so they sum to 1.
- Repeat this process $T$ times.
- The final classifier is:

$$
H(x) = \text{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)
$$
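
The steps above can be traced by hand for a single round. This is a minimal sketch with made-up numbers: five samples, uniform weights, and an assumed weak learner `h` that gets only the last sample wrong.

```python
import numpy as np

# Hypothetical round: 5 samples, the weak learner misclassifies only the last one.
y = np.array([1, 1, -1, -1, 1])      # true labels
h = np.array([1, 1, -1, -1, -1])     # assumed predictions of h_t
w = np.full(5, 1 / 5)                # uniform initial weights

eps = np.sum(w * (h != y))               # weighted error
alpha = 0.5 * np.log((1 - eps) / eps)    # learner weight
w_new = w * np.exp(-alpha * y * h)       # shrink correct samples, grow mistakes
w_new /= w_new.sum()                     # normalize so the weights sum to 1

print("eps   =", eps)
print("alpha =", alpha)
print("w_new =", w_new)
```

Here the error is 0.2, the learner weight is $\ln 2 \approx 0.693$, and after normalization the single misclassified sample carries weight 0.5 while the four correct ones carry 0.125 each. The same learner would therefore have weighted error 0.5 under the new weights, which pushes the next round toward a different learner.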

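- In AdaBoost the most common weak learners are decision stumps (one-level decision trees that split on a single feature and threshold). A stump has very low variance but high bias.

AdaBoost can be viewed as greedily minimizing the exponential loss: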

$$
\mathcal{L} = \sum_{i=1}^{n} e^{-y_i F(x_i)}
$$
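
A short derivation sketch of how the learner weight formula follows from this loss. It assumes the round-$t$ weights are normalized and proportional to $e^{-y_i F_{t-1}(x_i)}$, where $F_{t-1}$ (notation introduced here) is the weighted sum of the first $t-1$ learners. Adding $\alpha h_t$ then scales the loss by the factor

$$
\sum_{i=1}^{n} w_i^{(t)} e^{-\alpha y_i h_t(x_i)} = (1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha}
$$

because $y_i h_t(x_i) = +1$ on correctly classified samples (total weight $1-\epsilon_t$) and $-1$ on the rest (total weight $\epsilon_t$). Setting the derivative with respect to $\alpha$ to zero gives

$$
-(1-\epsilon_t)\, e^{-\alpha} + \epsilon_t\, e^{\alpha} = 0 \;\Rightarrow\; e^{2\alpha} = \frac{1-\epsilon_t}{\epsilon_t} \;\Rightarrow\; \alpha_t = \frac{1}{2} \ln \left( \frac{1-\epsilon_t}{\epsilon_t}\right)
$$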

```python
import numpy as np

class DecisionStump:
    """Simple decision stump used as a weak learner for AdaBoost"""
    def __init__(self):
        self.feature_index = None
        self.threshold = None
        self.polarity = 1  # direction of inequality
        self.alpha = None  # weight assigned by AdaBoost

    def predict(self, X):
        n_samples = X.shape[0]
        predictions = np.ones(n_samples)
        if self.polarity == 1:
            predictions[X[:, self.feature_index] < self.threshold] = -1
        else:
            predictions[X[:, self.feature_index] > self.threshold] = -1
        return predictions

class AdaBoost:
    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators
        self.stumps = []

    def fit(self, X, y):
        n_samples, n_features = X.shape
        # initialize weights uniformly
        w = np.ones(n_samples) / n_samples
        self.stumps = []

        for t in range(self.n_estimators):
            stump = DecisionStump()
            min_error = float('inf')

            # find best stump for weighted data:
            # exhaustive search over features, thresholds, and polarities
            for feature in range(n_features):
                feature_values = np.unique(X[:, feature])
                for threshold in feature_values:
                    for polarity in [1, -1]:
                        predictions = np.ones(n_samples)
                        if polarity == 1:
                            predictions[X[:, feature] < threshold] = -1
                        else:
                            predictions[X[:, feature] > threshold] = -1
                        error = np.sum(w * (predictions != y))
                        if error < min_error:
                            min_error = error
                            stump.feature_index = feature
                            stump.threshold = threshold
                            stump.polarity = polarity

            # avoid divide by zero
            EPS = 1e-10
            stump.alpha = 0.5 * np.log((1 - min_error + EPS) / (min_error + EPS))

            # update sample weights
            predictions = stump.predict(X)
            w *= np.exp(-stump.alpha * y * predictions)
            w /= np.sum(w)  # normalize

            self.stumps.append(stump)

    def predict(self, X):
        n_samples = X.shape[0]
        F = np.zeros(n_samples)
        for stump in self.stumps:
            F += stump.alpha * stump.predict(X)
        # note: np.sign returns 0 where F is exactly 0 (a tied vote)
        return np.sign(F)

# ---------------------------
# Example run
# ---------------------------
if __name__ == "__main__":
    # small dataset (XOR-like)
    X = np.array([
        [0, 0],
        [0, 1],
        [1, 0],
        [1, 1]
    ], dtype=float)

    y = np.array([-1, 1, 1, -1], dtype=int)

    model = AdaBoost(n_estimators=50)
    model.fit(X, y)
    preds = model.predict(X)

    print("Predictions:", preds)
    print("Actual labels:", y)
```


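Output:

```text
Predictions: [0. 0. 0. 0.]
Actual labels: [-1  1  1 -1]
```

The all-zero predictions are expected on this dataset. XOR cannot be split by any single-feature threshold, so under uniform weights every stump has weighted error exactly 0.5. That gives $\alpha_t = 0$ in every round, the sample weights never change, and $F(x) = 0$ for every point, which `np.sign` maps to 0.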
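
For contrast with the XOR run above, here is a self-contained sketch on a 1D toy set where no single stump is perfect but a few boosted stumps are. The dataset, the helper names (`fit_stump`, `adaboost_1d`, `predict_1d`), the midpoint thresholds, and the tie tolerance are all choices made for this illustration. The polarity convention matches the stump code above (polarity 1 predicts -1 below the threshold).

```python
import numpy as np

def fit_stump(x, y, w):
    """Return (error, threshold, polarity) of the best 1D stump under weights w."""
    xs = np.unique(x)  # sorted unique values
    # candidate thresholds: below the minimum, midpoints, above the maximum
    thresholds = np.concatenate(([xs[0] - 0.5], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 0.5]))
    best = (np.inf, None, None)
    for thr in thresholds:
        for polarity in (1, -1):
            pred = np.where(x < thr, -polarity, polarity)
            err = np.sum(w * (pred != y))
            if err < best[0] - 1e-12:  # ties keep the first candidate found
                best = (err, thr, polarity)
    return best

def adaboost_1d(x, y, n_rounds):
    """Run n_rounds of AdaBoost; returns a list of (alpha, threshold, polarity)."""
    w = np.full(len(x), 1 / len(x))
    model = []
    for _ in range(n_rounds):
        err, thr, pol = fit_stump(x, y, w)
        alpha = 0.5 * np.log((1 - err) / err)  # assumes 0 < err < 1
        pred = np.where(x < thr, -pol, pol)
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        model.append((alpha, thr, pol))
    return model

def predict_1d(model, x):
    F = sum(a * np.where(x < thr, -pol, pol) for a, thr, pol in model)
    return np.sign(F)

x = np.arange(10, dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])

for T in (1, 2, 3):
    model = adaboost_1d(x, y, T)
    acc = np.mean(predict_1d(model, x) == y)
    print(T, "round(s): training accuracy =", acc)
```

On this data the best single stump misclassifies three of the ten points, and the weighted vote of three stumps classifies all ten correctly.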
