## AdaBoost

Ensemble methods:
<img src="images/ensemble_outline.jpg" width="500"/>

**WHAT**

AdaBoost \[[Freund and Schapire](#Firstpaper)\] is a method that produces a powerful ensemble by building a sequence of estimators.

**WHY**

Theoretical foundation: any weak learning algorithm (one that performs only slightly better than random guessing, i.e. with error at most $1/2-\gamma$ for some $\gamma > 0$) can be efficiently transformed, or boosted, into a strong learning algorithm \[[Schapire](#Schapire)\].
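
For AdaBoost in particular, this statement can be made quantitative. The bound below is a standard result from the boosting literature rather than something derived in these notes, and the symbols $\epsilon_t$, $\gamma_t$, $T$ and $H$ are introduced here only for illustration. If the weak learner in round $t$ has weighted error $\epsilon_t = 1/2 - \gamma_t$, the training error of the combined classifier $H$ after $T$ rounds satisfies

$$
\frac{1}{N}\sum_{i=1}^{N}\mathbb{1}[H(x_i)\neq y_i] \;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)} \;=\; \prod_{t=1}^{T}\sqrt{1-4\gamma_t^2} \;\le\; \exp\Big(-2\sum_{t=1}^{T}\gamma_t^2\Big).
$$

So if every round beats random guessing by at least a fixed margin $\gamma$, the training error falls at least as fast as $e^{-2\gamma^2 T}$.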


**HOW**

AdaBoost fits a sequence of weak learners on repeatedly modified versions of the data. The data are modified by applying weights $w_1, w_2, \dots, w_N$ to the training samples in each iteration.

Initially, all the samples have the same weight $w_i = 1/N$, so at the first step a weak learner is trained on the original data.

Then, in each successive iteration, the sample weights are individually modified and a learner is trained on the reweighted data. At a given step, the weights of samples that were incorrectly predicted by the boosted model induced at the previous step are increased, while the weights of samples that were predicted correctly are decreased.

As iterations proceed, examples that are hard to predict will receive very high weights.
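
One round of this reweighting can be sketched as follows. This is a minimal, dependency-free illustration rather than a reference implementation. It assumes labels in $\{-1, +1\}$ and the update $w_i \leftarrow w_i \exp(\alpha \cdot \mathbb{1}[y_i \neq h(x_i)])$ with $\alpha = \ln\frac{1-\epsilon}{\epsilon}$, followed by normalisation; the function name `reweight` and the example predictions are hypothetical.

```python
import math


def reweight(weights, y_true, y_pred):
    """Return (alpha, new_weights) after one boosting round.

    Hypothetical helper: assumes the weighted error is strictly between 0 and 1.
    """
    total = sum(weights)
    miss = [yt != yp for yt, yp in zip(y_true, y_pred)]
    # Weighted error of the current weak learner.
    error = sum(w for w, m in zip(weights, miss) if m) / total
    alpha = math.log((1 - error) / error)
    # Misclassified samples are scaled up by exp(alpha); the others are unchanged.
    scaled = [w * math.exp(alpha) if m else w for w, m in zip(weights, miss)]
    # Normalising makes the weights sum to 1 again, which is what lowers
    # the weights of the correctly classified samples.
    norm = sum(scaled)
    return alpha, [w / norm for w in scaled]


y_true = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
y_pred = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]  # a learner that misses samples 6, 7, 8
alpha, w = reweight([0.1] * 10, y_true, y_pred)
print(round(alpha, 4))
print([round(v, 4) for v in w])
```

Starting from uniform weights of 1/10, the three misclassified samples end up with weight 1/6 each and the seven correctly classified samples with 1/14 each.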

<a name="Firstpaper"></a> \[[Freund and Schapire](https://books.google.com.au/books?hl=en&lr=&id=CAnxFA6DaagC&oi=fnd&pg=PA23&dq=a+decision+theoretic+generalization+of+online+learning&ots=XaVCZZZXsD&sig=2YyHZOvGolS_gG7uhZ1KuPZYjPc#v=onepage&q=a%20decision%20theoretic%20generalization%20of%20online%20learning&f=false)\] Freund, Y. and Schapire, R.E., 1995, March. A decision-theoretic generalization of on-line learning and an application to boosting. In Second European Conference on Computational Learning Theory (pp. 23-37).

<a name="Schapire"></a> \[[Schapire](https://cs.rochester.edu/u/stefanko/Teaching/09CS446/Boosting.pdf)\] Schapire, R.E., 1990. The strength of weak learnability. Machine learning, 5(2), pp.197-227.

### The 1995 paper

#### 1.  1st reading
##### 1.1 Understanding
1. At a given iteration $t$, the output is based only on the specific hypothesis $h_t$
<img src="images/AdaBoost_process.jpg" width="600"/>
2. The details and formulas
<img src="images/AdaBoost_Alg.jpg" width="600"/>

##### 1.2 Questions 
1. In the calculation of $\beta_t$, some formulations include a coefficient of 1/2 while others do not. What does the 1/2 do?
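
One way to look at this question, assuming it refers to the coefficient in the learner weight $\alpha$: the factor 1/2 belongs to the exponential-loss form of the update, $w_i \leftarrow w_i \exp(-\alpha\, y_i h(x_i))$ with $\alpha = \tfrac{1}{2}\ln\frac{1-\epsilon}{\epsilon}$, while the form without it changes only the misclassified samples, $w_i \leftarrow w_i \exp(\alpha \cdot \mathbb{1}[y_i \neq h(x_i)])$ with $\alpha = \ln\frac{1-\epsilon}{\epsilon}$. The sketch below checks numerically that both give the same normalised weights. The function names and the toy labels are hypothetical, and labels are assumed to be in $\{-1, +1\}$.

```python
import math


def normalise(ws):
    s = sum(ws)
    return [w / s for w in ws]


def update_with_half(ws, y, h, error):
    # alpha = 1/2 * ln((1 - error) / error); every sample is rescaled.
    alpha = 0.5 * math.log((1 - error) / error)
    return normalise([w * math.exp(-alpha * yi * hi) for w, yi, hi in zip(ws, y, h)])


def update_without_half(ws, y, h, error):
    # alpha = ln((1 - error) / error); only misclassified samples are rescaled.
    alpha = math.log((1 - error) / error)
    return normalise([w * math.exp(alpha * (yi != hi)) for w, yi, hi in zip(ws, y, h)])


y = [1, 1, -1, -1, 1]
h = [1, -1, -1, -1, 1]  # one mistake, on the second sample
ws = [0.2] * 5
error = sum(w for w, yi, hi in zip(ws, y, h) if yi != hi)

a = update_with_half(ws, y, h, error)
b = update_without_half(ws, y, h, error)
print(all(math.isclose(p, q) for p, q in zip(a, b)))
```

In both variants the ratio between the weight of a misclassified sample and that of a correctly classified one is multiplied by $(1-\epsilon)/\epsilon$, so the normalised weights coincide. Because every $\alpha$ differs by the same constant factor of 2, the sign of the final weighted vote is also unchanged.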

### Write an AdaBoost algorithm step by step

In [1]:
# 1. let's define a weak learner: 1-depth decision tree
from sklearn.tree import DecisionTreeClassifier
clf_tree = DecisionTreeClassifier(max_depth = 1, random_state = 1)
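
To make concrete what a depth-1 tree (a decision stump) does with sample weights, here is a minimal pure-Python sketch for a single feature. It is an illustration under stated assumptions, not what scikit-learn runs: `fit_stump` is a hypothetical helper that picks the threshold minimising the weighted misclassification error, whereas `DecisionTreeClassifier` by default chooses splits by Gini impurity.

```python
def fit_stump(xs, ys, ws):
    """Return (threshold, left_label, weighted_error) of the best 1-D stump.

    The stump predicts left_label for x < threshold and -left_label otherwise.
    Labels are assumed to be -1 or +1.
    """
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in thresholds:
        for left in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, ws)
                      if (left if x < thr else -left) != y)
            # Strict comparison: on ties the first candidate found is kept.
            if best is None or err < best[2]:
                best = (thr, left, err)
    return best


xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
thr, left, err = fit_stump(xs, ys, [0.1] * 10)
print(thr, left, round(err, 3))
```

With uniform weights the best stump found here splits at 2.5 and misclassifies three of the ten points; raising the weights of those three points would push the next stump towards a different split.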

In [6]:
# 2. Implement the AdaBoost
import numpy as np


def myAdaBoost(X_train, y_train, X_test, y_test, weak_learner, m_iteration=200):
    # 1. initialise weights for each of the N examples, w_i=1/N
    N, N_test = len(X_train), len(X_test)
    w = np.ones(X_train.shape[0])/N
    
    # 2. boosting
    
    predictions, test_predictions = [], []
    model_weights = []
    
    for m in range(1, m_iteration+1):
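        # (a). fit a weak classifier G_m(x) to the training data with weights w
        weak_learner.fit(X_train, y_train, sample_weight=w)
        
        # Note: wls.append(weak_learner) would append the same object every round,
        # so all entries would end up as the last fit. To keep each learner, fit a
        # fresh copy per round (e.g. sklearn.base.clone(weak_learner)).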
        
        # (b). calculate error for G_m(x)
        y_pre = weak_learner.predict(X_train)
        predictions.append(y_pre)
        error = np.sum(w*1*(y_pre != y_train)) / np.sum(w)
        
        # for testing
        y_test_pre = weak_learner.predict(X_test)
        test_predictions.append(y_test_pre)
        
        # (c). compute alpha_m; clip the error to avoid log(0) or division by zero
        error = np.clip(error, 1e-10, 1 - 1e-10)
        alpha_m = np.log((1-error)/error)
        model_weights.append(alpha_m)
        
        # (d). update training examples' weights
        w = w * np.exp(alpha_m*1*(y_pre != y_train))
        
    # 3. get the weighted majority vote
    model_weights = np.array(model_weights)  # m
    print('model weights: ', model_weights)
    
    
    # training
    predictions = np.array(predictions) # m*N, predict value is either 1 or -1
    train_final = np.sign(np.sum(np.transpose(predictions)*model_weights, axis=1))
    print('Train accuracy', np.sum(y_train==train_final)/N)
    
    
    # testing
    test_predictions = np.array(test_predictions)
    test_final = np.sign(np.sum(np.transpose(test_predictions)*model_weights, axis=1))
    print('Test accuracy', np.sum(y_test==test_final)/N_test)

In [15]:
X_train = np.arange(10).reshape(10,1)
y_train = np.array([1,1,1,-1,-1,-1,1,1,1,-1])

X_test = np.array([5.8, 2.1, 4.6, 7.9]).reshape(4,1)
y_test = np.array([-1, 1, -1, 1])

myAdaBoost(X_train, y_train, X_test, y_test, clf_tree, m_iteration=3)

model weights:  [0.84729786 1.29928298 1.5040774 ]
Train accuracy 1.0
Test accuracy 0.75


In [5]:
len(X_train)

10

In [None]:
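# Scratch check of the weighted vote. `predictions` is local to myAdaBoost,
# so use a small made-up stand-in here: 3 learners x 4 samples.
alphas = np.array([0.1, 0.3, 0.8])
predictions = np.array([[1, 1, -1, -1],
                        [1, -1, -1, 1],
                        [-1, -1, 1, 1]])
y_toy = np.array([1, -1, 1, 1])  # made-up labels for the 4 samples
print(alphas)
print(predictions)
print(np.transpose(predictions)*alphas)
print(np.sign(np.sum(np.transpose(predictions)*alphas, axis=1)))
print(predictions == y_toy)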

### Example: Two-class AdaBoost

In [None]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_gaussian_quantiles

In [None]:
X1, y1 = make_gaussian_quantiles(cov=2., n_samples=200, n_features=2,
                                n_classes=2, random_state=1)
X2, y2 = make_gaussian_quantiles(mean=(3, 3), cov=1.5,
                                 n_samples=300, n_features=2,
                                 n_classes=2, random_state=1)
X = np.concatenate((X1, X2))
y = np.concatenate((y1, -y2+1))

In [None]:
set(list(y2))

In [None]:
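# Create an AdaBoost ensemble of depth-1 decision trees (stumps).
# Note: recent scikit-learn releases deprecate the `algorithm` argument;
# drop it if it triggers a warning or an error.
bdt = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1), algorithm="SAMME", n_estimators=200)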

In [None]:
# Fit
bdt.fit(X, y)

In [None]:
# Mean accuracy of bdt.predict(X) with respect to y (i.e. training accuracy)
bdt.score(X,y)

In [None]:
n_trees = len(bdt)

# error at each iteration
plt.figure(figsize=(10,5))
plt.subplot(121)
plt.plot(range(1, n_trees + 1), bdt.estimator_errors_[:n_trees],
         "b", label='SAMME', alpha=.5)
plt.legend()
plt.ylabel('Error')
plt.xlabel('Number of Trees')

# boost weight of each tree
plt.subplot(122)
plt.plot(range(1, n_trees + 1), bdt.estimator_weights_[:n_trees],
         "b", label='SAMME', alpha=.5)
plt.legend()
plt.ylabel('Weight')
plt.xlabel('Number of Trees')

In [None]:
plot_colors = "br"
plot_step = 0.02
class_names = "AB"

plt.figure(figsize=(5, 5))

# Plot the decision boundaries
# plt.subplot(121)
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                     np.arange(y_min, y_max, plot_step))

Z = bdt.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
cs = plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)
plt.axis("tight")

# Plot the training points
for i, n, c in zip(range(2), class_names, plot_colors):
    idx = y == i  # boolean mask selecting the samples of class i
    plt.scatter(X[idx, 0], X[idx, 1],
                c=c,  # a fixed colour per class, so no colormap is needed
                s=20, edgecolor='k',
                label="Class %s" % n)
plt.xlim(x_min, x_max)
plt.ylim(y_min, y_max)
plt.legend(loc='upper right')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Decision Boundary')

In [None]:
# Plot the two-class decision scores
twoclass_output = bdt.decision_function(X)
plot_range = (twoclass_output.min(), twoclass_output.max())
plt.figure(figsize=(5,5))
for i, n, c in zip(range(2), class_names, plot_colors):
    plt.hist(twoclass_output[y == i],
             bins=10,
             range=plot_range,
             facecolor=c,
             label='Class %s' % n,
             alpha=.5,
             edgecolor='k')
x_lo, x_hi, y_lo, y_hi = plt.axis()  # separate names so y1, y2 from the data cell are not overwritten
plt.axis((x_lo, x_hi, y_lo, y_hi * 1.2))
plt.legend(loc='upper right')
plt.ylabel('Samples')
plt.xlabel('Score')
plt.title('Decision Scores')

plt.tight_layout()
plt.show()