## Boosting Decision Stumps

In this notebook, you will use [scikit-learn's decision trees](http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html) 
and implement the AdaBoost algorithm for binary classification.

The exercise is mostly based on the lecture and the following book:

T. Hastie, R. Tibshirani, and J. Friedman: [*The Elements of Statistical Learning*](http://statweb.stanford.edu/~tibs/ElemStatLearn/), 2001

As usual, some setup first:

In [None]:
import sklearn.datasets
import sklearn.tree
import sklearn.base
import sklearn.metrics
import sklearn.ensemble

import numpy as np

%matplotlib inline
import matplotlib.pyplot as plt

The data set is an example from [scikit-learn's datasets submodule](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_hastie_10_2.html) that was introduced by Hastie, Tibshirani, and Friedman. It has 10 features, which are independent standard Gaussian. The deterministic target `y` is defined as:

```
y[i] = 1 if np.sum(X[i] ** 2) > 9.34 else -1
```
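To get a feel for this rule, here is a minimal NumPy-only sketch that applies it to freshly drawn Gaussian features. It mimics the rule stated above rather than calling `make_hastie_10_2`, and the names `rng`, `X_demo`, `y_demo`, and `positive_fraction` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Illustrative sketch: draw 10 independent standard Gaussian features
# and apply the threshold rule from the text (not scikit-learn's own code).
rng = np.random.default_rng(0)
X_demo = rng.standard_normal(size=(12000, 10))

# Label is +1 if the squared norm exceeds 9.34, otherwise -1
y_demo = np.where(np.sum(X_demo ** 2, axis=1) > 9.34, 1.0, -1.0)

# Share of positive labels in the sample
positive_fraction = np.mean(y_demo == 1.0)
print("fraction of +1 labels: %.3f" % positive_fraction)
```

The threshold 9.34 is approximately the median of a chi-squared distribution with 10 degrees of freedom, so the two classes should come out roughly balanced.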

In [None]:
# Create a binary classification dataset
def create_dataset():
    num_samples = 12000
    n_train = 2000
    
    X, y = sklearn.datasets.make_hastie_10_2(n_samples=num_samples, 
                                             random_state=1)
    X_train = X[:n_train]
    y_train = y[:n_train]
    X_test = X[n_train:]
    y_test = y[n_train:]
    return X_train, y_train, X_test, y_test
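The class in the next cell follows steps 1 and 2(a) to 2(d) of the discrete AdaBoost algorithm. As a reminder, here are those steps in the notation of Hastie et al. (Algorithm 10.1, AdaBoost.M1), where $N$ is the number of training observations, $I(\cdot)$ is the indicator function, and labels are in $\{-1, 1\}$. Treat this as a summary to check against the lecture notes rather than a specification of any particular library.

```latex
\begin{aligned}
&\text{1. Initialize } w_i = 1/N, \quad i = 1, \dots, N. \\
&\text{2. For } m = 1, \dots, M: \\
&\quad \text{(a) Fit a classifier } G_m(x) \text{ to the training data using weights } w_i. \\
&\quad \text{(b) } \mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i \, I\big(y_i \neq G_m(x_i)\big)}{\sum_{i=1}^{N} w_i} \\
&\quad \text{(c) } \alpha_m = \log\frac{1 - \mathrm{err}_m}{\mathrm{err}_m} \\
&\quad \text{(d) } w_i \leftarrow w_i \cdot \exp\big[\alpha_m \, I\big(y_i \neq G_m(x_i)\big)\big], \quad i = 1, \dots, N. \\
&\text{3. Output } G(x) = \operatorname{sign}\Big[\sum_{m=1}^{M} \alpha_m G_m(x)\Big].
\end{aligned}
```

Because $\mathrm{err}_m$ divides by the sum of the weights, normalizing the weights after step (d) is optional and does not change the resulting classifier.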

In [None]:
class AdaBooster(object):
    
    def __init__(self, weak_learner, iterations):
        '''
        weak_learner: sklearn classifier - G_m
        iterations: number of boosting iterations - M
        '''
        self.weak_learner = weak_learner
        self.iterations = iterations
        
        # 1. Initialize some more lists
        self.classifiers = list() # trained classifiers G_m
        self.alphas = list()      # classifier weights       
    
    def fit(self, X, y):
        # YOUR TURN
        # 1. initialize the observation weights
        
        # 2. For m=1 to M
        # hint: Use sklearn.base.clone method to copy the classifier 
        #       in each iteration
        # (a) Fit a classifier using the weights
        #     hint: Use 'sample_weight' in the fit() method of the weak_learner
        #     See also: http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
        # (b) Compute error
        # (c) Compute classifier weight
        # (d) Update observation weights
        pass
    
    def predict(self, X):
        # Compute predictions by looping over all classifiers 
        # and adding weighted predictions. 
        # The actual output should be -1 if the weighted sum is <= 0 and 1 otherwise
    
        # YOUR TURN
        pred = np.zeros(X.shape[0])
        
        return pred
    
    def staged_predict(self, X):
        # returns predictions for X for all iterations 
        # this is only for convenience to simplify computing
        # train/test error for all iterations

        # YOUR TURN
        staged_predictions = []
            
        return staged_predictions

### Test your implementation

You now have a working implementation of the AdaBoost algorithm. To check it for correctness, compare it to [scikit-learn's AdaBoost implementation](http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html), which should yield the same predictions as your implementation:

In [None]:
# Use this cell to test whether your implementation is correct

weak_learner = sklearn.tree.DecisionTreeClassifier(max_depth=1)
adaboost = AdaBooster(weak_learner=weak_learner, iterations=400)
sk_adaboost = sklearn.ensemble.AdaBoostClassifier(weak_learner, 
                                                  algorithm="SAMME", 
                                                  n_estimators=400)

# Get data
X_train, y_train, X_test, y_test = create_dataset()

# Fit model
adaboost.fit(X_train, y_train)
sk_adaboost.fit(X_train, y_train)

# Predict
pred = adaboost.predict(X_test)
pred_sk = sk_adaboost.predict(X_test)

print("Difference: %d" % np.sum(pred != pred_sk))
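If the difference is not zero, a round that can be checked by hand may help locate the bug. The sketch below runs one boosting round on five toy observations, assuming the update rules from Hastie et al. (weighted error, `alpha = log((1 - err) / err)`, and exponential re-weighting of misclassified points). The arrays `y_true` and `y_stump` are made-up values for illustration and stand in for the true labels and a stump's predictions.

```python
import numpy as np

# Toy round: five observations, the weak learner misclassifies only the last one.
y_true = np.array([1, 1, -1, -1, 1])
y_stump = np.array([1, 1, -1, -1, -1])

# Start from uniform observation weights
w = np.full(len(y_true), 1.0 / len(y_true))

# Indicator of misclassified observations
miss = (y_true != y_stump)

# Weighted error of this round: one of five equally weighted points is wrong
err = np.sum(w * miss) / np.sum(w)

# Classifier weight (assumed rule: log-odds of being correct)
alpha = np.log((1.0 - err) / err)

# Up-weight only the misclassified observations, then normalize
# (normalization is optional because err divides by the weight sum)
w_new = w * np.exp(alpha * miss)
w_new /= w_new.sum()

print(err, alpha, w_new)
```

Here the error is 0.2, the classifier weight is log(4), and the single misclassified observation ends up carrying half of the total weight. Intermediate values from your own `fit` method can be compared against numbers like these.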

### Experiment

Next, we will experiment with AdaBoost's hyperparameters. Here are some suggestions to try:

 * Investigate the difference between performance on the training set and the test set.
 * Add some noise to the (training) target and repeat.
 * Investigate the effect of the tree depth on the performance on the training set and the test set.
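For the noise experiment, one simple option is to flip the sign of a fixed fraction of the training labels. The helper below is a minimal sketch of that idea. The name `flip_labels` and its parameters `noise_rate` and `seed` are hypothetical and not part of the notebook, and other noise models are equally valid.

```python
import numpy as np

def flip_labels(y, noise_rate=0.1, seed=0):
    """Return a copy of y (labels in {-1, 1}) with a fraction of labels flipped.

    Hypothetical helper for the noise experiment; not part of the notebook.
    """
    rng = np.random.default_rng(seed)
    y_noisy = np.array(y, copy=True)
    # Number of labels to flip, rounded to the nearest integer
    n_flip = int(round(noise_rate * len(y_noisy)))
    # Pick distinct positions so each chosen label is flipped exactly once
    flip_idx = rng.choice(len(y_noisy), size=n_flip, replace=False)
    y_noisy[flip_idx] *= -1
    return y_noisy

# Small demonstration on 100 alternating labels
y_demo = np.array([1.0, -1.0] * 50)
y_demo_noisy = flip_labels(y_demo, noise_rate=0.1)
n_changed = int(np.sum(y_demo != y_demo_noisy))
print("labels changed:", n_changed)
```

Applying such a helper only to `y_train` and keeping `y_test` clean lets the test set still measure how well the model recovers the true rule.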
 
Before you investigate the model, it will be helpful to prepare a function that plots
training and test performance over the number of iterations:

In [None]:
def plot_train_test_error(model, X_train, y_train, X_test, y_test):
    ''' Shows a plot with train/test accuracy over iterations

    Note: despite the function name, the plotted quantity is
    accuracy (1 - error), matching the y-axis label.

    model (AdaBooster): trained AdaBoost algorithm
    X_train (array): [#samples, #features]
    y_train (array): [#samples, ]
    X_test (array): [#samples, #features]
    y_test (array): [#samples, ]
    '''
    # Loop over all staged predictions and compute accuracy
    y_pred_train = model.staged_predict(X_train)
    training_accuracies = [sklearn.metrics.accuracy_score(y_train, y_pred) 
                           for y_pred in y_pred_train]
    
    y_pred_test = model.staged_predict(X_test)
    test_accuracies = [sklearn.metrics.accuracy_score(y_test, y_pred) 
                       for y_pred in y_pred_test]
    
    # Plot results
    x_values = np.arange(1, len(training_accuracies) + 1)

    plt.plot(x_values, training_accuracies, label="training")
    plt.plot(x_values, test_accuracies, label="test")
    plt.xlabel("iterations")
    plt.ylabel("accuracy")
    plt.legend(loc="best")
    plt.ylim([0, 1])
    
    plt.show()

In [None]:
# YOUR TURN