# **Sequential Boosting**

Source:  [https://github.com/d-insight/code-bank.git](https://github.com/d-insight/code-bank.git)  
License: [MIT License](https://opensource.org/licenses/MIT). See open source [license](LICENSE) in the Code Bank repository. 

-------------

## Overview

Demonstrate the sequential process of boosting by chaining together decision trees: predict, compute the residuals, fit the next tree to those residuals, compute the new residuals, and so on.
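
In symbols, the chain can be written as below. The notation ($F_m$ for the combined model after $m$ trees, $h_m$ for the $m$-th tree, $r_m$ for the residuals) is introduced here for illustration and does not appear in the source notebook.

```latex
\begin{aligned}
F_1(x) &= h_1(x), & h_1 &\text{ fit to } y \\
r_m &= y - F_m(x), & m &= 1, 2, \dots \\
F_{m+1}(x) &= F_m(x) + h_{m+1}(x), & h_{m+1} &\text{ fit to } r_m
\end{aligned}
```

It follows that $r_m = r_{m-1} - h_m(x)$, which is the update the code cells apply when they compute, for example, `y_res2 = y_res1 - y_res1_hat`.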

-------------

## **Part 0**: Setup

In [None]:
# import all packages 
import numpy as np

# scikit-learn
from sklearn.tree import DecisionTreeRegressor

# plotting 
import matplotlib.pyplot as plt
%matplotlib inline


In [None]:
# all constants

# decision tree depth
MAX_DEPTH = 3

# plotting constants
FIGSIZE   = (7, 5)
LINEWIDTH = 3
ROWS      = 2
COLS      = 3


In [None]:
# helper function for plotting 

def myPlot(x, y, labelData, y_pred=None, labelPred=None):
    """Plot the data and, if both y_pred and labelPred are given, the prediction."""
    
    plt.figure(figsize=FIGSIZE)
    plt.plot(x, y, linewidth = LINEWIDTH, label = labelData)
    # Overlay the prediction only when an array and a label are supplied
    if isinstance(y_pred, np.ndarray) and labelPred: 
        plt.plot(x, y_pred, linewidth = LINEWIDTH, label = labelPred)
    plt.ylim(-1.1, 1.1)
    plt.grid()
    plt.legend()
    
    return plt.show()
    

## **Part 1**: Generate toy data from sine function

We generate 100 evenly spaced samples of the sine function on the interval [-2π, 2π].

In [None]:
# create toy data
SAMPLES = 100

# Feature value
x = np.linspace(-2*np.pi, 2*np.pi, SAMPLES)

# Actual values come from the sin function
y = np.sin(x)

# Plot data
myPlot(x, y, 'data')

## **Part 2**: Fit a decision tree regressor to original data

We fit a single (!) decision tree of depth `MAX_DEPTH` to the original data.

In [None]:
# Fit a tree to the data and predict on the same inputs
y_hat = DecisionTreeRegressor(max_depth=MAX_DEPTH).fit(x.reshape(-1, 1), y).predict(x.reshape(-1, 1))

# Compare predictions with ground truth 
myPlot(x, y, 'data', y_hat, 'prediction')


## **Part 3a**: Fit a new decision tree regressor to the residuals

We now fit a new, single decision tree to the residuals, i.e. the part of the data that the first tree did not explain (`y - y_hat`).

In [None]:
# Residuals 
y_res1 = y - y_hat
myPlot(x, y_res1, 'residuals')
print('Sum of absolute residuals: {}'.format(round(np.abs(y_res1).sum(), 2)))

In [None]:
# Fit new tree to residuals
y_res1_hat = DecisionTreeRegressor(max_depth=MAX_DEPTH).fit(x.reshape(-1, 1), y_res1).predict(x.reshape(-1, 1))

# Compare predictions with residuals
myPlot(x, y_res1, 'residuals', y_res1_hat, 'prediction')
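
The residual tree is only useful once its output is added back to the first prediction. The following is a minimal, self-contained sketch of that step, assuming the same toy data and tree depth as above. The names `y_hat_combined`, `mse_one_tree` and `mse_two_trees` are illustrative, and the fixed `random_state` is an added assumption for reproducibility.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same toy data as in Part 1
x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
X = x.reshape(-1, 1)  # scikit-learn expects a 2D feature matrix
y = np.sin(x)

# Tree 1 is fit to the data, tree 2 to the residuals of tree 1
tree_1 = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)
y_hat = tree_1.predict(X)
tree_2 = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y - y_hat)
y_res1_hat = tree_2.predict(X)

# The two-tree model adds the residual correction to the first prediction
y_hat_combined = y_hat + y_res1_hat

# Compare the training error of one tree with that of the two-tree sum
mse_one_tree = np.mean((y - y_hat) ** 2)
mse_two_trees = np.mean((y - y_hat_combined) ** 2)
print('MSE, one tree:  {:.4f}'.format(mse_one_tree))
print('MSE, two trees: {:.4f}'.format(mse_two_trees))
```

Mean squared error is used here instead of the sum of absolute residuals because it is the quantity a regression tree minimizes by default, so the improvement on the training data is easy to see.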

## **Part 3b**: Fit another new decision tree regressor to the remaining residuals

In [None]:
# New residuals: what the previous tree left unexplained
y_res2 = y_res1 - y_res1_hat
# Fit new tree to residuals
y_res2_hat = DecisionTreeRegressor(max_depth=MAX_DEPTH).fit(x.reshape(-1, 1), y_res2).predict(x.reshape(-1, 1))

# Compare predictions with residuals
myPlot(x, y_res2, 'residuals', y_res2_hat, 'prediction')
print('Sum of absolute residuals: {}'.format(round(np.abs(y_res2).sum(), 2)))

## **Part 3c**: And again ...

Note how each new decision tree concentrates on the regions where the remaining residuals are largest, because splitting there reduces the squared error the most. As a result, the residuals tend to shrink with every iteration.
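
To check the shrinking residuals numerically rather than by eye, the sketch below repeats the fit-and-subtract step in a loop and records the residual magnitude after each round. It is self-contained and assumes the same toy data and depth as above. `N_ROUNDS`, `sum_abs` and `sum_sq` are hypothetical names chosen for this example, and the fixed `random_state` is an added assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same toy data as in Part 1
x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
X = x.reshape(-1, 1)
residuals = np.sin(x)  # before any tree is fit, the target is the data itself

N_ROUNDS = 5  # hypothetical number of boosting rounds

# Residual magnitude before the first tree and after each round
sum_abs = [np.abs(residuals).sum()]
sum_sq = [np.square(residuals).sum()]

for _ in range(N_ROUNDS):
    # Fit a tree to the current residuals and subtract what it explains
    tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, residuals)
    residuals = residuals - tree.predict(X)
    sum_abs.append(np.abs(residuals).sum())
    sum_sq.append(np.square(residuals).sum())

for i, (a, s) in enumerate(zip(sum_abs, sum_sq)):
    print('after {} trees: sum |res| = {:.2f}, sum res^2 = {:.4f}'.format(i, a, s))
```

On the training data the sum of squared residuals cannot increase from one round to the next, since each leaf predicts the mean of the residuals it contains. The sum of absolute residuals usually falls as well, but that is not guaranteed in every round.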

In [None]:
# New residuals: what the previous tree left unexplained
y_res3 = y_res2 - y_res2_hat
# Fit new tree to residuals
y_res3_hat = DecisionTreeRegressor(max_depth=MAX_DEPTH).fit(x.reshape(-1, 1), y_res3).predict(x.reshape(-1, 1))

# Compare predictions with residuals
myPlot(x, y_res3, 'residuals', y_res3_hat, 'prediction')
print('Sum of absolute residuals: {}'.format(round(np.abs(y_res3).sum(), 2)))

## **Part 3d**: And again ...

In [None]:
# New residuals: what the previous tree left unexplained
y_res4 = y_res3 - y_res3_hat
# Fit new tree to residuals
y_res4_hat = DecisionTreeRegressor(max_depth=MAX_DEPTH).fit(x.reshape(-1, 1), y_res4).predict(x.reshape(-1, 1))

# Compare predictions with residuals
myPlot(x, y_res4, 'residuals', y_res4_hat, 'prediction')
print('Sum of absolute residuals: {}'.format(round(np.abs(y_res4).sum(), 2)))

## **Part 4**: All together

In [None]:
# Adjust/overwrite maximum decision tree depth 
MAX_DEPTH = 3

# Set up sub-plot 
fig, axs = plt.subplots(ROWS, COLS, figsize=(20, 10))
axs      = axs.flatten()
y_data   = y

for i, ax in enumerate(axs): 
    
    # From the second panel on, the target is the residual left by the previous tree
    if i > 0: y_data = y_data - y_hat
    # Plot the current target (original data or residuals)
    if i == 0: ax.set_title('Original data')
    if i > 0: ax.set_title('Tree {} on residuals\nSum of absolute residuals: {}'.format(i, round(np.abs(y_data).sum(), 2)))
    ax.plot(x, y_data, linewidth = LINEWIDTH, label='data')
    ax.set_ylim(-1.1, 1.1)
        
    # Fit a tree to the current target and predict on the same inputs
    y_hat = DecisionTreeRegressor(max_depth=MAX_DEPTH).fit(x.reshape(-1, 1), y_data).predict(x.reshape(-1, 1))
    
    # Plot prediction 
    ax.plot(x, y_hat, linewidth = LINEWIDTH, label='prediction')
    ax.grid()
    ax.legend()
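
The loop above discards each fitted tree after plotting it. To use the chain as a model, the trees have to be kept and their predictions summed. The sketch below shows one way to do that, assuming six trees to match the six panels. `fit_boosted_trees` and `predict_boosted_trees` are hypothetical helper names, not part of the source notebook or of scikit-learn, and the fixed `random_state` is an added assumption.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=6, max_depth=3):
    """Fit a chain of trees, each on the residuals left by the previous ones."""
    trees = []
    residuals = y.copy()
    for _ in range(n_trees):
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        residuals = residuals - tree.predict(X)
        trees.append(tree)
    return trees


def predict_boosted_trees(trees, X):
    """Sum the predictions of all trees in the chain."""
    return np.sum([tree.predict(X) for tree in trees], axis=0)


# Same toy data as in Part 1
x = np.linspace(-2 * np.pi, 2 * np.pi, 100)
X = x.reshape(-1, 1)
y = np.sin(x)
trees = fit_boosted_trees(X, y)

# Training error of the first tree alone versus the full chain
mse_first = np.mean((y - trees[0].predict(X)) ** 2)
mse_chain = np.mean((y - predict_boosted_trees(trees, X)) ** 2)
print('Training MSE, first tree: {:.4f}'.format(mse_first))
print('Training MSE, all trees:  {:.4f}'.format(mse_chain))

# The stored trees can also predict at inputs that were not in the training grid
x_new = np.linspace(-2 * np.pi, 2 * np.pi, 37)
y_new_hat = predict_boosted_trees(trees, x_new.reshape(-1, 1))
```

Keeping the list of fitted trees, rather than only their predictions on the training grid, is what allows the chain to be evaluated at new inputs such as `x_new`.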

## **Bonus**: Further Reading

- What is the difference between Bagging and Boosting? https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
- The intuition behind boosting: https://medium.com/greyatom/a-quick-guide-to-boosting-in-ml-acf7c1585cb5