# Forward Stagewise Algorithm

ref: https://windmissing.github.io/LiHang-TongJiXueXiFangFa/Chapter8/4.html

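- Additive model
- Working from front to back, each step learns only one basis function and its coefficient, gradually approaching the optimization objective. This reduces the complexity of the optimization.

Specifically, each step only needs to optimize the following loss function: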

![image](./0902.png)

![image](./0903.png)
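
The stepwise idea can be sketched in a few lines. This is only an illustration under assumptions not stated in the notes: squared-error loss, a one-split regression stump as the basis function, and a synthetic sine curve as data. The helper names `fit_stump`, `predict_stump` and `forward_stagewise` are hypothetical. With squared loss, fitting the new basis function on top of the frozen previous model is the same as fitting it to the current residual.

```python
import numpy as np


def fit_stump(x, r):
    """Fit a one-split regression stump to targets r on a 1-D feature x.

    Returns (threshold, left_value, right_value) minimizing squared error.
    """
    best = None
    # Skip the largest value as a threshold: it would leave the right side empty.
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, t, left.mean(), right.mean())
    return best[1:]


def predict_stump(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)


def forward_stagewise(x, y, n_steps):
    """Add one stump per step; earlier stumps are never revisited."""
    f = np.zeros_like(y, dtype=float)  # f_0(x) = 0
    stumps, losses = [], []
    for _ in range(n_steps):
        residual = y - f                      # what the current model still misses
        stump = fit_stump(x, residual)        # learn only one new basis function
        f = f + predict_stump(stump, x)       # f_m = f_{m-1} + new basis function
        stumps.append(stump)
        losses.append(float(((y - f) ** 2).sum()))
    return stumps, losses


# Synthetic data, chosen only for illustration.
x = np.linspace(0.0, 6.0, 30)
y = np.sin(x)
stumps, losses = forward_stagewise(x, y, n_steps=6)
print(losses)  # training loss after each step
```

Each step solves a small one-dimensional search instead of optimizing all stumps jointly, which is where the reduction in complexity comes from.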

# Gradient Boosting Decision Tree

ref: https://towardsdatascience.com/machine-learning-part-18-boosting-algorithms-gradient-boosting-in-python-ef5ae6965be4

- Unlike AdaBoost, which typically uses stumps (trees of depth 1), the trees in GBDT usually have a depth larger than 1.

- Typically, gradient boosting uses trees with a maximum number of leaves between 8 and 32.

- Step 1: Calculate the average of the target label

- Step 2: Calculate the residuals

- Step 3: Construct a decision tree based on the residuals

- Step 4: Predict the target label using all of the trees within the ensemble

    To prevent overfitting, we introduce a hyperparameter called the learning rate.
    
    When we make a prediction, the residual predicted by each tree is multiplied by the learning rate:
    
    prediction = average target value (Step 1) + learning rate * residual predicted by decision tree
    
- Step 5: Compute the new residuals

- Step 6: Repeat Steps 3 to 5 until the number of iterations reaches the value of the number-of-estimators hyperparameter

- Step 7: Once trained, use all of the trees in the ensemble to make a final prediction
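
The seven steps above can be sketched by hand with scikit-learn's `DecisionTreeRegressor`. This is a minimal sketch, not the library's internal implementation. It assumes squared-error loss (so the residual is exactly what each tree should fit), synthetic data, and illustrative hyperparameter values. The `predict` helper is hypothetical.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data, chosen only for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

learning_rate = 0.1   # assumed value
n_estimators = 50     # assumed value

# Step 1: start from the average of the target.
base_prediction = y.mean()
prediction = np.full_like(y, base_prediction)

trees = []
for _ in range(n_estimators):          # Step 6: repeat Steps 3 to 5
    # Step 2 / Step 5: residuals of the current ensemble.
    residuals = y - prediction
    # Step 3: fit a small tree (at most 8 leaves) to the residuals.
    tree = DecisionTreeRegressor(max_leaf_nodes=8, random_state=0)
    tree.fit(X, residuals)
    trees.append(tree)
    # Step 4: add the tree's predicted residual, scaled by the learning rate.
    prediction = prediction + learning_rate * tree.predict(X)


def predict(X_new):
    """Step 7: average + learning_rate * sum of all trees' predicted residuals."""
    out = np.full(X_new.shape[0], base_prediction)
    for tree in trees:
        out += learning_rate * tree.predict(X_new)
    return out


train_mse = float(np.mean((y - predict(X)) ** 2))
baseline_mse = float(np.mean((y - base_prediction) ** 2))
print(baseline_mse, train_mse)
```

A smaller learning rate makes each tree contribute less, so more trees are needed to reach the same training error.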

In [1]:
from sklearn.ensemble import GradientBoostingRegressor
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# load_boston was removed in scikit-learn 1.2, so this import requires scikit-learn < 1.2
from sklearn.datasets import load_boston
from sklearn.metrics import mean_absolute_error

In [2]:
boston = load_boston()
X = pd.DataFrame(boston.data, columns=boston.feature_names)
y = pd.Series(boston.target)

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [4]:
regressor = GradientBoostingRegressor(max_depth=2, n_estimators=3, learning_rate=1.0)
regressor.fit(X_train, y_train)

GradientBoostingRegressor(learning_rate=1.0, max_depth=2, n_estimators=3)

In [5]:
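# staged_predict() yields the predictions after each boosting stage (1 tree, 2 trees, ...),
# so errors[i] is the validation error of an ensemble with i + 1 trees
errors = [mean_squared_error(y_test, y_pred) for y_pred in regressor.staged_predict(X_test)]
# argmin returns a zero-based index, so add 1 to get the number of trees
best_n_estimators = np.argmin(errors) + 1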

In [6]:
best_regressor = GradientBoostingRegressor(max_depth=2, n_estimators=best_n_estimators, learning_rate=1.0)
best_regressor.fit(X_train, y_train)

GradientBoostingRegressor(learning_rate=1.0, max_depth=2, n_estimators=2)

In [7]:
y_pred = best_regressor.predict(X_test)
mean_absolute_error(y_test, y_pred)

3.676385308929364
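
The same tree-count selection can be run reproducibly on current scikit-learn versions. This is a sketch under assumptions that differ from the notebook above: synthetic data from `make_regression`, fixed `random_state` values, and a larger ensemble with a smaller learning rate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data with fixed seeds so the run is repeatable.
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = GradientBoostingRegressor(
    max_depth=2, n_estimators=100, learning_rate=0.1, random_state=0
)
model.fit(X_train, y_train)

# staged_predict yields predictions after 1, 2, ..., n_estimators trees,
# so the zero-based argmin index needs +1 to become a tree count.
errors = [mean_squared_error(y_val, y_pred) for y_pred in model.staged_predict(X_val)]
best_n_estimators = int(np.argmin(errors)) + 1

# Refit with the selected number of trees.
best_model = GradientBoostingRegressor(
    max_depth=2, n_estimators=best_n_estimators, learning_rate=0.1, random_state=0
)
best_model.fit(X_train, y_train)
print(best_n_estimators, mean_squared_error(y_val, best_model.predict(X_val)))
```

Selecting the tree count on the same split that is later used for reporting gives an optimistic error estimate. A separate held-out test set avoids that.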