# Gradient Boosting

Gradient boosting builds an ensemble of decision trees sequentially. Each new tree is fit to the residual errors of the ensemble built so far, so adding it reduces those residuals.
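For squared-error loss, the sequential update can be written compactly as follows. The symbols are introduced here for illustration. $F_m$ is the ensemble after $m$ trees, $h_m$ is the $m$-th tree fit to the pairs $(x_i, r_i^{(m)})$, and $\nu$ is the learning rate. The second line shows why "fitting the residuals" is a gradient step: for this loss, the residual equals the negative gradient.

$$
\begin{aligned}
F_0(x) &= \bar{y} \\
r_i^{(m)} &= y_i - F_{m-1}(x_i) = -\left.\frac{\partial L(y_i, F)}{\partial F}\right|_{F = F_{m-1}(x_i)}, \qquad L(y, F) = \tfrac{1}{2}(y - F)^2 \\
F_m(x) &= F_{m-1}(x) + \nu\, h_m(x)
\end{aligned}
$$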


# Training Process

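The model starts with an initial base model that makes the same prediction for every observation (for squared-error regression, the mean of the target). Each subsequent tree is then fit to the residuals of the current ensemble, that is, to the negative gradient of the loss with respect to the current predictions. Its output is scaled by the learning rate before being added to the ensemble. This process continues until the specified number of trees has been added, or until early stopping halts training. The ensemble makes a prediction by summing the initial prediction and the scaled outputs of all trees. For classification, this sum is converted into class probabilities through a link function such as the logistic sigmoid, rather than by weighted voting.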
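The training loop described above can be sketched from scratch for squared-error regression. This is a minimal illustration, not scikit-learn's implementation. The helper names `fit_gradient_boosting` and `predict_gradient_boosting` are hypothetical, and the data is synthetic (from `make_regression`), not the LungCap dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def fit_gradient_boosting(X, y, n_estimators=50, learning_rate=0.1, max_depth=3):
    """Hypothetical minimal gradient boosting for squared-error loss (sketch)."""
    # Initial base model: a constant prediction equal to the mean of the target.
    init = float(np.mean(y))
    prediction = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        # For squared error, the negative gradient is simply the residual.
        residuals = y - prediction
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        # Add the new tree's output, shrunk by the learning rate.
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return {"init": init, "learning_rate": learning_rate, "trees": trees}


def predict_gradient_boosting(model, X):
    # Prediction = constant base value + sum of learning-rate-scaled tree outputs.
    pred = np.full(X.shape[0], model["init"])
    for tree in model["trees"]:
        pred += model["learning_rate"] * tree.predict(X)
    return pred


X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
model = fit_gradient_boosting(X, y)
y_hat = predict_gradient_boosting(model, X)

baseline_mse = np.mean((y - model["init"]) ** 2)
boosted_mse = np.mean((y - y_hat) ** 2)
print(f"Constant-model training MSE: {baseline_mse:.1f}")
print(f"Boosted training MSE:        {boosted_mse:.1f}")
```

Each tree corrects what the previous ensemble got wrong, so the training error drops well below that of the constant base model.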


# Gradient Boosting Hyperparameters

Because gradient boosting is an ensemble of decision trees, it has hyperparameters that control both the growth of the individual trees and the ensemble as a whole.

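- n_estimators: Number of boosting stages (trees). Too many trees can overfit the training data.

- learning_rate: Shrinkage parameter that scales the contribution of each tree to the final prediction. A lower value requires more trees to fit the training data but usually generalizes better. A low learning_rate combined with a high n_estimators therefore tends to work well.

- subsample: Fraction of training instances used to fit each tree. Setting it to 0.25 means each tree is trained on a random 25% of the training instances. Values below 1.0 trade a higher bias for a lower variance. This variant is called stochastic gradient boosting.

- warm_start: When True, a new call to fit reuses the trees already fitted and only adds new ones. This lets you grow the ensemble incrementally and stop manually. It is not itself an early-stopping switch. Built-in early stopping is controlled by n_iter_no_change together with validation_fraction and tol. For example, n_iter_no_change=5 stops training once the validation score has not improved for 5 consecutive iterations.

- max_depth: Maximum depth of each individual tree, which limits its size.

- max_leaf_nodes: Maximum number of leaf nodes in each tree. When set, trees are grown in best-first fashion. None means no limit.

- min_samples_split: Minimum number of samples required to split an internal node.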
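The difference between built-in early stopping and warm_start can be sketched with scikit-learn's `GradientBoostingRegressor`. This example assumes synthetic data from `make_regression`. The specific values (1000 trees, `tol=1e-4`, 50 then 100 trees) are arbitrary choices for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, n_features=8, noise=15.0, random_state=0)

# Built-in early stopping: hold out 10% of the training data as a validation set
# and stop once the validation score fails to improve by at least tol for 5 rounds.
early = GradientBoostingRegressor(n_estimators=1000, n_iter_no_change=5,
                                  validation_fraction=0.1, tol=1e-4,
                                  random_state=0)
early.fit(X, y)
print("Trees fitted before stopping:", early.n_estimators_)

# warm_start: the second fit keeps the existing 50 trees and only adds 50 more.
warm = GradientBoostingRegressor(n_estimators=50, warm_start=True, random_state=0)
warm.fit(X, y)
warm.set_params(n_estimators=100)
warm.fit(X, y)
print("Trees after second fit:", len(warm.estimators_))
```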


# Gradient Boosting pros and cons


**Pros**

- They provide feature importance scores
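A fitted scikit-learn gradient boosting model exposes these scores through `feature_importances_`. The sketch below uses synthetic data in which, by construction, only the first 3 of 6 features drive the target. These scores are impurity-based. According to the scikit-learn documentation, they can be misleading for features with many unique values.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Only the first 3 of 6 features are informative (shuffle=False keeps them first).
X, y = make_regression(n_samples=400, n_features=6, n_informative=3,
                       noise=5.0, shuffle=False, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# Impurity-based importances, normalized so they sum to 1.
for i, imp in enumerate(model.feature_importances_):
    print(f"feature {i}: {imp:.3f}")
```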


**Cons**

- Each tree in the boosting sequence depends on the trees before it, so the trees cannot be trained in parallel. This sequential dependency is a bottleneck for training time.



# 1. Libraries

In [1]:
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

In [2]:
# Import Data
df = pd.read_csv('LungCapData.csv')
df.head()

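|   | LungCap | Age | Height | Smoke | Gender | Caesarean |
|---|---|---|---|---|---|---|
| 0 | 6.475 | 6 | 62.1 | no | male | no |
| 1 | 10.125 | 18 | 74.7 | yes | female | no |
| 2 | 9.55 | 16 | 69.7 | no | female | yes |
| 3 | 11.125 | 14 | 71.0 | no | male | no |
| 4 | 4.8 | 5 | 56.9 | no | male | no |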


# 2. Preprocessing

In [3]:
# Predictors and Target
X = df.drop(columns = ['LungCap'])
y = df['LungCap']

# Instantiate one-hot encoder
ohe = OneHotEncoder()

# columns to be one hot encoded
ct = make_column_transformer(

    (ohe, ['Smoke', 'Gender', 'Caesarean']),
    remainder = 'passthrough')

# predictors and target variable
X = np.array(ct.fit_transform(X))
y = np.array(y)

# Check input and target variable shapes
X.shape, y.shape

((725, 8), (725,))

In [4]:
# Training and Testing subsets 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 911)

# Feature scaling (optional here: tree-based models are insensitive to feature scaling)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print('Standardized feature Mean:',  X_train.mean().round())
print('Standardized feature SD :',   X_train.std().round())

Standardized feature Mean: 0.0
Standardized feature SD : 1.0
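Standardization is harmless here but not required. Tree splits depend only on the ordering of each feature's values, so rescaling a feature moves the split thresholds without changing which samples fall on each side. The sketch below checks this on synthetic data whose features are deliberately put on very different scales. It compares a model trained on raw features with one trained on standardized features. Agreement is expected up to floating-point rounding.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=1)
X = X * np.array([1.0, 10.0, 100.0, 1000.0, 1.0, 50.0])  # very different feature scales
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# Model trained on the raw features.
raw = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Same model on standardized features (scaler fit on the training split only).
sc = StandardScaler().fit(X_tr)
scaled = GradientBoostingRegressor(random_state=0).fit(sc.transform(X_tr), y_tr)

# Split thresholds move with the scaling, so the predictions should coincide.
same = np.allclose(raw.predict(X_te), scaled.predict(sc.transform(X_te)))
print("Predictions match:", same)
```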


# 3. Training 

In [5]:
# Training the Gradient Boosting Regressor
gbc = GradientBoostingRegressor(n_estimators = 10, random_state = 0)
gbc.fit(X_train, y_train)

GradientBoostingRegressor(n_estimators=10, random_state=0)

# 4. Testing

In [6]:
# Predicting the Test set results
y_pred = gbc.predict(X_test)

# Mean squared error
print('Mean Squared Error :', mean_squared_error(y_test, y_pred))

Mean Squared Error : 2.1257018536038896
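The model above uses only 10 trees. `staged_predict` yields the ensemble's predictions after 1, 2, ..., n_estimators trees, which makes it easy to see how test error changes as trees are added. The sketch below assumes synthetic data rather than the LungCap split. In practice, the number of trees should be chosen on a validation set rather than the test set, to avoid an optimistic estimate.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=8, noise=20.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, random_state=0)
model.fit(X_tr, y_tr)

# One test MSE value per boosting stage.
test_mse = [mean_squared_error(y_te, pred) for pred in model.staged_predict(X_te)]

best_n = int(np.argmin(test_mse)) + 1
print(f"Test MSE with 10 trees:  {test_mse[9]:.1f}")
print(f"Test MSE with {best_n} trees: {test_mse[best_n - 1]:.1f} (lowest)")
print(f"Test MSE with 200 trees: {test_mse[-1]:.1f}")
```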


# 5. K-Fold Cross Validation

In [7]:
# 10 fold cross validation
R2 = cross_val_score(estimator = gbc,
                             X = X,
                             y = y,
                             cv = 10)

# Mean cross-validated R2 and its standard deviation
print(R2)
print("R2: {:.3f} %".format(R2.mean()*100))
print("R2 Standard Deviation: {:.3f} %".format(R2.std()*100))

[0.72012262 0.70833702 0.75267031 0.75421688 0.72884653 0.70718999
 0.70907298 0.70889971 0.74264751 0.65978791]
R2: 71.918 %
R2 Standard Deviation: 2.644 %
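For a regressor, passing `cv=10` makes `cross_val_score` use `KFold` without shuffling. With no `scoring` argument, it falls back to the estimator's `score` method, which for `GradientBoostingRegressor` is R². If the rows of the CSV are ordered in some meaningful way, shuffling before splitting can give more representative folds. A minimal sketch on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

# Shuffle before splitting so the folds do not depend on the row order of the data.
cv = KFold(n_splits=10, shuffle=True, random_state=911)
scores = cross_val_score(GradientBoostingRegressor(n_estimators=10, random_state=0),
                         X, y, cv=cv, scoring="r2")
print("R2 per fold:", scores.round(3))
print(f"Mean R2: {scores.mean():.3f} +/- {scores.std():.3f}")
```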


# 6. Hyperparameter Tuning

In [8]:
# Grid Search CV
param_grid = [{'max_depth':    [2, 3, 5, 10, 20, 50],
              'n_estimators':  [3,5, 10, 15],
              'learning_rate': [0.25, 0.5, 0.75]}]

# Instantiate GBC
gbc = GradientBoostingRegressor()

# Configure GridSearchCV
grid_search = GridSearchCV(gbc, param_grid, cv=5,
                                  scoring="r2",
                                  n_jobs=-1)

# Initiate Search
grid_search.fit(X_train, y_train)


# Extract Tuned Parameters and Predictive Accuracy
tuned_params = grid_search.best_params_
tuned_score = grid_search.best_score_
best_estimator = grid_search.best_estimator_

# Print Results
print("Best R2: {:.2f} %".format(grid_search.best_score_*100))
print("Best Parameters:", tuned_params)

Best R2: 82.00 %
Best Parameters: {'learning_rate': 0.25, 'max_depth': 3, 'n_estimators': 15}
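`GridSearchCV` fits every combination in the grid once per fold, so its cost can be counted before running it. With 6 × 4 × 3 = 72 candidates and 5 folds, that is 360 fits, plus one final refit of the best candidate on the full training set. The randomized search with `n_iter=32` samples only 32 of these 72 candidates. Note also that the best `n_estimators` found (15) sits at the upper edge of the grid, which may indicate that the search range is too narrow. `ParameterGrid` can do the counting:

```python
from sklearn.model_selection import ParameterGrid

param_grid = [{'max_depth':     [2, 3, 5, 10, 20, 50],
               'n_estimators':  [3, 5, 10, 15],
               'learning_rate': [0.25, 0.5, 0.75]}]

n_candidates = len(ParameterGrid(param_grid))  # 6 * 4 * 3
n_folds = 5
print("Candidates:", n_candidates)
print("Cross-validation fits:", n_candidates * n_folds)
```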


In [9]:
# Randomized Search

param_space = {'max_depth':    [2, 3, 5, 10, 20, 50],
              'n_estimators':  [3,5, 10, 15],
              'learning_rate': [0.25, 0.5, 0.75]}

# Configure Randomized Search
random_search = RandomizedSearchCV(gbc, param_space, n_iter=32,
                                        scoring="r2", cv=5,
                                        n_jobs=-1, random_state=911)

# Initiate Search
random_search.fit(X_train, y_train)

# Extract Tuned Parameters and Predictive Accuracy
tuned_params = random_search.best_params_
tuned_score = random_search.best_score_
best_estimator = random_search.best_estimator_

# Print Results
print("Best R2: {:.2f} %".format(random_search.best_score_*100))
print("Best Parameters:", tuned_params)

Best R2: 81.61 %
Best Parameters: {'n_estimators': 5, 'max_depth': 3, 'learning_rate': 0.5}
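`RandomizedSearchCV` also accepts `scipy.stats` distributions in place of fixed lists. When every value is a list, as above, candidates are sampled without replacement. When at least one value is a distribution, sampling is done with replacement. The sketch below assumes synthetic data and arbitrary ranges. It also shows a natural follow-up the notebook skips: scoring the refit `best_estimator_` on the held-out test set.

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_regression(n_samples=400, n_features=8, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=911)

# Each candidate draws its values from these distributions.
param_dist = {
    "max_depth": randint(2, 6),            # integers 2..5
    "n_estimators": randint(20, 150),      # integers 20..149
    "learning_rate": uniform(0.05, 0.45),  # floats in [0.05, 0.50]
}

search = RandomizedSearchCV(GradientBoostingRegressor(random_state=0), param_dist,
                            n_iter=10, scoring="r2", cv=5, random_state=911)
search.fit(X_tr, y_tr)

print("Best CV R2:", round(search.best_score_, 3))
print("Best parameters:", search.best_params_)
# best_estimator_ was refit on all of X_tr, so it can be scored on the test split.
print("Test R2:", round(search.best_estimator_.score(X_te, y_te), 3))
```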
