# XGBoost Boosting Trees

XGBoost (Extreme Gradient Boosting) builds a sequential ensemble of decision trees. The model starts with a base estimator, and each subsequent tree in the series learns from the mistakes of its predecessors. Training is inherently sequential, since each tree depends on the trees before it.

# Training Process

The model starts with an initial base model. Each subsequent tree is fit to the residual errors of the current ensemble, and its output is added to the ensemble to reduce those residuals. This sequence continues until all trees are trained. The final prediction is the base prediction plus the sum of all tree outputs, each scaled by the learning rate. For classification, this sum is converted into class probabilities rather than decided by weighted votes.

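The model minimizes a loss function through gradient descent: each new tree approximates the negative gradient of the loss with respect to the current predictions. For squared error, that negative gradient is the residual.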
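The training loop described above can be sketched in plain Python. This is a minimal illustration for squared error on one feature, using depth-1 trees (stumps); the helper names `fit_stump` and `boost` and the toy data are hypothetical, and XGBoost itself adds regularization, second-order gradient information, and many optimizations that are not shown here.

```python
# Minimal gradient-boosting sketch for squared error with depth-1 trees (stumps).
# Illustrative only; this is not how XGBoost is implemented internally.

def fit_stump(x, residuals):
    """Find the threshold on x that splits the residuals with the lowest squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the largest value so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Fit stumps sequentially, each one on the residuals of the current ensemble."""
    base = sum(y) / len(y)            # initial base model: the mean of the target
    preds = [base] * len(y)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]  # negative gradient of squared error
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        preds = [pi + learning_rate * stump(xi) for xi, pi in zip(x, preds)]
        mse_history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, preds)) / len(y))

    def predict(xi):
        # Base prediction plus the shrunken sum of all stump outputs
        return base + learning_rate * sum(s(xi) for s in stumps)

    return predict, mse_history


# Toy data (hypothetical), roughly y = x with noise
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1]
predict, mse_history = boost(x, y)
print("Training MSE after first and last round:", mse_history[0], mse_history[-1])
```

The training error shrinks a little every round because each stump removes part of the remaining residual, which is why a smaller learning rate needs more rounds to reach the same fit.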



# XGBoost Hyperparameters

This model has many hyperparameters. The XGBoost documentation groups them into three main categories, plus a fourth that applies only to the command-line version.

**General parameters** select which booster is used, commonly a tree or linear model.

**Booster parameters** depend on which booster you have chosen.

**Learning task parameters** decide on the learning scenario. For example, regression tasks may use different parameters from ranking tasks.

**Command line parameters** relate to the behaviour of the CLI version of XGBoost.
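As a rough illustration of these categories, the sketch below groups a handful of real XGBoost parameter names into separate dictionaries and merges them into the single flat dictionary that the native training API (`xgboost.train`) expects. The specific values are arbitrary examples, not recommendations, and the three variable names are hypothetical.

```python
# Illustrative grouping only; the values are arbitrary examples.
general_params = {"booster": "gbtree", "verbosity": 1}                   # which booster, logging level
booster_params = {"eta": 0.1, "max_depth": 5, "subsample": 0.8}          # tree booster settings
task_params = {"objective": "reg:squarederror", "eval_metric": "rmse"}   # learning task

# The categories are conceptual: XGBoost receives one flat dictionary.
params = {**general_params, **booster_params, **task_params}
print(params)
```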



- n_estimators: Number of trees. The model can overfit if this is too large.

- learning_rate: Shrinkage parameter. A lower value requires more trees but usually generalizes better, so a low learning_rate combined with a high n_estimators often works well.

- subsample: Fraction of training instances used to train each tree. Setting this to 0.25 means that each tree uses a randomly selected 25% of the training instances. This trades slightly higher bias for lower variance and is called stochastic gradient boosting.

- early_stopping_rounds: Early stopping parameter. Training stops if the validation metric has not improved for the given number of consecutive rounds, for example 5.
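The early-stopping rule in the last bullet can be sketched independently of any library. The function below is a hypothetical helper, not part of XGBoost; it assumes a list of per-round validation losses (made up here) and stops once the loss has failed to improve for `patience` consecutive rounds. In XGBoost the corresponding setting is `early_stopping_rounds`.

```python
def best_round_with_patience(val_losses, patience=5):
    """Return (best_round, stopped_round) for a simple early-stopping rule.

    Training stops at the first round that is `patience` rounds past the
    best validation loss seen so far.
    """
    best_loss = float("inf")
    best_round = -1
    for round_idx, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            return best_round, round_idx
    return best_round, len(val_losses) - 1  # never triggered: ran all rounds


# Hypothetical validation losses: improvement stalls after round 3
losses = [1.00, 0.80, 0.70, 0.65, 0.66, 0.67, 0.66, 0.68, 0.69, 0.70, 0.71]
best_round, stopped_round = best_round_with_patience(losses, patience=5)
print(best_round, stopped_round)  # 3 8
```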





# Pros and Cons


**Pros**

- Often outperforms other predictive models on structured (tabular) data

- It is optimized for performance, making it highly scalable.

- Works well with non-linear datasets

**Cons**

- Trades interpretability for better predictive performance



# 1. Libraries

In [8]:
# Importing Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from xgboost import XGBRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import RandomizedSearchCV

In [4]:
# Import Data
df = pd.read_csv('LungCapData.csv')
df.head()

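|   | LungCap | Age | Height | Smoke | Gender | Caesarean |
|---|---------|-----|--------|-------|--------|-----------|
| 0 | 6.475 | 6 | 62.1 | no | male | no |
| 1 | 10.125 | 18 | 74.7 | yes | female | no |
| 2 | 9.55 | 16 | 69.7 | no | female | yes |
| 3 | 11.125 | 14 | 71.0 | no | male | no |
| 4 | 4.8 | 5 | 56.9 | no | male | no |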


# 2. Preprocessing

In [9]:
# Predictors and Target
X = df.drop(columns = ['LungCap'])
y = df['LungCap']

# Instantiate one-hot encoder
ohe = OneHotEncoder()

# One-hot encode the categorical columns; pass the numeric columns through unchanged
ct = make_column_transformer(
    (ohe, ['Smoke', 'Gender', 'Caesarean']),
    remainder = 'passthrough')

# Encode the predictors and convert predictors and target to NumPy arrays
X = np.array(ct.fit_transform(X))
y = np.array(y)

# Check input and target variable shape
X.shape, y.shape

((725, 8), (725,))
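The eight columns in the shape above can be explained with a small plain-Python sketch of what the encoder does to a single row. It assumes each categorical column has exactly two levels, which is inferred from the output shape rather than stated in the data, and it uses the first row shown by `df.head()`.

```python
# Assumed category levels (two per column), sorted alphabetically as OneHotEncoder does
categories = {
    "Smoke": ["no", "yes"],
    "Gender": ["female", "male"],
    "Caesarean": ["no", "yes"],
}

# First row of the data frame, without the target
row = {"Age": 6, "Height": 62.1, "Smoke": "no", "Gender": "male", "Caesarean": "no"}

encoded = []
for column, levels in categories.items():
    # One indicator per level: 1.0 where the row matches, 0.0 elsewhere
    encoded.extend(1.0 if row[column] == level else 0.0 for level in levels)

# remainder='passthrough' appends the untouched numeric columns after the encoded ones
encoded.extend([row["Age"], row["Height"]])

print(encoded)       # [1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 6, 62.1]
print(len(encoded))  # 8
```

Three binary columns become six indicator columns, and the two numeric columns bring the total to eight.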

In [10]:
# Training and Testing subsets 
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 911)

# Feature Scaling (not required for tree-based boosters, which split on feature order rather than magnitude)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
print('Standardized feature Mean:',  X_train.mean().round())
print('Standardized feature SD :',   X_train.std().round())

Standardized feature Mean: 0.0
Standardized feature SD : 1.0


# 3. Training

In [14]:
# Training the XGBoost Model
xgb = XGBRegressor(use_label_encoder=False, # Only affects XGBClassifier in older XGBoost releases; no effect on a regressor.
    learning_rate=0.1, # Step size shrinkage; smaller values help prevent overfitting.
    n_estimators=1000, # Number of boosting rounds (trees).
    max_depth=5, # Maximum depth of a tree.
    min_child_weight=1, # Minimum sum of instance weight (hessian) needed in a child.
    gamma=0, # Minimum loss reduction required to make a further partition on a leaf node of the tree.
    subsample=0.8, # Fraction of training instances used to train each tree.
    colsample_bytree=0.8, # Fraction of columns sampled for each tree.
    objective='reg:squarederror', # Regression with squared-error loss.
    nthread=4, # Number of parallel threads used to run XGBoost.
    scale_pos_weight=1, # Balance of positive and negative weights; only meaningful for imbalanced binary classification.
    seed=911) # For reproducible results

# Fit the model
xgb.fit(X_train, y_train)

XGBRegressor(base_score=0.5, booster='gbtree', colsample_bylevel=1,
             colsample_bynode=1, colsample_bytree=0.8, gamma=0, gpu_id=-1,
             importance_type='gain', interaction_constraints='',
             learning_rate=0.1, max_delta_step=0, max_depth=5,
             min_child_weight=1, missing=nan, monotone_constraints='()',
             n_estimators=1000, n_jobs=4, nthread=4, num_parallel_tree=1,
             random_state=911, reg_alpha=0, reg_lambda=1, scale_pos_weight=1,
             seed=911, subsample=0.8, tree_method='exact',
             use_label_encoder=False, validate_parameters=1, verbosity=None)

# 3. Model Evaluation

In [15]:
# Predicting the Test set results
y_pred = xgb.predict(X_test)

# Mean squared error
print('Mean Squared Error :', mean_squared_error(y_test, y_pred))

Mean Squared Error : 1.8235127926549428
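Because MSE is expressed in squared units of the target, it can be easier to read as a root mean squared error. The short sketch below takes the square root of the MSE printed above, which puts the error back in the units of `LungCap`.

```python
import math

mse = 1.8235127926549428   # test-set MSE reported above
rmse = math.sqrt(mse)      # same units as LungCap
print(f"Root Mean Squared Error: {rmse:.3f}")
```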


# 4. K-Fold Cross Validation

In [17]:
# 10 fold cross validation
R2 = cross_val_score(estimator = XGBRegressor(),
                             X = X,
                             y = y,
                             cv = 10)

# Cross-validated R2 (the default score for a regressor): per-fold values, mean, and standard deviation
print(R2)
print("R2: {:.3f} %".format(R2.mean()*100))
print("R2 Standard Deviation: {:.3f} %".format(R2.std()*100))

[0.72232326 0.81490868 0.78387278 0.71825417 0.81373546 0.85629128
 0.76562195 0.80271692 0.78271151 0.67267721]
R2: 77.331 %
R2 Standard Deviation: 5.209 %
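When `cv` is an integer and the estimator is a regressor, scikit-learn splits the data into contiguous folds without shuffling. The sketch below reproduces that splitting logic in plain Python to show how the 725 rows are divided across 10 folds; `kfold_indices` is a hypothetical helper written for illustration, not a scikit-learn function.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional sample each
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


fold_sizes = [len(test_idx) for _, test_idx in kfold_indices(725, 10)]
print(fold_sizes)  # [73, 73, 73, 73, 73, 72, 72, 72, 72, 72]
```

Each R2 value in the output above is therefore computed on a held-out block of 72 or 73 rows, which helps explain the spread between folds.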


# 5. Hyperparameter Tuning

In [18]:
# Grid Search CV
param_grid = [{"learning_rate": [0.05, 0.10, 0.15],
               "max_depth": [3, 4, 5, 6, 8],
               "min_child_weight": [1, 3, 5, 7],
               "gamma": [0.0, 0.1, 0.2],
               "colsample_bytree": [0.3, 0.4],
               "n_estimators": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]}]

# Instantiate the XGBoost regressor
xgboost_tuned = XGBRegressor(use_label_encoder=False)

# Configure GridSearchCV
grid_search = GridSearchCV(xgboost_tuned, param_grid, cv=5, scoring = "r2", n_jobs = -1)

# Initiate Search
grid_search.fit(X_train, y_train)

# Extract tuned parameters and the best cross-validated R2
tuned_params = grid_search.best_params_
tuned_score = grid_search.best_score_
best_estimator = grid_search.best_estimator_

# Print Results
print("Best R2: {:.2f} %".format(tuned_score*100))
print("Best Parameters:", tuned_params)

KeyboardInterrupt: 
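The `KeyboardInterrupt` indicates the search was stopped by hand before it finished. One plausible reason is the size of the grid, which the sketch below counts from the same parameter values used above: an exhaustive search fits one model per combination per fold.

```python
import math

param_grid = {
    "learning_rate": [0.05, 0.10, 0.15],
    "max_depth": [3, 4, 5, 6, 8],
    "min_child_weight": [1, 3, 5, 7],
    "gamma": [0.0, 0.1, 0.2],
    "colsample_bytree": [0.3, 0.4],
    "n_estimators": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
}

# Number of distinct parameter combinations in the grid
n_combinations = math.prod(len(values) for values in param_grid.values())

# Each combination is fitted once per cross-validation fold (cv=5 above)
n_fits = n_combinations * 5

print(n_combinations, n_fits)  # 3600 18000
```

If that cost is too high, one option is `RandomizedSearchCV`, which is already imported above and evaluates only a fixed number of sampled combinations set by its `n_iter` argument. Trimming the grid, for example by using fewer `n_estimators` values, is another.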