## AdaBoost
<ol>
<li><b>Boosting:</b> An ensemble method that combines several weak learners to form a strong learner.</li>
<li><b>Weak learner:</b> A model that performs only slightly better than random guessing.<br>
Example: a decision stump (a CART whose maximum depth is 1).</li>
</ol>
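The sketch below shows a decision stump in scikit-learn: a `DecisionTreeClassifier` with `max_depth=1`. It uses a synthetic dataset from `make_classification`, which is an assumption for illustration only and not the liver data used below. On this data a single split typically scores above chance but well short of perfect, which is what makes it a weak learner.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary dataset (illustrative assumption, not the liver data)
X_demo, y_demo = make_classification(n_samples=500, n_features=10,
                                     n_informative=5, random_state=0)

# A decision stump: a CART limited to a single split
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
stump.fit(X_demo, y_demo)

print('Stump depth:', stump.get_depth())
print('Stump training accuracy: {:.2f}'.format(stump.score(X_demo, y_demo)))
```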
- Train an ensemble of predictors sequentially.<br>
- Each predictor tries to correct the errors of its predecessor.<br>
- Most popular boosting methods:<br>
<ul>
    <li>AdaBoost</li>
    <li>Gradient Boosting</li>
</ul>
<b>AdaBoost</b>
- Stands for <b>Ada</b>ptive <b>Boost</b>ing.<br>
- Each predictor pays more attention to the instances wrongly predicted by its predecessor.<br>
- This is achieved by changing the weights of the training instances.<br>
- Each predictor is assigned a coefficient $\alpha$ that depends on the predictor's training error. $\alpha$ weights that predictor's vote in the final prediction.<br>
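The weight update and the coefficient $\alpha$ can be sketched from scratch. The sketch makes these assumptions:

- It follows the textbook binary AdaBoost formulation, with labels in $\{-1, +1\}$ and $\alpha = \frac{1}{2}\ln\frac{1-\varepsilon}{\varepsilon}$, where $\varepsilon$ is the weighted training error.
- It is an illustration, not scikit-learn's implementation. scikit-learn's `AdaBoostClassifier` uses the related SAMME algorithm.
- The helper names `adaboost_fit` and `adaboost_predict` are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=50):
    """Hypothetical helper: textbook binary AdaBoost; y must hold -1 and +1."""
    n = len(y)
    w = np.full(n, 1.0 / n)              # start with uniform instance weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        # Weighted training error of this predictor
        err = np.sum(w[pred != y]) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)   # avoid log(0) and division by zero
        # Coefficient alpha: larger when the weighted error is smaller
        alpha = 0.5 * np.log((1 - err) / err)
        # Multiply misclassified weights by e^alpha and correct ones by e^-alpha
        w = w * np.exp(-alpha * y * pred)
        w = w / np.sum(w)                # renormalize so the weights sum to 1
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas


def adaboost_predict(stumps, alphas, X):
    """Hypothetical helper: alpha-weighted vote of all stumps."""
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)


X_demo, y01 = make_classification(n_samples=300, n_features=5, random_state=0)
y_demo = np.where(y01 == 1, 1, -1)       # relabel {0, 1} -> {-1, +1}
stumps, alphas = adaboost_fit(X_demo, y_demo)
acc = np.mean(adaboost_predict(stumps, alphas, X_demo) == y_demo)
print('Training accuracy: {:.2f}'.format(acc))
```

Instances the current stump misclassifies gain weight, so the next stump is pushed toward them.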

### Define the AdaBoost Classifier

Predict whether a patient suffers from liver disease using 10 features, including albumin, age and gender.<br>
Train an AdaBoost ensemble to perform the classification task. Because the dataset is imbalanced, use the ROC AUC score rather than accuracy as the evaluation metric.
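The reason to avoid accuracy on imbalanced data can be shown with a trivial classifier. One that always predicts the majority class still scores a high accuracy, while its ROC AUC stays at 0.5, which is chance level. The labels below are hypothetical and not taken from the liver dataset.

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical imbalanced labels: 80 positives, 20 negatives
y_true = np.array([1] * 80 + [0] * 20)

# A useless "model" that always predicts the majority class with the same score
y_majority = np.ones_like(y_true)
y_score = np.full(len(y_true), 0.9)

print('Accuracy: {:.2f}'.format(accuracy_score(y_true, y_majority)))  # 0.80
print('ROC AUC:  {:.2f}'.format(roc_auc_score(y_true, y_score)))      # 0.50
```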

```python
# Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import roc_auc_score
from sklearn.metrics import mean_squared_error as MSE
```

```python
# Load the dataset
lipd = pd.read_csv('./datasets/indian_liver_patient/indian_liver_patient_preprocessed.csv')

lipd.head()
```

| Unnamed: 0.1 | Unnamed: 0 | Age_std | Total_Bilirubin_std | Direct_Bilirubin_std | Alkaline_Phosphotase_std | Alamine_Aminotransferase_std | Aspartate_Aminotransferase_std | Total_Protiens_std | Albumin_std | Albumin_and_Globulin_Ratio_std | Is_male_std | Liver_disease |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 1.247403 | -0.42032 | -0.495414 | -0.42887 | -0.355832 | -0.319111 | 0.293722 | 0.203446 | -0.14739 | 0 | 1 |
| 1 | 1 | 1.062306 | 1.218936 | 1.423518 | 1.675083 | -0.093573 | -0.035962 | 0.939655 | 0.077462 | -0.648461 | 1 | 1 |
| 2 | 2 | 1.062306 | 0.640375 | 0.926017 | 0.816243 | -0.115428 | -0.146459 | 0.478274 | 0.203446 | -0.178707 | 1 | 1 |
| 3 | 3 | 0.815511 | -0.372106 | -0.388807 | -0.449416 | -0.36676 | -0.312205 | 0.293722 | 0.329431 | 0.16578 | 1 | 1 |
| 4 | 4 | 1.679294 | 0.093956 | 0.179766 | -0.395996 | -0.295731 | -0.177537 | 0.755102 | -0.930414 | -1.713237 | 1 | 1 |


```python
# Define features X and target y
# Drop the 'Unnamed: 0' index column and the target; 'Unnamed: 0.1', which
# appears to duplicate the index, is still kept as a feature here
X = lipd.loc[:, ~lipd.columns.isin(['Unnamed: 0', 'Liver_disease'])]
y = lipd['Liver_disease']

# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
```

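```python
# Instantiate a shallow decision tree as the base learner
dt = DecisionTreeClassifier(max_depth=2, random_state=1)

# AdaBoost ensemble of 180 trees
# (`estimator` replaced `base_estimator` in scikit-learn 1.2; older versions use `base_estimator`)
ada = AdaBoostClassifier(estimator=dt, n_estimators=180, random_state=1)

# Fit the ensemble to the training set
ada.fit(X_train, y_train)

# Predict the probability of the positive class (liver disease)
y_pred_proba = ada.predict_proba(X_test)[:, 1]

# Evaluate the model on the test set
ada_roc_auc = roc_auc_score(y_test, y_pred_proba)

# Print the ROC AUC score
print('ROC AUC score: {:.2f}'.format(ada_roc_auc))
```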

ROC AUC score: 0.64


### Gradient Boosting

**Gradient Boosted Trees**<br>
Sequentially corrects its predecessor's errors.<br>
Unlike AdaBoost, it does not tweak the weights of the training instances.<br>
Instead, each predictor is trained using its predecessor's residual errors as labels.<br>
In Gradient Boosted Trees, a CART is used as the base learner.<br>
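A minimal from-scratch sketch of gradient boosting for regression with squared-error loss follows. For this loss, the negative gradient is exactly the residual $y - \hat{y}$. Each tree is fit to the residuals left by the ensemble so far, and its output is scaled by a learning rate (shrinkage) before being added. The synthetic data, `max_depth=3`, and 100 trees are assumptions for illustration. `learning_rate=0.1` mirrors scikit-learn's default, but this is not scikit-learn's implementation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative assumption)
X_demo, y_demo = make_regression(n_samples=300, n_features=5, noise=10.0,
                                 random_state=0)

learning_rate = 0.1   # shrinkage applied to each tree's contribution
n_trees = 100

# Start from a constant prediction: the mean of the labels
pred = np.full(len(y_demo), y_demo.mean())
trees = []
for _ in range(n_trees):
    residuals = y_demo - pred                # errors left by the ensemble so far
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X_demo, residuals)              # residuals are used as the labels
    pred = pred + learning_rate * tree.predict(X_demo)
    trees.append(tree)

baseline_mse = np.mean((y_demo - y_demo.mean()) ** 2)
boosted_mse = np.mean((y_demo - pred) ** 2)
print('Constant-prediction MSE: {:.1f}'.format(baseline_mse))
print('Boosted training MSE:    {:.1f}'.format(boosted_mse))
```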

#### Define, Train and Evaluate GB Regressor

```python
# Load the dataset
bikes = pd.read_csv('./datasets/bikes.csv')
bikes.head()
```

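| Unnamed: 0 | hr | holiday | workingday | temp | hum | windspeed | cnt | instant | mnth | yr | Clear to partly cloudy | Light Precipitation | Misty |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0.76 | 0.66 | 0.0 | 149 | 13004 | 7 | 1 | 1 | 0 | 0 |
| 1 | 1 | 0 | 0 | 0.74 | 0.7 | 0.1343 | 93 | 13005 | 7 | 1 | 1 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0.72 | 0.74 | 0.0896 | 90 | 13006 | 7 | 1 | 1 | 0 | 0 |
| 3 | 3 | 0 | 0 | 0.72 | 0.84 | 0.1343 | 33 | 13007 | 7 | 1 | 1 | 0 | 0 |
| 4 | 4 | 0 | 0 | 0.7 | 0.79 | 0.194 | 4 | 13008 | 7 | 1 | 1 | 0 | 0 |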


```python
# Define X and y
X = bikes.drop('cnt', axis=1)
y = bikes['cnt']

# Split into training (80%) and test (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=2)
```

```python
# Instantiate the GB regressor
gb = GradientBoostingRegressor(max_depth=4,
                               n_estimators=200,
                               random_state=1)
# Train the model
gb.fit(X_train, y_train)

# Predict test set labels
y_pred = gb.predict(X_test)

# Evaluate the model using root mean squared error
gb_mse = MSE(y_test, y_pred)
gb_rmse = gb_mse ** (1/2)

# Print RMSE
print('Test set RMSE of gb: {:.3f}'.format(gb_rmse))
```


Test set RMSE of gb: 48.881
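Because the trees are added one at a time, `GradientBoostingRegressor.staged_predict` can report the test error after each stage. This helps judge whether 200 trees are too few or too many. The sketch below reuses the hyperparameters from the cell above but runs on synthetic data from `make_regression`, which is an assumption. Its numbers are therefore not comparable with the bike-sharing RMSE.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative assumption)
X_demo, y_demo = make_regression(n_samples=500, n_features=8, noise=15.0,
                                 random_state=0)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=2)

gb_demo = GradientBoostingRegressor(max_depth=4, n_estimators=200, random_state=1)
gb_demo.fit(Xd_train, yd_train)

# staged_predict yields predictions after 1, 2, ..., 200 trees
test_rmse = [MSE(yd_test, y_stage) ** 0.5
             for y_stage in gb_demo.staged_predict(Xd_test)]
best = int(np.argmin(test_rmse)) + 1
print('Test RMSE after 1 tree:    {:.2f}'.format(test_rmse[0]))
print('Test RMSE after 200 trees: {:.2f}'.format(test_rmse[-1]))
print('Lowest test RMSE at {} trees'.format(best))
```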


### Stochastic Gradient Boosting
Each tree is trained on a random subset of the rows of the training data.<br>
The instances (typically 40%-80% of the training set) are sampled without replacement.<br>
Features are also sampled (without replacement) when choosing split points.<br>
Result: further ensemble diversity.<br>
Effect: adding further variance to the ensemble of trees.<br>
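The two sources of randomness can be sketched by adding row and feature sampling to a from-scratch boosting loop. Row sampling uses NumPy's `Generator.choice` with `replace=False`. Feature sampling at each split is delegated to `DecisionTreeRegressor(max_features=...)`. The 80% row fraction, 75% feature fraction, and synthetic data are assumptions for illustration. This is not how scikit-learn implements its `subsample` and `max_features` options internally.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (illustrative assumption)
X_demo, y_demo = make_regression(n_samples=300, n_features=6, noise=10.0,
                                 random_state=0)
rng = np.random.default_rng(0)

subsample = 0.8       # fraction of rows per tree, drawn WITHOUT replacement
learning_rate = 0.1
n_rows = len(y_demo)

pred = np.full(n_rows, y_demo.mean())
for _ in range(100):
    # Row sampling: a different random subset of training rows for each tree
    rows = rng.choice(n_rows, size=int(subsample * n_rows), replace=False)
    residuals = y_demo[rows] - pred[rows]
    # Feature sampling: each split considers a random 75% of the features
    tree = DecisionTreeRegressor(max_depth=3, max_features=0.75,
                                 random_state=int(rng.integers(1_000_000)))
    tree.fit(X_demo[rows], residuals)
    # The new tree still updates the predictions for every row
    pred = pred + learning_rate * tree.predict(X_demo)

train_mse = np.mean((y_demo - pred) ** 2)
print('Training MSE:  {:.1f}'.format(train_mse))
print('Variance of y: {:.1f}'.format(np.var(y_demo)))
```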

#### Regression with SGB
Solve the bike count regression problem using stochastic gradient boosting.

#### Define, Train and Evaluate SGB Regressor

```python
# Instantiate sgbr: each tree is fit on 90% of the training rows,
# and each split considers 75% of the features
sgbr = GradientBoostingRegressor(max_depth=4,
                                 subsample=0.9,
                                 max_features=0.75,
                                 n_estimators=200,
                                 random_state=2)

# Fit sgbr to the training set
sgbr.fit(X_train, y_train)

# Predict test set labels
y_pred = sgbr.predict(X_test)

# Compute test set MSE
mse_test = MSE(y_test, y_pred)

# Compute test set RMSE
rmse_test = mse_test ** (1/2)

# Print rmse_test
print('Test set RMSE of sgbr: {:.3f}'.format(rmse_test))
```

Test set RMSE of sgbr: 47.260


On this train/test split, the stochastic gradient boosting regressor achieves a lower test set RMSE than the gradient boosting regressor (47.260 vs. 48.881).