<font color="#CC3D3D"><p>
# Decision Tree-Based Ensemble Models

<img align='left' src="https://cdn-images-1.medium.com/max/1000/1*QJZ6W-Pck_W7RlIDwUIN9Q.jpeg" width=700 height=500>

##### Data Preparation

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import warnings
warnings.filterwarnings("ignore")

In [2]:
%%time

# California Housing dataset (20640 samples, 8 numeric features)
from sklearn.datasets import fetch_california_housing

X, y = fetch_california_housing(return_X_y=True)

CPU times: total: 750 ms
Wall time: 7.34 s


In [3]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)
X_train.shape

(10320, 8)

In [4]:
# Use the same parameter settings for every algorithm so the experiments run under identical conditions
hyperparam = {'n_estimators': 200, 'random_state': 0}

<font color="blue"><p>
### Bagging
<img align='left' src="http://drive.google.com/uc?export=view&id=1px4nXiYkoRZrPpnHlkYn0hWfGih9SHpB" width=650 height=500>

In [5]:
%%time

from sklearn.ensemble import BaggingRegressor

bagging = BaggingRegressor(**hyperparam)
bagging.fit(X_train, y_train).score(X_test, y_test)

CPU times: total: 20.4 s
Wall time: 1min 2s


0.7913458806077901
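
To make the idea behind bagging concrete, here is a minimal sketch of the procedure: draw bootstrap resamples, fit one tree per resample, and average the predictions. It is an illustration under stated assumptions, not the internals of `BaggingRegressor`. The helper `manual_bagging_predict` and the synthetic `X_demo`, `y_demo`, and `X_grid` arrays are hypothetical names introduced only for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def manual_bagging_predict(X_fit, y_fit, X_new, n_estimators=20, seed=0):
    """Average the predictions of trees trained on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n_samples = len(X_fit)
    total = np.zeros(len(X_new))
    for _ in range(n_estimators):
        # Bootstrap: draw n_samples row indices with replacement
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeRegressor(random_state=0).fit(X_fit[idx], y_fit[idx])
        total += tree.predict(X_new)
    return total / n_estimators


# Small synthetic problem (hypothetical data): a noisy sine curve
data_rng = np.random.default_rng(0)
X_demo = data_rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + data_rng.normal(scale=0.3, size=300)
X_grid = np.linspace(-3, 3, 50).reshape(-1, 1)

bagged_pred = manual_bagging_predict(X_demo, y_demo, X_grid)
print(bagged_pred[:5])
```

Averaging many trees that each saw a different resample is what reduces the variance of a single deep tree.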

<font color="blue"><p>
### Random Forest (RF)
<img align='left' src="https://c.mql5.com/2/33/image1__1.png">

In [6]:
%%time

from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(**hyperparam)
rf.fit(X_train, y_train).score(X_test, y_test)

CPU times: total: 20.4 s
Wall time: 1min 1s


0.7914160796401511
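
The Random Forest score is almost identical to the Bagging score. One plausible reason, offered as an assumption rather than something this notebook verifies: in recent scikit-learn releases `RandomForestRegressor` defaults to `max_features=1.0`, so every split considers all features and the forest behaves much like bagged trees. The sketch below uses synthetic data to show how the per-split feature subset can be restricted. The value `0.5`, the reduced `n_estimators=50`, and the `*_demo` names are illustrative choices, not settings from the experiment above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical synthetic regression problem with 8 features
X_demo, y_demo = make_regression(
    n_samples=400, n_features=8, n_informative=5, noise=10.0, random_state=0
)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.5, random_state=0
)

candidates = {
    "all features per split": RandomForestRegressor(
        n_estimators=50, max_features=1.0, random_state=0
    ),
    "half of the features per split": RandomForestRegressor(
        n_estimators=50, max_features=0.5, random_state=0
    ),
}

# Fit each forest and record its R^2 on the held-out half
rf_scores = {}
for name, model in candidates.items():
    rf_scores[name] = model.fit(Xd_train, yd_train).score(Xd_test, yd_test)
    print(f"{name}: R^2 = {rf_scores[name]:.4f}")
```

Which setting works better depends on the data, so `max_features` is usually treated as a hyperparameter to tune.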

<font color="blue"><p>
### Gradient Boosting (GBM) #####   
<br/><img src="https://explained.ai/gradient-boosting/images/golf-dir-vector.png" width=800 height=600>   
<img src='http://drive.google.com/uc?export=view&id=1IPejYVq077Z1HZLkl3_DpSgtjwXzcGAf' width=550 height=400>

In [7]:
%%time

from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(**hyperparam)
gbm.fit(X_train, y_train).score(X_test, y_test)

CPU times: total: 6.38 s
Wall time: 16.1 s


0.8021970696547149
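
The core loop of gradient boosting with squared-error loss can be sketched in a few lines: start from the mean of the targets, then repeatedly fit a shallow tree to the current residuals and add a damped version of its prediction. This is a simplified illustration, not the implementation inside `GradientBoostingRegressor`. The functions `fit_simple_gbm` and `predict_simple_gbm` and the synthetic data are hypothetical names introduced for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_simple_gbm(X, y, n_estimators=50, learning_rate=0.1, max_depth=3):
    """Fit an additive model of shallow trees on squared-error residuals."""
    init = float(np.mean(y))          # initial constant prediction
    pred = np.full(len(y), init)
    trees = []
    for _ in range(n_estimators):
        residual = y - pred           # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residual)
        pred = pred + learning_rate * tree.predict(X)
        trees.append(tree)
    return init, learning_rate, trees


def predict_simple_gbm(model, X):
    """Sum the initial constant and the damped tree predictions."""
    init, learning_rate, trees = model
    pred = np.full(len(X), init)
    for tree in trees:
        pred = pred + learning_rate * tree.predict(X)
    return pred


# Hypothetical synthetic data: a noisy sine curve
data_rng = np.random.default_rng(0)
X_demo = data_rng.uniform(-3, 3, size=(300, 1))
y_demo = np.sin(X_demo[:, 0]) + data_rng.normal(scale=0.3, size=300)

simple_gbm = fit_simple_gbm(X_demo, y_demo)
mse_constant = np.mean((y_demo - y_demo.mean()) ** 2)
mse_boosted = np.mean((y_demo - predict_simple_gbm(simple_gbm, X_demo)) ** 2)
print(f"training MSE, constant model: {mse_constant:.4f}")
print(f"training MSE, boosted model:  {mse_boosted:.4f}")
```

Unlike bagging, the trees here are built sequentially, and each one depends on the errors left by the trees before it.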

<font color="blue"><p>
### eXtreme Gradient Boosting (XGBoost) #####   
<img align='left' src='https://dzone.com/storage/temp/13069535-xgboost-features.png'>

The hyperparameters with the greatest impact on **XGBoost** performance are listed below. Note that `num_round` is the number of boosting rounds, which the scikit-learn wrapper exposes as `n_estimators`.

| Parameter name | Parameter type | Recommended range |
|---|:---:|---:|
|**reg_alpha**|Continuous|**0 ~ 1000**|
|colsample_bytree|Continuous|0.5 ~ 1|
|**learning_rate**|Continuous|**0.1 ~ 0.5**|
|gamma|Continuous|0 ~ 5|
|reg_lambda|Continuous|0 ~ 1000|
|max_depth|Integer|0 ~ 10|
|**min_child_weight**|Continuous|**0 ~ 120**|
|**num_round**|Integer|**1 ~ 4000**|
|**subsample**|Continuous|**0.5 ~ 1**|
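
As a rough illustration of how the ranges in the table above could drive a random search, the sketch below draws candidate configurations using only the standard library. The `SEARCH_SPACE` dictionary and the `sample_params` helper are hypothetical names introduced for this example, and the key `n_estimators` is used in place of `num_round` on the assumption that the scikit-learn wrapper (`XGBRegressor`) is the target.

```python
import random

# Ranges copied from the table above; "int" entries are sampled as integers
SEARCH_SPACE = {
    "reg_alpha": ("float", 0.0, 1000.0),
    "colsample_bytree": ("float", 0.5, 1.0),
    "learning_rate": ("float", 0.1, 0.5),
    "gamma": ("float", 0.0, 5.0),
    "reg_lambda": ("float", 0.0, 1000.0),
    "max_depth": ("int", 0, 10),
    "min_child_weight": ("float", 0.0, 120.0),
    "n_estimators": ("int", 1, 4000),  # num_round in the table
    "subsample": ("float", 0.5, 1.0),
}


def sample_params(space, rng):
    """Draw one configuration uniformly at random from the search space."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "int":
            params[name] = rng.randint(low, high)   # inclusive on both ends
        else:
            params[name] = rng.uniform(low, high)
    return params


rng = random.Random(0)
param_candidates = [sample_params(SEARCH_SPACE, rng) for _ in range(5)]
for params in param_candidates:
    print(params)
```

Each sampled dictionary could then be passed to the model, for example `XGBRegressor(**params, random_state=0)`, and scored with cross-validation. Uniform sampling is only one possible choice; wide ranges such as `reg_alpha` are often searched on a logarithmic scale instead.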

In [10]:
#!pip install xgboost lightgbm catboost

Collecting lightgbm
  Downloading lightgbm-3.3.5-py3-none-win_amd64.whl (1.0 MB)
Installing collected packages: lightgbm
Successfully installed lightgbm-3.3.5


In [11]:
%%time

from xgboost import XGBRegressor

xgb = XGBRegressor(**hyperparam)
xgb.fit(X_train, y_train).score(X_test, y_test)

CPU times: total: 16.7 s
Wall time: 2.83 s


0.8216724772284858

<font color="blue"><p>
### LightGBM #####   
<img align='left' src='https://www.researchgate.net/publication/348936955/figure/fig2/AS:986417228431363@1612191602872/Gradient-based-One-Side-Sampling-GOSS-along-with-Exclusive-Feature-Bundling-EFB.ppm' width=700>

In [12]:
%%time

from lightgbm import LGBMRegressor

lgbm = LGBMRegressor(**hyperparam)
lgbm.fit(X_train, y_train).score(X_test, y_test)

CPU times: total: 4.09 s
Wall time: 4.31 s


0.8345437677022924

<font color="blue"><p>
### CatBoost #####   
<img align='left' src='https://i.imgur.com/E7tcz7Q.png'>

In [13]:
%%time

from catboost import CatBoostRegressor

cboost = CatBoostRegressor(**hyperparam, verbose=False)
cboost.fit(X_train, y_train).score(X_test, y_test)

CPU times: total: 2.69 s
Wall time: 2.79 s


0.8351085082836359

### Performance Comparison Between Ensemble Models

In [None]:
%%time

from sklearn.model_selection import cross_val_score
import time

model_names = ['Bagging', 'RandomForest', 'Gradient Boosting', 'XGBoost', 'LightGBM', 'CatBoost']
models = [bagging, rf, gbm, xgb, lgbm, cboost]

cv_means, cv_stds, cv_scores, elapsed = [], [], [], []

for model in models:
    # 10-fold cross-validation on the training split;
    # for regressors the default score is R^2
    start = time.time()
    cv_result = cross_val_score(model, X_train, y_train, cv=10)
    elapsed.append(time.time() - start)
    cv_means.append(cv_result.mean())
    cv_stds.append(cv_result.std())
    cv_scores.append(cv_result)

# Summary table: mean and standard deviation of R^2, plus wall-clock time in seconds
models_dataframe = pd.DataFrame(
    {'CV Mean': cv_means, 'Std': cv_stds, 'Execution Time': elapsed},
    index=model_names,
)
print(models_dataframe)

# Box plot of the per-fold R^2 scores for each model
fig, ax = plt.subplots(figsize=(12, 6))
box = pd.DataFrame(cv_scores, index=model_names)
box.T.boxplot(ax=ax)
plt.show()

<font color="#CC3D3D"><p>
# End