# Ensemble learning
- Bagging
- Boosting
- Stacking

## Bagging

In [1]:
from sklearn.svm import SVC
from sklearn.ensemble import BaggingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

In [2]:
# generate 1000 samples, each represented by 4 features
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
X.shape

(1000, 4)

In [3]:
# show the first 10 samples, rounded to 2 decimals
X[:10].round(2)

array([[-1.67, -1.3 ,  0.27, -0.6 ],
       [-2.97, -1.09,  0.71,  0.42],
       [-0.6 , -1.37, -3.12,  0.64],
       [-1.07, -1.18, -1.91,  0.66],
       [-1.31, -0.97, -0.15,  1.19],
       [-2.18, -0.97, -0.1 , -0.89],
       [-1.25, -1.13, -0.15,  1.06],
       [-1.35, -1.07,  0.03, -0.11],
       [-1.13, -1.27,  0.74,  0.21],
       [-0.38, -1.09, -0.01,  1.37]])

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42, shuffle=True)
X_train.shape, X_test.shape

((670, 4), (330, 4))

- Model performance without bagging

In [5]:
SVC().fit(X_train, y_train).score(X_test, y_test)

0.9363636363636364

- Model performance with bagging

In [6]:
clf = BaggingClassifier(estimator=SVC(), n_estimators=10, random_state=0, 
                        max_samples=0.6, max_features=0.8, bootstrap=True)
clf.fit(X_train, y_train)

In [48]:
clf.score(X_test, y_test)

0.9424242424242424
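
To make `max_samples=0.6` and `max_features=0.8` concrete, here is a minimal hand-rolled sketch of the idea behind `BaggingClassifier`. It is not scikit-learn's implementation. Assumptions: NumPy's `default_rng(0)` does the sampling, rows are drawn with replacement (`bootstrap=True`), features are drawn without replacement (the `bootstrap_features=False` default), and a 5-5 tie in the vote goes to class 0. The random draws differ from scikit-learn's, so the accuracy will not match 0.9424 exactly.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# same data and split as the cells above
X, y = make_classification(n_samples=1000, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0, shuffle=False)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, shuffle=True)

rng = np.random.default_rng(0)                 # assumption: any seeded RNG works here
n_estimators = 10
n_rows = int(0.6 * X_train.shape[0])           # max_samples=0.6
n_cols = max(1, int(0.8 * X_train.shape[1]))   # max_features=0.8 -> 3 of 4 features

models, feature_sets = [], []
for _ in range(n_estimators):
    rows = rng.choice(X_train.shape[0], size=n_rows, replace=True)    # bootstrap rows
    cols = rng.choice(X_train.shape[1], size=n_cols, replace=False)   # feature subset
    models.append(SVC().fit(X_train[rows][:, cols], y_train[rows]))
    feature_sets.append(cols)

# each SVC sees only its own columns; the majority vote decides (ties -> class 0)
votes = np.array([m.predict(X_test[:, cols]) for m, cols in zip(models, feature_sets)])
y_pred = (votes.mean(axis=0) > 0.5).astype(int)
accuracy = np.mean(y_pred == y_test)
print(f"hand-rolled bagging accuracy: {accuracy:.3f}")
```

Keeping `feature_sets` next to `models` is what makes prediction possible later: every model must be given exactly the columns it was trained on.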

In [12]:
# make prediction for a test sample
clf.predict([[0, 0, 0, 0]])

array([1])

In [41]:
# predicted probability
clf.predict_proba([[0, 0, 0, 0]])  # 4 of the 10 SVCs vote for class 0 and 6 vote for class 1

array([[0.4, 0.6]])

In [42]:
clf.classes_

array([0, 1])

In [43]:
clf.estimators_  # each SVC receives its own random_state, derived from the ensemble's random_state=0

[SVC(random_state=2087557356),
 SVC(random_state=132990059),
 SVC(random_state=1109697837),
 SVC(random_state=123230084),
 SVC(random_state=633163265),
 SVC(random_state=998640145),
 SVC(random_state=1452413565),
 SVC(random_state=2006313316),
 SVC(random_state=45050103),
 SVC(random_state=395371042)]

In [10]:
for e in clf.estimators_:
    print(e.predict([[0, 0, 0, 0]]))

ValueError: X has 4 features, but SVC is expecting 3 features as input.

**Question:** Can you tell what is wrong with the above implementation?

Answer: The data has 4 features, but the base SVCs were not trained on all of them. Because we set the hyperparameter `max_features=0.8`, each estimator in the bagging ensemble is fit on a random subset of int(0.8 * 4) = int(3.2) = 3 features, so each SVC expects 3 input columns. The for loop passes the full 4-feature sample to every estimator, which raises "ValueError: X has 4 features, but SVC is expecting 3 features as input." The fix is to slice the sample to the columns each estimator was trained on, which `BaggingClassifier` stores in `clf.estimators_features_`.

## Boosting

In [50]:
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

In [51]:
X, y = load_iris(return_X_y=True)

- Model performance without boosting

In [52]:
dt = DecisionTreeClassifier(max_depth=1, random_state=0)
dt_scores = cross_val_score(dt, X, y, cv=5)
dt_scores.mean()

0.6666666666666666

- Model performance using boosting

In [53]:
# create a boosting classifier
# estimator=None: per the documentation, AdaBoost then uses DecisionTreeClassifier(max_depth=1), a decision stump, as its base estimator
clf = AdaBoostClassifier(estimator=None, n_estimators=100, algorithm="SAMME")

In [54]:
scores = cross_val_score(clf, X, y, cv=5)
scores.mean()

0.9533333333333334

**Question:** Compare model performance before and after boosting, and refer to our lecture to explain the improvement.

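Answer: Boosting raises the mean 5-fold cross-validation accuracy from 0.667 for a single decision stump to 0.953. With `n_estimators=100`, AdaBoost fits up to 100 stumps sequentially. After each round it increases the weights of the training samples that the current stump misclassified (the sample weights, not the feature weights), so the next stump concentrates on those hard samples. Each stump also receives a vote weight based on its weighted error, and the final prediction is the weighted vote of all stumps. A single stump has only two leaves, so it can predict at most two of the three iris classes; the weighted combination of many weak stumps has no such limit and therefore forms a much stronger model.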
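
The sample reweighting described above can be sketched from scratch. This is a simplified SAMME update with learning rate 1, written for illustration rather than copied from scikit-learn; the 70/30 stratified split and the names `stumps`, `alphas` and `boosted_predict` are assumptions introduced here. Each stump is fit with `sample_weight`, misclassified samples are multiplied by exp(alpha), and the final prediction is the alpha-weighted vote.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

K = 3                                           # number of classes in iris
w = np.full(len(y_train), 1 / len(y_train))     # start with uniform sample weights
stumps, alphas = [], []

for m in range(100):
    stump = DecisionTreeClassifier(max_depth=1, random_state=m)
    stump.fit(X_train, y_train, sample_weight=w)
    miss = stump.predict(X_train) != y_train
    err = np.sum(w[miss]) / np.sum(w)           # weighted training error
    if err >= 1 - 1 / K:                        # no better than random guessing: stop
        break
    err = max(err, 1e-10)                       # avoid log(0) for a perfect stump
    alpha = np.log((1 - err) / err) + np.log(K - 1)   # SAMME vote weight
    w = w * np.exp(alpha * miss)                # up-weight only the misclassified samples
    w = w / np.sum(w)
    stumps.append(stump)
    alphas.append(alpha)

def boosted_predict(X_new):
    # each stump adds its alpha to the class it predicts; the highest total wins
    scores = np.zeros((len(X_new), K))
    for stump, alpha in zip(stumps, alphas):
        scores[np.arange(len(X_new)), stump.predict(X_new)] += alpha
    return np.argmax(scores, axis=1)

single = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_train, y_train)
stump_acc = np.mean(single.predict(X_test) == y_test)
boosted_acc = np.mean(boosted_predict(X_test) == y_test)
print(f"single stump: {stump_acc:.3f}, {len(stumps)} boosted stumps: {boosted_acc:.3f}")
```

The single stump is capped near 2/3 accuracy because its two leaves can name only two classes, which matches the 0.667 cross-validation score above.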

## Single-layer stacking

In [55]:
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.neighbors import KNeighborsRegressor

# create multiple individual estimators
base_estimators = [('ridge', RidgeCV()),
                   ('lasso', LassoCV(random_state=42)),
                   ('knr', KNeighborsRegressor(n_neighbors=20, metric='euclidean'))]

In [56]:
from sklearn.ensemble import GradientBoostingRegressor

# create a final estimator
gb_reg = GradientBoostingRegressor(n_estimators=25, subsample=0.5, min_samples_leaf=25,
                                   max_features=1, random_state=42)

In [57]:
from sklearn.ensemble import StackingRegressor

# stack the individual estimators with the final estimator
stack_reg = StackingRegressor(estimators=base_estimators, final_estimator=gb_reg)
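
What the stacking step does internally can be approximated by hand. This sketch assumes `StackingRegressor`'s defaults (`cv=None`, which gives 5-fold `KFold` for regressors, and `passthrough=False`): the final estimator is trained on out-of-fold predictions from the base estimators, while the test-time features come from base estimators refit on the whole training set. It is an illustration, not scikit-learn's code, so small numerical differences from `stack_reg.score` are possible.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split, cross_val_predict, KFold
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

base = [RidgeCV(), LassoCV(random_state=42),
        KNeighborsRegressor(n_neighbors=20, metric='euclidean')]
cv = KFold(n_splits=5)

# level-1 training features: out-of-fold predictions, one column per base estimator
meta_train = np.column_stack([cross_val_predict(est, X_train, y_train, cv=cv)
                              for est in base])
# level-1 test features: base estimators refit on the full training set
meta_test = np.column_stack([est.fit(X_train, y_train).predict(X_test) for est in base])

final = GradientBoostingRegressor(n_estimators=25, subsample=0.5, min_samples_leaf=25,
                                  max_features=1, random_state=42)
final.fit(meta_train, y_train)
r2 = r2_score(y_test, final.predict(meta_test))
print(meta_train.shape, round(r2, 3))
```

Using out-of-fold predictions keeps the final estimator from learning on base-model outputs that have already seen the same targets, which would make them look more accurate than they are on new data.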

In [61]:
# prepare train/test data
from sklearn.datasets import load_diabetes
X, y = load_diabetes(return_X_y=True)
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

In [62]:
# fit the stacked regressors
stack_reg.fit(X_train, y_train)

In [63]:
stack_reg.score(X_test, y_test)

0.5267013426135393

**Question:** What does the R^2 score mean? What value indicates that the model is performing well?

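Answer: The R^2 score is the coefficient of determination, which measures how well a regression model fits the data: R^2 = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean. For scikit-learn regressors, `score` returns R^2. The closer R^2 is to 1, the better the fit, and R^2 = 1 means perfect predictions. An R^2 near 0 means the model does no better than always predicting the mean of the target, which indicates poor performance, and a negative R^2 means it does even worse than that. The score of 0.527 here means the stacked model explains roughly half of the variance in the test targets.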
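
The definition can be checked numerically. A small sketch on made-up toy values (not from the diabetes data), with a hypothetical helper `r2` that computes 1 - SS_res / SS_tot and is compared against `sklearn.metrics.r2_score`: near-perfect predictions score close to 1, always predicting the mean scores exactly 0, and reversed predictions score below 0.

```python
import numpy as np
from sklearn.metrics import r2_score

def r2(y, y_hat):
    """Hypothetical helper: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)          # squared error of the model
    ss_tot = np.sum((y - np.mean(y)) ** 2)     # squared error of always predicting the mean
    return 1 - ss_res / ss_tot

y_true = np.array([3.0, 5.0, 7.0, 9.0])        # toy targets, mean = 6
predictions = {
    "close": np.array([3.1, 4.9, 7.2, 8.8]),
    "mean": np.full_like(y_true, y_true.mean()),
    "reversed": y_true[::-1].copy(),
}
results = {}
for name, y_hat in predictions.items():
    results[name] = r2(y_true, y_hat)
    print(name, round(results[name], 3), round(r2_score(y_true, y_hat), 3))
```

Here SS_tot is 20, so the three cases come out at about 0.995, exactly 0, and exactly -3.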


## Multi-layer stacking

In [60]:
from sklearn.ensemble import RandomForestRegressor

final_layer_rfr = RandomForestRegressor(n_estimators=10, max_features=1, max_leaf_nodes=5, random_state=42)
final_layer_gbr = GradientBoostingRegressor(n_estimators=10, max_features=1, max_leaf_nodes=5, random_state=42)
final_layer = StackingRegressor(estimators=[('rf', final_layer_rfr),('gbrt', final_layer_gbr)],
                                final_estimator=RidgeCV())

multi_layer_regressor = StackingRegressor(estimators=[('ridge', RidgeCV()),
                                                      ('lasso', LassoCV(random_state=42)),
                                                      ('knr', KNeighborsRegressor(n_neighbors=20, metric='euclidean'))],
                                          final_estimator=final_layer)

multi_layer_regressor.fit(X_train, y_train)
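
The cell above fits the two-layer stack but never evaluates it. A sketch that rebuilds the same model and scores it on the same test split, so it can be compared with the single-layer stack's 0.527. No output was recorded for this model in the notebook, so none is claimed here.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import RidgeCV, LassoCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import (GradientBoostingRegressor, RandomForestRegressor,
                              StackingRegressor)

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# second layer: two small tree ensembles blended by RidgeCV
final_layer = StackingRegressor(
    estimators=[('rf', RandomForestRegressor(n_estimators=10, max_features=1,
                                             max_leaf_nodes=5, random_state=42)),
                ('gbrt', GradientBoostingRegressor(n_estimators=10, max_features=1,
                                                   max_leaf_nodes=5, random_state=42))],
    final_estimator=RidgeCV())

# first layer: the same three base regressors as in the single-layer stack
multi_layer_regressor = StackingRegressor(
    estimators=[('ridge', RidgeCV()),
                ('lasso', LassoCV(random_state=42)),
                ('knr', KNeighborsRegressor(n_neighbors=20, metric='euclidean'))],
    final_estimator=final_layer)

multi_layer_regressor.fit(X_train, y_train)
r2 = multi_layer_regressor.score(X_test, y_test)   # R^2 on the held-out split
print(f"multi-layer R^2: {r2:.3f}")
```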