___

# Machine Learning in Geosciences 
Department of Applied Geoinformatics and Cartography, Charles University

Lukas Brodsky lukas.brodsky@natur.cuni.cz

## Exercise: Building and Evaluating Ensemble Models

This notebook is dedicated to ensemble learning exercises.

**Objective**:
Understand and implement three ensemble learning techniques (Bagging, Boosting, and Stacking) and compare their performance.

Tasks: 
1. Implement Bagging using `BaggingClassifier()` and compare the result with a weak classifier, e.g. `DecisionTreeClassifier()`, on a high-variance (noisy moons) dataset.

2. Implement Boosting using `GradientBoostingClassifier()` and compare the result with a weak classifier, `DecisionTreeClassifier()`, on a dataset with a complex decision boundary (circles).

3. Implement Stacking using `StackingClassifier()` with `SVC()`, `DecisionTreeClassifier()` and `KNeighborsClassifier()` as base estimators and `LogisticRegression()` as the final estimator. Compare the stacking result with the single weak classifiers, e.g. `DecisionTreeClassifier()`.

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons, make_circles, make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

# ensembles
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier, StackingClassifier
# base estimators 
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

from sklearn.metrics import accuracy_score, classification_report

### Dataset 1 - bagging

In [None]:
# Dataset 1 for BAGGING: High Variance (Noisy Moons)
X_bagging, y_bagging = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_train_bag, X_test_bag, y_train_bag, y_test_bag = train_test_split(X_bagging, y_bagging, 
                                                                    test_size=0.2, random_state=42)

In [None]:
plt.scatter(X_bagging[:, 0], X_bagging[:, 1], c=y_bagging, cmap="viridis", alpha=0.5)
plt.title("Dataset: High Variance")

### Bagging classifier

**A Bagging classifier** is an ensemble meta-estimator that fits base classifiers, each on a random subset of the original dataset, and then aggregates their individual predictions (either by voting or by averaging) to form a final prediction. Such a meta-estimator can typically be used as a way to reduce the variance of a black-box estimator (e.g., a decision tree), by introducing randomization into its construction procedure and then making an ensemble out of it.

`
class sklearn.ensemble.BaggingClassifier(
    estimator=None, 
    n_estimators=10, 
    max_samples=1.0,
    max_features=1.0, 
    bootstrap=True,
    n_jobs=None
    )
`
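
As a minimal sketch of the idea described above (not a reference solution for the exercise), the snippet below fits one unpruned decision tree and one bagged ensemble of trees on the noisy moons data and reports training and testing accuracy for both. The variable names and the values `n_estimators=50` and `max_samples=0.8` are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same kind of data as Dataset 1: two noisy, overlapping half-moons
X, y = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# A single fully grown tree (high-variance base model)
tree = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# Bagging: each tree is trained on a bootstrap sample of 80 % of the training set.
# The base estimator is passed positionally, so the call does not depend on
# the keyword name used by a particular scikit-learn version.
bag = BaggingClassifier(
    DecisionTreeClassifier(random_state=42),
    n_estimators=50,
    max_samples=0.8,
    bootstrap=True,
    random_state=42,
).fit(X_tr, y_tr)

for name, model in [("single tree", tree), ("bagging", bag)]:
    print(f"{name}: train={model.score(X_tr, y_tr):.2f}, test={model.score(X_te, y_te):.2f}")
```

A large gap between training and testing accuracy for the single tree, and a smaller gap for the ensemble, would be the signature of variance reduction; whether that shows up here is what the exercise asks you to check.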

In [None]:
# hyperparameter grid: tunes the number of estimators and the sample size per estimator
bagging_params = {
    "n_estimators": [50, 100, 150, 200],
    "max_samples": [0.5, 0.8, 1.0]
}

In [None]:
# use GridSearchCV() on BaggingClassifier() with DecisionTreeClassifier()
# cv=10, return_train_score=True 
bagging_grid = pass 

In [None]:
# search and fit 
pass 

In [None]:
# best parameters are: estimator.best_params_
pass 

In [None]:
# the best one is: estimator.best_estimator_ (store it as bagging_best, it is used for plotting below)
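
The grid-search steps above can be sketched as follows. This is one possible arrangement under stated assumptions, not the expected solution: the grid is deliberately smaller than `bagging_params`, `cv=5` is used instead of `cv=10` to keep the run short, and all variable names are illustrative. Note that `best_score_` is the mean accuracy on the held-out folds, while the mean accuracy on the training folds is available in `cv_results_` because `return_train_score=True`.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=1000, noise=0.4, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Deliberately small grid (an assumption, chosen only to keep the example quick)
small_grid = {"n_estimators": [10, 50], "max_samples": [0.5, 1.0]}

search = GridSearchCV(
    BaggingClassifier(DecisionTreeClassifier(random_state=42), random_state=42),
    param_grid=small_grid,
    cv=5,
    return_train_score=True,
)
search.fit(X_tr, y_tr)

best_model = search.best_estimator_   # refitted on the whole training set
cv_acc = search.best_score_           # mean accuracy on the held-out folds
train_acc = search.cv_results_["mean_train_score"][search.best_index_]
test_acc = accuracy_score(y_te, best_model.predict(X_te))

print("best parameters:", search.best_params_)
print(f"train folds: {train_acc:.2f}, validation folds: {cv_acc:.2f}, test: {test_acc:.2f}")
```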

#### Model evaluation

In [None]:
# training score (mean cross-validated accuracy of the best model): estimator.best_score_
BC_train_acc = pass

In [None]:
# prediction for testing 
bagging_pred = pass 

In [None]:
# accuracy: accuracy_score(y_test_bag, bagging_pred)
BC_test_acc = pass 

In [None]:
# print the accuracies
pass 

#### Plot decision boundary

In [None]:
# Ensemble Decision Boundary: overlay the boundary of every fitted base estimator
def plot_ensemble_boundary(model, X, y, title):
    h = 0.02  # step size of the mesh grid
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    # use a separate loop variable so the `model` argument is not overwritten
    for estimator in model.estimators_:
        Z = estimator.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, alpha=0.1)

    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", marker="o", alpha=0.3)
    plt.title(title)

In [None]:
plot_ensemble_boundary(bagging_best, X_test_bag, y_test_bag, "Bagging Decision Boundary")

### Single model Decision Tree

In [None]:
# DT: DecisionTreeClassifier(), stored as dt (used for plotting below)
pass 

In [None]:
# fit the model 
pass 

In [None]:
# training accuracy: accuracy_score()
DT_train_acc = pass 

In [None]:
# testing accuracy: accuracy_score()
DT_test_acc = pass 

In [None]:
# Decision Boundary Plot
def plot_decision_boundary(model, X, y, title):
    h = 0.02
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", marker="o", alpha=0.3)
    plt.contourf(xx, yy, Z, alpha=0.5)

    plt.title(title)

In [None]:
plot_decision_boundary(dt, X_test_bag, y_test_bag, "Single DT Decision Boundary")

#### Models comparison (accuracy)

In [None]:
print("\n===== Models Comparison =====")
print(f'Bagging training accuracy: {round(BC_train_acc, 2)}')
print(f'Bagging testing  accuracy: {round(BC_test_acc , 2)}')
print('---')
print(f'Decision Tree training accuracy: {round(DT_train_acc, 2)}')
print(f'Decision Tree testing  accuracy: {round(DT_test_acc, 2)}')

### Does averaging weak learners reduce overfitting on noisy data? 

.

### Boosting classifier 

**Gradient Boosting** for classification builds an additive model in a forward stage-wise fashion; it allows for the optimization of arbitrary differentiable loss functions. In each stage, `n_classes_` regression trees are fit on the negative gradient of the loss function, e.g. binary or multiclass log loss. Binary classification is a special case in which only a single regression tree is fit per stage.

`
class sklearn.ensemble.GradientBoostingClassifier(
    n_estimators=100, 
    learning_rate=0.1, 
    max_depth=3
    )
`

where `learning_rate` shrinks the contribution of each tree, so there is a trade-off between `learning_rate` and `n_estimators`. Values of `learning_rate` must be in the range [0.0, inf).
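
To make the trade-off between `learning_rate` and `n_estimators` concrete, here is a small sketch that fits gradient boosting with a few hand-picked combinations on circles data and records training and testing accuracy. The four combinations and the variable names are assumptions chosen for illustration, not recommended settings.

```python
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=1000, noise=0.2, factor=0.5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# (learning_rate, n_estimators) pairs: small steps with many trees,
# and large steps with few trees
settings = [(0.01, 100), (0.1, 100), (0.1, 10), (1.0, 10)]

results = {}
for lr, n_trees in settings:
    gb = GradientBoostingClassifier(
        n_estimators=n_trees, learning_rate=lr, max_depth=3, random_state=42
    ).fit(X_tr, y_tr)
    # store (training accuracy, testing accuracy) for this combination
    results[(lr, n_trees)] = (gb.score(X_tr, y_tr), gb.score(X_te, y_te))

for (lr, n_trees), (train_acc, test_acc) in results.items():
    print(f"learning_rate={lr:<4} n_estimators={n_trees:<3} train={train_acc:.2f} test={test_acc:.2f}")
```

Comparing the rows shows how a smaller step size may need more trees to reach a similar fit, which is why the two parameters are usually tuned together.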

### Dataset 2 - boosting 

In [None]:
# Dataset 2 for BOOSTING: Complex Decision Boundary (Circles)
X_boosting, y_boosting = make_circles(n_samples=1000, noise=0.2, factor=0.5, random_state=42)
X_train_boost, X_test_boost, y_train_boost, y_test_boost = train_test_split(X_boosting, y_boosting, test_size=0.2, random_state=42)

In [None]:
plt.scatter(X_boosting[:, 0], X_boosting[:, 1], c=y_boosting, cmap="viridis", alpha=0.5)
plt.title("Dataset: Complex Decision Boundary")

### Boosting classifier

In [None]:
# Boosting model
# boosting_clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)

In [None]:
# Boosting: optimizes tree depth, learning rate, and number of estimators
boosting_params = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.1, 0.2],
    "max_depth": [2, 3, 4]
}

In [None]:
# run GridSearchCV() on GradientBoostingClassifier 
# cv=5
boosting_grid = pass 

In [None]:
# fit through grid search CV 
pass 

In [None]:
# the best one: boosting_estimator.best_estimator_ (store it as boosting_best, it is used for plotting below)
pass 

In [None]:
# training score (mean cross-validated accuracy of the best model): boosting_estimator.best_score_
Boost_train_acc = pass 

In [None]:
# test the model 
boosting_pred = pass 

In [None]:
# testing accuracy: accuracy_score()
Boost_test_acc = pass 

In [None]:
# print
pass 

In [None]:
plot_decision_boundary(boosting_best, X_boosting, y_boosting, "Boosting single Decision Boundary")

In [None]:
# Function to Plot Decision Boundaries at Different Stages
def plot_decision_boundaries_ensemble(model, X, y, stages=(1, 5, 50, 100), 
                                      title="Gradient Boosting Evolution"):
    h = 0.02  # Step size in meshgrid
    x_min, x_max = X[:, 0].min() - 0.5, X[:, 0].max() + 0.5
    y_min, y_max = X[:, 1].min() - 0.5, X[:, 1].max() + 0.5
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    fig, axes = plt.subplots(1, len(stages), figsize=(15, 4))
    for ax, stage in zip(axes, stages):
        # Refit a fresh model with `stage` trees and default depth (the passed `model` is not used here)
        stage_model = GradientBoostingClassifier(n_estimators=stage, learning_rate=0.1, random_state=42)
        stage_model.fit(X, y)
        Z = stage_model.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)

        # Plot decision boundary
        ax.contourf(xx, yy, Z, alpha=0.5)
        ax.set_title(f"{stage} Model(s)")
        ax.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k", marker="o", alpha=0.1)
    plt.suptitle(title)
    plt.show()

In [None]:
plot_decision_boundaries_ensemble(boosting_best, X_boosting, y_boosting, stages=[1, 5, 20, 100], 
                                  title="Gradient Boosting Evolution")
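
The plotting function above refits a new model for every stage. As an alternative sketch, an already fitted `GradientBoostingClassifier` can report the predictions of its first k trees through `staged_predict`, so no refitting is needed. The model settings, the coarser grid step `h = 0.05`, and the names `grid` and `stage_maps` are assumptions for illustration; plotting is left out to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_circles(n_samples=1000, noise=0.2, factor=0.5, random_state=42)
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42).fit(X, y)

# Mesh grid covering the data, as in the plotting functions of this notebook
h = 0.05
xx, yy = np.meshgrid(
    np.arange(X[:, 0].min() - 0.5, X[:, 0].max() + 0.5, h),
    np.arange(X[:, 1].min() - 0.5, X[:, 1].max() + 0.5, h),
)
grid = np.c_[xx.ravel(), yy.ravel()]

# staged_predict yields one prediction array per boosting stage (1, 2, ..., n_estimators)
stages = (1, 5, 20, 100)
stage_maps = {}
for k, Z in enumerate(gb.staged_predict(grid), start=1):
    if k in stages:
        stage_maps[k] = Z.reshape(xx.shape)

for k, Z in stage_maps.items():
    print(f"first {k:>3} tree(s): share of grid assigned to class 1 = {Z.mean():.2f}")
```

Each array in `stage_maps` can be passed to `ax.contourf(xx, yy, Z, alpha=0.5)` to draw the boundary after k trees.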

### Weaker classifier

In [None]:
# DecisionTreeClassifier()
pass 

In [None]:
# fit model
pass 

In [None]:
# training accuracy
DT_train_acc = pass 

In [None]:
# testing accuracy
DT_test_acc = pass 

In [None]:
plot_decision_boundary(dt, X_train_boost, y_train_boost, "Single DT Decision Boundary")

In [None]:
print("\n===== Models Comparison =====")
print(f'Boosting training accuracy: {round(Boost_train_acc, 2)}')
print(f'Boosting testing  accuracy: {round(Boost_test_acc, 2)}')
print('---')
print(f'Decision Tree training accuracy: {round(DT_train_acc, 2)}')
print(f'Decision Tree testing  accuracy: {round(DT_test_acc, 2)}')

### Does boosting improve weak model errors in overlapping, complex patterns? 

.

### Stacking 

**Stack of estimators with a final classifier**: stacked generalization consists of stacking the outputs of the individual estimators and using a classifier to compute the final prediction. Stacking makes it possible to exploit the strength of each individual estimator by using their outputs as the input of a final estimator.

`
class sklearn.ensemble.StackingClassifier(
    estimators,
    final_estimator
    )
`

where `estimators` are the base estimators, 
and `final_estimator` is a classifier which will be used to combine the base estimators. The default classifier is a LogisticRegression.
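
A minimal sketch of this two-level structure, under stated assumptions: the data set is a smaller variant of the one used below (500 samples), `cv=5` is set explicitly, and the variable names are illustrative. The `transform` method exposes the base-model outputs that the final estimator receives as its input features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, n_redundant=2,
                           n_clusters_per_class=2, class_sep=0.5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Level 0: heterogeneous base estimators, each given a name
base_estimators = [
    ("svm", SVC(probability=True, kernel="rbf", random_state=42)),
    ("dt", DecisionTreeClassifier(random_state=42)),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
]

# Level 1: logistic regression trained on cross-validated base-model outputs
stack = StackingClassifier(estimators=base_estimators,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_tr, y_tr)

meta_features = stack.transform(X_te)  # inputs seen by the final estimator
print("meta-feature matrix shape:", meta_features.shape)
print(f"stacking test accuracy: {stack.score(X_te, y_te):.2f}")
```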

### Dataset 3 - stacking

In [None]:
# Dataset 3 for STACKING: Diverse Model Mistakes (Classification)
X_stacking, y_stacking = make_classification(n_samples=1000, n_features=10, n_informative=3, class_sep=0.5, 
                                             n_redundant=2, n_clusters_per_class=2, random_state=42)

X_train_stack, X_test_stack, y_train_stack, y_test_stack = train_test_split(X_stacking, y_stacking, 
                                                                            test_size=0.2, random_state=42)

In [None]:
# use SVM, DT, and KNN 
stacking_estimators = [
    ('svm', SVC(probability=True, kernel='rbf', random_state=42)),
    ('dt', DecisionTreeClassifier(random_state=42)),
    ('knn', KNeighborsClassifier(n_neighbors=5))
]

In [None]:
# stacking: fine-tunes Logistic Regression used as the meta-classifier. 
stacking_params = {
    "final_estimator__C": [0.1, 1, 10]  # regularization strength for Logistic Regression
}
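
The key `final_estimator__C` follows scikit-learn's nested-parameter convention: the name of the component, a double underscore, then the parameter of that component. The short sketch below checks which such names exist on a `StackingClassifier`; the two-model stack and the estimator names `dt` and `knn` are assumptions made only for this illustration.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=42)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(),
)

# get_params() lists every tunable name, including nested ones
params = stack.get_params()
has_meta_c = "final_estimator__C" in params   # parameter of the meta-classifier
has_tree_depth = "dt__max_depth" in params    # parameter of the base model named "dt"
print(has_meta_c, has_tree_depth)

# set_params() accepts the same names; this is what GridSearchCV uses internally
stack.set_params(final_estimator__C=10)
print(stack.final_estimator.C)
```

The same naming would allow base-model parameters such as `dt__max_depth` to be added to the grid, at the cost of a longer search.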

In [None]:
# run GridSearchCV() on StackingClassifier() using e.g. cv=10 
stacking_grid = pass 

In [None]:
# search and fit 
pass 

In [None]:
# the best one is: .best_estimator_
stacking_best = pass 

In [None]:
# training score (mean cross-validated accuracy of the best model): .best_score_
Stack_train_acc = pass 

In [None]:
# testing: 
stacking_pred = pass 

In [None]:
# testing accuracy
Stack_test_acc = pass 

In [None]:
# print
pass 

### Weaker classifiers

In [None]:
# SVM: SVC(probability=True, kernel='rbf', random_state=42) 
pass 

In [None]:
# fit 
pass 

In [None]:
# training accuracy
SVM_train_acc = pass

In [None]:
# testing accuracy 
SVM_test_acc = pass 

In [None]:
# DT: DecisionTreeClassifier()  
pass 

In [None]:
# fit
pass 

In [None]:
# training accuracy 
DT_train_acc = pass

In [None]:
# testing accuracy 
DT_test_acc = pass

In [None]:
# KNN: KNeighborsClassifier(n_neighbors=10)
pass 

In [None]:
# fit 
pass 

In [None]:
# training accuracy 
KNN_train_acc = pass 

In [None]:
# testing accuracy 
KNN_test_acc = pass 

In [None]:
print("\n===== Models Comparison =====")
print(f'Stacking training accuracy: {round(Stack_train_acc, 2)}')
print(f'Stacking testing  accuracy: {round(Stack_test_acc, 2)}')
print('===')
print(f'SVM model training accuracy: {round(SVM_train_acc, 2)}')
print(f'SVM model testing  accuracy: {round(SVM_test_acc, 2)}')
print('---')
print(f'Decision Tree training accuracy: {round(DT_train_acc, 2)}')
print(f'Decision Tree testing  accuracy: {round(DT_test_acc, 2)}')
print('---')
print(f'KNN model training accuracy: {round(KNN_train_acc, 2)}')
print(f'KNN model testing  accuracy: {round(KNN_test_acc, 2)}')
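
The per-model cells above repeat the same fit and score steps. As an optional sketch, the same comparison can be collected in one loop over a dictionary of models. The smaller data set (500 samples), the dictionary `models`, and the other names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, n_redundant=2,
                           n_clusters_per_class=2, class_sep=0.5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "SVM": SVC(kernel="rbf", random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "KNN": KNeighborsClassifier(n_neighbors=5),
}

# name -> (training accuracy, testing accuracy)
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = (accuracy_score(y_tr, model.predict(X_tr)),
                    accuracy_score(y_te, model.predict(X_te)))

for name, (train_acc, test_acc) in scores.items():
    print(f"{name}: training accuracy {train_acc:.2f}, testing accuracy {test_acc:.2f}")
```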

### Does stacking improve on the individual models when they make different errors? 

.

### Final thoughts

**1. Bagging (bagged decision trees)** 
   
Questions: 

    Is the model robust on noisy data? 
    Is it true that the high-variance dataset benefits from model averaging? 
    Is it true that Bagging leads to less overfitting compared to a single decision tree?
    

**2. Boosting** 

Questions: 

    Is it true that it corrects mistakes iteratively, capturing non-linear interactions better? 
    Is it true that it would outperform traditional models like Decision Trees?
    

**3. Stacking** 

Questions: 

    Does it work better when different models make different errors? 
    Is it true that stacking leverages these complementary strengths for improved accuracy? 