## Ensemble Learning

The idea behind ensemble learning is simple: combine multiple machine learning models to obtain a model that is usually more accurate than any of them individually.

Bagging, boosting, and stacking are the three most popular ensemble learning techniques, and each takes a different approach to improving predictive accuracy.

Recall the bias-variance trade-off: an underfit model has high bias and low variance, whereas an overfit model has high variance and low bias. Neither generalizes well, because a good model needs both bias and variance to be low. Ensemble learning addresses this trade-off by combining models so that one error component is reduced without substantially increasing the other. Bagging mainly reduces variance, and boosting mainly reduces bias.

Ensemble learning improves a model’s performance in three main ways:

- By reducing the variance of high-variance base learners, such as fully grown decision trees (bagging)
- By reducing the bias of weak learners (boosting)
- By combining the strengths of several different, often strong, learners (stacking)
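
The idealized calculation below shows why aggregating many models can raise accuracy. It assumes a binary task and an odd number of classifiers. Each classifier is correct with the same probability `p`, and their errors are fully independent, which real ensembles only approximate. The hypothetical helper `majority_vote_accuracy` uses the binomial distribution to compute the probability that a majority vote is correct.

```python
from math import comb


def majority_vote_accuracy(n_models: int, p: float) -> float:
    """Probability that a strict majority of `n_models` classifiers is correct.

    Idealized assumption (hypothetical helper): a binary task, an odd `n_models`,
    and every classifier correct independently of the others with probability `p`.
    """
    k_min = n_models // 2 + 1  # smallest number of correct votes forming a strict majority
    return sum(
        comb(n_models, k) * p**k * (1 - p) ** (n_models - k)
        for k in range(k_min, n_models + 1)
    )


for n in (1, 5, 25):
    print(f"{n:>2} models, p=0.7 -> majority accuracy {majority_vote_accuracy(n, 0.7):.3f}")
```

With five such models at `p = 0.7`, the vote is correct with probability of about 0.837. The same formula also shows the limits of this argument. If `p < 0.5`, adding models makes the vote worse, and correlated errors shrink the gain. This is why methods such as random forests try to decorrelate their base models.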

### Key Differences

- Parallel vs. Sequential: Bagging trains its base models independently of each other, so they can be trained in parallel. Boosting trains them sequentially, with each new model correcting the mistakes of the previous ones. Stacking also trains its base models independently, then trains a meta-model on their predictions in a second stage.
- Variance vs. Bias Reduction: Bagging reduces variance by averaging models trained independently on different bootstrap samples, whereas Boosting reduces bias by sequentially correcting mistakes.
- Feature Utilization: Stacking uses the base models' predictions as input features for the meta-model, which learns how best to combine the strengths of the different models.
- Objective: Bagging and Boosting typically combine many copies of the same type of base model into a stronger version of that model, while Stacking combines different types of models into a single, more powerful model.

### Reducing Variance with Bagging

One popular ensemble method is Bagging, which stands for Bootstrap Aggregating. Bagging is particularly effective at reducing variance in predictions and improving the overall accuracy and robustness of the model.
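
A standard identity makes the variance-reduction argument concrete. As a simplifying assumption, let $\hat{f}_b(x)$ denote the prediction of the $b$-th of $B$ base models at an input $x$ (notation introduced here). Suppose every such prediction has the same variance $\sigma^2$ and any two of them have correlation $\rho$. The variance of their average is then:

$$
\operatorname{Var}\left(\frac{1}{B}\sum_{b=1}^{B} \hat{f}_b(x)\right) = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
$$

Consider $B = 20$. With uncorrelated models ($\rho = 0$), the variance drops to $\sigma^2/20$. With $\rho = 0.5$, it only drops to $0.5\sigma^2 + 0.025\sigma^2 = 0.525\sigma^2$. Averaging cannot remove the correlated part $\rho\sigma^2$. This is why bagging relies on bootstrap sampling to make its base models differ from one another.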

#### Bagging Steps

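- Bootstrap Sampling: Draw multiple random samples of the training data with replacement (bootstrap samples) and train a separate base model on each. The different samples make the base models diverse even when they all use the same algorithm.
- Model Aggregation: Combine their predictions through averaging (regression) or majority vote (classification).
- Reducing Variance: Because the models' errors are not perfectly correlated, they partially cancel out when aggregated, leading to more stable predictions.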
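
The bagging steps above can be sketched from scratch in a few lines. This is a minimal illustration, not scikit-learn's implementation. The helper names `fit_bagged_trees` and `predict_majority` are hypothetical. The sketch aggregates with a hard majority vote, whereas `BaggingClassifier` averages the base models' predicted probabilities when they are available.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def fit_bagged_trees(X, y, n_estimators=20, seed=42):
    """Bootstrap step: train one decision tree per bootstrap sample (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples = len(X)
    trees = []
    for i in range(n_estimators):
        # Draw n_samples row indices *with replacement* -> one bootstrap sample
        idx = rng.integers(0, n_samples, size=n_samples)
        tree = DecisionTreeClassifier(random_state=i)
        trees.append(tree.fit(X[idx], y[idx]))
    return trees


def predict_majority(trees, X):
    """Aggregation step: majority vote per sample (ties go to the smallest label)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_estimators, n_samples)
    # np.bincount counts votes per class; labels must be non-negative integers
    return np.array([np.bincount(sample_votes).argmax() for sample_votes in votes.T])


digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)
trees = fit_bagged_trees(X_train, y_train)
y_pred = predict_majority(trees, X_test)
print("Hand-rolled bagging accuracy:", np.mean(y_pred == y_test))
```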

```python
from sklearn.datasets import load_diabetes, load_digits, load_wine
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
```

Here, we use Bagging with decision trees as the base model to classify the Digits dataset (loaded with `load_digits`), which contains 8x8 images of handwritten digits (0 to 9).

```python
# Load the Digits dataset (8x8 images of handwritten digits)
digits = load_digits()
X, y = digits.data, digits.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision tree classifier as the base model
base_model = DecisionTreeClassifier(random_state=42)

# Create a Bagging classifier with 20 base estimators (trees), each trained on a
# bootstrap sample. The base model is passed positionally as `estimator`
# (the parameter was named `base_estimator` before scikit-learn 1.2).
bagging_model = BaggingClassifier(base_model, n_estimators=20, random_state=42)

# Train the Bagging model
bagging_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = bagging_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Bagging Accuracy:", accuracy)
```

Bagging Accuracy: 0.9444444444444444
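
A single accuracy number does not show the variance reduction by itself. One way to probe it, sketched below as an addition to the original example, is to compare a lone decision tree with the bagged ensemble under 5-fold cross-validation. The mean reflects accuracy. The standard deviation across folds gives a rough sense of each model's stability, although it also reflects differences between the folds themselves, so treat it with care.

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=42)
# Same configuration as the Bagging example above: 20 fully grown trees
bagged_trees = BaggingClassifier(
    DecisionTreeClassifier(random_state=42), n_estimators=20, random_state=42
)

# 5-fold (stratified, for classifiers) cross-validated accuracy for each model
tree_scores = cross_val_score(single_tree, X, y, cv=5)
bag_scores = cross_val_score(bagged_trees, X, y, cv=5)

print(f"Single tree: mean={tree_scores.mean():.3f}, std={tree_scores.std():.3f}")
print(f"Bagging:     mean={bag_scores.mean():.3f}, std={bag_scores.std():.3f}")
```

No specific numbers are claimed here. You would typically expect the bagged ensemble to reach the higher mean score.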


### Reducing Bias with Boosting

Boosting is another powerful ensemble learning method. Whereas Bagging mainly reduces variance, Boosting mainly reduces bias. It combines many weak learners (models only slightly better than random guessing) into a single strong learner, which improves overall accuracy.

#### Boosting Steps

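- Iterative Learning: Boosting trains weak learners one after another, with each one focusing on the errors of the ensemble built so far.
- Weighted Training: In AdaBoost, misclassified instances get higher weights in the next round. Gradient boosting instead fits each new learner to the residual errors (more generally, to the negative gradient of the loss).
- Model Weighting: More accurate learners receive higher weights in the final ensemble.
- Model Aggregation: Combine predictions using weighted averaging or weighted voting.
- Bias Reduction: Repeatedly emphasizing hard-to-predict instances reduces bias.
- Final Prediction: The weighted combination forms a strong model with improved performance.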
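
The weighting steps above are easiest to see in the original binary form of AdaBoost, often called discrete AdaBoost. The sketch below is a simplified illustration, not scikit-learn's implementation, which uses the multi-class SAMME variants. The helper names `adaboost_fit` and `adaboost_predict` are hypothetical. Because the sketch handles two classes only, it uses scikit-learn's binary Breast Cancer dataset instead of Wine (an assumption for this sketch), with labels mapped to -1 and +1.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, n_rounds=50):
    """Discrete binary AdaBoost with decision stumps (hypothetical helper); y in {-1, +1}."""
    n_samples = len(X)
    w = np.full(n_samples, 1.0 / n_samples)  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1, random_state=0)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y]) / np.sum(w)  # weighted error rate of this stump
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against division by zero and log(0)
        alpha = 0.5 * np.log((1 - err) / err)  # more accurate stumps get a larger weight
        w *= np.exp(-alpha * y * pred)  # raise weights of misclassified samples
        w /= w.sum()  # renormalize to a probability distribution
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, np.array(alphas)


def adaboost_predict(stumps, alphas, X):
    """Final prediction: sign of the alpha-weighted vote of all stumps."""
    scores = sum(alpha * stump.predict(X) for stump, alpha in zip(stumps, alphas))
    return np.where(scores >= 0, 1, -1)


X, y01 = load_breast_cancer(return_X_y=True)
y = np.where(y01 == 1, 1, -1)  # map {0, 1} labels to {-1, +1}
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

stumps, alphas = adaboost_fit(X_train, y_train, n_rounds=50)
accuracy = np.mean(adaboost_predict(stumps, alphas, X_test) == y_test)
print("Hand-rolled AdaBoost accuracy:", accuracy)
```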

Here, we use AdaBoost with decision stumps (depth-1 decision trees, a classic weak learner) as the base model to classify the Wine dataset, which contains 13 chemical measurements of wines from three different cultivars.

```python
# Load the Wine dataset
wine = load_wine()
X, y = wine.data, wine.target

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create a decision stump (a depth-1 decision tree) as the weak base learner
base_model = DecisionTreeClassifier(max_depth=1, random_state=42)

# Create an AdaBoost classifier with 100 base estimators (weak learners)
adaboost_model = AdaBoostClassifier(base_model, n_estimators=100, random_state=42)

# Train the AdaBoost model
adaboost_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = adaboost_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("AdaBoost Accuracy:", accuracy)
```

AdaBoost Accuracy: 0.9166666666666666
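
Because boosting adds learners one at a time, scikit-learn's `AdaBoostClassifier` can report the ensemble's accuracy after each stage through its `staged_score` method. The sketch below reuses the Wine setup to trace how training and test accuracy change as stumps are added. If training accuracy rises with more stumps, that rise reflects the bias reduction described earlier. The printed checkpoints are an arbitrary choice for this sketch.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1, random_state=42), n_estimators=100, random_state=42
)
model.fit(X_train, y_train)

# staged_score yields the ensemble's accuracy after 1, 2, ..., n boosting stages
train_curve = list(model.staged_score(X_train, y_train))
test_curve = list(model.staged_score(X_test, y_test))

# AdaBoost may stop early, so report only the checkpoints that were actually reached
checkpoints = [k for k in (1, 10, 50, 100) if k <= len(train_curve)]
for k in checkpoints:
    print(f"{k:>3} stumps: train={train_curve[k - 1]:.3f}, test={test_curve[k - 1]:.3f}")
```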


### Improving Model Accuracy with Stacking

Stacking (stacked generalization) is an ensemble learning technique that combines several different models by training a meta-model to learn how best to combine their predictions.

#### Stacking Steps

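- Create Diverse Base Models: Build several independent base models, typically using different algorithms.
- Generate Predictions: Train the base models and collect their predictions for the target variable. To avoid leaking the training labels into the meta-model, these predictions are usually made out-of-fold with cross-validation.
- Create Meta-Model: Choose a meta-model, often a simple one such as logistic regression, that takes the base models' predictions as its input features.
- Blend Predictions: Train the meta-model on those predictions so that it learns how much to trust each base model.
- Make Final Predictions: For new data, feed the base models' predictions into the meta-model to obtain the final prediction.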
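
The stacking steps above, including the out-of-fold predictions, can be written out by hand. This sketch approximates what scikit-learn's `StackingClassifier` does by default for a binary task: 5-fold cross-validated predicted probabilities, keeping one positive-class column per base model. It is a simplified illustration, not the library's exact implementation. It reuses the binary Diabetes setup from the example that follows.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = load_diabetes(return_X_y=True)
y = (y > 150).astype(int)  # binary target: is disease progression greater than 150?
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: diverse base models
base_models = [
    RandomForestClassifier(n_estimators=100, random_state=42),
    GradientBoostingClassifier(n_estimators=100, random_state=42),
]

# Step 2: out-of-fold P(class 1) from each base model, one column per model
meta_train = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_models
])

# Steps 3-4: the meta-model learns how to blend the base models' predictions
meta_model = LogisticRegression().fit(meta_train, y_train)

# Step 5: refit each base model on the full training set, then stack its test predictions
meta_test = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1] for model in base_models
])
y_pred = meta_model.predict(meta_test)
print("Hand-rolled stacking accuracy:", np.mean(y_pred == y_test))
```

The meta-features come from cross-validation folds rather than from rows each base model was trained on. As a result, the meta-model learns from the kind of predictions the base models make on unseen data.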

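Here, we use a StackingClassifier on the Diabetes dataset. It contains ten baseline variables (age, sex, body mass index, average blood pressure, and six blood serum measurements) for 442 diabetes patients, along with a quantitative measure of disease progression one year after baseline. Because this target is continuous, we turn the task into binary classification by predicting whether disease progression is greater than 150.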

```python
# Load the Diabetes dataset
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target

# The target is continuous (a regression dataset), so convert it into a binary
# classification problem: is disease progression greater than 150?
y_binary = (y > 150).astype(int)

# Split data into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y_binary, test_size=0.2, random_state=42)

# Define the base models
base_models = [
    ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
    ('gb', GradientBoostingClassifier(n_estimators=100, random_state=42))
]

# Define the meta-model
meta_model = LogisticRegression()

# Create the stacking classifier. By default, the meta-model is trained on
# 5-fold cross-validated predictions of the base models, and the base models
# are then refit on the full training set.
stacking_model = StackingClassifier(estimators=base_models, final_estimator=meta_model)

# Train the stacking model
stacking_model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = stacking_model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Stacking Accuracy:", accuracy)
```

Stacking Accuracy: 0.7528089887640449
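
To judge whether stacking helps on this split, compare it with its base models on their own. The sketch below does that and also tries `passthrough=True`. This `StackingClassifier` option feeds the original features to the meta-model alongside the base models' predictions, which is the case where the predictions truly become *additional* features. The `max_iter=1000` setting is an assumption added to give logistic regression room to converge on the wider input. No particular result is claimed.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
y = (y > 150).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

base_models = [
    ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
    ("gb", GradientBoostingClassifier(n_estimators=100, random_state=42)),
]

# Each base model on its own, plus two stacking variants
candidates = {name: model for name, model in base_models}
candidates["stacking"] = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression()
)
candidates["stacking + passthrough"] = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(max_iter=1000),
    passthrough=True,  # meta-model also sees the original 10 features
)

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)  # StackingClassifier clones its base models internally
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:<22} {results[name]:.3f}")
```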
