# Bagging Regression
Bagging (Bootstrap Aggregating) Regression is an ensemble learning technique that improves the stability and accuracy of machine learning algorithms by training multiple models on different subsets of the training data and averaging their predictions. Each subset is created by sampling with replacement from the original dataset.
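
To make the two steps in this definition concrete (bootstrap sampling, then averaging), here is a minimal hand-rolled sketch. It is illustrative only and is not how `BaggingRegressor` is implemented internally; the names `X_demo`, `y_demo`, `n_models` and the seed values are assumptions chosen for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Toy data (assumed for illustration; seeded so the sketch is repeatable)
X_demo, y_demo = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
n_samples = X_demo.shape[0]

trees = []
for _ in range(n_models):
    # Bootstrap step: draw n_samples row indices with replacement
    idx = rng.integers(0, n_samples, size=n_samples)
    trees.append(DecisionTreeRegressor(random_state=0).fit(X_demo[idx], y_demo[idx]))

# Aggregation step: average the per-model predictions
all_preds = np.stack([tree.predict(X_demo) for tree in trees])  # shape (n_models, n_samples)
bagged_pred = all_preds.mean(axis=0)
print(all_preds.shape, bagged_pred.shape)
```

Each tree sees a slightly different resample of the rows, so the individual predictions differ; the average is what the ensemble reports.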

## Advantages:
- Reduced Variance: By averaging the predictions of multiple models, bagging reduces the variance and helps to prevent overfitting.
- Robustness: It is less sensitive to noisy data and outliers.
- Parallelizable: Each model can be trained independently, making it well-suited for parallel processing.

## Disadvantages:
- Increased Complexity: The overall model becomes more complex and harder to interpret.
- Computationally Intensive: Training multiple models can be computationally expensive and time-consuming.

## Use Cases:
- Finance: Predicting stock prices or credit risk.
- Healthcare: Predicting patient outcomes or disease progression.
- Marketing: Predicting customer lifetime value or campaign response.

## Scaling (depends on the base estimator)
Whether scaling is needed depends on the base estimator. Tree-based models are insensitive to feature scale and do not require it, while models such as Support Vector Regression (SVR) generally do.

## Encoding (necessary)
Categorical data must be encoded into numerical values.

# Import Libraries

In [26]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from scipy.stats import uniform, loguniform
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import make_regression

In [27]:
# Generate a random regression problem
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

# 1. Bagging with the Default Estimator (Decision Tree)

## Grid Search

In [28]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Create the Bagging Regressor with default estimator (DecisionTreeRegressor)
bagging_reg = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42)

# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__max_depth': [None, 10, 20, 30],
    'estimator__min_samples_split': [2, 5, 10]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(bagging_reg, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(X, y)

In [29]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 13
Best Hyperparameters: {'estimator__max_depth': 10, 'estimator__min_samples_split': 2, 'n_estimators': 50}
Best Cross-Validated Score: 0.9385969246574488


In [30]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)

## Randomized Search

In [31]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

# Create the Bagging Regressor with default estimator (DecisionTreeRegressor)
bagging_reg = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__max_depth': [None, 10, 20, 30, 40, 50],
    'estimator__min_samples_split': [2, 5, 10, 15]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(bagging_reg, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Train the randomized search
random_search.fit(X, y)

In [32]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 4
Best Hyperparameters: {'n_estimators': 50, 'estimator__min_samples_split': 2, 'estimator__max_depth': 20}
Best Cross-Validated Score: 0.9376365189819941


In [33]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)

## Train BaggingRegressor without search

In [34]:
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor

model = BaggingRegressor(
    estimator=DecisionTreeRegressor(max_depth=10, min_samples_split=5),
    n_estimators=50,
    random_state=42,
)
# model.fit(x_train, y_train)

# 2. Bagging with a Single Estimator (Support Vector Regression)

## Grid Search

In [36]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR


# Create the Bagging Regressor with SVR
bagging_reg_svr = BaggingRegressor(estimator=SVR(), random_state=42)


# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__C': [0.1, 1, 10],
    'estimator__epsilon': [0.1, 0.2, 0.5],
    'estimator__kernel': ['linear', 'poly', 'rbf']
}

# Initialize GridSearchCV
grid_search = GridSearchCV(bagging_reg_svr, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(X, y)

In [37]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 72
Best Hyperparameters: {'estimator__C': 10, 'estimator__epsilon': 0.1, 'estimator__kernel': 'linear', 'n_estimators': 10}
Best Cross-Validated Score: 0.9999991787101559


In [38]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)

## Randomized Search

In [40]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

# Create the Bagging Regressor with SVR
bagging_reg_svr = BaggingRegressor(estimator=SVR(), random_state=42)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__C': [0.1, 1, 10, 100],
    'estimator__epsilon': [0.1, 0.2, 0.5, 1.0],
    'estimator__kernel': ['linear', 'poly', 'rbf', 'sigmoid']
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(bagging_reg_svr, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Train the randomized search
random_search.fit(X, y)

In [41]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 27
Best Hyperparameters: {'n_estimators': 10, 'estimator__kernel': 'linear', 'estimator__epsilon': 0.2, 'estimator__C': 100}
Best Cross-Validated Score: 0.999999148110774


In [42]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)

## Train BaggingRegressor without search

In [43]:
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

model = BaggingRegressor(
    estimator=SVR(kernel='linear', epsilon=0.2, C=100),
    n_estimators=50,
    random_state=42,
)
# model.fit(x_train, y_train)

# 3. Bagging with Multiple Estimators (SVR, Decision Tree, ElasticNet)

## Grid Search

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import VotingRegressor, BaggingRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Create the individual regressors (no scaling Pipeline is applied here)
svr = SVR()
decision_tree = DecisionTreeRegressor()
elastic_net = ElasticNet()

# Create the VotingRegressor with the different models
voting_regressor = VotingRegressor(estimators=[
    ('svr', svr),
    ('decision_tree', decision_tree),
    ('elastic_net', elastic_net)
])

# Create the Bagging Regressor with VotingRegressor
bagging_reg_voting = BaggingRegressor(estimator=voting_regressor, random_state=42)

# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__svr__C': [0.1, 1, 10],
    'estimator__svr__epsilon': [0.1, 0.2, 0.5],
    'estimator__svr__kernel': ['linear', 'poly', 'rbf'],
    'estimator__decision_tree__max_depth': [None, 10, 20, 30],
    'estimator__decision_tree__min_samples_split': [2, 5, 10],
    'estimator__elastic_net__alpha': [0.1, 1, 10],
    'estimator__elastic_net__l1_ratio': [0.1, 0.5, 0.9]
}

# Initialize GridSearchCV
grid_search_voting = GridSearchCV(bagging_reg_voting, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search_voting.fit(X, y)
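
Before launching a grid search of this size, it can help to count the candidates. The sketch below copies the grid from the cell above into `voting_param_grid` (a name introduced here) and uses `ParameterGrid` to count combinations; the fold count of 5 matches `cv=5`.

```python
from sklearn.model_selection import ParameterGrid

# Same grid as above, repeated so this cell runs on its own
voting_param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__svr__C': [0.1, 1, 10],
    'estimator__svr__epsilon': [0.1, 0.2, 0.5],
    'estimator__svr__kernel': ['linear', 'poly', 'rbf'],
    'estimator__decision_tree__max_depth': [None, 10, 20, 30],
    'estimator__decision_tree__min_samples_split': [2, 5, 10],
    'estimator__elastic_net__alpha': [0.1, 1, 10],
    'estimator__elastic_net__l1_ratio': [0.1, 0.5, 0.9],
}

n_candidates = len(ParameterGrid(voting_param_grid))
n_fits = n_candidates * 5  # one fit per candidate per CV fold

print("Candidates:", n_candidates)  # 11664
print("Total fits:", n_fits)        # 58320
```

Each of those fits trains `n_estimators` copies of the three-model voting ensemble, so a randomized search or a much smaller grid is likely the more practical choice here.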

In [None]:
print("Best Hyperparameter Index:", grid_search_voting.best_index_)
print("Best Hyperparameters:", grid_search_voting.best_params_)
print("Best Cross-Validated Score:", grid_search_voting.best_score_)

In [None]:
# Get the model with best hyperparameters
model = grid_search_voting.best_estimator_
# y_pred = model.predict(x_test)

## Randomized Search

In [46]:
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import VotingRegressor, BaggingRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import RandomizedSearchCV

# Create the individual regressors (no scaling Pipeline is applied here)
svr = SVR()
decision_tree = DecisionTreeRegressor()
elastic_net = ElasticNet()

# Create the VotingRegressor with the different models
voting_regressor = VotingRegressor(estimators=[
    ('svr', svr),
    ('decision_tree', decision_tree),
    ('elastic_net', elastic_net)
])

# Create the Bagging Regressor with VotingRegressor
bagging_reg_voting = BaggingRegressor(estimator=voting_regressor, random_state=42)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'estimator__svr__C': [0.1, 1, 10, 100],
    'estimator__svr__epsilon': [0.1, 0.2, 0.5, 1.0],
    'estimator__svr__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'estimator__decision_tree__max_depth': [None, 10, 20, 30, 40, 50],
    'estimator__decision_tree__min_samples_split': [2, 5, 10, 15],
    'estimator__elastic_net__alpha': [0.1, 1, 10, 100],
    'estimator__elastic_net__l1_ratio': [0.1, 0.5, 0.9]
}

# Initialize RandomizedSearchCV
random_search_voting = RandomizedSearchCV(bagging_reg_voting, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Train the randomized search
random_search_voting.fit(X, y)

In [47]:
# Report the voting-ensemble search (random_search_voting), not the earlier SVR search
print("Best Hyperparameter Index:", random_search_voting.best_index_)
print("Best Hyperparameters:", random_search_voting.best_params_)
print("Best Cross-Validated Score:", random_search_voting.best_score_)


In [None]:
model = random_search_voting.best_estimator_
# y_pred = model.predict(x_test)

## Train BaggingRegressor without search

In [48]:
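from sklearn.linear_model import ElasticNet
from sklearn.ensemble import VotingRegressor, BaggingRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Create the individual regressors (no scaling Pipeline is applied here)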
svr = SVR(C=1, epsilon=0.1, kernel='linear')
decision_tree = DecisionTreeRegressor(max_depth=5, min_samples_split=2)
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.4)

# Create the VotingRegressor with the different models
voting_regressor = VotingRegressor(estimators=[
    ('svr', svr),
    ('decision_tree', decision_tree),
    ('elastic_net', elastic_net)
])


model = BaggingRegressor(estimator=voting_regressor, n_estimators=50, random_state=42)
# model.fit(x_train, y_train)

