# AdaBoost Regression
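AdaBoost (Adaptive Boosting) Regression is an ensemble learning technique that combines the predictions of multiple weak learners (typically shallow decision trees) into a strong predictive model. The learners are trained sequentially: after each round, training samples with larger prediction errors receive more weight, so the next learner concentrates on the cases its predecessors handled poorly. scikit-learn's `AdaBoostRegressor` implements the AdaBoost.R2 algorithm, which combines the learners' outputs with a weighted median, giving more influence to learners with lower weighted error.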
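The sequential reweighting and performance-weighted combination can be sketched in NumPy. This is a simplified illustration of the AdaBoost.R2 scheme with the linear loss, not scikit-learn's actual implementation: the function names `adaboost_r2_fit` and `adaboost_r2_predict` are hypothetical, and the stopping rules are simplified.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor


def adaboost_r2_fit(X, y, n_estimators=50, learning_rate=1.0, max_depth=3, seed=0):
    """Fit an AdaBoost.R2-style ensemble with the linear loss (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n_samples = len(y)
    weights = np.full(n_samples, 1.0 / n_samples)  # start with uniform sample weights
    models, alphas = [], []
    for _ in range(n_estimators):
        # Train the weak learner on a bootstrap sample drawn according to the weights
        idx = rng.choice(n_samples, size=n_samples, replace=True, p=weights)
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X[idx], y[idx])
        abs_err = np.abs(tree.predict(X) - y)
        if abs_err.max() == 0:  # perfect fit on every sample: keep it and stop
            models.append(tree)
            alphas.append(1.0)
            break
        loss = abs_err / abs_err.max()        # linear loss, scaled into [0, 1]
        avg_loss = np.sum(weights * loss)     # weighted average loss of this learner
        if avg_loss >= 0.5:                   # learner too weak: stop boosting
            break
        beta = avg_loss / (1.0 - avg_loss)    # beta < 1 because avg_loss < 0.5
        models.append(tree)
        alphas.append(learning_rate * np.log(1.0 / beta))  # learner's voting weight
        # Well-predicted samples (small loss) have their weights shrunk the most
        weights = weights * np.power(beta, (1.0 - loss) * learning_rate)
        weights /= weights.sum()
    return models, np.array(alphas)


def adaboost_r2_predict(models, alphas, X):
    """Combine the learners with a weighted median of their predictions."""
    if not models:
        raise ValueError("no learners were fitted")
    preds = np.column_stack([m.predict(X) for m in models])  # (n_samples, n_models)
    order = np.argsort(preds, axis=1)
    sorted_preds = np.take_along_axis(preds, order, axis=1)
    cum_weights = np.cumsum(alphas[order], axis=1)
    # First sorted prediction whose cumulative weight reaches half of the total
    median_idx = np.argmax(cum_weights >= 0.5 * cum_weights[:, -1:], axis=1)
    return sorted_preds[np.arange(len(X)), median_idx]


X, y = make_regression(n_samples=200, n_features=2, noise=0.1, random_state=0)
models, alphas = adaboost_r2_fit(X, y, n_estimators=50)
y_hat = adaboost_r2_predict(models, alphas, X)
print(len(models), "learners fitted")
```

Using the weighted median rather than a weighted mean means that a few learners with extreme predictions cannot pull the ensemble output far on their own.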

## Advantages:
- Improved Accuracy: Can significantly improve on the accuracy of a single weak learner.
- Simple and Effective: Easy to use and often performs well with little tuning (mainly `n_estimators`, `learning_rate`, and the base estimator's complexity).
- Often Resistant to Overfitting: In practice it tends to overfit less than a single deep decision tree, especially with shallow trees and a small `learning_rate`, although it is not immune (see the sensitivity to noisy data below).

## Disadvantages:
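- Sensitive to Noisy Data: Outliers and noisy targets receive progressively more weight in later boosting rounds, so they can significantly degrade the model's performance.
- Computationally Intensive: Requires more computational resources and time than a single model, and because each learner depends on the previous one, the boosting rounds cannot be trained in parallel.
- Complexity: An ensemble of many learners is harder to interpret than a single model.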
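The noise sensitivity comes from the reweighting step. The sketch below uses four hypothetical residuals, one of them an outlier, and applies one AdaBoost.R2-style weight update under each of the three loss options that `AdaBoostRegressor` accepts through its `loss` parameter (`"linear"`, the default, `"square"`, and `"exponential"`). The helper name `sample_losses` is hypothetical, and `learning_rate` is assumed to be 1.

```python
import numpy as np


def sample_losses(abs_errors, loss="linear"):
    """Scale absolute errors into [0, 1] the way AdaBoost.R2's loss options do."""
    scaled = abs_errors / abs_errors.max()
    if loss == "linear":
        return scaled
    if loss == "square":
        return scaled ** 2
    if loss == "exponential":
        return 1.0 - np.exp(-scaled)
    raise ValueError(f"unknown loss: {loss!r}")


# Hypothetical absolute residuals: three ordinary samples and one outlier
abs_errors = np.array([0.5, 1.0, 1.5, 20.0])
weights = np.full(4, 0.25)  # uniform weights before the update

outlier_weight = {}
for loss_name in ("linear", "square", "exponential"):
    loss = sample_losses(abs_errors, loss_name)
    avg_loss = np.sum(weights * loss)
    beta = avg_loss / (1.0 - avg_loss)
    new_weights = weights * beta ** (1.0 - loss)  # learning_rate = 1
    new_weights /= new_weights.sum()
    outlier_weight[loss_name] = new_weights[-1]
    print(loss_name, "outlier weight:", round(float(new_weights[-1]), 3))
```

After a single round the outlier's weight rises above its starting value of 0.25 under every option, so later learners are trained on bootstrap samples that contain it more often; the square loss concentrates weight on it the most.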

## Use Cases:
- Finance: Credit scoring and fraud detection.
- Healthcare: Predicting patient outcomes based on historical data.
- Marketing: Customer segmentation and response modeling.

## Scaling (depends on the base estimator)
AdaBoost itself does not require feature scaling, and tree-based base estimators (the default) do not need it either. If the base estimator is scale-sensitive (e.g., Support Vector Regression), then scaling is necessary.

## Encoding (necessary)
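Categorical features must be encoded as numerical values (for example, with one-hot encoding), because the scikit-learn estimators used here expect numeric input.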
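A minimal sketch of encoding inside a pipeline, so the same encoding is applied at fit and predict time. The toy DataFrame, its column names (`size`, `city`), and the target formula are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import AdaBoostRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one numeric and one categorical feature
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "size": rng.uniform(50, 150, 200),
    "city": rng.choice(["A", "B", "C"], 200),
})
y = 2.0 * df["size"] + df["city"].map({"A": 0.0, "B": 50.0, "C": 100.0})

# One-hot encode "city"; numeric columns pass through unchanged
preprocess = ColumnTransformer(
    [("city", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)
model = Pipeline([
    ("encode", preprocess),
    ("ada", AdaBoostRegressor(n_estimators=50, random_state=42)),
])
model.fit(df, y)
print(model.predict(df.head(3)))
```

`handle_unknown="ignore"` encodes a category unseen during training as all zeros instead of raising an error at prediction time.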

# Import Libraries

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from scipy.stats import uniform, loguniform
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import make_regression

In [3]:
# Generate a random regression problem
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)
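The cell above does not fix `random_state`, so the generated data, and therefore the scores printed below, change on every run. A minimal sketch, assuming a seed of 42 and a 20% hold-out split (both are assumptions, not from the original), that makes the data reproducible and creates the `x_train`, `x_test`, `y_train`, `y_test` variables referenced by the commented-out `fit`/`predict` lines later in this notebook. Note that it overwrites `X` and `y`.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Seeded data generation (assumed seed) for reproducible results
X, y = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# Hold out 20% of the rows for a final, unbiased evaluation
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(x_train.shape, x_test.shape)
```

To keep `x_test` truly unseen, the searches would be fit on `x_train, y_train` rather than on the full `X, y`.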

# 1. AdaBoost with Default Estimator (Decision Tree)

## Grid Search

In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Create the AdaBoost Regressor with a DecisionTreeRegressor base estimator (the default estimator type)
adaboost_reg = AdaBoostRegressor(estimator=DecisionTreeRegressor(), random_state=42)

# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__max_depth': [None, 3, 5, 7],
    'estimator__min_samples_split': [2, 5, 10]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(adaboost_reg, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(X, y)

In [5]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 3
Best Hyperparameters: {'estimator__max_depth': None, 'estimator__min_samples_split': 2, 'learning_rate': 0.01, 'n_estimators': 300}
Best Cross-Validated Score: 0.943708280678119


In [6]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)

## Randomized Search

In [7]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

# Create the AdaBoost Regressor with a DecisionTreeRegressor base estimator (the default estimator type)
adaboost_reg = AdaBoostRegressor(estimator=DecisionTreeRegressor(), random_state=42)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__max_depth': [None, 3, 5, 7, 10],
    'estimator__min_samples_split': [2, 5, 10, 15]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(adaboost_reg, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Run the randomized search
random_search.fit(X, y)

In [8]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 45
Best Hyperparameters: {'n_estimators': 300, 'learning_rate': 0.01, 'estimator__min_samples_split': 2, 'estimator__max_depth': None}
Best Cross-Validated Score: 0.943708280678119


In [9]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)

## Train AdaBoostRegressor without search

In [10]:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

model = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=10, min_samples_split=5),
    n_estimators=50,
    learning_rate=0.5,
    random_state=42,
)
# model.fit(x_train, y_train)

# 2. AdaBoost with a Single Estimator (Support Vector Regression)

## Grid Search

In [11]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR


# Create the AdaBoost Regressor with SVR as the base estimator
adaboost_reg_svr = AdaBoostRegressor(estimator=SVR(), random_state=42)


# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__C': [0.1, 1, 10],
    'estimator__epsilon': [0.1, 0.2, 0.5],
    'estimator__kernel': ['linear', 'poly', 'rbf']
}

# Initialize GridSearchCV
grid_search = GridSearchCV(adaboost_reg_svr, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(X, y)

In [12]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 293
Best Hyperparameters: {'estimator__C': 10, 'estimator__epsilon': 0.1, 'estimator__kernel': 'linear', 'learning_rate': 0.1, 'n_estimators': 100}
Best Cross-Validated Score: 0.9999986551751465


In [13]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)

## Randomized Search

In [14]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR

# Create the AdaBoost Regressor with SVR as the base estimator
adaboost_reg_svr = AdaBoostRegressor(estimator=SVR(), random_state=42)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__C': [0.1, 1, 10],
    'estimator__epsilon': [0.1, 0.2, 0.5],
    'estimator__kernel': ['linear', 'poly', 'rbf']
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(adaboost_reg_svr, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Run the randomized search
random_search.fit(X, y)

In [15]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 41
Best Hyperparameters: {'n_estimators': 100, 'learning_rate': 1, 'estimator__kernel': 'linear', 'estimator__epsilon': 0.1, 'estimator__C': 10}
Best Cross-Validated Score: 0.9999986343240389


In [16]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)

## Train AdaBoostRegressor without search

In [17]:
from sklearn.ensemble import AdaBoostRegressor
from sklearn.svm import SVR

model = AdaBoostRegressor(
    estimator=SVR(kernel='linear', epsilon=0.2, C=100),
    n_estimators=50,
    learning_rate=0.5,
    random_state=42,
)
# model.fit(x_train, y_train)

# 3. AdaBoost with Multiple Estimators (SVR, Decision Tree, ElasticNet via VotingRegressor)

## Grid Search

In [None]:
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import VotingRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV

# Create the individual base regressors (no scaling Pipeline: make_regression features are already roughly standard normal)
svr = SVR()
decision_tree = DecisionTreeRegressor()
elastic_net = ElasticNet()

# Create the VotingRegressor with the different models
voting_regressor = VotingRegressor(estimators=[
    ('svr', svr),
    ('decision_tree', decision_tree),
    ('elastic_net', elastic_net)
])

# Create the AdaBoost Regressor with the VotingRegressor as its base estimator
adaboost_reg_voting = AdaBoostRegressor(estimator=voting_regressor, random_state=42)

# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__svr__C': [0.1, 1, 10],
    'estimator__svr__epsilon': [0.1, 0.2, 0.5],
    'estimator__svr__kernel': ['linear', 'poly', 'rbf'],
    'estimator__decision_tree__max_depth': [None, 10, 20, 30],
    'estimator__decision_tree__min_samples_split': [2, 5, 10],
    'estimator__elastic_net__alpha': [0.1, 1, 10],
    'estimator__elastic_net__l1_ratio': [0.1, 0.5, 0.9]
}

# Initialize GridSearchCV
grid_search_voting = GridSearchCV(adaboost_reg_voting, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search_voting.fit(X, y)
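This grid is very large, which may explain why these cells show no output. `ParameterGrid` from scikit-learn can count the candidates before committing to a search; the sketch below reuses the grid from the cell above:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the GridSearchCV cell above
param_grid = {
    'n_estimators': [10, 50, 100, 200],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__svr__C': [0.1, 1, 10],
    'estimator__svr__epsilon': [0.1, 0.2, 0.5],
    'estimator__svr__kernel': ['linear', 'poly', 'rbf'],
    'estimator__decision_tree__max_depth': [None, 10, 20, 30],
    'estimator__decision_tree__min_samples_split': [2, 5, 10],
    'estimator__elastic_net__alpha': [0.1, 1, 10],
    'estimator__elastic_net__l1_ratio': [0.1, 0.5, 0.9],
}

n_candidates = len(ParameterGrid(param_grid))
n_fits = n_candidates * 5  # cv=5
print(n_candidates, n_fits)  # 46656 233280
```

Each of those fits trains up to 200 VotingRegressors (three models each), which is why the randomized search with a fixed `n_iter` is the more practical option here.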

In [None]:
print("Best Hyperparameter Index:", grid_search_voting.best_index_)
print("Best Hyperparameters:", grid_search_voting.best_params_)
print("Best Cross-Validated Score:", grid_search_voting.best_score_)

In [None]:
# Get the model with best hyperparameters
model = grid_search_voting.best_estimator_
# y_pred = model.predict(x_test)

## Randomized Search

In [18]:
from sklearn.linear_model import ElasticNet
from sklearn.ensemble import VotingRegressor, AdaBoostRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import RandomizedSearchCV

# Create the individual base regressors (no scaling Pipeline: make_regression features are already roughly standard normal)
svr = SVR()
decision_tree = DecisionTreeRegressor()
elastic_net = ElasticNet()

# Create the VotingRegressor with the different models
voting_regressor = VotingRegressor(estimators=[
    ('svr', svr),
    ('decision_tree', decision_tree),
    ('elastic_net', elastic_net)
])

# Create the AdaBoost Regressor with the VotingRegressor as its base estimator
adaboost_reg_voting = AdaBoostRegressor(estimator=voting_regressor, random_state=42)


# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [10, 50, 100, 200],
    'learning_rate': [0.01, 0.1, 1, 10],
    'estimator__svr__C': [0.1, 1, 10, 100],
    'estimator__svr__epsilon': [0.1, 0.2, 0.5, 1.0],
    'estimator__svr__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'estimator__decision_tree__max_depth': [None, 10, 20, 30, 40, 50],
    'estimator__decision_tree__min_samples_split': [2, 5, 10, 15],
    'estimator__elastic_net__alpha': [0.1, 1, 10, 100],
    'estimator__elastic_net__l1_ratio': [0.1, 0.5, 0.9]
}

# Initialize RandomizedSearchCV
random_search_voting = RandomizedSearchCV(adaboost_reg_voting, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Run the randomized search
random_search_voting.fit(X, y)

In [19]:
print("Best Hyperparameter Index:", random_search_voting.best_index_)
print("Best Hyperparameters:", random_search_voting.best_params_)
print("Best Cross-Validated Score:", random_search_voting.best_score_)


In [20]:
model = random_search_voting.best_estimator_
# y_pred = model.predict(x_test)

## Train AdaBoostRegressor without search

In [22]:
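from sklearn.ensemble import AdaBoostRegressor, VotingRegressor
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Create the individual base regressors (no scaling Pipeline: make_regression features are already roughly standard normal)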
svr = SVR(C=1, epsilon=0.1, kernel='linear')
decision_tree = DecisionTreeRegressor(max_depth=5, min_samples_split=2)
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.4)

# Create the VotingRegressor with the different models
voting_regressor = VotingRegressor(estimators=[
    ('svr', svr),
    ('decision_tree', decision_tree),
    ('elastic_net', elastic_net)
])


model = AdaBoostRegressor(estimator=voting_regressor, n_estimators=100, learning_rate=0.5, random_state=42)
# model.fit(x_train, y_train)