# Gradient Boosting Regression
Gradient Boosting Regression is an ensemble learning technique that builds models sequentially, with each new model fitted to correct the errors of the ensemble built so far. It combines the predictions of many weak learners, typically shallow decision trees, into a strong predictive model. At each step the new learner is fitted to the negative gradient of a specified loss function with respect to the current predictions, which amounts to gradient descent in function space.
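
The sequential idea described above can be sketched by hand for the squared-error loss, where the negative gradient is simply the residual. This is a minimal illustration, not scikit-learn's implementation; the helper names `fit_boosted_trees` and `predict_boosted_trees`, the data sizes, and the seeds are assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data; sizes and seed are illustrative assumptions
X_demo, y_demo = make_regression(n_samples=200, n_features=2, noise=0.1, random_state=0)


def fit_boosted_trees(X, y, n_rounds=50, learning_rate=0.1, max_depth=3):
    """Hypothetical helper: fit shallow trees one after another to the current residuals."""
    baseline = float(np.mean(y))              # initial constant prediction
    prediction = np.full(len(y), baseline)
    trees = []
    for _ in range(n_rounds):
        residuals = y - prediction            # negative gradient of 0.5 * (y - F)^2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)  # shrunken update
        trees.append(tree)
    return baseline, trees


def predict_boosted_trees(X, baseline, trees, learning_rate=0.1):
    """Hypothetical helper: sum the baseline and the shrunken tree predictions."""
    prediction = np.full(X.shape[0], baseline)
    for tree in trees:
        prediction += learning_rate * tree.predict(X)
    return prediction


baseline, trees = fit_boosted_trees(X_demo, y_demo)
mse_start = np.mean((y_demo - baseline) ** 2)
mse_end = np.mean((y_demo - predict_boosted_trees(X_demo, baseline, trees)) ** 2)
print(f"Training MSE before boosting: {mse_start:.2f}, after: {mse_end:.2f}")
```

The same `learning_rate` must be used for fitting and prediction, which is why both helpers share the default value.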

## Advantages:
- High Accuracy: Often achieves strong accuracy on tabular data, frequently outperforming single decision trees and linear models.
- Flexibility: Can optimize different loss functions and be used for both regression and classification tasks.
- Feature Importance: Provides insights into the importance of features.
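
As a small illustration of the flexibility and feature-importance points, the sketch below fits a model with a non-default loss (`loss="huber"`) and reads `feature_importances_`. The dataset, with 5 features of which only 2 are informative, is an assumption chosen for the example.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Assumed toy data: 5 features, only 2 of which actually drive the target
X_imp, y_imp = make_regression(
    n_samples=300, n_features=5, n_informative=2, noise=0.1, random_state=0
)

# Huber loss is one of the built-in alternatives to the default squared error
gbr_imp = GradientBoostingRegressor(loss="huber", random_state=0)
gbr_imp.fit(X_imp, y_imp)

# Impurity-based importances, normalized to sum to 1
importances = gbr_imp.feature_importances_
for index, value in enumerate(importances):
    print(f"feature {index}: {value:.3f}")
```

Impurity-based importances can be biased toward high-cardinality features, so they are best treated as a rough guide.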

## Disadvantages:
- Computationally Intensive: Training can be slow and resource-intensive, especially with large datasets.
- Overfitting: Prone to overfitting if not properly tuned (e.g., number of trees, learning rate).
- Parameter Sensitivity: Requires careful tuning of hyperparameters.
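
One common way to keep overfitting in check is to pick the number of trees on a held-out validation set. The sketch below uses `staged_predict`, which yields predictions after each boosting stage; the data, split sizes, and seeds are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Assumed noisy toy data so that too many trees can start to overfit
X_es, y_es = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X_es, y_es, test_size=0.25, random_state=0)

gbr_es = GradientBoostingRegressor(n_estimators=300, learning_rate=0.1, random_state=0)
gbr_es.fit(X_tr, y_tr)

# Validation error after 1, 2, ..., 300 trees
val_errors = [mean_squared_error(y_val, pred) for pred in gbr_es.staged_predict(X_val)]
best_n_trees = int(np.argmin(val_errors)) + 1
print("Number of trees with the lowest validation MSE:", best_n_trees)
```

`GradientBoostingRegressor` also offers built-in early stopping through `n_iter_no_change` and `validation_fraction`, which avoids fitting all 300 trees.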

## Use Cases:
- Finance: Predicting stock prices, credit scoring, and risk management.
- Healthcare: Disease prediction and patient outcome prediction.
- Marketing: Customer segmentation, churn prediction, and sales forecasting.

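## Scaling (not necessary for tree-based learners)
Gradient Boosting with decision-tree base learners does not require feature scaling, because tree splits depend only on the ordering of feature values. In scikit-learn, `GradientBoostingRegressor` always boosts regression trees; only the optional `init` estimator, which supplies the initial predictions, can be a different model, and scaling matters only if that model is scale-sensitive (e.g., Support Vector Regression).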
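
A quick empirical check of the scaling claim, assuming the default tree-based learners: the sketch fits the same model on raw and standardized features and compares the training predictions. The data and the factor used to stretch one column are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

X_raw, y_raw = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)
# Stretch one column so the raw features live on very different scales
X_raw[:, 0] *= 1000.0
X_scaled = StandardScaler().fit_transform(X_raw)

# Same hyperparameters and seed; only the feature scale differs
pred_raw = GradientBoostingRegressor(random_state=0).fit(X_raw, y_raw).predict(X_raw)
pred_scaled = GradientBoostingRegressor(random_state=0).fit(X_scaled, y_raw).predict(X_scaled)

r2_raw = r2_score(y_raw, pred_raw)
r2_scaled = r2_score(y_raw, pred_scaled)
max_abs_diff = float(np.max(np.abs(pred_raw - pred_scaled)))
print(f"R2 raw: {r2_raw:.4f}, R2 scaled: {r2_scaled:.4f}, max |difference|: {max_abs_diff:.2e}")
```

Because standardization preserves the ordering of values within each feature, the two fits should agree up to floating-point effects.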

## Encoding (necessary)
Categorical features must be encoded as numerical values before fitting, because `GradientBoostingRegressor` accepts only numeric input.
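
One way to handle the encoding step is to one-hot encode categorical columns inside a `Pipeline`, so the same transformation is applied at fit and predict time. The toy DataFrame and its column names (`area`, `city`, `price`) are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data with one numeric and one categorical feature
df = pd.DataFrame({
    "area": [50, 80, 120, 65, 95, 150, 70, 110],
    "city": ["A", "B", "A", "C", "B", "C", "A", "B"],
    "price": [100, 180, 260, 120, 210, 300, 150, 240],
})

# One-hot encode "city"; pass the numeric column through unchanged
preprocess = ColumnTransformer(
    transformers=[("city", OneHotEncoder(handle_unknown="ignore"), ["city"])],
    remainder="passthrough",
)
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", GradientBoostingRegressor(random_state=42)),
])
pipeline.fit(df[["area", "city"]], df["price"])

new_rows = pd.DataFrame({"area": [90], "city": ["B"]})
encoded_pred = pipeline.predict(new_rows)
print(encoded_pred)
```

Setting `handle_unknown="ignore"` lets the pipeline accept categories at prediction time that were not seen during training.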

# Import Libraries

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from scipy.stats import uniform, loguniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.datasets import make_regression

In [3]:
# Generate a random regression problem
# (no random_state is set, so the data and the scores reported below vary between runs)
X, y = make_regression(n_samples=100, n_features=2, noise=0.1)

# 1. GradientBoosting with Default Estimator (Decision Tree)

## Grid Search

In [4]:
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingRegressor

# Create the GradientBoosting Regressor with default estimator (DecisionTreeRegressor)
GradientBoosting_reg = GradientBoostingRegressor(random_state=42)

# Define parameter grid for GridSearchCV
param_grid = {
    'n_estimators': [50, 100, 200, 300],
    'learning_rate': [0.01, 0.1, 0.2, 0.3],
    'max_depth': [3, 5, 7, 9],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Initialize GridSearchCV
grid_search = GridSearchCV(GradientBoosting_reg, param_grid, cv=5, n_jobs=-1)

# Train the grid search
grid_search.fit(X, y)

In [5]:
print("Best Hyperparameter Index:", grid_search.best_index_)
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Cross-Validated Score:", grid_search.best_score_)

Best Hyperparameter Index: 297
Best Hyperparameters: {'learning_rate': 0.2, 'max_depth': 3, 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 100}
Best Cross-Validated Score: 0.9898663571508417
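
To see what the best index refers to, the search results can be loaded into a DataFrame: `best_index_` is the row of `cv_results_` that holds the winning combination. The sketch below uses a deliberately small grid and its own data so that it runs quickly; the names `demo_grid` and `demo_search` are assumptions for the example.

```python
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X_cv, y_cv = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)

# Small illustrative grid (4 combinations) to keep the run short
demo_grid = {"learning_rate": [0.1, 0.2], "n_estimators": [50, 100]}
demo_search = GridSearchCV(GradientBoostingRegressor(random_state=42), demo_grid, cv=5)
demo_search.fit(X_cv, y_cv)

results = pd.DataFrame(demo_search.cv_results_)
best_row = results.loc[demo_search.best_index_]
print("Row selected by best_index_:", best_row["params"])

# Rank all candidates by their mean cross-validated score (R^2 by default for regressors)
top = results.sort_values("rank_test_score")[["params", "mean_test_score", "std_test_score"]]
print(top.head(3))
```

Looking at `std_test_score` alongside the mean helps judge whether the top candidates are meaningfully different.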


In [6]:
# Get the model with best hyperparameters
model = grid_search.best_estimator_
# y_pred = model.predict(x_test)  # x_test is not defined above; hold out a test set with train_test_split first
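
The commented-out prediction assumes a held-out test set. A minimal sketch of that workflow follows: split first, search on the training part only, then score the best estimator on the test part. The reduced grid, the seeds, and the names `x_train`, `x_test`, and `holdout_search` are assumptions for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV, train_test_split

X_all, y_all = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=42)

# Hold out 20% of the data; the search never sees it
x_train, x_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.2, random_state=42
)

# Reduced grid so the example runs in a few seconds
small_grid = {"n_estimators": [50, 100], "learning_rate": [0.1, 0.2], "max_depth": [3]}
holdout_search = GridSearchCV(GradientBoostingRegressor(random_state=42), small_grid, cv=5)
holdout_search.fit(x_train, y_train)

# best_estimator_ is refit on the full training split by default (refit=True)
y_pred = holdout_search.best_estimator_.predict(x_test)
test_r2 = r2_score(y_test, y_pred)
print("Test R^2:", test_r2)
```

The test score is usually a more honest estimate than `best_score_`, which was used to select the hyperparameters.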

## Randomized Search

In [7]:
from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

# Create the GradientBoosting Regressor with default estimator (DecisionTreeRegressor)
GradientBoosting_reg = GradientBoostingRegressor(random_state=42)

# Define parameter distribution for RandomizedSearchCV
param_dist = {
    'n_estimators': [50, 100, 200, 300, 400, 500],
    'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.5],
    'max_depth': [3, 5, 7, 9, 11],
    'min_samples_split': [2, 5, 10, 15],
    'min_samples_leaf': [1, 2, 4, 6]
}

# Initialize RandomizedSearchCV
random_search = RandomizedSearchCV(GradientBoosting_reg, param_distributions=param_dist, n_iter=50, cv=5, n_jobs=-1, random_state=42)

# Train the randomized search
random_search.fit(X, y)

In [8]:
print("Best Hyperparameter Index:", random_search.best_index_)
print("Best Hyperparameters:", random_search.best_params_)
print("Best Cross-Validated Score:", random_search.best_score_)

Best Hyperparameter Index: 39
Best Hyperparameters: {'n_estimators': 200, 'min_samples_split': 10, 'min_samples_leaf': 1, 'max_depth': 3, 'learning_rate': 0.2}
Best Cross-Validated Score: 0.9898485582201115


In [9]:
model = random_search.best_estimator_
# y_pred = model.predict(x_test)  # x_test is not defined above; hold out a test set with train_test_split first
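
Randomized search can also sample from continuous distributions instead of fixed lists, which is often a better fit for parameters such as the learning rate. The sketch below uses `scipy.stats.loguniform` and `scipy.stats.randint`; the ranges, `n_iter=5`, and `cv=3` are assumptions chosen to keep the run short.

```python
from scipy.stats import loguniform, randint
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RandomizedSearchCV

X_rs, y_rs = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)

# Distributions are sampled afresh for each of the n_iter candidates
param_dist_continuous = {
    "n_estimators": randint(50, 301),         # integers from 50 to 300 inclusive
    "learning_rate": loguniform(0.01, 0.5),   # log-uniform between 0.01 and 0.5
    "max_depth": randint(2, 6),               # integers from 2 to 5 inclusive
}

dist_search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_dist_continuous,
    n_iter=5,
    cv=3,
    random_state=42,
)
dist_search.fit(X_rs, y_rs)
print("Best Hyperparameters:", dist_search.best_params_)
```

A log-uniform distribution spends as many samples between 0.01 and 0.1 as between 0.1 and 1, which suits parameters that vary over orders of magnitude.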

## Train GradientBoostingRegressor without search

In [10]:
from sklearn.ensemble import GradientBoostingRegressor

# Fixed hyperparameters chosen by hand; no search is performed
model = GradientBoostingRegressor(
    n_estimators=50,
    learning_rate=0.5,
    max_depth=4,
    min_samples_split=11,
    min_samples_leaf=2,
    random_state=42,
)
# model.fit(x_train, y_train)  # x_train and y_train are not defined above; create them with train_test_split first
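
Without a search, a fixed configuration can still be checked with cross-validation before it is trusted. A minimal sketch, assuming the same hyperparameters as above and freshly generated data (the name `fixed_model` and the seed are illustrative):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X_fixed, y_fixed = make_regression(n_samples=100, n_features=2, noise=0.1, random_state=0)

fixed_model = GradientBoostingRegressor(
    n_estimators=50,
    learning_rate=0.5,
    max_depth=4,
    min_samples_split=11,
    min_samples_leaf=2,
    random_state=42,
)

# 5-fold cross-validated R^2; each fold fits a fresh clone of the model
scores = cross_val_score(fixed_model, X_fixed, y_fixed, cv=5, scoring="r2")
print("R^2 per fold:", scores)
print("Mean R^2:", scores.mean())
```

`cross_val_score` does not fit `fixed_model` itself, so call `fixed_model.fit(...)` afterwards if a trained model is needed.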