In [None]:
"""Q1. What is Gradient Boosting Regression?

    Ans: Gradient Boosting Regression is a machine learning technique that combines multiple weak regression models in an additive manner. It trains models in a stage-wise 
         fashion, with each model fitting to the negative gradient of the loss function. The final model is an ensemble of weak models, producing accurate regression predictions.
"""

In [None]:
"""Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. 
       Evaluate the model's performance using metrics such as mean squared error and R-squared."""

In [15]:
import numpy as np

class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.models = []
        self.weights = []

    def fit(self, X, y):
        y_pred = np.mean(y)  # Initialize with the mean
        residuals = y - y_pred

        for _ in range(self.n_estimators):
            model = DecisionTreeRegressor(max_depth=self.max_depth)
            model.fit(X, residuals)
            self.models.append(model)

            # Update weights
            self.weights.append(self.learning_rate)
            y_pred += self.learning_rate * model.predict(X)
            residuals = y - y_pred

    def predict(self, X):
        y_pred = np.zeros(len(X))
        for model, weight in zip(self.models, self.weights):
            y_pred += weight * model.predict(X)
        return y_pred
    
    def get_params(self, deep=True):
        return {'n_estimators': self.n_estimators, 'learning_rate': self.learning_rate, 'max_depth': self.max_depth}

    def set_params(self, **parameters):
        for parameter, value in parameters.items():
            setattr(self, parameter, value)
        return self

# Example usage
from sklearn.metrics import mean_squared_error,r2_score
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Generate toy dataset
X, y = make_regression(n_samples=100, n_features=1, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the gradient boosting model
gb_model = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb_model.fit(X_train, y_train)

# Evaluate the model's performance
y_pred = gb_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 31.735482349161565
R-squared: 0.9772379183627112


In [None]:
"""Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random 
       search to find the best hyperparameters."""

In [14]:
param_grid = {
    'n_estimators': [100,150],
    'learning_rate': [0.01, 0.1],
    'max_depth': [3,4]
}

# Create the gradient boosting regressor
gb_model = GradientBoostingRegressor()

# Perform random search
from sklearn.model_selection import GridSearchCV
random_search = GridSearchCV(gb_model, param_grid=param_grid, scoring='neg_mean_squared_error', cv=5)
random_search.fit(X_train, y_train)

# Print the best hyperparameters
print("Best Hyperparameters:", random_search.best_params_)

# Evaluate the model with the best hyperparameters
best_model = random_search.best_estimator_
y_pred = best_model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Best Model Performance:")
print("Mean Squared Error:", mse)
print("R-squared:", r2)


Best Hyperparameters: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100}
Best Model Performance:
Mean Squared Error: 31.735482349161565
R-squared: 0.9772379183627112


In [None]:
"""Q4. What is a weak learner in Gradient Boosting?

    Ans: A weak learner in gradient boosting is a simple and relatively low-performing model that can be trained to make predictions slightly better than random guessing. 
         It typically has low complexity, such as a decision tree with shallow depth, and is combined with other weak learners to create a strong ensemble model.
"""

In [None]:
"""Q5. What is the intuition behind the Gradient Boosting algorithm?

    Ans: The intuition behind the Gradient Boosting algorithm is to iteratively build an ensemble of weak models, where each subsequent model focuses on reducing the errors 
         made by the previous models. It does this by fitting each new model to the negative gradient (residuals) of the loss function of the ensemble. By continuously adjusting
         the weights of the weak models, Gradient Boosting aims to create a strong learner that gradually minimizes the overall loss and improves predictions.
"""

In [None]:
"""Q6. How does Gradient Boosting algorithm build an ensemble of weak learners?

    Ans: The Gradient Boosting algorithm builds an ensemble of weak learners in an iterative manner. It starts with an initial prediction based on the average value of the 
         target variable. In each iteration, a new weak learner is trained to predict the negative gradient (residuals) of the loss function with respect to the current 
         ensemble's predictions. The predictions of all weak learners are combined with a weight assigned to each learner to create the final ensemble prediction.
"""

In [None]:
"""Q7. What are the steps involved in constructing the mathematical intuition of Gradient Boosting algorithm?

    Ans: 1) Initialization with a simple model 
         2) Calculation of residuals (negative gradients)
         3) Fitting a weak learner to the residuals 
         4) Updating the ensemble by adding the weak learner's predictions with a weight
         5) Iteratively repeating these process multiple times.
"""