## Q1. What is Gradient Boosting Regression?

Ans: 
Gradient Boosting Regression is a machine learning algorithm that belongs to the family of boosting algorithms. It is used for regression problems, where the goal is to predict a continuous numerical value.

In Gradient Boosting Regression, the model is built by sequentially adding weak learners, where each new learner is trained to predict the residual error of the ensemble built so far. The algorithm starts from a simple initial prediction, typically a constant such as the mean of the target variable. Then, it calculates the error between the predicted values and the true values of the target variable.

In the subsequent iterations, the algorithm trains a new weak learner, usually a decision tree with a small depth, to predict the residual error left by the current ensemble. This process is repeated multiple times, with each new learner improving on the model built so far. The final model is the sum of the initial prediction and all the weak learners, where each learner's contribution is scaled by a learning rate (shrinkage) rather than by a performance-based weight as in AdaBoost.

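The name "gradient" in Gradient Boosting Regression comes from the fact that the algorithm performs gradient descent in function space: at each iteration, the new learner is fitted to the negative gradient of the loss function with respect to the current predictions. The loss is typically mean squared error (MSE), for which the negative gradient is proportional to the residual, but any differentiable loss function can be used.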
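The link between gradients and residuals can be checked numerically. The sketch below is only an illustration: the toy arrays and the helper `half_squared_error` are made up for this example, and the loss is taken as half the squared error so that the constant factor disappears.

```python
import numpy as np

def half_squared_error(y_true, y_pred):
    """Per-sample loss L = 0.5 * (y - F)^2 (illustrative helper)."""
    return 0.5 * (y_true - y_pred) ** 2

# Toy values, chosen arbitrarily for the illustration
y_true = np.array([3.0, -1.0, 2.5])
y_pred = np.array([2.0, 0.5, 2.5])   # current model predictions F(x)

# Central finite-difference estimate of dL/dF for each sample
eps = 1e-6
grad = (half_squared_error(y_true, y_pred + eps)
        - half_squared_error(y_true, y_pred - eps)) / (2 * eps)

residuals = y_true - y_pred
print("negative gradient:", -grad)
print("residuals:        ", residuals)
```

For this loss the two printed arrays agree up to floating-point error, which is why fitting the next learner to the residuals amounts to a gradient step.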

Gradient Boosting Regression often achieves high accuracy on tabular data and can model non-linear relationships between the input features and the target variable. It is widely used in applications such as finance, healthcare, and marketing.






## Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and R-squared.

Ans: 

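In [4]:
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic regression problem: 100 samples, 5 features
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

# Hold out a third of the data for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# scikit-learn's gradient boosting regressor with default hyperparameters
regressor1 = GradientBoostingRegressor()

regressor1.fit(X_train, y_train)

y_pred1 = regressor1.predict(X_test)

# Both metrics expect (y_true, y_pred); the order matters for r2_score
print('Mean squared error is ', mean_squared_error(y_test, y_pred1))
print('R-Squared is ', r2_score(y_test, y_pred1))

Mean squared error is  4033.8451702494444
R-Squared is  0.6545350362847577

(The R-squared value above was recorded with the arguments passed as (y_pred1, y_test); with the corrected order the value will differ.)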
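The cell above relies on scikit-learn's `GradientBoostingRegressor`. For the from-scratch part of the question, a minimal NumPy-only sketch might look like the following. It assumes squared-error loss and uses depth-1 trees (decision stumps) as weak learners. The function names (`fit_stump`, `fit_gbr`, and so on), the synthetic data, and the hyperparameter values are all illustrative choices, not part of the original answer.

```python
import numpy as np

def fit_stump(X, r):
    """Find the single split that minimises squared error on targets r."""
    best_sse = np.inf
    best = (0, np.inf, r.mean(), r.mean())   # fallback: no split, predict the mean
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:   # midpoints between sorted values
            left = X[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if sse < best_sse:
                best_sse, best = sse, (j, t, lv, rv)
    return best

def predict_stump(stump, X):
    j, t, lv, rv = stump
    return np.where(X[:, j] <= t, lv, rv)

def fit_gbr(X, y, n_estimators=150, learning_rate=0.1):
    """Gradient boosting for squared error: each stump is fitted to the residuals."""
    f0 = y.mean()                       # initial constant prediction
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_estimators):
        residuals = y - pred            # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(X, residuals)
        pred += learning_rate * predict_stump(stump, X)
        stumps.append(stump)
    return f0, learning_rate, stumps

def predict_gbr(model, X):
    f0, learning_rate, stumps = model
    pred = np.full(len(X), f0)
    for stump in stumps:
        pred += learning_rate * predict_stump(stump, X)
    return pred

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

# Small synthetic dataset (assumed for illustration): one non-linear and one linear effect
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

model = fit_gbr(X_train, y_train)
y_hat = predict_gbr(model, X_test)
print("Test MSE:", mse(y_test, y_hat))
print("Test R-squared:", r2(y_test, y_hat))
```

Stumps keep the code short, but they can only represent additive effects of single features. Deeper trees would be needed to capture interactions.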


## Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimise the performance of the model. Use grid search or random search to find the best hyperparameters.
Ans: 

In [22]:
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error, r2_score


# No random_state is set here, so the generated data changes from run to run
X, y = make_regression(n_samples=100, n_features=5, noise=0.1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

# Candidate values for number of trees, tree depth, and learning rate
params = {
    'n_estimators': [50, 100, 200],
    'max_depth': [2, 3, 4],
    'learning_rate': [0.01, 0.1, 0.2]
}

model = GradientBoostingRegressor()

In [23]:
grid=GridSearchCV(model,param_grid=params,cv=5)

In [24]:
grid.fit(X_train,y_train)

In [25]:
y_pred1=grid.predict(X_test)

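In [26]:
# Both metrics expect (y_true, y_pred); the order matters for r2_score
print(mean_squared_error(y_test, y_pred1))
print(r2_score(y_test, y_pred1))

5796.985415550791
0.6528316922316911

(The second value was recorded with the arguments passed as (y_pred1, y_test); with the corrected order the value will differ.)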
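The question also mentions random search. A possible sketch using scikit-learn's `RandomizedSearchCV` over the same parameter values is shown below. The fixed `random_state` values, `n_iter=8`, and the choice of `neg_mean_squared_error` as the scoring metric are assumptions made for this example, not settings taken from the original cells.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Seeded data so the example is reproducible (an assumption for this sketch)
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [2, 3, 4],
    "learning_rate": [0.01, 0.1, 0.2],
}

# Sample 8 of the 27 possible combinations instead of trying them all
search = RandomizedSearchCV(
    GradientBoostingRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=8,
    cv=5,
    scoring="neg_mean_squared_error",
    random_state=42,
)
search.fit(X_train, y_train)

# The search refits the best combination on the full training set
y_pred = search.predict(X_test)
print("Best parameters:", search.best_params_)
print("Best CV MSE:", -search.best_score_)
print("Test MSE:", mean_squared_error(y_test, y_pred))
print("Test R-squared:", r2_score(y_test, y_pred))
```

Random search trades exhaustiveness for speed, which matters more as the number of hyperparameters and candidate values grows.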


## Q4. What is a weak learner in Gradient Boosting?
Ans:
Gradient Boosting is a popular ensemble machine learning algorithm that works by sequentially adding weak learners to the model, with each learner attempting to correct the errors of the previous learner.

A weak learner is a simple model that has low predictive power on its own, but when combined with other weak learners in an ensemble, it can contribute to a strong predictive model. In Gradient Boosting, the weak learner is typically a decision tree with very few splits (also known as a shallow decision tree).

During each iteration of the Gradient Boosting algorithm, a weak learner is trained on the residuals of the current ensemble (i.e., the difference between the true values and the ensemble's predictions so far). The weak learner's predictions, scaled by the learning rate, are then added to the ensemble, and the process is repeated until the desired number of learners is reached, or until the model's performance on a validation set plateaus.

In summary, a weak learner in Gradient Boosting is a simple, low-performing model that is used in combination with other weak learners to build a strong predictive model.
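To make the contrast concrete, the sketch below compares a single depth-1 tree with an ensemble of 200 such trees on a noisy sine curve. The dataset, the seeds, and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.tree import DecisionTreeRegressor

# Noisy sine curve over several oscillations (illustrative data)
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# A single weak learner: one split, so at most two distinct predictions
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)

# The same kind of weak learner, boosted 200 times
boosted = GradientBoostingRegressor(
    max_depth=1, n_estimators=200, learning_rate=0.1, random_state=0
).fit(X_train, y_train)

stump_r2 = r2_score(y_test, stump.predict(X_test))
boosted_r2 = r2_score(y_test, boosted.predict(X_test))
print("Single stump R-squared:  ", stump_r2)
print("Boosted stumps R-squared:", boosted_r2)
```

The single stump can only output two values, so it cannot follow the curve, whereas the sum of many stumps can approximate it closely.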






## Q5. What is the intuition behind the Gradient Boosting algorithm?
Ans:
The intuition behind the Gradient Boosting algorithm is to combine multiple weak learners into a single strong learner in a way that minimizes the errors in the predictions.

The algorithm works by iteratively adding weak learners to the model, with each learner attempting to correct the errors made by the learners before it. Specifically, the algorithm starts from a simple initial prediction (typically a constant such as the mean of the target), and then it calculates the errors made by that model.

Next, it trains a new weak learner (often a shallow decision tree) on the residuals of the current model (i.e., the difference between the true values and the predicted values). This new learner attempts to correct the errors made so far, and its predictions, scaled by a learning rate, are added to the ensemble. The process is repeated until a stopping criterion is met (e.g., a maximum number of iterations is reached, or the performance on a validation set plateaus).

Each iteration can be seen as one step of gradient descent in function space. The algorithm calculates the gradient of the loss function with respect to the model's current predictions and fits the new learner to the negative gradient, so that adding the learner moves the predictions in the direction that reduces the loss. The learning rate controls the size of that step.

The intuition behind this process is that by combining multiple weak learners into an ensemble, the model can capture more complex patterns in the data than any individual learner could. By iteratively adding new learners while leaving the existing ones unchanged, the model can gradually improve its predictive accuracy, leading to a strong learner that can make accurate predictions on unseen data.
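The gradual improvement can be seen in an idealised setting. The sketch below assumes a hypothetical weak learner that reproduces the residuals exactly on the training data (real trees only approximate them), so each round removes a fraction `learning_rate` of the remaining error. The data and settings are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=50)          # toy targets
learning_rate = 0.1
n_rounds = 30

pred = np.full_like(y, y.mean())                 # initial constant prediction
initial_rmse = np.sqrt(np.mean((y - pred) ** 2))

rmse_history = []
for m in range(n_rounds):
    residuals = y - pred
    # Idealised learner (assumption): its output equals the residuals exactly
    pred = pred + learning_rate * residuals
    rmse_history.append(np.sqrt(np.mean((y - pred) ** 2)))

print("Initial RMSE:", initial_rmse)
print("RMSE after", n_rounds, "rounds:", rmse_history[-1])
print("Predicted by (1 - learning_rate)^rounds:",
      initial_rmse * (1 - learning_rate) ** n_rounds)
```

In this idealised case the training error shrinks geometrically by a factor of `1 - learning_rate` per round. With real weak learners the decrease is slower, which is one reason a small learning rate is usually paired with a larger number of trees.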






## Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?
Ans: 
The Gradient Boosting algorithm builds an ensemble of weak learners by sequentially adding new learners to the model, with each new learner attempting to correct the errors of the previous learners.

At a high level, the algorithm works as follows:

1. Start from an initial model, typically a constant prediction such as the mean of the target.
2. Calculate the errors made by the current model.
3. Train a new weak learner (often a shallow decision tree) on the residuals of the current model (i.e., the difference between the true values and the predicted values). This new learner attempts to correct the errors made so far.
4. Add the new learner's predictions, scaled by a learning rate, to the ensemble.
5. Repeat steps 2-4 until a stopping criterion is met (e.g., a maximum number of iterations is reached, or the performance on a validation set plateaus).

Each iteration corresponds to a gradient descent step in function space. The algorithm calculates the gradient of the loss function with respect to the model's current predictions and fits the new learner to the negative gradient, so that adding it moves the predictions in the direction that reduces the loss. Learners added in earlier iterations are not modified.

The result is an ensemble of weak learners, where each learner attempts to correct the errors of the previous learners. By combining the predictions of all the learners in the ensemble, the model can make more accurate predictions than any individual learner could on its own.
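One way to watch the ensemble being built is scikit-learn's `staged_predict`, which yields the ensemble's predictions after each added tree. The sketch below is an illustration with assumed data and hyperparameters. It tracks training error only, so it says nothing about overfitting.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Seeded synthetic data (assumed for this example)
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=42)

gbr = GradientBoostingRegressor(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=42
)
gbr.fit(X, y)

# Training MSE of the partial ensemble after 1, 2, ..., 100 trees
train_mse = [mean_squared_error(y, y_stage) for y_stage in gbr.staged_predict(X)]

for stage in (1, 10, 50, 100):
    print(f"after {stage:3d} trees: training MSE = {train_mse[stage - 1]:.2f}")
```

Applying the same loop to a held-out set instead of the training data is a common way to pick the number of trees.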

It's worth noting that there are several variations of the Gradient Boosting algorithm, including variants that use different types of weak learners (e.g., linear models or neural networks) or that use different loss functions and regularization techniques. However, the basic idea of sequentially adding weak learners to the model to form an ensemble remains the same across all variants.






## Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?
Ans: 
The mathematical intuition of the Gradient Boosting algorithm can be constructed in the following steps:

1. Define a differentiable loss function L(y, F(x)) that measures the difference between the true value y and the predicted value F(x) for a given input x.

2. Initialize the model with a constant value that minimizes the loss function over the training data. For squared error this is the mean of the training targets; for absolute error it is the median.

3. For each iteration of the algorithm, compute the negative gradient of the loss function with respect to the current predictions of the model (the pseudo-residuals) and fit a new weak learner to it. The negative gradient tells us the direction in which we should update the model's predictions to reduce the loss function.

4. Add the new weak learner to the ensemble by multiplying it with a learning rate and adding it to the current predictions of the model. The learning rate controls the contribution of each weak learner to the ensemble and can be set to a small value to avoid overfitting.

5. Repeat steps 3-4 until a stopping criterion is met, such as a maximum number of iterations or a minimum improvement in the validation error.

6. Finally, evaluate the performance of the model on a held-out test set.
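Steps 2 and 3 depend on the chosen loss. The sketch below uses toy data and illustrative helper names to check by brute force which constant minimises squared error and absolute error, and then computes the negative gradient of each loss at the initial prediction. The factor 0.5 in the squared loss is a convention assumed here to keep the gradient free of constants.

```python
import numpy as np

y = np.array([1.0, 2.0, 2.5, 4.0, 10.0])   # toy targets with one outlier

def squared_loss(y, F):
    return 0.5 * (y - F) ** 2

def absolute_loss(y, F):
    return np.abs(y - F)

# Step 2: search a grid of constants for the one with the lowest total loss
candidates = np.linspace(0, 12, 1201)
best_sq = candidates[np.argmin([squared_loss(y, c).sum() for c in candidates])]
best_abs = candidates[np.argmin([absolute_loss(y, c).sum() for c in candidates])]
print("Best constant, squared loss: ", best_sq, "(mean =", y.mean(), ")")
print("Best constant, absolute loss:", best_abs, "(median =", np.median(y), ")")

# Step 3: negative gradient of each loss at the initial prediction
F = np.full_like(y, y.mean())
neg_grad_sq = y - F              # squared loss: the ordinary residuals
neg_grad_abs = np.sign(y - F)    # absolute loss: only the sign of the residual
print("Pseudo-residuals, squared loss: ", neg_grad_sq)
print("Pseudo-residuals, absolute loss:", neg_grad_abs)
```

With absolute loss the outlier contributes a pseudo-residual of the same magnitude as every other point, which is why that loss is less sensitive to outliers than squared error.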

The intuition behind these steps is that each new weak learner approximates the negative gradient of the loss at the current predictions, so adding it moves the model a small step in the direction that reduces the loss. The learning rate sets the size of that step: a small value limits the contribution of each learner and helps avoid overfitting, at the cost of needing more iterations.

Overall, the Gradient Boosting algorithm combines many weak learners into a strong learner that can make accurate predictions on unseen data. Viewing it as gradient descent on a loss function also explains how the same procedure extends to any differentiable loss.




