## 17 APRIL

Q1. What is Gradient Boosting Regression?



    Gradient Boosting Regression is a machine learning technique used for regression problems. It works by combining the predictions of multiple weak regression models, typically decision trees, into a strong regression model. The algorithm sequentially trains these weak models, each one focused on the residuals (the differences between the actual and predicted values) of the previous models. By iteratively reducing the errors, it constructs a robust and accurate regression model.



Q2. Implement a simple gradient boosting algorithm from scratch using Python and NumPy. Use a simple regression problem as an example and train the model on a small dataset. Evaluate the model's performance using metrics such as mean squared error and Rsquared.

In [2]:
import seaborn as sns 

In [20]:
df=sns.load_dataset("tips")

In [21]:
df.head()

Unnamed: 0,total_bill,tip,sex,smoker,day,time,size
0,16.99,1.01,Female,No,Sun,Dinner,2
1,10.34,1.66,Male,No,Sun,Dinner,3
2,21.01,3.5,Male,No,Sun,Dinner,3
3,23.68,3.31,Male,No,Sun,Dinner,2
4,24.59,3.61,Female,No,Sun,Dinner,4


In [24]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype   
---  ------      --------------  -----   
 0   total_bill  244 non-null    float64 
 1   tip         244 non-null    float64 
 2   sex         244 non-null    int64   
 3   smoker      244 non-null    category
 4   day         244 non-null    category
 5   time        244 non-null    category
 6   size        244 non-null    int64   
dtypes: category(3), float64(2), int64(2)
memory usage: 8.9 KB


In [25]:
from sklearn.preprocessing import LabelEncoder

In [26]:
lbl=LabelEncoder()

In [27]:
df["sex"]=lbl.fit_transform(df["sex"])
df["smoker"]=lbl.fit_transform(df["smoker"])
df["day"]=lbl.fit_transform(df["day"])
df["time"]=lbl.fit_transform(df["time"])

In [28]:
X=df.drop("total_bill",axis=1)
y=df["total_bill"]

In [29]:
from sklearn.model_selection import train_test_split

In [15]:
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.30,random_state=42)

In [30]:
from sklearn.ensemble import GradientBoostingRegressor

In [31]:
GBC=GradientBoostingRegressor()

In [32]:
GBC.fit(X_train,y_train)

In [33]:
y=GBC.predict(X_test)

In [34]:
from sklearn.metrics import r2_score,mean_absolute_error,mean_squared_error

In [35]:
print("r2_score",r2_score(y,y_test))
print("mean_absolute_error",mean_absolute_error(y,y_test))
print("mean_squared_error",mean_squared_error(y,y_test))

r2_score 0.9887290811288316
mean_absolute_error 0.037700091599628886
mean_squared_error 0.007756990694769767


Q3. Experiment with different hyperparameters such as learning rate, number of trees, and tree depth to optimize the performance of the model. Use grid search or random search to find the best hyperparameters.


In [36]:
param={
    "loss":["squared_error","absolute_error","huber","quantile"],
    "learning_rate":[0.1,0.2,0.3],
    "n_estimators":[100,200,300],
    
}

In [37]:
from sklearn.model_selection import GridSearchCV

In [41]:
clf=GridSearchCV(GBC,param_grid=param,scoring=r2_score,cv=3,verbose=5)

In [44]:
import warnings
warnings.filterwarnings("ignore")

In [45]:
clf.fit(X_train,y_train)

Fitting 3 folds for each of 36 candidates, totalling 108 fits
[CV 1/3] END learning_rate=0.1, loss=squared_error, n_estimators=100;, score=nan total time=   0.1s
[CV 2/3] END learning_rate=0.1, loss=squared_error, n_estimators=100;, score=nan total time=   0.1s
[CV 3/3] END learning_rate=0.1, loss=squared_error, n_estimators=100;, score=nan total time=   0.1s
[CV 1/3] END learning_rate=0.1, loss=squared_error, n_estimators=200;, score=nan total time=   0.1s
[CV 2/3] END learning_rate=0.1, loss=squared_error, n_estimators=200;, score=nan total time=   0.1s
[CV 3/3] END learning_rate=0.1, loss=squared_error, n_estimators=200;, score=nan total time=   0.1s
[CV 1/3] END learning_rate=0.1, loss=squared_error, n_estimators=300;, score=nan total time=   0.2s
[CV 2/3] END learning_rate=0.1, loss=squared_error, n_estimators=300;, score=nan total time=   0.2s
[CV 3/3] END learning_rate=0.1, loss=squared_error, n_estimators=300;, score=nan total time=   0.2s
[CV 1/3] END learning_rate=0.1, loss=a

In [46]:
clf.best_params_

{'learning_rate': 0.1, 'loss': 'squared_error', 'n_estimators': 100}

In [47]:
y_pred=clf.predict(X_test)

In [48]:
print("r2_score",r2_score(y_pred,y_test))
print("mean_absolute_error",mean_absolute_error(y_pred,y_test))
print("mean_squared_error",mean_squared_error(y_pred,y_test))

r2_score 0.9889394919086069
mean_absolute_error 0.03785790522740534
mean_squared_error 0.007602494841732722


Q4. What is a weak learner in Gradient Boosting?


    A weak learner in Gradient Boosting is a simple model, often a shallow decision tree, that performs slightly better than random guessing but is not highly accurate on its own. Weak learners are the building blocks of the ensemble in Gradient Boosting. The algorithm sequentially trains and combines these weak learners, with each one focused on the errors made by the previous ones, to create a strong and accurate predictive model.

    
    
    
Q5. What is the intuition behind the Gradient Boosting algorithm?



    The intuition behind Gradient Boosting is to build a strong model by sequentially correcting the errors of previous models. It starts with a weak learner and iteratively adds new weak learners, each trained to capture the errors (residuals) of the ensemble's current prediction. This process effectively reduces the overall error and leads to a powerful predictive model.

    
    
    
Q6. How does the Gradient Boosting algorithm build an ensemble of weak learners?



    Gradient Boosting builds an ensemble by training weak learners sequentially. It starts with an initial prediction (often the mean of the target values) and then adds weak learners that are trained to predict the errors (residuals) of the previous ensemble's predictions. These weak learners are combined, and the process continues until a predetermined number of iterations are reached or until performance plateaus.

    
    
    
Q7. What are the steps involved in constructing the mathematical intuition of the Gradient Boosting algorithm?



    The mathematical intuition of Gradient Boosting involves several steps:
     1. Initialize the model with a simple prediction (e.g., the mean of target values).
     2. Calculate the residuals (errors) between the actual and initial predicted values.
     3. Train a weak learner (e.g., decision tree) to predict these residuals.
     4. Update the model by adding the predictions of the weak learner, scaled by a learning rate.
     5. Repeat steps 24 to iteratively improve predictions and reduce residuals.
     6. The final ensemble prediction is the sum of all individual weak learner predictions. This process minimizes the overall error and creates a robust model.

