# In this notebook we'll implement batch gradient descent from scratch.

## Gradient Descent
**The goal of gradient descent is to minimise a given function, which in our case is the loss function of the model. To achieve this, it repeats two steps:**
1. Compute the slope (gradient), i.e. the first-order derivative of the function at the current point.
2. Move from the current point in the direction opposite to the gradient, taking a step equal to the learning rate times the gradient.
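As a minimal sketch of these two steps, the loop below minimises the toy function f(x) = (x - 3)², whose derivative is 2(x - 3). The function `gradient_descent_1d`, the starting point and the learning rate are illustrative assumptions, not part of the original notebook.

```python
def gradient_descent_1d(grad, x0, learning_rate=0.1, steps=100):
    """Minimise a 1-D function, given its derivative `grad`, by gradient descent."""
    x = x0
    for _ in range(steps):
        slope = grad(x)                 # step 1: gradient at the current point
        x = x - learning_rate * slope   # step 2: move against the gradient
    return x

# Toy example (assumption): f(x) = (x - 3)**2, so f'(x) = 2 * (x - 3)
x_min = gradient_descent_1d(lambda x: 2 * (x - 3), x0=0.0)
print(round(x_min, 4))  # prints 3.0
```

Each step shrinks the distance to the minimum by a factor of 0.8 here, so 100 steps are plenty; a much larger learning rate would overshoot instead.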

![](https://miro.medium.com/max/875/1*P7z2BKhd0R-9uyn9ThDasA.png)

## Batch Gradient Descent
**In Batch Gradient Descent, the entire training set is used to take a single step. We average the gradients over all training examples and use that mean gradient to update the parameters, so each epoch performs exactly one gradient-descent step.**
***Batch Gradient Descent works well for convex or relatively smooth error manifolds.***
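A sketch of what "one step per epoch" means for a linear model with squared-error loss (the model used later in this notebook). The data is random and synthetic and the variable names are illustrative: per-example gradients are averaged into one batch gradient, which is checked against the vectorised form and then used for a single update.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))   # synthetic features: 50 examples, 3 features
y = rng.normal(size=50)        # synthetic targets
w = np.zeros(3)                # current coefficients
b = 0.0                        # current intercept
lr = 0.1

# Gradient of (y_i - (x_i . w + b))**2 for each example separately
per_example_w = []
per_example_b = []
for x_i, y_i in zip(X, y):
    residual = y_i - (x_i @ w + b)
    per_example_w.append(-2 * residual * x_i)
    per_example_b.append(-2 * residual)

# Batch gradient descent: average over ALL examples ...
grad_w = np.mean(per_example_w, axis=0)
grad_b = np.mean(per_example_b)

# ... which equals the vectorised mean gradient
residuals = y - (X @ w + b)
assert np.allclose(grad_w, -2 * residuals @ X / len(y))
assert np.isclose(grad_b, -2 * residuals.mean())

# ... and then take ONE update for the whole epoch
w = w - lr * grad_w
b = b - lr * grad_b
```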

![](https://miro.medium.com/max/735/1*44QbDJ9gJvw8tXtHNVLoCA.png)

*Documentation reference: www.towardsdatascience.com*

In [1]:
# Import libraries; scikit-learn's built-in diabetes dataset is used throughout.
from sklearn.datasets import load_diabetes

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

In [2]:
X,y = load_diabetes(return_X_y=True)

In [3]:
print(X.shape)
print(y.shape)

(442, 10)
(442,)


In [4]:
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=2)  # train test split

In [5]:
reg = LinearRegression()
reg.fit(X_train,y_train)

LinearRegression()

In [6]:
print(reg.coef_)
print(reg.intercept_)

[  -9.16088483 -205.46225988  516.68462383  340.62734108 -895.54360867
  561.21453306  153.88478595  126.73431596  861.12139955   52.41982836]
151.88334520854633


## R2 Score
![](https://wikimedia.org/api/rest_v1/media/math/render/svg/6b863cb70dd04b45984983cb6ed00801d5eddc94)

*R-squared is a statistical measure of how close the data are to the fitted regression line. It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression. ... 100% indicates that the model explains all the variability of the response data around its mean*
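The formula can be computed directly with NumPy as R² = 1 - SS_res / SS_tot. `r2_from_scratch` is a hypothetical helper; for one-dimensional targets it should agree with `sklearn.metrics.r2_score`, and the four-value example is made up for illustration.

```python
import numpy as np

def r2_from_scratch(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares around the mean
    return 1 - ss_res / ss_tot

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(round(r2_from_scratch(y_true, y_pred), 4))  # prints 0.9486
```

A perfect prediction gives 1.0, and always predicting the mean of `y_true` gives 0.0.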

In [7]:
y_pred = reg.predict(X_test)
r2_score(y_test,y_pred)

0.4399387660024644

In [8]:
X_train.shape

(353, 10)

## Updating Intercept
![](https://miro.medium.com/max/1192/0*VHSZPjAofkxFk3i2.png)
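For reference, the derivatives that the implementation below uses for a mean squared error loss can be written as follows. The symbols (b for the intercept, β_j for the coefficients, η for the learning rate, n training rows, m features) are our own notation and may differ from the image.

```latex
\hat{y}_i = b + \sum_{j=1}^{m} \beta_j x_{ij}, \qquad
L = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2

\frac{\partial L}{\partial b} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right), \qquad
\frac{\partial L}{\partial \beta_j} = -\frac{2}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right) x_{ij}

b \leftarrow b - \eta \, \frac{\partial L}{\partial b}, \qquad
\beta_j \leftarrow \beta_j - \eta \, \frac{\partial L}{\partial \beta_j}
```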

**Step 1**

To calculate y hat (the predictions), we take the matrix multiplication (dot product) of X_train (matrix) and the coefficients (vector), then add the intercept.

**Step 2**

Using y hat, calculate the derivative of the loss with respect to the intercept: -2 times the mean of (y_train - y hat), as in the formula in the image above. The factor -2 comes from differentiating the squared error.

**Step 3**

With the derivative of the loss function w.r.t. the intercept computed, the new intercept is (intercept_old - learning_rate*intercept_derivative).
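A minimal sketch of Steps 1 to 3 as a standalone function. `update_intercept` and the three-point dataset (generated from y = 2x + 1) are hypothetical; the coefficient is held fixed at its true value so that only the intercept moves.

```python
import numpy as np

def update_intercept(X, y, coef, intercept, learning_rate):
    """One gradient-descent update of the intercept (Steps 1 to 3)."""
    y_hat = np.dot(X, coef) + intercept                # Step 1: predictions
    intercept_der = -2 * np.mean(y - y_hat)            # Step 2: derivative w.r.t. intercept
    return intercept - learning_rate * intercept_der   # Step 3: move against the derivative

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([3.0, 5.0, 7.0])   # exactly y = 2x + 1
coef = np.array([2.0])          # held fixed at its true value

first = update_intercept(X, y, coef, intercept=0.0, learning_rate=0.1)
print(first)  # prints 0.2

# Repeating the update drives the intercept towards its true value, 1.0
b = first
for _ in range(100):
    b = update_intercept(X, y, coef, b, learning_rate=0.1)
print(round(float(b), 6))  # prints 1.0
```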

## Updating Coefficients

**Step 1**

The number of coefficients equals the number of features, so each epoch needs that many derivatives. Instead of computing them one at a time, take the dot product of the residual vector (y_train - y hat) with the whole X_train matrix, multiply by -2 and divide by the number of training rows; this gives the vector of all coefficient derivatives in one operation.

**Step 2**

With the vector of coefficient derivatives in hand, update all coefficients at once: (coef_old - learning_rate*coef_derivative)
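The sketch below, on random synthetic data, checks that the single dot product gives the same derivatives as computing one per feature, and then applies Step 2. Variable names mirror the notebook, but the data and learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(20, 4))   # synthetic data: 20 rows, 4 features
y = rng.normal(size=20)
coef = np.ones(4)              # same initialisation as the notebook
intercept = 0.0
learning_rate = 0.1

residual = y - (np.dot(X, coef) + intercept)

# One derivative per coefficient, computed feature by feature
coef_der_loop = np.array([-2 * np.mean(residual * X[:, j]) for j in range(X.shape[1])])

# All derivatives at once: dot product of the residual vector with X
coef_der = -2 * np.dot(residual, X) / X.shape[0]
assert np.allclose(coef_der, coef_der_loop)

# Step 2: update every coefficient in a single vector operation
coef = coef - learning_rate * coef_der
```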

***IMPLEMENTATION*** 

In [9]:
class GDRegressor:
    
    def __init__(self,learning_rate=0.01,epochs=100):
        
        self.coef_ = None
        self.intercept_ = None
        self.lr = learning_rate
        self.epochs = epochs
        
    def fit(self,X_train,y_train):
        # start with intercept 0 and one coefficient per feature (all ones)
        self.intercept_ = 0
        self.coef_ = np.ones(X_train.shape[1]) # X_train.shape[1] = number of columns (10 for shape (353, 10))
        
        for _ in range(self.epochs):
            # Step 1: predictions with the current parameters
            y_hat = np.dot(X_train,self.coef_) + self.intercept_
            # Steps 2-3: derivative of the MSE w.r.t. the intercept, then update it
            intercept_der = -2 * np.mean(y_train - y_hat)
            self.intercept_ = self.intercept_ - (self.lr * intercept_der)
            
            # all coefficient derivatives at once (vectorised), then update the coefficients
            coef_der = -2 * np.dot((y_train - y_hat),X_train)/X_train.shape[0]
            self.coef_ = self.coef_ - (self.lr * coef_der)
        
        # show the learned intercept and coefficients
        print(self.intercept_,self.coef_)
    
    def predict(self,X_test):
        return np.dot(X_test,self.coef_) + self.intercept_

In [10]:
gdr = GDRegressor(epochs=1000,learning_rate=0.5)  # building model

In [11]:
gdr.fit(X_train,y_train)  # fitting data

152.0135263267291 [  14.38915082 -173.72674118  491.54504015  323.91983579  -39.32680194
 -116.01099114 -194.04229501  103.38216641  451.63385893   97.57119174]
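The coefficients printed above differ noticeably from LinearRegression's (for example -39.3 vs -895.5 for the fifth feature), which suggests that 1000 epochs at this learning rate have not fully converged to the least-squares solution. One hedged way to check that the update rules themselves are right is to run them on synthetic data and compare with the closed-form least-squares fit from `np.linalg.lstsq`. `batch_gd` is a hypothetical, print-free copy of the loop in `GDRegressor.fit`; the data and hyperparameters are assumptions.

```python
import numpy as np

def batch_gd(X, y, learning_rate=0.1, epochs=2000):
    """Same updates as GDRegressor.fit, but returns (intercept, coef) instead of printing."""
    intercept = 0.0
    coef = np.ones(X.shape[1])
    n = X.shape[0]
    for _ in range(epochs):
        y_hat = np.dot(X, coef) + intercept
        intercept_der = -2 * np.mean(y - y_hat)
        coef_der = -2 * np.dot(y - y_hat, X) / n
        intercept = intercept - learning_rate * intercept_der
        coef = coef - learning_rate * coef_der
    return intercept, coef

# Synthetic data (assumption): known coefficients, intercept 4.0, small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_coef = np.array([1.5, -2.0, 0.5])
y = X @ true_coef + 4.0 + rng.normal(scale=0.1, size=200)

b_gd, w_gd = batch_gd(X, y)

# Closed-form least squares; the column of ones plays the role of the intercept
A = np.column_stack([np.ones(len(X)), X])
solution, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.isclose(b_gd, solution[0]), np.allclose(w_gd, solution[1:]))  # prints True True
```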


In [12]:
y_pred = gdr.predict(X_test)  # testing on X_test

In [13]:
r2_score(y_test,y_pred)  # slightly higher than LinearRegression's 0.4399 on this split; changing learning_rate (and epochs) changes this score

0.4534524671450598
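To illustrate the comment above about the learning rate, here is a sketch on synthetic data that reports the training MSE after a fixed 100 epochs for three learning rates. `mse_after`, the data and the rates are assumptions: the smallest rate leaves the loss high, the moderate rate converges, and the largest rate makes the updates overshoot so the loss blows up.

```python
import numpy as np

def mse_after(X, y, learning_rate, epochs=100):
    """Training MSE after `epochs` batch gradient-descent updates (same rules as GDRegressor)."""
    intercept = 0.0
    coef = np.ones(X.shape[1])
    n = X.shape[0]
    for _ in range(epochs):
        y_hat = np.dot(X, coef) + intercept
        intercept_der = -2 * np.mean(y - y_hat)
        coef_der = -2 * np.dot(y - y_hat, X) / n
        intercept = intercept - learning_rate * intercept_der
        coef = coef - learning_rate * coef_der
    return np.mean((y - (np.dot(X, coef) + intercept)) ** 2)

# Synthetic, noise-free data (assumption)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + 4.0

results = {lr: mse_after(X, y, lr) for lr in (0.001, 0.1, 1.5)}
for lr, mse in results.items():
    print(f"learning_rate={lr}: training MSE after 100 epochs = {mse:.3g}")
```

Which rates count as "too small" or "too large" depends on the scale of the features, so the values that work here will not carry over to the diabetes data directly.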