### Theory & Maths about Linear Regression

1. Training a linear regression model means finding the coefficients that best fit the data
2. The best coefficients can be found with the gradient descent algorithm

### Math behind linear regression

1. y = mx + c (m = slope, c = intercept)
2. The weights (slopes) and bias (intercept) can be found with gradient descent

$$ \large \hat{y} = wx + b $$

Cost function: minimize the Mean Squared Error (MSE)
$$ \large MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 $$

$$ \large MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - (wx_i + b))^2 $$
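As a small illustration of the cost function, the sketch below evaluates the MSE of a line on a few points. The helper `mse` and the toy data are hypothetical and chosen only for this example; they are not part of the notebook's code.

```python
def mse(xs, ys, w, b):
    """Mean squared error of the line y_hat = w * x + b over paired samples."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

perfect = mse(xs, ys, w=2.0, b=1.0)  # the line passes through every point
poor = mse(xs, ys, w=0.0, b=0.0)     # predicts 0 for every x

print(perfect, poor)
```

The better the line fits, the smaller the MSE, which is why training amounts to searching for the `w` and `b` that minimize it.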

Partial derivatives of the MSE with respect to each parameter
$$ \large \partial_w = \frac{1}{N} \sum_{i=1}^{N} 2x_i(\hat{y}_i - y_i) $$
$$ \large \partial_b = \frac{1}{N} \sum_{i=1}^{N} 2(\hat{y}_i - y_i) $$
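One way to sanity-check these derivatives is to compare them with central finite differences of the MSE. The sketch below does this for a single feature; the function names, the data, and the step size `eps` are assumptions made for illustration.

```python
def mse(xs, ys, w, b):
    """Mean squared error of y_hat = w * x + b."""
    n = len(xs)
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / n


def gradients(xs, ys, w, b):
    """Analytic partial derivatives of the MSE with respect to w and b."""
    n = len(xs)
    dw = sum(2 * x * ((w * x + b) - y) for x, y in zip(xs, ys)) / n
    db = sum(2 * ((w * x + b) - y) for x, y in zip(xs, ys)) / n
    return dw, db


# Hypothetical data and an arbitrary point (w, b) at which to compare.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.5, 5.5, 6.0]
w, b, eps = 0.5, -0.3, 1e-6

dw, db = gradients(xs, ys, w, b)

# Central differences approximate each partial derivative numerically.
num_dw = (mse(xs, ys, w + eps, b) - mse(xs, ys, w - eps, b)) / (2 * eps)
num_db = (mse(xs, ys, w, b + eps) - mse(xs, ys, w, b - eps)) / (2 * eps)

print(dw, num_dw)
print(db, num_db)
```

If the analytic and numerical values agree to several decimal places, the formulas and their implementation are very likely consistent.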

Updating weights and bias (α is the learning rate)

$$ \large w = w - \alpha \cdot \partial_w $$
$$ \large b = b - \alpha \cdot \partial_b $$
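The update rule can be sketched for a single feature in plain Python. This is a minimal illustration, not the class used in this notebook: the function `gradient_descent`, its default learning rate and iteration count, and the synthetic data are all assumptions.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iterations=2000):
    """Fit y_hat = w * x + b by repeatedly stepping against the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        # Residuals (y_hat - y) at the current parameters
        residuals = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Both gradients are computed before either parameter is changed
        grad_w = sum(2 * x * r for x, r in zip(xs, residuals)) / n
        grad_b = sum(2 * r for r in residuals) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = gradient_descent(xs, ys)
print(w, b)  # should land close to w = 2, b = 1
```

The learning rate controls the step size: a value that is too large makes the updates overshoot and diverge, while a value that is too small needs many more iterations to converge.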

In [21]:
# important packages
from sklearn.datasets import load_diabetes # toy dataset
from sklearn.model_selection import train_test_split # to split datasets
from sklearn.linear_model import LinearRegression # for comparison
from sklearn.metrics import mean_squared_error # to evaluate loss function
import numpy as np

In [22]:
class MyLinearRegression:
    
    def __init__(self, learning_rate=0.01, no_iterations=10000):
        self.learning_rate = learning_rate
        self.no_iterations = no_iterations
        self.weights, self.bias = None, None
        self.loss = []
    
    @staticmethod
    def _mean_sqr_error(y, y_hat):
        # Mean of the squared residuals, vectorized over all samples
        return np.mean((y - y_hat) ** 2)
    
    def fit(self, X, y):
        # Start from zero weights and bias; reset the loss history on each fit
        self.weights = np.zeros(X.shape[1])
        self.bias = 0
        self.loss = []
        
        for i in range(self.no_iterations):
            y_hat = np.dot(X, self.weights) + self.bias
            loss = self._mean_sqr_error(y, y_hat)
            self.loss.append(loss)
            
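            # Gradients of the MSE with respect to the weights and the bias
            partial_w = (1 / X.shape[0]) * (2 * np.dot(X.T, (y_hat - y)))
            partial_b = (1 / X.shape[0]) * (2 * np.sum(y_hat - y))
            
            # Step against the gradient, scaled by the learning rate
            self.weights -= self.learning_rate * partial_w
            self.bias -= self.learning_rate * partial_b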
            
    def predict(self, X):
        return np.dot(X, self.weights) + self.bias
        

In [23]:
data = load_diabetes()
X = data.data
y = data.target

In [24]:
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)

In [25]:
model = MyLinearRegression()
model.fit(X_train, y_train)
my_pred = model.predict(X_test)
model.weights

array([  57.01554737,  -44.13781863,  269.89143674,  192.64514651,
         27.72222515,    2.18960209, -147.70003766,  136.38206323,
        224.11121915,  134.72758341])

In [26]:
model.bias

152.2631135652031

In [27]:
mean_squared_error(y_test, my_pred)

3090.645533651321

In [28]:
sk_model = LinearRegression()
sk_model.fit(X_train, y_train)
sk_pred = sk_model.predict(X_test)
sk_pred

array([139.5483133 , 179.52030578, 134.04133298, 291.41193598,
       123.78723656,  92.17357677, 258.23409704, 181.33895238,
        90.22217862, 108.63143298,  94.13938654, 168.43379636,
        53.50669663, 206.63040068, 100.13238561, 130.66881649,
       219.53270758, 250.78291772, 196.36682356, 218.57497401,
       207.35002447,  88.48361667,  70.43428801, 188.95725301,
       154.88720039, 159.35957695, 188.31587948, 180.38835506,
        47.98988446, 108.97514644, 174.78080029,  86.36598906,
       132.95890535, 184.5410226 , 173.83298051, 190.35863287,
       124.41740796, 119.65426903, 147.95402494,  59.05311211,
        71.62636914, 107.68722902, 165.45544477, 155.00784964,
       171.04558668,  61.45763075,  71.66975626, 114.96330486,
        51.57808027, 167.57781958, 152.52505798,  62.95827693,
       103.49862017, 109.20495627, 175.63844013, 154.60247734,
        94.41476124, 210.74244148, 120.25601864,  77.61590087,
       187.93503183, 206.49543321, 140.63018684, 105.59

In [29]:
mean_squared_error(y_test, sk_pred)

2900.1732878832318