<h1 style="text-align: center; color: #007bff; font-size: 2em;">
    📘✏️ Create a <span style="color: #ff5733;">Linear Regression</span> Model from Scratch 🚀📊
</h1>

### Content
1. [Analysing what the Linear Regression model does](#1)
2. [Estimating the Loss](#2)
3. [Minimizing the Cost Function $J(w, b)$](#3)
4. [Gradient Descent and Finding the Best Fit](#4)
5. [Creating the Model](#5)

In [101]:
import numpy as np
import matplotlib.pyplot as plt

In [102]:
np.random.seed(42)

<a name='1'></a>
### Analysing what the Linear Regression model does

We can think of this model as a neural network with one layer and one neuron, without an activation function. 
___

Now, let's take a neural network with its parameters already tuned and analyze what it does:

1. Takes an input matrix of the data with dimensions $\displaystyle (m, n_x) $, that is, $m$ samples with $n_x$ features each.
2. Outputs an array of predictions for the target value, one per sample.
3. The prediction is generated by a linear function (a line, or a hyperplane when $n_x > 1$) that follows the patterns in the data as closely as possible.

Let's make a function that generates the $w$ and $b$ parameters:

In [42]:
def generate_params(dim):
    ''' Takes as input the number of features in the training set (n_x) '''
    w = np.zeros((dim, 1))
    b = 0.0
    return w, b

So let's make a function that takes $X, w, b$ as input and outputs the prediction ($\hat{y} = Xw + b$). Each row of $X$ is one sample, so the product gives one prediction per sample and $\hat{y}$ has shape $(m, 1)$.

$$
\hat{y} = 
\begin{pmatrix} 
    x_{1,1} & \cdots & x_{1,n_x}\\
    \vdots & \ddots & \vdots\\
    x_{m, 1} & \cdots & x_{m, n_x}\\
\end{pmatrix}
\cdot
\begin{pmatrix} 
    w_1 \\ 
    \vdots\\
    w_{n_x} 
\end{pmatrix}
+ b
$$

In [58]:
def predict(X, w, b):
    ''' X has shape (m, n_x) and w has shape (n_x, 1), so the result has shape (m, 1) '''
    return np.dot(X, w) + b
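
As a quick sanity check of the shapes, here is a small sketch with made-up numbers. The names `X_demo`, `w_demo` and `b_demo` and their values are assumptions for illustration only, not part of any dataset. The prediction is computed as $Xw + b$, so each of the $m$ rows of $X$ yields one predicted value.

```python
import numpy as np

# Hypothetical toy data: m = 3 samples, n_x = 2 features
X_demo = np.array([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
w_demo = np.array([[0.5],
                   [-1.0]])   # shape (n_x, 1)
b_demo = 2.0

# One prediction per row of X: y_hat = X w + b
y_hat_demo = np.dot(X_demo, w_demo) + b_demo

print(y_hat_demo.shape)    # (3, 1)
print(y_hat_demo.ravel())  # values 0.5, -0.5, -1.5
```

For the first row, for example, the prediction is $0.5 \cdot 1 + (-1) \cdot 2 + 2 = 0.5$.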

<a name='2'></a>
### Estimating the Loss
Now that we know how to make predictions, we need a way to train the model to find the optimal values for $w$ and $b$.
* So how could we check which line fits the data best?
To find out which line is best, we can check how far it is from every data point in the training set, or in other words, the difference between $y$ and $\hat{y}$.
* $y$ is a vector of all the true values and $\hat{y}$ is a vector of all the predicted values.

The formula for this is the mean squared error (MSE): the squared L2 norm $\displaystyle||y - \hat{y}||_2^2$ divided by the number of samples $m$, which is $\displaystyle\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$

In [44]:
def mse_cost(y, y_hat):
    ''' Mean Squared Error (MSE) cost function '''
    return np.mean((y - y_hat)**2)
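
To make the formula concrete, here is a tiny worked example. The two arrays are made-up values chosen so the arithmetic is easy to follow by hand, and the variable names are hypothetical. It also shows that the mean of the squared errors equals the squared L2 norm divided by $m$.

```python
import numpy as np

# Hypothetical true values and predictions for m = 3 samples
y_true_demo = np.array([3.0, 5.0, 7.0])
y_pred_demo = np.array([2.0, 5.0, 9.0])

# Errors are 1, 0, -2; squared they are 1, 0, 4; their mean is 5/3
mse_demo = np.mean((y_true_demo - y_pred_demo) ** 2)

# The same number, written as the squared L2 norm divided by m
mse_from_norm = np.linalg.norm(y_true_demo - y_pred_demo) ** 2 / len(y_true_demo)

print(mse_demo, mse_from_norm)
```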

<a name='3'></a>
### Minimizing the Cost Function $J(w, b)$
Now we have a function that calculates how well a certain line fits the data. The next step is to minimize the cost function to find the optimal values for $w$ and $b$.
To find the minimum, we take the derivative of the cost function with respect to each parameter.



<div style="border-left: 4px solid #007bff; padding: 10px; background-color: #cce5ff; color: #004085;">
📘 <b>Note:</b> the intuition is that at the minimum of $J$ the slope is zero, so when both $\frac{\partial{J}}{\partial{w}} \to 0$ and $\frac{\partial{J}}{\partial{b}} \to 0$ you have found the best values for $w$ and $b$
</div>

So let's take the derivative of the cost function with respect to each parameter. This will yield a way of telling how close we are to the best fit.
$\displaystyle J(w, b) = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - (w^Tx_i + b)\right)^2$

$\displaystyle\frac{\partial{J}}{\partial{w}} = \frac{1}{m}\sum_{i=1}^{m}2\left(y_i - (w^Tx_i + b)\right)^{2-1}\frac{\partial}{\partial{w}}\left(y_i - (w^Tx_i + b)\right) = \frac{1}{m}\sum_{i=1}^{m}2\left(y_i - (w^Tx_i + b)\right)(-x_i) = \color{#cce5ff}{\frac{-2}{m}\sum_{i=1}^{m}x_i\left(y_i - w^Tx_i - b\right)}$

$\displaystyle\frac{\partial{J}}{\partial{b}} = \frac{1}{m}\sum_{i=1}^{m}2\left(y_i - (w^Tx_i + b)\right)^{2-1}\frac{\partial}{\partial{b}}\left(y_i - (w^Tx_i + b)\right) = \frac{1}{m}\sum_{i=1}^{m}-2\left(y_i - (w^Tx_i + b)\right) = \color{#cce5ff}{\frac{-2}{m}\sum_{i=1}^{m}\left(y_i - w^Tx_i - b\right)}$
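
For the code it helps to write the two sums in matrix form. This is only a restatement of the results above, assuming the $m$ samples are stacked as the rows of $X$ (shape $(m, n_x)$), $y$ is a column vector of shape $(m, 1)$, and $\mathbf{1}$ denotes a column of $m$ ones (a symbol introduced here for illustration):

$$
\frac{\partial{J}}{\partial{w}} = \frac{-2}{m}X^T\left(y - Xw - b\mathbf{1}\right) \quad \frac{\partial{J}}{\partial{b}} = \frac{-2}{m}\mathbf{1}^T\left(y - Xw - b\mathbf{1}\right)
$$

With this layout $\frac{\partial{J}}{\partial{w}}$ has the same shape as $w$, namely $(n_x, 1)$, and $\frac{\partial{J}}{\partial{b}}$ is a single number.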

In [45]:
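def compute_gradiants(w, b, X, y):
    ''' Returns a dictionary with dw and db, and the MSE cost '''
    m = len(y)
    y = np.reshape(y, (-1, 1))    # column vector (m, 1), so it lines up with y_hat
    y_hat = np.dot(X, w) + b      # predictions, shape (m, 1)
    error = y - y_hat
    
    dw = (-2/m) * np.dot(X.T, error)   # shape (n_x, 1), one entry per feature
    db = (-2/m) * np.sum(error)        # a single number
                         
    grads = {'dw': dw, 
             'db': db}
    cost = mse_cost(y, y_hat)
    
    return grads, cost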
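
A common way to gain confidence in derivative formulas like these is to compare them with a numerical approximation. The sketch below is self-contained and uses hypothetical helper names (`cost_demo`, `analytic_grads_demo`) and random made-up data. It computes the gradients from the formulas above and checks them against central finite differences of the cost.

```python
import numpy as np

def cost_demo(w, b, X, y):
    # MSE cost J(w, b) for the predictions X w + b
    return np.mean((y - (np.dot(X, w) + b)) ** 2)

def analytic_grads_demo(w, b, X, y):
    # Gradients from the derived formulas, written in matrix form
    m = len(y)
    error = y - (np.dot(X, w) + b)
    dw = (-2 / m) * np.dot(X.T, error)
    db = (-2 / m) * np.sum(error)
    return dw, db

# Made-up data: 20 samples, 3 features
rng = np.random.default_rng(0)
X_chk = rng.normal(size=(20, 3))
y_chk = rng.normal(size=(20, 1))
w_chk = rng.normal(size=(3, 1))
b_chk = 0.5

dw, db = analytic_grads_demo(w_chk, b_chk, X_chk, y_chk)

# Central differences: nudge one parameter at a time by +/- eps
eps = 1e-6
num_dw = np.zeros_like(w_chk)
for j in range(w_chk.shape[0]):
    step = np.zeros_like(w_chk)
    step[j, 0] = eps
    num_dw[j, 0] = (cost_demo(w_chk + step, b_chk, X_chk, y_chk)
                    - cost_demo(w_chk - step, b_chk, X_chk, y_chk)) / (2 * eps)
num_db = (cost_demo(w_chk, b_chk + eps, X_chk, y_chk)
          - cost_demo(w_chk, b_chk - eps, X_chk, y_chk)) / (2 * eps)

# Both differences should be very close to zero
print(np.max(np.abs(dw - num_dw)), abs(db - num_db))
```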

<a name='4'></a>
### Gradient Descent and Finding the Best Fit
Now that we have a way of getting the derivatives, we need a way of reaching the best fit. This is done with an algorithm called gradient descent.

Gradient descent is an algorithm that pushes the values of $w, b$ in the direction that lowers the cost.
___

What this means is that we go through a loop that runs a certain number of times, and each time we calculate $\frac{\partial{J}}{\partial{w}}, \frac{\partial{J}}{\partial{b}}$. We know these are the slopes of the function $J(w, b)$, so if we think of the function $J$ as a hill, the negative of the slope tells us which way is down.
___

What gradient descent does is move $w, b$ by $-\alpha$ times $\frac{\partial{J}}{\partial{w}} , \frac{\partial{J}}{\partial{b}}$, where $\alpha$ is the learning rate. In other words, we push $w, b$ downhill to reach the best fit. 
> In each loop we do the following operation - 

$$
b = b -\alpha\left(\frac{\partial{J}}{\partial{b}}\right) \quad w = w -\alpha\left(\frac{\partial{J}}{\partial{w}}\right)
$$

In [49]:
def gradiant_decent(w, b, X, y, learning_rate=0.001, epochs=1000):
    ''' Returns the optimal parameters for w and b and a dictionary containing the cost every 100 loops.'''
    costs_dict = {}
    for i in range(epochs):
        grads, cost = compute_gradiants(w, b, X, y)
        # step against the gradient (without modifying the caller's array in place)
        w = w - learning_rate * grads['dw']
        b = b - learning_rate * grads['db']
        
        # record the cost every 100 iterations (i % 100, not 100 % i, which divides by zero at i = 0)
        if i % 100 == 0:
            costs_dict[i] = cost 
            
    return w, b, costs_dict
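
To see the update rule at work, here is a minimal self-contained sketch on synthetic data. Everything in it is an assumption made for illustration: the data is generated from $y = 3x + 2$ plus a little noise, and the learning rate of `0.1` and the 500 iterations are arbitrary choices that happen to suit this small problem. If the updates work, `w_toy` and `b_toy` should end up near 3 and 2 and the cost should shrink.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic data: y = 3x + 2 plus a little noise
X_toy = rng.uniform(-1, 1, size=(100, 1))
y_toy = 3.0 * X_toy + 2.0 + rng.normal(scale=0.1, size=(100, 1))

w_toy = np.zeros((1, 1))
b_toy = 0.0
alpha = 0.1          # learning rate (assumed value)
m = len(y_toy)
cost_history = []

for epoch in range(500):
    error = y_toy - (np.dot(X_toy, w_toy) + b_toy)
    cost_history.append(np.mean(error ** 2))

    # Slopes of J with respect to w and b
    dw = (-2 / m) * np.dot(X_toy.T, error)
    db = (-2 / m) * np.sum(error)

    # Step downhill: parameter = parameter - alpha * slope
    w_toy = w_toy - alpha * dw
    b_toy = b_toy - alpha * db

print(w_toy.ravel(), b_toy)
print(cost_history[0], cost_history[-1])
```

A learning rate that is too large makes the cost grow instead of shrink, so watching `cost_history` is a simple way to spot a bad choice of $\alpha$.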

<a name='5'></a>
### Creating the Model
Now we have everything we need to create and train the linear regression model. 
> Now let's create a class to better track the training process, and see the model in action.

In [52]:
class Regression:
    def __init__(self, X_train, y_train):
        self.X_train = X_train
        self.y_train = y_train
        
        self.features = self.X_train.shape[1]
        self.w, self.b = generate_params(self.features)
        
    def fit(self, learning_rate=0.001, epochs=1000):
        # keep the trained parameters and the recorded costs on the instance
        self.w, self.b, self.costs = gradiant_decent(self.w, self.b, self.X_train, self.y_train, learning_rate, epochs)
        
    def plot_learning_curve(self, costs_dict=None):
        # default to the costs recorded by the last call to fit()
        if costs_dict is None:
            costs_dict = self.costs
        
        items = costs_dict.items()
        iterations = [item[0] for item in items]
        cost = [item[1] for item in items]
        
        plt.plot(iterations, cost)
        plt.xlabel('epoch')
        plt.ylabel('MSE cost')
        plt.show()
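
One possible way to check a trained model is to compare its parameters with an independent least-squares solver. The sketch below is not part of the notebook's method. It is a self-contained check on made-up data that repeats the gradient descent updates inline and uses NumPy's `np.linalg.lstsq` as the reference. All variable names, the learning rate and the number of iterations are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 samples, 2 features, known weights plus noise
X_ref = rng.uniform(-1, 1, size=(200, 2))
true_w = np.array([[1.5], [-2.0]])
y_ref = np.dot(X_ref, true_w) + 0.5 + rng.normal(scale=0.05, size=(200, 1))
m = len(y_ref)

# Gradient descent, using the same update rule as in the notebook
w_gd = np.zeros((2, 1))
b_gd = 0.0
for _ in range(2000):
    error = y_ref - (np.dot(X_ref, w_gd) + b_gd)
    w_gd = w_gd - 0.1 * (-2 / m) * np.dot(X_ref.T, error)
    b_gd = b_gd - 0.1 * (-2 / m) * np.sum(error)

# Reference: append a column of ones so the solver also fits the intercept
A = np.hstack([X_ref, np.ones((m, 1))])
theta, *_ = np.linalg.lstsq(A, y_ref, rcond=None)
w_ls, b_ls = theta[:2], theta[2, 0]

print(w_gd.ravel(), b_gd)
print(w_ls.ravel(), b_ls)
```

If the two printed lines agree closely, gradient descent has converged to the same line that minimizes the MSE cost.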