<h2 style="text-align:center;">Linear Regression</h2>
<p>The idea of regression is to predict continuous values.</p>
<p>The goal of linear regression is to find the line that best fits a set of data points, as shown:</p>
<img src="../images/linear_regression.png">
<p>From algebra, recall that the equation of a line is:</p>



\begin{align}
\hat{y} = wx + b
\end{align}

<p>where:</p>
<p style="text-align:center;"><i>y_hat</i>  is the approximation</p>
<p style="text-align:center;"><i>w</i>  is the slope or weight</p>
<p style="text-align:center;"><i>b</i>  is the intercept or bias</p>
<p style="text-align:center;"><i>x</i>  is the input variable</p>
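<p>As a minimal NumPy sketch, the line equation can be evaluated for many inputs at once. The helper name <code>predict_line</code> and the example values of <i>w</i> and <i>b</i> are hypothetical, chosen only for illustration.</p>

```python
import numpy as np

def predict_line(x, w, b):
    """Hypothetical helper: return y_hat = w * x + b for a scalar or array x."""
    return w * np.asarray(x, dtype=float) + b

# Example: slope (weight) w = 2, intercept (bias) b = 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y_hat = predict_line(x, w=2.0, b=1.0)
print(y_hat)  # [1. 3. 5. 7.]
```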

<p>To find <i>w</i> and <i>b</i>, we update their values iteratively until the line fits the data as well as possible.</p>
<p>To measure how well a given line fits, we use the <strong>Mean Squared Error</strong> (MSE) as the cost function.</p>
<p>For each of the <i>N</i> samples, we take the difference between the actual value <i>y_i</i> and the approximated value <i>(wx_i + b)</i>.</p>
<p>Each difference is squared so that positive and negative errors do not cancel out, and the sum is divided by the number of samples to give the mean:</p>



\begin{align}
MSE = J(w,b) = \frac{1}{N} \sum_{i=1}^{N} (y_i - (w x_i + b))^2
\end{align}
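<p>The cost function translates almost term by term into NumPy. This is a sketch for a single input feature; the function name <code>mse_cost</code> and the three-point dataset are hypothetical.</p>

```python
import numpy as np

def mse_cost(w, b, x, y):
    """J(w, b) = (1/N) * sum_i (y_i - (w * x_i + b))^2 for 1-D arrays x, y."""
    residuals = y - (w * x + b)     # y_i - y_hat_i for every sample
    return np.mean(residuals ** 2)  # square each error, then divide the sum by N

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])
# With w = 2, b = 0 the predictions are [2, 4, 6], so the errors are [0, 0, 1]
print(mse_cost(2.0, 0.0, x, y))  # 0.3333333333333333
```

<p>Working it by hand: (0² + 0² + 1²) / 3 = 1/3, which matches the printed value.</p>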

<h2>Mean Squared Error</h2><p>(cost function)</p>
<p>The cost function needs to be minimized.</p>
<p>To do this, we take the partial derivative of <i>J</i> with respect to each parameter, <i>w</i> and <i>b</i>.</p>
<p>Together, these partial derivatives form the gradient:</p>


\begin{align}
J'(w,b) =
   \begin{bmatrix}
     \frac{\partial J}{\partial w}\\
     \frac{\partial J}{\partial b}\\
    \end{bmatrix} =
   \begin{bmatrix}
     \frac{1}{N} \sum_{i=1}^{N} -2x_i(y_i - (wx_i + b)) \\
     \frac{1}{N} \sum_{i=1}^{N} -2(y_i - (wx_i + b)) \\
    \end{bmatrix}
\end{align}
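<p>One way to convince yourself the gradient formula is right is to compare it with a numerical estimate from central finite differences. This is a hedged sketch; the function names, the step size <code>h</code>, and the data are assumptions for illustration only.</p>

```python
import numpy as np

def mse_cost(w, b, x, y):
    return np.mean((y - (w * x + b)) ** 2)

def mse_gradient(w, b, x, y):
    """Analytic gradient [dJ/dw, dJ/db] from the formula above."""
    residuals = y - (w * x + b)
    dw = np.mean(-2 * x * residuals)
    db = np.mean(-2 * residuals)
    return np.array([dw, db])

def numerical_gradient(w, b, x, y, h=1e-6):
    """Central finite differences, used only to check the analytic formula."""
    dw = (mse_cost(w + h, b, x, y) - mse_cost(w - h, b, x, y)) / (2 * h)
    db = (mse_cost(w, b + h, x, y) - mse_cost(w, b - h, x, y)) / (2 * h)
    return np.array([dw, db])

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 7.0])
analytic = mse_gradient(0.5, 0.0, x, y)
numeric = numerical_gradient(0.5, 0.0, x, y)
print(analytic, numeric)
```

<p>Both components come out negative here, which says that increasing <i>w</i> and <i>b</i> from (0.5, 0) would lower the cost.</p>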

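<p>Gradient descent uses the gradient to find the minimum of the cost function: the gradient points in the direction of steepest increase, so each step moves the parameters in the opposite direction.</p>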
<img src="../images/gradient.png">
<p>With each iteration, the weight and bias are updated and move toward the global minimum.</p>
<p>The update rules for the weight and bias are given below, where <i>alpha</i> is the learning rate and <i>dw</i> and <i>db</i> are the partial derivatives of <i>J</i> with respect to <i>w</i> and <i>b</i>:</p>


\begin{align}
w = w - \alpha \cdot dw
\end{align}
\begin{align}
b = b - \alpha \cdot db
\end{align}
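<p>Repeating the two update rules is all gradient descent does. The sketch below applies them to a hypothetical noise-free dataset that lies exactly on y = 2x, so the parameters should approach w = 2, b = 0. The step function name, <code>alpha = 0.05</code>, and the iteration count are illustrative assumptions.</p>

```python
import numpy as np

def gradient_descent_step(w, b, x, y, alpha):
    """Apply w = w - alpha*dw and b = b - alpha*db once, using the MSE gradient."""
    residuals = y - (w * x + b)
    dw = np.mean(-2 * x * residuals)
    db = np.mean(-2 * residuals)
    return w - alpha * dw, b - alpha * db

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])  # exactly y = 2x
w, b = 0.0, 0.0                # start from zero, as the class below does
for _ in range(1000):
    w, b = gradient_descent_step(w, b, x, y, alpha=0.05)
print(w, b)  # close to 2 and 0
```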


<p>The derivatives are computed as follows:</p>

\begin{align}
dw = \frac{\partial J}{\partial w} = \frac{1}{N} \sum_{i=1}^{N} -2x_i(y_i - (wx_i + b)) =
\frac{1}{N} \sum_{i=1}^{N} -2x_i(y_i - \hat{y}_i) = \frac{1}{N} \sum_{i=1}^{N} 2x_i(\hat{y}_i - y_i)
\end{align}
\begin{align}
db = \frac{\partial J}{\partial b} = \frac{1}{N} \sum_{i=1}^{N} -2(y_i - (wx_i + b)) =
\frac{1}{N} \sum_{i=1}^{N} -2(y_i - \hat{y}_i) = \frac{1}{N} \sum_{i=1}^{N} 2(\hat{y}_i - y_i)
\end{align}
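<p>With several input features, each sample x_i becomes a row of a matrix X and the sum over samples in <i>dw</i> can be written as a matrix product with the transpose of X, which is the form the class below uses with <code>np.dot(X.T, ...)</code>. The sketch checks that the loop form and the vectorized form agree on random synthetic data; the data shapes and seed are arbitrary assumptions.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # 50 samples, 3 features (synthetic)
y = rng.normal(size=50)
w = rng.normal(size=3)
b = 0.5

y_hat = X @ w + b
N = X.shape[0]

# Loop form: dw_j = (1/N) * sum_i 2 * x_ij * (y_hat_i - y_i)
dw_loop = np.zeros(3)
for i in range(N):
    dw_loop += 2 * X[i] * (y_hat[i] - y[i])
dw_loop /= N
db_loop = sum(2 * (y_hat[i] - y[i]) for i in range(N)) / N

# Vectorized form: X.T sums over samples for every feature at once
dw_vec = (2 / N) * X.T @ (y_hat - y)
db_vec = (2 / N) * np.sum(y_hat - y)

print(np.allclose(dw_loop, dw_vec), np.isclose(db_loop, db_vec))  # True True
```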

<p>The learning rate determines how quickly the parameters approach the global minimum. A large learning rate can approach it faster but risks overshooting it, while a small learning rate approaches it more slowly but with less risk of overshooting.</p>
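<p>The trade-off can be seen by running the same number of steps with different learning rates. In this hypothetical example (data on y = 2x, 100 iterations, starting from w = b = 0), a very small rate is still far from the minimum, a moderate rate gets close, and a rate that is too large makes the cost grow instead of shrink. The specific rates are assumptions picked for this tiny dataset; suitable values depend on the data.</p>

```python
import numpy as np

def run_gradient_descent(x, y, alpha, n_iters=100):
    """Return the MSE after n_iters gradient-descent steps from w = b = 0."""
    w, b = 0.0, 0.0
    for _ in range(n_iters):
        residuals = y - (w * x + b)
        dw = np.mean(-2 * x * residuals)
        db = np.mean(-2 * residuals)
        w -= alpha * dw
        b -= alpha * db
    return np.mean((y - (w * x + b)) ** 2)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])
results = {alpha: run_gradient_descent(x, y, alpha) for alpha in (0.001, 0.05, 0.2)}
for alpha, cost in results.items():
    print(f"alpha={alpha}: MSE after 100 steps = {cost:.4g}")
```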
<img src="../images/learning_rate.png">

<h2>Algorithm Code</h2>

import numpy as np

class LinearRegression:
    def __init__(self, lr=0.001, n_iters=1000):
        # lr is the learning rate (alpha); it needs to be a small number
        # n_iters is the number of gradient descent iterations
        self.lr = lr
        self.n_iters = n_iters

        # weights and bias are set in fit(); declare them here so the attributes exist
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        # X: training samples, shape (n_samples, n_features)
        # y: training labels (targets), shape (n_samples,)
        n_samples, n_features = X.shape

        # initialize one weight per feature to 0, and the bias to 0
        self.weights = np.zeros(n_features)
        self.bias = 0.0

        # gradient descent
        for _ in range(self.n_iters):
            y_predicted = np.dot(X, self.weights) + self.bias

            # partial derivatives of the MSE; the constant factor 2 from the derivation
            # is omitted, which is equivalent to using half the learning rate
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)

            # update rules: w = w - alpha*dw, b = b - alpha*db
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict(self, X):
        # predict labels for X using the learned weights and bias
        y_predicted = np.dot(X, self.weights) + self.bias
        return y_predicted


<h2>Let's See It in Action</h2>

from sklearn.model_selection import train_test_split
from sklearn import datasets
import matplotlib.pyplot as plt

X,y = datasets.make_regression(n_samples=100,n_features=1,noise=20, random_state=4)
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.2,random_state=123)

regressor = LinearRegression(lr=0.01)
# fit the model on the training samples and training labels
regressor.fit(X_train,y_train)

# predict the labels for the test samples
predicted = regressor.predict(X_test)

#check performance
def mse(y_true,y_predicted):
    return np.mean((y_true-y_predicted)**2)

mse_value = mse(y_test, predicted)
print('Mean Squared Error: ',mse_value)

Mean Squared Error:  380.1306497140645
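<p>As a hedged sanity check, the parameters found by gradient descent can be compared with the closed-form least-squares solution from <code>np.linalg.lstsq</code>, which minimizes the same squared error directly. The sketch uses its own synthetic data (generated with <code>np.random.default_rng</code>, not <code>make_regression</code>), and the true slope 3, intercept 2, noise level, and iteration count are assumptions for illustration.</p>

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 1))  # synthetic single-feature data
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.5, size=100)

# Gradient descent, written the same way as LinearRegression.fit above
w, b = np.zeros(1), 0.0
lr, n_iters = 0.01, 5000
n = X.shape[0]
for _ in range(n_iters):
    y_pred = X @ w + b
    w -= lr * (1 / n) * (X.T @ (y_pred - y))
    b -= lr * (1 / n) * np.sum(y_pred - y)

# Closed-form least squares: append a column of ones so the intercept is fitted too
A = np.hstack([X, np.ones((n, 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("gradient descent:", w[0], b)
print("least squares:   ", coef[0], coef[1])
```

<p>If the learning rate is small enough and enough iterations are run, the two pairs of numbers should agree to many decimal places, since both minimize the same convex cost.</p>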
