<font color='RED' size='6'><b>MACHINE LEARNING : Linear Regression</b></font>

<img src="1.jpeg" alt="Linear Regression Model" height='400' width='400' align='left'>

<b>Linear regression is one of the most basic modelling approaches used in predictive analysis.
<br>The overall idea of regression is to examine two things:<br>(1) Does a set of predictor variables do a good job of predicting an outcome (dependent) variable?  <br>(2) Which variables in particular are significant predictors of the outcome variable, and in what way do they impact it?</b>

<b> <font size='3'>Some basic concepts : </font><br><br>
    Supervised learning </b>: Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.<br><br>
    

<b> <font size='3'>Model Representation</font> <br><br></b>
    The goal of most machine learning algorithms is to construct a model: a hypothesis that can be used to estimate Y based on X. The hypothesis, or model, maps inputs to outputs.<br><br>
    For a single input variable, the hypothesis is usually written as:
    
    

<font size='5' align=left>$$h_\theta(x^i) = \theta_0 + x^i\theta_1$$

The task is to find the values of $\theta_0$ and $\theta_1$ such that the prediction $h_\theta(x^i)$ for each training input $x^i$ is, on average, as close as possible to the corresponding training output $y^i$.

This amounts to minimizing the following cost function over the m training samples:

<font size='5'>$$\frac{1}{2m} \sum_{i=1}^{m} \left(h_\theta(x^i) - y^i\right)^2$$</font>
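As a minimal sketch of the two formulas above, the hypothesis and the cost can be written directly in numpy. The names `hypothesis`, `cost_function`, `theta0`, and `theta1`, and the toy data, are illustrative assumptions and are not the names used in the implementation later in this notebook.

```python
import numpy as np

def hypothesis(x, theta0, theta1):
    """Return h_theta(x) = theta0 + theta1 * x for every input in x."""
    return theta0 + theta1 * np.asarray(x, dtype=float)

def cost_function(x, y, theta0, theta1):
    """Return (1 / 2m) * sum of squared prediction errors over m samples."""
    y = np.asarray(y, dtype=float)
    m = y.size
    errors = hypothesis(x, theta0, theta1) - y
    return np.sum(errors ** 2) / (2 * m)

# Toy data chosen for illustration: three points on the line y = 2x + 1
x_demo = [0.0, 1.0, 2.0]
y_demo = [1.0, 3.0, 5.0]

# Cost with the parameters of the true line, and with both parameters at zero
perfect = cost_function(x_demo, y_demo, theta0=1.0, theta1=2.0)
worse = cost_function(x_demo, y_demo, theta0=0.0, theta1=0.0)
print(perfect, worse)
```

The cost is zero only when every prediction matches its target, and it grows with the square of each error, so large misses are penalized more heavily than small ones.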

<b><font size='5' color='red'>IMPLEMENTATION OF LINEAR REGRESSION FOR MACHINE LEARNING IN PYTHON</font></b>

<font color='green' size='5'><b>STEP 1 </b>: Importing the appropriate libraries.</font>

In [96]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
plt.rcParams['figure.figsize'] = (20.0, 10.0)

This code imports the libraries needed to implement the linear regression model: numpy for numerical computation, matplotlib for plotting graphs, and pandas for handling tabular data. The last line sets the default figure size for plots.

<font color='green' size='5'><b>STEP 2 </b>: Reading data and creating the required vectors/matrices</font>

In [197]:
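# Points used to fit the model
data = np.array([-1.0, -2.0, -3.0, -4.0, -5.0])
Y = data
X = [0.0, 1.0, 2.0, 3.0, 4.0]
# Further points on the same line, used later only to check predictions
# (despite the "_train" suffix, they are never used for fitting)
x_train = [5.0, 6.0, 7.0, 8.0, 9.0]
y_train = [-6.0, -7.0, -8.0, -9.0, -10.0]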

This data lies exactly on the line y = -x - 1, so the ideal parameters are a slope of -1 and an intercept of -1.
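One way to check that the points really sit on this line, as a sketch, is to fit a degree-1 polynomial with `np.polyfit`, which returns the least-squares slope and intercept. The arrays are repeated here so the snippet runs on its own; `slope` and `intercept` are illustrative names.

```python
import numpy as np

X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
Y = np.array([-1.0, -2.0, -3.0, -4.0, -5.0])

# polyfit with deg=1 returns coefficients highest power first: [slope, intercept]
slope, intercept = np.polyfit(X, Y, 1)
print(slope, intercept)

# Every point should satisfy y = -x - 1 up to floating-point error
print(np.allclose(Y, -X - 1))
```

This closed-form fit gives a reference answer against which the gradient descent result can be compared.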


<font color='green' size='5'><b>STEP 3 </b>: Making the cost function</font>

In [198]:

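# Model: Y = W * X + b. For this data the best fit is W = -1, b = -1.
# Starting guesses for the slope W and the intercept b
W = 0.3
b = -0.3
# Vector of ones, used to add b to every prediction
ones = np.array([1.0,1.0,1.0,1.0,1.0])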


In [199]:
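def cost(W,b):
    # Predictions for the global training inputs X
    pred = np.dot(X,W) + np.dot(b,ones)
    # 1/10 is 1/(2m) for the m = 5 training samples in Y
    return (1/10.0)*np.sum(np.square(pred-Y))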

In [200]:
cost(-0.3,0.3)

4.135000000000001

The cost for this trial pair of parameters (W = -0.3, b = 0.3) is high. Ideally we want the parameters W and b for which the cost is 0, which for this data means W = -1 and b = -1:

In [201]:
cost(-1,-1)

0.0

<font color='green' size='5'><b>STEP 4 </b>: Gradient Descent to optimize W and b</font>

In [207]:

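def GradientDescent(X,Y,W,b,a,i):
    # a is the learning rate, i is the number of iterations
    m = len(X)  # number of training samples
    cost_history = np.zeros(i)
    W_history = np.zeros(i)
    b_history = np.zeros(i)
    w1 = W
    b1 = b
    for it in range(i):
        # record the parameters before this iteration's update
        W_history[it] = W
        b_history[it] = b
        pred = np.dot(X,W) + np.dot(b,ones)
        # simultaneous update: both gradients use the same pred
        w1 = W - (1/m)*a*np.dot(np.transpose(X),(pred-Y))
        b1 = b - (1/m)*a*np.sum(pred-Y)
        # cost() reads the global X and Y defined above
        cost_history[it] = cost(w1,b1)
        W = w1
        b = b1
        
    return W, b, W_history, b_history 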
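The two update lines in the loop correspond to the partial derivatives of the cost function. Writing the prediction as $\hat{y}^i = W x^i + b$ (the $\hat{y}$ notation is introduced here only for brevity) and the learning rate as $a$, each iteration applies:

$$W \leftarrow W - \frac{a}{m}\sum_{i=1}^{m}\left(\hat{y}^i - y^i\right)x^i, \qquad b \leftarrow b - \frac{a}{m}\sum_{i=1}^{m}\left(\hat{y}^i - y^i\right)$$

The factor $\frac{1}{2}$ in the cost cancels against the 2 produced by differentiating the square, which is why the updates carry $\frac{1}{m}$ rather than $\frac{1}{2m}$.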

In [216]:
W, b, W_h, b_h = GradientDescent(X,Y,W,b,0.1,150)

Now that we have optimized the parameters W and b, let's see how close they are to the target values:

In [230]:
print(W,b)
print("Cost on the current parameters = ", cost(W,b))

-1.000000000970514 -0.9999999972332775
Cost on the current parameters =  1.2827829817897735e-18
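To see how quickly the cost falls, one could record it at every iteration. The sketch below re-implements the loop in a self-contained form; `gradient_descent_with_history` is a hypothetical helper name, not part of the notebook's code, and it is run with the same data, starting values (W = 0.3, b = -0.3), and learning rate (0.1) as above.

```python
import numpy as np

def gradient_descent_with_history(X, Y, W, b, a, iterations):
    """Run batch gradient descent and return W, b and the cost per iteration."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    m = X.size
    cost_history = np.zeros(iterations)
    for it in range(iterations):
        error = W * X + b - Y            # prediction error with the current W, b
        cost_history[it] = np.sum(error ** 2) / (2 * m)
        # Simultaneous update: both gradients use the same error vector
        W, b = W - (a / m) * np.dot(X, error), b - (a / m) * np.sum(error)
    return W, b, cost_history

X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [-1.0, -2.0, -3.0, -4.0, -5.0]
W_fit, b_fit, history = gradient_descent_with_history(X, Y, 0.3, -0.3, 0.1, 150)

print(history[0], history[-1])   # first and last recorded cost
print(W_fit, b_fit)
```

With these settings a single run of 150 steps lands near, but not exactly on, W = -1 and b = -1; more iterations tighten the result. Because the notebook's own call overwrites W and b, re-running that cell continues the descent from the previous result. Passing `history` to `plt.plot` would show the convergence curve.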


They are very close. With a few adjustments, the same code can fit any univariate linear regression problem. Let's see how it predicts for the inputs in x_train:


In [231]:
pred = np.dot(W,x_train) + np.dot(ones,b)

In [232]:
print(pred)

[ -6.          -7.          -8.          -9.         -10.00000001]


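Let's measure how close these predictions are to the expected outputs in y_train. The score below is one minus the sum of squared errors, so it approaches 1 as the predictions become exact; it is not a classification accuracy.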


In [233]:
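# np.sum already returns a scalar, so the outer np.mean has no effect:
# this value is the sum of squared errors, not their mean
mean_squared_error=np.mean(np.sum(np.square(pred-y_train)))
mean_squared_accuracy=1-mean_squared_error
print("Accuracy = ", mean_squared_accuracy)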


Accuracy =  0.9999999999999999
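The score above is an ad hoc measure. As a sketch of two more conventional regression metrics, the block below computes the mean squared error and the coefficient of determination (R²) by hand. It assumes the exact parameters W = -1 and b = -1 so that the snippet runs without the earlier cells; `x_test`, `y_test`, `mse`, and `r_squared` are illustrative names.

```python
import numpy as np

x_test = np.array([5.0, 6.0, 7.0, 8.0, 9.0])        # same values as x_train above
y_test = np.array([-6.0, -7.0, -8.0, -9.0, -10.0])  # same values as y_train above

W, b = -1.0, -1.0            # assumed: the ideal parameters for this data
pred = W * x_test + b

# Mean squared error: average of the squared residuals
mse = np.mean(np.square(pred - y_test))

# R^2 = 1 - SS_res / SS_tot; 1.0 means the predictions explain all variance
ss_res = np.sum(np.square(y_test - pred))
ss_tot = np.sum(np.square(y_test - np.mean(y_test)))
r_squared = 1.0 - ss_res / ss_tot

print(mse, r_squared)
```

Unlike one minus the sum of squared errors, R² does not depend on the scale of the target values or on the number of test points, which makes it easier to compare across datasets.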
