# Estimating Linear Regression With an Iterative Algorithm

## Table of Contents
[Introduction](#Introduction)<br>
[Motivation](#Motivation)

## Introduction

In the second lecture of Andrew Ng's machine learning __[lecture series](https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLA89DCFA6ADACE599)__, he presents gradient descent, a learning algorithm that approximates the line of best fit for a data set by iteratively minimizing the squared distance between the model and the data. What follows is a brief motivation for the algorithm and an implementation in Python, which we test on a data set courtesy of __[Siraj Raval](https://github.com/llSourcell)__.

## Motivation

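Since our goal is to approximate the line of best fit to our data set, we want to find $\theta_0, \theta_1$ such that the line $y = \theta_0 x + \theta_1$ minimizes, or comes close to minimizing, the sum of squared vertical distances between the line and the data points. Note that in this notebook $\theta_0$ is the slope and $\theta_1$ is the intercept, the reverse of the convention used in the lecture.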

Suppose that our data set models the relationship between two variables $x$ and $y$. Each entry, or training example, in the data set is a pair $(x, y)$, and we denote the $i^{th}$ of the $m$ training examples by $(x^{(i)}, y^{(i)})$. We denote our model, also known as the hypothesis (apparently for historical reasons), by $h$, where $h(x) = \theta_0 x + \theta_1$. We define the loss function $J$ as the sum of squared distances between the model's predictions and the data: $J(\theta_0, \theta_1) = \sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)^2$. Note that $J$ is a function of the two variables $\theta_0, \theta_1$: the data are fixed, so the loss changes only when the parameters of $h$ change.
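To make $J$ concrete, here is a minimal sketch of it as a plain Python function, evaluated on three made-up points that lie exactly on $y = 2x$. The function name `squared_error_loss` and the toy points are illustrative assumptions, not part of the original notebook; following this notebook's convention, `theta0` is the slope and `theta1` the intercept.

```python
def squared_error_loss(theta0: float, theta1: float, xs: list, ys: list) -> float:
    """J(theta0, theta1): sum of squared residuals of h(x) = theta0 * x + theta1."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = (theta0 * x + theta1) - y  # h(x^(i)) - y^(i)
        total += residual ** 2
    return total

# Hypothetical toy data lying exactly on y = 2x
xs, ys = [1, 2, 3], [2, 4, 6]
print(squared_error_loss(2, 0, xs, ys))  # 0.0: the line passes through every point
print(squared_error_loss(1, 0, xs, ys))  # 14.0: residuals are -1, -2, -3
```

Any line other than $y = 2x$ gives a positive loss here, which is why minimizing $J$ recovers the line of best fit.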



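```python
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

DATA = np.genfromtxt('./res/data.csv', delimiter=',')  # one (x, y) training example per row
PARAMS_LIST = []  # every [theta_0, theta_1] visited during the descent
# Separate learning rates for the slope (theta_0) and the intercept (theta_1);
# the slope's gradient is multiplied by x, so with large x values it needs a much smaller step.
ALPHA = (0.00007, 0.01)
ITERATIONS = 20000
```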
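The update performed by `descend` in the next cell follows from differentiating $J$ with respect to each parameter. Since $h(x) = \theta_0 x + \theta_1$, the chain rule gives

$$
\begin{aligned}
\frac{\partial J}{\partial \theta_0} &= 2\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}, \\
\frac{\partial J}{\partial \theta_1} &= 2\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right).
\end{aligned}
$$

The code drops the constant factor 2 and divides each sum by $m$, which amounts to descending $J/(2m)$ instead of $J$; this rescaling does not move the minimizer. With a separate learning rate for each parameter, $\alpha_0$ = `ALPHA[0]` and $\alpha_1$ = `ALPHA[1]`, one iteration performs the simultaneous update

$$
\begin{aligned}
\theta_0 &:= \theta_0 - \frac{\alpha_0}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right)x^{(i)}, \\
\theta_1 &:= \theta_1 - \frac{\alpha_1}{m}\sum_{i=1}^{m}\left(h(x^{(i)}) - y^{(i)}\right).
\end{aligned}
$$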

```python
def descend(params: list) -> list:
    """Take one step of batch gradient descent on [theta_0, theta_1] (slope, intercept)."""
    m = len(DATA)
    delta = [0.0, 0.0]  # gradient components summed over every training example
    for datum in DATA:
        x, y = datum[0], datum[1]
        error = params[0] * x + params[1] - y  # h(x^(i)) - y^(i)
        delta[0] += error * x  # contribution to the slope's gradient
        delta[1] += error      # contribution to the intercept's gradient
    # Move each parameter against its averaged gradient, scaled by its own learning rate
    return [params[0] - ALPHA[0] * delta[0] / m,
            params[1] - ALPHA[1] * delta[1] / m]
```
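The loop in `descend` visits one training example at a time. A minimal sketch of the same step written with NumPy array operations follows; the name `descend_vectorized` and its explicit `data` and `alpha` arguments are hypothetical stand-ins for the globals `DATA` and `ALPHA`, and it assumes each row of `data` holds one $(x, y)$ pair.

```python
import numpy as np

def descend_vectorized(params, data, alpha):
    """One batch gradient-descent step on [slope, intercept] without a Python loop.

    Assumes `data` is an (m, 2) array of (x, y) rows and `alpha` is a pair of
    learning rates (slope, intercept), mirroring DATA and ALPHA in the notebook.
    """
    x, y = data[:, 0], data[:, 1]
    residuals = params[0] * x + params[1] - y  # h(x^(i)) - y^(i) for all i at once
    m = len(data)
    grad_slope = np.sum(residuals * x) / m     # averaged gradient for theta_0
    grad_intercept = np.sum(residuals) / m     # averaged gradient for theta_1
    return [float(params[0] - alpha[0] * grad_slope),
            float(params[1] - alpha[1] * grad_intercept)]

# Hypothetical toy data on y = 2x, starting from the line y = 0
toy = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
print(descend_vectorized([0.0, 0.0], toy, (0.1, 0.1)))
```

Passing the data and learning rates as arguments, rather than reading globals, also makes the step easy to check on small hand-made arrays like the one above.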

```python
def run() -> None:
    params = [0, 0]  # start from the line y = 0x + 0
    PARAMS_LIST.append(params)
    for _ in range(ITERATIONS):
        params = descend(params)
        PARAMS_LIST.append(params)  # keep the full trajectory for later inspection
    params = [f"{p:.4f}" for p in params]  # round to 4 decimal places for display
    print(f'y = {params[0]}x + {params[1]} estimates the line of best fit after {ITERATIONS} iterations')

if __name__ == '__main__':
    run()
```

```
y = 1.322493174907506x + 7.987865218566115 estimates the line of best fit after 20000 iterations
```
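Because $J$ is quadratic in $\theta_0, \theta_1$, the exact least-squares line can also be computed in closed form, which gives a reference to check the iterative estimate against. The sketch below runs the same batch update on synthetic data and compares the result with NumPy's `np.polyfit` (a degree-1 least-squares polynomial fit, which returns the slope first). The helper name `fit_by_gradient_descent`, the synthetic data (true slope 1.5, intercept 4.0, Gaussian noise), and the equal learning rates are assumptions for illustration; the original `data.csv` is not reproduced here.

```python
import numpy as np

def fit_by_gradient_descent(x, y, alpha=(0.01, 0.01), iterations=20000):
    """Hypothetical helper: batch gradient descent for y ~ slope * x + intercept."""
    slope, intercept = 0.0, 0.0
    m = len(x)
    for _ in range(iterations):
        residuals = slope * x + intercept - y  # h(x^(i)) - y^(i)
        # Simultaneous update of both parameters, as in descend()
        slope, intercept = (slope - alpha[0] * np.dot(residuals, x) / m,
                            intercept - alpha[1] * np.sum(residuals) / m)
    return slope, intercept

# Synthetic data (assumption): y = 1.5x + 4.0 plus unit-variance noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 1.5 * x + 4.0 + rng.normal(0, 1, size=100)

gd_slope, gd_intercept = fit_by_gradient_descent(x, y)
exact_slope, exact_intercept = np.polyfit(x, y, 1)  # closed-form least squares
print(f"gradient descent: y = {gd_slope:.4f}x + {gd_intercept:.4f}")
print(f"np.polyfit:       y = {exact_slope:.4f}x + {exact_intercept:.4f}")
```

If the learning rates are small enough for the descent to be stable and it is run long enough, the two lines should agree to many decimal places; a visible gap usually means the descent has not converged yet.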
