# Estimating Linear Regression With an Iterative Algorithm

## Table of Contents
[Introduction](#Introduction)<br>
[Motivation](#Motivation)<br>
[Constants](#Constants)<br>
[Descent](#Descent)

## Introduction

In the second lecture of Andrew Ng's machine learning __[lecture series](https://www.youtube.com/watch?v=UzxYlbK2c7E&list=PLA89DCFA6ADACE599)__, he describes a learning algorithm that approximates the line of best fit of a data set by iteratively reducing the squared distance between the model and the data. What follows is a brief motivation and a Python implementation of the algorithm. We will test the algorithm on a data set courtesy of __[Siraj Raval](https://github.com/llSourcell)__.

## Motivation

Suppose that our data set models the relationship between two variables $x, y$. Each entry, or training example, in our data set can be thought of as a pair $(x, y)$, where the $i^{th}$ of the $m$ training examples is denoted $(x^{(i)}, y^{(i)})$. Since our goal is to approximate the line of best fit to our data set, we want to find $\theta_0, \theta_1$ such that the line $y = \theta_0 x + \theta_1$ comes close to minimizing the sum of squared distances to the data. We denote our model, also known as our hypothesis (apparently for historical reasons?), as $h$, where $h(x) = \theta_0 x + \theta_1$. We define our loss function $J$ as the sum of squared vertical distances between our model and the data set: $J=\Sigma_{i=1}^{m}(h(x^{(i)})-y^{(i)})^2$. Note that $J$ is a function of the two variables $\theta_0, \theta_1$: the data are fixed, so the loss changes only when the parameters of the model change.
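To make the definition of $J$ concrete, here is a minimal sketch that evaluates the loss for a given pair of parameters. The function name `squared_loss` and the toy data are assumptions made for illustration; they are not part of the data set used in this notebook.

```python
import numpy as np

def squared_loss(theta0, theta1, xs, ys):
    """Sum of squared residuals J for the line y = theta0 * x + theta1."""
    residuals = theta0 * xs + theta1 - ys
    return float(np.sum(residuals ** 2))

# Hypothetical toy data lying exactly on the line y = 2x + 1.
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = 2.0 * xs + 1.0

loss_at_origin = squared_loss(0.0, 0.0, xs, ys)  # 1 + 9 + 25 + 49 = 84
loss_at_truth = squared_loss(2.0, 1.0, xs, ys)   # the line fits exactly, so 0
```

The loss is large for the starting parameters $(0, 0)$ and drops to zero when the parameters match the line that generated the toy data.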

We can initialize our model with parameters $(\theta_0, \theta_1) = (0, 0)$. On each iteration of our learning cycle we compute the gradient of $J$ and move our parameters a small step in the direction opposite to the gradient, which is the direction in which the loss decreases fastest. Provided the step is small enough, the new model has a smaller squared distance from our data set than the old one.
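The update described above can be written compactly. The symbol $\alpha_j$ is introduced here as an assumption for illustration: it stands for the step size (learning rate) applied to parameter $\theta_j$.

$$\theta_j := \theta_j - \alpha_j \frac{\partial}{\partial\theta_j}J(\theta_0, \theta_1), \qquad j \in \{0, 1\}$$

Both parameters are updated simultaneously, using partial derivatives evaluated at the current values of $\theta_0$ and $\theta_1$.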

## Constants

In the first part of our algorithm, we import the numerical computing and plotting libraries numpy and matplotlib, and load our data from a CSV file. We have a hyperparameter tuple ALPHA, which contains the learning rates for our two parameters $\theta_0, \theta_1$. To choose these constants I ran the algorithm with arbitrary learning rates until I found two numbers that gave relatively fast convergence. ITERATIONS sets how many descent steps the algorithm takes, and can be any number we like.

In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

DATA = np.genfromtxt('./res/data.csv', delimiter=',')
ALPHA = 0.0001, 0.01
ITERATIONS = 20000

## Descent

To compute the gradient of our loss function $J$, we need to compute the partial derivatives $\frac{\partial}{\partial\theta_0}J$ and $\frac{\partial}{\partial\theta_1}J$.

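$\frac{\partial}{\partial\theta_0}J = \frac{\partial}{\partial\theta_0}\Sigma_{i=1}^{m}(h(x^{(i)})-y^{(i)})^2$<br>
$\phantom{.....}= \Sigma_{i=1}^{m}\frac{\partial}{\partial\theta_0}(h(x^{(i)})-y^{(i)})^2$<br>
$\phantom{.....}= \Sigma_{i=1}^{m}\frac{\partial}{\partial\theta_0}(\theta_0 x^{(i)} + \theta_1 - y^{(i)})^2$<br>
$\phantom{.....}= \Sigma_{i=1}^{m}2(\theta_0 x^{(i)} + \theta_1 - y^{(i)})x^{(i)}$

The same steps give $\frac{\partial}{\partial\theta_1}J = \Sigma_{i=1}^{m}2(\theta_0 x^{(i)} + \theta_1 - y^{(i)})$. The implementation drops the constant factor 2 and divides each sum by the number of training examples; both are constant scalings that can be thought of as absorbed into the learning rates.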
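A quick way to gain confidence in hand-derived partial derivatives of $J$ is to compare them against a finite-difference approximation. The sketch below does this on made-up data; the helper names (`loss`, `gradient`, `numerical_gradient`) and the data points are assumptions for illustration only.

```python
import numpy as np

def loss(theta, xs, ys):
    """Sum of squared residuals for the line y = theta[0] * x + theta[1]."""
    return np.sum((theta[0] * xs + theta[1] - ys) ** 2)

def gradient(theta, xs, ys):
    """Analytic partial derivatives of the loss with respect to theta[0] and theta[1]."""
    residuals = theta[0] * xs + theta[1] - ys
    return np.array([2.0 * np.sum(residuals * xs), 2.0 * np.sum(residuals)])

def numerical_gradient(theta, xs, ys, eps=1e-6):
    """Central-difference approximation of the same partial derivatives."""
    grad = np.zeros(2)
    for j in range(2):
        bump = np.zeros(2)
        bump[j] = eps
        grad[j] = (loss(theta + bump, xs, ys) - loss(theta - bump, xs, ys)) / (2 * eps)
    return grad

# Hypothetical data and parameters, chosen only to exercise the check.
xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = np.array([2.0, 3.0, 5.0, 4.0])
theta = np.array([0.5, -1.0])

analytic = gradient(theta, xs, ys)
numeric = numerical_gradient(theta, xs, ys)
```

If the two arrays agree to within a small tolerance, the analytic expressions are very likely correct.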

In [2]:
def descend(params: list) -> list:
    """Take one gradient descent step and return the updated [theta_0, theta_1]."""
    # Residual h(x) - y of the current model for one (x, y) training example.
    residual = lambda datum: (params[0] * datum[0]) + params[1] - datum[1]

    # Accumulate each example's contribution to the two partial derivatives.
    delta = [0, 0]
    for datum in DATA:
        delta[0] += residual(datum) * datum[0]
        delta[1] += residual(datum)

    # Step against the averaged gradient, with one learning rate per parameter.
    return [params[0] - (ALPHA[0] * delta[0]) / len(DATA),
            params[1] - (ALPHA[1] * delta[1]) / len(DATA)]
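For comparison, the same step can be expressed with NumPy array operations instead of a Python loop. This is a sketch under assumptions, not a replacement for the cell above: the name `descend_vectorized`, the explicit `data` and `alpha` arguments, and the toy data and learning rates are all introduced here for illustration.

```python
import numpy as np

def descend_vectorized(params, data, alpha):
    """One descent step: average the per-example gradient terms, then step against them."""
    xs, ys = data[:, 0], data[:, 1]
    residuals = params[0] * xs + params[1] - ys
    grad0 = np.mean(residuals * xs)  # averaged term for theta_0
    grad1 = np.mean(residuals)       # averaged term for theta_1
    return [params[0] - alpha[0] * grad0, params[1] - alpha[1] * grad1]

# Hypothetical toy data lying exactly on y = 2x + 1, with hypothetical learning rates.
toy_data = np.array([[0.0, 1.0], [1.0, 3.0], [2.0, 5.0], [3.0, 7.0]])
toy_alpha = (0.1, 0.1)

toy_params = [0.0, 0.0]
for _ in range(2000):
    toy_params = descend_vectorized(toy_params, toy_data, toy_alpha)
```

On this toy data the parameters settle very close to the slope 2 and intercept 1 that generated it. The learning rates that work here are specific to the toy data and would need to be tuned again for any other data set.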

In [3]:
PARAMS_LIST = []  # parameters recorded before the first step and after every step

def run() -> None:
    params = [0, 0]
    PARAMS_LIST.append(params)
    for _ in range(ITERATIONS):
        params = descend(params)
        PARAMS_LIST.append(params)
    # Format both parameters to four decimal places for printing.
    theta0, theta1 = (f'{p:.4f}' for p in params)
    print(f'y = {theta0}x + {theta1} estimates the line of best fit after {ITERATIONS} iterations')

if __name__ == '__main__':
    run()

y = 1.3225x + 7.9874 estimates the line of best fit after 20000 iterations
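One way to sanity-check an iterative estimate like this is to compare it with a direct least-squares fit. The sketch below uses `np.polyfit`, which is a different technique from the gradient descent in this notebook: it solves the least-squares problem directly. The synthetic data are an assumption for illustration and are unrelated to `data.csv`.

```python
import numpy as np

# Hypothetical noisy data scattered around the line y = 1.5x + 4.
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 10.0, 50)
ys = 1.5 * xs + 4.0 + rng.normal(scale=0.1, size=xs.size)

# A degree-1 polynomial fit returns the coefficients as [slope, intercept].
slope, intercept = np.polyfit(xs, ys, 1)
print(f'y = {slope:.4f}x + {intercept:.4f} is the direct least-squares fit')
```

Applied to this notebook's data, the equivalent call would be `np.polyfit(DATA[:, 0], DATA[:, 1], 1)`, assuming the first column of the CSV holds $x$ and the second holds $y$, as the `descend` function does.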
