## Gradient descent example
On a linear 1-dimensional function

There are several ways to find a linear approximation of a dataset; it can even be done analytically, in closed form.
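
For comparison with the iterative method used in the rest of this notebook, here is a minimal sketch of the analytic route using ordinary least squares. The arrays `xs_demo` and `ys_demo` are synthetic stand-ins invented for illustration; they are not the contents of `data.csv`.

```python
import numpy as np

# Hypothetical toy data (not the notebook's data.csv): points near y = 1 + 2x.
rng = np.random.default_rng(0)
xs_demo = np.linspace(0.0, 10.0, 50)
ys_demo = 1.0 + 2.0 * xs_demo + rng.normal(scale=0.5, size=xs_demo.size)

# Design matrix: a column of ones (for theta_0) next to the x values (for theta_1).
A = np.column_stack([np.ones_like(xs_demo), xs_demo])

# Least-squares solution of A @ theta ~ ys_demo.
theta, *_ = np.linalg.lstsq(A, ys_demo, rcond=None)
theta_0, theta_1 = theta
print(f"theta_0 = {theta_0:.3f}, theta_1 = {theta_1:.3f}")
```

The recovered parameters should land close to the intercept 1 and slope 2 used to generate the toy data, which gives a reference point for judging what gradient descent converges to.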

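In this example we present a popular and simple method, <a href="https://en.wikipedia.org/wiki/Stochastic_gradient_descent">Stochastic Gradient Descent (SGD)</a>. It is an iterative approach: given a set of input data, we try to draw a straight line whose error gets smaller with every iteration. The error (or loss) is the sum of squared vertical distances between the data points and the line. Note that the loop in this notebook computes the gradient over the whole dataset at every step, which is the full-batch form of gradient descent; SGD proper estimates the gradient from a single point or a small random subset.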

We can define a loss function as $L(\theta)=\sum_{x^{(i)}\in data} (f(x^{(i)}) - y^{(i)})^2$, where <br />
$y^{(i)}$ - the value of $y$ corresponding to the point $x^{(i)}$ <br />
$f(x^{(i)})$ - the value of the approximating function at the point $x^{(i)}$
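
As a quick worked example on made-up numbers (not taken from `data.csv`): for the two points $(1, 2)$ and $(2, 4)$ and the candidate line $f(x) = x$, the loss is

$$L = (f(1) - 2)^2 + (f(2) - 4)^2 = (1 - 2)^2 + (2 - 4)^2 = 1 + 4 = 5$$

The line $f(x) = 2x$ passes through both points, so for it $L = 0$.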

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
plt.rcParams.update({'font.size': 16})
from matplotlib import rc
rc('text', usetex=True)

In [None]:
data_df = pd.read_csv('data.csv')
data_df

In [None]:
data_df.plot.scatter(1, 2)

In [None]:
data_df = pd.read_csv('data.csv')
xs = data_df.values[:, 1]
ys = data_df.values[:, 2]

In [None]:
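# helper function to plot the data and the current fitted line
import os

def plot_fun(t0, t1, i=0, loss=0):
    data_df.plot.scatter(1, 2, s=40)
    x = 14
    # draw the line from the parameters passed in, not via the global f
    plt.plot([0, x], [t0, t0 + t1 * x], 'k-', lw=2, label=r'$f($chocolate$)$')
    plt.axis([0, 13, 0, 35])
    plt.legend()
    plt.title('iteration = %02d, loss = %.2f' % (i, loss))
    plt.tight_layout()
    # make sure the output directory exists before saving
    os.makedirs('plots', exist_ok=True)
    plt.savefig('plots/plot_%03d.png' % (i), dpi=300)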

**Exercise 1.** Define a linear function $f(x)$ that approximates our dataset. It depends on two unknown parameters that we want to find, $\theta_0$ and $\theta_1$:

$f(x)=\theta_0 + \theta_1x$


In [None]:
t0 = 0.  # first parameter (theta_0)
t1 = 0.  # second parameter (theta_1)

def f(x):
    # fill here
    raise NotImplementedError

**Exercise 2.** Knowing that

$\frac{\partial L}{\partial \theta_0} = 2\sum_i (f(x^{(i)}) - y^{(i)})$
<br /> and <br />
$\frac{\partial L}{\partial \theta_1} = 2\sum_i (f(x^{(i)}) - y^{(i)})x^{(i)}$
<br /> implement the functions that compute these derivatives.

In [None]:
import numpy as np

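# named fun_d_t0 / fun_d_t1 because the training loop calls them by these names
def fun_d_t0(xs, ys):
    # fill here
    raise NotImplementedError

    
def fun_d_t1(xs, ys):
    # fill here
    raise NotImplementedError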
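
One way to check hand-written derivatives like these, without looking at a solution, is to compare them with a finite-difference estimate. The sketch below is an illustrative helper, not part of the original notebook: `numerical_gradient` and the test function `g` are hypothetical names, and the helper expects the loss as a function of a parameter vector.

```python
import numpy as np

def numerical_gradient(loss_fn, theta, h=1e-6):
    """Central-difference estimate of the gradient of loss_fn at theta."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for k in range(theta.size):
        step = np.zeros_like(theta)
        step[k] = h
        # Symmetric difference along coordinate k.
        grad[k] = (loss_fn(theta + step) - loss_fn(theta - step)) / (2 * h)
    return grad

# Sanity check on a function with a known gradient:
# g(theta) = theta_0^2 + 3 * theta_1, so grad g = (2 * theta_0, 3).
def g(theta):
    return theta[0] ** 2 + 3.0 * theta[1]

approx = numerical_gradient(g, [2.0, -1.0])
print(approx)  # approximately [4., 3.]
```

Because the notebook's `f` reads `t0` and `t1` from global variables, using this check on your own derivative functions would need a small wrapper that sets those globals from `theta` before evaluating the loss.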


**Exercise 3.** Implement the loss function <br />
$L(\theta)=\sum_{x^{(i)}\in data} (f(x^{(i)}) - y^{(i)})^2$

In [None]:
def fun_loss(xs, ys):
    # fill here
    raise NotImplementedError

In [None]:
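epsilon = 0.0001  # learning rate (step size)
t0 = 0.
t1 = 0.
losses = []
for i in range(100):
    # evaluate both partial derivatives at the current parameters
    d_t0 = fun_d_t0(xs, ys)
    d_t1 = fun_d_t1(xs, ys)
    
    # step against the gradient
    t0 -= epsilon * d_t0
    t1 -= epsilon * d_t1
    
    # record and plot the loss after the update
    loss = fun_loss(xs, ys)
    losses.append(loss)
    plot_fun(t0, t1, i, loss)
    plt.show()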

In [None]:
plt.plot(losses, lw=2)
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.tight_layout()
plt.savefig('plots/loss_over_time.png', dpi=300)

In [None]:
## SPOILER alert - here you can find all the exercises implemented properly

t0 = 0. 
t1 = 0. 

def f(x):
    return t0 + t1 * x

def fun_d_t0(xs, ys):
    # f works elementwise on a NumPy array, so no Python loop is needed
    return 2 * np.sum(f(xs) - ys)
    
def fun_d_t1(xs, ys):
    return 2 * np.sum((f(xs) - ys) * xs)

def fun_loss(xs, ys):
    return np.sum((f(xs) - ys) ** 2)
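

The training loop earlier in the notebook uses every data point for each update. As a rough sketch of how the same update rule could be packaged without global variables, and how a stochastic (mini-batch) variant differs, consider the function below. The name `fit_line`, its parameters, and the noise-free demo data are all assumptions made for illustration; they do not come from the original notebook or from `data.csv`.

```python
import numpy as np

def fit_line(xs, ys, lr=1e-3, epochs=200, batch_size=None, seed=0):
    """Fit f(x) = t0 + t1 * x by gradient descent on the summed squared error.

    batch_size=None uses all points per update (full batch);
    a smaller batch_size gives a stochastic (mini-batch) variant.
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    n = xs.size
    batch_size = n if batch_size is None else batch_size
    rng = np.random.default_rng(seed)
    t0, t1 = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # visit the points in a fresh random order
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            residual = t0 + t1 * xs[idx] - ys[idx]
            # Partial derivatives of the loss restricted to this batch.
            grad_t0 = 2.0 * np.sum(residual)
            grad_t1 = 2.0 * np.sum(residual * xs[idx])
            t0 -= lr * grad_t0
            t1 -= lr * grad_t1
    return t0, t1

# Hypothetical noise-free data on the line y = 1 + 2x.
xs_demo = np.linspace(0.0, 5.0, 20)
ys_demo = 1.0 + 2.0 * xs_demo

t0_full, t1_full = fit_line(xs_demo, ys_demo, lr=1e-3, epochs=2000)
t0_sgd, t1_sgd = fit_line(xs_demo, ys_demo, lr=1e-2, epochs=2000, batch_size=1)
print(t0_full, t1_full)
print(t0_sgd, t1_sgd)
```

On this toy data both variants should approach the intercept 1 and slope 2. The learning rates here were picked for this particular data; on other data a step that is too large can make the updates diverge.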