## Linear Regression

In this section we will implement a linear regression model, trained with stochastic gradient descent (SGD), using NumPy. The objectives are:

1. Implement a simple forward model: $y = W x + b$

1. Build a `predict` function that returns the predicted regression value for an input $x$

1. Build an `accuracy` function for a batch of inputs $X$ and the corresponding expected outputs $y_{true}$ (for regression we typically use the Mean Squared Error (MSE) as the metric, so lower is better)

1. Build a `grad_loss` function that computes the gradients of the loss with respect to $W$ and $b$ for a single input $x$ and its expected output $y_{true}$; check that the gradients are well defined

1. Build a `train` function that uses the output of `grad_loss` to update $W$ and $b$
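
One way to check that the gradients are well defined is to compare the analytic gradient with a central finite-difference estimate. The sketch below assumes a per-sample squared-error loss $L = (W x + b - y_{true})^2$, whose gradients are $\partial L / \partial W = 2 (W x + b - y_{true})\, x$ and $\partial L / \partial b = 2 (W x + b - y_{true})$. The helpers `squared_error`, `analytic_grad` and `numerical_grad` are hypothetical and not part of the exercise's class.

```python
import numpy as np

def squared_error(W, b, x, y_true):
    # hypothetical helper: per-sample squared-error loss
    return (W * x + b - y_true) ** 2

def analytic_grad(W, b, x, y_true):
    # derivatives of (W*x + b - y_true)**2 with respect to W and b
    residual = W * x + b - y_true
    return 2 * residual * x, 2 * residual

def numerical_grad(W, b, x, y_true, eps=1e-6):
    # central finite differences, perturbing one parameter at a time
    dW = (squared_error(W + eps, b, x, y_true)
          - squared_error(W - eps, b, x, y_true)) / (2 * eps)
    db = (squared_error(W, b + eps, x, y_true)
          - squared_error(W, b - eps, x, y_true)) / (2 * eps)
    return dW, db

W, b, x, y_true = 0.3, -0.2, 4.0, 20.2
print(analytic_grad(W, b, x, y_true))
print(numerical_grad(W, b, x, y_true))
print(np.allclose(analytic_grad(W, b, x, y_true),
                  numerical_grad(W, b, x, y_true)))  # → True
```

If the two estimates disagree by more than rounding error, the analytic formula (or its implementation) is likely wrong.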

```python
%matplotlib inline
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (8, 8)
plt.rcParams["font.size"] = 14

import numpy as np
```

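```python
# Toy data for this task: temperature depends linearly on the
# minutes of sunshine, plus Gaussian noise. Experiment with the
# number of points: more points make debugging easier because the
# statistical noise in the fitted parameters is smaller.
n_points = 20 * 10
X = np.random.uniform(0, 10, size=n_points)
temp = 1.3 * X + 15 + np.random.normal(0, 1, size=n_points)
```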

```python
# Plot the data: minutes of sunshine on the x-axis,
# temperature on the y-axis
# YOUR CODE HERE
raise NotImplementedError()
```

Next is the main `LinearRegression` class. Once you fill in all the gaps, it will let you perform linear regression, and we will keep building on this class's structure during the day. A class like this may be overkill for linear regression, but we can reuse the same structure for our simple neural network later.

We will find the coefficients `W` and `b` by gradient descent. This is not how you would solve this problem in practice (linear least squares has a closed-form solution), but stick with it for now: it lays the groundwork for logistic regression later.
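
For comparison only, and not part of the exercise: because the model is linear in $W$ and $b$, ordinary least squares can be solved directly. A minimal sketch with `np.linalg.lstsq`, assuming toy data generated like the data in this notebook but with a fixed seed and the illustrative names `X_demo` and `temp_demo` (so it does not overwrite `X` and `temp`):

```python
import numpy as np

# illustrative data, generated like X and temp but with a fixed seed
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=200)
temp_demo = 1.3 * X_demo + 15 + rng.normal(0, 1, size=200)

# design matrix with a column of ones so the intercept b is fitted too
A = np.column_stack([X_demo, np.ones_like(X_demo)])
(W_ls, b_ls), *_ = np.linalg.lstsq(A, temp_demo, rcond=None)
print('closed-form W: %.3f, b: %.3f' % (W_ls, b_ls))
```

Running the same `lstsq` call on `X` and `temp` gives a reference that the gradient-descent estimates of `W` and `b` should end up close to.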

```python
class LinearRegression:
    def __init__(self):
        # start from small random values for the weight and the bias
        self.W = np.random.uniform(low=-0.5, high=0.5)
        self.b = np.random.uniform(low=-0.5, high=0.5)

    def predict(self, X):
        # TODO: return the predicted value for each sample in X (X is a vector!)
        # YOUR CODE HERE
        raise NotImplementedError()

    def grad_loss(self, x, y_true):
        # TODO: compute the gradient of the loss with respect to W and b
        # for one sample x with true value y_true
        # YOUR CODE HERE
        raise NotImplementedError()

    def train(self, x, y, learning_rate):
        # TODO: perform one gradient descent update of W and b
        # YOUR CODE HERE
        raise NotImplementedError()

    def loss(self, x, y):
        # TODO: compute the loss for the sample x with true value y
        # YOUR CODE HERE
        raise NotImplementedError()

    def accuracy(self, X, y):
        # TODO: compute the metric (here the MSE) for samples X with true values y
        # YOUR CODE HERE
        raise NotImplementedError()
```

Questions:

* how do you know that you trained for enough epochs?
* visualise how the loss changes over the epochs
* are more epochs always better? How could you show this?
* change the setup to use mini-batch stochastic gradient descent
* (bonus) visualise the values of W and b over the epochs
* (bonus) can you see a difference between the paths of W and b for mini-batch SGD and single-sample SGD?
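
For the questions on the number of epochs and on mini-batch SGD, a standalone sketch can help. It is a plain function, not the `LinearRegression` class: it runs mini-batch SGD on the mean squared error, reshuffles the data every epoch, and records the full-data MSE after each epoch so it can be plotted. The function name `minibatch_sgd`, the seeds and the regenerated `X_demo`/`y_demo` data are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

# illustrative data, generated like X and temp but with a fixed seed
data_rng = np.random.default_rng(1)
X_demo = data_rng.uniform(0, 10, size=200)
y_demo = 1.3 * X_demo + 15 + data_rng.normal(0, 1, size=200)

def minibatch_sgd(x, y, learning_rate=0.01, batch_size=10, n_epochs=100, seed=0):
    """Fit y ~ W*x + b by mini-batch SGD on the mean squared error.

    Returns the final W, b and the full-data MSE after each epoch.
    """
    gen = np.random.default_rng(seed)
    W, b = gen.uniform(-0.5, 0.5, size=2)
    history = []
    for epoch in range(n_epochs):
        order = gen.permutation(len(x))  # reshuffle every epoch
        for start in range(0, len(x), batch_size):
            idx = order[start:start + batch_size]
            residual = W * x[idx] + b - y[idx]
            # gradients of the batch-mean squared error, applied together
            W -= learning_rate * 2 * np.mean(residual * x[idx])
            b -= learning_rate * 2 * np.mean(residual)
        history.append(np.mean((W * x + b - y) ** 2))
    return W, b, history

W_fit, b_fit, history = minibatch_sgd(X_demo, y_demo)
print('W: %.3f, b: %.3f' % (W_fit, b_fit))

plt.plot(history)
plt.xlabel('epoch')
plt.ylabel('training MSE')
plt.yscale('log')
plt.show()
```

A loss curve that has flattened out is one practical sign that more epochs will not improve the fit on the training data much; whether they help on unseen data needs held-out data.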

```python
lr = LinearRegression()
print('initial value of W: %.4f and b: %.4f' % (lr.W, lr.b))
```

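```python
# shuffle(X, temp) returns copies of both arrays shuffled consistently,
# which is useful for reordering the samples between epochs
from sklearn.utils import shuffle
```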

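```python
# Sanity check: with W and b set to the values used to generate
# the data, the line should pass through the middle of the points
lr = LinearRegression()
lr.W = 1.3
lr.b = 15.

line = np.linspace(0, 10, 100)

plt.plot(X, temp, 'o')
plt.plot(line, lr.predict(line), c='k');
```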

```python
lr = LinearRegression()
learning_rate = 0.01

# Train the model by looping through the data 100 times (epochs).
# After each sample we update the weight W and the bias b,
# i.e. single-sample stochastic gradient descent.
for n in range(100):
    for (x_, y_) in zip(X, temp):
        lr.train(x_, y_, learning_rate)
    train_acc = lr.accuracy(X, temp)

plt.plot(X, temp, 'o')
plt.plot(line, lr.predict(line), c='r');
```

```python
train_acc
```

```python
# Modify the training procedure to use mini-batch
# stochastic gradient descent with a batch size of 10

lr = LinearRegression()
learning_rate = 0.01
batch_size = 10

for n in range(100):
    # YOUR CODE HERE
    raise NotImplementedError()

plt.plot(X, temp, 'o')
plt.plot(line, lr.predict(line), c='r');
```

## With validation data