# Deep learning from scratch: homework 3

### General instructions

Complete the exercises listed below in this Jupyter notebook - leaving all of your code in Python cells in the notebook itself.  Feel free to add any necessary cells.  

Included with the notebook are 

- a custom utilities file called `custom_plotter.py` that provides various plotting functionalities (for unit tests to help you debug) as well as some other processing code.


- datasets for exercises: `noisy_sin_sample.csv`.

Be sure these files are located in the same directory as this notebook!

### When submitting this homework:
    
**Make sure all output is present in your notebook prior to submission**

#### <span style="color:#a50e3e;">Exercise 6. </span>  Put your gradient descent tricks to work

With the previous exercises in mind, you are now in control of three very useful tricks for speeding up the convergence of gradient descent when used to tune multilayer perceptrons:

**i)** normalization of input and activation output distributions (discussed in the previous exercise)

**ii)** normalized gradient descent (as we saw in class; [see slides here](https://jermwatt.github.io/mlrefined/presentations/courses/deep_learning/Lecture_9_optimization_tricks_part_1.slides.html#/1))

**iii)** the addition of a momentum term to the gradient descent step (as we saw in class; [see slides here](https://jermwatt.github.io/mlrefined/presentations/courses/deep_learning/Lecture_9_optimization_tricks_part_1.slides.html#/3))
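As a reminder of what trick **i)** looks like on the input side, here is a minimal sketch. It assumes standard normalization (mean-centering each input feature and scaling it to unit standard deviation), which may differ in detail from the version used in the previous exercise. The name `standard_normalize` and the toy array `x_demo` are hypothetical and used for illustration only.

```python
import numpy as np

def standard_normalize(x):
    """Mean-center each input feature (column) and scale it to unit standard deviation."""
    x_mean = np.mean(x, axis=0, keepdims=True)
    x_std = np.std(x, axis=0, keepdims=True)
    # assumption: leave constant features unscaled to avoid dividing by zero
    x_std = np.where(x_std == 0, 1.0, x_std)
    return (x - x_mean) / x_std

# hypothetical toy input: 5 points, 1 feature
x_demo = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
x_normalized = standard_normalize(x_demo)
```

The same idea can be applied to the output of each activation layer, using the mean and standard deviation computed over the training inputs.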

In this exercise you will put those tools to work on the toy sinusoidal dataset loaded below.  In particular: show that, using some combination of the gradient descent tricks above (or all three), you can tune the 4 layer architecture described in Example 7 of Exercise 4 to achieve a Least Squares cost of 0.5 or less in at most 500 steps of gradient descent.  Use a steplength parameter $\alpha$ of the form $10^{-\gamma}$, where $\gamma$ is the smallest positive integer that produces convergence with your random initial set of weights.
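One way to organize the search for $\gamma$ is sketched below. This is only an illustration under stated assumptions: "convergence" is taken to mean that the final cost is finite and at or below a target value, and the helper `smallest_converging_gamma`, the toy cost, and the toy descent loop are all hypothetical stand-ins rather than part of the provided code.

```python
import math

def smallest_converging_gamma(run_descent, cost, w_init, target_cost, max_gamma=8):
    """Try alpha = 10**-gamma for gamma = 1, 2, ... and return the first gamma
    whose run ends with a finite cost at or below target_cost."""
    for gamma in range(1, max_gamma + 1):
        alpha = 10.0 ** (-gamma)
        w_hist = run_descent(w_init, alpha)
        final_cost = cost(w_hist[-1])
        if math.isfinite(final_cost) and final_cost <= target_cost:
            return gamma, alpha, w_hist
    raise RuntimeError("no steplength of the form 10**-gamma reached the target")

# hypothetical toy problem: g(w) = 15 w^2, whose gradient is 30 w
def toy_cost(w):
    return 15.0 * w * w

def toy_descent(w, alpha, max_its=20):
    """Plain (unnormalized, no momentum) gradient descent on the toy cost."""
    w_hist = [w]
    for _ in range(max_its):
        w = w - alpha * 30.0 * w
        w_hist.append(w)
    return w_hist

gamma, alpha, w_hist = smallest_converging_gamma(
    toy_descent, toy_cost, w_init=1.0, target_cost=0.5
)
```

On this toy problem the run with $\gamma = 1$ diverges and the run with $\gamma = 2$ reaches the target, so the search stops at $\gamma = 2$. For the exercise itself, `run_descent` would wrap a call to your own gradient descent routine with `max_its` set to 500.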

**You should turn in:**
    
**1)** a cost function plot for your run of gradient descent that achieves the desired goal



**2)** a short explanation of which gradient descent tricks you used



**3)** a plot showing your tuned model - using the best weights you found from your descent run - fit to the data (this model should overfit the given data)
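For items **1)** and **3)** you will need the cost at every step and the weights that gave the lowest cost. A minimal sketch of that bookkeeping follows. The helper name `cost_history_and_best` is hypothetical, and the toy history and cost are stand-ins for your own `w_hist` and Least Squares cost.

```python
def cost_history_and_best(w_hist, g):
    """Evaluate the cost g at every recorded set of weights and return
    (cost history, best weights, index of the best step)."""
    cost_hist = [float(g(w)) for w in w_hist]
    best_index = min(range(len(cost_hist)), key=cost_hist.__getitem__)
    return cost_hist, w_hist[best_index], best_index

# hypothetical toy weight history and cost, for illustration only
toy_w_hist = [3.0, 1.0, -0.5, 2.0]
toy_g = lambda w: w ** 2

cost_hist, w_best, best_index = cost_history_and_best(toy_w_hist, toy_g)
```

The resulting `cost_hist` list can be passed directly to `plt.plot` to produce the cost function plot, and `w_best` is the set of weights to use when plotting the fitted model.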


**Hint:**

Feel free to steal useful code chunks from the previous exercise!

In [1]:
# import autograd functionality
import autograd.numpy as np
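# note: in newer autograd releases, flatten_func may need to be imported from autograd.misc.flatten instead
from autograd.util import flatten_func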
from autograd import grad as compute_grad   

# import custom utilities and plotter
import custom_plotter as plotter

# import various other libraries
import copy
import matplotlib.pyplot as plt

# this is needed to compensate for %matplotlib notebook's tendency to blow up images when plotted inline
from matplotlib import rcParams
rcParams['figure.autolayout'] = True

%matplotlib notebook
%load_ext autoreload
%autoreload 2

Feel free to use the ``gradient_descent`` function below for this exercise.

In [2]:
# gradient descent function
def gradient_descent(g,w,alpha,max_its,beta,version):    
    # flatten the input function, create gradient based on flat function
    g_flat, unflatten, w = flatten_func(g, w)
    grad = compute_grad(g_flat)

    # record history
    w_hist = []
    w_hist.append(unflatten(w))

    # initialize the momentum term
    z = np.zeros((np.shape(w)))
    
    # run the gradient descent loop
    for k in range(max_its):   
        # evaluate the gradient at the current weights
        grad_eval = grad(w)
        grad_eval.shape = np.shape(w)

        ### normalized or unnormalized descent step? ###
        if version == 'normalized':
            grad_norm = np.linalg.norm(grad_eval)
            if grad_norm == 0:
                # nudge a zero norm away from zero to avoid dividing by zero
                grad_norm += 10**-6*np.sign(2*np.random.rand(1) - 1)
            grad_eval /= grad_norm
            
        # take descent step with momentum
        z = beta*z + grad_eval
        w = w - alpha*z

        # record weight update
        w_hist.append(unflatten(w))

    return w_hist
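
For reference, the update performed by each pass through the loop in ``gradient_descent`` can be written out as follows. The symbols $\mathbf{d}^{k}$ and the step index $k$ are notation introduced here for illustration; they do not appear in the code.

$$
\begin{aligned}
\mathbf{d}^{k} &= \begin{cases} \nabla g\left(\mathbf{w}^{k-1}\right) / \left\Vert \nabla g\left(\mathbf{w}^{k-1}\right) \right\Vert_2 & \text{if version is 'normalized'} \\ \nabla g\left(\mathbf{w}^{k-1}\right) & \text{otherwise} \end{cases} \\
\mathbf{z}^{k} &= \beta\, \mathbf{z}^{k-1} + \mathbf{d}^{k}, \qquad \mathbf{z}^{0} = \mathbf{0} \\
\mathbf{w}^{k} &= \mathbf{w}^{k-1} - \alpha\, \mathbf{z}^{k}
\end{aligned}
$$

Setting $\beta = 0$ switches the momentum term off, so the step reduces to standard (or normalized) gradient descent.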

In [18]:
# load data
csvname = 'noisy_sin_sample.csv'
data = np.loadtxt(csvname,delimiter = ',')
x = data[:,:-1]
y = data[:,-1:]

# plot everything
plotter_demo = plotter.Visualizer()
plotter_demo.plot_regression_data(x,y)
