# Implementing a Neural Network
In this part of the exercise, we will develop a neural network with fully-connected layers to perform classification and test it on the CIFAR-10 dataset.

In [None]:
# Setup: imports

import numpy as np
import matplotlib.pyplot as plt

from test.clf.neural_net import TwoLayerNet


You will use the class `TwoLayerNet` in the file `test/clf/neural_net.py` to represent instances of your network. The network parameters are stored in the instance variable `self.params` where keys are string parameter names and values are numpy arrays. Below, initialize toy data and a toy model that you will use to develop your implementation.

In [None]:
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.
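A minimal sketch of what this cell might build. It assumes `TwoLayerNet.params` holds `W1`, `b1`, `W2`, `b2` with shapes `(D, H)`, `(H,)`, `(H, C)`, `(C,)`. The sizes, the initialization scale `std`, and the helpers `init_toy_params` and `init_toy_data` are hypothetical. The real notebook would construct a `TwoLayerNet` rather than a bare dict.

```python
import numpy as np

# Hypothetical toy sizes, chosen only for illustration.
input_size, hidden_size, num_classes, num_inputs = 4, 10, 3, 5

def init_toy_params(std=1e-1, seed=0):
    """Build a params dict with the (assumed) TwoLayerNet.params layout."""
    rng = np.random.RandomState(seed)  # fixed seed -> repeatable experiments
    return {
        "W1": std * rng.randn(input_size, hidden_size),
        "b1": np.zeros(hidden_size),
        "W2": std * rng.randn(hidden_size, num_classes),
        "b2": np.zeros(num_classes),
    }

def init_toy_data(seed=1):
    """Random inputs of shape (num_inputs, input_size) and fixed integer labels."""
    rng = np.random.RandomState(seed)
    X = 10 * rng.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y

params = init_toy_params()
X, y = init_toy_data()
```

Because both helpers reseed their own generator, calling them again reproduces the same weights and data.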


# Forward pass: compute scores
Open the file `test/clf/neural_net.py` and look at the method `TwoLayerNet.loss`. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: It takes the data and weights and computes the class scores, the loss, and the gradients on the parameters. 

Implement the first part of the forward pass which uses the weights and biases to compute the scores for all inputs.
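The scores computation can be sketched as below. It assumes the common affine, ReLU, affine architecture (the text does not name the nonlinearity) and the `W1`/`b1`/`W2`/`b2` layout described earlier. `compute_scores` is a hypothetical standalone helper, not a method of `TwoLayerNet`.

```python
import numpy as np

def compute_scores(X, params):
    """Forward pass: affine -> ReLU -> affine (ReLU is an assumed choice)."""
    W1, b1 = params["W1"], params["b1"]
    W2, b2 = params["W2"], params["b2"]
    hidden = np.maximum(0, X @ W1 + b1)   # (N, H) hidden activations
    scores = hidden @ W2 + b2             # (N, C) unnormalized class scores
    return scores

# Tiny example with hand-picked weights; the second input is partly zeroed by ReLU.
params = {
    "W1": np.array([[1.0, -1.0], [0.5, 2.0]]),
    "b1": np.array([0.0, 0.0]),
    "W2": np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]),
    "b2": np.array([0.1, 0.2, 0.3]),
}
X = np.array([[1.0, 1.0], [-1.0, 0.0]])
print(compute_scores(X, params))
```

The whole minibatch is processed with two matrix products, so no Python loop over examples is needed.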

# Forward pass: compute loss
In the same function, implement the second part, which computes the data loss and the regularization loss.
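A sketch of the loss, assuming a softmax (cross-entropy) data loss averaged over the batch, as in the Softmax exercise, plus an L2 penalty on the weight matrices only. The exact scaling of the penalty (for example whether a factor of 0.5 is included) is an assumption, so match whatever `neural_net.py` specifies. `softmax_loss_with_reg` is a hypothetical name.

```python
import numpy as np

def softmax_loss_with_reg(scores, y, params, reg):
    """Mean softmax cross-entropy over the batch plus L2 penalty on W1 and W2."""
    N = scores.shape[0]
    # Subtract the row max before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    data_loss = -log_probs[np.arange(N), y].mean()
    # Assumed convention: reg * sum of squared weights; biases are not regularized.
    reg_loss = reg * (np.sum(params["W1"] ** 2) + np.sum(params["W2"] ** 2))
    return data_loss + reg_loss
```

With all-zero scores over three classes, every class gets probability 1/3, so the data loss is `log(3)`; this is a handy sanity check for a freshly initialized network with small weights.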

# Backward pass
Implement the rest of the function. This will compute the gradient of the loss with respect to the variables `W1`, `b1`, `W2`, and `b2`. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:
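Under the same assumptions (ReLU hidden layer, averaged softmax loss, penalty `reg * sum(W**2)`), the backward pass is a direct application of the chain rule. `loss_and_grads` is a hypothetical standalone function that repeats the forward pass so the sketch runs on its own.

```python
import numpy as np

def loss_and_grads(X, y, params, reg):
    """Return (loss, grads) for an assumed affine-ReLU-affine-softmax network."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    N = X.shape[0]
    # Forward pass.
    hidden_pre = X @ W1 + b1
    hidden = np.maximum(0, hidden_pre)
    scores = hidden @ W2 + b2
    shifted = scores - scores.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(N), y]).mean()
    loss += reg * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

    # Backward pass: for the averaged softmax loss, dL/dscores = (probs - one_hot) / N.
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N
    grads = {}
    grads["W2"] = hidden.T @ dscores + 2 * reg * W2
    grads["b2"] = dscores.sum(axis=0)
    dhidden = dscores @ W2.T
    dhidden[hidden_pre <= 0] = 0  # ReLU passes gradient only where its input was positive
    grads["W1"] = X.T @ dhidden + 2 * reg * W1
    grads["b1"] = dhidden.sum(axis=0)
    return loss, grads
```

Each gradient has the same shape as its parameter, which is a quick first check before running a numeric gradient comparison.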

In [None]:
# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be about 1e-8 or less for each of W1, W2, b1, and b2.
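A centered-difference gradient check can be sketched as follows. `numerical_gradient` and `rel_error` are hypothetical helpers written for illustration. `f` is assumed to be a zero-argument function that reads the array `x`, which the checker perturbs in place one entry at a time and then restores.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of df/dx; perturbs x in place, then restores it."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + h
        f_plus = f()
        x[idx] = old - h
        f_minus = f()
        x[idx] = old  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
        it.iternext()
    return grad

def rel_error(a, b):
    """Maximum elementwise relative error, guarded against division by zero."""
    return np.max(np.abs(a - b) / np.maximum(1e-8, np.abs(a) + np.abs(b)))

# Usage on a function with a known gradient: d/dx sum(x**3) = 3 * x**2.
x = np.random.RandomState(0).randn(3, 4)
num = numerical_gradient(lambda: np.sum(x ** 3), x)
print(rel_error(num, 3 * x ** 2))
```

In the notebook, `f` would be something like a lambda returning the network's loss on the toy data, and `x` would be each entry of `net.params` in turn. A relative error is used because an absolute difference is meaningless without knowing the gradient's scale.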



# Training the network
To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function `TwoLayerNet.train` and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement `TwoLayerNet.predict`, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.

Once you have implemented these methods, use them to train a two-layer network on the toy data. You should achieve a training loss below 0.2.
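A minimal sketch of minibatch SGD with periodic accuracy tracking. It assumes a loss function returning `(loss, grads)` with the same keys as `params`, and a ReLU hidden layer for `predict`. The names `sgd_train` and `predict`, the sampling with replacement, and recording accuracy on the current minibatch once per epoch are illustrative choices, not the signature of `TwoLayerNet.train`.

```python
import numpy as np

def predict(X, params):
    """Predicted label = argmax of the class scores (assumed affine-ReLU-affine net)."""
    hidden = np.maximum(0, X @ params["W1"] + params["b1"])
    return np.argmax(hidden @ params["W2"] + params["b2"], axis=1)

def sgd_train(loss_fn, params, X, y, learning_rate=1e-1, num_iters=100,
              batch_size=200, predict_fn=None, seed=0):
    """Vanilla minibatch SGD; params are updated in place.

    loss_fn(params, X_batch, y_batch) must return (loss, grads).
    """
    rng = np.random.RandomState(seed)
    num_train = X.shape[0]
    iterations_per_epoch = max(num_train // batch_size, 1)
    loss_history, train_acc_history = [], []
    for it in range(num_iters):
        # Sample a minibatch with replacement.
        idx = rng.choice(num_train, batch_size, replace=True)
        X_batch, y_batch = X[idx], y[idx]
        loss, grads = loss_fn(params, X_batch, y_batch)
        loss_history.append(loss)
        for name in params:
            params[name] -= learning_rate * grads[name]  # gradient step
        # Once per epoch, record accuracy on the current minibatch.
        if predict_fn is not None and it % iterations_per_epoch == 0:
            train_acc_history.append(np.mean(predict_fn(X_batch, params) == y_batch))
    return {"loss_history": loss_history, "train_acc_history": train_acc_history}
```

Keeping the loss function as a parameter separates the optimizer from the model, which makes the loop easy to test on a simple quadratic before pointing it at the network.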

# Load the data
Now that you have implemented a two-layer network that passes gradient checks and works on toy data, it's time to load our favorite CIFAR-10 data so we can use it to train a classifier on a real dataset.

# Train a network
To train your network, you will use SGD with momentum. In addition, you will adjust the learning rate with an exponential schedule as optimization proceeds: after each epoch, reduce the learning rate by multiplying it by a decay rate.
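The two ingredients can be sketched separately. This assumes the classic momentum form `v = mu * v - lr * grad; w += v` with `mu = 0.9`, a typical but assumed value; Nesterov momentum is a common alternative. `momentum_update` and `decayed_learning_rate` are hypothetical helpers, and the toy objective exists only to show the loop.

```python
import numpy as np

def momentum_update(param, grad, velocity, learning_rate, mu=0.9):
    """Classic momentum step; updates `param` and `velocity` in place."""
    velocity *= mu                     # decay the running velocity
    velocity -= learning_rate * grad   # add the current gradient step
    param += velocity                  # move the parameters

def decayed_learning_rate(lr0, decay, epoch):
    """Exponential schedule: the rate is multiplied by `decay` after each epoch."""
    return lr0 * decay ** epoch

# Toy demo: minimize f(w) = 0.5 * w**2, whose gradient is w.
w = np.array([5.0])
v = np.zeros_like(w)
lr0, decay, iters_per_epoch = 0.1, 0.95, 10
for it in range(200):
    lr = decayed_learning_rate(lr0, decay, it // iters_per_epoch)
    momentum_update(w, w.copy(), v, lr)
print(w)  # should be near 0
```

In the network, each parameter array would keep its own velocity array, initialized to zeros with the same shape.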

# Debug the training
With the default parameters we provided above, you should get a validation accuracy of about 0.29. This isn't very good yet.

One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights learned by the first layer of the network. In most neural networks trained on visual data, the first-layer weights show some visible structure when visualized.

In [None]:
# Plot the loss function and train / validation accuracies
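A minimal plotting sketch, assuming the training routine returns a per-iteration loss list and per-epoch training and validation accuracy lists. The function name and argument names are hypothetical.

```python
import matplotlib.pyplot as plt

def plot_training_curves(loss_history, train_acc_history, val_acc_history):
    """Loss vs. iteration on top; train/val accuracy vs. epoch below."""
    fig, (ax_loss, ax_acc) = plt.subplots(2, 1, figsize=(8, 8))
    ax_loss.plot(loss_history)
    ax_loss.set_title("Loss history")
    ax_loss.set_xlabel("Iteration")
    ax_loss.set_ylabel("Loss")
    ax_acc.plot(train_acc_history, label="train")
    ax_acc.plot(val_acc_history, label="val")
    ax_acc.set_title("Classification accuracy history")
    ax_acc.set_xlabel("Epoch")
    ax_acc.set_ylabel("Accuracy")
    ax_acc.legend()
    fig.tight_layout()
    return fig
```

A roughly linear loss curve and overlapping train/val accuracy lines are the symptoms discussed under "Tune your hyperparameters".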


In [None]:
# Visualize the weights of the network
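One way to visualize the first layer is to reshape each column of `W1` back into an image and tile the results. This sketch assumes each CIFAR-10 image was flattened from a 32x32x3 array in row-major order, so column `i` of `W1` (shape `(3072, H)`) reshapes to one 32x32x3 filter. `weights_to_grid` is a hypothetical helper.

```python
import numpy as np

def weights_to_grid(W1, image_shape=(32, 32, 3), padding=1):
    """Tile first-layer weights (D, H) into one RGB image with values in [0, 1]."""
    H = W1.shape[1]
    imgs = W1.T.reshape(H, *image_shape)  # one image per hidden unit
    rows, cols, ch = image_shape
    grid_size = int(np.ceil(np.sqrt(H)))
    grid = np.zeros((grid_size * (rows + padding) - padding,
                     grid_size * (cols + padding) - padding, ch))
    for i in range(H):
        r, c = divmod(i, grid_size)
        img = imgs[i]
        lo, hi = img.min(), img.max()
        # Rescale each filter independently so its structure is visible.
        img = (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)
        y0, x0 = r * (rows + padding), c * (cols + padding)
        grid[y0:y0 + rows, x0:x0 + cols] = img
    return grid
```

In the notebook, something like `plt.imshow(weights_to_grid(net.params["W1"]))` followed by `plt.axis("off")` would display it, assuming the network object is called `net`.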



# Tune your hyperparameters

**What's wrong?** Looking at the visualizations above, you can see that the loss decreases more or less linearly, which suggests that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model has low capacity and that you should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

**Tuning**. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using neural networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.

**Approximate results**. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.

**Experiment**. Your goal in this part of the exercise is to get as good a result on CIFAR-10 as you can with a fully-connected neural network. You may try additional techniques (e.g., PCA to reduce dimensionality, adding dropout, or adding features to the solver).
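PCA is one of the ideas listed above. A minimal sketch, assuming the projection is fit on the training set only and the same mean and components are then applied to validation and test data to avoid leaking information. `fit_pca` and `apply_pca` are hypothetical names. On the full CIFAR-10 training set the SVD is expensive, so fitting on a subsample is one option.

```python
import numpy as np

def fit_pca(X_train, k):
    """Return (mean, components) for the top-k principal directions of X_train."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:k].T  # components have shape (D, k)

def apply_pca(X, mean, components):
    """Project any split onto the training-set principal directions."""
    return (X - mean) @ components  # shape (N, k)
```

The projected data would then be fed to the network in place of the raw pixels, with `input_size` set to `k`.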

In [None]:
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_net.                                                            #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################
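An automatic sweep can be sketched as a random search. `evaluate` is a hypothetical callback you would write to train a `TwoLayerNet` with the given configuration and return its validation accuracy; it could also stash the trained network so the best one ends up in `best_net`. The search ranges are illustrative placeholders, not tuned values, and `num_epochs` assumes your training routine can be driven by an epoch count.

```python
import numpy as np

def random_search(evaluate, num_trials=20, seed=0):
    """Random search; evaluate(config) -> validation accuracy."""
    rng = np.random.RandomState(seed)
    best_acc, best_config = -1.0, None
    results = {}
    for _ in range(num_trials):
        config = {
            # Learning rate and regularization are sampled on a log scale.
            "learning_rate": 10 ** rng.uniform(-4, -2),
            "reg": 10 ** rng.uniform(-4, 0),
            "hidden_size": int(rng.choice([50, 100, 200, 300])),
            "num_epochs": int(rng.choice([5, 10, 20])),
        }
        acc = evaluate(config)
        results[tuple(sorted(config.items()))] = acc
        if acc > best_acc:
            best_acc, best_config = acc, config
    return best_config, best_acc, results
```

Random search is used here instead of a grid because it tries a different value of every hyperparameter on each trial, which tends to explore the important ones more efficiently for the same budget.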


In [None]:
# visualize the weights of the best network




# Run on the test set
When you are done experimenting, evaluate your final trained network on the test set; you should get above 48% accuracy.
