
Updates and new notebook

- Fixed error in weight update portion of SGD
- Wrote new notebook with flexible NN architecture
grantbey committed Oct 1, 2016
1 parent 53ef639 commit dabca1267e0708a836689aabcee4ddf5d5cde0b9
Showing with 577 additions and 15 deletions.
  1. +555 −0 nn-from-scratch/MNIST-nn-SGD-flex_arch.ipynb
  2. +16 −14 nn-from-scratch/MNIST-nn-SGD.ipynb
  3. +6 −1 nn-from-scratch/README.md

@@ -1,5 +1,7 @@
# NN from scratch
Update: I wrote a simple SGD version of the original scipy.optimize script, and then rewrote that to incorporate a flexible architecture. I also found an error in the weight-update part of the code; it is fixed here.
The purpose here was to write a neural network "from scratch", which is to say without using any of the available libraries. The advantage is a deeper understanding of the principles and how they work; the disadvantages are performance, versatility, and effort.
This NN incorporates most of the features we've dealt with so far in the course (that is, up to somewhere in week 3): cross entropy, L2 regularization, and improved weight initialization.
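For reference, here is a minimal sketch of what those three pieces typically look like: 1/sqrt(fan-in) weight initialization and a cross-entropy cost with an L2 penalty. This is illustrative only; the function names and signatures are assumptions, not the notebooks' actual code.

```python
import numpy as np

def init_weights(sizes):
    """Improved initialization: scale each weight matrix by 1/sqrt(fan-in)."""
    rng = np.random.default_rng(0)
    return [rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def cross_entropy_cost(a, y, weights, lmbda, n):
    """Cross-entropy for output activations a and targets y, plus the L2 penalty
    (lmbda / 2n) * sum of squared weights over a training set of size n."""
    ce = -np.sum(np.nan_to_num(y * np.log(a) + (1 - y) * np.log(1 - a)))
    l2 = 0.5 * (lmbda / n) * sum(np.sum(w ** 2) for w in weights)
    return ce + l2
```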
@@ -10,10 +12,13 @@ MNIST-nn-scipy.ipynb uses the scipy.optimize L_BFGS optimizer to minimize the co
MNIST-nn-SGD.ipynb removes the optimizer in exchange for standard stochastic gradient descent. This more closely matches what we have been studying thus far in the Nielsen textbook and as such it will be where I develop this script further.
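To make the weight-update step mentioned in the commit message concrete, the standard mini-batch SGD rule with L2 regularization (as in Nielsen's formulation) decays each weight by (1 - eta * lmbda / n) before subtracting the averaged gradient, and leaves the biases unregularized. The sketch below is a hedged illustration of that rule, not the notebook's actual code.

```python
def sgd_update(weights, biases, grad_w, grad_b, eta, lmbda, n, batch_size):
    """One mini-batch step: L2 weight decay plus the averaged gradient step."""
    weights = [(1 - eta * lmbda / n) * w - (eta / batch_size) * gw
               for w, gw in zip(weights, grad_w)]
    biases = [b - (eta / batch_size) * gb for b, gb in zip(biases, grad_b)]
    return weights, biases
```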
MNIST-nn-SGD-flex_arch.ipynb is the above SGD-based algorithm but with modifications for a more flexible architecture. This makes the individual steps of forward and backpropagation slightly more opaque, so if you're looking for ease of understanding, look elsewhere.
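The flexibility essentially comes from keeping the parameters in lists and looping over layers rather than hard-coding each one. A minimal sketch of a forward pass written that way (the names here are hypothetical, not taken from the notebook):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Propagate a column-vector input through any number of layers."""
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a
```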
Lastly, the MNIST-loader notebook throws warnings about converting uint8 data into float64 during the scaling process. This didn't seem unusual to me; I could suppress the warnings, or convert the array to float64 before passing it to the scaler.
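One way to do that conversion explicitly is to cast the pixel array to float64 before calling the scaler, which avoids the dtype warning. This is a sketch under assumptions (MinMaxScaler and random stand-in data); the actual notebook may use a different scaler.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_uint8 = np.random.randint(0, 256, size=(100, 784), dtype=np.uint8)  # stand-in for MNIST pixels
X = X_uint8.astype(np.float64)                  # explicit uint8 -> float64 conversion
X_scaled = MinMaxScaler().fit_transform(X)      # scaler no longer needs to convert the dtype
```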
The to do list:
- Create more versatility in terms of number of layers, number of neurons per layer
- <del>Incorporate gradient descent</del>
- <del>Create more versatility in terms of number of layers, number of neurons per layer</del>
- Incorporate early stopping
- Incorporate a learning rate schedule
- Make use of the validation data (it's sort of ignored in these notebooks for now)
