

Session 5: Deep (Conv)Nets

Meeting date: November 10th, 2016

Recommended Preparatory Work


Three Key Properties of Convolutional Neural Networks

  1. local receptive fields
  2. shared weights and biases (within a given kernel or filter)
  3. pooling layers
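The three properties above can be seen in a few lines of NumPy. This is a minimal illustrative sketch (the function names `conv2d_valid` and `max_pool` are ours, not from Nielsen's code): one small kernel whose weights are shared across every position, each output unit looking only at a local receptive field, and a pooling step that condenses the feature map.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution: one shared kernel slides over the image,
    so each output unit sees only a small local receptive field."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i:i + kh, j:j + kw]   # local receptive field
            out[i, j] = np.sum(patch * kernel)  # same weights everywhere
    return out

def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling: keep the max of each size x size block."""
    h, w = feature_map.shape
    h, w = h - h % size, w - w % size
    fm = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return fm.max(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1., 0.], [0., -1.]])  # 2x2 filter: just 4 shared weights
fmap = conv2d_valid(image, kernel)        # 5x5 feature map
pooled = max_pool(fmap)                   # 2x2 pooled map
```

Note how few parameters are involved: the 2×2 kernel contributes only four weights no matter how large the image is, which is the point of weight sharing.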

Architecture Changes That Can Improve Classification Accuracy

See this Jupyter notebook for a Theano-focused script (based on Nielsen's code and text) that incrementally improves MNIST digit classification accuracy by:

  1. increasing the number of convolutional-pooling layers
  2. using ReLU units in place of the sigmoid or tanh variety
  3. algorithmically expanding the training data
  4. adding fully-connected layers (modest improvement)
  5. using an ensemble of networks
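Item 3 (algorithmic expansion) is the easiest of these to demonstrate outside the notebook. The sketch below, assuming 28×28 MNIST-style arrays, follows the same idea as Nielsen's expansion script: each image yields four extra copies shifted by one pixel, with the label unchanged, since a one-pixel shift leaves the digit's class intact but gives the network a genuinely new input. The function name `expand_with_shifts` is ours.

```python
import numpy as np

def expand_with_shifts(images, labels):
    """For each image, add four copies shifted by one pixel
    (up, down, left, right); labels are unchanged."""
    expanded_images, expanded_labels = [], []
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # (rows, cols)
    for img, lbl in zip(images, labels):
        expanded_images.append(img)
        expanded_labels.append(lbl)
        for dr, dc in shifts:
            shifted = np.roll(img, shift=(dr, dc), axis=(0, 1))
            # zero the wrapped-around row/column so the shift is "clean"
            if dr == 1:
                shifted[0, :] = 0
            if dr == -1:
                shifted[-1, :] = 0
            if dc == 1:
                shifted[:, 0] = 0
            if dc == -1:
                shifted[:, -1] = 0
            expanded_images.append(shifted)
            expanded_labels.append(lbl)
    return np.array(expanded_images), np.array(expanded_labels)

# toy "dataset": two 28x28 images
images = np.random.rand(2, 28, 28)
labels = np.array([3, 7])
big_images, big_labels = expand_with_shifts(images, labels)
# each original yields itself plus 4 shifted copies -> 5x the data
```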

Why Does ConvNet Training Work (Despite Unstable, e.g., Vanishing, Gradients)?

  1. convolutional layers have fewer parameters because of weight- and bias-sharing
  2. "powerful" regularization techniques (e.g., dropout) to reduce overfitting
  3. ReLU units (quicker training relative to sigmoid/tanh)
  4. using GPUs if we're training for many epochs
  5. sufficiently large set of training data (including algorithmic expansion if possible)
  6. appropriate cost function choice
  7. sensible weight initialization

Other Classes of Deep Neural Nets We Touched on Briefly

  1. recurrent neural networks (RNNs), with special discussion of long short-term memory units (LSTMs)
  2. deep belief networks (DBNs)
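To make the RNN idea concrete, here is a minimal vanilla-RNN forward pass in NumPy (an illustrative sketch with made-up sizes, not an LSTM): one set of weights is reused at every time step, and a hidden state `h` carries information forward through the sequence. An LSTM adds gates around this cell precisely so that `h` can be preserved over long spans.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny vanilla RNN cell: the same weights are applied at every step.
input_size, hidden_size = 3, 4
W_xh = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_hh = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b_h = np.zeros(hidden_size)

def rnn_forward(xs):
    h = np.zeros(hidden_size)  # hidden state, carried across steps
    for x in xs:               # same W_xh, W_hh at every time step
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                   # final hidden state summarizes the sequence

sequence = [rng.normal(size=input_size) for _ in range(5)]
h_final = rnn_forward(sequence)
```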

TensorFlow for Poets

  • makes it trivial to leverage the powerful neural net image-classification architecture of Inception v3
  • study group member Thomas Balestri quickly retrained it into an impressive image classifier for consumer products

Up Next

CS231n Convolutional Neural Networks for Visual Recognition notes and lectures