## Neural Network Layers

This notebook is going to lean pretty heavily on [Keras](https://keras.io/), so it's worth taking a minute to understand a little bit about what Keras is.

According to its own documentation, Keras is "a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK, or Theano."

So while TensorFlow, for example, is an API that helps with machine learning in a broader sense, Keras was written specifically to help with neural networks.

Let's see this in action.

## Get training & test data

Remember how we called MNIST a famous dataset? MNIST is so famous, Keras contains [its own copy](https://keras.io/datasets/#mnist-database-of-handwritten-digits)!

In [1]:
from keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

Using Theano backend.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)


`x_train` contains 60,000 28x28 images:

In [7]:
x_train.shape

(60000, 28, 28)

And `y_train` contains the labels:

In [9]:
y_train[:5]

array([5, 0, 4, 1, 9], dtype=uint8)

Don't we need validation data? I think we do, but let's ignore that for now.

## Linear

The simplest type of Keras model is a `Sequential` model.

In [10]:
from keras.models import Sequential

model = Sequential()

It's called "sequential" because it holds a linear stack, or "sequence", of layers.

In [13]:
from keras.layers import Dense

model.add(Dense(10, input_dim=28))

We should know by now that what we're doing here is telling the first layer in our model to take an input of shape `(*, 28)` and produce an output of shape `(*, 10)`, where `*` stands for however many samples are in the batch.
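To make those shapes concrete, here is a minimal NumPy sketch of what a `Dense(10)` layer with 28 inputs and no activation computes. The batch of 4 samples and the random weights are stand-ins made up for illustration; Keras initializes and stores its own weights internally.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in batch: 4 samples with 28 features each (hypothetical data).
x = rng.normal(size=(4, 28))

# A dense layer mapping 28 inputs to 10 outputs holds a (28, 10) weight
# matrix and a (10,) bias vector. These values are random placeholders.
W = rng.normal(size=(28, 10))
b = np.zeros(10)

# With no activation, the layer's output is a matrix product plus the bias.
out = x @ W + b
print(out.shape)  # (4, 10)
```

The batch dimension (4 here) passes straight through; only the last dimension changes from 28 to 10.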

We can stack as many model layers as we like:

In [4]:
model.add(Dense(5))
model.add(Dense(2))

And we no longer have to specify input dimensions because each layer just takes the output of the previous layer. 
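As a rough sketch of that bookkeeping, the helper below (a hypothetical function, not part of Keras) walks through a stack of dense layers and works out each layer's weight shape from the output size of the layer before it.

```python
def dense_weight_shapes(input_dim, units_per_layer):
    """Return the (inputs, outputs) weight shape for each stacked dense layer."""
    shapes = []
    n_in = input_dim
    for n_out in units_per_layer:
        shapes.append((n_in, n_out))
        n_in = n_out  # the next layer consumes this layer's output
    return shapes


# Same stack as our model: 28 inputs, then layers of 10, 5 and 2 units.
shapes = dense_weight_shapes(28, [10, 5, 2])
print(shapes)  # [(28, 10), (10, 5), (5, 2)]
```

Only the first entry needs information from us (the 28); everything after it follows from the layer sizes.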

When we're done here we can compile the model:

In [5]:
model.compile(
    loss="categorical_crossentropy", 
    optimizer="sgd", 
    metrics=["accuracy"]
)

Can we run this on MNIST?

In [6]:
model.fit(x_train, y_train, nb_epoch=3, batch_size=32)

Exception: Error when checking model input: expected dense_input_1 to have 2 dimensions, but got array with shape (60000, 28, 28)

I guess not. Let's have a look at the error message:

> `expected dense_input_1 to have 2 dimensions, but got array with shape (60000, 28, 28)`

Hey, that makes sense! We told our model to expect an input of shape `(*, 28)`, which has 2 dimensions, but instead we gave it the entire `(60000, 28, 28)` MNIST dataset, which has 3.

And we could try changing the first layer in our model to something like `model.add(Dense(10, input_shape=(28, 28)))`, but it actually wouldn't be a very good model (and also Keras will spit an error back at us).
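To see the dimension mismatch from the error message without loading MNIST, here is a small NumPy sketch. The array of 6 blank images is a stand-in for `x_train`, and flattening each image into one row is just one common way to get a 2-dimensional array; it is an assumption here, not necessarily where this notebook is headed.

```python
import numpy as np

# Small stand-in for x_train: 6 blank 28x28 "images" instead of 60,000.
images = np.zeros((6, 28, 28), dtype=np.uint8)

# Three dimensions: (samples, rows, columns). The model asked for two.
print(images.ndim)  # 3

# Flattening each image into a single row gives a 2-D array.
flat = images.reshape(len(images), -1)
print(flat.shape)  # (6, 784)
```

Note that a model fed the flattened array would need a first layer expecting 784 inputs rather than 28.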

Remember on [Day Twelve](http://theianchanc.com/one-data-science-a-day/day-twelve/) when we said our [spreadsheet neural network](https://docs.google.com/spreadsheets/d/1fXL-hSkdDZaca4Wc7Q7x5wYTlbYxsjgrXvjNk3woKxU/edit?usp=sharing) wasn't a real neural network because it was just a bunch of linear matrix multiplications? 

Right now, that's all our Keras model is. Each `Dense` layer is the equivalent of a single matrix multiplication in the spreadsheet.
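Here is a quick NumPy check of why a stack of purely linear layers is not very interesting: two of them in a row can always be folded into one. The weights and inputs below are random placeholders, not values from our Keras model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in batch: 4 samples with 28 features each.
x = rng.normal(size=(4, 28))

# Two linear layers: 28 -> 10 -> 5, each with weights and a bias.
W1, b1 = rng.normal(size=(28, 10)), rng.normal(size=10)
W2, b2 = rng.normal(size=(10, 5)), rng.normal(size=5)

# Applying the two layers one after the other.
two_layers = (x @ W1 + b1) @ W2 + b2

# The same mapping written as a single linear layer: 28 -> 5.
W = W1 @ W2
b = b1 @ W2 + b2
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))  # True
```

However many linear layers we stack, the result can still be written as one matrix multiplication plus a bias.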

## Nonlinear

The interaction of linear and nonlinear layers in a neural network is actually explained by the intimidatingly named [universal approximation theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem), but we're going to save that for another day (because I don't actually understand it).

For now, let's just get some nonlinearity in here. 