<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Intro-to-Keras" data-toc-modified-id="Intro-to-Keras-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Intro to <code>Keras</code></a></span></li><li><span><a href="#Installing-Keras" data-toc-modified-id="Installing-Keras-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Installing <code>Keras</code></a></span><ul class="toc-item"><li><span><a href="#Using-the-GPU" data-toc-modified-id="Using-the-GPU-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Using the GPU</a></span></li></ul></li><li><span><a href="#Building-Keras-Models-:-ANN-Example" data-toc-modified-id="Building-Keras-Models-:-ANN-Example-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Building <code>Keras</code> Models : ANN Example</a></span><ul class="toc-item"><li><span><a href="#The-Sequential-API" data-toc-modified-id="The-Sequential-API-3.1"><span class="toc-item-num">3.1&nbsp;&nbsp;</span>The Sequential API</a></span></li><li><span><a href="#The-Functional-API" data-toc-modified-id="The-Functional-API-3.2"><span class="toc-item-num">3.2&nbsp;&nbsp;</span>The Functional API</a></span></li></ul></li><li><span><a href="#Training-Keras-Models" data-toc-modified-id="Training-Keras-Models-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Training <code>Keras</code> Models</a></span><ul class="toc-item"><li><span><a href="#Compiling-the-Network" data-toc-modified-id="Compiling-the-Network-4.1"><span class="toc-item-num">4.1&nbsp;&nbsp;</span>Compiling the Network</a></span></li><li><span><a href="#Training-the-Network" data-toc-modified-id="Training-the-Network-4.2"><span class="toc-item-num">4.2&nbsp;&nbsp;</span>Training the Network</a></span></li><li><span><a href="#Evaluating-and-Making-Predictions" data-toc-modified-id="Evaluating-and-Making-Predictions-4.3"><span class="toc-item-num">4.3&nbsp;&nbsp;</span>Evaluating and Making Predictions</a></span></li></ul></li><li><span><a href="#Advanced-Keras-Layers" 
data-toc-modified-id="Advanced-Keras-Layers-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Advanced <code>Keras</code> Layers</a></span><ul class="toc-item"><li><span><a href="#Dropout-Layers" data-toc-modified-id="Dropout-Layers-5.1"><span class="toc-item-num">5.1&nbsp;&nbsp;</span>Dropout Layers</a></span></li><li><span><a href="#Merge-Layers" data-toc-modified-id="Merge-Layers-5.2"><span class="toc-item-num">5.2&nbsp;&nbsp;</span>Merge Layers</a></span></li><li><span><a href="#And-More..." data-toc-modified-id="And-More...-5.3"><span class="toc-item-num">5.3&nbsp;&nbsp;</span>And More...</a></span></li></ul></li></ul></div>

In [None]:
# Imports
# Basics
#import pandas as pd 
import numpy as np
import random

# keras
import keras
from keras.models import Sequential, Model
from keras.layers import Dense, Activation, Dropout, Input
#from keras.layers import Concatenate
from keras.utils import np_utils
from keras.utils.np_utils import to_categorical
from keras.utils.data_utils import get_file
from keras.preprocessing.text import Tokenizer
from keras.utils.vis_utils import model_to_dot, plot_model
from keras.datasets import imdb, reuters
from keras.preprocessing import sequence
from keras.optimizers import SGD, RMSprop

np.random.seed(13)


# Intro to `Keras` 

# Installing `Keras`

You should already have keras installed. But if you ever need to re-install (in a new conda environment, for example), run this at the command line:

```$ conda install -c conda-forge keras```

If that doesn't work:

```$ conda install tensorflow```

... then move into a folder where you keep installed tools and install Keras from source:

```$ git clone https://github.com/fchollet/keras.git```

```$ cd keras```

```$ python setup.py install```

You can run this on your laptop, but it will be very slow. If you have a CUDA 7.5 compatible GPU (or use an AWS instance with one), Keras will run much faster.

## Using the GPU

Deep Learning requires tons of matrix computations. GPUs can do these really fast in parallel (many cores!).

If you're working on a computer with a GPU and want to configure it for keras:
  - Install [CUDA](http://docs.nvidia.com/cuda/index.html#axzz4Pa5zY8Qi) 
  - In `.bashrc`/`.bash_profile`: `export THEANO_FLAGS=device=gpu,floatX=float32` (this flag only applies when Keras runs on the Theano backend)

But if not, don't worry, you can just use AWS's big toys! **When using AWS, a Deep Learning AMI is highly recommended**: it handles all the packages you need (including CUDA setup) up front. We'll walk through these steps in another notebook.

# Building `Keras` Models : ANN Example
The core objects in `Keras` are `Models` and `Layers`
- `Models` set up the container for your network
- `Layers` fill in the architecture (connections, unit types, activation functions, etc)
- The 2 options for `Models`:
  - `Sequential`: The basic one we'll focus on
  - Functional API: Specify more complex models, more flexibility
- There are lots of options for types of `Layers`. We'll start with two familiar ones:
  - `Dense` layers are fully connected, meaning every node computes a linear combination (a weighted sum plus a bias) of all the nodes in its input layer.
  - `Activation` layers apply an activation function to the value of each node in the previous layer.
  - Alternatively, `Dense` layers may be created with an activation function as a parameter.
  
We'll look at how to build an ANN (Artificial Neural Network) using both the Sequential and Functional APIs.
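
To make the `Dense` and `Activation` descriptions above concrete, here is a minimal pure-Python sketch of what those two layer types compute. The helper names (`dense`, `relu`, `softmax`), the toy weights, and the one-row-per-node weight layout are all assumptions made for readability; this is not how Keras stores or computes things internally.

```python
import math

def dense(inputs, weights, biases):
    """Fully connected layer: each node is a weighted sum of all inputs plus a bias."""
    return [
        sum(x * w for x, w in zip(inputs, node_weights)) + b
        for node_weights, b in zip(weights, biases)
    ]

def relu(values):
    """Activation applied node by node: negative values become zero."""
    return [max(0.0, v) for v in values]

def softmax(values):
    """Activation that turns raw scores into probabilities that sum to 1."""
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy network: 3 inputs -> 2 hidden nodes -> 2 output classes
x = [1.0, 2.0, 3.0]
hidden_w = [[0.5, -1.0, 0.25], [-0.5, 0.5, 0.5]]  # one row of weights per hidden node
hidden_b = [0.0, 0.5]
out_w = [[1.0, -1.0], [-1.0, 1.0]]                # one row of weights per output node
out_b = [0.0, 0.0]

hidden = relu(dense(x, hidden_w, hidden_b))
probs = softmax(dense(hidden, out_w, out_b))
print(hidden, probs)
```

The networks built in the rest of this notebook follow the same pattern, only with 784 inputs, 32 hidden nodes, and 10 output classes.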

## The Sequential API

A `Sequential` model is like a scikit-learn estimator with extra features. Most importantly, it is an empty container that allows you to design whatever architecture you want (i.e., by adding `Layers`).

Let's start filling it with layers. We can specify all of them at the time we create the model, or we can just create an empty container and add layers to it. Each method produces the same model, but the second one can be used to build models programmatically, if you're interested in going that route.

(See the [Keras Documentation on the Sequential API](https://keras.io/getting-started/sequential-model-guide/))

In [None]:
# Sequential model, all layers added at initialization
model1 = Sequential([
    Dense(32, input_dim=784),
    Activation('relu'),
    Dense(10),
    Activation('softmax'),
])

model1.summary()

In [None]:
# Sequential model, empty at initialization, layers added later.
model2 = Sequential()
model2.add(Dense(32, input_dim=784))
model2.add(Activation('relu'))
model2.add(Dense(10))
model2.add(Activation('softmax'))

model2.summary()

## The Functional API

The Functional API is a different approach to building models in keras. Each layer in a network is defined as a function that takes the previous layer as an input. Then the model is created by taking the input and output layers as parameters. 

Choosing between the Sequential and Functional APIs is usually a matter of personal preference, although there are some architectures that cannot be created using the Sequential API (notably siamese networks, in which a merge layer combines the outputs of two paths of layers).

(See the [Keras Documentation on the Functional API](https://keras.io/getting-started/functional-api-guide/))

In [None]:
# Same as the Sequential models, but needs an explicit Input layer.
# Note that each layer uses the same classes we used to create layers for our Sequential models,
# but the layers are connected by passing them as parameters in layer function calls.
# The model itself takes only the input and output layers as parameters.

input_layer = Input(shape=(784,))
hidden_layer = Dense(32)(input_layer)
hidden_act = Activation('relu')(hidden_layer)
output_layer = Dense(10)(hidden_act)
output_act = Activation('softmax')(output_layer)

model3 = Model(inputs=input_layer, outputs=output_act)
model3.summary()

In [None]:
# Activations do not need to be included as separate layers.

input_layer = Input(shape=(784,))
hidden_layer = Dense(32, activation='relu')(input_layer)
output_layer = Dense(10, activation='softmax')(hidden_layer)

model4 = Model(inputs=input_layer, outputs=output_layer)
model4.summary()

As can be seen from the summary printouts, these four models have the same architecture. The models built with the Functional API do have the benefit of having the Input dimensions included in the summary.
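
One way to check that the summaries agree is to work out the parameter counts by hand. The sketch below assumes the standard count for a fully connected layer (one weight per input per node, plus one bias per node); the helper name `dense_param_count` is made up for illustration. Activation layers have no trainable parameters, so they do not change the total.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden_params = dense_param_count(784, 32)    # Dense(32) on 784 inputs
output_params = dense_param_count(32, 10)     # Dense(10) on 32 hidden nodes
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 25120 330 25450
```

These numbers should match the `Param #` column and the total reported by each `summary()` call.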

# Training `Keras` Models

Sequential models and Models built with the functional API behave similarly, but there are some subtle differences. 

See the [Keras Models Documentation](https://keras.io/models/about-keras-models/) if you ever get stuck.

## Compiling the Network

The `Model.compile()` function configures a model for training. In order to do this, we need to specify the loss function to minimize, the optimizer algorithm, and the metrics we wish to track during training (metrics are reported but not optimized).

In [None]:
# Compile the network
model4.compile(loss='sparse_categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
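
The `sparse_categorical_crossentropy` loss used here expects integer class labels (such as the digits 0 to 9) rather than one-hot vectors. For a single sample, it is the negative log of the probability the model assigned to the true class. The sketch below is a simplified illustration with hypothetical probabilities; it leaves out details such as the clipping Keras applies to avoid `log(0)` and the averaging over a batch.

```python
import math

def sparse_categorical_crossentropy(true_class, predicted_probs):
    """Loss for one sample: -log(probability assigned to the true class)."""
    return -math.log(predicted_probs[true_class])

# Hypothetical softmax output over 3 classes
predicted = [0.05, 0.05, 0.9]

confident_right = sparse_categorical_crossentropy(2, predicted)  # true class got 0.9
confident_wrong = sparse_categorical_crossentropy(0, predicted)  # true class got 0.05

print(confident_right, confident_wrong)
```

The loss is small when the model puts high probability on the correct class and grows quickly when it is confidently wrong.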

## Training the Network

Similar to `sklearn`, we can train keras models by calling `fit` on `numpy` arrays.

Training runs in minibatches: `batch_size` sets how many samples are used for each gradient update, and `epochs` sets how many passes are made over the full training set.

In [None]:
from keras import datasets

(X_train, y_train), (X_test, y_test) = datasets.mnist.load_data()

X_train = X_train.reshape(-1,784)/255 # reshape from 28x28 to 784 and
X_test = X_test.reshape(-1,784)/255   # rescale from [0,255] to [0,1]

# Fit the model 
model4.fit(X_train, y_train, epochs=10, batch_size=32)
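
To get a feel for what `epochs=10` and `batch_size=32` mean on the 60,000 MNIST training images, the sketch below counts the gradient updates involved. The generator `iterate_minibatches` is a hypothetical helper, and the sketch ignores the shuffling that `fit` performs by default before each epoch.

```python
def iterate_minibatches(n_samples, batch_size):
    """Yield (start, stop) index pairs that cover the data one minibatch at a time."""
    for start in range(0, n_samples, batch_size):
        yield start, min(start + batch_size, n_samples)

n_samples, batch_size, epochs = 60000, 32, 10

batches = list(iterate_minibatches(n_samples, batch_size))
updates_per_epoch = len(batches)           # one gradient update per minibatch
total_updates = updates_per_epoch * epochs

print(updates_per_epoch, total_updates)    # 1875 18750
```

A smaller `batch_size` means more (noisier) updates per epoch; a larger one means fewer, smoother updates.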

## Evaluating and Making Predictions

Trained keras models have an `evaluate` method that returns the loss values and metrics values for the model in test mode. They also have a `predict` method for making predictions; for a softmax output layer it returns one probability per class for each sample.

In [None]:
# Evaluate
loss_and_metrics = model4.evaluate(X_test, y_test, batch_size=32)
print('\nLoss and Accuracy:\n', loss_and_metrics)

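# Make Predictions
# predict returns one row of class probabilities per sample (the softmax output)
proba = model4.predict(X_test, batch_size=32)
# the predicted class is the index of the largest probability in each row
classes = np.argmax(proba, axis=1)
print('\nClass Predictions:\n', classes)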
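
Since a softmax output is a row of class probabilities, turning it into a class label means picking the index of the largest value, and accuracy is the fraction of labels that match. Here is a small pure-Python sketch with made-up probabilities; the helper names `argmax` and `accuracy` are illustrative only.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Hypothetical softmax outputs for 3 samples over 3 classes
probabilities = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.3, 0.3, 0.4],
]
true_labels = [1, 0, 1]

predicted = [argmax(row) for row in probabilities]
acc = accuracy(true_labels, predicted)
print(predicted, acc)
```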

... probably not a model we'll want to put into production, but at least we're able to train it and make predictions. 

How would you go about building new architectures to train and compare performance?

# Advanced `Keras` Layers

- Layers define:
  - Nodes (number of features in a layer, and how they are connected to nodes in the previous layer)
  - Activations (transformations applied to the data coming into a node from other nodes it is connected to) 
  - Other Properties (for example, reshaping the feature arrays, dropout, etc.)

## Dropout Layers

**Dropout** is a regularization technique used to control overfitting in neural networks. Instead of adding a penalty to our loss function, Dropout works by randomly setting some proportion of a layer's inputs (the activations coming from the previous layer) to zero at each training step. This prevents the layer from relying too heavily on any one of its inputs. At prediction time nothing is dropped.
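
As a rough illustration, here is a pure-Python sketch of "inverted" dropout, the variant in which the values that survive are scaled up by `1 / (1 - rate)` so that the expected sum stays the same during training. The function name `dropout` and its `rng` argument are assumptions made for this example; Keras implements this inside its `Dropout` layer.

```python
import random

def dropout(activations, rate, training=True, rng=random):
    """Zero each activation with probability `rate`; scale survivors by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return list(activations)  # nothing is dropped at prediction time
    keep_prob = 1.0 - rate
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(13)  # seeded so the example is repeatable
activations = [1.0] * 10

dropped = dropout(activations, rate=0.5, rng=rng)
unchanged = dropout(activations, rate=0.5, training=False)
print(dropped)
print(unchanged)
```

With `rate=0.5`, each value is either dropped to `0.0` or doubled to `2.0`, and a different random subset is dropped at every training step.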

Here is an example of Dropout being used in an ANN that classifies documents from the Reuters newswire data set.

In [None]:
# Set the max number of words to keep, 2000 most frequent 
max_features = 2000
(X_train, y_train), (X_test, y_test) = reuters.load_data(num_words=max_features)
maxlen = 10

# Each document is a sequence of word indices: pad any shorter than 10 words with zeros
# and truncate longer ones so that only their last 10 words are kept
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

model = Sequential()
# Dense(64) is a fully-connected layer with 64 hidden units.
# The 'kernel_initializer' parameter allows us to set the distribution of the
# randomized weights that the layer is initialized with.
model.add(Dense(64, input_dim=10, kernel_initializer="uniform"))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(64, kernel_initializer="uniform"))
model.add(Activation('tanh'))
model.add(Dropout(0.5))
model.add(Dense(46, kernel_initializer="uniform"))
model.add(Activation('softmax'))

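# Learning rate (lr; newer Keras releases call it learning_rate): how much the weights
# are adjusted in response to the current gradient.
# Decay: how much the learning rate shrinks after each update.
# Momentum: what fraction of the previous weight update is carried over into the current one.
# Nesterov momentum: like momentum, but the gradient is evaluated at the look-ahead
# position reached by first applying the momentum step.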
sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='sparse_categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])

model.fit(X_train, y_train,
          epochs=20,
          batch_size=16)
score = model.evaluate(X_test, y_test, batch_size=16)
print('\nLoss and Accuracy:\n', score)

Hmm... not a great job! With 46 classes that is not terrible, but we can do better. We'll return to this!
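
One preprocessing detail in the cell above is worth a closer look: `pad_sequences` forces every document to exactly `maxlen` entries. A minimal pure-Python sketch of its default behaviour (pad on the left, truncate from the left) is shown below; the helper name `pad_or_truncate` is hypothetical, and the real function works on whole batches and returns a NumPy array.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Keep the last `maxlen` items, then left-pad with `value` up to `maxlen`."""
    seq = list(seq)[-maxlen:]
    return [value] * (maxlen - len(seq)) + seq

short_doc = [1, 2, 3]
long_doc = [1, 2, 3, 4, 5, 6, 7]

padded = pad_or_truncate(short_doc, maxlen=5)
truncated = pad_or_truncate(long_doc, maxlen=5)
print(padded)     # [0, 0, 1, 2, 3]
print(truncated)  # [3, 4, 5, 6, 7]
```

With `maxlen = 10`, the network only ever sees the last 10 word indices of each document, which limits how much it can learn from the text.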

## Merge Layers
- Merge multiple layer sequences into a single layer
- A number of options for merging outputs: `concatenate`, `add`, `average`, etc.
- Like so:

In [None]:
left_branch_input = keras.layers.Input(shape=(784,))
x1 = keras.layers.Dense(32)(left_branch_input)

right_branch_input = keras.layers.Input(shape=(784,))
x2 = keras.layers.Dense(32)(right_branch_input)

merged = keras.layers.concatenate([x1, x2])
main_output = keras.layers.Dense(10, activation='sigmoid')(merged)

model = keras.models.Model(inputs=[left_branch_input, right_branch_input], outputs=[main_output])

# This is what compiling the model looks like.
# Note that to train it with fit you will need to provide two sets of inputs, x_left and x_right
model.compile(optimizer='adam', loss='mean_squared_error')

model.summary()
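
To see what the different merge options do to the data, here is a pure-Python sketch applied to two small vectors. The functions below are plain stand-ins that borrow the names of the corresponding Keras merge operations; they are illustrative, not the Keras implementations.

```python
def concatenate(a, b):
    """Join the two outputs end to end (the feature count is the sum of both)."""
    return a + b

def add(a, b):
    """Element-wise sum (both outputs must have the same length)."""
    return [x + y for x, y in zip(a, b)]

def average(a, b):
    """Element-wise mean (both outputs must have the same length)."""
    return [(x + y) / 2 for x, y in zip(a, b)]

# Hypothetical outputs of the left and right branches
left = [1.0, 2.0, 3.0]
right = [3.0, 4.0, 5.0]

merged_concat = concatenate(left, right)
merged_sum = add(left, right)
merged_avg = average(left, right)
print(merged_concat, merged_sum, merged_avg)
```

In the model above, concatenating two 32-unit branches hands 64 features to the final `Dense(10)` layer, whereas summing or averaging would hand it 32.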

## And More...

Check out the [Keras Documentation on Layers](https://keras.io/layers/about-keras-layers/) to see what your options are. Keras provides lots (and lots, and lots) of options for ways to connect nodes between layers, for transforming or normalizing data, for specifying activation functions, and so on. If you're feeling brave, [you can also write your own layers](https://keras.io/layers/writing-your-own-keras-layers/).

In the next notebook we'll take a look at Embeddings, Convolutional layers, and several flavors of Recurrent layers.