# Introduction To Keras


This tutorial introduces Keras, a high-level neural networks library written in Python and capable of running on top of either TensorFlow or Theano. It is useful for researchers who want to go from idea to result with the least possible delay.

Here are some notable features of Keras:
- Allows for easy and fast prototyping (through total **modularity, minimalism, and extensibility**)
- Supports both convolutional networks and recurrent networks, as well as combinations of the two
- Supports arbitrary connectivity schemes 
- Runs seamlessly on CPU and GPU


Let's go through these Keras features in more detail:
- **Modularity**. A model is understood as a sequence or a graph of standalone, fully-configurable modules that can be plugged together with as few restrictions as possible. In particular, neural layers, cost functions, optimizers, initialization schemes, activation functions, and regularization schemes are all standalone modules that you can combine to create new models.
- **Minimalism**. Each module should be kept short and simple. Every piece of code should be transparent upon first reading. No black magic: it hurts iteration speed and the ability to innovate.
- **Easy extensibility**. New modules are dead simple to add (as new classes and functions), and existing modules provide ample examples. Being able to easily create new modules allows for total expressiveness, making Keras suitable for advanced research.
- **Work with Python**. No separate model configuration files in a declarative format. Models are described in Python code, which is compact, easier to debug, and allows for ease of extensibility.

Keras is compatible with **Python 2.7-3.5**.


### Tutorial content

We will help you get started with Keras through an example that shows how Keras works. Once you have the general idea, you can follow the instructions to write your first Keras program, which solves a handwriting recognition problem. Along the way you should become familiar with some common techniques for optimizing your deep learning model.

1. [Installing the Libraries](#Installing-the-Libraries)
2. [Getting Started in 5 Minutes](#Getting-Started-in-5-Minutes)
3. [Create Model for Digit Recognition](#Create-Model-for-Digit-Recognition)
4. [Simple Convolutional Neural Network for MNIST](#Simple-Convolutional-Neural-Network-for-MNIST) 
5. [Further Topics](#Further-Topics)
6. [Summary](#Summary)
7. [References](#References)

If you want to know more, read the complete documentation at [Keras.io](https://keras.io/). It takes only about **30 seconds** to get started with Keras!

## Installing the libraries

Keras uses the following packages:
- keras
- numpy, scipy
- pyyaml

Install these Python packages using whichever method you are most familiar with.

## Getting Started in 5 Minutes

The core data structure of Keras is a model, a way to organize layers. The main type of model is the Sequential model, a linear stack of layers.

Here is a simple example of a `Sequential` model. No data is loaded at this point, so the training step is skipped (we will run on real data in the next part):

In [93]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

model = Sequential()

model.add(Dense(output_dim=60, input_dim=784))
model.add(Activation("relu")) 
model.add(Dense(output_dim=10))  # one unit per class; a softmax over a single unit would always output 1
model.add(Activation("softmax"))

model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

# X_train, Y_train, X_test and Y_test are assumed to have been defined earlier
try:
    model.fit(X_train, Y_train, nb_epoch=5, batch_size=1)
    #model.train_on_batch(X_batch, Y_batch)

    loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)
    classes = model.predict_classes(X_test, batch_size=32)
    #proba = model.predict_proba(X_test, batch_size=32)
except Exception:
    # Parenthesized print works on both Python 2.7 and Python 3
    print('Hi, this is only an example of Keras model and no data to run here.')
    print('You may see the results in the next parts.')


Hi, this is only an example of Keras model and no data to run here.
You may see the results in the next parts.


#### Import Packages

In [94]:
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD


#### Create a Sequential Model:

There are two types of models available in Keras: 
- **Sequential model**
- **Model class used with functional API** (more complex models, such as multi-output models, directed acyclic graphs, or models with shared layers).

The Sequential model is a linear stack of layers.
You can create a Sequential model by passing a list of layer instances to the constructor, or by starting with an empty model and adding layers one at a time, as we do here:

In [95]:
model = Sequential()

#### Stack layers with the `.add()` method:

In [96]:
model.add(Dense(output_dim=64, input_dim=100))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.add(Activation("softmax"))

#### Configure the learning process with `compile()` once your model looks good

In [97]:
model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])

Arguments for `compile`
- optimizer: optimizer object.
- loss: objective function. 
- metrics: list of metrics to be evaluated by the model during training and testing.

#### Further configure your optimizer:

In [98]:
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
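
To make the `momentum` and `nesterov` options less abstract, here is a minimal pure-Python sketch of one common formulation of the SGD-with-momentum update, applied to a toy objective. The function name `sgd_momentum_step` and the objective f(w) = w² are assumptions made for illustration; this is not Keras's internal code.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.01, momentum=0.9, nesterov=True):
    """Apply one update and return the new (w, velocity).

    Hypothetical helper for illustration only.
    """
    # The velocity accumulates a decaying sum of past gradient steps.
    velocity = momentum * velocity - lr * grad
    if nesterov:
        # Nesterov variant: look ahead along the updated velocity.
        w = w + momentum * velocity - lr * grad
    else:
        # Classical momentum: move by the velocity itself.
        w = w + velocity
    return w, velocity


# Toy objective f(w) = w**2, whose gradient is 2 * w (assumed for the demo).
w, velocity = 5.0, 0.0
for _ in range(200):
    w, velocity = sgd_momentum_step(w, velocity, grad=2.0 * w)

final_w = w  # ends up very close to the minimum at 0
```

Setting `nesterov=False` in the helper gives the classical momentum update, which also converges on this toy problem.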

#### Fit your model with training data in batches:

In [99]:
try:
    model.fit(X_train, Y_train, nb_epoch=5, batch_size=32)
except:
    # No data ready for here
    pass

Arguments for `fit`
- batch_size: integer. Number of samples per gradient update.
- nb_epoch: integer, the number of epochs to train the model.
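
The two arguments together determine how many gradient updates a call to `fit` performs. The short sketch below works through that arithmetic; the helper `updates_per_epoch` and the sample count of 60,000 are assumptions chosen for illustration, and it assumes a smaller final batch is still used when the data does not divide evenly.

```python
import math


def updates_per_epoch(num_samples, batch_size):
    """Number of mini-batches needed to cover the data once (hypothetical helper)."""
    return int(math.ceil(num_samples / float(batch_size)))


num_samples = 60000  # assumed training-set size
nb_epoch = 5
batch_size = 32

per_epoch = updates_per_epoch(num_samples, batch_size)
total_updates = per_epoch * nb_epoch
print(per_epoch, total_updates)
```

A larger `batch_size` means fewer, less noisy updates per epoch; a smaller one means more frequent but noisier updates.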

#### Alternatively, you may feed batches to your model manually:

In [100]:
try:
    model.train_on_batch(X_batch, Y_batch)
except:
    # No data ready for here
    pass
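
Feeding batches manually means slicing the data yourself. The generator below is a minimal sketch of that slicing; `iterate_minibatches` and the tiny stand-in data are hypothetical names and values, not part of Keras.

```python
def iterate_minibatches(X, Y, batch_size):
    """Yield consecutive (X_batch, Y_batch) slices; the last one may be smaller."""
    for start in range(0, len(X), batch_size):
        end = start + batch_size
        yield X[start:end], Y[start:end]


# Tiny stand-in data set: 10 samples with one feature each (illustrative only).
X_demo = [[float(i)] for i in range(10)]
Y_demo = [i % 2 for i in range(10)]

batch_sizes = [len(X_batch)
               for X_batch, Y_batch in iterate_minibatches(X_demo, Y_demo, batch_size=4)]
print(batch_sizes)
```

With real NumPy arrays, each yielded pair could be passed to `model.train_on_batch(X_batch, Y_batch)` inside the loop.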

#### Evaluate your performance:

In [101]:
try:
    loss_and_metrics = model.evaluate(X_test, Y_test, batch_size=32)
except:
    # No data ready for here
    pass

Argument for `evaluate`:

- batch_size: integer. Number of samples evaluated per batch (no gradient updates take place during evaluation).

#### Generate predictions on test data:

In [102]:
try:
    classes = model.predict_classes(X_test, batch_size=32)
    proba = model.predict_proba(X_test, batch_size=32)
except:
    # No data ready for here
    pass

Both methods generate output predictions for the input samples, processing the samples in batches.

Arguments for `predict_classes` (or `predict_proba`):
- batch_size: integer. Number of samples processed per batch.
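
The relationship between the two outputs can be sketched with NumPy: for a multi-class softmax output, the predicted class is the index of the largest probability in each row. The `proba` values below are made up for illustration and do not come from a trained model.

```python
import numpy as np

# Made-up class probabilities for two samples and three classes.
proba = np.array([[0.1, 0.7, 0.2],
                  [0.8, 0.1, 0.1]])

# The predicted class is the position of the largest probability in each row.
classes = proba.argmax(axis=-1)
print(classes)
```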


## Create Model for Digit Recognition

Now that we've installed and loaded the libraries, let's tackle our first problem, the popular **MNIST Handwritten Digit Recognition** task.

MNIST is a dataset developed by Yann LeCun, Corinna Cortes and Christopher Burges for evaluating machine learning models on the handwritten digit classification problem.

Images of digits were taken from a variety of scanned documents, normalized in size and centered. This makes it an excellent dataset for evaluating models, allowing the developer to focus on the machine learning with very little data cleaning or preparation required.

Each image is a 28 by 28 pixel square (784 pixels total). A standard split of the dataset is used to evaluate and compare models, where 60,000 images are used to train a model and a separate set of 10,000 images is used to test it.

Keras ships this as a dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.

<img src="http://3qeqpr26caki16dnhd19sv6by6v.wpengine.netdna-cdn.com/wp-content/uploads/2016/05/Examples-from-the-MNIST-dataset.png" width=700>


### Baseline Model with Multi-Layer Perceptrons

We can get very good results using a very simple neural network model with a single hidden layer. 

In this section we will create a simple multi-layer perceptron model that achieves an error rate of 1.74%. We will use this as a baseline for comparing more complex convolutional neural network models.

First, import all the packages we need:

In [103]:
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.utils import np_utils

To ensure that the results of our script are reproducible, we use a constant random seed.

In [104]:
# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

#### Loading the MNIST dataset in Keras

In [105]:
from keras.datasets import mnist

(X_train, y_train), (X_test, y_test) = mnist.load_data()

The function `mnist.load_data()` returns 2 tuples containing:
- X_train, X_test: uint8 array of grayscale image data with shape (nb_samples, 28, 28).
- y_train, y_test: uint8 array of digit labels (integers in range 0-9) with shape (nb_samples,).

#### Flatten the image pixels into a vector

The training dataset is structured as a **3-dimensional** array of instance, image width and image height. For a multi-layer perceptron model we must **reduce the images down into a vector of pixels**. In this case the 28×28 sized images will be 784 pixel input values.

We can do this transform easily using the `reshape()` function on the NumPy array. We can also reduce our memory requirements by forcing the precision of the pixel values to be 32-bit floating point (rather than NumPy's 64-bit default for floats), which is the default precision used by Keras anyway.

In [106]:
# flatten 28*28 images to a 784 vector for each image
num_pixels = X_train.shape[1] * X_train.shape[2]
X_train = X_train.reshape(X_train.shape[0], num_pixels).astype('float32')
X_test = X_test.reshape(X_test.shape[0], num_pixels).astype('float32')
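
To see what this reshape does without loading MNIST, here is a small sketch on made-up data: three fake 2x2 "images" stand in for the 28x28 digits. The array `images` and its contents are assumptions for illustration only.

```python
import numpy as np

# Three fake 2x2 "images" with pixel values 0..11 (illustrative stand-in data).
images = np.arange(12, dtype='uint8').reshape(3, 2, 2)

# Same recipe as for MNIST: rows * columns values per image.
pixels_per_image = images.shape[1] * images.shape[2]
flat = images.reshape(images.shape[0], pixels_per_image).astype('float32')

print(images.shape, '->', flat.shape)
print(flat[1])  # the second image, row by row
```

Each image keeps its own row in the result, and its pixels are laid out row by row within that row.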

#### Normalize grey value for each pixel

The pixel values are grayscale intensities between 0 and 255. It is almost always a good idea to perform some scaling of input values when using neural network models. We can very quickly normalize the pixel values to the range 0 to 1 by dividing each value by the maximum of 255.

In [107]:
# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255

#### One hot encoding of multi-class values

Finally, the output variable is an integer from 0 to 9. This is a multi-class classification problem. As such, it is good practice to use a **one hot encoding** of the class values, transforming the vector of class integers into a binary matrix.

We can easily do this using the built-in np_utils.to_categorical() helper function in Keras.

In [108]:
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
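
For readers curious about what a one hot encoding looks like, the following NumPy sketch builds one by hand. The helper `one_hot` is a hypothetical stand-in written for illustration; it is not the Keras implementation.

```python
import numpy as np


def one_hot(labels, num_classes=None):
    """Turn integer class labels into a binary matrix (hypothetical helper)."""
    labels = np.asarray(labels, dtype='int64')
    if num_classes is None:
        # Assume classes are numbered 0..max(label).
        num_classes = int(labels.max()) + 1
    encoded = np.zeros((labels.shape[0], num_classes), dtype='float32')
    # Put a 1 in the column that matches each row's label.
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded


encoded = one_hot([0, 2, 1, 2])
print(encoded)
```

Each row contains exactly one 1, in the column of that sample's class, so the number of columns equals the number of classes.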

#### Create Baseline Model

In [None]:
# define baseline model
def baseline_model():
    # create model
    model = Sequential()
    model.add(Dense(num_pixels, input_dim=num_pixels, init='normal', activation='relu'))
    model.add(Dense(num_classes, init='normal', activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

The model is a simple neural network with one hidden layer with the same number of neurons as there are inputs (784). A rectifier activation function is used for the neurons in the hidden layer.
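
As a rough sanity check on the size of this network, the sketch below counts its trainable parameters, assuming each fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` is a hypothetical name introduced for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer (hypothetical helper)."""
    return n_inputs * n_units + n_units


n_pixels = 28 * 28   # 784 inputs per image
n_classes = 10       # digits 0-9

hidden_layer = dense_params(n_pixels, n_pixels)   # 784 inputs -> 784 hidden units
output_layer = dense_params(n_pixels, n_classes)  # 784 hidden units -> 10 outputs
total = hidden_layer + output_layer
print(hidden_layer, output_layer, total)
```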

A softmax activation function is used on the output layer to turn the outputs into probability-like values and allow one class of the 10 to be selected as the model’s output prediction. 

Logarithmic loss is used as the loss function (called categorical_crossentropy in Keras) and the efficient ADAM gradient descent algorithm is used to learn the weights.
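
The softmax and logarithmic loss described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions: the function names are chosen here for readability, the input scores are made up, and details such as numerical clipping of probabilities are omitted.

```python
import math


def softmax(scores):
    """Turn raw scores into probability-like values that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_target, probs):
    """Logarithmic loss for a single sample with a one hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probs))


probs = softmax([2.0, 1.0, 0.1])                    # made-up output scores
loss = categorical_crossentropy([1, 0, 0], probs)   # the true class is class 0
print(probs, loss)
```

Because the target is one hot encoded, the loss reduces to the negative log of the probability assigned to the true class, so confident correct predictions give a loss near zero.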

#### Fit Model

We can now fit and evaluate the model. The model is fit over 10 epochs with updates every 200 images. The test data is used as the validation dataset, allowing you to see the skill of the model as it trains. A verbose value of 2 is used to reduce the output to one line for each training epoch.

Finally, the test dataset is used to evaluate the model and a classification error rate is printed.

In [None]:
# build the model
model = baseline_model()
# Fit the model 
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("Baseline Error: %.2f%%" % (100-scores[1]*100))

Train on 60000 samples, validate on 10000 samples
Epoch 1/10
4s - loss: 0.2791 - acc: 0.9203 - val_loss: 0.1421 - val_acc: 0.9577
Epoch 2/10
5s - loss: 0.1122 - acc: 0.9678 - val_loss: 0.0996 - val_acc: 0.9696
Epoch 3/10
6s - loss: 0.0724 - acc: 0.9790 - val_loss: 0.0784 - val_acc: 0.9749
Epoch 4/10


Running the example might take a few minutes on a CPU (you may reduce `nb_epoch` for a shorter run). You should see output like that shown above. This very simple network, defined in very few lines of code, achieves a respectable error rate of 1.74%.
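
The printed error rate is simply 100 minus the accuracy expressed as a percentage. The sketch below reproduces that arithmetic; the accuracy of 0.9826 is an assumption derived from the 1.74% error rate quoted above, and `error_rate_percent` is a hypothetical helper name.

```python
def error_rate_percent(accuracy):
    """Convert an accuracy in [0, 1] to an error rate in percent."""
    return 100 - accuracy * 100


accuracy = 0.9826      # assumed value, implied by a 1.74% error rate
test_images = 10000    # size of the MNIST test set

message = "Baseline Error: %.2f%%" % error_rate_percent(accuracy)
misclassified = int(round((1 - accuracy) * test_images))
print(message)
print(misclassified, "of", test_images, "test images misclassified")
```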

## Simple Convolutional Neural Network for MNIST

Now we try a relatively advanced model: a Convolutional Neural Network for MNIST.

If you are not familiar with Convolutional Neural Networks (CNNs), here is a general introduction: [CNN Introduction](http://cs231n.github.io/convolutional-networks/#overview)

Here is an example of a CNN (it is not the network used in our code or for our problem):

<img src="https://www.ais.uni-bonn.de/deep_learning/images/Convolutional_NN.jpg" width=700>

#### Data Preprocess

In [None]:
import numpy
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers.convolutional import Convolution2D
from keras.layers.convolutional import MaxPooling2D
from keras.utils import np_utils
from keras import backend as K
K.set_image_dim_ordering('th')


# fix random seed for reproducibility
seed = 7
numpy.random.seed(seed)

# load data
(X_train, y_train), (X_test, y_test) = mnist.load_data()
# reshape to be [samples][pixels][width][height] (here "pixels" is the number of color channels: 1 for grayscale)
X_train = X_train.reshape(X_train.shape[0], 1, 28, 28).astype('float32')
X_test = X_test.reshape(X_test.shape[0], 1, 28, 28).astype('float32')

# normalize inputs from 0-255 to 0-1
X_train = X_train / 255
X_test = X_test / 255
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]

#### Create CNN Model

In [None]:
def CNN_model():
    # create model
    model = Sequential()
    model.add(Convolution2D(32, 5, 5, border_mode='valid', input_shape=(1, 28, 28), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.2))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dense(num_classes, activation='softmax'))
    # Compile model
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

Convolutional neural networks are more complex than standard multi-layer perceptrons, so we start with a simple structure that still uses all of the elements needed for state-of-the-art results. The network architecture is summarized below.

1. The first hidden layer is a convolutional layer called Convolution2D. The layer has 32 feature maps, each produced by a 5×5 filter, and a rectifier activation function. This is the input layer, expecting images with the structure outlined above: [pixels][width][height], where the first dimension is the number of color channels (1 for grayscale).

2. Next we define a pooling layer, called MaxPooling2D, that takes the maximum value. It is configured with a pool size of 2×2.

3. The next layer is a regularization layer using dropout, called Dropout. It is configured to randomly exclude 20% of neurons in the layer in order to reduce overfitting.

4. Next is a layer, called Flatten, that converts the 2D matrix data to a vector. It allows the output to be processed by standard fully connected layers.

5. Next is a fully connected layer with 128 neurons and a rectifier activation function.

6. Finally, the output layer has 10 neurons for the 10 classes and a softmax activation function to output probability-like predictions for each class.

As before, the model is trained using logarithmic loss and the ADAM gradient descent algorithm.
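
The architecture above can be checked with some quick arithmetic on the layer output sizes and parameter counts. The sketch assumes a stride of 1 with no padding for the convolution and non-overlapping 2×2 pooling windows; the helper names are hypothetical and chosen for this example.

```python
def conv_output_size(input_size, kernel_size):
    """Width/height after a convolution with stride 1 and no padding."""
    return input_size - kernel_size + 1


def pool_output_size(input_size, pool_size):
    """Width/height after non-overlapping max pooling."""
    return input_size // pool_size


conv_side = conv_output_size(28, 5)          # 28x28 input, 5x5 filters
pool_side = pool_output_size(conv_side, 2)   # 2x2 pooling
flat_size = 32 * pool_side * pool_side       # 32 feature maps, flattened

conv_params = 32 * (5 * 5 * 1 + 1)           # 5x5 weights over 1 channel, plus a bias, per filter
dense_params = flat_size * 128 + 128         # fully connected layer with 128 neurons
output_params = 128 * 10 + 10                # output layer with 10 neurons
total_params = conv_params + dense_params + output_params

print(conv_side, pool_side, flat_size)
print(conv_params, dense_params, output_params, total_params)
```

Dropout and Flatten add no parameters, and almost all of the weights sit in the first fully connected layer.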

#### Run Model

We evaluate the model the same way as before with the multi-layer perceptron. The CNN is fit over 10 epochs with a batch size of 200.

In [None]:
# build the model
model = CNN_model()
# Fit the model
model.fit(X_train, y_train, validation_data=(X_test, y_test), nb_epoch=10, batch_size=200, verbose=2)
# Final evaluation of the model
scores = model.evaluate(X_test, y_test, verbose=0)
print("CNN Error: %.2f%%" % (100-scores[1]*100))

#### Result

When the example runs, the accuracy on the training and validation sets is printed each epoch, and the classification error rate is printed at the end.

Epochs may take 60 to 90 seconds each to run on the CPU, or about 15 minutes in total, depending on your hardware. You can see that the network achieves an error rate of 1.10%, which is better than our simple multi-layer perceptron model above.

## Further Topics

- [How to Build Complex Models](https://keras.io/getting-started/functional-api-guide/)
- [How to Save Keras Model You Trained](https://keras.io/getting-started/faq/#how-can-i-save-a-keras-model)
- [How to Run It on GPU](https://keras.io/getting-started/faq/#how-can-i-run-keras-on-gpu)

## Summary

**Keras** is a very popular and useful Python package for deep learning, especially for research. You will likely come to love it for how efficient it makes development. In addition, if you have access to a GPU, you can run your code on it with simple configuration and commands.

The Keras documentation is very clear and easy to read. This tutorial is just a simple starting point, so take a closer look at the original documentation if you are interested.

Good luck!

## References
1. Keras Documentation: https://keras.io/
2. Handwritten Digit Recognition using Convolutional Neural Networks in Python with Keras: http://machinelearningmastery.com/handwritten-digit-recognition-using-convolutional-neural-networks-python-keras/
3. Convolutional Neural Networks: http://cs231n.github.io/convolutional-networks/#overview