# Introduction

This tutorial introduces Keras, a high-level neural networks API written in Python and developed with a focus on enabling fast experimentation. In the developers' own words:

_"Being able to go from idea to result with the least possible delay is key to doing good research."_

Keras is a library that allows fast prototyping of both convolutional and recurrent networks through user friendliness, modularity and extensibility. As we walk through the basics of Keras and the Sequential model, aided by code examples on the IMDB dataset (conveniently included in Keras), you will see these perks in action.





## Tutorial Content

- [Installing Keras and its dependencies](#Installing-Keras-and-its-dependencies)
- [Key features of Keras](#Key-features-of-Keras)
- [Basics](#Basics)
    - [Neural Networks](#Neural-Networks)
    - [Models](#Models)
    - [Layers](#Layers)
- [Keras Sequential Model](#Keras-Sequential-Model)
    - [Initializing models, layers and datasets](#Initializing-models,-layers-and-datasets)
    - [Visualizing models and layers](#Visualizing-models-and-layers)
    - [Parameters: Input Shape](#Parameters:-Input-Shape)
    - [Model 1](#Model-1)
    - [Model 2](#Model-2)
    - [Model 3](#Model-3)
- [Conclusion](#Conclusion)
- [Summary and References](#Summary-and-References)


## Installing Keras and its dependencies

Installing [Keras](https://keras.io) is as easy as typing a couple of commands into your shell. Keras can work with any one of [TensorFlow](https://tensorflow.org), [CNTK](https://github.com/microsoft/CNTK) or [Theano](http://deeplearning.net/software/theano/) as its backend. In this tutorial we will be using TensorFlow as the backend for Keras. Both of these also require [NumPy](https://numpy.org/), which you should already be well acquainted with.
[Pip](https://pip.pypa.io/en/stable/) accepts a space-separated list of packages to install, so everything we need for this tutorial can be installed by running this next block right in Jupyter (only uncomment and run it if you need to install Keras for the first time or have a corrupted installation of it).

In [20]:
import sys
# !{sys.executable} -m pip install --user --upgrade --force-reinstall keras

Or you can run this in the same shell you launch jupyter from:

$ pip3 install numpy tensorflow keras

We need to install TensorFlow here because Keras builds on top of it, essentially serving as a wrapper that makes TensorFlow more accessible and extensible.

After the installation finishes, make sure the following commands work for you and you are able to import all the necessities.

In [1]:
import warnings
warnings.filterwarnings('ignore') #to hide annoying future warnings from tensorflow/numpy

import numpy
import json
from keras.datasets import imdb
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.layers import Embedding, Conv1D, GlobalMaxPooling1D
from keras.layers import Bidirectional, LSTM

from IPython.display import Image
from IPython.core.display import HTML
from keras.preprocessing import sequence

Using TensorFlow backend.


## Key features of Keras

Keras sells itself on four key features. For data scientists, these features are especially useful when they want to analyze data through the lens of a machine learning model (like a neural net).

__User friendliness:__ Keras follows the best practices for reducing cognitive load on the user with consistent and simple APIs and minimizes the number of actions required for common use cases.

__Modularity:__ A model in Keras is understood as a sequence of configurable modules. These can be plugged in together with as few restrictions as possible.

__Easy Extensibility:__ New modules are simple to add yourself which makes it ideal for advanced research.

__Python:__ The models in Keras are described in compact Python code which is easy to debug and extend upon.

Throughout this tutorial we will be highlighting these key features.

## Basics

### Neural Networks

A neural network is a set of algorithms tied together in "layers." A model of a neural network might often look like something in the image below. Each node in a layer can take as input any linear combination of the values of the nodes in the layer before it.

Keras takes this general idea and makes it easier to understand and visualize by making each model consist of explicit layers with explicit algorithms assigned to them.

In [2]:
Image(url= "https://i.stack.imgur.com/Kc50L.jpg")
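
To make the "linear combination" idea concrete, here is a minimal pure-Python sketch of what a single fully connected layer computes. The function name `dense_forward`, the weight layout and the toy numbers are illustrative assumptions; this is not how Keras implements layers internally.

```python
def relu(x):
    """Rectified linear unit: keep positive values, clamp negatives to 0."""
    return max(0.0, x)


def dense_forward(inputs, weights, biases, activation=relu):
    """Compute the output of one fully connected layer.

    weights[j][i] is the weight from input i to node j (hypothetical layout).
    """
    outputs = []
    for node_weights, bias in zip(weights, biases):
        # Each node takes a weighted sum (linear combination) of all inputs...
        z = sum(w * x for w, x in zip(node_weights, inputs)) + bias
        # ...and passes it through a nonlinearity.
        outputs.append(activation(z))
    return outputs


# Toy example: 3 input values feeding a layer with 2 nodes.
layer_out = dense_forward(
    inputs=[1.0, 2.0, 3.0],
    weights=[[0.5, -1.0, 0.5], [1.0, 1.0, -1.0]],
    biases=[0.0, 0.5],
)
print(layer_out)  # [0.0, 0.5]
```

Stacking several such layers, each feeding its outputs to the next, gives the kind of network shown in the image.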

### Models

There are two main types of models in Keras - the Sequential model and the Model class used with the functional API.

The common attributes and methods of a model are:
- __model.layers__ is a flattened list of the layers that comprise the model
- __model.inputs__ is a list of input tensors to the model
- __model.outputs__ is a list of output tensors
- __model.summary()__ prints a summary of the model

We will see examples of these once we understand what layers are.

### Layers

A layer in Keras corresponds to the common conception of a layer in a neural network.

It has the following methods:
- __layer.get_weights()__ gets the weights of the layer as a list of numpy ndarrays
- __layer.set_weights()__ sets the weights of the layer from a list of numpy ndarrays
- __layer.get_config()__ returns a dictionary with the configuration of the layer

We now proceed to see examples of these functions and how to set up a model in Keras.

## Keras Sequential Model

### Initializing models, layers and datasets

We can import models and layers from keras as follows.

In [3]:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation

We will be using the IMDB dataset that comes packaged with Keras. This is a dataset of 25,000 movie reviews from IMDB labeled by sentiment (positive or negative). The reviews are preprocessed and encoded as sequences of word indexes. Words are indexed by overall frequency in the dataset, which allows for quick filtering operations like "only consider the top 5,000 most frequent words but ignore the top 20 most common." A query like this makes sense because, intuitively, words outside this range probably don't affect the sentiment. The most common words are mostly connectors and prepositions, whereas words that aren't frequent at all are probably nouns or other non-indicators of sentiment.

This is an interesting dataset to demonstrate Keras' power with, since this would seem like something complicated to do from scratch.

In [4]:
from keras.datasets import imdb
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=5000, skip_top=20)

The x's here are ndarrays of samples. Each sample is a list of word indexes, where each index is the word's frequency rank, telling us which words were used in each review. The y's here are ndarrays of labels, each label being 0 (negative review) or 1 (positive review).
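
The effect of `num_words` and `skip_top` can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras loader itself: the function name `filter_by_frequency_rank` and the toy reviews are made up, and we assume out-of-range words are replaced by a reserved out-of-vocabulary index (`oov_index=2` mirrors what we understand to be the Keras default). The real loader also reserves indexes for padding and for the start of a sequence.

```python
def filter_by_frequency_rank(sequences, num_words, skip_top, oov_index=2):
    """Keep indexes in [skip_top, num_words); replace the rest with oov_index."""
    return [
        [w if skip_top <= w < num_words else oov_index for w in seq]
        for seq in sequences
    ]


# Hypothetical encoded reviews: a smaller index means a more frequent word.
toy_reviews = [[1, 14, 22, 4999, 5000], [7, 19, 20, 8000]]
filtered = filter_by_frequency_rank(toy_reviews, num_words=5000, skip_top=20)
print(filtered)  # [[2, 2, 22, 4999, 2], [2, 2, 20, 2]]
```

Note that the reviews keep their original length: filtered words are replaced rather than dropped, so word positions are preserved.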

We can initialize a Sequential model in Keras as follows.

In [5]:
model = Sequential()
print("model : ", model)

model :  <keras.engine.sequential.Sequential object at 0x7f5a10244a58>


The model is a Sequential object. We can then interact with this object by adding layers to it as such.
A dense layer can be initialized with a single argument - the number of nodes you want the layer to have.

In [6]:
num_nodes = 50
layer1 = Dense(num_nodes)

We can now add this layer to the model with the add() method. If we take a look at the model's layers now we'll see that the Dense layer has been added.

In [7]:
model.add(layer1)
print("model.layers : ", model.layers)

model.layers :  [<keras.layers.core.Dense object at 0x7f5a10244d30>]


Now we can initialize and add a relu activation layer on top of the Dense layer. 

In [8]:
layer2 = Activation('relu')
model.add(layer2)
print("model.layers : ", model.layers)

model.layers :  [<keras.layers.core.Dense object at 0x7f5a10244d30>, <keras.layers.core.Activation object at 0x7f5a0755c470>]


Keras makes this even easier by letting you initialize a model directly with a list of layers:

In [9]:
model = Sequential([
    Dense(32, input_shape=(400,)),
    Activation('relu'),
    Dense(10),
    Activation('softmax')
])
print("model.layers : ", model.layers)

model.layers :  [<keras.layers.core.Dense object at 0x7f5a0755c780>, <keras.layers.core.Activation object at 0x7f5a0755c908>, <keras.layers.core.Dense object at 0x7f5a0755c940>, <keras.layers.core.Activation object at 0x7f5a0755cac8>]


In most common practical scenarios, you want to follow dense layers with activation layers like we did above since the default linear activation isn't very useful.
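
Why is a purely linear activation not very useful? A short pure-Python sketch shows that two stacked linear layers collapse into a single one, so the extra layer adds no expressive power. The helper names and the toy weight matrices are illustrative assumptions, and biases are left out to keep the example small.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


# Two hypothetical bias-free dense layers with linear activation.
w1 = [[1.0, 2.0], [0.0, -1.0]]
w2 = [[3.0, 1.0], [2.0, 2.0]]
x = [1.0, -2.0]

# Applying the layers one after the other...
stacked = matvec(w2, matvec(w1, x))
# ...gives the same result as one layer whose weights are the product w2 * w1.
collapsed = matvec(matmul(w2, w1), x)
print(stacked, collapsed)  # [-7.0, -2.0] [-7.0, -2.0]

# With a relu in between, the result is no longer that of a single linear map.
relu_between = matvec(w2, [max(0.0, v) for v in matvec(w1, x)])
print(relu_between)  # [2.0, 4.0]
```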

### Visualizing models and layers

The get_config() method gives us a good insight into what defines a layer and what can be modified in a layer.

In [10]:
print(json.dumps(layer1.get_config(), indent=4, sort_keys=True)) #json dumps to pretty print

{
    "activation": "linear",
    "activity_regularizer": null,
    "bias_constraint": null,
    "bias_initializer": {
        "class_name": "Zeros",
        "config": {}
    },
    "bias_regularizer": null,
    "dtype": "float32",
    "kernel_constraint": null,
    "kernel_initializer": {
        "class_name": "VarianceScaling",
        "config": {
            "distribution": "uniform",
            "mode": "fan_avg",
            "scale": 1.0,
            "seed": null
        }
    },
    "kernel_regularizer": null,
    "name": "dense_1",
    "trainable": true,
    "units": 50,
    "use_bias": true
}


A good way of visualizing a whole model is the summary() method, which prints each layer with its output shape and parameter count.

In [11]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 32)                12832     
_________________________________________________________________
activation_2 (Activation)    (None, 32)                0         
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330       
_________________________________________________________________
activation_3 (Activation)    (None, 10)                0         
Total params: 13,162
Trainable params: 13,162
Non-trainable params: 0
_________________________________________________________________
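
The "Param #" column can be checked by hand. A dense layer has one weight per (input, node) pair plus one bias per node. The helper below is a small sketch with a made-up name, `dense_params`, that reproduces the numbers in the summary above.

```python
def dense_params(n_inputs, n_units, use_bias=True):
    """Trainable parameters in a fully connected layer."""
    return n_inputs * n_units + (n_units if use_bias else 0)


first = dense_params(400, 32)   # 400 inputs -> 32 nodes
second = dense_params(32, 10)   # 32 inputs -> 10 nodes
print(first, second, first + second)  # 12832 330 13162
```

Activation layers have no weights of their own, which is why they show 0 parameters.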


### Parameters: Input Shape

Before we can start using a model on our data, we need to instantiate a new one, because one key thing has to match the data: the input shape parameter. When creating a model in Keras, specifying the input shape on the first layer suffices, since Keras infers the shapes of the remaining layers implicitly. Let's see what the input shape is for our IMDB data.

In [12]:
x_train = sequence.pad_sequences(x_train, maxlen=400) #pad or truncate each sample to exactly 400 entries
x_test = sequence.pad_sequences(x_test, maxlen=400) #pad or truncate each sample to exactly 400 entries
print('x_train shape:', x_train.shape)
print('x_test shape:', x_test.shape)
print('y_train shape:', y_train.shape)
print('y_test shape:', y_test.shape)

x_train shape: (25000, 400)
x_test shape: (25000, 400)
y_train shape: (25000,)
y_test shape: (25000,)


After padding the sequences, we see the sample shapes are (25000, 400), meaning 25000 samples with 400 features in each. Now we can use this information to initialize a new model that is actually capable of running on this data.
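
What padding does to a single review can be sketched in plain Python. The function `pad_or_truncate` is a hypothetical stand-in, written to mirror what we understand to be the defaults of `pad_sequences`: zeros are added at the front of short sequences, and long sequences lose their earliest entries.

```python
def pad_or_truncate(seq, maxlen, value=0):
    """Left-pad short sequences; keep only the last `maxlen` items of long ones."""
    if len(seq) >= maxlen:
        return list(seq[-maxlen:])
    return [value] * (maxlen - len(seq)) + list(seq)


short = pad_or_truncate([5, 6, 7], maxlen=5)
long_ = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], maxlen=5)
print(short)  # [0, 0, 5, 6, 7]
print(long_)  # [3, 4, 5, 6, 7]
```

Because every review ends up with the same length, the whole dataset fits into one rectangular array, which is what the (25000, 400) shape reflects.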

### Model 1

Let's start by making a simple sequential model. Just add a couple of relu activated layers and squash the result into a sigmoid. We can use the summary() method to get a very nice and clean outlook on all the layers in our model.

In [13]:
model_1 = Sequential()
model_1.add(Dense(200, activation = "relu", input_shape=(400,)))
model_1.add(Dense(200, activation = "relu"))
model_1.add(Dense(200, activation = "relu"))
model_1.add(Dense(1, activation = "sigmoid"))
model_1.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_4 (Dense)              (None, 200)               80200     
_________________________________________________________________
dense_5 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_6 (Dense)              (None, 200)               40200     
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 201       
Total params: 160,801
Trainable params: 160,801
Non-trainable params: 0
_________________________________________________________________


Compile and fit this model with the RMSprop (root mean square propagation) optimizer, "rmsprop".

In [14]:
model_1.compile(
 optimizer = "rmsprop",
 loss = "binary_crossentropy",
 metrics = ["accuracy"]
)

model_1.fit(
 x_train, y_train,
 epochs= 2,
 batch_size = 50,
 validation_data=(x_test, y_test)
)

Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where

Train on 25000 samples, validate on 25000 samples
Epoch 1/2
Epoch 2/2


<keras.callbacks.callbacks.History at 0x7f5a02b621d0>
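
The `binary_crossentropy` loss used in the compile call above has a simple definition that can be sketched in plain Python. This is an illustrative version, not the Keras implementation: the clipping constant `eps` and the toy predictions are assumptions.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1]
confident = binary_crossentropy(labels, [0.9, 0.1, 0.8])
coin_flip = binary_crossentropy(labels, [0.5, 0.5, 0.5])
print(confident < coin_flip)  # True
print(sigmoid(0.0))           # 0.5
```

A model that always predicts 0.5 has a loss of ln 2 (about 0.693), so a loss that stays near that value goes hand in hand with accuracy near 50%.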

With 51% accuracy on the test data, that was a pretty bad result. Let's try it with a different popular optimizer, adam.

In [15]:
model_1.compile(
 optimizer = "adam",
 loss = "binary_crossentropy",
 metrics = ["accuracy"]
)

model_1.fit(
 x_train, y_train,
 epochs= 2,
 batch_size = 50,
 validation_data=(x_test, y_test)
)

Train on 25000 samples, validate on 25000 samples
Epoch 1/2
Epoch 2/2


<keras.callbacks.callbacks.History at 0x7f5a02c8bba8>

The output accuracy is still barely better than random guessing, at a little over 50%. At this point we should realize that our model was bad to begin with and that we need to redo it. This would normally be a fairly disheartening realization after spending hours writing a model. With Keras, it took us all of 10 minutes to make this one, so we have no hesitation in dropping it completely and starting from scratch.

### Model 2

Some preliminary research has led us to the intuition that a bidirectional LSTM might be good for predicting on this kind of dataset. We're not sure, however, and don't want to spend a lot of time writing such a complicated algorithm. With Keras, constructing this model is again just a matter of 5 lines of code. We saw that the training accuracy was decent last time but the generalization was poor, so let's throw in a dropout layer too.

In [16]:
model_2 = Sequential()
model_2.add(Embedding(5000, #max features (vocabulary size, matches num_words)
                      64, #size of the vector learned for each word
                      input_length=400)) #number of features in each input sample
model_2.add(Dropout(0.5))
model_2.add(Bidirectional(LSTM(64)))
model_2.add(Dense(1, activation='sigmoid'))
model_2.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_1 (Embedding)      (None, 400, 64)           320000    
_________________________________________________________________
dropout_1 (Dropout)          (None, 400, 64)           0         
_________________________________________________________________
bidirectional_1 (Bidirection (None, 128)               66048     
_________________________________________________________________
dense_8 (Dense)              (None, 1)                 129       
Total params: 386,177
Trainable params: 386,177
Non-trainable params: 0
_________________________________________________________________
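
The parameter counts in this summary can also be reproduced by hand. The helpers below are a sketch with made-up names, assuming the standard LSTM formulation with four gates, each having input weights, recurrent weights and a bias.

```python
def embedding_params(vocab_size, embed_dim):
    """One learned vector of length embed_dim per word in the vocabulary."""
    return vocab_size * embed_dim


def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(n_inputs, n_units):
    """One weight per (input, node) pair plus one bias per node."""
    return n_inputs * n_units + n_units


emb = embedding_params(5000, 64)
bi_lstm = 2 * lstm_params(64, 64)   # forward and backward LSTM
out = dense_params(2 * 64, 1)       # the two directions are concatenated
print(emb, bi_lstm, out, emb + bi_lstm + out)  # 320000 66048 129 386177
```

The bidirectional wrapper runs one LSTM forwards and one backwards over the review, which is why the output has 128 values and the parameter count is double that of a single LSTM.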


Again, let's compile this model and test its fit.

In [17]:
model_2.compile(optimizer='adam', 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

model_2.fit(x_train, numpy.array(y_train),
          batch_size=32,
          epochs=2,
          validation_data=(x_test, numpy.array(y_test)))

Train on 25000 samples, validate on 25000 samples
Epoch 1/2
Epoch 2/2


<keras.callbacks.callbacks.History at 0x7f5a0755c3c8>

We got a much better accuracy this time at 85% (which varies a little every run) but we want to do even better. Again, instead of fine tuning these parameters, let's try a completely different model because we can!

### Model 3

By now you get the idea. We can add even more layers with more nodes and combine them in more complex ways with barely any additional effort. To top it all off, the code stays self-explanatory because of the descriptive naming style Keras uses: we can clearly see which layers the model consists of, what the input is and what shape each layer outputs. Since this is our final attempt, let's get serious and use a convolutional layer with max pooling (again with an embedding layer and a couple of dense relu-activated layers).

In [18]:
model_3 = Sequential()

# Input Layers
model_3.add(Embedding(5000,               # max features (vocabulary size)
                      50,                 # size of the vector learned for each word
                      input_length=400))  # equal to maxlen

# convolutional 1 dimensional layer
model_3.add(Conv1D(250,                   # number of filters in conv layer
                   3,                     # kernel size
                   padding='valid',
                   activation='relu'))
model_3.add(GlobalMaxPooling1D())         # use max pooling for the convolutional layer

#HIDDEN LAYER
model_3.add(Dense(200))                   # number of nodes in hidden layer
model_3.add(Dropout(0.2))                 # dropout to avoid overfitting
model_3.add(Activation('relu'))           # simple relu activation

model_3.add(Dense(200))
model_3.add(Dropout(0.2))
model_3.add(Activation('relu'))

#OUTPUT LAYER
model_3.add(Dense(1))                     # sigmoid activated layer to squash the output
model_3.add(Activation('sigmoid'))

model_3.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
embedding_2 (Embedding)      (None, 400, 50)           250000    
_________________________________________________________________
conv1d_1 (Conv1D)            (None, 398, 250)          37750     
_________________________________________________________________
global_max_pooling1d_1 (Glob (None, 250)               0         
_________________________________________________________________
dense_9 (Dense)              (None, 200)               50200     
_________________________________________________________________
dropout_2 (Dropout)          (None, 200)               0         
_________________________________________________________________
activation_4 (Activation)    (None, 200)               0         
_________________________________________________________________
dense_10 (Dense)             (None, 200)              
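
The Conv1D row in the summary above can be checked by hand as well. The sketch below uses made-up helper names and assumes 'valid' padding (no padding at the sequence edges) and a stride of 1, as in the model definition.

```python
def conv1d_output_length(input_length, kernel_size, stride=1):
    """Number of positions a kernel can take with 'valid' padding."""
    return (input_length - kernel_size) // stride + 1


def conv1d_params(in_channels, filters, kernel_size):
    """Each filter has kernel_size * in_channels weights plus one bias."""
    return kernel_size * in_channels * filters + filters


steps = conv1d_output_length(400, 3)
params = conv1d_params(in_channels=50, filters=250, kernel_size=3)
print(steps, params)  # 398 37750
```

GlobalMaxPooling1D then keeps only the largest value of each filter across those 398 positions, which is how the shape shrinks from (398, 250) to (250).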

Now that we have a good idea of what the model and its layers look like, let's check its performance with the adam optimizer and binary cross-entropy as the loss function.

In [19]:
model_3.compile(loss='binary_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

model_3.fit(x_train, y_train,
          batch_size=32,
          epochs=2,
          validation_data=(x_test, y_test))

Train on 25000 samples, validate on 25000 samples
Epoch 1/2
Epoch 2/2


<keras.callbacks.callbacks.History at 0x7f5a07230828>

After just 2 epochs and under 100 seconds of runtime (on a low-end CPU) we have 88.92% accuracy. What's even more astonishing than how quickly this model got good is how easy it was to write.

## Conclusion

We saw three different models operate on the IMDB dataset. The output varied significantly in accuracy, and we used completely different kinds of layers in the three models. When a model didn't work, we had no problem switching to a new one. Making these three entirely different models using even plain TensorFlow would have taken significantly longer and required a lot more code. We probably would never even have tried the third model and would have just kept adjusting the second model's parameters, because who would want to spend hours writing code just to try a different approach that might or might not be slightly better?
Keras made the process of experimenting with different models significantly faster and more fun. This enables data scientists like us to spend our time thinking about the high-level ideas of which models to use with what types of layers, instead of debugging implementations that others have already written hundreds of times. With Keras we can very quickly test the hypothetical models that we imagine and quickly improve upon them. Quoting again what was said at the start of this tutorial:

_"Being able to go from idea to result with the least possible delay is key to doing good research."_

This is the power of Keras. 

## Summary and References

If this tutorial hasn't convinced you to use Keras, let the developers make the case themselves:

https://keras.io/why-use-keras/

For a more complete guide on the sequential model, go here.

https://keras.io/getting-started/sequential-model-guide/

To define more complex models without having to extend the library, Keras has the functional API. It works very similarly to the Sequential model we covered here but removes the constraint of being sequential.

https://keras.io/getting-started/functional-api-guide/

Keras supports a huge assortment of layers that implement commonly used machine learning techniques.
To learn more about Keras's layers, go here.

https://keras.io/layers/about-keras-layers/

However, the single best resource is learning by doing. Here are dozens of simple Keras-based applications with open source code that you can modify and play with.

https://github.com/keras-team/keras/tree/master/examples







