In [3]:
from keras.models import Sequential
from keras.layers import Dense
import tensorflow as tf

Sequential?

# Getting started with the Keras Sequential model

The Sequential model is a linear stack of layers.

You can create a Sequential model by passing a list of layer instances to the constructor:

In [4]:
model = Sequential([
    Dense(32, activation='relu', input_shape=(784,)),
    Dense(10, activation=tf.nn.softmax),
])

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_4 (Dense)              (None, 10)                330       
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________
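The parameter counts in the summary can be checked by hand: a fully connected layer has one weight per (input, unit) pair plus one bias per unit. The sketch below is plain Python, and the helper name `dense_param_count` is made up for illustration; it is not part of Keras.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Number of trainable parameters in a fully connected layer."""
    weights = input_dim * units        # one weight per (input, unit) pair
    biases = units if use_bias else 0  # one bias per unit
    return weights + biases

hidden = dense_param_count(784, 32)  # first Dense layer
output = dense_param_count(32, 10)   # second Dense layer
print(hidden, output, hidden + output)  # 25120 330 25450
```

These numbers match the `Param #` column and the `Total params` line above.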


You can also simply add layers via the .add() method:

In [6]:
model = Sequential()

model.add(Dense(32, activation='relu', input_shape=(784,)))
model.add(Dense(10, activation=tf.nn.softmax))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_7 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_8 (Dense)              (None, 10)                330       
Total params: 25,450
Trainable params: 25,450
Non-trainable params: 0
_________________________________________________________________


## Specifying the input shape

The model needs to know what input shape it should expect. For this reason, the first layer in a Sequential model (and only the first, because following layers can do automatic shape inference) needs to receive information about its input shape. There are several possible ways to do this:

* Pass an input_shape argument to the first layer. This is a shape tuple (a tuple of integers or None entries, where None indicates that any positive integer may be expected). In input_shape, the batch dimension is not included.
* Some 2D layers, such as Dense, support the specification of their input shape via the argument input_dim, and some 3D temporal layers support the arguments input_dim and input_length.
* If you ever need to specify a fixed batch size for your inputs (this is useful for stateful recurrent networks), you can pass a batch_size argument to a layer. If you pass both batch_size=32 and input_shape=(6, 8) to a layer, it will then expect every batch of inputs to have the batch shape (32, 6, 8).

As such, the following snippets are strictly equivalent:

In [7]:
model = Sequential()
model.add(Dense(32, input_shape=(784,)))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_9 (Dense)              (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________


In [8]:
model = Sequential()
model.add(Dense(32, input_dim=784))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 32)                25120     
Total params: 25,120
Trainable params: 25,120
Non-trainable params: 0
_________________________________________________________________
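The three options from the list above all end up describing the same thing: a full batch shape whose first entry is the batch size (or None when it is not fixed). The following is a minimal sketch of that bookkeeping in plain Python. The function `resolve_batch_shape` is a hypothetical helper written for this lesson, not a Keras function.

```python
def resolve_batch_shape(input_shape=None, input_dim=None, batch_size=None):
    """Combine the shape arguments into one full batch shape tuple."""
    if input_shape is None:
        if input_dim is None:
            raise ValueError("need input_shape or input_dim")
        # input_dim=784 is shorthand for input_shape=(784,)
        input_shape = (input_dim,)
    # The batch dimension goes first; None means "any batch size"
    return (batch_size,) + tuple(input_shape)

print(resolve_batch_shape(input_shape=(784,)))                # (None, 784)
print(resolve_batch_shape(input_dim=784))                     # (None, 784)
print(resolve_batch_shape(input_shape=(6, 8), batch_size=32)) # (32, 6, 8)
```

The leading None is the same one that shows up in the `Output Shape` column of `model.summary()`.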


## Compilation

Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:

* An optimizer. This could be the string identifier of an existing optimizer (such as rmsprop or adagrad), or an instance of the Optimizer class. See: optimizers.
* A loss function. This is the objective that the model will try to minimize. It can be the string identifier of an existing loss function (such as categorical_crossentropy or mse), or it can be an objective function. See: losses.
* A list of metrics. For any classification problem you will want to set this to metrics=['accuracy']. A metric could be the string identifier of an existing metric or a custom metric function.

We will separately go over optimizers, loss functions, and metrics in a later lesson.

In [9]:
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# For a binary classification problem
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# For a mean squared error regression problem
model.compile(optimizer='rmsprop',
              loss='mse')

# For custom metrics
import keras.backend as K

def mean_pred(y_true, y_pred):
    return K.mean(y_pred)

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy', mean_pred])
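To see what the two metrics in the last example measure, here is a plain-Python sketch that works on lists instead of tensors. It assumes a single sigmoid output, where accuracy is computed by thresholding the prediction at 0.5. The list-based functions are stand-ins for illustration, not the Keras backend code.

```python
def binary_accuracy(y_true, y_pred, threshold=0.5):
    """Fraction of predictions on the correct side of the threshold."""
    matches = [int((p > threshold) == bool(t)) for t, p in zip(y_true, y_pred)]
    return sum(matches) / len(matches)

def mean_pred(y_true, y_pred):
    """Average predicted value; y_true is ignored, as in the custom metric above."""
    return sum(y_pred) / len(y_pred)

y_true = [1, 0, 1, 1]
y_pred = [0.9, 0.2, 0.4, 0.7]

print(binary_accuracy(y_true, y_pred))  # 0.75 (the third prediction is wrong)
print(mean_pred(y_true, y_pred))        # roughly 0.55
```

Note that a metric function always receives both `y_true` and `y_pred`, even if it only uses one of them.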

## Training

Keras models are trained on NumPy arrays of input data and labels. There are three methods for training a model:

* fit: the most basic option. It trains on arrays for a given number of epochs.
* fit_generator: a bit more complicated, because it takes a generator instead of a NumPy array. It is often used for large datasets.
* train_on_batch: performs a single gradient update on one batch of samples.
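One way to compare these methods is to count gradient updates. With fit, each epoch is cut into batches and every batch triggers one update, while train_on_batch always performs exactly one. The sketch below is plain Python; the helper `updates_per_epoch` and the sample counts are assumptions chosen for illustration.

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Batches (and so gradient updates) needed to see every sample once."""
    # The last batch may be smaller than batch_size, hence the ceiling
    return math.ceil(num_samples / batch_size)

print(updates_per_epoch(800, 32))       # 25
print(updates_per_epoch(1000, 32))      # 32 (31 full batches plus one of 8)
print(10 * updates_per_epoch(800, 32))  # 250 updates over 10 epochs
```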

In [10]:
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=100))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

# Generate dummy data
import numpy as np
data = np.random.random((1000, 100))
labels = np.random.randint(2, size=(1000, 1))

model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_11 (Dense)             (None, 32)                3232      
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 33        
Total params: 3,265
Trainable params: 3,265
Non-trainable params: 0
_________________________________________________________________


In [11]:
model.fit?

In [12]:
model.fit(
    data, 
    labels, 
    batch_size=32, 
    epochs=10, 
    verbose=2, 
    callbacks=None, 
    validation_split=0.2, 
    validation_data=None, 
    shuffle=True, 
    class_weight=None, 
    sample_weight=None, 
    initial_epoch=0)

Train on 800 samples, validate on 200 samples
Epoch 1/10
 - 1s - loss: 0.7269 - acc: 0.4887 - val_loss: 0.7307 - val_acc: 0.4200
Epoch 2/10
 - 0s - loss: 0.7093 - acc: 0.4887 - val_loss: 0.7234 - val_acc: 0.4900
Epoch 3/10
 - 0s - loss: 0.6972 - acc: 0.5150 - val_loss: 0.7282 - val_acc: 0.4650
Epoch 4/10
 - 0s - loss: 0.6943 - acc: 0.5225 - val_loss: 0.7169 - val_acc: 0.4600
Epoch 5/10
 - 0s - loss: 0.6873 - acc: 0.5500 - val_loss: 0.7169 - val_acc: 0.4900
Epoch 6/10
 - 0s - loss: 0.6818 - acc: 0.5637 - val_loss: 0.7169 - val_acc: 0.4900
Epoch 7/10
 - 0s - loss: 0.6761 - acc: 0.5650 - val_loss: 0.7164 - val_acc: 0.4600
Epoch 8/10
 - 0s - loss: 0.6777 - acc: 0.5400 - val_loss: 0.7166 - val_acc: 0.4700
Epoch 9/10
 - 0s - loss: 0.6683 - acc: 0.5913 - val_loss: 0.7164 - val_acc: 0.4750
Epoch 10/10
 - 0s - loss: 0.6652 - acc: 0.5938 - val_loss: 0.7160 - val_acc: 0.4850


<keras.callbacks.History at 0xd5a07f0>
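The first line of the log, "Train on 800 samples, validate on 200 samples", comes from validation_split=0.2. As far as I know, Keras holds out the last fraction of the arrays, before any shuffling, as the validation set. A minimal sketch of that split on a plain list follows; `split_for_validation` is a hypothetical helper, not a Keras function.

```python
def split_for_validation(samples, validation_split):
    """Hold out the trailing fraction of the samples for validation."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]

samples = list(range(1000))  # stand-in for the 1000 rows of dummy data
train, val = split_for_validation(samples, 0.2)

print(len(train), len(val))  # 800 200
print(val[0])                # 800, the validation set starts where training ends
```

Because the split takes the tail of the data, ordered data (for example sorted by class) should be shuffled before calling fit with validation_split.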

In [13]:
# The model will continue training where it left off
model.train_on_batch(
    data[:32],
    labels[:32],
    class_weight=None, 
    sample_weight=None)

[0.68163818, 0.5625]

In [14]:
def data_gen():
    for datum, label in zip(data, labels):
        yield datum[None, :], label

In [15]:
# Be careful: steps_per_epoch * epochs must not exceed the number of batches the generator can yield (1000 here)
model.fit_generator(
    data_gen(), 
    steps_per_epoch=900, 
    epochs=1, 
    verbose=1, 
    callbacks=None, 
    validation_data=None, # This can be a generator or a dataset
    validation_steps=None, 
    class_weight=None, 
    max_q_size=10, 
    workers=1, 
    pickle_safe=False, 
    initial_epoch=0)



Epoch 1/1


<keras.callbacks.History at 0xe444978>
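The generator above yields each sample once and then stops, so it cannot feed more than one pass over the data. Keras expects the generator passed to fit_generator to loop over its data indefinitely. Here is a sketch of such a generator that also yields real batches rather than single samples. It is plain Python and works on lists as well as NumPy arrays; the name `batch_generator` and the toy data are assumptions for illustration.

```python
from itertools import islice

def batch_generator(data, labels, batch_size):
    """Yield (inputs, targets) batches forever, restarting after each pass."""
    while True:
        for start in range(0, len(data), batch_size):
            stop = start + batch_size
            yield data[start:stop], labels[start:stop]

toy_data = list(range(10))
toy_labels = [x % 2 for x in toy_data]

gen = batch_generator(toy_data, toy_labels, batch_size=4)
batches = list(islice(gen, 5))  # take 5 batches, more than one pass

print([len(x) for x, _ in batches])  # [4, 4, 2, 4, 4]
```

With a looping generator, steps_per_epoch only decides how many batches count as one epoch.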

## Evaluation

The evaluation and prediction methods follow the same naming pattern as the training methods: X, X_on_batch, and X_generator. I will show the plain X variants below and leave the others for the reader to explore:

* evaluate (the batch variant is called test_on_batch)
* predict, predict_classes, and predict_proba (only predict has _on_batch and _generator variants)

In [16]:
model.evaluate(
    data, 
    labels, 
    batch_size=32, 
    verbose=1, 
    sample_weight=None)



[0.68309658432006837, 0.53900000000000003]
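The result is a list with the loss first, followed by the metrics passed to compile, here accuracy. My understanding is that evaluate runs through the data in batches and combines the per-batch values into one average, weighting each batch by its size so that a smaller final batch does not count too much. A plain-Python sketch of that weighted average, with made-up batch values:

```python
def weighted_mean(batch_values, batch_sizes):
    """Average per-batch values, weighting each batch by its sample count."""
    total = sum(v * n for v, n in zip(batch_values, batch_sizes))
    return total / sum(batch_sizes)

# 1000 samples in batches of 32 give 31 full batches and one batch of 8
batch_sizes = [32] * 31 + [8]
print(sum(batch_sizes))  # 1000

# Hypothetical per-batch losses for two batches of different sizes
print(weighted_mean([0.5, 1.0], [30, 10]))  # 0.625, not the unweighted 0.75
```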

In [17]:
model.predict(
    data, 
    batch_size=32, 
    verbose=1)




array([[ 0.53566039],
       [ 0.63484144],
       [ 0.62468487],
       [ 0.7107358 ],
       [ 0.67245078],
       [ 0.49630192],
       [ 0.59395534],
       [ 0.58334672],
       [ 0.74042112],
       [ 0.61619568],
       [ 0.48081085],
       [ 0.63051391],
       [ 0.69783056],
       [ 0.62741548],
       [ 0.61384833],
       [ 0.69043112],
       [ 0.54444921],
       [ 0.62646997],
       [ 0.53946596],
       [ 0.60386479],
       [ 0.68730891],
       [ 0.45763505],
       [ 0.57500559],
       [ 0.64550632],
       [ 0.58632278],
       [ 0.55860955],
       [ 0.71110231],
       [ 0.68769383],
       [ 0.74919945],
       [ 0.61052531],
       [ 0.65815222],
       [ 0.58698326],
       [ 0.43145257],
       [ 0.67634189],
       [ 0.40130466],
       [ 0.58171219],
       [ 0.49728695],
       [ 0.59845603],
       [ 0.60621035],
       [ 0.59468526],
       [ 0.63213211],
       [ 0.66215461],
       [ 0.62677372],
       [ 0.65563422],
       [ 0.74653304],
       ...

In [18]:
model.predict_classes(
    data, 
    batch_size=32, 
    verbose=1)



array([[1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [0],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [1],
       [1],
       [0],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       [1],
       ...

In [19]:
model.predict_proba(
    data, 
    batch_size=32, 
    verbose=1)



array([[ 0.53566039],
       [ 0.63484144],
       [ 0.62468487],
       [ 0.7107358 ],
       [ 0.67245078],
       [ 0.49630192],
       [ 0.59395534],
       [ 0.58334672],
       [ 0.74042112],
       [ 0.61619568],
       [ 0.48081085],
       [ 0.63051391],
       [ 0.69783056],
       [ 0.62741548],
       [ 0.61384833],
       [ 0.69043112],
       [ 0.54444921],
       [ 0.62646997],
       [ 0.53946596],
       [ 0.60386479],
       [ 0.68730891],
       [ 0.45763505],
       [ 0.57500559],
       [ 0.64550632],
       [ 0.58632278],
       [ 0.55860955],
       [ 0.71110231],
       [ 0.68769383],
       [ 0.74919945],
       [ 0.61052531],
       [ 0.65815222],
       [ 0.58698326],
       [ 0.43145257],
       [ 0.67634189],
       [ 0.40130466],
       [ 0.58171219],
       [ 0.49728695],
       [ 0.59845603],
       [ 0.60621035],
       [ 0.59468526],
       [ 0.63213211],
       [ 0.66215461],
       [ 0.62677372],
       [ 0.65563422],
       [ 0.74653304],
       ...
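For this model, with a single sigmoid output, predict and predict_proba return the same probabilities, and the predict_classes output is consistent with thresholding those probabilities at 0.5. The sketch below applies that threshold to the first six probabilities printed above. It is plain Python, and `classes_from_probabilities` is a hypothetical helper, not a Keras method.

```python
def classes_from_probabilities(probabilities, threshold=0.5):
    """Map each sigmoid probability to class 1 if above the threshold, else 0."""
    return [[int(p > threshold) for p in row] for row in probabilities]

# First six rows of the predict output shown earlier
probs = [[0.53566039], [0.63484144], [0.62468487],
         [0.7107358], [0.67245078], [0.49630192]]

print(classes_from_probabilities(probs))  # [[1], [1], [1], [1], [1], [0]]
```

The result matches the first six rows of the predict_classes output.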