<a href="https://colab.research.google.com/github/wllgrnt/keras-examples/blob/master/Chapter7.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 7

## Advanced deep-learning best practices


In [2]:
import keras
import numpy as np

Using TensorFlow backend.


## The Keras functional API

If we want to go beyond the `Sequential` model (e.g. models with several independent inputs, multiple outputs, or internal branching), we have to use the functional API.

With the functional API, we treat layers as functions that take tensors and return tensors.

In [0]:
# Example of the functional API - layer takes an input tensor, returns an output tensor.
input_tensor = keras.Input(shape=(32,))
dense = keras.layers.Dense(32, activation="relu")
output_tensor = dense(input_tensor)




In [0]:
# A Sequential model and its functional API equivalent

seq_model = keras.models.Sequential()
seq_model.add(keras.layers.Dense(32, activation="relu", input_shape=(64,)))
seq_model.add(keras.layers.Dense(32, activation="relu"))
seq_model.add(keras.layers.Dense(10, activation="softmax"))

input_tensor = keras.Input(shape=(64,))
x = keras.layers.Dense(32, activation="relu")(input_tensor)
x = keras.layers.Dense(32, activation="relu")(x)
output_tensor = keras.layers.Dense(10, activation="softmax")(x)

# Keras retrieves every layer involved in going from the input to the output,
# and brings them together into a graph-like structure.
model = keras.models.Model(input_tensor, output_tensor)

model.summary()

# Once the Model is created, we compile, fit and evaluate as before


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         (None, 64)                0         
_________________________________________________________________
dense_18 (Dense)             (None, 32)                2080      
_________________________________________________________________
dense_19 (Dense)             (None, 32)                1056      
_________________________________________________________________
dense_20 (Dense)             (None, 10)                330       
Total params: 3,466
Trainable params: 3,466
Non-trainable params: 0
_________________________________________________________________
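The parameter counts in the summary can be checked by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. A minimal sketch in plain Python (the helper `dense_params` is a hypothetical name, not part of Keras):

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Input size followed by the number of units in each Dense layer
layer_sizes = [64, 32, 32, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [2080, 1056, 330]
print(sum(counts))  # 3466
```

The totals match for both models, since the `Sequential` version and the functional version describe the same stack of layers.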


### Multi-input models

Here we create an example question-answering model, in which the question and the reference material are both fed in to separate LSTMs and then combined to give an answer.

In [0]:
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500

text_input = keras.Input(shape=(None,), dtype="int32", name="text")

# Embed the inputs into a sequence of length-64 vectors
# Embedding takes (input_dim, output_dim): vocabulary size first, vector size second
embedded_text = keras.layers.Embedding(text_vocabulary_size, 64)(text_input)

# Encode into a single vector via an LSTM
encoded_text = keras.layers.LSTM(32)(embedded_text)

question_input = keras.Input(shape=(None,), dtype="int32", name="question")

embedded_question = keras.layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = keras.layers.LSTM(16)(embedded_question)

concatenated = keras.layers.concatenate([encoded_text, encoded_question], axis=-1)

answer = keras.layers.Dense(answer_vocabulary_size, activation="softmax")(concatenated)

model = keras.models.Model([text_input, question_input], answer)

model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["acc"])

model.summary()

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
text (InputLayer)               (None, None)         0                                            
__________________________________________________________________________________________________
question (InputLayer)           (None, None)         0                                            
__________________________________________________________________________________________________
embedding_3 (Embedding)         (None, None, 64)     640000      text[0][0]                       
__________________________________________________________________________________________________
embedding_4 (Embedding)         (None, None, 32)     320000      question[0][0]                   
__________________________________________________________________________________________________
lstm_3 (LS

We train this model either by feeding a list of NumPy arrays as inputs, or a dictionary mapping input names to NumPy arrays (which requires that the inputs were given names).

In [0]:
num_samples = 1000
max_length = 100

# Generate dummy NumPy data
text = np.random.randint(1, text_vocabulary_size, size=(num_samples, max_length))

question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))

# Answers are one-hot encoded, not integers: draw random class indices, then one-hot encode them
answers = np.random.randint(0, answer_vocabulary_size, size=num_samples)
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)

# Option one
#model.fit([text, question], answers, epochs=10, batch_size=128)

# Option two
model.fit({"text": text, "question": question}, answers, epochs=10, batch_size=128)
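
A one-hot target array has one row per sample, with a single 1 in the column of the correct answer and 0 everywhere else. A minimal NumPy sketch of building such an array, assuming random integer labels (`rng`, `labels` and `one_hot` are illustrative names):

```python
import numpy as np

num_samples = 1000
answer_vocabulary_size = 500

rng = np.random.default_rng(0)
# Hypothetical integer class labels in [0, answer_vocabulary_size)
labels = rng.integers(0, answer_vocabulary_size, size=num_samples)

# One row per sample, with a single 1 in the column given by the label
one_hot = np.zeros((num_samples, answer_vocabulary_size), dtype="float32")
one_hot[np.arange(num_samples), labels] = 1.0

print(one_hot.shape)  # (1000, 500)
```

This is the shape that `categorical_crossentropy` expects for a 500-way softmax output.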



### Multi-output models

For example, a network that simultaneously predicts several properties of the data: given a series of social media posts from a single anonymous individual, it tries to predict that person's attributes, here age, income and gender (spooky).

In [3]:
vocabulary_size = 50000
num_income_groups = 10

posts_input = keras.Input(shape=(None,), dtype="int32", name="posts")
# Embedding takes (input_dim, output_dim): vocabulary size first, vector size second
embedded_posts = keras.layers.Embedding(vocabulary_size, 256)(posts_input)
x = keras.layers.Conv1D(128, 5, activation="relu")(embedded_posts)
x = keras.layers.MaxPooling1D(5)(x)
x = keras.layers.Conv1D(256, 5, activation="relu")(x)
x = keras.layers.Conv1D(256, 5, activation="relu")(x)
x = keras.layers.MaxPooling1D(5)(x)
x = keras.layers.Conv1D(256, 5, activation="relu")(x)
x = keras.layers.Conv1D(256, 5, activation="relu")(x)
x = keras.layers.GlobalMaxPooling1D()(x)
x = keras.layers.Dense(128, activation="relu")(x)

age_prediction = keras.layers.Dense(1, name="age")(x)
income_prediction = keras.layers.Dense(num_income_groups, activation="softmax", name="income")(x)
gender_prediction = keras.layers.Dense(1, activation="sigmoid", name="gender")(x)

model = keras.models.Model(posts_input, [age_prediction, income_prediction, gender_prediction])




To train this model, we need a different loss function for each head of the network. But gradient descent requires a single scalar to minimize, so we combine all the losses by summing them.



In [4]:
# Option 1
# model.compile(optimizer="rmsprop",
#              loss=["mse", "categorical_crossentropy", "binary_crossentropy"])

# Option 2
model.compile(optimizer="rmsprop",
              loss={"age": "mse",
                    "income": "categorical_crossentropy",
                    "gender": "binary_crossentropy"}
             )




We can weight the sum so that each loss has a different importance. This is useful when the losses' values have different scales.

In [0]:
model.compile(optimizer="rmsprop",
              loss={"age": "mse",
                    "income": "categorical_crossentropy",
                    "gender": "binary_crossentropy"},
              loss_weights={"age": 0.25,
                    "income": 1.0,
                    "gender": 10.0}
             )
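
The weighted sum that `loss_weights` configures can be sketched in plain Python. The per-head loss values below are made-up numbers chosen only to illustrate the different scales; they are not results from this model.

```python
# Hypothetical per-head loss values for one batch (illustrative only)
losses = {"age": 12.0, "income": 2.3, "gender": 0.7}
loss_weights = {"age": 0.25, "income": 1.0, "gender": 10.0}

# Each head's contribution to the scalar that gradient descent minimizes
contributions = {name: loss_weights[name] * losses[name] for name in losses}
total_loss = sum(contributions.values())

print(contributions)
print(total_loss)
```

Without the weights, the MSE term for age would dominate the sum; with them, the three heads contribute on a comparable scale.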


We fit as before by passing either a list of arrays or a dict of arrays as the targets.

```
model.fit(posts, {"age": age_targets,
                  "income": income_targets,
                  "gender": gender_targets},
          epochs=10, batch_size=64)

```

### Directed acyclic graphs of layers

In addition to multiple-input and multiple-output networks, we can have any arbitrary directed acyclic graph of layers as our internal network topology.

Notable examples of neural-network components that are implemented as graphs include the Inception module and residual connections.


#### Inception modules

An Inception network is a stack of modules, each of which looks like a small independent network split into parallel branches. For example, a module might have three branches, each starting with a 1x1 convolution followed by a 3x3 convolution, with the resulting features concatenated at the end. This helps the network learn spatial features and channel-wise features separately.

In [0]:
# Example implementation of an Inception module (assuming a 4D input tensor x)

x = keras.Input(shape=(None, None, 3), dtype="float32", name="input")


# padding="same" keeps the spatial sizes of all branches equal, so they can be concatenated
branch_a = keras.layers.Conv2D(128, 1, activation="relu", strides=2, padding="same")(x)

branch_b = keras.layers.Conv2D(128, 1, activation="relu", padding="same")(x)
branch_b = keras.layers.Conv2D(128, 3, activation="relu", strides=2, padding="same")(branch_b)


branch_c = keras.layers.AveragePooling2D(3, strides=2, padding="same")(x)
branch_c = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(branch_c)

branch_d = keras.layers.Conv2D(128, 1, activation="relu", padding="same")(x)
branch_d = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(branch_d)
branch_d = keras.layers.Conv2D(128, 3, activation="relu", strides=2, padding="same")(branch_d)

output = keras.layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)
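
Concatenating along the channel axis only works on real data if every branch produces the same spatial size. A rough sketch of the output-size arithmetic under the standard "valid" and "same" padding conventions (the helper `conv_output_size` and the 32-pixel input are assumptions made for illustration):

```python
import math

def conv_output_size(n, kernel, stride=1, padding="valid"):
    """Spatial output size of a convolution or pooling window along one axis."""
    if padding == "same":
        return math.ceil(n / stride)
    return (n - kernel) // stride + 1

n = 32  # hypothetical input height/width

# 1x1 conv with stride 2, versus 1x1 conv followed by 3x3 conv with stride 2
valid_a = conv_output_size(n, 1, stride=2)
valid_b = conv_output_size(conv_output_size(n, 1), 3, stride=2)
print(valid_a, valid_b)  # 16 15

same_a = conv_output_size(n, 1, stride=2, padding="same")
same_b = conv_output_size(
    conv_output_size(n, 1, padding="same"), 3, stride=2, padding="same"
)
print(same_a, same_b)  # 16 16

# Four branches with 128 filters each give this many output channels
print(4 * 128)  # 512
```

With "valid" padding the two branches disagree by one pixel, while "same" padding makes the size depend only on the stride.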

#### Residual connections

This network component tackles vanishing gradients and representational bottlenecks by making the output of an earlier layer available as an input to a later layer: the earlier output is summed with the later activation.

In [0]:
# Example implementation of a residual connection when the feature-map sizes are the same
# Uses identity residual connections

x = keras.Input(shape=(None, None, 128), dtype="float32", name="input")


y = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(x)
y = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(y)
y = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(y)

# Add the original x back to the output features
y = keras.layers.add([y,x])

In [0]:
# Example implementation when the feature map sizes differ, using a linear residual connection

x = keras.Input(shape=(None, None, 3), dtype="float32", name="input")


y = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(x)
y = keras.layers.Conv2D(128, 3, activation="relu", padding="same")(y)
y = keras.layers.MaxPooling2D(2,strides=2)(y)


# Use a 1x1 convolution to linearly downsample the original x tensor to the same shape as y
residual = keras.layers.Conv2D(128, 1, strides=2, padding="same")(x)

# Add the downsampled residual tensor back to the output features
y = keras.layers.add([y,residual])
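
To see why a strided 1x1 convolution gives the residual the right shape, it can be written directly in NumPy: stride 2 keeps every second pixel, and the 1x1 kernel is a matrix multiply applied to each pixel's channel vector. This is a minimal sketch that ignores the bias term; the array sizes and the names `kernel` and `out` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8, 8, 3))   # hypothetical batch: 2 images, 8x8 pixels, 3 channels
kernel = rng.normal(size=(3, 128))  # 1x1 kernel mapping 3 channels to 128

# Stride 2 subsamples the pixels; the 1x1 kernel acts on each remaining pixel
residual = x[:, ::2, ::2, :] @ kernel

y = np.zeros((2, 4, 4, 128))  # stand-in for the main-branch output after pooling
out = y + residual

print(residual.shape, out.shape)  # (2, 4, 4, 128) (2, 4, 4, 128)
```

Because the operation is linear, it changes the shape of `x` without adding a nonlinearity to the shortcut path.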

### Layer weight sharing

If we call the same layer instance twice, the same weights are reused on every call, which lets us build models with shared branches. For example, a Siamese LSTM (also called a shared LSTM) processes two inputs with a single LSTM layer.

In [27]:
lstm = keras.layers.LSTM(32)

# Variable-length sequences of vectors of size 128
left_input = keras.Input(shape=(None, 128))
left_output = lstm(left_input)

right_input = keras.Input(shape=(None, 128))
right_output = lstm(right_input)



merged = keras.layers.concatenate([left_output, right_output], axis=-1)

predictions = keras.layers.Dense(1, activation="sigmoid")(merged)

model = keras.models.Model([left_input, right_input], predictions)

model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["acc"])

model.summary()

# model.fit([left_data, right_data], targets)

__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_3 (InputLayer)            (None, None, 128)    0                                            
__________________________________________________________________________________________________
input_4 (InputLayer)            (None, None, 128)    0                                            
__________________________________________________________________________________________________
lstm_2 (LSTM)                   (None, 32)           20608       input_3[0][0]                    
                                                                 input_4[0][0]                    
__________________________________________________________________________________________________
concatenate_4 (Concatenate)     (None, 64)           0           lstm_2[0][0]                     
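
The summary lists the LSTM once, connected to both inputs, with 20608 parameters. That figure can be checked by hand: an LSTM has four gates, each with input weights, recurrent weights and a bias. A minimal sketch (the helper `lstm_params` is a hypothetical name, not part of Keras):

```python
def lstm_params(input_dim, units):
    """Four gates, each with input weights, recurrent weights and a bias."""
    return 4 * (units * (input_dim + units) + units)

shared = lstm_params(input_dim=128, units=32)
print(shared)      # 20608: one set of weights serves both inputs
print(2 * shared)  # 41216: what two separate LSTM layers would need
```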
          

### Models as layers

We can treat a model as a "bigger layer" and call it on a tensor, e.g. `y = model(x)`; as with layers, the weights are shared between calls. E.g. for a vision model with a dual camera:

In [28]:
xception_base = keras.applications.Xception(weights=None, include_top=False)

# 250x250 rgb images
left_input = keras.Input(shape=(250,250,3))
right_input = keras.Input(shape=(250,250,3))

left_features = xception_base(left_input)
right_features = xception_base(right_input)

merged_features = keras.layers.concatenate([left_features, right_features], axis=-1)


