##### Copyright 2020 The TensorFlow Authors.

In [1]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
## Start Alexey Note:
- Running the notebook gave me a GPU-related error (sorry, I forgot to copy the error message `sad face`). Some quick [internet searching](https://www.tensorflow.org/guide/gpu) led me to the following code snippet, which resolved my problem:
```python
gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 1GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)
```
 - I also had to `restart` another notebook's kernel, which was using a lot of my GPU's memory. Though I can see this setting limits my GPU usage, it isn't precise: the notebook uses 1439 MiB whereas the limit is 1024 MB (~977 MiB).
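
The unit conversion quoted in the note can be checked with a little arithmetic. This is a minimal sketch that assumes, as the note does, that `memory_limit` is expressed in decimal megabytes; TensorFlow may instead treat the value as MiB, in which case the limit would simply be 1024 MiB. The helper `mb_to_mib` is a made-up name, not a TensorFlow function.

```python
# Hypothetical helper (not part of TensorFlow): decimal megabytes -> mebibytes.
MB = 10**6    # 1 megabyte = 1,000,000 bytes
MIB = 2**20   # 1 mebibyte = 1,048,576 bytes

def mb_to_mib(megabytes):
    return megabytes * MB / MIB

limit_mib = mb_to_mib(1024)
print(f"1024 MB is about {limit_mib:.1f} MiB")  # 1024 MB is about 976.6 MiB

# Gap between the usage reported in the note (1439 MiB) and that limit.
overshoot_mib = 1439 - limit_mib
print(f"Overshoot: about {overshoot_mib:.0f} MiB")
```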
 
 ## End Alexey Note
---

# The Sequential model

<table class="tfo-notebook-buttons" align="left">
  <td>
    <a target="_blank" href="https://www.tensorflow.org/guide/keras/sequential_model"><img src="https://www.tensorflow.org/images/tf_logo_32px.png" />View on TensorFlow.org</a>
  </td>
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/tensorflow/docs/blob/snapshot-keras/site/en/guide/keras/sequential_model.ipynb"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
  <td>
    <a target="_blank" href="https://github.com/keras-team/keras-io/blob/master/guides/sequential_model.py"><img src="https://www.tensorflow.org/images/GitHub-Mark-32px.png" />View source on GitHub</a>
  </td>
  <td>
    <a href="https://storage.googleapis.com/tensorflow_docs/docs/site/en/guide/keras/sequential_model.ipynb"><img src="https://www.tensorflow.org/images/download_logo_32px.png" />Download notebook</a>
  </td>
</table>

## Setup

In [2]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

gpus = tf.config.list_physical_devices('GPU')
if gpus:
  # Restrict TensorFlow to only allocate 4GB of memory on the first GPU
  try:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=1024*4)])
    logical_gpus = tf.config.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Virtual devices must be set before GPUs have been initialized
    print(e)


1 Physical GPUs, 1 Logical GPUs


## When to use a Sequential model

A `Sequential` model is appropriate for **a plain stack of layers**
where each layer has **exactly one input tensor and one output tensor**.

Schematically, the following `Sequential` model:

In [3]:
# Define Sequential model with 3 layers
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu", name="layer1"),
        layers.Dense(3, activation="relu", name="layer2"),
        layers.Dense(4, name="layer3"),
    ]
)
# Call model on a test input
x = tf.ones((3, 3))
y = model(x)

is equivalent to this function:

In [4]:
# Create 3 layers
layer1 = layers.Dense(2, activation="relu", name="layer1")
layer2 = layers.Dense(3, activation="relu", name="layer2")
layer3 = layers.Dense(4, name="layer3")

# Call layers on a test input
x = tf.ones((3, 3))
y = layer3(layer2(layer1(x)))

A Sequential model is **not appropriate** when:

- Your model has multiple inputs or multiple outputs
- Any of your layers has multiple inputs or multiple outputs
- You need to do layer sharing
- You want non-linear topology (e.g. a residual connection, a multi-branch
model)

---
## Start Alexey Note:
 - The first thing this notebook does is define a "Sequential Model", which looks something like this:

<img src="../Images/SequentialModel.jpg" alt="Diagram of a sequential model" width="700"/>

 - Where "A Sequential model is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor." The image shows a "dense" neural network model, where each node in one layer is connected to every node in the neighboring layers. This is the kind of network that is created in this tutorial, implying there are non-dense and non-stack neural networks as well. Note that the exact network in the notebook has 3 layers of sizes 2, 3, and 4 respectively, with the first 2 layers using the "rectified linear unit activation function" ("relu" for short).
 - An [activation function](https://deepai.org/machine-learning-glossary-and-terms/activation-function) is used to introduce nonlinearity to a neural network, giving it the ability to "learn" complex tasks that it wouldn't necessarily be able to learn without it. The link in the previous sentence claims that neural networks without activation functions can be decomposed into a single matrix acting on the input, which is really cool, and I believe the proof is simple: without an activation, every layer is effectively a matrix multiplication (plus a bias vector), so all layers can be multiplied together to get a single matrix (and a single combined bias) that acts on the input and produces the output.
 - A diagram showing a "feed-forward" neural network is obtained from [here](https://deepai.org/machine-learning-glossary-and-terms/activation-function):

<img src="../Images/FeedForward.svg" alt="Diagram of a feed forward neural network" width="700"/>

 - The curvy line in a circle is the activation function. Take a look at layer 2. Each neuron receives the values of the previous layer's neurons after they have been passed through the previous layer's activation function. Layer 2's neurons then take these values and compute their own values with the following function:

<img src="../Images/NeuronFunction.jpg" alt="Formula for a neuron" width="300"/>

 - Where the function `f` is the activation function and `B` is some bias. So each successive neuron takes the values of the previous layer's neurons to determine its own value. Though it looks like [Keras'](https://keras.io/api/layers/core_layers/dense/) `Dense` layer puts the bias inside the activation function:

<img src="../Images/NeuronFunction2.jpg" alt="Formula for a neuron" width="300"/>
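
Two ideas from this note can be sketched in plain Python: a dense layer computing `f(x . W + b)` with the bias inside the activation, and the claim that layers without activations collapse into a single layer. This is only an illustration with made-up helper names (`matmul`, `dense`) and made-up numbers, not how Keras implements `Dense`.

```python
# Plain-Python sketch of a dense layer: output = f(x . W + b).
# All names and numbers are illustrative; this is not Keras code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def relu(v):
    return max(0.0, v)

def identity(v):
    return v

def dense(x, kernel, bias, activation):
    """Apply activation(x . kernel + bias) to a batch of row vectors."""
    z = matmul(x, kernel)
    return [[activation(v + b) for v, b in zip(row, bias)] for row in z]

x = [[1.0, -2.0]]                                  # one sample, 2 features
w1, b1 = [[1.0, 2.0], [3.0, 4.0]], [0.5, -0.5]     # layer 1: 2 -> 2
w2, b2 = [[1.0], [-1.0]], [2.0]                    # layer 2: 2 -> 1

# Without activations, two layers collapse into one affine map:
# (x W1 + b1) W2 + b2 == x (W1 W2) + (b1 W2 + b2)
stacked = dense(dense(x, w1, b1, identity), w2, b2, identity)
w_merged = matmul(w1, w2)
b_merged = [v + b for v, b in zip(matmul([b1], w2)[0], b2)]
merged = dense(x, w_merged, b_merged, identity)
print(stacked, merged)    # [[4.0]] [[4.0]]

# With ReLU between the layers, the merged layer no longer matches.
with_relu = dense(dense(x, w1, b1, relu), w2, b2, identity)
print(with_relu)          # [[2.0]]
```

Because of the bias, each activation-free layer is an affine map rather than a pure matrix, which is why the merged layer needs a combined bias as well as a combined matrix.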

## End Alexey Note
---

## Creating a Sequential model

You can create a Sequential model by passing a list of layers to the Sequential
constructor:

In [5]:
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)

Its layers are accessible via the `layers` attribute:

In [6]:
model.layers

[<keras.layers.core.dense.Dense at 0x7ffb5e4c99a0>,
 <keras.layers.core.dense.Dense at 0x7ffb5e4c9b80>,
 <keras.layers.core.dense.Dense at 0x7ffb5e4c9130>]

You can also create a Sequential model incrementally via the `add()` method:

In [7]:
model = keras.Sequential()
model.add(layers.Dense(2, activation="relu"))
model.add(layers.Dense(3, activation="relu"))
model.add(layers.Dense(4))

model.layers

[<keras.layers.core.dense.Dense at 0x7ffb5e5027f0>,
 <keras.layers.core.dense.Dense at 0x7ffb5e4dea30>,
 <keras.layers.core.dense.Dense at 0x7ffba7fe24f0>]

Note that there's also a corresponding `pop()` method to remove layers:
a Sequential model behaves very much like a list of layers.

In [8]:
model.pop()
print(len(model.layers))  # 2

2
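
To make the "list of layers" analogy concrete, here is a toy stand-in written in plain Python. `TinySequential` is a made-up class, and ordinary functions play the role of layers; it only mimics the `add()` / `pop()` / call behaviour described above and says nothing about how Keras implements `Sequential`.

```python
# Illustrative stand-in for the list-like behaviour of Sequential.
# `TinySequential` is hypothetical and not part of Keras.

class TinySequential:
    def __init__(self, layers=None):
        self.layers = list(layers or [])

    def add(self, layer):
        self.layers.append(layer)

    def pop(self):
        self.layers.pop()

    def __call__(self, x):
        # Feed the output of each layer into the next one, in order.
        for layer in self.layers:
            x = layer(x)
        return x

model = TinySequential()
model.add(lambda x: x + 1)   # stand-in "layers" are plain functions here
model.add(lambda x: x * 2)
model.add(lambda x: x - 3)
before_pop = model(5)
print(before_pop)            # ((5 + 1) * 2) - 3 = 9

model.pop()                  # drop the last layer
after_pop = model(5)
print(len(model.layers))     # 2
print(after_pop)             # (5 + 1) * 2 = 12
```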


Also note that the Sequential constructor accepts a `name` argument, just like
any layer or model in Keras. This is useful to annotate TensorBoard graphs
with semantically meaningful names.

In [9]:
model = keras.Sequential(name="my_sequential")
model.add(layers.Dense(2, activation="relu", name="layer1"))
model.add(layers.Dense(3, activation="relu", name="layer2"))
model.add(layers.Dense(4, name="layer3"))

## Specifying the input shape in advance

Generally, all layers in Keras need to know the shape of their inputs
in order to be able to create their weights. So when you create a layer like
this, initially, it has no weights:

In [11]:
layer = layers.Dense(3)
layer.weights  # Empty

[]

It creates its weights the first time it is called on an input, since the shape
of the weights depends on the shape of the inputs:

In [12]:
# Call layer on a test input
x = tf.ones((1, 4))
y = layer(x)
layer.weights  # Now it has weights, of shape (4, 3) and (3,)

[<tf.Variable 'dense_6/kernel:0' shape=(4, 3) dtype=float32, numpy=
 array([[ 0.03278744,  0.82709455,  0.50594497],
        [-0.07007271,  0.85455394, -0.3165413 ],
        [-0.6249998 , -0.16881174, -0.3493585 ],
        [ 0.01035941, -0.5942502 , -0.5780734 ]], dtype=float32)>,
 <tf.Variable 'dense_6/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]
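
The deferred weight creation shown above can be imitated with a small toy class. This is a sketch under stated assumptions: `LazyDense` is a made-up name, the random range is arbitrary, and nothing here mirrors Keras internals. It only shows why the kernel shape `(input_dim, units)` cannot be known before the first input arrives.

```python
import random

# Toy illustration of deferred weight creation; `LazyDense` is hypothetical.
class LazyDense:
    def __init__(self, units):
        self.units = units
        self.kernel = None   # unknown until the input width is seen
        self.bias = None

    @property
    def weights(self):
        return [] if self.kernel is None else [self.kernel, self.bias]

    def build(self, input_dim):
        # Kernel shape is (input_dim, units); bias shape is (units,).
        self.kernel = [[random.uniform(-1, 1) for _ in range(self.units)]
                       for _ in range(input_dim)]
        self.bias = [0.0] * self.units

    def __call__(self, rows):
        if self.kernel is None:          # first call: infer input_dim
            self.build(len(rows[0]))
        return [[sum(x * w for x, w in zip(row, col)) + b
                 for col, b in zip(zip(*self.kernel), self.bias)]
                for row in rows]

layer = LazyDense(3)
weights_before = layer.weights
print(weights_before)                    # []
y = layer([[1.0, 1.0, 1.0, 1.0]])        # input of shape (1, 4)
print(len(layer.kernel), len(layer.kernel[0]), len(layer.bias))  # 4 3 3
```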

---
## Start Alexey Note:
  - The Keras API documentation can be found [here](https://keras.io/api/).
  - Continuing through the tutorial was mostly straightforward, but I had trouble parsing the following code:

In [13]:
layer = layers.Dense(3)
layer.weights
x = tf.ones((1, 4))
y = layer(x)
layer.weights

[<tf.Variable 'dense_7/kernel:0' shape=(4, 3) dtype=float32, numpy=
 array([[ 0.14241827,  0.8592663 , -0.57716835],
        [ 0.16350663,  0.49261248,  0.870757  ],
        [-0.01283383, -0.02807742,  0.3844235 ],
        [ 0.75283515, -0.43364245, -0.7708735 ]], dtype=float32)>,
 <tf.Variable 'dense_7/bias:0' shape=(3,) dtype=float32, numpy=array([0., 0., 0.], dtype=float32)>]

 - This code first makes a `Dense` layer with `3` neurons. The layer initially has no weights associated with it; they are only created once a tensor is passed into it. Then a `(1,4)`-sized tensor is passed into the layer, which gives the kernel weights the size `(4,3)`, since the layer now knows both the input size and the output size. By basic matrix multiplication, `(1,4)x(4,3)=(1,3)`, so the output has one value per neuron in the layer (the kernel holds `4x3=12` weights, and there are `3` biases on top of that).
 - If you were curious like me, you might wonder how the weights are randomized. It turns out that the `kernel_initializer` argument of the `Dense` layer controls that. By default, it is the `GlorotUniform` initializer (see [here for `GlorotUniform`](https://keras.io/api/layers/initializers/#glorotuniform-class) and [here for `Dense` layer initialization](https://keras.io/api/layers/core_layers/dense/)). Biases are not random by default: the `bias_initializer` argument of `Dense` defaults to zeros, which matches the all-zero bias in the output above. The weights can also be set manually with [the `set_weights` method](https://keras.io/api/layers/base_layer/#set_weights-method). The first call to `layer(x)` produces random weights once, and the weights stay the same from then on unless a new layer is created or the weights are changed (manually or by training).
 - The rest of the tutorial seemed much more technical than I'm comfortable with so far, so I wasn't able to follow along very well despite reading it all. I will revisit it later as I learn more about TensorFlow, but I left it included in here for the curious.
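
As a rough illustration of the `GlorotUniform` initializer mentioned in this note: it draws kernel values uniformly from `[-limit, limit]` with `limit = sqrt(6 / (fan_in + fan_out))`. The sketch below reproduces that rule in plain Python with made-up helper names; it is not the Keras implementation, and the seed is only there to make the example repeatable.

```python
import math
import random

def glorot_uniform_limit(fan_in, fan_out):
    """Bound of the Glorot uniform distribution: sqrt(6 / (fan_in + fan_out))."""
    return math.sqrt(6.0 / (fan_in + fan_out))

def glorot_uniform(fan_in, fan_out, seed=None):
    """Draw a (fan_in, fan_out) kernel with entries in [-limit, limit]."""
    rng = random.Random(seed)
    limit = glorot_uniform_limit(fan_in, fan_out)
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]

limit = glorot_uniform_limit(4, 3)     # 4 inputs, 3 neurons, as in the cell above
print(round(limit, 4))                 # 0.9258
kernel = glorot_uniform(4, 3, seed=0)  # hypothetical seed, for repeatability only
bias = [0.0] * 3                       # Dense's default bias initializer is zeros
```

The kernel values printed in the cell outputs above all lie within roughly ±0.926, which is consistent with this bound for a `(4, 3)` kernel.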
 
## End Alexey Note
 ---

Naturally, this also applies to Sequential models. When you instantiate a
Sequential model without an input shape, it isn't "built": it has no weights
(and calling `model.weights` results in an error stating just this). The weights
are created when the model first sees some input data:

In [14]:
model = keras.Sequential(
    [
        layers.Dense(2, activation="relu"),
        layers.Dense(3, activation="relu"),
        layers.Dense(4),
    ]
)  # No weights at this stage!

# At this point, you can't do this:
# model.weights

# You also can't do this:
# model.summary()

# Call the model on a test input
x = tf.ones((1, 4))
y = model(x)
print("Number of weights after calling the model:", len(model.weights))  # 6
model.summary()

Number of weights after calling the model: 6
Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (1, 2)                    10        
                                                                 
 dense_9 (Dense)             (1, 3)                    9         
                                                                 
 dense_10 (Dense)            (1, 4)                    16        
                                                                 
Total params: 35
Trainable params: 35
Non-trainable params: 0
_________________________________________________________________
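
The `Param #` column in this summary can be reproduced by hand: a `Dense` layer has one kernel entry per (input, unit) pair plus one bias per unit. A minimal sketch, where `dense_params` is a helper name made up for this example:

```python
def dense_params(input_dim, units):
    """Kernel entries (input_dim * units) plus one bias per unit."""
    return input_dim * units + units

input_dim = 4                      # width of the test input tf.ones((1, 4))
counts = []
for units in (2, 3, 4):            # layer sizes used in the model above
    counts.append(dense_params(input_dim, units))
    input_dim = units              # this layer's output feeds the next layer

print(counts)                      # [10, 9, 16]
print(sum(counts))                 # 35
```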


Once a model is "built", you can call its `summary()` method to display its
contents:

In [15]:
model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_8 (Dense)             (1, 2)                    10        
                                                                 
 dense_9 (Dense)             (1, 3)                    9         
                                                                 
 dense_10 (Dense)            (1, 4)                    16        
                                                                 
Total params: 35
Trainable params: 35
Non-trainable params: 0
_________________________________________________________________


However, it can be very useful when building a Sequential model incrementally
to be able to display the summary of the model so far, including the current
output shape. In this case, you should start your model by passing an `Input`
object to your model, so that it knows its input shape from the start:

In [16]:
model = keras.Sequential()
model.add(keras.Input(shape=(4,)))
model.add(layers.Dense(2, activation="relu"))

model.summary()

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_11 (Dense)            (None, 2)                 10        
                                                                 
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________


Note that the `Input` object is not displayed as part of `model.layers`, since
it isn't a layer:

In [17]:
model.layers

[<keras.layers.core.dense.Dense at 0x7ffb5e4f3610>]

A simple alternative is to just pass an `input_shape` argument to your first
layer:

In [18]:
model = keras.Sequential()
model.add(layers.Dense(2, activation="relu", input_shape=(4,)))

model.summary()

Model: "sequential_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_12 (Dense)            (None, 2)                 10        
                                                                 
Total params: 10
Trainable params: 10
Non-trainable params: 0
_________________________________________________________________


Models built with a predefined input shape like this always have weights (even
before seeing any data) and always have a defined output shape.

In general, it's a recommended best practice to always specify the input shape
of a Sequential model in advance if you know what it is.

## A common debugging workflow: `add()` + `summary()`

When building a new Sequential architecture, it's useful to incrementally stack
layers with `add()` and frequently print model summaries. For instance, this
enables you to monitor how a stack of `Conv2D` and `MaxPooling2D` layers is
downsampling image feature maps:

In [19]:
model = keras.Sequential()
model.add(keras.Input(shape=(250, 250, 3)))  # 250x250 RGB images
model.add(layers.Conv2D(32, 5, strides=2, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))

# Can you guess what the current output shape is at this point? Probably not.
# Let's just print it:
model.summary()

# The answer was: (40, 40, 32), so we can keep downsampling...

model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(3))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.Conv2D(32, 3, activation="relu"))
model.add(layers.MaxPooling2D(2))

# And now?
model.summary()

# Now that we have 4x4 feature maps, time to apply global max pooling.
model.add(layers.GlobalMaxPooling2D())

# Finally, we add a classification layer.
model.add(layers.Dense(10))

# And now?
model.summary()

Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 123, 123, 32)      2432      
                                                                 
 conv2d_1 (Conv2D)           (None, 121, 121, 32)      9248      
                                                                 
 max_pooling2d (MaxPooling2D  (None, 40, 40, 32)       0         
 )                                                               
                                                                 
Total params: 11,680
Trainable params: 11,680
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_6"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 123, 123, 32)      2432      
                            

Very practical, right?
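
The shapes in these summaries can also be worked out by hand. The sketch below assumes the layer defaults used in the cell above: `Conv2D` with `"valid"` (no) padding, and `MaxPooling2D(p)` with a stride equal to its pool size. The helper names are made up for this example.

```python
def valid_output_size(size, window, stride=1):
    """Spatial size after an unpadded ('valid') convolution or pooling window."""
    return (size - window) // stride + 1

def conv2d_params(kernel_size, in_channels, filters):
    """Kernel entries plus one bias per filter."""
    return kernel_size * kernel_size * in_channels * filters + filters

# (name, window, stride) for each layer of the model above, in order.
stack = [
    ("conv 5x5 /2", 5, 2),
    ("conv 3x3", 3, 1),
    ("maxpool 3", 3, 3),
    ("conv 3x3", 3, 1),
    ("conv 3x3", 3, 1),
    ("maxpool 3", 3, 3),
    ("conv 3x3", 3, 1),
    ("conv 3x3", 3, 1),
    ("maxpool 2", 2, 2),
]

size = 250                         # 250x250 input images
sizes = []
for name, window, stride in stack:
    size = valid_output_size(size, window, stride)
    sizes.append(size)
    print(f"{name:12s} -> {size}x{size}")

print(sizes)                       # [123, 121, 40, 38, 36, 12, 10, 8, 4]
print(conv2d_params(5, 3, 32))     # 2432, first Conv2D on RGB input
print(conv2d_params(3, 32, 32))    # 9248, second Conv2D
```

The first three sizes match the `(None, 123, 123, 32)`, `(None, 121, 121, 32)` and `(None, 40, 40, 32)` rows of the summary, and the final value matches the 4x4 feature maps mentioned in the code comments.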


## What to do once you have a model

Once your model architecture is ready, you will want to:

- Train your model, evaluate it, and run inference. See our
[guide to training & evaluation with the built-in loops](https://www.tensorflow.org/guide/keras/train_and_evaluate/).
- Save your model to disk and restore it. See our
[guide to serialization & saving](https://www.tensorflow.org/guide/keras/save_and_serialize/).
- Speed up model training by leveraging multiple GPUs. See our
[guide to multi-GPU and distributed training](https://keras.io/guides/distributed_training/).

## Feature extraction with a Sequential model

Once a Sequential model has been built, it behaves like a [Functional API
model](https://www.tensorflow.org/guide/keras/functional/). This means that every
layer has an `input` and `output` attribute. These attributes can be used to do
neat things, like quickly creating a model that extracts the outputs of all
intermediate layers in a Sequential model:

In [20]:
initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=[layer.output for layer in initial_model.layers],
)

# Call feature extractor on test input.
x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)

Here's a similar example that only extracts features from one layer:

In [21]:
initial_model = keras.Sequential(
    [
        keras.Input(shape=(250, 250, 3)),
        layers.Conv2D(32, 5, strides=2, activation="relu"),
        layers.Conv2D(32, 3, activation="relu", name="my_intermediate_layer"),
        layers.Conv2D(32, 3, activation="relu"),
    ]
)
feature_extractor = keras.Model(
    inputs=initial_model.inputs,
    outputs=initial_model.get_layer(name="my_intermediate_layer").output,
)
# Call feature extractor on test input.
x = tf.ones((1, 250, 250, 3))
features = feature_extractor(x)

## Transfer learning with a Sequential model

Transfer learning consists of freezing the bottom layers in a model and only training
the top layers. If you aren't familiar with it, make sure to read our [guide
to transfer learning](https://www.tensorflow.org/guide/keras/transfer_learning/).

Here are two common transfer learning blueprints involving Sequential models.

First, let's say that you have a Sequential model, and you want to freeze all
layers except the last one. In this case, you would simply iterate over
`model.layers` and set `layer.trainable = False` on each layer, except the
last one. Like this:

```python
model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(32, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(32, activation='relu'),
    layers.Dense(10),
])

# Presumably you would want to first load pre-trained weights.
model.load_weights(...)

# Freeze all layers except the last one.
for layer in model.layers[:-1]:
  layer.trainable = False

# Recompile and train (this will only update the weights of the last layer).
model.compile(...)
model.fit(...)
```
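
To get a feel for what "only update the weights of the last layer" means in numbers, the following plain-Python sketch counts trainable and frozen parameters for the layer sizes in this blueprint. `LayerSpec` is a made-up stand-in that only stores sizes and a `trainable` flag; it is not a Keras class, and the counts assume ordinary `Dense` layers with biases.

```python
from dataclasses import dataclass

@dataclass
class LayerSpec:
    """Hypothetical stand-in for a Dense layer: sizes and a trainable flag."""
    input_dim: int
    units: int
    trainable: bool = True

    @property
    def params(self):
        # Kernel entries plus one bias per unit.
        return self.input_dim * self.units + self.units

dims = [784, 32, 32, 32, 10]      # input width followed by the four Dense sizes
specs = [LayerSpec(i, o) for i, o in zip(dims, dims[1:])]

for spec in specs[:-1]:           # same slicing as `model.layers[:-1]`
    spec.trainable = False

trainable = sum(s.params for s in specs if s.trainable)
frozen = sum(s.params for s in specs if not s.trainable)
print(trainable, frozen)          # 330 27232
```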

Another common blueprint is to use a Sequential model to stack a pre-trained
model and some freshly initialized classification layers. Like this:

```python
# Load a convolutional base with pre-trained weights
base_model = keras.applications.Xception(
    weights='imagenet',
    include_top=False,
    pooling='avg')

# Freeze the base model
base_model.trainable = False

# Use a Sequential model to add a trainable classifier on top
model = keras.Sequential([
    base_model,
    layers.Dense(1000),
])

# Compile & train
model.compile(...)
model.fit(...)
```

If you do transfer learning, you will probably find yourself frequently using
these two patterns.

That's about all you need to know about Sequential models!

To find out more about building models in Keras, see:

- [Guide to the Functional API](https://www.tensorflow.org/guide/keras/functional/)
- [Guide to making new Layers & Models via subclassing](
https://www.tensorflow.org/guide/keras/custom_layers_and_models/)