## The `Conv2D` layer

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

`Conv2D` performs a two-dimensional convolution. Here we look into this layer by going over its four main arguments ([docs](https://keras.io/api/layers/convolution_layers/convolution2d/)):
* `filters`
* `kernel_size`
* `padding`
* `strides`

Visual representations of `Conv2D` can be found in the [references for the `Conv2DTranspose` operation](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2DTranspose#references_1) in the TensorFlow documentation:

* [A guide to convolution arithmetic for deep learning](https://arxiv.org/abs/1603.07285v1). Its [GitHub repository](https://github.com/vdumoulin/conv_arithmetic) includes animated demonstrations.
* [Deconvolutional Networks](https://www.matthewzeiler.com/mattzeiler/deconvolutionalnetworks.pdf)

The animations below are taken from [conv_arithmetic](https://github.com/vdumoulin/conv_arithmetic/tree/master), in accordance with the MIT license.

**No Padding, Strides=1**

![no padding and no stride](../images/conv2d/no_padding_no_strides.gif)

**Padding and Strides=1**

![padding and no strides](../images/conv2d/full_padding_no_strides.gif)

**No Padding and Strides=2**

![no padding and strides](../images/conv2d/no_padding_strides.gif)

**Padding and Strides=2**

![padding and strides](../images/conv2d/padding_strides.gif)

### A single `Conv2D` layer
Defining a model with a single `Conv2D` layer:


Note: `keras.Input` is a function in the Keras API that instantiates a Keras tensor ([docs](https://keras.io/api/layers/core_layers/input/)). It defines the shape of the input data for a Keras model. In the code below, `keras.Input(shape=(8, 8, 1))` creates an input that accepts a 4D tensor with shape `(batch_size, 8, 8, 1)`. The `shape` argument excludes the batch dimension: `8` is both the height and the width of the input image, and `1` is the number of channels (grayscale).



In [11]:
inputs = keras.Input(shape=(8, 8, 1))
x = layers.Conv2D(filters=3, kernel_size=2, activation="relu", padding='VALID')(inputs)
model = keras.Model(inputs=inputs, outputs=x)

In [12]:
model.summary()

Model: "model_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_5 (InputLayer)        [(None, 8, 8, 1)]         0         
                                                                 
 conv2d_4 (Conv2D)           (None, 7, 7, 3)           15        
                                                                 
Total params: 15
Trainable params: 15
Non-trainable params: 0
_________________________________________________________________


Let's look into the model summary:
* The `None` in the first dimension is the batch size, which is left unspecified. It can differ between training and inference (common), or change during training (much less common).
* The 15 parameters are four weights (a 2x2 kernel) and one bias term per kernel, for three kernels.
* The spatial map is reduced from an 8 by 8 matrix to a 7 by 7 one. This is because the padding is set to 'valid', so the 2x2 kernel is only applied where it fits entirely inside the input.
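
The parameter count in the summary can be checked with a small helper. This is a minimal sketch, assuming square kernels and one bias per filter; `conv2d_param_count` is a hypothetical name introduced here for illustration and is not part of Keras.

```python
def conv2d_param_count(kernel_size, in_channels, filters, use_bias=True):
    """Number of trainable parameters of a 2D convolution with square kernels."""
    # Each filter holds one weight per kernel position and per input channel.
    weights_per_filter = kernel_size * kernel_size * in_channels
    # Each filter optionally adds a single bias term.
    bias_per_filter = 1 if use_bias else 0
    return filters * (weights_per_filter + bias_per_filter)


# The layer defined above: three 2x2 kernels on a single-channel input.
print(conv2d_param_count(kernel_size=2, in_channels=1, filters=3))  # 15
```

The same helper can be used to check the other summaries in this notebook by changing the arguments.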

### A `Conv2D` on an input with more than a single channel (size of depth axis > 1)

Below we demonstrate a dimensional analysis for a case where the input has more than one channel.

In [13]:
inputs = keras.Input(shape=(8, 8, 3))
x = layers.Conv2D(filters=2, kernel_size=3, activation="relu", padding='VALID')(inputs)
model = keras.Model(inputs=inputs, outputs=x)

In [14]:
model.summary()

Model: "model_5"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_6 (InputLayer)        [(None, 8, 8, 3)]         0         
                                                                 
 conv2d_5 (Conv2D)           (None, 6, 6, 2)           56        
                                                                 
Total params: 56
Trainable params: 56
Non-trainable params: 0
_________________________________________________________________


* The layer has 56 parameters, up from the 20 that the same two 3x3 kernels would have on a single-channel input. The calculation is as follows:
  * For each of the two kernels there are:
    * 9 weights for each of the 3 depth levels (RGB) of the input, which results in 27 weights.
    * 1 bias
  * 28 parameters per kernel, resulting in 56 parameters for two kernels.
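
The same count can be read off the shapes of the weight arrays. The sketch below uses NumPy placeholders rather than the actual Keras weights, and assumes the `[filter_height, filter_width, in_channels, out_channels]` layout that `tf.nn.conv2d` uses for its filters.

```python
import numpy as np

# Placeholder arrays with the shapes of the layer's weights (values are irrelevant here).
kernel = np.zeros((3, 3, 3, 2))  # 3x3 window, 3 input channels, 2 filters
bias = np.zeros((2,))            # one bias per filter

weights_per_filter = kernel[..., 0].size    # 3 * 3 * 3 = 27
params_per_filter = weights_per_filter + 1  # plus one bias = 28
total_params = kernel.size + bias.size      # 54 + 2 = 56
print(weights_per_filter, params_per_filter, total_params)  # 27 28 56
```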

**Each single output value of a kernel is computed by convolving across _all_ depth 'layers' (channels) of its input.**

This is true regardless of where the layer sits in the network. Look at the second layer of the network below, and validate that the number of its parameters is in accordance with the statement above.

In [25]:
inputs = keras.Input(shape=(8, 8, 3))
x = layers.Conv2D(filters=2, kernel_size=3, activation="relu", padding='VALID')(inputs)
x = layers.Conv2D(filters=4, kernel_size=3, activation="relu", padding='VALID')(x)
model = keras.Model(inputs=inputs, outputs=x)

In [26]:
model.summary()

Model: "model_11"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_12 (InputLayer)       [(None, 8, 8, 3)]         0         
                                                                 
 conv2d_16 (Conv2D)          (None, 6, 6, 2)           56        
                                                                 
 conv2d_17 (Conv2D)          (None, 4, 4, 4)           76        
                                                                 
Total params: 132
Trainable params: 132
Non-trainable params: 0
_________________________________________________________________


**To the student**: Why does the second layer have 76 parameters? Can you derive this number?

### Padding

Here we change the `padding` argument from `VALID` to `SAME`. Compare the model summaries of the two cases. What is different? Can you say why? Hint: the [docs](https://keras.io/api/layers/convolution_layers/convolution2d/) for the `Conv2D` class provide some insight into the purpose of the `padding` argument.

Advanced: the [TensorFlow notes](https://www.tensorflow.org/api_docs/python/tf/nn#notes_on_padding_2) on padding describe the exact calculation, including edge cases.
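
The output-size rules from those notes can be sketched in a few lines of plain Python. This is an illustration for one spatial axis with a dilation rate of 1; `conv_output_size` is a hypothetical helper name, not a TensorFlow function.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="VALID"):
    """Spatial output size along one axis, assuming a dilation rate of 1."""
    padding = padding.upper()
    if padding == "VALID":
        # Only windows that fit entirely inside the input are used.
        return math.ceil((input_size - kernel_size + 1) / stride)
    if padding == "SAME":
        # The input is zero-padded, so the size depends on the stride only.
        return math.ceil(input_size / stride)
    raise ValueError(f"unknown padding: {padding!r}")


print(conv_output_size(8, 3, padding="VALID"))  # 6
print(conv_output_size(8, 3, padding="SAME"))   # 8
```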

In [6]:
inputs = keras.Input(shape=(8, 8, 1))
x = layers.Conv2D(filters=2, kernel_size=3, activation="relu", padding='SAME')(inputs)
model = keras.Model(inputs=inputs, outputs=x)

In [7]:
model.summary()

Model: "model_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_4 (InputLayer)        [(None, 8, 8, 1)]         0         
                                                                 
 conv2d_4 (Conv2D)           (None, 8, 8, 2)           20        
                                                                 
Total params: 20
Trainable params: 20
Non-trainable params: 0
_________________________________________________________________


### Stride

The default stride of `Conv2D` is 1.
Let's compare it to a stride of 2:

In [31]:

inputs = keras.Input(shape=(8, 8, 1))
x = layers.Conv2D(filters=2, kernel_size=3, strides=2, activation="relu", padding='SAME')(inputs)
model = keras.Model(inputs=inputs, outputs=x)

In [32]:
model.summary()

Model: "model_14"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_15 (InputLayer)       [(None, 8, 8, 1)]         0         
                                                                 
 conv2d_20 (Conv2D)          (None, 4, 4, 2)           20        
                                                                 
Total params: 20
Trainable params: 20
Non-trainable params: 0
_________________________________________________________________


**To the student**: What is the difference between the model summary of this network and the one with the default `strides=1`? Why?

### `kernel_size` and the 1x1 Convolutional Layer

In [33]:
inputs = keras.Input(shape=(8, 8, 1))
x = layers.Conv2D(filters=2, kernel_size=1, activation="relu", padding='SAME')(inputs)
model = keras.Model(inputs=inputs, outputs=x)

In [34]:
model.summary()

Model: "model_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_16 (InputLayer)       [(None, 8, 8, 1)]         0         
                                                                 
 conv2d_21 (Conv2D)          (None, 8, 8, 2)           4         
                                                                 
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________


In [17]:
inputs = keras.Input(shape=(8, 8, 3))
x = layers.Conv2D(filters=2, kernel_size=1, activation="relu", padding='SAME')(inputs)
model = keras.Model(inputs=inputs, outputs=x)

* In the single-channel model summarized above, each kernel has 1 weight and 1 bias, for a total of 4 parameters across both kernels.

In [18]:
model.summary()

Model: "model_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_8 (InputLayer)        [(None, 8, 8, 3)]         0         
                                                                 
 conv2d_7 (Conv2D)           (None, 8, 8, 2)           8         
                                                                 
Total params: 8
Trainable params: 8
Non-trainable params: 0
_________________________________________________________________


**To the student**: With three input channels, how is the number of parameters calculated for a 1x1 convolution?

## Explicit Definition of a `Conv2D` filter

The filters of a 2D convolution can be defined explicitly by the user. This can be useful for learning purposes.

Keras' `Conv2D` layer is an abstraction over `tf.nn.conv2d`. Calling `tf.nn.conv2d` directly lets us set the filter values ourselves and observe the result of the computation ([docs](https://www.tensorflow.org/api_docs/python/tf/nn/conv2d)):

In [12]:
import numpy as np
import tensorflow as tf

Input shape (`x_in`): `batch_shape + [in_height, in_width, in_channels]`

Filter shape (`kernel_in`): `[filter_height, filter_width, in_channels, out_channels]`

In [13]:
x_in = np.array([[
  [[2], [1], [2], [0], [1]],
  [[1], [3], [2], [2], [3]],
  [[1], [1], [3], [3], [0]],
  [[2], [2], [0], [1], [1]],
  [[0], [0], [3], [1], [2]], ]])

x_in.shape

(1, 5, 5, 1)

In [14]:
kernel_in = np.array([
 [ [[2, 0.1]], [[3, 0.2]] ],
 [ [[0, 0.3]], [[1, 0.4]] ], ])

kernel_in.shape

(2, 2, 1, 2)

In [7]:
x = tf.constant(x_in, dtype=tf.float32)
kernel = tf.constant(kernel_in, dtype=tf.float32)

In [8]:
tf.nn.conv2d(x, kernel, strides=[1, 1, 1, 1], padding='VALID')


<tf.Tensor: shape=(1, 4, 4, 2), dtype=float32, numpy=
array([[[[10.       ,  1.9000001],
         [10.       ,  2.2      ],
         [ 6.       ,  1.6      ],
         [ 6.       ,  2.       ]],

        [[12.       ,  1.4      ],
         [15.       ,  2.2      ],
         [13.       ,  2.7      ],
         [13.       ,  1.7      ]],

        [[ 7.       ,  1.7      ],
         [11.       ,  1.3      ],
         [16.       ,  1.3000001],
         [ 7.       ,  1.       ]],

        [[10.       ,  0.6      ],
         [ 7.       ,  1.4      ],
         [ 4.       ,  1.5      ],
         [ 7.       ,  1.4000001]]]], dtype=float32)>
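
To see where these numbers come from, the same result can be reproduced with a naive NumPy loop. This is a sketch for the 'VALID', stride-1 case only, not how TensorFlow implements the operation; `conv2d_valid` is a hypothetical helper. Note that, like `tf.nn.conv2d`, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
import numpy as np

x_in = np.array([[
  [[2], [1], [2], [0], [1]],
  [[1], [3], [2], [2], [3]],
  [[1], [1], [3], [3], [0]],
  [[2], [2], [0], [1], [1]],
  [[0], [0], [3], [1], [2]], ]], dtype=np.float32)

kernel_in = np.array([
 [ [[2, 0.1]], [[3, 0.2]] ],
 [ [[0, 0.3]], [[1, 0.4]] ], ], dtype=np.float32)


def conv2d_valid(x, kernel):
    """Naive 'VALID', stride-1 convolution (no kernel flip) in NumPy."""
    batch, in_h, in_w, _ = x.shape
    k_h, k_w, _, out_c = kernel.shape
    out = np.zeros((batch, in_h - k_h + 1, in_w - k_w + 1, out_c), dtype=x.dtype)
    for i in range(out.shape[1]):
        for j in range(out.shape[2]):
            # Window of the input under the kernel: (batch, k_h, k_w, in_channels).
            patch = x[:, i:i + k_h, j:j + k_w, :]
            # Multiply with every filter, summing over the window and the input channels.
            out[:, i, j, :] = np.tensordot(patch, kernel, axes=([1, 2, 3], [0, 1, 2]))
    return out


manual = conv2d_valid(x_in, kernel_in)
print(manual.shape)     # (1, 4, 4, 2)
print(manual[0, 0, 0])  # approximately [10.  1.9]
```

For example, the top-left output of the first filter is `2*2 + 3*1 + 0*1 + 1*3 = 10`, matching the first value in the tensor above.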