# CNN 

### Filters 


!['Convolutions'](p1.png)

- The first step for a CNN is to break the image up into smaller pieces, called patches.
- A CNN uses filters to split an image into these patches.
- The size of each patch matches the filter size.

Slide the filter horizontally or vertically to focus on a different piece of the image.

- The amount by which the filter slides is referred to as the **'stride'**.
  - The stride is a hyperparameter which we can tune. 
- Increasing the stride reduces the size of the model by reducing the total number of patches each layer observes.
  - However, this usually comes with a reduction in accuracy.
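
To make the effect of the stride concrete, here is a minimal sketch in plain Python. The helper `extract_patches` and the 6x6 toy image are hypothetical, introduced only for illustration, and the sketch assumes no padding.

```python
def extract_patches(image, filter_size, stride):
    """Return every filter_size x filter_size patch of a 2D list (no padding)."""
    height, width = len(image), len(image[0])
    patches = []
    # Slide the window top-to-bottom, left-to-right, moving `stride` pixels each step.
    for top in range(0, height - filter_size + 1, stride):
        for left in range(0, width - filter_size + 1, stride):
            patch = [row[left:left + filter_size]
                     for row in image[top:top + filter_size]]
            patches.append(patch)
    return patches


# A 6x6 toy image whose pixel values are 0..35.
image = [[r * 6 + c for c in range(6)] for r in range(6)]

patches_stride1 = extract_patches(image, filter_size=2, stride=1)
patches_stride2 = extract_patches(image, filter_size=2, stride=2)

print(len(patches_stride1))  # 25 patches (5 x 5 filter positions)
print(len(patches_stride2))  # 9 patches (3 x 3 filter positions)
```

Doubling the stride here cuts the number of patches from 25 to 9, which is why a larger stride gives the next layer fewer values to work with.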

- Important idea: **Grouping together adjacent pixels** and treating them as a collective.

    - In a non-convolutional neural network, we would have ignored this adjacency. 
    - In a normal network, we would have connected every pixel in the input image to a neuron in the next layer. In doing so, we would not have **taken advantage of the fact that pixels in an image are close together for a reason and have special meaning**.

    - By taking advantage of this local structure, our CNN learns to classify local patterns, like shapes and objects, in an image.

##### Filter Depth
- It's common to have more than one filter. 
    - Different filters pick up different qualities of a patch. For example, one filter might look for a particular color, while another might look for a kind of object of a specific shape. 
    - The number of filters in a convolutional layer is called the **filter depth**.
    
How many neurons does each patch connect to?

- If we have a depth of **k**, we connect each patch of pixels to **k** neurons in the next layer.
    - This gives the next layer a depth of **k**, as shown below.
    - In practice, **k** is a hyperparameter we tune, and most CNNs tend to pick the same starting values.

Having multiple neurons for a given patch lets the CNN learn to capture several different characteristics of the data.
- The CNN isn't "programmed" to look for certain characteristics. 
- Rather, it learns on its own which characteristics to notice.
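
As a rough illustration of a depth of **k**, the sketch below applies k = 3 filters to a single flattened patch, producing one value per filter. The helper `apply_filters` and all numbers are hypothetical; in a real CNN the weights are learned during training rather than written by hand.

```python
def apply_filters(patch, filters, biases):
    """Return one activation per filter for a single flattened patch."""
    outputs = []
    for weights, bias in zip(filters, biases):
        # Weighted sum of the patch values plus that filter's bias.
        outputs.append(sum(w * x for w, x in zip(weights, patch)) + bias)
    return outputs


# A flattened 2x2 single-channel patch (made-up values).
patch = [1.0, 2.0, 3.0, 4.0]

# k = 3 hand-written filters, each with one weight per patch value.
filters = [
    [1.0, 0.0, 0.0, 0.0],      # picks out the top-left pixel
    [0.25, 0.25, 0.25, 0.25],  # averages the patch
    [-1.0, 1.0, -1.0, 1.0],    # responds to a left-to-right increase
]
biases = [0.0, 0.0, 0.0]

activations = apply_filters(patch, filters, biases)
print(activations)  # [1.0, 2.5, 2.0]
```

The single patch is connected to three output values, one per filter, which is the depth of the next layer at that position.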

### TensorFlow Strides, Depth and Padding

![](p2.PNG)

Given 
```python
# TensorFlow 1.x API. In TensorFlow 2.x, tf.placeholder and
# tf.truncated_normal are available under tf.compat.v1.
import tensorflow as tf

# batch, height, width, depth
input = tf.placeholder(tf.float32, (None, 32, 32, 3))

# height, width, input_depth, output_depth = 8, 8, 3, 20
filter_weights = tf.Variable(tf.truncated_normal((8, 8, 3, 20)))
filter_bias = tf.Variable(tf.zeros(20))

# batch, height, width, depth
strides = [1, 2, 2, 1]
padding = 'SAME'

conv = tf.nn.conv2d(input, filter_weights, strides, padding) + filter_bias
```

- The output shape of `conv` is [1, 16, 16, 20] for a batch of one image. It is a 4D tensor because the first dimension accounts for the batch size.
- It is not [1, 14, 14, 20] because TensorFlow's `SAME` padding uses a different algorithm than $ \frac{(n + 2p -f)}{s} +1  $, which gives 14 for a padding of 1.

- If we switch the padding from `SAME` to `VALID`, the output shape is [1, 13, 13, 20].

Summary: 
- **SAME Padding**: the output height and width are computed as:
    - $ out\_height = \lceil \frac{in\_height}{strides[1]} \rceil $
    - $ out\_width  = \lceil \frac{in\_width}{strides[2]} \rceil $
    
- **VALID Padding**: no padding. The output height and width are computed as:
    - $ out\_height = \lceil \frac{in\_height - filter\_height + 1}{strides[1]} \rceil $
    - $ out\_width  = \lceil \frac{in\_width  - filter\_width  + 1}{strides[2]} \rceil $

- **Non-TensorFlow**: $ output\_height = \frac{(n + 2p -f)}{s} +1  $ 
    - n: input height 
    - p: padding   
    - f: filter height
    - s: stride 
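
The three formulas above can be checked with a few lines of plain Python. The function names are hypothetical, and the use of floor division in the general formula is an assumption for inputs that do not divide evenly, since the formula in the text does not specify rounding.

```python
import math


def same_output_size(in_size, stride):
    """Output height or width with SAME padding."""
    return math.ceil(in_size / stride)


def valid_output_size(in_size, filter_size, stride):
    """Output height or width with VALID padding (no padding)."""
    return math.ceil((in_size - filter_size + 1) / stride)


def general_output_size(in_size, filter_size, stride, padding):
    """(n + 2p - f) / s + 1, using floor division (an assumption)."""
    return (in_size + 2 * padding - filter_size) // stride + 1


# 32x32 input, 8x8 filter, stride 2, as in the example above.
print(same_output_size(32, 2))           # 16
print(valid_output_size(32, 8, 2))       # 13
print(general_output_size(32, 8, 2, 1))  # 14
```

These match the shapes quoted above: 16 for `SAME`, 13 for `VALID`, and 14 for the general formula with a padding of 1.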

### Number of parameters 

**Given**
- Input of shape 32x32x3 (HxWxD)
- 20 filters of shape 8x8x3 (HxWxD)
- A stride of 2 for both the height and width (S)
- Zero padding of size 1 (P)

**Output Layer**
- $ output\_shape = \frac{(n + 2p -f)}{s} +1  $  = 14x14x20 (HxWxD)

**How many parameters does the convolutional layer have (without parameter sharing)?**

- Without parameter sharing, each neuron in the output layer has its own connection to every neuron in the filter.
    - Each neuron in the output layer also connects to a single bias neuron.
- parameters = (8 * 8 * 3 + 1) * (14 * 14 * 20) = 756,560
    - 8 * 8 * 3 is the number of weights, plus 1 for the bias. 
    - Each of the 14 * 14 * 20 output neurons gets its own copy of these weights and bias.
    - There is no extra factor of 20 because the 20 filters are already counted in the output depth (the 20 in 14 * 14 * 20).
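
The arithmetic for the unshared case can be spelled out step by step. This is only a sketch of the calculation; the variable names are chosen here for readability.

```python
filter_height, filter_width, input_depth = 8, 8, 3
out_height, out_width, out_depth = 14, 14, 20

# Every output neuron owns a full set of filter weights plus one bias.
weights_per_neuron = filter_height * filter_width * input_depth  # 192
params_per_neuron = weights_per_neuron + 1                       # 193
output_neurons = out_height * out_width * out_depth              # 3920

params_without_sharing = params_per_neuron * output_neurons
print(params_without_sharing)  # 756560
```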


### Parameter Sharing

The weights, `w`, are shared across patches for a given layer in a CNN to detect the **object or feature** regardless of where in the image the **object** is located.

- This is known as *statistical invariance* or *translation invariance*.

The classification of a given patch in an image is determined by the weights and biases corresponding to that patch.
- If we want a **cat** that’s in the top left patch to be classified in the same way as a **cat** in the bottom right patch, we need the weights and biases corresponding to those patches to be the same, so that they are classified the same way.
- This is exactly what we do in CNNs. The weights and biases we learn for a given output layer are shared across all patches in a given input layer. 
    - Note that as we increase the depth of our filter, the number of weights and biases we have to learn still increases, as the weights aren't shared across the output channels.
- There’s an additional benefit to sharing parameters. 
    - If we did not reuse the same weights across all patches, we would have to learn new parameters for every single patch and hidden layer neuron pair. 
    - This does not scale well, especially for higher fidelity images. 
    - Thus, sharing parameters not only helps us with translation invariance, but also gives us a smaller, more scalable model.
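
A small 1D sketch can illustrate why shared weights help with position. The same hypothetical filter is slid over two signals that contain the same pattern at different locations; the helper `slide_filter` and all values are made up for this example.

```python
def slide_filter(signal, weights, bias=0.0):
    """Apply one shared set of weights at every position of a 1D signal."""
    size = len(weights)
    return [
        sum(w * x for w, x in zip(weights, signal[i:i + size])) + bias
        for i in range(len(signal) - size + 1)
    ]


weights = [1.0, -1.0, 1.0]  # hypothetical detector for the pattern below
pattern = [1.0, 0.0, 1.0]

signal_left = pattern + [0.0] * 4   # pattern at the start
signal_right = [0.0] * 4 + pattern  # same pattern at the end

resp_left = slide_filter(signal_left, weights)
resp_right = slide_filter(signal_right, weights)

print(max(resp_left), max(resp_right))                        # 2.0 2.0
print(resp_left.index(2.0), resp_right.index(2.0))            # 0 4
```

Because the weights are shared, the strongest response has the same value in both signals; only the position at which it occurs moves with the pattern.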
    
**Given**
- Input of shape 32x32x3 (HxWxD)
- 20 filters of shape 8x8x3 (HxWxD)
- A stride of 2 for both the height and width (S)
- Zero padding of size 1 (P)

**Output Layer**
- $ output\_shape = \frac{(n + 2p -f)}{s} +1  $  = 14x14x20 (HxWxD)

**How many parameters does the convolutional layer have (with parameter sharing)?**
- This is the number of parameters actually used in a convolution layer built with **tf.nn.conv2d()**.
- With parameter sharing, each neuron in an output channel shares its weights with every other neuron in that channel.
- So the number of parameters is equal to the number of neurons in the filter, plus a bias neuron, all multiplied by the number of channels in the output layer:
```python
# 3840 shared weights + 20 biases
assert (8 * 8 * 3 + 1) * 20 == 3840 + 20 == 3860
```
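
To compare the two cases directly, the sketch below wraps the shared count in a small function and divides the unshared count by it. The function name `shared_conv_params` is hypothetical.

```python
def shared_conv_params(filter_height, filter_width, input_depth, num_filters):
    """Parameter count of a conv layer when weights are shared across patches."""
    weights = filter_height * filter_width * input_depth * num_filters
    biases = num_filters  # one bias per filter
    return weights + biases


shared = shared_conv_params(8, 8, 3, 20)
unshared = (8 * 8 * 3 + 1) * (14 * 14 * 20)

print(shared)              # 3860
print(unshared // shared)  # 196, i.e. 14 * 14 output positions
```

The ratio of 196 is exactly the number of output positions (14 * 14), since sharing replaces one set of weights per position with a single set per filter.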

