## Convolutional Neural Networks

-  Broadly speaking, the CNN learns to recognize basic lines and curves, then shapes and blobs, and then increasingly complex objects within the image as we progress through the layers of the network. Finally, it classifies the image by combining the larger, more complex objects. 
-  CNNs share parameters across space.
-  An image has width, height, and depth (# of color channels).
-  CNNs slide a small filter, or **kernel**, across the whole image. At each position the filter covers a 'patch' of the image and acts as a mini network with few weights, which are shared across space. Instead of having layers of matrices to multiply, we have stacks of convolutions.
-  The layers form a pyramid: at the bottom is a big, shallow RGB image, and each layer then squeezes the spatial dimensions while increasing the depth. At the top, the spatial information has been squeezed out and only the parameters that map to the content of the image remain.
-  Each depth slice of the pyramid or stack (one per filter) is called a **feature map**.
-  The **stride** is the number of pixels that you are shifting when moving the window or filter across the image.
-  **Valid padding** - remain within the image borders/edges; **same padding** - go off the edge of the image and pad with zeroes such that, with a stride of 1, the output map is exactly the same size as the input map.

#### Breaking up an Image

-  The first step for a CNN is to break up the image into smaller pieces. We do this by selecting a width and height that defines a filter.
-  The filter looks at small pieces, or patches, of the image. These patches are the same size as the filter.
-  We then simply slide this filter horizontally or vertically to focus on a different piece of the image.
-  The amount by which the filter slides is referred to as the 'stride'. The stride is a hyperparameter which can be tuned. Increasing the stride reduces the size of your model's feature maps, and so the computation, by reducing the total number of patches each layer observes. However, this usually comes with a reduction in accuracy.
-  The key idea is that we are *grouping together adjacent pixels* and treating them as a collective, because pixels in an image are close together for a reason and have special meaning.
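
The sliding described above can be sketched in plain Python. This is a minimal illustration, not how a framework implements it: `extract_patches` is a hypothetical helper, the image is a single-channel list of rows, and no padding is applied.

```python
def extract_patches(image, filter_h, filter_w, stride):
    """Return every filter-sized patch of a 2D image (a list of rows), without padding."""
    height, width = len(image), len(image[0])
    patches = []
    # Slide the window top to bottom, left to right, moving `stride` pixels each step.
    for top in range(0, height - filter_h + 1, stride):
        for left in range(0, width - filter_w + 1, stride):
            patch = [row[left:left + filter_w] for row in image[top:top + filter_h]]
            patches.append(patch)
    return patches


# A 4x4 toy "image" whose pixel values are simply 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]

patches_stride_1 = extract_patches(image, 2, 2, stride=1)
patches_stride_2 = extract_patches(image, 2, 2, stride=2)

print(len(patches_stride_1))  # 9 patches (3 x 3 positions)
print(len(patches_stride_2))  # 4 patches (2 x 2 positions)
```

Doubling the stride here cuts the number of observed patches from 9 to 4, which is the size reduction the stride bullet refers to.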

#### Filter depth

- It's common to have more than one filter. Different filters pick up different qualities of a patch. For example, one filter might look for a particular color, while another might look for a kind of object of a specific shape. The number of filters in a convolutional layer is called the filter depth.
-  How many neurons does each patch connect to?
-  That’s dependent on our filter depth. If we have a depth of k, we connect each patch of pixels to k neurons in the next layer. This gives us a depth of k in the next layer. In practice, k is a hyperparameter we tune, and most CNNs tend to pick the same starting values.
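
To make the "one patch connects to k neurons" idea concrete, here is a small sketch, assuming each neuron is just a weighted sum of the patch plus a bias (the activation function is left out). The helper `apply_filters` and the three hand-picked filters are illustrative, not learned values.

```python
def apply_filters(patch, filters, biases):
    """Map one patch to k output values, one per filter (weighted sum plus bias)."""
    outputs = []
    for weights, bias in zip(filters, biases):
        total = bias
        for patch_row, weight_row in zip(patch, weights):
            for pixel, weight in zip(patch_row, weight_row):
                total += pixel * weight
        outputs.append(total)
    return outputs


patch = [[1, 2],
         [3, 4]]

# k = 3 filters, each the same size as the patch.
filters = [
    [[1, 0], [0, 1]],  # sums the main diagonal
    [[0, 1], [1, 0]],  # sums the anti-diagonal
    [[1, 1], [1, 1]],  # sums every pixel
]
biases = [0, 0, 0]

column = apply_filters(patch, filters, biases)
print(column)       # [5, 5, 10]
print(len(column))  # 3, the depth of the next layer at this position
```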

#### Parameter Sharing

-  Translational invariance - we don't care where the object is located in the image.
-  To classify an object, we have to use the same weights and biases for objects of the same type regardless of where they are in the image so that they are both classified as the same type.
-  This is exactly what we do in CNNs. The weights and biases we learn for a given output layer are shared across all patches in a given input layer. Note that as we increase the depth of our filter, the number of weights and biases we have to learn still increases, as the weights aren't shared across the output channels.
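
A rough parameter count shows why sharing matters. The sketch below assumes square filters and one bias per filter; the 32x32x3 input, 5x5 filters, and 16 output channels are made-up numbers for illustration, and the "unshared" case imagines a separate set of weights for every output position.

```python
def conv_params_shared(filter_size, in_depth, num_filters):
    """Weights and biases when each filter is reused at every patch position."""
    weights_per_filter = filter_size * filter_size * in_depth
    return num_filters * (weights_per_filter + 1)  # +1 for the bias


def conv_params_unshared(filter_size, in_depth, num_filters, out_height, out_width):
    """Weights and biases if every output position had its own copy of the filters."""
    per_position = conv_params_shared(filter_size, in_depth, num_filters)
    return per_position * out_height * out_width


# 32x32x3 input, 5x5 filters, stride 1, no padding -> 28x28 output positions.
shared = conv_params_shared(filter_size=5, in_depth=3, num_filters=16)
unshared = conv_params_unshared(5, 3, 16, out_height=28, out_width=28)

print(shared)    # 1216
print(unshared)  # 953344
```

Note that `shared` still grows linearly with `num_filters`, which matches the point above that weights are not shared across output channels.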

#### Padding
-  We keep the same dimensions between layers by adding zeroes.
-  TensorFlow uses the following equations for 'SAME' vs 'VALID' padding, where `strides[1]` and `strides[2]` are the vertical and horizontal strides (for the default NHWC layout):

With SAME padding, the output height and width are computed as:

`out_height = ceil(float(in_height) / float(strides[1]))`

`out_width = ceil(float(in_width) / float(strides[2]))`

With VALID padding, the output height and width are computed as:

`out_height = ceil(float(in_height - filter_height + 1) / float(strides[1]))`

`out_width = ceil(float(in_width - filter_width + 1) / float(strides[2]))`
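
The two formulas can be checked with a few lines of plain Python. This is a sketch of the arithmetic only, not a call into TensorFlow; the function names are hypothetical and each takes a single spatial dimension and a single integer stride.

```python
import math


def same_output_size(in_size, stride):
    """Output height or width under 'SAME' padding."""
    return math.ceil(in_size / stride)


def valid_output_size(in_size, filter_size, stride):
    """Output height or width under 'VALID' padding."""
    return math.ceil((in_size - filter_size + 1) / stride)


# A 28-pixel-wide input with a 5-pixel-wide filter.
print(same_output_size(28, stride=1))      # 28
print(valid_output_size(28, 5, stride=1))  # 24
print(same_output_size(28, stride=2))      # 14
print(valid_output_size(28, 5, stride=2))  # 12
```

With a stride of 1, 'SAME' keeps the input size, while 'VALID' shrinks it by `filter_size - 1`.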

#### Dimensionality 

-  From what we've learned so far, how can we calculate the number of neurons of each layer in our CNN?

-  Given:

    our input layer has a width of W and a height of H <br />
    our convolutional layer has a filter size F<br />
    we have a stride of S<br />
    a padding of P<br />
    and the number of filters K,<br />


-  the following formula gives us the width of the next layer: `W_out = (W - F + 2P)/S + 1`.
-  The output height would be `H_out = (H - F + 2P)/S + 1`.
-  And the output depth would be equal to the number of filters: `D_out = K`.
-  The output volume would be `W_out * H_out * D_out`.
-  Knowing the dimensionality of each additional layer helps us understand how large our model is and how our decisions around filter size and stride affect the size of our network.
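
As a worked example of the formulas above, the sketch below computes the output shape for a made-up layer: a 32x32 input, 20 filters of size 8, a stride of 2, and a padding of 1. The function name is hypothetical, and rounding the division down when it is not exact is an assumption, since the notes do not say how that case is handled.

```python
def conv_output_shape(width, height, filter_size, stride, padding, num_filters):
    """Return (W_out, H_out, D_out) for a convolutional layer with square filters."""
    # Integer division is an assumption for inputs where the stride does not fit evenly.
    w_out = (width - filter_size + 2 * padding) // stride + 1
    h_out = (height - filter_size + 2 * padding) // stride + 1
    d_out = num_filters
    return w_out, h_out, d_out


w_out, h_out, d_out = conv_output_shape(
    width=32, height=32, filter_size=8, stride=2, padding=1, num_filters=20
)
volume = w_out * h_out * d_out

print((w_out, h_out, d_out))  # (14, 14, 20)
print(volume)                 # 3920
```

Here `(32 - 8 + 2*1)/2 + 1 = 14`, so the layer produces a 14x14x20 output, or 3920 neurons.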