# Deep Learning
# Convolutional Neural Networks (CNN)

## What is CNN?

Convolutional neural networks (CNNs) grew out of research into the visual cortex of the brain and have been used in image recognition since the 1980s.

A **convolutional neural network (CNN, or ConvNet) is a type of artificial neural network used to interpret visual imagery in deep learning**.

CNNs are regularized variants of multilayer perceptrons. Multilayer perceptrons are typically fully connected networks, meaning that each neuron in one layer is linked to all neurons in the following layer. Because every node is connected to every node in the adjacent layers, fully connected ANNs are vulnerable to overfitting.

**Image search services, self-driving cars, automatic video classification systems**, and other applications rely on CNNs. Furthermore, CNNs are not limited to visual perception; they excel at a variety of other tasks as well, including **speech recognition and natural language processing**.

## Why Convolution?
When using an ANN, concrete data points must be provided by hand. For example, a model attempting to distinguish between dogs and cats must be given features such as the breadth of the snout and the length of the ears explicitly.

When a CNN is used, these spatial properties are extracted from the input image automatically. A CNN is well suited to cases where thousands of features need to be extracted, because it learns these features on its own rather than requiring each one to be measured individually.

Because 2-dimensional images must be flattened into 1-dimensional vectors when using an ANN, image classification tasks become more challenging. Flattening discards the spatial layout and rapidly increases the number of trainable parameters, which in turn requires more storage and processing power.

Assume an image with dimensions of 64 × 64 × 3. The dimension of the input feature vector is then 64 × 64 × 3 = 12,288. With larger images, this will be even bigger. If we feed such a large input to a fully connected neural network, the number of parameters will skyrocket (depending on the size of the input and of the network). More processing and memory requirements will follow, which most of us will be unable to handle.
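To make the arithmetic concrete, here is a minimal sketch for a 64 × 64 × 3 image, the size whose flattened length is 12,288. The hidden layer width of 1,000 units is an assumption chosen only for illustration.

```python
# Image size whose flattened length is 12,288.
height, width, channels = 64, 64, 3

# Assumed width of the first fully connected layer (illustrative only).
hidden_units = 1000

# Flattening turns the 3-D image into a single 1-D feature vector.
input_features = height * width * channels

# A fully connected layer has one weight per (input, unit) pair plus one bias per unit.
dense_params = input_features * hidden_units + hidden_units

print(input_features)  # 12288
print(dense_params)    # 12289000
```

Even this single layer already needs more than 12 million parameters, and the count grows linearly with the number of pixels.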

Therefore, CNN can be a better option than ANN in such cases.

Some advantages of Convolutions:

1. Parameter sharing: A feature detector that is useful in one part of the image is probably useful in another part of the image.
2. Sparsity of connections: In each layer, each output value depends only on a small number of inputs.

To put it together:
- CNN has weight sharing property.
- CNN extracts spatial properties 
- Using ANN rapidly increases the number of trainable parameters

With these two mechanisms, convolutional layers need far fewer parameters than fully connected layers, which makes them less prone to overfitting and allows them to be trained on smaller datasets.
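The effect of parameter sharing can be sketched by counting parameters. The helper names and the layer sizes below (3 × 3 kernels, 32 filters or units) are hypothetical choices for illustration, not values from the text.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters):
    """Weights of every filter plus one bias per filter.

    The image size does not appear here: the same kernel is reused
    at every position of the image (parameter sharing).
    """
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_features, units):
    """One weight per (input, unit) pair plus one bias per unit."""
    return (in_features + 1) * units


for side in (64, 256):
    flattened = side * side * 3  # RGB image flattened to a vector
    print(side, conv_params(3, 3, 3, 32), dense_params(flattened, 32))
# 64 896 393248
# 256 896 6291488
```

The convolutional layer keeps 896 parameters regardless of the image size, while the fully connected layer grows with every added pixel.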

## Types of CNNs

1) 1D CNN: The kernel (filter) moves in only one direction. These CNNs are typically applied to time-series data.
2) 2D CNN: The kernel moves in two directions. These are commonly used in image labeling and processing.
3) 3D CNN: The kernel moves in three directions. This form of CNN is commonly used on 3D images such as CT and MRI scans.

Most of the time, we'll run into 2D CNNs because they're frequently used with image data. Some of the applications of CNNs are:

- Image recognition with little preprocessing
- Handwriting recognition
- Computer vision applications
- Reading the numbers on checks in banking
- Reading ZIP codes on envelopes in postal services
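To make the 1D case from the list of CNN types concrete, the sketch below slides a kernel along a short time series in a single direction. The function name, the series, and the kernel are made up for illustration. As is usual in deep learning, the kernel is not flipped (strictly speaking, this is cross-correlation).

```python
def conv1d(signal, kernel):
    """Slide the kernel along one axis (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


series = [1, 2, 4, 7, 11, 16]   # hypothetical time series
difference_kernel = [-1, 1]     # responds to the change between neighbours

print(conv1d(series, difference_kernel))  # [1, 2, 3, 4, 5]
```

A 2D CNN applies the same idea while moving the kernel along both rows and columns.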

# Architecture of CNNs

## Inspiration from Visual Cortex Architecture

Researchers discovered that many neurons in the brain's visual cortex have a small local receptive field, which means that they only respond to visual stimuli in a specific portion of the visual field. The receptive fields of different neurons may overlap, and together they tile the entire visual field.

Furthermore, the researchers discovered that some neurons only respond to images of horizontal lines, while others only respond to lines of other orientations. They also discovered that some neurons have bigger receptive fields and respond to more complicated patterns that are combinations of lower-level patterns. These findings led to the hypothesis that higher-level neurons are based on the outputs of neighboring lower-level neurons.

## CNN Architecture
A deep learning CNN model takes each input image and passes it through a sequence of convolution layers with filters (kernels), pooling layers, and fully connected (FC) layers, and finally uses a classification function to identify the object.

![image.png](attachment:ddbb4c18-e59d-4b68-a19f-91174c5339a0.png)

A typical CNN architecture consists of a few convolutional layers (each followed by a ReLU layer), a pooling layer, another few convolutional layers (+ReLU), another pooling layer, and so on. The image shrinks as it passes through the network, but thanks to the convolutional layers it also gets deeper (more feature maps) and captures more features. After the convolution and pooling layers, a feedforward neural network with a few fully connected layers (+ReLUs) is added, and the final layer makes the prediction using a classification function (for example, softmax).

In summary, a deep learning CNN model takes each input image and passes it through a sequence of convolution layers with filters (kernels), pooling layers (downsampling/subsampling), and fully connected (FC) layers, and finally uses a classification function to identify the object.
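The way an image gets smaller but deeper can be traced with a few lines of arithmetic. The sketch below assumes stride-1 convolutions without padding and non-overlapping pooling; the layer stack and the 32 × 32 × 3 input are hypothetical and only loosely modeled on small classic networks.

```python
def output_size(size, window, stride=1):
    """Spatial size after sliding a window over the input with no padding."""
    return (size - window) // stride + 1


def trace_shapes(input_shape, layers):
    """Return the (height, width, depth) shape after every layer."""
    height, width, depth = input_shape
    shapes = [input_shape]
    for kind, window, filters in layers:
        if kind == "conv":    # stride 1, no padding; depth becomes the filter count
            height = output_size(height, window)
            width = output_size(width, window)
            depth = filters
        elif kind == "pool":  # non-overlapping pooling: stride equals window size
            height = output_size(height, window, stride=window)
            width = output_size(width, window, stride=window)
        shapes.append((height, width, depth))
    return shapes


# Hypothetical stack: conv -> pool -> conv -> pool
layers = [("conv", 5, 6), ("pool", 2, None), ("conv", 5, 16), ("pool", 2, None)]

for shape in trace_shapes((32, 32, 3), layers):
    print(shape)
# (32, 32, 3)
# (28, 28, 6)
# (14, 14, 6)
# (10, 10, 16)
# (5, 5, 16)
```

The final 5 × 5 × 16 volume would be flattened into 400 values before being passed to the fully connected layers.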

## Classical CNN Models
Variations of this core architecture have evolved over time, resulting in remarkable advances in the field. Studying these models is a good way to understand how CNNs work. Five famous models are introduced below.

1. Perhaps the most well-known CNN architecture is the **LeNet-5** architecture. Yann LeCun created it in 1998, and it has been widely used for handwritten digit recognition (MNIST) since then.

![image.png](attachment:a48191a8-a09b-4a91-bdb1-0e16e473c56b.png)

2. **AlexNet** was the first to stack convolutional layers directly on top of one another, rather than stacking a pooling layer on top of each conv layer.

![image.png](attachment:b2877de0-357a-4c1d-9a1f-dff01ec7d8af.png)

3. **GoogLeNet** uses inception modules, which are subnetworks that allow it to use parameters considerably more efficiently than earlier architectures. In fact, GoogLeNet has about ten times fewer parameters than AlexNet (roughly 6 million instead of 60 million).
4. **VGGNet** has a very simple and classic architecture: two or three convolutional layers and a pooling layer, then two or three convolutional layers and a pooling layer, and so on, followed by a dense network with 2 hidden layers and the output layer (16 or 19 weight layers in total, counting the dense layers, depending on the variant). It uses many filters, but only of size 3x3.
5. **Residual Network (or ResNet)** is composed of up to 152 layers (other variants have 34, 50, and 101 layers). It confirmed a common trend: models are becoming deeper while using fewer parameters. The use of skip connections (or shortcut connections) is key to being able to train such a deep network: the signal entering a layer is also added to the output of a layer a little higher up the stack.
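The skip connection described for ResNet can be sketched in a few lines. This is a simplified illustration, not ResNet itself: it uses small fully connected layers in place of convolutional ones, and all function names and values are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, biases):
    """Plain fully connected layer: one output per weight row."""
    return [
        sum(w * v for w, v in zip(row, values)) + b
        for row, b in zip(weights, biases)
    ]


def residual_block(x, w1, b1, w2, b2):
    """Two layers whose output is added to the block's own input."""
    hidden = relu(dense(x, w1, b1))
    out = dense(hidden, w2, b2)
    # Skip connection: the signal entering the block is added to the
    # output of a layer higher up the stack.
    return relu([o + xi for o, xi in zip(out, x)])


zeros = [[0.0, 0.0], [0.0, 0.0]]
x = [1.0, 2.0]

# With all weights at zero, the block still passes its input through.
print(residual_block(x, zeros, [0.0, 0.0], zeros, [0.0, 0.0]))  # [1.0, 2.0]
```

Because the input is carried forward unchanged, a block whose weights are near zero behaves like the identity function, which is one intuition for why very deep stacks of such blocks remain trainable.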

## Filters

Filters in Convolutional Neural Networks identify spatial patterns such as edges in an image by detecting changes in the picture's intensity values.

![image.png](attachment:d1ff8d40-473c-422c-91c7-aa3bb7dcb515.png)

**Convolution of an image with various filters can be used to accomplish tasks such as edge detection, blurring, and sharpening.**

![image.png](attachment:f75c7051-d338-4b9b-ab51-6e9ea4a11c81.png)
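As a minimal sketch of how a filter detects an edge, the code below slides a 3 × 3 vertical-edge kernel over a tiny made-up image whose left half is bright and right half is dark. The function name, image, and kernel values are illustrative assumptions. As is usual in CNNs, the kernel is not flipped.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# 6 x 6 image: bright (10) on the left, dark (0) on the right.
image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]

# Responds where intensity changes from left to right.
vertical_edge = [
    [1, 0, -1],
    [1, 0, -1],
    [1, 0, -1],
]

for row in convolve2d(image, vertical_edge):
    print(row)
# [0, 30, 30, 0]
# [0, 30, 30, 0]
# [0, 30, 30, 0]
# [0, 30, 30, 0]
```

The output is zero over the flat regions and large exactly where the intensity changes, which is the edge.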

### Stride and Padding

Stride determines how the filter moves across the input volume. In the first portion of the example, the filter convolved around the input volume by moving one unit at a time. The stride is the amount by which the filter shifts at each step. Raising the stride makes the receptive fields overlap less and makes the spatial dimensions of the output smaller.

With a stride of (1,1), the filter moves one cell to the right at each step, then moves one cell down and again proceeds to the right one cell at a time. Literally, "stride" means to step across or skip over, and "padding" means to fill.

![image.png](attachment:edde7876-8560-40b4-bbbb-fdab0ee4fe72.png)
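The movement of the filter can be sketched by listing the top-left corner of every window it visits. The helper name and the 7 × 7 input with a 3 × 3 filter are hypothetical values for illustration; no padding is assumed.

```python
def window_positions(input_size, filter_size, stride):
    """Top-left (row, column) of each filter position, scanning right then down."""
    last_start = input_size - filter_size
    return [
        (row, col)
        for row in range(0, last_start + 1, stride)
        for col in range(0, last_start + 1, stride)
    ]


stride_1 = window_positions(7, 3, stride=1)
stride_2 = window_positions(7, 3, stride=2)

print(stride_1[:3])   # [(0, 0), (0, 1), (0, 2)]
print(stride_2[:3])   # [(0, 0), (0, 2), (0, 4)]
print(len(stride_1))  # 25 -> a 5 x 5 output
print(len(stride_2))  # 9  -> a 3 x 3 output
```

With the larger stride, neighbouring windows overlap less and the output is smaller.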

**Padding** adds blank (typically zero-valued) pixels around the border of the image. It works in conjunction with stride to limit how much the output layer shrinks: it enlarges the input to compensate for the size reduction caused by the filter and the stride.

![image.png](attachment:4630b3c5-7369-4944-8371-8b4bde65169f.png)

If we have no padding, the convolution is called a **valid convolution**; if we apply padding, the convolution is called a **same convolution**, which means that the output has the same spatial size as the input (for a stride of 1).
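The difference between the two cases can be checked with the usual output-size formula, floor((n + 2p - f) / s) + 1, where n is the input size, f the filter size, p the padding per side, and s the stride. The function names below are hypothetical, and the "same" padding helper assumes a stride of 1 and an odd filter size.

```python
def conv_output_size(n, f, stride=1, pad=0):
    """Output size along one dimension: floor((n + 2p - f) / s) + 1."""
    return (n + 2 * pad - f) // stride + 1


def same_pad(f):
    """Padding per side that preserves the size at stride 1 (odd f assumed)."""
    return (f - 1) // 2


n, f = 6, 3

print(conv_output_size(n, f))                   # 4  (valid: no padding)
print(conv_output_size(n, f, pad=same_pad(f)))  # 6  (same: output matches input)
print(conv_output_size(n, f, stride=2))         # 2  (valid with stride 2)
```

A 3 × 3 filter therefore needs one pixel of padding on each side to keep a 6 × 6 input at 6 × 6.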

To sum up, stride is the amount by which the filter shifts at each step, and padding is a method of enlarging the input to compensate for the size reduction caused by the filter and the stride.

In Keras's `Conv2D` layer, for example, the default `strides` value is (1,1) and the default `padding` value is "valid", which means no padding.