## ReLU (Rectified Linear Units) Layers

After each conv layer, it is conventional to apply a nonlinear layer (or activation layer). The purpose of this layer is to introduce nonlinearity into a system that has so far been computing only linear operations in the conv layers (element-wise multiplications and summations). In the past, nonlinear functions like tanh and sigmoid were used, but researchers found that ReLU layers work far better: the network trains a lot faster (because ReLU is computationally cheap) without a significant difference in accuracy. ReLU also helps alleviate the vanishing gradient problem, where the lower layers of the network train very slowly because the gradient decreases exponentially through the layers. The ReLU layer applies the function f(x) = max(0, x) to all of the values in the input volume. In basic terms, this layer just changes all the negative activations to 0. It increases the nonlinear properties of the model and the overall network without affecting the receptive fields of the conv layer.
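
As a minimal sketch of f(x) = max(0, x), assuming NumPy and a small made-up array of activations (the names `relu`, `relu_grad`, and `activations` are illustrative, not from any particular framework):

```python
import numpy as np

def relu(x):
    """Apply f(x) = max(0, x) element-wise: negatives become 0."""
    return np.maximum(0, x)

def relu_grad(x):
    """Derivative of ReLU: 1 where x > 0, else 0.

    For positive inputs the gradient passes through unscaled, which is
    one common explanation for why ReLU eases vanishing gradients.
    """
    return (x > 0).astype(x.dtype)

# Hypothetical activations coming out of a conv layer.
activations = np.array([[-2.0, 0.5],
                        [3.0, -0.1]])

rectified = relu(activations)
print(rectified)               # [[0.  0.5] [3.  0. ]]
print(relu_grad(activations))  # [[0. 1.] [1. 0.]]
```

The output has the same shape as the input, which is why the receptive fields of the conv layer are unaffected.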

[Paper](http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf) by the great Geoffrey Hinton (aka the father of deep learning).

## Pooling Layer
The function of the pooling layer is to progressively reduce the spatial size of the representation to **reduce the number of parameters and the amount of computation** in the network, and hence to also **control overfitting**. No learning takes place in the pooling layers.
![MaxPool](img/MaxPool.png)
Pooling units are obtained using functions like **max-pooling**, **average pooling** and even **L2-norm pooling**.
![max-average-pooling](img/max-average-pooling.png)
Max pooling extracts the most prominent features, such as edges, whereas average pooling extracts features more smoothly. For image data, you can see the difference. Although both are used for the same reason, I think max pooling is better for extracting extreme features. Average pooling sometimes can’t extract good features because it takes every value into account and produces an average, which may or may not be important for tasks such as object detection.
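
To make the difference concrete, here is a small sketch of the three pooling functions mentioned above. It assumes NumPy, a single 2-D feature map, and non-overlapping 2 x 2 windows (stride equal to the window size). The function name `pool2d` and the sample values are made up for illustration.

```python
import numpy as np

def pool2d(x, size=2, mode="max"):
    """Pool a 2-D feature map with non-overlapping size x size windows."""
    h, w = x.shape
    h_out, w_out = h // size, w // size
    # Crop to a multiple of the window size, then split into windows:
    # windows[i, a, j, b] == x[i * size + a, j * size + b]
    windows = x[:h_out * size, :w_out * size].reshape(h_out, size, w_out, size)
    if mode == "max":
        return windows.max(axis=(1, 3))
    if mode == "average":
        return windows.mean(axis=(1, 3))
    if mode == "l2":
        return np.sqrt((windows ** 2).sum(axis=(1, 3)))
    raise ValueError(f"unknown pooling mode: {mode}")

feature_map = np.array([[1.0, 3.0, 2.0, 4.0],
                        [5.0, 6.0, 1.0, 2.0],
                        [7.0, 2.0, 9.0, 1.0],
                        [3.0, 4.0, 1.0, 8.0]])

print(pool2d(feature_map, mode="max"))      # [[6. 4.] [7. 9.]]
print(pool2d(feature_map, mode="average"))  # [[3.75 2.25] [4.   4.75]]
print(pool2d(feature_map, mode="l2"))
```

Note that the function has no weights at all, which matches the point that no learning takes place in a pooling layer.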

[max vs average](https://www.quora.com/What-is-the-impact-of-different-pooling-methods-in-convolutional-neural-networks-Are-there-any-papers-that-compare-justify-different-pooling-strategies-max-pooling-average-etc)

## Dropout Layers
Dropout layers have a very specific function in neural networks. In the last section, we discussed the problem of overfitting, where after training, the weights of the network are so tuned to the training examples that the network doesn’t perform well on new examples. The idea of dropout is simple. This layer “drops out” a random set of activations by setting them to zero. Simple as that. What are the benefits of such a simple and seemingly unnecessary and counterintuitive process? In a way, it forces the network to be redundant: the network should be able to provide the right classification or output for a specific example even if some of the activations are dropped out. This keeps the network from getting too “fitted” to the training data and thus helps alleviate the overfitting problem. An important note is that this layer is only used during training, not at test time.
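
A rough sketch of the idea, assuming NumPy. It uses the "inverted dropout" variant, where the surviving activations are scaled up during training so that nothing needs to change at test time. That scaling is an assumption on my part, not something stated above. The names `dropout`, `drop_prob`, and `training` are illustrative.

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Zero a random subset of activations during training only."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At test time the layer passes its input through unchanged.
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # Boolean mask: True where the activation is kept.
    mask = rng.random(activations.shape) < keep_prob
    # Inverted dropout (assumed variant): rescale the kept activations
    # so the expected value of each unit stays the same.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
x = np.ones((4, 5))
train_out = dropout(x, drop_prob=0.5, training=True, rng=rng)
test_out = dropout(x, drop_prob=0.5, training=False)
print(train_out)  # a mix of 0.0 (dropped) and 2.0 (kept and rescaled)
print(test_out)   # identical to x
```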

[Paper](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf) by Geoffrey Hinton.

## Network in Network Layers
![NIN1](img/NIN1.png)

![NIN](img/NIN2.png)

mlpconv = convolution + 1×1 convolution + 1×1 convolution

A network in network layer refers to a conv layer where a 1 x 1 filter is used. At first glance, you might wonder why this type of layer would even be helpful, since receptive fields are normally larger than the space they map to. However, we must remember that these 1 x 1 convolutions span the full depth of the input, so we can think of each one as a 1 x 1 x N convolution, where N is the depth of the input volume (the number of filters applied in the previous layer). Effectively, at each spatial position, the layer multiplies the N input values element-wise by the filter weights and sums the results.
![NiN](img/NetinNet.png)
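
One way to see this is that a 1 x 1 convolution is just a dot product across channels, applied independently at every pixel. The sketch below assumes NumPy and a channels-last layout (H, W, C). The function name `conv1x1` and all sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def conv1x1(x, weights, bias=None):
    """1 x 1 convolution on a channels-last volume.

    x:       (H, W, C_in) input volume
    weights: (C_in, C_out), one column of C_in weights per filter
    """
    # matmul treats x as a stack of (W, C_in) matrices, so each pixel's
    # channel vector is multiplied by the same weight matrix.
    out = x @ weights
    if bias is not None:
        out = out + bias
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6, 32))  # assumed input: 6 x 6 with depth N = 32
w = rng.standard_normal((32, 8))     # assumed: 8 filters of size 1 x 1 x 32

out = conv1x1(x, w)
print(out.shape)  # (6, 6, 8)

# The output at one pixel is the dot product of that pixel's 32 channel
# values with each filter's 32 weights.
print(np.allclose(out[2, 3], x[2, 3] @ w))  # True
```

The spatial size is unchanged and only the depth changes (32 to 8 here), which is what makes this layer useful for shrinking or mixing channels.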

[C4W2L05 Network in Network](https://www.youtube.com/watch?v=c1RBQzKsDCk)

[Paper](https://arxiv.org/pdf/1312.4400v3.pdf) by Min Lin. ([local](pdf/13_network in net work.pdf))

## Inception Network
- Why choose? Use them all in parallel!
- Use 1 x 1 convolutions (network in network) to save computation cost.
![GoogleNet](img/GoogleNet.png)
![Inception](img/Inception.png)
![111](img/111.png)
![112](img/112.png)
![Branch](img/Branch.png)
![name](img/name.png)
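
To get a feel for how much computation a 1 x 1 convolution can save, here is a back-of-the-envelope count of multiplications. All sizes are assumptions chosen for illustration: a 28 x 28 x 192 input, a 5 x 5 convolution with 32 filters, "same" padding so the output stays 28 x 28, and a 16-channel 1 x 1 bottleneck. The helper `conv_multiplies` is hypothetical.

```python
def conv_multiplies(out_h, out_w, kernel, in_channels, out_channels):
    """Multiplications needed by one conv layer (bias ignored).

    Each of the out_h * out_w * out_channels output values needs
    kernel * kernel * in_channels multiplications.
    """
    return out_h * out_w * out_channels * kernel * kernel * in_channels

# Direct 5 x 5 convolution: 192 channels -> 32 channels.
direct = conv_multiplies(28, 28, kernel=5, in_channels=192, out_channels=32)

# Bottleneck: 1 x 1 conv down to 16 channels, then 5 x 5 conv up to 32.
reduce_cost = conv_multiplies(28, 28, kernel=1, in_channels=192, out_channels=16)
expand_cost = conv_multiplies(28, 28, kernel=5, in_channels=16, out_channels=32)
bottleneck = reduce_cost + expand_cost

print(direct)      # 120422400
print(bottleneck)  # 12443648
print(round(direct / bottleneck, 1))  # 9.7
```

Under these assumed sizes, putting the 1 x 1 convolution first cuts the multiplication count by roughly a factor of ten.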

[C4W2L06 Inception Network](https://www.youtube.com/watch?v=C86ZXvgpejM)(Video)

[C4W2L07 Inception Network](https://www.youtube.com/watch?v=KfV8CJh7hE0)(Video)

[Paper: Going Deeper with Convolutions](pdf/2015_GoogleNet.pdf)

***
## Reference
[A Beginner's Guide To Understanding Convolutional Neural Networks ](https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner's-Guide-To-Understanding-Convolutional-Neural-Networks-Part-2/)(Post)
