# Perceptron and Multi-Layer Perceptron

First, the imports, as always.

## Artificial Neural Networks

There are many ways of knitting the nodes of a neural network together, and each way results in a more or less complex behavior. Possibly the simplest of all topologies is the feed-forward network. Signals flow in one direction only; there is never any loop in the signal paths.

![title](img/NNFL_1.png)


Typically, ANNs have a layered structure. The input layer picks up the input signals and passes them on to the next layer, the so-called ‘hidden’ layer. (Actually, there may be more than one hidden layer in a neural network.) Last comes the output layer that delivers the result.

Note: Strictly speaking, the input layer is not a layer at all, since it performs no computation, and it would be more accurate to call it simply the "inputs". However, since many textbooks use the term, we use "input layer" here as well.

![title](img/NNFL_2.png)


Just like a biological neuron has dendrites to receive signals, a cell body to process them, and an axon to send signals out to other neurons, the artificial neuron has a number of input channels, a processing stage, and one output that can fan out to multiple other artificial neurons.

#### 1) Each input gets scaled up or down

When a signal comes in, it gets multiplied by a weight value that is assigned to this particular input. That is, if a neuron has three inputs, then it has three weights that can be adjusted individually. During the learning phase, the neural network can adjust the weights based on the error of the last test result.

#### 2) All signals are summed up

In the next step, the modified input signals are summed up to a single value. In this step, an offset is also added to the sum. This offset is called bias. The neural network also adjusts the bias during the learning phase.

This is where the magic happens! At the start, all the neurons have random weights and random biases. After each learning iteration, weights and biases are gradually shifted so that the next result is a bit closer to the desired output. This way, the neural network gradually moves towards a state where the desired patterns are “learned”.

#### 3) Activation

Finally, the result of the neuron’s calculation is turned into an output signal. This is done by feeding the result to an activation function (also called transfer function).
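The three steps above can be sketched in a few lines of NumPy. This is only an illustration: the function name `neuron_output`, the example values, and the choice of `np.tanh` as the activation are assumptions made here, not something taken from the original notebook.

```python
import numpy as np

def neuron_output(inputs, weights, bias, activation):
    """Compute one artificial neuron's output signal."""
    # Steps 1 and 2: scale each input by its weight, sum, and add the bias
    z = np.dot(inputs, weights) + bias
    # Step 3: turn the sum into an output signal
    return activation(z)

# Hypothetical values, chosen only for illustration
x = np.array([1.0, 0.5, -1.0])   # three input signals
w = np.array([0.2, -0.4, 0.1])   # one weight per input
b = 0.3                          # bias (offset)

out = neuron_output(x, w, b, activation=np.tanh)
print(out)
```

Passing the activation in as an argument keeps the weighted-sum logic separate from the choice of transfer function, so the same neuron can be reused with a step or sigmoid activation.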

# The Perceptron

The most basic form of an activation function is a simple binary function that has only two possible results.

![title](img/NNFL_3.png)


This function returns 1 if the input is positive or zero, and 0 for any negative input. A neuron whose activation function is a function like this is called a perceptron.
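A minimal sketch of this step activation and of a perceptron's prediction follows. The names `step` and `perceptron_predict` and the example weights are hypothetical, picked here for illustration.

```python
import numpy as np

def step(z):
    """Binary step: 1 if z is positive or zero, 0 if z is negative."""
    return np.where(z >= 0, 1, 0)

def perceptron_predict(x, weights, bias):
    """Weighted sum plus bias, followed by the step activation."""
    return step(np.dot(x, weights) + bias)

# Hypothetical weights and bias, for illustration only
weights = np.array([1.0, 1.0])
bias = -1.5

print(step(np.array([-2.0, 0.0, 3.5])))
print(perceptron_predict(np.array([1.0, 1.0]), weights, bias))
```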

![title](img/NNFL_4.png)


A single perceptron, as bare and simple as it might appear, is able to learn where this line is, and once it has finished learning, it can tell whether a given point is above or below that line.

## Single Perceptron for a two-input AND gate

## Now let's try the same thing with an XOR gate using an MLP
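One way to train a single perceptron on the AND gate is the classic perceptron learning rule, sketched below. The learning rate, the epoch limit, the seed, and all variable names are assumptions made for this sketch and may differ from the notebook's original cell.

```python
import numpy as np

def step(z):
    """Binary step activation: 1 for z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)

# Truth table of a two-input AND gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

rng = np.random.default_rng(0)          # fixed seed so runs are repeatable
weights = rng.uniform(-1, 1, size=2)    # random starting weights
bias = rng.uniform(-1, 1)               # random starting bias
learning_rate = 0.1                     # assumed value
max_epochs = 1000                       # assumed upper bound

for epoch in range(max_epochs):
    mistakes = 0
    for xi, target in zip(X, y):
        prediction = step(np.dot(xi, weights) + bias)
        error = target - prediction     # -1, 0, or +1
        # Shift weights and bias a little towards the desired output
        weights += learning_rate * error * xi
        bias += learning_rate * error
        mistakes += int(error != 0)
    if mistakes == 0:                   # every sample classified correctly
        break

predictions = step(X @ weights + bias)
print(predictions)
```

Because the AND gate is linearly separable, this rule is guaranteed to stop making mistakes after a finite number of updates, which is why the loop can break early.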

Let's initialize the dataset
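A possible version of the XOR dataset, together with an attempt to fit it with a single perceptron, is sketched below. The helper `train_perceptron` and its default values are hypothetical and only serve to show the behaviour.

```python
import numpy as np

def step(z):
    """Binary step activation: 1 for z >= 0, otherwise 0."""
    return np.where(z >= 0, 1, 0)

# Truth table of a two-input XOR gate
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

def train_perceptron(X, y, learning_rate=0.1, epochs=1000, seed=0):
    """Perceptron learning rule; hyperparameters are assumed values."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-1, 1, size=X.shape[1])
    bias = rng.uniform(-1, 1)
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - step(np.dot(xi, weights) + bias)
            weights += learning_rate * error * xi
            bias += learning_rate * error
    return weights, bias

weights, bias = train_perceptron(X, y)
predictions = step(X @ weights + bias)
misclassified = int(np.sum(predictions != y))
print(predictions, "misclassified:", misclassified)
```

No matter how long the loop runs, at least one of the four points stays misclassified, because no single straight line separates the XOR classes.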

# Multi-Layered Perceptron

As we can see, a single perceptron fails on the XOR gate. A look at the graph gives us enough intuition as to why this is the case.

![title](img/xor.png)

We need two lines instead of the single one a perceptron can produce. Since a single perceptron clearly cannot handle this case, we have to make our architecture slightly more complex by adding a hidden layer, and train it with the backpropagation algorithm. Training has two phases:

1) <b>Forward Propagation</b> (exactly what happens in a normal perceptron)

2) <b>Backpropagation</b> (calculating the errors and updating the weights; this differs between layers)

![title](img/backprop1.png)

![title](img/backprop2.png)



### Sigmoid Function (Activation)

Let's define the activation function that we'll be using. It is called the sigmoid, and it is extremely useful because its derivative can be written in terms of its own output, which makes it cheap to use during backpropagation. It also squashes its output into the range between 0 and 1, which makes it convenient to interpret the output signal as a probability.
#### This is the equation
![title](img/sigmoid_equation.png)


![title](img/sigmoid_curve.png)


Differentiating this function gives a very simple expression.
#### The differentiation
![title](img/sigmoid_derivative.jpg)
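For reference, the standard logistic sigmoid and its derivative can be written out as follows. The symbols $x$ (the neuron's summed input) and $\sigma$ are notation assumed here and may differ from the notation used in the images.

$$
\sigma(x) = \frac{1}{1 + e^{-x}}
$$

$$
\frac{d\sigma}{dx} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)
$$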
 

Using this during the backpropagation phase is computationally cheap, because the derivative reuses the sigmoid output already computed in the forward pass instead of requiring a fresh evaluation. (And easier to code :P)
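In code, the sigmoid and its derivative might look like the sketch below. Note one assumption made here: `sigmoid_derivative` takes the sigmoid's output rather than the raw input, which is what lets backpropagation reuse the forward-pass result.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(a):
    """Derivative of the sigmoid, given its output a = sigmoid(z)."""
    return a * (1.0 - a)

z = np.array([-2.0, 0.0, 2.0])
a = sigmoid(z)                 # forward pass
grad = sigmoid_derivative(a)   # reuses a, no extra exponentials needed
print(a, grad)
```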

## Assembling the MLP

### First initialize all the variables and set the hyperparameters
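A possible initialization is sketched below. Everything specific in it is an assumption made for illustration: the hidden layer size, the learning rate, the number of epochs, the seed, and the variable names.

```python
import numpy as np

# XOR dataset: four samples, two inputs each, one target per sample
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Hyperparameters (assumed values)
n_inputs = 2
n_hidden = 4
n_outputs = 1
learning_rate = 0.5
epochs = 10000

# Random starting weights and biases, with a fixed seed for repeatability
rng = np.random.default_rng(42)
W_hidden = rng.uniform(-1, 1, size=(n_inputs, n_hidden))
b_hidden = rng.uniform(-1, 1, size=(1, n_hidden))
W_output = rng.uniform(-1, 1, size=(n_hidden, n_outputs))
b_output = rng.uniform(-1, 1, size=(1, n_outputs))
```

Storing the targets as a column vector of shape `(4, 1)` keeps them aligned with the network output, so the error can be computed with a plain subtraction.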

### Now for the actual algorithm
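The two phases can be put together roughly as follows. This is a sketch under stated assumptions rather than the notebook's original cell: it repeats a compact setup so that it runs on its own, uses full-batch gradient descent on the squared error, and assumes a hidden layer of four sigmoid units. With an unlucky initialization a network this small can settle on a poor solution, so the result is worth checking.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(a):
    # Expects the sigmoid output a, not the raw input
    return a * (1.0 - a)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Assumed hyperparameters and random initialization
rng = np.random.default_rng(42)
n_hidden, learning_rate, epochs = 4, 0.5, 10000
W_hidden = rng.uniform(-1, 1, size=(2, n_hidden))
b_hidden = rng.uniform(-1, 1, size=(1, n_hidden))
W_output = rng.uniform(-1, 1, size=(n_hidden, 1))
b_output = rng.uniform(-1, 1, size=(1, 1))

losses = []
for _ in range(epochs):
    # Phase 1: forward propagation
    hidden = sigmoid(X @ W_hidden + b_hidden)
    output = sigmoid(hidden @ W_output + b_output)

    # Error at the output, tracked as mean squared error
    error = y - output
    losses.append(float(np.mean(error ** 2)))

    # Phase 2: backpropagation, output layer first, then hidden layer
    delta_output = error * sigmoid_derivative(output)
    delta_hidden = (delta_output @ W_output.T) * sigmoid_derivative(hidden)

    # Gradient-descent updates for weights and biases
    W_output += learning_rate * hidden.T @ delta_output
    b_output += learning_rate * delta_output.sum(axis=0, keepdims=True)
    W_hidden += learning_rate * X.T @ delta_hidden
    b_hidden += learning_rate * delta_hidden.sum(axis=0, keepdims=True)

# Final forward pass with the trained parameters
hidden = sigmoid(X @ W_hidden + b_hidden)
output = sigmoid(hidden @ W_output + b_output)
predictions = (output >= 0.5).astype(int)
print(output.round(3).ravel(), predictions.ravel())
```

The hidden-layer delta is computed from the output-layer delta pushed back through `W_output`, which is the sense in which the update "differs between layers".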