<h1 style="text-align: center">Introduction to Neural Networks and The Perceptron</h1>

<img style='width:300px' src='images/move_on.jpg'/>

## Neural Network Lecture Overview

##### 1. Introduction to Neural Networks and The Perceptron 
##### 2. Multilayer Perceptron and Deeper Neural Networks 
        - Application : Classifier with Keras
##### 3. Convolutional Neural Networks
        - Application: CNN image classifier with Keras and CoLab 

## Applications of Neural Networks

- Clustering
- Pattern Recognition
- Image Recognition (CNN)
- Time Series Forecasting (RNN)
- Audio/Video/Image Generation (GAN) 

LIMITATIONS
- Good for prediction, bad for inference (the learned weights are hard to interpret)
- Computationally expensive to train

## The Perceptron

The perceptron algorithm learns a weight for each input in order to draw a linear decision boundary that separates two linearly separable classes.

<img src='images/perceptron_binary.png'/>

### McCulloch-Pitts Perceptron

The idea for a perceptron (aka a neuron, the building block of neural networks) was formulated in 1943 by Warren McCulloch and Walter Pitts. Taking the neurons in the brain as a model, they proposed a mathematical process that takes in a set of inputs, applies an aggregation function <i>(g)</i> to the inputs, and then applies a decision function/<b>activation function</b> <i>(f)</i> to the aggregation. If the aggregation is greater than a certain threshold the process returns a 1, else a 0. This can be used for binary classification.

<img src='images/mcCulloch_Pitts.png' />
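The process described above can be sketched in a few lines of Python. This is only an illustration: the function name `mcculloch_pitts` and the choice of a plain sum as the aggregation <i>(g)</i> are assumptions made here, with binary inputs that all count equally.

```python
def mcculloch_pitts(inputs, threshold):
    """Sketch of a McCulloch-Pitts style neuron on binary inputs."""
    aggregation = sum(inputs)                      # g: every input counts equally
    return 1 if aggregation > threshold else 0     # f: fire only above the threshold

# All four combinations of two binary inputs
pairs = [(a, b) for a in (0, 1) for b in (0, 1)]

# With threshold 1 the neuron needs both inputs on (sum of 2), i.e. a logical AND
and_outputs = [mcculloch_pitts(p, threshold=1) for p in pairs]

# With threshold 0 a single active input is enough, i.e. a logical OR
or_outputs = [mcculloch_pitts(p, threshold=0) for p in pairs]

print(and_outputs)  # [0, 0, 0, 1]
print(or_outputs)   # [0, 1, 1, 1]
```

Note that the only way to change the behaviour is to pick a different threshold by hand.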

This model required the threshold to be set manually, and each input was given equal importance. In the late 1950s Frank Rosenblatt proposed a model that could "learn" the threshold and the importance of each input from data.

### Frank Rosenblatt's Perceptron

<img src='images/rosenblatt_perceptron_schematic.png'/>

Rosenblatt introduced the concept of weights (w) applied to each input, with larger weights giving more importance to the feature they are applied to. These weights can then be updated based on whether the perceptron made a correct classification or not.

<h3>Unit Step Function</h3>

Before diving into the process of updating weights, let's cover how the basic decision function in a perceptron works.
If the sum of the inputs multiplied by their corresponding weights is greater than a certain threshold, the function returns a 1; otherwise it returns a 0 (or -1, depending on the convention).

Aggregation : 
<p style='font-size:24px'>$z = w_{1}x_{1} + \dots + w_{n}x_{n}$</p>

Decision : 
<p style='font-size:24px'>$ g(z) =
  \begin{cases}
    1  & \quad \text{if } z > \theta \\
    0  & \quad \text{otherwise}
  \end{cases}
$</p>
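As a rough sketch, the aggregation and decision steps can be written as two small functions. The names `weighted_sum` and `unit_step`, and the example weights, inputs and threshold, are hypothetical values chosen only for illustration.

```python
def weighted_sum(weights, x):
    """Aggregation: z = w_1*x_1 + ... + w_n*x_n."""
    return sum(w * x_i for w, x_i in zip(weights, x))

def unit_step(z, theta):
    """Decision: 1 if z is greater than the threshold theta, otherwise 0."""
    return 1 if z > theta else 0

# Hypothetical example values
weights = [0.5, -1.0, 2.0]
x = [1.0, 2.0, 0.5]
theta = 0.0

z = weighted_sum(weights, x)      # 0.5*1.0 + (-1.0)*2.0 + 2.0*0.5 = -0.5
prediction = unit_step(z, theta)  # -0.5 is not greater than 0.0, so the output is 0
print(z, prediction)
```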

#### How do we determine the best <b style="font-size: 16px">$\theta$</b> (the threshold, often folded into a bias weight $w_{0} = -\theta$)?


If we move <b style="font-size: 16px">$\theta$</b> to the other side of the inequality, $z > \theta$ becomes $z - \theta > 0$. We can then treat $-\theta$ as a weight $w_{0}$ on a constant input of 1 and update it like any other weight, based on whether the perceptron classifies samples correctly.

Aggregation : 
<p style='font-size:24px'>$z = -\theta * 1 + w_{1}x_{1} + \dots + w_{n}x_{n}$</p>
<p style='font-size:24px'>$z = w_{0} * 1 + w_{1}x_{1} + \dots + w_{n}x_{n}$</p>

Decision : 
<p style='font-size:24px'>$ g(z) =
  \begin{cases}
    1  & \quad \text{if } z > 0 \\
    0  & \quad \text{otherwise}
  \end{cases}
$</p>
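A small check can make this rewrite concrete. The sketch below compares a decision with an explicit threshold against one where the threshold is folded into a bias weight on a constant input of 1. Note that moving the threshold across the inequality flips its sign, so the bias weight is the negative of the threshold. The function names and example values are assumptions for illustration.

```python
def predict_with_threshold(weights, x, theta):
    """Compare the weighted sum against an explicit threshold."""
    z = sum(w * x_i for w, x_i in zip(weights, x))
    return 1 if z > theta else 0

def predict_with_bias(weights_with_bias, x):
    """Prepend a constant input of 1 so weights_with_bias[0] acts as the bias."""
    x_augmented = [1.0] + list(x)
    z = sum(w * x_i for w, x_i in zip(weights_with_bias, x_augmented))
    return 1 if z > 0 else 0

# Hypothetical weights and threshold
weights = [1.0, 1.0]
theta = 1.5
weights_with_bias = [-theta] + weights   # bias weight is minus the threshold

samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
with_threshold = [predict_with_threshold(weights, x, theta) for x in samples]
with_bias = [predict_with_bias(weights_with_bias, x) for x in samples]

print(with_threshold == with_bias)  # True: both forms make the same decisions
```

Because the bias is now just another weight, the same update rule can adjust it during training.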

<h4>Unit Step Function aka Heaviside Step Function</h4>

<img src='images/unit_step.svg'/>

<h3>Perceptron Learning</h3>

A perceptron learns by updating the weights it applies to its inputs. The algorithm functions as follows : 

1. Initialize the weights to 0 or small random numbers.
2. For each training sample x(i):
    - Calculate the output value.
    - Update the weights.

If the output value matches the true output value, the weights do not need to be updated. If the perceptron predicts a 1 when the target is 0, the weights on the active inputs need to be made smaller, and vice versa. The weights are updated as follows: 

<p style='font-size:24px'>$w_j := w_j + \Delta w_j$</p>

<p style='font-size:24px'>$\Delta w_j = \eta * (target - output) * x_j$</p>

Here $\eta$ is the learning rate, a small positive constant that controls the size of each update.



This process repeats over the training set until every sample is classified correctly (convergence), which is only guaranteed when the two classes are linearly separable. However, this is not the most efficient way to update our weights. 
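The learning procedure above can be sketched as a short training loop. This is a minimal illustration rather than a reference implementation: the function names, the `max_epochs` safeguard, the learning rate of 1.0 and the AND-gate toy data are all assumptions made here. The threshold is handled as a bias weight stored in `weights[0]`.

```python
def predict(weights, x):
    """Unit step on the weighted sum; weights[0] is the bias (constant input of 1)."""
    z = weights[0] + sum(w * x_i for w, x_i in zip(weights[1:], x))
    return 1 if z > 0 else 0

def train_perceptron(samples, targets, eta=1.0, max_epochs=100):
    """Perceptron learning rule: w_j := w_j + eta * (target - output) * x_j."""
    weights = [0.0] * (len(samples[0]) + 1)        # step 1: initialize weights to 0
    for _ in range(max_epochs):                    # safeguard for non-separable data
        errors = 0
        for x, target in zip(samples, targets):    # step 2: visit each training sample
            update = eta * (target - predict(weights, x))
            if update != 0:                        # only misclassified samples change weights
                weights[0] += update               # bias input is always 1
                for j, x_j in enumerate(x):
                    weights[j + 1] += update * x_j
                errors += 1
        if errors == 0:                            # every sample classified correctly
            break
    return weights

# Toy data: the logical AND of two inputs, which is linearly separable
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 0, 0, 1]

weights = train_perceptron(samples, targets, eta=1.0)
predictions = [predict(weights, x) for x in samples]
print(predictions)  # [0, 0, 0, 1]
```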

### Adaptive Linear Neuron (Adaline) 

In contrast to perceptron learning, the <b>delta rule</b> of the Adaline updates the weights based on the output of a linear activation function rather than a unit step function. The output used for learning is therefore a continuous value rather than a 1 or 0. 

##### What does this allow us to do? 

Because the linear activation is differentiable, we can define a differentiable cost function (here the sum of squared errors) and minimize it with gradient descent: 

<p style="font-size: 20px">$J(w)= \frac{1}{2}\sum_{i} (target^{(i)}−output^{(i)})^2$</p>

If you recall from linear regression, the amount by which we adjust our weights (called coefficients in linear regression) depends on how steep the cost curve is for a given set of weights. We can determine "steepness" by taking the partial derivative of the cost function $J(w)$ with respect to each weight ($\frac{\partial J}{\partial w_j}$).

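<p style="font-size: 20px">$\frac{\partial J}{\partial w_j} = -\sum_{i}(target^{(i)}-output^{(i)})* x_{j}^{(i)}$</p>

The minus sign comes from the chain rule, because $output^{(i)}$, the term that depends on $w_j$, enters the error with a negative sign. If you want to see the full derivation, visit the first link in the resources section. 

To reduce the cost we step in the direction opposite to the gradient. We multiply the negative gradient by a learning rate, as we did before, to soften the change, and add the result to our previous weight. 

<p style="font-size: 20px">$\Delta w_j = -\eta \frac{\partial J}{\partial w_j} = \eta * \sum_{i}(target^{(i)}-output^{(i)})* x_{j}^{(i)}$</p>
<p style="font-size: 20px">$w_j := w_j + \Delta w_j$</p>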
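The delta rule can be sketched as batch gradient descent in plain Python. This is an illustration under stated assumptions: the function names, the learning rate of 0.1, the 200 epochs, the AND-gate toy data and the 0.5 cut-off used to turn the continuous output into a class label are all choices made here, not part of the rule itself.

```python
def net_input(weights, x):
    """Linear activation: the output is the weighted sum itself (weights[0] is the bias)."""
    return weights[0] + sum(w * x_i for w, x_i in zip(weights[1:], x))

def cost(weights, samples, targets):
    """Sum of squared errors: J(w) = 1/2 * sum_i (target_i - output_i)^2."""
    return 0.5 * sum((t - net_input(weights, x)) ** 2 for x, t in zip(samples, targets))

def train_adaline(samples, targets, eta=0.1, epochs=200):
    """Batch delta rule: every weight update sums the errors over all samples."""
    n_features = len(samples[0])
    weights = [0.0] * (n_features + 1)
    costs = []
    for _ in range(epochs):
        # Errors are computed once per epoch from the continuous outputs
        errors = [t - net_input(weights, x) for x, t in zip(samples, targets)]
        weights[0] += eta * sum(errors)            # bias input is always 1
        for j in range(n_features):
            weights[j + 1] += eta * sum(e * x[j] for e, x in zip(errors, samples))
        costs.append(cost(weights, samples, targets))
    return weights, costs

# Toy data: the logical AND of two inputs
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 0, 0, 1]

weights, costs = train_adaline(samples, targets, eta=0.1, epochs=200)

# Assumed cut-off of 0.5 to turn the continuous output into a class label
predictions = [1 if net_input(weights, x) > 0.5 else 0 for x in samples]
print(costs[0] > costs[-1], predictions)
```

Unlike the perceptron rule, the weights keep changing even when every sample is on the correct side of the boundary, because the continuous error is rarely exactly zero. The learning rate also matters more here: a value that is too large can make the cost grow instead of shrink.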

## Next Up : Multilayer Perceptrons

How do we deal with data that is not linearly separable? 

## Resources

https://sebastianraschka.com/Articles/2015_singlelayer_neurons.html#the-perceptron-learning-rule 

https://www.youtube.com/watch?v=ntKn5TPHHAk