# Introduction to NN and AE


## Outline

- Introduction to neural networks
- Implementation of a simple neural network ([reference](https://towardsdatascience.com/build-a-simple-neural-network-using-numpy-2add9aad6fc8))
- Autoencoder
- Breaking the limitation of the number of layers

## Neural Networks

The recent hype around **AI** originated in breakthroughs in **deep neural networks**, a sub-domain of **machine learning**. All the algorithms covered in the rest of this course are related to deep neural networks, so before going any further we first cover some foundations of neural networks.

<img src='https://www.datacatchup.com/wp-content/uploads/2019/05/image.png' width='40%' description='The relationship between AI, machine learning, and deep learning.'/>

A [neural network](https://en.wikipedia.org/wiki/Neural_network) is a network or circuit of neurons. The term can refer to a biological neural network made up of real biological neurons, or to an artificial neural network that mathematically represents its biological counterpart.

<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/1/10/Blausen_0657_MultipolarNeuron.png/1280px-Blausen_0657_MultipolarNeuron.png' width='60%'/>

A biological neural network (i.e., **[neural circuit](https://en.wikipedia.org/wiki/Neural_circuit)**) is a population of neurons interconnected by synapses to carry out a specific function when activated. The description of such networks can be found in Herbert Spencer's *Principles of Psychology, 3rd edition (1872)*, Theodor Meynert's *Psychiatry (1884)*, William James' *Principles of Psychology (1890)*, and Sigmund Freud's *Project for a Scientific Psychology (composed 1895)*. The first rule of neuronal learning was described by Hebb in 1949, in the Hebbian theory.

<img src='https://www.researchgate.net/profile/Erguen_Akguen/publication/326417061/figure/fig2/AS:648990437679105@1531742786278/Similarity-between-biological-and-artificial-neural-networks-Arbib-2003a-Haykin_W640.jpg' width='50%'/>

In 1943, McCulloch and Pitts created a computational model for neural networks based on mathematics and algorithms. They called this model threshold logic. The model paved the way for neural network research to split into two distinct approaches. One approach focused on biological processes in the brain and the other focused on the application of neural networks to artificial intelligence.

Farley and Clark (1954) first used computational machines, then called calculators, to simulate a Hebbian network at MIT. Other neural network computational machines were created by Rochester, Holland, Habit, and Duda (1956).

Rosenblatt (1958) created the **[perceptron](https://en.wikipedia.org/wiki/Perceptron)**, an algorithm for pattern recognition based on a two-layer learning computer network using simple addition and subtraction. With mathematical notation, Rosenblatt also described circuitry not in the basic perceptron, such as the exclusive-or circuit, a circuit whose mathematical computation could not be processed until after the **[backpropagation](https://en.wikipedia.org/wiki/Backpropagation)** algorithm was created by Werbos (1975).

Now, let's go back to 1975 and rebuild a simple artificial neural network.





## Implementation of a simple neural network

Before the implementation, we need to formulate the problem properly. The figure below illustrates a simple neural network with a single input layer (of two variables) mapped directly to one output variable.

<img src='https://upload.wikimedia.org/wikipedia/commons/thumb/4/42/A_simple_neural_network_with_two_input_units_and_one_output_unit.png/375px-A_simple_neural_network_with_two_input_units_and_one_output_unit.png' width='30%' />

According to this *graph representation*, we can rephrase our problem as: **finding the optimal values of $w_1$ and $w_2$ such that the formula $w_1 x_1 + w_2 x_2 = Y$ yields the best performance**. We can further extend the two input variables to $n$ variables, and the problem becomes a *$Y = WX$* problem, where $W = \{w_1, w_2, \dots, w_n\}$ and $X = \{x_1, x_2, \dots, x_n\}$.
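
As a minimal sketch of the $Y = WX$ form, the weighted sum can be written as a dot product in NumPy. The numeric values of `W` and `X` below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example values; none of these numbers come from the text.
W = np.array([0.5, -1.0, 2.0])   # weights w_1 .. w_n
X = np.array([2.0, 1.0, 0.5])    # inputs  x_1 .. x_n

# Y = w_1*x_1 + ... + w_n*x_n, written as a dot product.
Y = np.dot(W, X)

# The same sum written out term by term, for comparison.
Y_explicit = sum(w * x for w, x in zip(W, X))
```

Both expressions compute the same quantity; the dot product is simply the vectorized form.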

At this level of representation, a single-layer NN is almost identical to *[linear regression](https://en.wikipedia.org/wiki/Linear_regression)*, which is also a *$Y = WX + \epsilon$* problem.
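
To make the link to linear regression concrete, here is a small sketch that recovers the weights of a $Y = WX + \epsilon$ model with ordinary least squares. The data, the "true" weights, and the noise level are all assumptions made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground-truth weights used only to generate toy data.
true_w = np.array([2.0, -3.0])

# 100 samples with two input variables each, plus small Gaussian noise (epsilon).
X = rng.normal(size=(100, 2))
y = X @ true_w + 0.01 * rng.normal(size=100)

# Ordinary least squares: find w minimizing ||X w - y||^2.
w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```

With this little noise, `w_hat` should land very close to `true_w`, which is the sense in which the single-layer network and linear regression solve the same problem.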

As described in the previous section, an NN is a mathematical imitation of biological neurons, where the *input* stimulus is not transferred directly to the next neuron but passes through an *activation* process. A neuron is *activated* if the strength of its input passes a certain threshold, $\theta$. Hence, in addition to the simple graph representation, a more precise flowchart can be drawn as follows.

<img src='https://upload.wikimedia.org/wikipedia/commons/6/60/ArtificialNeuronModel_english.png' width='60%' />

In the new version, the earlier model is wrapped by an activation function, $\Psi$, which sends out a signal $o$ if the value of $\Psi(\Sigma(W^TX))$ is greater than the threshold $\theta$.
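
The thresholded neuron can be sketched as follows. This is a simplification that takes the activation to be a plain step function, so the weighted sum is compared with $\theta$ directly. The function name `threshold_neuron` and the weights and threshold are hypothetical; they are chosen so that the neuron behaves like a logical AND.

```python
import numpy as np

def threshold_neuron(x, w, theta):
    """Return 1 if the weighted sum of the inputs exceeds theta, else 0."""
    net = np.dot(w, x)               # weighted sum of the inputs
    return 1 if net > theta else 0   # step activation with threshold theta

# Hypothetical weights and threshold that make the neuron act as logical AND.
w = np.array([1.0, 1.0])
theta = 1.5

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [threshold_neuron(np.array(x), w, theta) for x in inputs]
# outputs is [0, 0, 0, 1]: the neuron fires only when both inputs are 1.
```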

Now, the form of our neural network looks very much like *[logistic regression](https://en.wikipedia.org/wiki/Logistic_regression)*. So, why do we need neural networks if they look the same as regression?

The true distinction between neural networks and other machine learning models is **the way they search for the optimal weights: backpropagation**.
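
For a single neuron, this weight search reduces to plain gradient descent on the output error, which the sketch below illustrates. It assumes a sigmoid activation, a cross-entropy loss, and a bias term `b`, none of which are specified above; the toy data (logical OR), learning rate, and iteration count are likewise hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (hypothetical): learn logical OR from two binary inputs.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 1.0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=2)   # small random initial weights
b = 0.0                             # bias term (an assumption of this sketch)
lr = 1.0                            # learning rate

for _ in range(2000):
    out = sigmoid(X @ w + b)        # forward pass
    error = out - y                 # gradient of the loss w.r.t. the pre-activation
    w -= lr * X.T @ error / len(y)  # propagate the error back to the weights
    b -= lr * error.mean()          # and to the bias

predictions = (sigmoid(X @ w + b) > 0.5).astype(int)
```

After training, `predictions` should match the OR targets. In a multi-layer network the same idea applies, with the error passed backward through each layer via the chain rule.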