<a href="https://colab.research.google.com/github/williambrunos/Introduction-To-ML/blob/main/Class_5/Neural_Networks.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Neural Networks

Neural networks are a computational way to mimic how human beings learn. To understand them, we first need to understand how a neuron works.

## How do neurons work? (Overly Simplified)

A **neuron** is the main cell of the **nervous system**. It has **dendrites**, which receive information from other neurons as electrical signals. If these signals exceed a certain threshold, the neuron fires: the signal leaves the cell body (where the **nucleus** sits) and travels along the **axon**, the tail of the neuron, which passes it on to other neurons.

![Neuron](https://upload.wikimedia.org/wikipedia/commons/thumb/b/b5/Neuron.svg/1200px-Neuron.svg.png)

Now, imagine that some information needs to be processed. To do this, our nervous system sends it to a certain layer of neurons, which is capable of understanding and extracting patterns from this information.

This processed information, which already has a certain complexity, is passed through the axons to other neurons, which are capable of processing and understanding patterns even more complex than those of the previous layer.

This process keeps going until the system has fully processed the information.

So... how does it work artificially? Let's see!

## The Artificial Neuron

The electrical signals received by the neurons of the nervous system can be represented artificially by input values $x_i$, each of which represents a piece of information arriving to be processed by the neuron. Each input has a weight $w_i$, which represents how strongly the input $x_i$ contributes to the computation.

These inputs can be either excitatory or inhibitory. An inhibitory input has the maximum effect on the decision, irrespective of the other inputs. For example, if an inhibitory input $x_3$ is 1 (say, "not home"), then the output will always be 0, i.e., the neuron will never fire. Excitatory inputs, on the other hand, will not make the neuron fire on their own, but they might fire it when combined.

The neuron computes the weighted sum of its inputs and adds a bias $b_0$:

$$v = \sum_{i=1}^{m} x_i w_i + b_0$$

The value of $v$ could itself represent our processed information, but that would oversimplify the architecture: the output would just be a linear function of the inputs, and a linear function does not need an architecture as complex as a neural network.
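To see why a purely linear output is not enough, here is a minimal sketch (the weights and biases are made-up values, chosen only for illustration) showing that two linear units applied one after the other behave exactly like a single linear unit:

```python
def linear(x, w, b):
    """Return w * x + b for scalar input, weight and bias."""
    return w * x + b

# Hypothetical parameters for two linear "layers" with no activation function
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5

def stacked(x):
    """Apply the first linear unit, then the second one."""
    return linear(linear(x, w1, b1), w2, b2)

# The same mapping written as ONE linear function:
# w2 * (w1 * x + b1) + b2 = (w2 * w1) * x + (w2 * b1 + b2)
w_eq = w2 * w1
b_eq = w2 * b1 + b2

def collapsed(x):
    """Single linear unit equivalent to the two stacked ones."""
    return linear(x, w_eq, b_eq)

for x in [-2.0, 0.0, 1.5, 10.0]:
    print(x, stacked(x), collapsed(x))
```

Both columns printed by the loop are identical, so adding more linear layers would not make the model any more expressive.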

So, the value of $v$ is passed as an argument to a function called the **activation function** $\phi(v)$, which applies a more complex (typically nonlinear) transformation than a simple linear one. The output of this function is the transformed information $y$, which can be used by neurons in the next layer or can be the final result.

![Artificial Neuron](https://www.gabormelli.com/RKB/images/thumb/3/31/artificial-neuron-model.png/600px-artificial-neuron-model.png)
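The artificial neuron described above can be sketched in a few lines of plain Python. The sigmoid is only one possible choice for $\phi$ (the text does not fix a specific activation), the input values are made up, and the bias is added once to the weighted sum, which is the usual convention:

```python
import math

def sigmoid(v):
    """Logistic activation: squashes any real v into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

def neuron(x, w, b, phi=sigmoid):
    """Compute y = phi(sum_i x_i * w_i + b) for a single artificial neuron."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + b
    return phi(v)

x = [0.5, -1.0, 2.0]   # inputs (hypothetical values)
w = [0.4, 0.3, -0.2]   # one weight per input (hypothetical values)
b = 0.1                # bias

y = neuron(x, w, b)
print(y)
```

Swapping `phi` for another function changes the kind of transformation the neuron applies without touching the weighted sum.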

The output of each neuron is passed on to the next layer in a hierarchical manner. Some neurons will fire and some won't, and this process goes on until it produces a final response.

## The Perceptron

The perceptron is the simplest example of an artificial neuron. Its output is given by the function below:

$$y = 1, \; \text{if} \; \sum_{i=1}^{m} x_i w_i + b_0 \ge 0$$
$$y = 0, \; \text{if} \; \sum_{i=1}^{m} x_i w_i + b_0 < 0$$
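As a small illustration of this rule, the sketch below implements the step output and uses hand-picked weights and bias (assumptions for this example, not learned values) so that the perceptron behaves like a logical AND:

```python
def perceptron(x, w, b):
    """Return 1 if the weighted sum plus bias is >= 0, otherwise 0."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if v >= 0 else 0

# Hand-picked (hypothetical) parameters: the sum only reaches 0 when both inputs are 1
w_and = [1.0, 1.0]
b_and = -1.5

# Evaluate the four input pairs in the order (0,0), (0,1), (1,0), (1,1)
outputs = [perceptron([a, c], w_and, b_and) for a in (0, 1) for c in (0, 1)]
print(outputs)
```

Only the pair `(1, 1)` pushes the weighted sum above the threshold, so it is the only one that makes the perceptron fire.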

**Note**: The McCulloch-Pitts neuron did not have weights; its output was based on just the sum of the input values.

See more: [McCulloch Pitts Neuron](https://towardsdatascience.com/mcculloch-pitts-model-5fdf65ac5dd1)
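A minimal sketch of a McCulloch-Pitts-style unit, following the description of excitatory and inhibitory inputs given earlier. The function name and argument layout are hypothetical, chosen only for this example:

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """Binary unit with no weights and a fixed threshold.

    Any active inhibitory input forces the output to 0. Otherwise the unit
    fires when the number of active excitatory inputs reaches the threshold.
    """
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# Two excitatory inputs are active and nothing inhibits: the unit fires
fires = mcculloch_pitts([1, 1, 0], [0], threshold=2)

# All excitatory inputs are active, but the inhibitory input blocks the output
blocked = mcculloch_pitts([1, 1, 1], [1], threshold=2)

# Not enough excitatory inputs on their own
too_weak = mcculloch_pitts([1, 0, 0], [0], threshold=2)

print(fires, blocked, too_weak)
```

Notice that the threshold has to be chosen by hand and every excitatory input counts the same.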

Why don't we use the McCulloch-Pitts neuron nowadays? Because it is a very simplistic way to model a system. It needs fixed thresholds and gives the same importance to all inputs, whereas we want a model that learns the weight of each input and the threshold!
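To make "learning the weights" concrete, here is a sketch of the classic perceptron learning rule, which this notebook does not cover in detail, so treat it as an illustrative assumption. The learning rate, the epoch limit and the AND dataset are arbitrary choices for the example:

```python
def predict(x, w, b):
    """Perceptron output: 1 if the weighted sum plus bias is >= 0, else 0."""
    v = sum(xi * wi for xi, wi in zip(x, w)) + b
    return 1 if v >= 0 else 0

def train_perceptron(samples, lr=1.0, max_epochs=100):
    """Adjust weights and bias whenever a sample is misclassified."""
    n_inputs = len(samples[0][0])
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(x, w, b)   # -1, 0 or +1
            if error != 0:
                w = [wi + lr * error * xi for wi, xi in zip(w, x)]
                b += lr * error
                errors += 1
        if errors == 0:   # every sample is classified correctly: stop early
            break
    return w, b

# Logical AND as a tiny, linearly separable training set
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w, b = train_perceptron(and_samples)
learned = [predict(x, w, b) for x, _ in and_samples]
print(w, b, learned)
```

Here the bias plays the role of the (negated) threshold, so both the importance of each input and the threshold come from the data instead of being fixed by hand.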

## Multi Layer Perceptron (MLP)

Notice that a single perceptron would be too simplistic for most problems. So, we are going to use several layers of perceptrons, each one with one or more perceptrons.

![MLP](https://www.researchgate.net/publication/334609713/figure/fig1/AS:783455927406593@1563801857102/Multi-Layer-Perceptron-MLP-diagram-with-four-hidden-layers-and-a-collection-of-single.jpg)
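A classic example of what one perceptron cannot do is XOR, because no single straight line separates its outputs. The sketch below uses hand-picked weights (assumptions for illustration, not values learned by any algorithm) to show that two layers of perceptrons solve it:

```python
def step(v):
    """Perceptron activation: 1 if v >= 0, else 0."""
    return 1 if v >= 0 else 0

def layer(x, weights, biases):
    """Apply one layer of perceptrons: one weight row and one bias per unit."""
    return [step(sum(xi * wi for xi, wi in zip(x, row)) + b)
            for row, b in zip(weights, biases)]

# Hidden layer (hypothetical weights): first unit acts as OR, second as NAND
hidden_w = [[1.0, 1.0], [-1.0, -1.0]]
hidden_b = [-0.5, 1.5]

# Output layer (hypothetical weights): AND of the two hidden units
out_w = [[1.0, 1.0]]
out_b = [-1.5]

def xor_mlp(x):
    """Forward pass through the hidden layer and then the output layer."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, out_w, out_b)[0]

# Evaluate the four input pairs in the order (0,0), (0,1), (1,0), (1,1)
results = [xor_mlp([a, c]) for a in (0, 1) for c in (0, 1)]
print(results)
```

The hidden layer turns the inputs into features (OR and NAND) that the output perceptron can separate, which is the idea behind stacking layers.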

## Neural Networks Implementation with Scikit-Learn

In [1]:
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
import numpy as np 

from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_boston

In [2]:
X, y = load_boston(return_X_y=True)


Warning printed by `load_boston`: the Boston housing prices dataset has an ethical problem. The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning. `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so the cell above only runs on older versions.

In this special case, the warning says you can fetch the dataset from the original source:

    import pandas as pd
    import numpy as np

    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]

Alternative datasets suggested by the warning include the California housing dataset (`sklearn.datasets.fetch_california_housing`) and the Ames housing dataset.

In [3]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

Neural networks learn with optimizers. Because of this, we have to scale the values in the dataset to a common range. That way, the model will not prioritize some inputs over others just because their values are in different ranges.
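As a rough picture of what min-max scaling does, the sketch below rescales each column with $(x - \min) / (\max - \min)$, learning the minimum and maximum from the training rows only and reusing them on the test rows. The helper names and the toy data are hypothetical, and mapping a constant column to 0 is a simplification made for this example:

```python
def fit_min_max(rows):
    """Learn the per-column minimum and maximum from the training rows."""
    columns = list(zip(*rows))
    return [min(c) for c in columns], [max(c) for c in columns]

def transform_min_max(rows, mins, maxs):
    """Map each value to (x - min) / (max - min) using the training statistics."""
    return [
        [(x - lo) / (hi - lo) if hi != lo else 0.0
         for x, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]

# Toy data: the second column has a much larger range than the first one
train = [[1.0, 200.0], [3.0, 400.0], [5.0, 1000.0]]
test = [[2.0, 600.0], [6.0, 100.0]]

mins, maxs = fit_min_max(train)
train_scaled = transform_min_max(train, mins, maxs)
test_scaled = transform_min_max(test, mins, maxs)

print(train_scaled)
print(test_scaled)
```

After scaling, both training columns lie in $[0, 1]$. Test values can fall slightly outside that interval, because the minimum and maximum come from the training data only.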

In [None]:
# See the documentation
# MinMaxScaler?

In [4]:
mm = MinMaxScaler()

In [5]:
X_train = mm.fit_transform(X_train)
X_test = mm.transform(X_test)

In [6]:
mlp = MLPRegressor(hidden_layer_sizes=(500, 500, 500), max_iter=1000, solver='lbfgs', shuffle=False)
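To get a feeling for how big this network is, the sketch below counts the weights and biases of a fully connected network with three hidden layers of 500 units. It assumes 13 input features (the number of columns in the Boston dataset) and a single output. The helper name is hypothetical:

```python
def count_parameters(n_inputs, hidden_layer_sizes, n_outputs=1):
    """Count weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each layer has fan_in * fan_out weights plus one bias per output unit
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes, sizes[1:]))

n_params = count_parameters(13, (500, 500, 500))
print(n_params)
```

That is a lot of parameters compared with the roughly 400 training rows available here, so a smaller `hidden_layer_sizes` might be worth trying too.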

In [7]:
mlp.fit(X_train, y_train)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
  self.n_iter_ = _check_optimize_result("lbfgs", opt_res, self.max_iter)


MLPRegressor(hidden_layer_sizes=(500, 500, 500), max_iter=1000, shuffle=False,
             solver='lbfgs')

In [8]:
mlp.score(X_test, y_test)

0.8099558466927437
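For a regressor such as `MLPRegressor`, `score` returns the coefficient of determination $R^2$, so the value above is not an accuracy. A minimal sketch of how $R^2$ is computed, using made-up numbers instead of the real predictions:

```python
def r2_score_manual(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical targets and predictions, only for illustration
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]

r2 = r2_score_manual(y_true, y_pred)
print(r2)
```

A value of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean of the targets.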