# Introduction to Neural Networks

A neural network is fundamentally numerical computation, so any software with decent numerical capabilities can be used to construct and train one. That said, while in theory you could build a neural network in Excel, in practice it would be very cumbersome, since Excel was not designed with neural networks in mind. Libraries that are specifically geared toward neural networks include:
- Google's <a href="https://www.tensorflow.org/">TensorFlow</a>
- Microsoft's <a href="https://github.com/Microsoft/CNTK">CNTK</a>
- Facebook's <a href="http://pytorch.org/">PyTorch</a> and <a href="https://caffe2.ai/">Caffe2</a>
- Intel's <a href="https://ai.intel.com/neon/">neon</a>
- <a href="http://deeplearning.net/software/theano/">Theano</a> and <a href="http://caffe.berkeleyvision.org/">Caffe</a>

In this course we will focus on using <a href="https://keras.io/">```keras```</a>, a high-level library for constructing neural networks. Keras runs on top of a numerical computation library of your choice, defaulting to ```tensorflow```. A library such as Keras significantly simplifies the workflow of constructing and training neural networks.

<img src="http://www.ticoneva.com/econ/econ4130/images/nn_libraries.png" width="80%">

## A Simple Example: Binary Neural Network Classifier

As a first example, we will train a neural network on the following classification task:

|y|x1|x2|
|-|-|-|
|0|1|0|
|1|0|1|

To be clear: there is absolutely no need to use a neural network for such a simple task. A simpler model would train much faster and potentially achieve better accuracy.

We first generate the data:

In [None]:
import numpy as np 
from sklearn.model_selection import train_test_split

#Generate 2000 samples. [1,0] -> 0, [0,1] -> 1
X = np.repeat([[1,0]], 1000, axis=0)
y = np.repeat([0], 1000, axis=0)
X = np.append(X,np.repeat([[0,1]], 1000, axis=0),axis=0)
y = np.append(y,np.repeat([1], 1000, axis=0),axis=0)

#Shuffle and split data into train set and test set
X_train, X_test, y_train, y_test = train_test_split(X,y)

We will now construct a neural network classifier. Below is the simplest neural network one can come up with: it has only a single hidden neuron.

In [None]:
from keras.layers import Input, Dense
from keras.models import Model

# Set up layers 


# Set up model


#start training
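To see what a model built in the cell above computes, here is a plain-NumPy sketch of its forward pass. It assumes both the hidden neuron and the output neuron use a `sigmoid` activation (the hidden one being what the `relu` exercise further down replaces). The weights are hypothetical, hand-picked values, not what Keras would learn; they only show that a single hidden neuron is enough to separate the two classes.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, i.e. the 'sigmoid' activation."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical hand-picked weights, NOT learned ones.
# Hidden layer: 2 inputs -> 1 neuron (2 weights + 1 bias).
W1 = np.array([[-10.0], [10.0]])   # shape (2, 1)
b1 = np.array([0.0])
# Output layer: 1 hidden neuron -> 1 output (1 weight + 1 bias).
W2 = np.array([[10.0]])            # shape (1, 1)
b2 = np.array([-5.0])

def forward(X):
    """Input -> Dense(1, sigmoid) -> Dense(1, sigmoid)."""
    h = sigmoid(X @ W1 + b1)       # hidden activation, shape (n, 1)
    return sigmoid(h @ W2 + b2)    # predicted P(y = 1), shape (n, 1)

X_demo = np.array([[1, 0], [0, 1]], dtype=float)
probs = forward(X_demo)
labels = (probs > 0.5).astype(int).ravel()   # threshold at 0.5

# Total trainable parameters: weights plus biases of both layers
n_params = W1.size + b1.size + W2.size + b2.size
print(probs.ravel(), labels, n_params)
```

The parameter count here, 5, is what `model.count_params()` should report for a two-input model with one hidden `Dense(1)` layer and one `Dense(1)` output layer.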

An out-of-sample test can be conducted with ```model.evaluate()```:

The first number is the model's loss, while the subsequent numbers are the metrics we specified; in our case, ```binary_crossentropy``` and ```accuracy```, respectively.
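As a rough check on what those numbers mean, the sketch below recomputes binary cross-entropy and accuracy in NumPy for a few made-up labels and predicted probabilities. It assumes the 0.5 threshold that Keras applies for `accuracy` with a single sigmoid output, and clips probabilities away from 0 and 1 in the spirit of Keras's small epsilon; the exact numbers Keras reports may differ slightly.

```python
import numpy as np

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; clipping by eps guards against log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def binary_accuracy(y_true, p, threshold=0.5):
    """Share of predictions that land on the correct side of the threshold."""
    return np.mean((p > threshold).astype(int) == y_true)

# Made-up labels and predicted probabilities, for illustration only
y_true = np.array([0, 1, 1, 0])
p_hat = np.array([0.1, 0.9, 0.8, 0.3])

print(binary_crossentropy(y_true, p_hat), binary_accuracy(y_true, p_hat))
```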

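Unlike OLS, which has a closed-form solution, a neural network is trained by an iterative optimizer starting from randomly initialized weights, so its performance can vary from run to run. Run the code a few more times and see how much the performance varies.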
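Part of that variation comes from random weight initialization; in this notebook, `train_test_split` also reshuffles the data each time it is called unless a `random_state` is passed. The NumPy sketch below mimics only the initialization part: it draws Glorot-uniform weights (the default `kernel_initializer` of Keras's `Dense`) for a two-input layer, and shows that the same seed reproduces the starting point while a different seed does not. The helper names are hypothetical. For Keras itself, recent versions provide `keras.utils.set_random_seed()` for reproducibility.

```python
import numpy as np

def init_weights(rng, fan_in=2, fan_out=1):
    """Glorot-uniform draw: U(-limit, limit), limit = sqrt(6 / (fan_in + fan_out))."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def initial_outputs(seed):
    """Hidden-layer pre-activations at initialization for the two input patterns."""
    rng = np.random.default_rng(seed)
    W = init_weights(rng)
    X = np.array([[1, 0], [0, 1]], dtype=float)
    return X @ W

# Same seed -> identical starting point; different seed -> different one
a = initial_outputs(seed=0)
b = initial_outputs(seed=0)
c = initial_outputs(seed=1)
print(np.array_equal(a, b), np.array_equal(a, c))
```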

Make predictions (this is called *inference* in machine learning) with ```model.predict()```:
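For a classifier with a single sigmoid output, `model.predict()` returns one predicted probability of class 1 per row (an array of shape `(n, 1)`), not a class label. A common way to turn those probabilities into labels is a 0.5 threshold; the snippet below shows that step on made-up probabilities standing in for the output of `model.predict(X_test)`.

```python
import numpy as np

# Stand-in for model.predict(X_test): shape (n, 1), values in (0, 1)
probs = np.array([[0.03], [0.97], [0.51], [0.12]])

# Threshold at 0.5 and flatten to a 1-D vector of 0/1 labels
labels = (probs > 0.5).astype(int).ravel()
print(labels)  # [0 1 1 0]
```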

## Activations

The choice of activation function can have a profound impact on model performance. Besides ```sigmoid```, which is just another name for the logistic function, there are other activation functions such as ```tanh``` and ```relu```. ```relu```, which stands for **RE**ctified **L**inear **U**nit, is a particularly common choice due to its good performance.
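The three activations named above are simple elementwise functions. The NumPy sketch below follows their mathematical definitions (not Keras's internal code) and shows their ranges, plus one useful identity: `tanh` is just a rescaled and shifted `sigmoid`, $\tanh(z) = 2\sigma(2z) - 1$.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: squashes inputs into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    """Hyperbolic tangent: squashes inputs into (-1, 1)."""
    return np.tanh(z)

def relu(z):
    """Rectified linear unit: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
for f in (sigmoid, tanh, relu):
    print(f.__name__, f(z))

# tanh is a rescaled, recentered sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
print(np.allclose(tanh(z), 2 * sigmoid(2 * z) - 1))  # True
```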

In [None]:
#Replace 'sigmoid' with 'relu' for the hidden layer


## Neural Network Regression

Next we are going to use a neural network for a regression task. The true data-generating process (DGP) is as follows:

$$
y = x^5 -2x^3 + 6x^2 + 10x - 5
$$

The model does not know the true DGP, so it needs to figure out the relationship between $y$ and $x$ from the data.

First we generate the data:

In [None]:
#Generate 1000 samples
X = np.random.rand(1000,1)
y = X**5 - 2*X**3 + 6*X**2 + 10*X - 5

#Shuffle and split data into train set and test set
X_train, X_test, y_train, y_test = train_test_split(X,y)

Then we construct the model:

In [None]:
#Layers


#Model


#Training


#Evaluate
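The model for this cell has the same structure as the `polyNN()` function defined further down: one `relu` hidden layer and a single `linear` output, trained on mean squared error. As a NumPy sketch of the computation only (random, untrained weights; the width of 100 and the sample size are illustrative assumptions), the forward pass and the loss that `model.evaluate()` reports look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data from the same DGP: y = x^5 - 2x^3 + 6x^2 + 10x - 5, x uniform on [0, 1)
X = rng.random((200, 1))
y = X**5 - 2*X**3 + 6*X**2 + 10*X - 5

hidden_count = 100                         # illustrative choice
W1 = rng.normal(size=(1, hidden_count))    # input -> hidden weights
b1 = np.zeros(hidden_count)
W2 = rng.normal(size=(hidden_count, 1))    # hidden -> output weights
b2 = np.zeros(1)

def forward(X):
    """Dense(hidden_count, relu) followed by Dense(1, linear)."""
    h = np.maximum(0.0, X @ W1 + b1)       # relu hidden layer, shape (n, hidden_count)
    return h @ W2 + b2                     # linear output, shape (n, 1)

y_hat = forward(X)
mse = np.mean((y - y_hat) ** 2)            # the 'mean_squared_error' loss
print(y_hat.shape, mse)
```

Training would then adjust `W1`, `b1`, `W2`, `b2` to push this MSE down, which is what `model.fit()` does with the `adam` optimizer.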


We are going to run the model under different settings. The following function wraps everything we have coded so far:

In [None]:
def polyNN(data,hidden_count=100,epochs=200):
    
    X_train, X_test, y_train, y_test = data
    
    #Layers
    inputs = Input(shape=(X_train.shape[1],))
    x = Dense(hidden_count, activation='relu')(inputs)
    predictions = Dense(1, activation='linear')(x)

    #Model
    model = Model(inputs=inputs, outputs=predictions)
    model.compile(optimizer='adam',
                  loss='mean_squared_error')
    model.fit(X_train,y_train,epochs=epochs,verbose=0) #Do not display progress
    print("Hidden count:",str(hidden_count).ljust(5),
          "Parameters:",str(model.count_params()).ljust(8),
          "loss:",model.evaluate(x=X_test,y=y_test,verbose=0))
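The `Parameters:` value that `polyNN()` prints follows from simple arithmetic. With one input feature and $h$ hidden neurons, the hidden `Dense` layer has $h$ weights plus $h$ biases and the output layer has $h$ weights plus 1 bias, for a total of $3h + 1$. A pure-Python check of the counts to expect for the neuron counts tried next:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out

def single_layer_params(hidden_count, n_in=1):
    """Input -> Dense(hidden_count) -> Dense(1), as in polyNN()."""
    return dense_params(n_in, hidden_count) + dense_params(hidden_count, 1)

for h in (1, 10, 50, 100, 500):
    print(h, single_layer_params(h))   # equals 3*h + 1 when n_in == 1
```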

Now we can easily try out different settings:

In [None]:
#data = X_train, X_test, y_train, y_test
data = train_test_split(X,y)

#Try various neuron count
polyNN(data,hidden_count=1)
polyNN(data,hidden_count=10)
polyNN(data,hidden_count=50)
polyNN(data,hidden_count=100)
polyNN(data,hidden_count=500)

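Here we see the universal approximation theorem at work: given enough neurons, a single hidden layer can approximate any continuous function on a closed, bounded interval arbitrarily well, so the more neurons we have, the better the fit tends to be.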
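The theorem is about what a wide enough hidden layer *can* represent, not about what training will find. The NumPy illustration below deliberately swaps in a much simpler fitting procedure than Keras's: the hidden `relu` weights are drawn at random and frozen, and only the output layer is fit by least squares (a random-features model). Because each smaller network uses a subset of the larger network's hidden units, its least-squares training error cannot be lower than the larger one's (up to floating-point error), so in-sample MSE on the polynomial DGP falls as units are added. The seed and the weight distributions are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Same DGP as above, x uniform on [0, 1)
X = rng.random((500, 1))
y = (X**5 - 2*X**3 + 6*X**2 + 10*X - 5).ravel()

# Draw 500 random relu units once; width h uses the first h of them
max_hidden = 500
W = rng.normal(size=(1, max_hidden))
b = rng.uniform(-1.0, 1.0, size=max_hidden)
H_all = np.maximum(0.0, X @ W + b)          # hidden activations, shape (500, 500)

def fit_mse(h):
    """Fit output weights (plus intercept) by least squares; return training MSE."""
    H = np.column_stack([H_all[:, :h], np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.mean((y - H @ coef) ** 2)

for h in (1, 10, 50, 100, 500):
    print(h, fit_mse(h))
```

Note that this is training error only; out-of-sample loss, which `polyNN()` reports, is not guaranteed to improve monotonically with width.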

One trick that can often improve performance is *standardizing* the data: rescaling each feature to have mean zero and standard deviation one.

In [None]:
from sklearn import preprocessing

#Standardize X


#Run the model again
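One way to fill in the cell above, assuming you use scikit-learn's `preprocessing.StandardScaler`, is to fit the scaler on the training inputs only and then apply the same transformation to the test inputs. The NumPy version below spells out the arithmetic the scaler performs (subtract the training mean, divide by the training standard deviation with `ddof=0`); the random arrays are stand-ins for the `X_train`/`X_test` split, not the notebook's data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_train = rng.random((750, 1))     # stand-ins for the train/test split
X_test = rng.random((250, 1))

# Statistics come from the training set only, to avoid leaking test information
mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0)        # population std (ddof=0), as StandardScaler uses

X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma  # same mu and sigma applied to the test set

print(X_train_std.mean(axis=0), X_train_std.std(axis=0))
```

The standardized splits can then be passed to `polyNN()` in place of the original `data`.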


## Deep Learning

*Deep learning* refers to stacking multiple hidden layers of neurons. Holding the number of parameters constant, a deep network often performs better than one with only a single hidden layer.

In [None]:
#Write a function polyDNN() that has a variable number of hidden layers
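One possible shape for `polyDNN()` is sketched below in NumPy as a forward pass only (no training), so the stacking is explicit: `layers` hidden `relu` layers of `hidden_count` neurons each, followed by a linear output. The helper name `dnn_forward` and the Gaussian initialization are hypothetical choices for illustration. In Keras, the analogous structure would come from applying `Dense(hidden_count, activation='relu')` in a loop before the output layer, in the style of `polyNN()`.

```python
import numpy as np

def dnn_forward(X, hidden_count=10, layers=2, seed=0):
    """Hypothetical helper: forward pass through `layers` relu Dense layers
    of width `hidden_count`, then a linear Dense(1). Returns (output, n_params)."""
    rng = np.random.default_rng(seed)
    a = X
    n_params = 0
    for _ in range(layers):
        W = rng.normal(size=(a.shape[1], hidden_count))
        b = np.zeros(hidden_count)
        a = np.maximum(0.0, a @ W + b)      # relu hidden layer
        n_params += W.size + b.size
    W_out = rng.normal(size=(hidden_count, 1))
    b_out = np.zeros(1)
    n_params += W_out.size + b_out.size
    return a @ W_out + b_out, n_params      # linear output layer

X = np.random.default_rng(1).random((5, 1))
out, n = dnn_forward(X, hidden_count=10, layers=2)
print(out.shape, n)   # (5, 1) 141
```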



Let us run the model through various settings. I have chosen the neuron count such that the number of parameters is roughly the same as in the single-layer case.

In [None]:
polyDNN(data,hidden_count=1,layers=2)
polyDNN(data,hidden_count=10,layers=2)
polyDNN(data,hidden_count=15,layers=2)
polyDNN(data,hidden_count=36,layers=2)
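The parameter matching can be checked by counting weights and biases. With one input and two hidden layers of width $h$, the count is $2h + (h^2 + h) + (h + 1) = h^2 + 4h + 1$, versus $3h + 1$ for a single hidden layer of width $h$. The pairing of each two-layer width with a single-layer width below is one reading of "roughly the same", chosen as the closest single-layer setting by parameter count:

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected (Dense) layer."""
    return n_in * n_out + n_out

def deep_params(hidden_count, layers, n_in=1):
    """Input -> `layers` x Dense(hidden_count) -> Dense(1)."""
    total = dense_params(n_in, hidden_count)
    total += (layers - 1) * dense_params(hidden_count, hidden_count)
    return total + dense_params(hidden_count, 1)

# (two-layer width, single-layer width with a similar parameter count)
pairs = [(1, 1), (10, 50), (15, 100), (36, 500)]
for h_deep, h_single in pairs:
    print(h_deep, deep_params(h_deep, 2), "vs", h_single, deep_params(h_single, 1))
```

This gives 6 vs 4, 141 vs 151, 286 vs 301, and 1441 vs 1501 parameters, respectively.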