
In [None]:
%matplotlib inline
from IPython.display import HTML, display

def set_background(color):    
    script = (
        "var cell = this.closest('.jp-CodeCell');"
        "var editor = cell.querySelector('.jp-Editor');"
        "editor.style.background='{}';"
        "this.parentNode.removeChild(this)"
    ).format(color)
    
    display(HTML('<img src onerror="{}" style="display:none">'.format(script)))

# Neural Networks

A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a **biological** neural network, made up of biological neurons, or an **artificial** neural network, used for solving artificial intelligence (AI) problems.<br>

So, each node represents one neuron. The value coming out of neuron 2 in the hidden layer is given by:<br>
$a_{hidden,2} = activation_{function}(w_{1}a_{input,1} + w_{2}a_{input,2} + bias)$<br>
where $a$ stands for the **activation** (amplitude/magnitude/value) coming out of the neuron, which is determined by the $activation_{function}$.<br>
Some [common activation](https://en.wikipedia.org/wiki/Activation_function#Table_of_activation_functions) functions:<br>


1.   Identity. $Identity(x) = x$
2.   Binary step. $Binary_{step}(x) = \begin{cases} 
      0 & x\leq 0 \\
      1 & x > 0
   \end{cases}$
3. Logistic (or sigmoid, $\sigma$, soft step). $\sigma(x) = \frac{1}{1+e^{-x}}$
4. Others (**ReLU**, **tanh**):
![Common activation functions](https://drive.google.com/uc?export=view&id=1xe8pJv_gAm0kIBgccWVZa0UCD1t56_Kh)
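
The neuron formula and the activation functions listed above can be sketched in a few lines of plain Python. This is only an illustration: the input values, weights and bias below are made-up numbers, and the helper name `neuron_activation` is hypothetical.

```python
import math

def identity(x):
    return x

def binary_step(x):
    # 0 for x <= 0, 1 for x > 0
    return 1.0 if x > 0 else 0.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def neuron_activation(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus the bias, passed through the activation."""
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    return activation(z)

# Hypothetical values, chosen only for illustration
inputs = [0.5, -1.0]   # a_input,1 and a_input,2
weights = [0.8, 0.2]   # w_1 and w_2
bias = 0.1

for fn in (identity, binary_step, sigmoid, relu, math.tanh):
    print(fn.__name__, round(neuron_activation(inputs, weights, bias, fn), 4))
```

The weighted sum is the same in every case; only the activation function changes what the neuron passes on to the next layer.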

## Recurrent Neural Networks
References: [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/) on colah's blog.<br>
1. A traditional neural network cannot use previous events to inform later ones. To be specific, a traditional neural network takes in all the information from the initial time to the end time at once, with no state carried from one time step to the next.
![Traditional neural network](https://drive.google.com/uc?export=view&id=1gQoLLUb8u4ZSWAzTenykECongVTcxNi-)
2. Recurrent neural networks address this issue. RNNs allow information from the past to persist.
![An unrolled recurrent neural network.](https://drive.google.com/uc?export=view&id=1CWZjuBwIiA50s6dEUki_CJNgeX7h27Ue)
3. The problem of long-term dependencies. Sometimes, we only need to look at recent information to perform the present task. For example, if we are trying to predict the last word in "the clouds are in the ***__***", it's pretty obvious the word is going to be ***sky***.<br>
Unfortunately, as the gap between the relevant information and the place where it's needed grows, RNNs become unable to learn to connect the information. For example, when predicting the last word in the text “I grew up in France… I speak fluent ***French***”, the clue (France) can lie far back in the text.
4. RNN, LSTM, GRU.<br>
Long Short Term Memory networks, usually just called “LSTMs”, are a special kind of RNN, capable of learning long-term dependencies. They were introduced by [Hochreiter & Schmidhuber (1997)](http://www.bioinf.jku.at/publications/older/2604.pdf).<br>
A slightly more dramatic variation on the LSTM is the Gated Recurrent Unit, or GRU, introduced by [Cho, et al. (2014)](http://arxiv.org/pdf/1406.1078v3.pdf).
![Simple RNN structure](https://drive.google.com/uc?export=view&id=1a3mEns6mvwkySJ0bdcYgJimwKBHzBgfg)
![LSTM structure](https://drive.google.com/uc?export=view&id=13fNKVSLdTJctnw5vM8wYQljJPA-NlJEq)
![GRU structure](https://drive.google.com/uc?export=view&id=1_Uc6o7v1u_7ea1szvAV5ABjs74-mkTbt)
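
To make the idea of information persisting more concrete, here is a minimal sketch of the recurrence inside a simple RNN cell. It assumes the common textbook form $h_t = \tanh(x_t W_{xh} + h_{t-1} W_{hh} + b_h)$; the sizes and the randomly initialised weights are hypothetical, and this is not the exact cell drawn in the figures.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration
input_dim, hidden_dim, seq_len = 3, 4, 5

# Randomly initialised parameters of a single recurrent cell
W_xh = torch.randn(input_dim, hidden_dim) * 0.1
W_hh = torch.randn(hidden_dim, hidden_dim) * 0.1
b_h = torch.zeros(hidden_dim)

x = torch.randn(seq_len, input_dim)  # one sequence with 5 time steps
h = torch.zeros(hidden_dim)          # initial hidden state

states = []
for t in range(seq_len):
    # The new state depends on the current input AND on the previous state,
    # which is how information from earlier time steps is carried forward
    h = torch.tanh(x[t] @ W_xh + h @ W_hh + b_h)
    states.append(h)

states = torch.stack(states)
print(states.shape)  # torch.Size([5, 4])
```

Because the same `W_hh` is applied at every step, the influence of an early input has to survive many repeated multiplications, which is one way to see why long gaps are hard for a plain RNN and why gated cells such as LSTM and GRU were introduced.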


Defining a Neural Network in PyTorch
====================================
Deep learning uses artificial neural networks (models), which are
computing systems that are composed of many layers of interconnected
units. By passing data through these interconnected units, a neural
network is able to learn how to approximate the computations required to
transform inputs into outputs. In PyTorch, neural networks can be
constructed using the ``torch.nn`` package.

Introduction
------------
PyTorch provides elegantly designed modules and classes, including
``torch.nn``, to help you create and train neural networks. An
``nn.Module`` contains layers, and a method ``forward(input)`` that
returns the ``output``.
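
As a rough illustration of that contract, the sketch below defines the smallest useful module: one layer created in ``__init__`` and a ``forward`` method that returns the output. The class name ``TinyNet`` and the layer sizes are hypothetical and are not part of the recipe.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical name, for illustration only
    def __init__(self):
        super().__init__()
        # Layers assigned as attributes are registered with the module
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        # forward(input) returns the output
        return self.linear(x)

tiny = TinyNet()
out = tiny(torch.rand(3, 4))  # calling the module runs forward()
print(out.shape)              # torch.Size([3, 2])
```

Calling the module object directly, rather than calling ``forward`` yourself, is the usual pattern because it also runs any registered hooks.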

In this recipe, we will use ``torch.nn`` to define a neural network
intended for the `MNIST
dataset <https://pytorch.org/docs/stable/torchvision/datasets.html#mnist>`__.

Setup
-----
Before we begin, we need to install ``torch`` if it isn’t already
available.

::

   pip install torch
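
A quick way to confirm that the installation worked is to import the package and create a tensor. This is only a sanity check; the version string and the CUDA result depend on your machine.

```python
import torch

print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True only if a usable GPU is detected

x = torch.rand(2, 3)              # a small random tensor
print(x.shape)                    # torch.Size([2, 3])
```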


Steps
-----

1. Import all necessary libraries for loading our data
2. Define and initialize the neural network
3. Specify how data will pass through your model
4. [Optional] Pass data through your model to test

1. Import necessary libraries for loading our data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For this recipe, we will use ``torch`` and its submodules ``torch.nn``
and ``torch.nn.functional``.




In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
# torch.nn defines nn.Module classes, while torch.nn.functional takes a
# functional (stateless) approach.
# To dig a bit deeper: nn.Modules are Python classes with attributes,
# e.g. an nn.Conv2d module holds internal attributes such as self.weight.
# F.conv2d, by contrast, only defines the operation and needs all arguments
# to be passed in explicitly (including the weights and bias).
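
The difference between the two styles can be checked directly. The sketch below, with arbitrary tensor sizes, runs the same convolution once through an ``nn.Conv2d`` module and once through ``F.conv2d`` using the module's own weight and bias.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.rand(1, 1, 5, 5)  # arbitrary input: batch of 1, 1 channel, 5x5

# Module style: the layer owns its weight and bias
conv = nn.Conv2d(in_channels=1, out_channels=2, kernel_size=3)
out_module = conv(x)

# Functional style: the same operation, with weight and bias passed explicitly
out_functional = F.conv2d(x, conv.weight, conv.bias)

print(torch.allclose(out_module, out_functional))  # True
```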

2. Define and initialize the neural network
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Our network will recognize images. We will use a process built into
PyTorch called convolution. Convolution adds each element of an image to
its local neighbors, weighted by a kernel, or a small matrix, that
helps us extract certain features (like edge detection, sharpness,
blurriness, etc.) from the input image.
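
To see what a kernel does, the sketch below applies a hand-picked 3x3 kernel to a tiny made-up image with ``F.conv2d``. The image and the Sobel-like kernel are assumptions chosen for illustration; in the network itself the kernel values are learned. Note that PyTorch slides the kernel without flipping it (cross-correlation).

```python
import torch
import torch.nn.functional as F

# A 4x5 "image": dark (0) on the left, bright (1) on the right,
# shaped as (batch, channels, height, width)
image = torch.tensor([[0., 0., 0., 1., 1.],
                      [0., 0., 0., 1., 1.],
                      [0., 0., 0., 1., 1.],
                      [0., 0., 0., 1., 1.]]).reshape(1, 1, 4, 5)

# A 3x3 kernel that responds to vertical edges (Sobel-like),
# shaped as (out_channels, in_channels, height, width)
kernel = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)

edges = F.conv2d(image, kernel)
print(edges.shape)  # torch.Size([1, 1, 2, 3])
print(edges[0, 0])  # 0 over the flat region, 4 where the edge is covered
```

Each output value is the sum of the pixels under the kernel, weighted by the kernel entries, so flat regions cancel to zero while the dark-to-bright transition produces a strong response.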

There are two requirements for defining the ``Net`` class of your model.
The first is writing an **``__init__``** function that references
``nn.Module``. This function is where you define the layers of your
neural network (here convolutional, dropout and fully connected layers).

Using convolution, we will define our model to take 1 input image
channel and produce 10 outputs, one for each label representing the
digits 0 through 9. The architecture is yours to design; here we follow
a standard MNIST architecture.




In [None]:
class Net(nn.Module):
    def __init__(self):
      super(Net, self).__init__()

      # First 2D convolutional layer, taking in 1 input channel (image),
      # outputting 32 convolutional features, with a square kernel size of 3
      self.conv1 = nn.Conv2d(1, 32, 3, 1)
      # Second 2D convolutional layer, taking in the 32 feature maps from conv1,
      # outputting 64 convolutional features, with a square kernel size of 3
      self.conv2 = nn.Conv2d(32, 64, 3, 1)

      # Dropout2d zeroes entire feature maps (channels) with the given
      # probability, so adjacent pixels are either all 0s or all active
      self.dropout1 = nn.Dropout2d(0.25)
      # Plain element-wise Dropout for flattened (2-D) features; recent
      # PyTorch versions warn when Dropout2d receives a 2-D input
      self.dropout2 = nn.Dropout(0.5)

      # First fully connected layer
      self.fc1 = nn.Linear(9216, 128)
      # Second fully connected layer that outputs our 10 labels
      self.fc2 = nn.Linear(128, 10)

my_nn = Net()
print(my_nn)
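
The input size of 9216 for the first fully connected layer is not arbitrary. The sketch below traces the tensor shapes for one 28x28 image, assuming that a 2x2 max pooling follows the second convolution, as in the forward pass used later in this recipe. The standalone `conv1` and `conv2` here simply mirror the layer settings of `Net`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 32, 3, 1)
conv2 = nn.Conv2d(32, 64, 3, 1)

x = torch.rand(1, 1, 28, 28)  # one MNIST-sized image
x = conv1(x)                  # 28 - 3 + 1 = 26 -> (1, 32, 26, 26)
x = conv2(x)                  # 26 - 3 + 1 = 24 -> (1, 64, 24, 24)
x = F.max_pool2d(x, 2)        # 24 / 2 = 12     -> (1, 64, 12, 12)
flat = torch.flatten(x, 1)    # 64 * 12 * 12 = 9216 features per image
print(flat.shape)             # torch.Size([1, 9216])
```

If the kernel size, the pooling, or the input resolution changes, the first argument of `nn.Linear(9216, 128)` has to change with it.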

We have finished defining our neural network; now we have to define how
our data will pass through it.

3. Specify how data will pass through your model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When you use PyTorch to build a model, you just have to define the
``forward`` function, which passes the data through the computation graph
(i.e. our neural network). This represents our feed-forward
algorithm.

You can use any of the Tensor operations in the ``forward`` function.
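
As a small illustration of that freedom, the hypothetical module below mixes a layer with ordinary tensor operations (clamping, a mean, broadcasting) inside ``forward``. The class name and sizes are made up for this sketch; autograd still tracks every step.

```python
import torch
import torch.nn as nn

class TensorOpsNet(nn.Module):  # hypothetical module, for illustration only
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x):
        # Ordinary tensor operations can be mixed freely with layers
        y = self.linear(x)
        y = torch.clamp(y, min=0.0)      # same effect as ReLU
        y = y.mean(dim=1, keepdim=True)  # reduce over the feature dimension
        return x + y                     # broadcasting adds the mean to each feature

model = TensorOpsNet(4)
out = model(torch.rand(2, 4))
out.sum().backward()  # gradients flow back through all operations in forward
print(out.shape)      # torch.Size([2, 4])
```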




In [None]:
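# Assumed definitions: this cell uses `device` and `drop_prob_` without
# defining them. The value 0.2 is a placeholder, not the author's setting.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
drop_prob_ = 0.2

class OneHiddenLayerNet(nn.Module):
    # n_layers and drop_prob are accepted but not used by this model
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=drop_prob_):
        super().__init__()
        
        # Inputs to hidden layer linear transformation
        self.hidden = nn.Linear(input_dim, hidden_dim)
        # Output layer with output_dim units (e.g. 10, one for each digit)
        self.output = nn.Linear(hidden_dim, output_dim)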
        self.relu = nn.ReLU()
  
    def forward(self, x):
        # Pass the input tensor through each of our operations
        x = self.hidden(x)
        x = self.output(self.relu(x))
        return x

class LSTMNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=drop_prob_):
        super(LSTMNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        
        self.lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        
    def forward(self, x, h):
        out, h = self.lstm(x, h)
        out = self.fc(self.relu(out[:,-1]))
        return out, h
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device),
                  weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device))
        return hidden

class GRUNet(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, n_layers, drop_prob=drop_prob_):
        super(GRUNet, self).__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        
        self.gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True, dropout=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_dim)
        self.relu = nn.ReLU()
        
    def forward(self, x, h):
        out, h = self.gru(x, h)
        out = self.fc(self.relu(out[:,-1]))
        return out, h
    
    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        hidden = weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().to(device)
        return hidden

class Net(nn.Module):
    def __init__(self):
      super(Net, self).__init__()
      self.conv1 = nn.Conv2d(1, 32, 3, 1)
      self.conv2 = nn.Conv2d(32, 64, 3, 1)
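      self.dropout1 = nn.Dropout2d(0.25)
      # Plain Dropout: dropout2 is applied to the flattened (2-D) features,
      # and recent PyTorch versions warn when Dropout2d receives a 2-D input
      self.dropout2 = nn.Dropout(0.5)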
      self.fc1 = nn.Linear(9216, 128)
      self.fc2 = nn.Linear(128, 10)

    # x represents our data
    def forward(self, x):
      # Pass data through conv1
      x = self.conv1(x)
      # Use the rectified-linear activation function over x
      x = F.relu(x)

      x = self.conv2(x)
      x = F.relu(x)

      # Run max pooling over x
      x = F.max_pool2d(x, 2)
      # Pass data through dropout1
      x = self.dropout1(x)
      # Flatten x with start_dim=1
      x = torch.flatten(x, 1)
      # Pass data through fc1
      x = self.fc1(x)
      x = F.relu(x)
      x = self.dropout2(x)
      x = self.fc2(x)

      # Apply log-softmax to x to get log-probabilities over the 10 labels
      output = F.log_softmax(x, dim=1)
      return output
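
The recurrent models above take a hidden state `h` in addition to the input. The sketch below uses `nn.LSTM` and `nn.GRU` directly, with hypothetical sizes, to show the shapes involved: the LSTM state is a tuple (hidden state, cell state), the GRU state is a single tensor, and `out[:, -1]` selects the output of the last time step.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration
batch_size, seq_len, input_dim, hidden_dim, n_layers = 2, 7, 5, 8, 2

x = torch.rand(batch_size, seq_len, input_dim)  # batch_first=True layout

lstm = nn.LSTM(input_dim, hidden_dim, n_layers, batch_first=True)
gru = nn.GRU(input_dim, hidden_dim, n_layers, batch_first=True)

# Initial states have shape (n_layers, batch_size, hidden_dim)
h0 = torch.zeros(n_layers, batch_size, hidden_dim)
c0 = torch.zeros(n_layers, batch_size, hidden_dim)

lstm_out, (h_n, c_n) = lstm(x, (h0, c0))  # LSTM: tuple of (hidden, cell)
gru_out, g_n = gru(x, h0)                 # GRU: a single hidden tensor

print(lstm_out.shape)         # torch.Size([2, 7, 8]): one output per time step
print(lstm_out[:, -1].shape)  # torch.Size([2, 8]): last time step only
print(h_n.shape, g_n.shape)   # both torch.Size([2, 2, 8])
```

This mirrors what `init_hidden` returns in `LSTMNet` (two zero tensors) and in `GRUNet` (one zero tensor), and why both `forward` methods feed `out[:, -1]` to the final linear layer.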

4. [Optional] Pass data through your model to test
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To ensure we receive our desired output, let’s test our model by passing
some random data through it.




In [None]:
# Equates to one random 28x28 image
random_data = torch.rand((1, 1, 28, 28))

my_nn = Net()
result = my_nn(random_data)
print(result)

tensor([[-2.2414, -2.2682, -2.3167, -2.2478, -2.4035, -2.2451, -2.2415, -2.3222,
         -2.3839, -2.3740]], grad_fn=<LogSoftmaxBackward0>)




Each of the 10 numbers in the resulting tensor is the log-probability the
model assigns to the corresponding digit label (0 through 9) for the random
input; the largest value marks the predicted label.
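
A short sketch of how such an output can be read: exponentiating the log-softmax values gives probabilities that sum to 1, and ``argmax`` picks the predicted label. The random ``logits`` tensor below is a stand-in for the output of the last linear layer, not the result of the actual model.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

logits = torch.randn(1, 10)               # stand-in for the output of fc2
log_probs = F.log_softmax(logits, dim=1)  # same form as the model's output

probs = log_probs.exp()                   # back to probabilities
print(probs.sum(dim=1))                   # sums to 1, up to rounding

predicted = log_probs.argmax(dim=1)       # index of the most likely label
print(predicted)
```

The values near -2.3 in the printed tensor above are close to ln(1/10) ≈ -2.30, which is what you would expect from an untrained network that treats all 10 labels as roughly equally likely.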

Congratulations! You have successfully defined a neural network in
PyTorch.

Learn More
----------

Take a look at these other recipes to continue your learning:

- `What is a state_dict in PyTorch <https://pytorch.org/tutorials/recipes/recipes/what_is_state_dict.html>`__
- `Saving and loading models for inference in PyTorch <https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_models_for_inference.html>`__

