# Tutorial 3
## Outline
* Numba
* Neural network in matrix notation
* Back propagation
* Activation functions
* Q&A on HW#2


## Numba and Code Acceleration

Numba compiles Python functions to machine code just in time, on their first call, so that numerical loops can execute much more efficiently.<br>
[Numba documentation](http://numba.pydata.org/numba-doc/latest/user/index.html)

In [3]:
import numba
import numpy as np


In [4]:
# @numba.jit(nopython=True)
def test():
    i=0
    for a in range(100000):
        i+=a
    return i

%timeit test()

7.57 ms ± 565 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [7]:
@numba.jit(nopython=True)
def test():
    i=0
    for a in range(100000):
        i+=a
    return i

%timeit test()

96.7 ns ± 4.35 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [8]:
@numba.jit(nopython=True)
def test():
    return np.sum(np.arange(1,100000))

%timeit test()

49.4 µs ± 2.96 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
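
`%timeit` averages over many calls, so it hides the one-time compilation cost paid on the first call of a jitted function. The sketch below is a rough way to see that cost, under a few assumptions: the function name `loop_sum` is made up for illustration, `numba.njit` is used as shorthand for `numba.jit(nopython=True)`, and a plain-Python fallback is added only so the cell still runs where Numba is not installed. The roughly 100 ns timing of the jitted loop above also suggests that the compiler reduced the loop to a few instructions instead of iterating 100000 times; that is an inference from the timing, not something measured here.

```python
import time

try:
    from numba import njit
except ImportError:
    # Fallback (assumption): without Numba, run the plain Python function
    def njit(func):
        return func


@njit
def loop_sum(n):
    total = 0
    for a in range(n):
        total += a
    return total


# First call: includes compilation when Numba is available
start = time.perf_counter()
first = loop_sum(100000)
first_call = time.perf_counter() - start

# Second call: reuses the already compiled machine code
start = time.perf_counter()
second = loop_sum(100000)
second_call = time.perf_counter() - start

print(f"first call:  {first_call:.6f} s")
print(f"second call: {second_call:.6f} s")
```

When benchmarking by hand rather than with `%timeit`, calling the function once before timing keeps the compilation out of the measurement.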


## Neural network in matrix notation
![Neural network](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/img/example_network.svg) <br>




### Back propagation formula
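The four fundamental equations of back propagation, where $C$ is the cost, $z^l$ and $a^l=\sigma(z^l)$ are the weighted input and the activation of layer $l$, $\delta^l$ is the error of layer $l$, $L$ is the output layer, and $\odot$ is the element-wise product: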
$$\begin{eqnarray}\delta^L&=&\nabla_aC\odot\sigma'(z^L) \\
\delta^l&=&((w^{l+1})^T\delta^{l+1})\odot\sigma'(z^l) \\ \frac{\partial C}{\partial b_j^l}&=&\delta_j^l \\
\frac{\partial C}{\partial w_{jk}^l}&=&a_k^{l-1}\delta_j^l
\end{eqnarray}$$

Credit: [Neural Networks and Deep Learning, Ch. 2](http://neuralnetworksanddeeplearning.com/chap2.html)
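
As a rough illustration of how the four equations map onto NumPy, here is a minimal sketch for a single training sample. Several things are assumptions made for the example and are not fixed by this tutorial: the quadratic cost $C=\frac{1}{2}\|a^L-y\|^2$ (so that $\nabla_aC=a^L-y$), the sigmoid activation, the `[3, 4, 2]` layer sizes, and the function names `forward` and `backprop`.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)


def forward(weights, biases, x):
    """Return the weighted inputs z and activations a of every layer."""
    zs, activations = [], [x]
    for w, b in zip(weights, biases):
        zs.append(w @ activations[-1] + b)
        activations.append(sigmoid(zs[-1]))
    return zs, activations


def backprop(weights, biases, x, y):
    """Gradients of the assumed cost C = 0.5 * ||a^L - y||^2 for one sample."""
    zs, activations = forward(weights, biases, x)
    # Equation 1: error of the output layer (grad_a C = a^L - y for this cost)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    grad_w = [None] * len(weights)
    grad_b = [None] * len(biases)
    for l in range(len(weights) - 1, -1, -1):
        grad_b[l] = delta                             # Equation 3
        grad_w[l] = np.outer(delta, activations[l])   # Equation 4
        if l > 0:
            # Equation 2: propagate the error one layer back
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grad_w, grad_b


rng = np.random.default_rng(0)
sizes = [3, 4, 2]  # hypothetical architecture
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [rng.standard_normal(n_out) for n_out in sizes[1:]]
x = rng.standard_normal(3)
y = np.array([0.0, 1.0])

grad_w, grad_b = backprop(weights, biases, x, y)
print([g.shape for g in grad_w])
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check before plugging the gradients into an update step.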

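The forward pass from layer $l-1$ to layer $l$, element-wise and in matrix form, where $h^l$ is the vector of activations of layer $l$ and $\sigma$ is the activation function:
$$h_j^l = \sigma\left(\sum_{i=1}^{n^{l-1}} w_{ji}^{l-1} h_i^{l-1} + b_j^{l-1}\right)$$
$$h^l = \sigma\left(w^{l - 1} h^{l - 1} + b^{l-1}\right)$$
Here the weights are indexed by the layer they leave, as in `self.weights[l - 1]` in the code below, so $w^{l-1}$ plays the role of $w^l$ in the back propagation equations.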
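
The weighted sum over the nodes of the previous layer and the matrix-vector product are the same computation. A small sketch can confirm this numerically; the layer sizes, the random values, and the choice of `np.tanh` as the activation are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_curr = 3, 4                       # hypothetical layer sizes
w = rng.standard_normal((n_curr, n_prev))   # w[j, i]: weight from node i to node j
b = rng.standard_normal(n_curr)             # one bias per node in the current layer
h_prev = rng.standard_normal(n_prev)        # activations of the previous layer

# Element-wise form: one weighted sum per node j
h_loop = np.zeros(n_curr)
for j in range(n_curr):
    total = b[j]
    for i in range(n_prev):
        total += w[j, i] * h_prev[i]
    h_loop[j] = np.tanh(total)

# Matrix form: a single matrix-vector product
h_mat = np.tanh(w @ h_prev + b)

print(np.allclose(h_loop, h_mat))
```

Storing the weights with shape `(current layer, previous layer)` is what lets the matrix form be written as a plain product with no transpose.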

In [None]:
import numpy as np

# Architecture as in the diagram above, given as a list of layer sizes:
# [3, 4, 2]

class NN():
    def __init__(self, architecture, learning_rate, activation_function):
        #initialize the model
        self.arch = architecture
        self.activation = activation_function
        self.learning_rate = learning_rate
        self.depth = len(self.arch)
    
    def init_weights(self):
        self.weights = []
        self.biases = []
        for l in range(self.depth - 1):
            prev_layer_number = self.arch[l]
            current_layer_number = self.arch[l + 1]
            # Tip: for the homework, initialize the weights with a random matrix rather than zeros
            self.weights.append(np.zeros((current_layer_number, prev_layer_number)))
            self.biases.append(np.zeros(current_layer_number))
    
    def feed_forward(self, x):
        self.z_s = []
        self.a_s = [x]
        for l in range(self.depth - 1):
            z_l = self.weights[l].dot(self.a_s[-1]) + self.biases[l]
            a_l = self.activation(z_l)
            self.z_s.append(z_l)
            self.a_s.append(a_l)
            
        return self.a_s[-1]
    
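    def calc_error(self, y, activation_grad):
        # TODO: compute the error (delta) of every layer with the back propagation formulas
        self.errors = []
    
    def calc_grad(self):
        # TODO: compute the gradients from self.errors and self.a_s
        # Zero placeholders keep the class runnable until this is implemented
        self.weights_grad = [np.zeros_like(w) for w in self.weights]
        self.biases_grad = [np.zeros_like(b) for b in self.biases]
        
    def back_prop(self):
        # Gradient descent step: move each parameter against its gradient
        for l in range(self.depth - 1):
            self.weights[l] = self.weights[l] - self.learning_rate * self.weights_grad[l]
            self.biases[l] = self.biases[l] - self.learning_rate * self.biases_grad[l]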
    
    def fit(self, x, y, activation_grad):
        self.feed_forward(x)
        self.calc_error(y, activation_grad)
        self.calc_grad()
        self.back_prop()
    
    def predict(self, x):
        return self.feed_forward(x)
    
        

In [None]:
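np.random.seed(0)
# Assumed for illustration (the original cell leaves these undefined):
# tanh/tanh_grad helpers, a random 6-feature sample x, a 2-value target y, learning_rate=0.1
tanh = np.tanh
def tanh_grad(z):
    return 1 - np.tanh(z) ** 2
x = np.random.rand(6)
y = np.array([0.0, 1.0])
nn = NN([6, 2, 2], learning_rate=0.1, activation_function=tanh)
nn.init_weights()
print("Initialized prediction:", nn.predict(x))
nn.fit(x, y, tanh_grad)
print("Error in nodes", nn.errors)
print("Prediction after fitting once:", nn.predict(x))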

## Activation functions
### Linear
$y=x$
<br>$y'=1$ <br>

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline
x=np.linspace(-5,5,2000)
y=x
plt.plot(x,y)
plt.title("Linear activation");

### tanh
$y=\tanh(x)$
<br>$y\in(-1,1)$
<br>$y'=1-y^2$
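
The derivative formula can be checked against a central finite difference. This is a minimal sketch; the helper name `tanh_grad`, the grid of test points, and the step size `h` are choices made for the example.

```python
import numpy as np


def tanh_grad(z):
    """Derivative of tanh at z, written as 1 - y^2 with y = tanh(z)."""
    y = np.tanh(z)
    return 1.0 - y ** 2


z = np.linspace(-3, 3, 13)
h = 1e-5
# Central difference approximation of the derivative
numeric = (np.tanh(z + h) - np.tanh(z - h)) / (2 * h)

print(np.max(np.abs(numeric - tanh_grad(z))))
```

Because the derivative depends only on $y$, the value already computed in the forward pass can be reused in the backward pass instead of evaluating `tanh` again.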



In [None]:
x=np.linspace(-5,5,2000)
y=np.tanh(x)
plt.plot(x,y)
plt.title("tanh");

### sigmoid
$y={\displaystyle \frac{1}{1+e^{-x}} }$
<br><br>$y\in(0,1)$
<br>$y'=y(1-y)$<br>
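
A short sketch of the sigmoid and its derivative at a few points shows how quickly the gradient shrinks as $|x|$ grows, which is one reason saturated sigmoid units learn slowly. The helper names and the sample points are assumptions made for the example.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sigmoid_grad(x):
    # y' = y * (1 - y), reusing the forward value y
    y = sigmoid(x)
    return y * (1.0 - y)


for x in [0.0, 2.0, 5.0, 10.0]:
    print(f"x = {x:5.1f}   y = {sigmoid(x):.5f}   y' = {sigmoid_grad(x):.5f}")
```

The derivative reaches its maximum of 0.25 at $x=0$ and is already below $10^{-4}$ at $x=10$.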

In [None]:
x=np.linspace(-5,5,2000)
y=1/(1+np.exp(-x))
plt.plot(x,y)
plt.title("sigmoid");

### ReLU
$y=\begin{cases} x & x\geqslant 0 \\ 0 & x<0 \end{cases}$
<br><br>$y\in[0,\infty)$
<br><br>$y'=\begin{cases} 1 & x\geqslant 0 \\ 0 & x<0 \end{cases}$
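
A minimal sketch of ReLU and its derivative as a pair of NumPy functions, following the convention above of taking the derivative to be 1 at $x=0$. The names `relu` and `relu_grad` are made up for the example.

```python
import numpy as np


def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)


def relu_grad(x):
    # 1 where x >= 0, 0 elsewhere (the value at x = 0 is a convention)
    return (np.asarray(x) >= 0).astype(float)


x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
print(relu_grad(x))
```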


In [None]:
x=np.linspace(-5,5,2000)
y=x*(x>=0)
plt.plot(x,y)
plt.title("ReLU (Rectified Linear Unit)");

### softmax
$y_i=f_i(\vec{x})={\displaystyle \frac{e^{x_i}}{\sum_{j=1}^J e^{x_j}}}$
<br>$y_i\in(0,1)$, $\sum_i y_i=1$
<br><br>${\displaystyle \frac{\partial y_i}{\partial x_j}=y_i(\delta_{ij}-y_j)}$
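
Unlike the activations above, softmax acts on a whole vector, so its derivative is a matrix. The sketch below builds that Jacobian from the formula; the function names are made up for the example, and subtracting the maximum before exponentiating is a common numerical-stability trick added here, not part of the formula.

```python
import numpy as np


def softmax(x):
    # Subtracting max(x) leaves the result unchanged but avoids overflow in exp
    e = np.exp(x - np.max(x))
    return e / e.sum()


def softmax_jacobian(x):
    # J[i, j] = dy_i/dx_j = y_i * (delta_ij - y_j)
    y = softmax(x)
    return np.diag(y) - np.outer(y, y)


x = np.array([1.0, 2.0, 3.0])
y = softmax(x)
J = softmax_jacobian(x)

print(y, y.sum())
print(J.sum(axis=0))
```

Each column of the Jacobian sums to zero (up to rounding), which reflects the constraint that the outputs always sum to 1.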


### Many more activation functions
[Check here](https://en.wikipedia.org/wiki/Activation_function)

## Some useful materials for better understanding NN
[Neural Networks, Manifolds, and Topology - Colah's blog](http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/)<br>
[How the backpropagation algorithm works - Neural Networks and Deep Learning](http://neuralnetworksanddeeplearning.com/chap2.html)