
Date: Monday, Sept. 21, 2020
## Artificial Neural Networks Part 1

### Example 1 - Build a simple ANN algorithm
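We can think of the bias as an offset for the model: it is handled as a constant input of 1 in the first column of `x`, so its weight `w[0]` shifts the weighted sum up or down.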

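```python
import math

# data: each row is one sample; the first column is the constant bias input (1)
x = [[1, 0.1, -0.2],
     [1, -0.1, 0.9],
     [1, 1.2, 0.1],
     [1, 1.1, 1.5]]

# labels (desired output)
t = [0, 0, 0, 1]

# initial weights; w[0] multiplies the bias column
w = [1, -1, 1]
# w = [-6, 2.6, 4]  # alternative starting point, close to the trained solution

iteration = 50  # number of passes (epochs) over the data
learning = 10   # learning rate

def simple_ann(x, w, t, iteration, learning):
    """Train a single sigmoid neuron with per-sample gradient descent.

    Note: w is updated in place. Returns the outputs y from the last
    epoch, the final weights, and the error recorded after each epoch.
    """
    E = []

    for _ in range(iteration):
        y = []
        err = []

        for n in range(len(x)):
            # weighted sum over the columns (features) of row n
            v = 0
            for p in range(len(x[0])):
                v = v + x[n][p] * w[p]
            y.append(1 / (1 + math.e**(-v)))  # sigmoid activation

            # squared error for this sample
            err.append((y[n] - t[n])**2)

            # gradient descent step, applied right after each sample;
            # the constant factor 2 of the derivative is absorbed into `learning`
            for p in range(len(w)):
                d = x[n][p] * (y[n] - t[n]) * (1 - y[n]) * y[n]
                w[p] = w[p] - learning * d

        # recorded error: 2 * (mean of the squared errors), i.e. twice the MSE
        E.append(2 * sum(err) / len(x))

    return (y, w, E)
```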

```python
(y, w, E) = simple_ann(x, w, t, iteration, learning)
print(y)  # outputs from the last epoch
print(w)  # trained weights
print(E)  # recorded error after each epoch
```

```
[0.0006555859880223744, 0.052128479309599574, 0.04486688739303947, 0.9484229735074757]
[-6.714065405238172, 2.6821640865189815, 4.525289511723171]
[0.9216700394753333, 0.47174135286641483, 0.4301829615348312, 0.4421539206947274, 0.5517151805512346, 0.24024312038969337, 0.04117720770325407, 0.016367680329803912, 0.014618707903067763, 0.013421342040779494, 0.01246489244609104, 0.01166518326866741, 0.010979611629285312, 0.010381661072788465, 0.009853313343121933, 0.009381662074157875, 0.008957103933209306, 0.008572278345583679, 0.008221407302216046, 0.007899865061899094, 0.007603887061594066, 0.007330366731150998, 0.0070767098102673745, 0.006840727443246292, 0.006620556129589401, 0.006414596714760084, 0.006221467162903266, 0.006039965491063455, 0.005869040319786843, 0.005707767217182989, 0.005555329508569304, 0.0054110025695456995, 0.005274140865918842, 0.005144167181055287, 0.005020563600874198, 0.004902863922786668, 0.004790647226983471, 0.004683532403159226, 0.004581173467666569, 0.004
```

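Note: with the initial weights [1, -1, 1], the accuracy is only 50% (two of the four samples land on the correct side when the sigmoid output is thresholded at 0.5). Tuning the weights improves this: after training, all four outputs fall on the correct side of 0.5.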

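To make the 50% figure concrete, here is a small sketch that compares the initial weights with the trained weights printed above. It assumes a decision threshold of 0.5 on the sigmoid output (the notes do not state one), and `predict` and `accuracy` are hypothetical helper names.

```python
import math

# Same data and labels as Example 1 (first column is the bias input).
x = [[1, 0.1, -0.2],
     [1, -0.1, 0.9],
     [1, 1.2, 0.1],
     [1, 1.1, 1.5]]
t = [0, 0, 0, 1]

def predict(x, w):
    """Sigmoid output for every sample, given weights w."""
    return [1 / (1 + math.exp(-sum(xi * wi for xi, wi in zip(row, w)))) for row in x]

def accuracy(x, w, t, threshold=0.5):
    """Fraction of samples whose thresholded output matches the label (assumed threshold)."""
    y = predict(x, w)
    correct = sum(int(yn >= threshold) == tn for yn, tn in zip(y, t))
    return correct / len(t)

print(accuracy(x, [1, -1, 1], t))  # → 0.5
# trained weights copied from the output of simple_ann above
print(accuracy(x, [-6.714065405238172, 2.6821640865189815, 4.525289511723171], t))  # → 1.0
```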
### Loss Function for ANN

Mean squared error (MSE):

$$ \text{Error} = \frac{1}{N} \sum_{i=1}^{N} (y_i-t_i)^2 $$

**$N$** is the number of training samples  
**$y_i$** is the predicted value  
**$t_i$** is the true value  

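As a sketch, the MSE formula translates directly into Python (`mse` is a hypothetical helper name). Applied to the final predictions printed for Example 1, it gives the error of the trained model. Note that `simple_ann` records `2*sum(err)/len(x)`, i.e. twice this quantity, and computes it while the weights are still changing within each epoch.

```python
def mse(y, t):
    """Mean squared error: (1/N) * sum over samples of (y_i - t_i)^2."""
    if len(y) != len(t):
        raise ValueError("y and t must have the same length")
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(y)

# Predictions printed after training in Example 1, against the labels t.
y = [0.0006555859880223744, 0.052128479309599574, 0.04486688739303947, 0.9484229735074757]
t = [0, 0, 0, 1]
print(mse(y, t))  # ≈ 0.00185
```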
Another common loss function is cross-entropy (CE):

$$ CE = -\frac{1}{N}\sum_{n=1}^{N}\sum_{k=1}^{K} t_{n,k}\log(y_{n,k}) $$

**$N$** is the number of training samples  
**$K$** is the number of classes  
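**$y_{n,k}$** is the model's predicted probability that sample $n$ belongs to class $k$  
**$t_{n,k}$** is the target label (1 if sample $n$ belongs to class $k$, else 0)  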

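A minimal sketch of the CE formula, assuming `y` and `t` are N × K nested lists with one-hot rows in `t`. The function name and the small `eps` guard against `log(0)` are illustrative choices, and the probabilities are made-up numbers.

```python
import math

def cross_entropy(y, t, eps=1e-12):
    """CE = -(1/N) * sum_n sum_k t[n][k] * log(y[n][k]).

    y holds predicted class probabilities, t holds one-hot targets.
    eps clips probabilities away from 0 so log() stays finite.
    """
    n_samples = len(y)
    total = 0.0
    for y_row, t_row in zip(y, t):
        for y_nk, t_nk in zip(y_row, t_row):
            total += t_nk * math.log(max(y_nk, eps))
    return -total / n_samples

# Two samples, three classes (made-up numbers for illustration).
y = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1]]
t = [[1, 0, 0],
     [0, 1, 0]]
print(cross_entropy(y, t))  # -(log 0.7 + log 0.8) / 2 ≈ 0.2899
```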
### Weight Tuning Options


*   Randomly pick weights until the model works
*   Change one weight at a time in the direction that reduces the error
*   **Gradient descent**

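The second option can be sketched as a greedy search that nudges one weight at a time and keeps a change only if the MSE drops. The step size 0.5, the 100 rounds, and the names `mse_for` and `one_weight_at_a_time` are arbitrary assumptions for illustration, not a method from the notes. Unlike gradient descent it needs no derivatives, but it spends extra error evaluations per trial and can stall once no single step of that size helps.

```python
import math

# Same data and labels as Example 1 (first column is the bias input).
x = [[1, 0.1, -0.2], [1, -0.1, 0.9], [1, 1.2, 0.1], [1, 1.1, 1.5]]
t = [0, 0, 0, 1]

def mse_for(w):
    """MSE of the sigmoid unit with weights w on (x, t)."""
    total = 0.0
    for row, tn in zip(x, t):
        v = sum(xi * wi for xi, wi in zip(row, w))
        y = 1 / (1 + math.exp(-v))
        total += (y - tn) ** 2
    return total / len(x)

def one_weight_at_a_time(w, step=0.5, rounds=100):
    """Try w[p] +/- step for each p; keep a change only if the error drops."""
    w = list(w)  # work on a copy so the caller's list is not modified
    for _ in range(rounds):
        for p in range(len(w)):
            for delta in (step, -step):
                trial = w.copy()
                trial[p] += delta
                if mse_for(trial) < mse_for(w):
                    w = trial
                    break  # move on to the next weight
    return w

w0 = [1, -1, 1]
w_new = one_weight_at_a_time(w0)
print(mse_for(w0), mse_for(w_new))
```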


### Sigmoid (Logistic) Activation

$$ \sigma(x) = \frac{1}{1+e^{-x}} = \frac{e^x}{e^x + 1} $$

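The factor $(1-y)\,y$ in the weight update comes from the sigmoid's derivative, $\sigma'(x) = \sigma(x)\,(1-\sigma(x))$. A minimal sketch that checks this closed form against a central finite difference; `sigmoid` and `sigmoid_grad` are hypothetical helper names.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z))."""
    return 1 / (1 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1 - s)

# Compare the closed form with a central finite difference.
h = 1e-6
for z in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
    print(z, sigmoid_grad(z), numeric)
```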
### Backpropagation: MSE
$$ \text{Error} = \frac{1}{N} \sum_{i=1}^{N} (y_i-t_i)^2 $$

**$N$** is the number of training samples  
**$y_i$** is the predicted value  
**$t_i$** is the true value

**Gradient Descent**:

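$$ \frac{\partial\, \text{Error}}{\partial w_p} = \frac{2}{N}\sum_{i=1}^{N} (y_i-t_i)\,y_i\,(1-y_i)\,x_{i,p} $$

where $y_i = \sigma\left(\sum_q w_q x_{i,q}\right)$ and $x_{i,p}$ is feature $p$ of sample $i$. The factor $y_i(1-y_i)$ is the sigmoid derivative. `simple_ann` applies the per-sample term $x_{n,p}(y_n-t_n)(1-y_n)y_n$ right after each sample $n$ (online, or stochastic, gradient descent), with the constant factors absorbed into the learning rate `learning`.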

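To check the gradient formula, the sketch below compares it with a central finite-difference estimate at the initial weights [1, -1, 1]. Unlike `simple_ann`, it computes the full-batch gradient over all $N$ samples; the helper names `forward`, `error`, `error_grad`, and `numeric_grad` are hypothetical.

```python
import math

# Same data and labels as Example 1 (first column is the bias input).
x = [[1, 0.1, -0.2], [1, -0.1, 0.9], [1, 1.2, 0.1], [1, 1.1, 1.5]]
t = [0, 0, 0, 1]

def forward(w):
    """Sigmoid outputs y_i = sigma(sum_q w_q * x_{i,q}) for every sample."""
    return [1 / (1 + math.exp(-sum(xq * wq for xq, wq in zip(row, w)))) for row in x]

def error(w):
    """MSE over all samples for weights w."""
    y = forward(w)
    return sum((yi - ti) ** 2 for yi, ti in zip(y, t)) / len(x)

def error_grad(w):
    """Analytic gradient: (2/N) * sum_i (y_i - t_i) * y_i * (1 - y_i) * x_{i,p}."""
    y = forward(w)
    n = len(x)
    return [2 / n * sum((y[i] - t[i]) * y[i] * (1 - y[i]) * x[i][p] for i in range(n))
            for p in range(len(w))]

def numeric_grad(w, h=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for p in range(len(w)):
        w_plus = list(w)
        w_minus = list(w)
        w_plus[p] += h
        w_minus[p] -= h
        grad.append((error(w_plus) - error(w_minus)) / (2 * h))
    return grad

w = [1, -1, 1]
print(error_grad(w))
print(numeric_grad(w))
```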
### Tanh Activation

$$ \tanh(x) = \frac{e^x-e^{-x}}{e^x+e^{-x}} = \frac{e^{2x}-1}{e^{2x}+1} $$

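A quick numerical sketch that both tanh forms agree with Python's `math.tanh`, together with the identity $\tanh(x) = 2\sigma(2x) - 1$, which shows tanh is a rescaled sigmoid with outputs in (-1, 1) instead of (0, 1). The helper names are hypothetical. The explicit exponential forms overflow for large |x| (`math.exp` raises `OverflowError`), so `math.tanh` is the safer choice in practice.

```python
import math

def tanh_exp(z):
    """tanh from its definition (e^z - e^-z) / (e^z + e^-z)."""
    return (math.exp(z) - math.exp(-z)) / (math.exp(z) + math.exp(-z))

def tanh_exp2(z):
    """Equivalent form (e^(2z) - 1) / (e^(2z) + 1)."""
    e2z = math.exp(2 * z)
    return (e2z - 1) / (e2z + 1)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

for z in (-1.0, 0.0, 0.5, 2.0):
    # All four columns after z should match: two explicit forms,
    # the library function, and the rescaled sigmoid 2*sigmoid(2z) - 1.
    print(z, tanh_exp(z), tanh_exp2(z), math.tanh(z), 2 * sigmoid(2 * z) - 1)
```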