October 14

## Artificial Neural Network

**Image recognition and classification** is a subdomain of computer vision. An image classification algorithm looks at an image and **assigns it a label** from a collection of predefined labels or categories.

What is an ANN?
1. Artificial neural networks (ANNs) are among the **most powerful artificial intelligence and machine learning algorithms.**
2. An ANN is considered a **universal function approximator** that transforms inputs into outputs.
3. As the name suggests, it **draws inspiration from neurons in our brain** and the way they are connected.

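For example, the parts of a biological neuron map onto an artificial neuron as follows:
\begin{matrix}
\textbf{Biological Neuron} & \qquad & \textbf{Artificial Neuron} \\
\text{Dendrites} & & \text{Inputs} \\
\text{Cell body (soma)} & & \text{Node} \\
\text{Axon} & & \text{Output} \\
\text{Synapse} & & \text{Weight}
\end{matrix}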

## What is an Artificial Neuron?
1. An artificial neuron is a simplified model of a biological neuron and is the basic unit of an artificial neural network.
2. It computes a weighted sum of its inputs and passes the result through an activation function to produce an output.
3. One of the simplest neuron models is the perceptron (also called a threshold logic unit).

**An example of an artificial neuron**

Neuron = a linear function & an activation function
\begin{align}
output = f(\omega_{1} \times x_{1} + \omega_{2} \times x_{2} + \theta)
\end{align}
In this example, the linear combination is $(\omega_{1} \times x_{1} + \omega_{2} \times x_{2} + \theta)$ and the activation function is $f$.
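The same computation can be sketched in Python. This is a minimal illustration, not code from the original notes: the helper names `neuron_output` and `step` are hypothetical, and the step activation (1 if the weighted sum is at least 0, otherwise 0) is borrowed from the perceptron code later in these notes.

```python
from typing import Callable, Sequence


def neuron_output(x: Sequence[float], w: Sequence[float], theta: float,
                  f: Callable[[float], float]) -> float:
    """Return f(w_1*x_1 + ... + w_n*x_n + theta) for a single artificial neuron."""
    # Linear part: weighted sum of the inputs plus the bias theta
    z = sum(wi * xi for wi, xi in zip(w, x)) + theta
    # Non-linear part: pass the weighted sum through the activation function f
    return f(z)


def step(z: float) -> float:
    """Illustrative (hypothetical) step activation: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0


print(neuron_output([2, 1], [0.3, 0.5], -0.8, step))  # 0.6 + 0.5 - 0.8 = 0.3 >= 0, so 1
print(neuron_output([0, 1], [0.3, 0.5], -0.8, step))  # 0.5 - 0.8 = -0.3 < 0, so 0
```

Passing the activation as a parameter mirrors the "linear function + activation function" split above: swapping `step` for another activation changes the neuron without touching the weighted sum.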

The weights $\omega$ can take many different values, and many different activation functions $f$ are possible.

For example, the sigmoid function:
\begin{align}
f(x) = \frac{1}{1 + e^{-x}}
\end{align}

and the sign function:

$$f(x) = \begin{cases}
1, & x > 0 \\
-1, & x \le 0
\end{cases}$$
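Both activation functions translate directly into Python. This is a small sketch with hypothetical function names `sigmoid` and `sign`; `sign` follows the definition above, so it returns -1 at exactly 0.

```python
import math


def sigmoid(x: float) -> float:
    """Sigmoid activation 1 / (1 + e^(-x)); output lies between 0 and 1."""
    # Note: math.exp(-x) raises OverflowError for very negative x (below about -709)
    return 1.0 / (1.0 + math.exp(-x))


def sign(x: float) -> int:
    """Sign activation as defined above: 1 if x > 0, otherwise -1 (so sign(0) == -1)."""
    return 1 if x > 0 else -1


for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 4), sign(z))
```

The sigmoid gives a smooth value (0.5 at 0), while the sign function jumps between two classes; the perceptron learning rule below is meant for the second kind of activation.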

## Perceptron Learning Rules

Formulas:

$$\Delta \omega_{i}  = \eta (T-O)x_{i}$$
$$\Delta \theta = \eta (T-O)$$
$$\omega_{i} = \omega_{i} + \Delta \omega_{i}$$
$$\theta = \theta + \Delta \theta$$

Here, $O$ is the output, $T$ is the target label, and $(T-O)$ is the error.

$\theta$ is the bias, $\omega_{i}$ is a weight, $x_{i}$ is an input, and $\eta$ is the learning rate.

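Note: this update rule applies only to the perceptron (i.e., when the activation function is a threshold function such as the sign function).
Do not use it with other activation functions.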
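As a worked example of a single update, take the initial values used in the notebook code (weights [0.1, 0.5], bias -0.8, $\eta = 0.2$) and the sample $x = (1, 1)$ with label $T = 1$. The helper `perceptron_step` is hypothetical, and it uses the notebook's 0/1 step output (1 if the weighted sum is at least 0) rather than the ±1 sign function.

```python
def perceptron_step(w, theta, x, target, eta):
    """One perceptron-rule update; returns (new weights, new bias, error T - O)."""
    # Output O with the notebook's 0/1 step activation
    z = sum(wi * xi for wi, xi in zip(w, x)) + theta
    output = 1 if z >= 0 else 0
    error = target - output                                   # T - O
    new_w = [wi + eta * error * xi for wi, xi in zip(w, x)]   # w_i + eta * (T-O) * x_i
    new_theta = theta + eta * error                           # theta + eta * (T-O)
    return new_w, new_theta, error


w, theta, error = perceptron_step([0.1, 0.5], -0.8, [1, 1], target=1, eta=0.2)
print(error, w, theta)
```

Here the weighted sum is about -0.2, so $O = 0$ and $T - O = 1$; both weights and the bias increase by $\eta = 0.2$, giving weights of about [0.3, 0.7] and a bias of about -0.6. This is the same update the training loop makes on the last AND sample in its first epoch.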

## Convergence Criterion
Training ends (converges) when the model no longer makes any mistakes on the given training data.

## Stopping Rules
1. A maximum **training time**
2. A maximum **number of training cycles** (epochs)
3. A minimum **change in accuracy**: stop once accuracy changes by less than a chosen threshold between epochs

**Why not use convergence as our stopping criterion?**

Because training may never converge: for example, the perceptron cannot converge on data that is not linearly separable.
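One way to combine the convergence check with the three stopping rules is sketched below. It is an illustrative sketch, not the notebook's implementation: the function names and the parameters `max_epochs`, `max_seconds` and `min_delta` are hypothetical, and the 0/1 step output matches the notebook's `response()`.

```python
import time


def predict(w, theta, x):
    """0/1 step output on the weighted sum, as in the notebook's response()."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + theta >= 0 else 0


def accuracy(w, theta, data):
    """Fraction of samples [x_1, ..., x_n, T] that are classified correctly."""
    return sum(predict(w, theta, s[:-1]) == s[-1] for s in data) / len(data)


def train_with_stopping_rules(data, w, theta, eta=0.2,
                              max_epochs=100, max_seconds=5.0, min_delta=None):
    """Perceptron training that stops on convergence or on one of three stopping rules.

    min_delta (hypothetical): stop when accuracy changes by less than this amount
    between consecutive epochs; None disables the rule.
    Returns (weights, bias, epochs run, reason for stopping).
    """
    w = list(w)
    start = time.monotonic()
    prev_acc = accuracy(w, theta, data)
    for epoch in range(1, max_epochs + 1):             # rule 2: maximum number of epochs
        errors = 0
        for sample in data:
            x, target = sample[:-1], sample[-1]
            err = target - predict(w, theta, x)         # T - O
            if err != 0:
                w = [wi + eta * err * xi for wi, xi in zip(w, x)]
                theta += eta * err
                errors += 1
        if errors == 0:                                 # convergence: an error-free epoch
            return w, theta, epoch, "converged"
        if time.monotonic() - start > max_seconds:      # rule 1: maximum training time
            return w, theta, epoch, "time limit"
        acc = accuracy(w, theta, data)
        if min_delta is not None and abs(acc - prev_acc) < min_delta:  # rule 3
            return w, theta, epoch, "accuracy stopped changing"
        prev_acc = acc
    return w, theta, max_epochs, "epoch limit"


and_data = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
print(train_with_stopping_rules(and_data, [0.1, 0.5], -0.8))
```

With the notebook's starting values on the AND data, this should stop with "converged" after 3 epochs, matching the notebook output. On data that is not linearly separable (such as XOR) the convergence check never fires, so one of the other rules has to end training.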

```python
import math

class Perception:
  def __init__(self):
    self.w = [0.1, 0.5]     # weights \omega_{i}
    self.theta = -0.8       # bias \theta
    self.learningRate = 0.2 # learning rate \eta

  def response(self, x):
    """ Perceptron output """
    # Calculate weighted sum
    y = x[0] * self.w[0] + x[1] * self.w[1] + self.theta
    # Step activation f: return 1 if the weighted sum >= 0, otherwise 0
    if y >= 0:
      return 1
    else:
      return 0

  def updateWeights(self, x, iterError): # iterError: T - O
    """ Weights update """
    # w_i = w_i + eta * (T-O) * x_i
    self.w[0] += self.learningRate * iterError * x[0]
    self.w[1] += self.learningRate * iterError * x[1]

  def updateBias(self, iterError):
    """ Bias update """
    # theta = theta + eta * (T-O)
    self.theta += self.learningRate * iterError

  def train(self, data):
    """ Training """
    keepTraining = True  # becomes False once a stopping rule is met
    epoch = 0            # number of completed passes over the training data
    while keepTraining:
      totalError = 0.0
      # One epoch: iterate over every training sample [x1, x2, T]
      for x in data:
        r = self.response(x)
        if x[2] != r:                       # T - O != 0
          roundError = x[2] - r
          self.updateWeights(x, roundError) # w_i = w_i + eta * (T-O) * x_i
          self.updateBias(roundError)       # theta = theta + eta * (T-O)
          totalError += abs(roundError)
      epoch += 1

      # Stop when an epoch makes no errors (converged) or after at most 100 epochs
      if math.isclose(totalError, 0) or epoch >= 100:
        print("Total number of rounds (epochs): ", epoch)
        print("Final weights: ", self.w)
        print("Final bias: ", self.theta)
        keepTraining = False
```

```python
if __name__ == '__main__':
  perception = Perception()
  # AND gate truth table: each row is [x1, x2, T]
  trainset = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
  perception.train(trainset)
```

```
Total number of rounds (epochs):  3
Final weights:  [0.30000000000000004, 0.49999999999999994]
Final bias:  -0.8
```


## Remarks
1. The weights and bias are not unique: with a different initialization, training can end with different final weights and bias. (In scikit-learn's `Perceptron`, the result can also depend on the order in which training samples are visited, which is shuffled using `random_state`.)
2. What is the learning rate? The larger $\eta$ is, the larger each change to the weights ($\omega_{i}$) and bias ($\theta$), so learning is faster; conversely, the smaller $\eta$ is, the smaller each change, so learning is slower.
3. Some terminology: **learning** and **epoch**.

    Learning: the process of updating the weights (and bias) of the perceptron.

    Epoch: one cycle through the full training dataset.
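Remark 1 can be checked by training on the same AND data from two different starting points. This is a minimal sketch with hypothetical helper names (`train_perceptron`, `classifies_all`); the first run uses the notebook's initialization and the second a hypothetical zero initialization, both with $\eta = 0.2$.

```python
def train_perceptron(data, w, theta, eta=0.2, max_epochs=100):
    """Perceptron rule with a 0/1 step output, mirroring the notebook's Perception class."""
    w = list(w)
    for _ in range(max_epochs):
        mistakes = 0
        for x1, x2, target in data:
            output = 1 if w[0] * x1 + w[1] * x2 + theta >= 0 else 0
            err = target - output                      # T - O
            if err != 0:
                w[0] += eta * err * x1                 # w_i = w_i + eta * (T-O) * x_i
                w[1] += eta * err * x2
                theta += eta * err                     # theta = theta + eta * (T-O)
                mistakes += 1
        if mistakes == 0:                              # converged
            break
    return w, theta


def classifies_all(w, theta, data):
    """True if the line w[0]*x1 + w[1]*x2 + theta = 0 separates the data correctly."""
    return all((1 if w[0] * x1 + w[1] * x2 + theta >= 0 else 0) == target
               for x1, x2, target in data)


and_data = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
w_a, theta_a = train_perceptron(and_data, [0.1, 0.5], -0.8)   # notebook's initialization
w_b, theta_b = train_perceptron(and_data, [0.0, 0.0], 0.0)    # hypothetical zero initialization
print(w_a, theta_a, classifies_all(w_a, theta_a, and_data))
print(w_b, theta_b, classifies_all(w_b, theta_b, and_data))
```

The first run reproduces the notebook's weights (about [0.3, 0.5], bias -0.8); the second should end near [0.4, 0.2] with a bias near -0.4. The two lines differ, yet both classify all four AND inputs correctly.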

## Pros and Cons of Different Learning Rates
1. When the learning rate $\eta$ is large: learning is fast, but training may have trouble converging (e.g., it can oscillate around the minimum).
2. When the learning rate $\eta$ is small: learning is slow, and training can more easily get stuck in a local minimum.
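The oscillation mentioned in item 1 is easiest to see with plain gradient descent rather than the perceptron rule, so this sketch swaps in a different setting on purpose. It assumes the toy loss $L(w) = w^2$, whose gradient is $2w$ and whose minimum is at $w = 0$; each step is $w \leftarrow w - \eta \cdot 2w = (1 - 2\eta)\,w$. The function name `gradient_descent` is hypothetical.

```python
def gradient_descent(eta, w0=1.0, steps=5):
    """Gradient descent on the toy loss L(w) = w**2 (gradient 2w); returns all iterates."""
    ws = [w0]
    for _ in range(steps):
        ws.append(ws[-1] - eta * 2 * ws[-1])   # w <- w - eta * dL/dw = (1 - 2*eta) * w
    return ws


for eta in (0.05, 0.45, 0.9, 1.1):
    print(eta, [round(w, 3) for w in gradient_descent(eta)])
```

With $\eta = 0.05$ the iterate shrinks slowly (factor 0.9 per step); with $\eta = 0.45$ it shrinks quickly (factor 0.1); with $\eta = 0.9$ it jumps from one side of the minimum to the other (factor -0.8); with $\eta = 1.1$ the jumps grow and the iterates diverge (factor -1.2). This convex toy loss has a single minimum, so it does not illustrate the local-minimum point in item 2.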

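## Decision Boundary
Example:

Suppose training gives $\omega_{1} = 0.3$, $\omega_{2} = 0.5$, $\theta = -0.8$. The perceptron outputs 1 exactly when $0.3x_{1} + 0.5x_{2} - 0.8 \ge 0$; multiplying both sides by 10 gives: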

$$y = \begin{cases}
0, & 3x_{1} + 5x_{2} < 8 \\
1, & 3x_{1} + 5x_{2} \ge 8
\end{cases}$$

The straight line defined by $3x_1 + 5x_2 = 8$ is called the decision boundary of the perceptron.
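The equivalence between the weighted-sum rule and the side of the line can be checked numerically. This is a sketch with hypothetical helpers `perceptron_output` and `side_of_boundary`; `Fraction` keeps the arithmetic exact, which matters for a point such as (1, 1) that lies exactly on the boundary ($3 + 5 = 8$) and is therefore assigned class 1.

```python
from fractions import Fraction

# Weights and bias from the example above, kept exact with Fraction so that points
# lying exactly on the boundary (such as (1, 1)) are not affected by rounding
w1, w2, theta = Fraction("0.3"), Fraction("0.5"), Fraction("-0.8")


def perceptron_output(x1, x2):
    """Step activation on the weighted sum: 1 if w1*x1 + w2*x2 + theta >= 0, else 0."""
    return 1 if w1 * x1 + w2 * x2 + theta >= 0 else 0


def side_of_boundary(x1, x2):
    """Same rule after multiplying by 10: class 1 iff 3*x1 + 5*x2 >= 8."""
    return 1 if 3 * x1 + 5 * x2 >= 8 else 0


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2, 1), (0, Fraction(3, 2))]
for p in points:
    print(p, perceptron_output(*p), side_of_boundary(*p))
```

Both functions agree on every point, including (2, 1) and (0, 3/2), which are not AND inputs but still fall on a definite side of the line.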

**Decision Boundary:**

In a **binary classification problem**, a decision boundary or decision surface is a hypersurface that **partitions** the underlying vector space **into two sets**, one for each class. The classifier will **classify all the points on one side of the decision boundary as belonging to one class and all those on the other side as belonging to the other class.**

## Limitation
Question: Can we apply the same perceptron learning procedure to the XOR gate, which has the truth table on the right? If so, show all the steps. If not, explain why.
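A short experiment is consistent with the answer being no. With the same 0/1 step output, XOR would need $\theta < 0$ (input (0, 0) gives 0), $\omega_1 + \theta \ge 0$ and $\omega_2 + \theta \ge 0$ (inputs (1, 0) and (0, 1) give 1), and $\omega_1 + \omega_2 + \theta < 0$ (input (1, 1) gives 0). Adding the middle two gives $\omega_1 + \omega_2 + 2\theta \ge 0$, i.e. $\omega_1 + \omega_2 + \theta \ge -\theta > 0$, a contradiction; so no line separates the XOR data, and the perceptron rule can never finish an epoch without mistakes. The sketch below (hypothetical helper `train_with_history`, notebook starting values) records the number of mistakes per epoch.

```python
def train_with_history(data, w, theta, eta=0.2, max_epochs=100):
    """Perceptron rule (0/1 step output); returns the number of mistakes in each epoch."""
    w = list(w)
    mistakes_per_epoch = []
    for _ in range(max_epochs):
        mistakes = 0
        for x1, x2, target in data:
            output = 1 if w[0] * x1 + w[1] * x2 + theta >= 0 else 0
            err = target - output                  # T - O
            if err != 0:
                w[0] += eta * err * x1             # w_i = w_i + eta * (T-O) * x_i
                w[1] += eta * err * x2
                theta += eta * err                 # theta = theta + eta * (T-O)
                mistakes += 1
        mistakes_per_epoch.append(mistakes)
        if mistakes == 0:                          # converged: an error-free epoch
            break
    return mistakes_per_epoch


# XOR truth table: each row is [x1, x2, T]
xor_data = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
history = train_with_history(xor_data, [0.1, 0.5], -0.8)
print(len(history), min(history))  # all 100 epochs run, and none is error-free
```

Running the same function on the AND data instead stops after 3 epochs, which is why the notebook needs the epoch cap only as a safety net for data like XOR.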