# Main Content
- ANN architectures
- Multi-Layer Perceptrons
- MNIST digit classification

# From Biological to Artificial Neurons
## Biological Neurons
## Logical Computations with Neurons
## The Perceptron
The Perceptron is based on a slightly different artificial neuron called a **linear threshold unit (LTU)**: it computes a weighted sum of its inputs, then applies a step function to that sum.

![10](images/10-4.png)

The most common step function used in Perceptrons is the **Heaviside step function**; the **sign function** is sometimes used instead.

![10](images/e10-1.png)
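
As a rough illustration of an LTU, here is a minimal NumPy sketch. It assumes the usual definitions of the two step functions (Heaviside returns 1 for `z >= 0`, sign returns -1, 0 or +1); the weights and inputs are made up for the example.

```python
import numpy as np

def heaviside(z):
    """Heaviside step function: 0 if z < 0, else 1."""
    return (np.asarray(z) >= 0).astype(int)

def sign(z):
    """Sign function: -1 if z < 0, 0 if z == 0, +1 if z > 0."""
    return np.sign(z).astype(int)

def ltu(x, w, step=heaviside):
    """Linear threshold unit: weighted sum of the inputs, then a step function."""
    z = np.dot(w, x)
    return step(z)

# Hypothetical weights and inputs, chosen only for illustration
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])

ltu_out = ltu(x, w)             # z = 0.5 - 2.0 + 1.0 = -0.5, so Heaviside gives 0
ltu_out_sign = ltu(x, w, sign)  # same z, the sign function gives -1
```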

A perceptron is simply composed of a single layer of LTUs, with each neuron connected to all the inputs.

![10](images/10-5.png)

#### How is a perceptron trained?

![10](images/e10-2.png)
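
The training rule can be sketched in a few lines of NumPy. This assumes the standard perceptron learning rule, `w <- w + eta * (y - y_hat) * x`, applied one instance at a time; the function name `train_perceptron` and the toy AND dataset are made up for illustration.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, n_epochs=20):
    """Train a single LTU with the perceptron learning rule.

    For each instance: w <- w + eta * (y - y_hat) * x, so the weights
    only change when the prediction is wrong.
    """
    X_b = np.c_[np.ones(len(X)), X]          # prepend a bias input x0 = 1
    w = np.zeros(X_b.shape[1])
    for _ in range(n_epochs):
        for x_i, y_i in zip(X_b, y):
            y_hat = int(np.dot(w, x_i) >= 0)  # Heaviside step
            w += eta * (y_i - y_hat) * x_i
    return w

# Toy, linearly separable data (logical AND), made up for illustration
X_and = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1])

w_and = train_perceptron(X_and, y_and)
pred_and = (np.c_[np.ones(len(X_and)), X_and] @ w_and >= 0).astype(int)
```

Because the AND problem is linearly separable, the weights stop changing after a few epochs and `pred_and` matches `y_and`.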

An example on the iris dataset.

In [1]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:,(2,3)] # petal length, petal width
y = (iris.target == 0).astype(int) # Iris Setosa (np.int was removed from recent NumPy releases)

per_clf = Perceptron(random_state = 42)
per_clf.fit(X,y)

y_pred = per_clf.predict([[2,0.5]])
print(y_pred)

[1]




In fact, Scikit-Learn's `Perceptron` class is equivalent to using an `SGDClassifier` with the hyperparameters `loss='perceptron'`, `learning_rate='constant'`, `eta0=1` (the learning rate), and `penalty=None` (no regularization).
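
A quick way to check this equivalence is to fit both estimators on the same data with the same seed and compare their predictions. This is only a sketch: it assumes the remaining defaults (`max_iter`, `tol`, shuffling) are the same for both classes, which appears to hold in recent Scikit-Learn versions.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron, SGDClassifier

iris = load_iris()
X = iris.data[:, [2, 3]]              # petal length, petal width
y = (iris.target == 0).astype(int)    # Iris Setosa vs. the rest

per_clf = Perceptron(random_state=42).fit(X, y)
sgd_clf = SGDClassifier(loss="perceptron", learning_rate="constant",
                        eta0=1, penalty=None, random_state=42).fit(X, y)

# With identical seeds, both models should label the training set the same way
same_predictions = np.array_equal(per_clf.predict(X), sgd_clf.predict(X))
```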

**NOTE:** Unlike Logistic Regression classifiers, Perceptrons do not output a class probability; they make predictions based on a hard threshold. This is one good reason to prefer Logistic Regression over Perceptrons.

Perceptrons cannot solve some trivial problems, such as the Exclusive OR (XOR) classification problem. Disappointed by such limitations, many researchers dropped **connectionism** in favor of higher-level problems such as logic, problem solving, and search. However, it turns out that some of these limitations can be eliminated by stacking multiple Perceptrons; the resulting network is called a **Multi-Layer Perceptron (MLP)**.

![10](images/10-6.png)
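
To make the XOR point concrete, here is a small sketch of a two-layer network of LTUs that computes XOR. The weights are hand-picked for illustration and are not necessarily those shown in the figure: one hidden unit behaves like AND, the other like OR, and the output unit fires when OR is on but AND is off.

```python
import numpy as np

def step(z):
    """Heaviside step function applied element-wise."""
    return (np.asarray(z) >= 0).astype(int)

def xor_mlp(x1, x2):
    """Two-layer network of LTUs computing XOR (hand-picked, illustrative weights)."""
    x = np.array([x1, x2])
    # Hidden layer: the first unit acts as AND, the second as OR
    W_hidden = np.array([[1.0, 1.0],
                         [1.0, 1.0]])
    b_hidden = np.array([-1.5, -0.5])
    h = step(W_hidden @ x + b_hidden)
    # Output unit: fires when OR is on but AND is off
    w_out = np.array([-1.0, 1.0])
    b_out = -0.5
    return int(step(w_out @ h + b_out))

# Inputs in the order (0,0), (0,1), (1,0), (1,1)
xor_table = [xor_mlp(a, b) for a in (0, 1) for b in (0, 1)]
```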

# Multi-Layer Perceptron and Backpropagation

![10](images/10-7.png)

#### Backpropagation -- the first way to train MLPs
Today we would describe it as Gradient Descent using reverse-mode autodiff.

**Description**: for each training instance, the backpropagation algorithm first makes a prediction (forward pass) and measures the error, then goes through each layer in reverse to measure the error contribution from each connection (reverse pass), and finally slightly tweaks the connection weights to reduce the error (Gradient Descent step).
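
The three phases can be sketched with plain NumPy. This is a simplified illustration, not the original algorithm's code: it assumes a single hidden layer, logistic activations, a mean squared error loss, and full-batch updates on the four XOR points rather than one instance at a time. The layer sizes and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data: 4 instances, 2 features, 1 target each
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 4 hidden units -> 1 output (sizes chosen arbitrarily)
W1 = rng.normal(size=(2, 4))
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))
b2 = np.zeros(1)
eta = 0.5
losses = []

for _ in range(5000):
    # Forward pass: compute the prediction
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)

    # Measure the error (mean squared error)
    err = y_hat - y
    losses.append(float(np.mean(err ** 2)))

    # Reverse pass: propagate the error gradient layer by layer
    d_out = (2 * err / len(X)) * y_hat * (1 - y_hat)
    grad_W2 = h.T @ d_out
    grad_b2 = d_out.sum(axis=0)
    d_hidden = (d_out @ W2.T) * h * (1 - h)
    grad_W1 = X.T @ d_hidden
    grad_b1 = d_hidden.sum(axis=0)

    # Gradient Descent step: tweak the weights to reduce the error
    W2 -= eta * grad_W2
    b2 -= eta * grad_b2
    W1 -= eta * grad_W1
    b1 -= eta * grad_b1
```

Plotting `losses` is a simple way to see whether the error goes down over the training steps.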

In order for this algorithm to work properly, the authors made a key change to the MLP's architecture: they **replaced the step function with the logistic function**, $\sigma(z) = 1/(1+\exp(-z))$.

This was essential because the step function contains only flat segments, so there is no gradient to work with, while the logistic function has a well-defined nonzero derivative everywhere, allowing GD to make some progress at every step.
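
To see why the gradient never vanishes completely, it helps to write out the derivative of the logistic function. A short derivation, using only the chain rule:

$$\sigma(z) = \frac{1}{1+e^{-z}}, \qquad \sigma'(z) = \frac{e^{-z}}{(1+e^{-z})^2} = \sigma(z)\,\bigl(1-\sigma(z)\bigr)$$

Since $0 < \sigma(z) < 1$ for every finite $z$, the derivative is strictly positive everywhere (it peaks at $1/4$ for $z=0$), whereas the derivative of the step function is 0 wherever it is defined.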

##### Other activation functions instead of Logistic function
- The hyperbolic tangent function $tanh(z)=2\sigma(2z)-1$:
    - S-shaped, continuous and differentiable
    - output value ranges from -1 to 1, which tends to make each layer's output more or less normalized at the beginning of training. This helps speed up convergence.

- The ReLU function $ReLU(z)=max(0,z)$:
    - continuous
    - not differentiable at z=0
    - fast to compute
    - does not have a maximum output value, which helps reduce some issues during Gradient Descent.
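
The properties listed above are easy to check numerically. A minimal NumPy sketch follows; the `derivative` helper is a made-up finite-difference approximation used only for this illustration.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def derivative(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

z = np.linspace(-5, 5, 11)

# tanh(z) = 2 * sigma(2z) - 1
tanh_identity_holds = np.allclose(np.tanh(z), 2 * logistic(2 * z) - 1)

# tanh stays strictly between -1 and 1 on this range
tanh_in_range = bool(np.all(np.abs(np.tanh(z)) < 1))

# ReLU has slope 0 for negative inputs and slope 1 for positive inputs
relu_slope_neg = derivative(relu, -2.0)
relu_slope_pos = derivative(relu, 2.0)
```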
    
![10](images/10-8.png)

![10](images/10-9.png)

**Biological neurons seem to implement a roughly sigmoid (S-shaped) activation function, but it turns out that the ReLU activation function generally works better in ANNs. This is one of the cases where the biological analogy was misleading.**

# Training an MLP with TensorFlow's High-Level API
The `DNNClassifier` class makes it trivial to train a deep neural network with any number of hidden layers, and a softmax output layer to output estimated class probabilities.
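
For reference, the softmax function that turns the output layer's scores into class probabilities can be sketched in NumPy as follows. This is a standalone illustration, not the code `DNNClassifier` runs internally, and the example logits are made up.

```python
import numpy as np

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1 along the last axis."""
    # Subtracting the row maximum avoids overflow without changing the result
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=-1, keepdims=True)

# Two hypothetical instances with three class scores each
logits = np.array([[2.0, 1.0, 0.1],
                   [0.0, 0.0, 0.0]])
probs = softmax(logits)
```

Each row of `probs` sums to 1, and the largest score gets the largest probability.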

In [12]:
import numpy as np
import tensorflow as tf  # TensorFlow 1.x (the tutorials and contrib modules are not part of 2.x)

from tensorflow.examples.tutorials.mnist import input_data

# Load the labels as class indices rather than one-hot vectors:
# DNNClassifier is normally fed integer labels, which may explain the problems noted below.
mnist = input_data.read_data_sets("datasets/MNIST_data/")

# Alternative source: sklearn.datasets.fetch_openml('mnist_784', version=1)
X_train = mnist.train.images
y_train = mnist.train.labels.astype("int")
X_test = mnist.test.images
y_test = mnist.test.labels.astype("int")

Extracting datasets/MNIST_data/train-images-idx3-ubyte.gz
Extracting datasets/MNIST_data/train-labels-idx1-ubyte.gz
Extracting datasets/MNIST_data/t10k-images-idx3-ubyte.gz
Extracting datasets/MNIST_data/t10k-labels-idx1-ubyte.gz


**I had some problems running the following code; please refer to the picture below to understand the expected result.**
```
# Build real-valued feature columns matching the shape of X_train
feature_columns = tf.contrib.learn.infer_real_valued_columns_from_input(X_train)
# feature_columns = tf.contrib.estimator.multi_class_head(X_train)
# Two hidden layers (300 and 100 neurons) and a 10-class softmax output layer
dnn_clf = tf.contrib.learn.DNNClassifier(hidden_units=[300, 100], n_classes=10,
                                         feature_columns=feature_columns)
dnn_clf.fit(x=X_train, y=y_train, batch_size=50, steps=40000)
```

![10](images/c