
## Hands-On Machine Learning with Scikit-Learn & TensorFlow

## CHAPTER 10: Introduction to Artificial Neural Networks

### Artificial Neural Networks [ANNs]:
-  emulates the brain's architecture for building artificial intelligence machines
-  examples:
    -  classifying billions of images
    -  speech recognition services
    -  user recommendation systems

### Deep Neural Network [DNNs]:
-  when an ANN contains 2 or more hidden layers

### Why use ANNs / DNNs?:
-  large volumes of data available to train neural networks
-  ANNs frequently outperform other ML techniques on very large and complex datasets
-  powerful GPUs available for faster computing power

### Biological Neurons:
-  found in animal cerebral cortexes
-  contains a nucleus and branching extensions: many short **_dendrites_** and one long **_axon_**
-  the **_axon_** splits into many branches called **_telodendria_**, each ending in **_synaptic terminals_ [_synapses_]** that connect to the **_dendrites_** of other neurons
-  biological neurons receive short electrical impulses called **_signals_** from other neurons via **_synapses_**
-  a neuron fires its own signals when it receives enough signals from other neurons within a short time window

### Artificial Neurons:
-  contains 1 or more **binary** (on/off) inputs and one binary output
-  it activates its output when a certain number of its inputs are active

#### Scenarios [page 259]
-  The first network on the left is simply the identity function: if neuron A is activated, then neuron C gets activated as well (since it receives two input signals from neuron A), but if neuron A is off, then neuron C is off as well.
-  The second network performs a logical AND: neuron C is activated only when both neurons A and B are activated (a single input signal is not enough to activate neuron C).
-  The third network performs a logical OR: neuron C gets activated if either neuron A or neuron B is activated (or both).
-  Finally, if we suppose that an input connection can inhibit the neuron’s activity (which is the case with biological neurons), then the fourth network computes a slightly more complex logical proposition: neuron C is activated only if neuron A is active and if neuron B is off. If neuron A is active all the time, then you get a logical NOT: neuron C is active when neuron B is off, and vice versa.
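
The four networks above can be sketched in plain Python. This is a minimal illustration, assuming a neuron fires when at least two of its excitatory inputs are active and no inhibitory input is active; the function names are hypothetical and only serve the example.

```python
def fires(excitatory, inhibitory=(), threshold=2):
    """Return 1 if no inhibitory input is active and at least
    `threshold` excitatory inputs are active, else 0."""
    if any(inhibitory):
        return 0
    return int(sum(excitatory) >= threshold)

def identity(a):
    return fires([a, a])                  # C receives two connections from A

def logical_and(a, b):
    return fires([a, b])                  # one connection from each input

def logical_or(a, b):
    return fires([a, a, b, b])            # two connections from each input

def a_and_not_b(a, b):
    return fires([a, a], inhibitory=[b])  # B inhibits C

# Print the truth table: A, B, AND, OR, A AND NOT B
for a in (0, 1):
    for b in (0, 1):
        print(a, b, logical_and(a, b), logical_or(a, b), a_and_not_b(a, b))
```

Holding A at 1 in `a_and_not_b` gives the logical NOT of B described in the last scenario.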

## Perceptron:
-  based on **Linear Threshold Unit** (LTU):
    1.  inputs and output are numbers [instead of binary on/off values], and each input connection is associated with a weight
    2.  computes a weighted sum of its inputs (z = w1 * x1 + w2 * x2 + ... + wn * xn = w^T · x)
    3.  applies a _step function_ (ex: **heaviside step function**) to the weighted sum from step 2
    4.  outputs result

-  composed of a single layer of LTUs, with each LTU connected to all the inputs
-  __input neurons__ => pass through (output) whatever input they are fed
-  __bias neurons__ => output 1 all the time
-  predictions are based on a hard threshold: ***perceptrons do not output a class probability like Logistic Regression (LR)***
-  ML Model Example:
    -  Linear Binary Classification => if the weighted sum of the inputs exceeds the threshold, output the positive class; otherwise output the negative class
-  **Multi-Layer Perceptron** (MLP):
    -  stacks multiple perceptrons
    -  composed of ... :
        -  one input layer (_contains a bias neuron_)
        -  one or more LTU layers aka **hidden layers** (_contains a bias neuron_)
        -  one final LTU layer aka **output layer** (_does **NOT** contain a bias neuron_)
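
A single layer of LTUs can be sketched in NumPy as a weighted sum followed by a step function. This is only an illustration: the helper names and the numbers are hypothetical, and the bias term `b` stands in for the weight attached to the always-1 bias neuron.

```python
import numpy as np

def heaviside(z):
    """Heaviside step function: 0 if z < 0, else 1."""
    return (z >= 0).astype(int)

def ltu_layer(X, W, b):
    """One layer of LTUs: weighted sum z = X.W + b, then a hard threshold."""
    z = X @ W + b
    return heaviside(z)

# Hypothetical numbers: 2 instances, 2 inputs, 1 LTU
X = np.array([[1.0, 2.0],
              [0.5, -1.0]])
W = np.array([[0.4],
              [0.3]])
b = np.array([-0.5])

print(ltu_layer(X, W, b))  # one 0/1 prediction per instance, no probability
```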

### How are Perceptrons Trained?:
-  https://en.wikipedia.org/wiki/Hebbian_theory:
    -  connection weight between 2 neurons is increased whenever they have the same output
-  the perceptron training rule is a variant of Hebb's rule that takes into account the error made by the network
-  the perceptron is fed one training instance at a time and makes its predictions for each instance
-  for every output neuron that makes a wrong prediction, the rule reinforces the connection weights from the inputs that would have contributed to the correct prediction
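
The training rule above is commonly written as `w <- w + eta * (y - y_hat) * x`. Below is a minimal NumPy sketch of it for a single output neuron, assuming a heaviside step function and a toy dataset (logical AND) chosen only because it is linearly separable. The function name and the data are hypothetical.

```python
import numpy as np

def train_perceptron(X, y, eta=1.0, n_epochs=10):
    """Perceptron learning rule, fed one training instance at a time."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_epochs):
        for x_i, y_i in zip(X, y):
            y_hat = int(x_i @ w + b >= 0)  # hard-threshold prediction
            error = y_i - y_hat            # 0 when the prediction is correct
            w += eta * error * x_i         # reinforce inputs that would have helped
            b += eta * error               # same update for the bias term
    return w, b

# Toy data (hypothetical): logical AND
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])

w, b = train_perceptron(X, y)
predictions = [int(x @ w + b >= 0) for x in X]
print(w, b, predictions)
```

Weights only change when a prediction is wrong, so once every instance is classified correctly the remaining epochs leave `w` and `b` untouched.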

## Backpropagation:
-  article => https://scholar.google.com/scholar?q=Learning+Internal+Representations+by+Error+Propagation
-  innovative method to successfully train MLPs
-  combines Gradient Descent with reverse-mode autodiff (used to compute the gradients)
-  needs activation functions with a usable gradient in place of the step function; popular choices:
    -  hyperbolic tangent function tanh
    -  ReLU
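
For reference, these activation functions are short to write in NumPy. The sketch below also includes the logistic function, which these notes recommend later for binary output layers; the helper names are illustrative.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) function, output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """Rectified Linear Unit: max(0, z)."""
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print("logistic:", logistic(z))
print("tanh:    ", np.tanh(z))   # output in (-1, 1)
print("relu:    ", relu(z))
```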

_technical summary under the hood => **backpropagation**_ ...

**For each training instance, the algorithm feeds it to the network and computes the output of every neuron in each consecutive layer (this is the forward pass, just like when making predictions). Then it measures the network’s output error (i.e., the difference between the desired output and the actual output of the network), and it computes how much each neuron in the last hidden layer contributed to each output neuron’s error. It then proceeds to measure how much of these error contributions came from each neuron in the previous hidden layer, and so on until the algorithm reaches the input layer. This reverse pass efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward in the network (hence the name of the algorithm). If you check out the reverse-mode autodiff algorithm in Appendix D, you will find that the forward and reverse passes of backpropagation simply perform reverse-mode autodiff. The last step of the backpropagation algorithm is a Gradient Descent step on all the connection weights in the network, using the error gradients measured earlier.** - Aurélien Géron [Hands-On Machine Learning with Scikit-Learn & TensorFlow]

### high level steps:
1.  make a prediction (**forward pass**) for each training instance
2.  measure the output error
3.  go through each layer in reverse to measure the error contribution from each connection (**reverse pass**)
4.  slightly tweak the connection weights to reduce the error (**gradient descent**)
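
The four steps can be traced on a tiny network in NumPy. This is a sketch under stated assumptions, not the book's code: one hidden layer of 4 tanh neurons, a single linear output, mean squared error, and random toy data. It also compares one backpropagated gradient against a finite-difference estimate as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(5, 3))                    # 5 instances, 3 features (toy data)
y = rng.normal(size=(5, 1))                    # regression targets (toy data)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)  # hidden layer: 4 tanh neurons
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)  # output layer: 1 linear neuron

def forward(W1, b1, W2, b2):
    h = np.tanh(X @ W1 + b1)                   # step 1: forward pass
    out = h @ W2 + b2
    loss = np.mean((out - y) ** 2)             # step 2: measure the error (MSE)
    return h, out, loss

def backward(h, out, W2):
    # step 3: reverse pass, propagating the error gradient layer by layer
    d_out = 2 * (out - y) / len(X)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1 - h ** 2)        # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    return dW1, db1, dW2, db2

h, out, loss_before = forward(W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(h, out, W2)

# sanity check: central finite difference on a single weight
eps = 1e-5
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (forward(W1_plus, b1, W2, b2)[2] - forward(W1_minus, b1, W2, b2)[2]) / (2 * eps)

# step 4: Gradient Descent step on every connection weight
eta = 0.01
W1, b1 = W1 - eta * dW1, b1 - eta * db1
W2, b2 = W2 - eta * dW2, b2 - eta * db2
loss_after = forward(W1, b1, W2, b2)[2]
print("backprop:", dW1[0, 0], "numeric:", numeric)
print("loss before:", loss_before, "loss after:", loss_after)
```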

https://en.wikipedia.org/wiki/Backpropagation

https://stackoverflow.com/questions/4752626/epoch-vs-iteration-when-training-neural-networks

## Neural Network Hyperparameters:
-  neural networks are difficult to tune because they are so flexible and have many hyperparameters to tweak
-  options to tune include ... :
    -  number of layers
    -  number of neurons per layer
    -  type of activation function used in each layer
    -  weight initialization logic
    -  etc ...
-  solutions include ... :
    -  grid search w/ cross validation [_not recommended w/ large datasets because too time consuming_]
    -  **randomized search**

### Reusing Neural Networks:
-  reuse the lower layers of an existing (pretrained) network that solves a similar task
-  network will not have to learn all the low-level structures from scratch that occur in most datasets (ex: pictures)
-  network will only have to learn the higher-level structures (ex: different objects in pictures)
-  scenario ... :
    -  large image classification / speech recognition tasks require networks with dozens of layers + a huge training set
    -  training such large models from scratch is usually unnecessary: reuse parts of a pretrained state-of-the-art network that was trained on a similar task

### Neural Network Best Practices:
-  many problems can start with just 1 or 2 hidden layers
-  for more complex problems, gradually increase the number of hidden layers until the network starts to overfit the training set
-  try gradually increasing the number of neurons until the network starts to overfit the training set
-  recommended to increase number of layers over increasing number of neurons per layer
-  tweaking number of neurons still considered a challenge (**no best practice ... yet**)
-  try **early stopping** to prevent overfitting
-  ReLU is a good choice for activation function
-  softmax activation function for output layer is a good choice for classification (multi class)
-  logistic activation function for output layer is a good choice for classification (binary)
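
Early stopping, mentioned above, can be sketched independently of any framework: track the best validation loss and stop once it has not improved for a set number of epochs. The function name, the `patience` parameter, and the loss values below are hypothetical, chosen for illustration.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the best epoch, stopping once the validation
    loss has not improved for `patience` consecutive epochs."""
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # new best: keep this model
        elif epoch - best_epoch >= patience:
            break                                # no recent improvement: stop
    return best_epoch

# Hypothetical validation losses, one per epoch
val_losses = [0.90, 0.60, 0.45, 0.40, 0.42, 0.41, 0.43, 0.39]
print(early_stopping_epoch(val_losses, patience=3))
```

In a real training loop the model would be checkpointed at each new best epoch and restored from that checkpoint after stopping. A small `patience` can stop too early, as in this toy list where a later epoch would have been slightly better.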

## _Exercises_

In [1]:
import tensorflow as tf
print(tf.__version__)

import sklearn
print(sklearn.__version__)

1.13.1
0.20.0


### sklearn ltu network:
-  perceptron learning algorithm strongly resembles Stochastic Gradient Descent
-  equivalent to sklearn SGDClassifier w/ following hyperparameters set:
    -  loss = "perceptron"
    -  learning_rate = "constant"
    -  eta0 [learning rate] = 1
    -  penalty = None (no regularization)

In [2]:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import Perceptron

iris = load_iris()
X = iris.data[:, (2, 3)]  # petal length, petal width
y = (iris.target == 0).astype(int)  # iris setosa?

per_clf = Perceptron(max_iter=100, tol=-np.inf, random_state=42)
per_clf.fit(X, y)

y_pred = per_clf.predict([[2, 0.5]])

### tf mlp dnn classifier via high level tf.estimator api:
-  model trains dnn classification w/ 2 hidden layers [300 neurons, 100 neurons]
-  softmax output layer w/ 10 neurons
-  run 40 epochs using batches of 50 instances
-  **DNNClassifier** class creates all neuron layers based on ReLU activation function

In [3]:
# keras mnist dataset load
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
X_train = X_train.astype(np.float32).reshape(-1, 28*28) / 255.0
X_test = X_test.astype(np.float32).reshape(-1, 28*28) / 255.0
y_train = y_train.astype(np.int32)
y_test = y_test.astype(np.int32)
X_valid, X_train = X_train[:5000], X_train[5000:]
y_valid, y_train = y_train[:5000], y_train[5000:]
print(X_train.shape)
print(X_valid.shape)
print(X_test.shape)

(55000, 784)
(5000, 784)
(10000, 784)


In [4]:
feature_cols = [tf.feature_column.numeric_column("X", shape=[28 * 28])]
dnn_clf = tf.estimator.DNNClassifier(hidden_units=[300,100], n_classes=10,
                                     feature_columns=feature_cols)

input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"X": X_train}, y=y_train, num_epochs=40, batch_size=50, shuffle=True)
dnn_clf.train(input_fn=input_fn)

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/var/folders/1p/9j4_29dj62jf1c6tc1qqmk140000gn/T/tmpbvepjwbn', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x126ca55c0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Instructions for updating:
Colocations handled automatically by placer.
Instructions for updating:
T

INFO:tensorflow:global_step/sec: 468.509
INFO:tensorflow:loss = 0.9880459, step = 5401 (0.213 sec)
INFO:tensorflow:global_step/sec: 492.514
INFO:tensorflow:loss = 1.7908235, step = 5501 (0.203 sec)
INFO:tensorflow:global_step/sec: 492.114
INFO:tensorflow:loss = 2.4486098, step = 5601 (0.204 sec)
INFO:tensorflow:global_step/sec: 485.151
INFO:tensorflow:loss = 0.44385594, step = 5701 (0.206 sec)
INFO:tensorflow:global_step/sec: 403.39
INFO:tensorflow:loss = 0.23869362, step = 5801 (0.248 sec)
INFO:tensorflow:global_step/sec: 478.174
INFO:tensorflow:loss = 1.6547506, step = 5901 (0.209 sec)
INFO:tensorflow:global_step/sec: 486.748
INFO:tensorflow:loss = 2.2692287, step = 6001 (0.205 sec)
INFO:tensorflow:global_step/sec: 453.539
INFO:tensorflow:loss = 1.480107, step = 6101 (0.221 sec)
INFO:tensorflow:global_step/sec: 472.847
INFO:tensorflow:loss = 1.0975902, step = 6201 (0.212 sec)
INFO:tensorflow:global_step/sec: 477.423
INFO:tensorflow:loss = 1.6957027, step = 6301 (0.209 sec)
INFO:tenso

INFO:tensorflow:loss = 0.06414779, step = 13601 (0.248 sec)
INFO:tensorflow:global_step/sec: 428.541
INFO:tensorflow:loss = 0.021150632, step = 13701 (0.233 sec)
INFO:tensorflow:global_step/sec: 414.163
INFO:tensorflow:loss = 0.16577165, step = 13801 (0.241 sec)
INFO:tensorflow:global_step/sec: 408.655
INFO:tensorflow:loss = 0.10047861, step = 13901 (0.245 sec)
INFO:tensorflow:global_step/sec: 440.688
INFO:tensorflow:loss = 0.4225209, step = 14001 (0.227 sec)
INFO:tensorflow:global_step/sec: 387.686
INFO:tensorflow:loss = 0.045048404, step = 14101 (0.258 sec)
INFO:tensorflow:global_step/sec: 430.668
INFO:tensorflow:loss = 0.15089414, step = 14201 (0.232 sec)
INFO:tensorflow:global_step/sec: 366.471
INFO:tensorflow:loss = 0.09425853, step = 14301 (0.273 sec)
INFO:tensorflow:global_step/sec: 425.15
INFO:tensorflow:loss = 0.023670204, step = 14401 (0.235 sec)
INFO:tensorflow:global_step/sec: 371.181
INFO:tensorflow:loss = 0.09779631, step = 14501 (0.270 sec)
INFO:tensorflow:global_step/se

INFO:tensorflow:loss = 0.015009104, step = 21701 (0.402 sec)
INFO:tensorflow:global_step/sec: 242.053
INFO:tensorflow:loss = 0.1377824, step = 21801 (0.414 sec)
INFO:tensorflow:global_step/sec: 300.941
INFO:tensorflow:loss = 0.038613603, step = 21901 (0.332 sec)
INFO:tensorflow:global_step/sec: 251.778
INFO:tensorflow:loss = 0.16367874, step = 22001 (0.398 sec)
INFO:tensorflow:global_step/sec: 231.671
INFO:tensorflow:loss = 0.013690178, step = 22101 (0.433 sec)
INFO:tensorflow:global_step/sec: 247.05
INFO:tensorflow:loss = 0.024327064, step = 22201 (0.404 sec)
INFO:tensorflow:global_step/sec: 297.879
INFO:tensorflow:loss = 0.047565654, step = 22301 (0.334 sec)
INFO:tensorflow:global_step/sec: 212.797
INFO:tensorflow:loss = 0.06992117, step = 22401 (0.470 sec)
INFO:tensorflow:global_step/sec: 256.239
INFO:tensorflow:loss = 0.0585014, step = 22501 (0.392 sec)
INFO:tensorflow:global_step/sec: 245.03
INFO:tensorflow:loss = 0.03857902, step = 22601 (0.407 sec)
INFO:tensorflow:global_step/se

INFO:tensorflow:loss = 0.023866562, step = 29801 (0.293 sec)
INFO:tensorflow:global_step/sec: 380.555
INFO:tensorflow:loss = 0.015376767, step = 29901 (0.262 sec)
INFO:tensorflow:global_step/sec: 325.707
INFO:tensorflow:loss = 0.009959009, step = 30001 (0.308 sec)
INFO:tensorflow:global_step/sec: 334.842
INFO:tensorflow:loss = 0.011732012, step = 30101 (0.298 sec)
INFO:tensorflow:global_step/sec: 317.885
INFO:tensorflow:loss = 0.03602004, step = 30201 (0.315 sec)
INFO:tensorflow:global_step/sec: 330.264
INFO:tensorflow:loss = 0.041253977, step = 30301 (0.303 sec)
INFO:tensorflow:global_step/sec: 296.521
INFO:tensorflow:loss = 0.07109792, step = 30401 (0.337 sec)
INFO:tensorflow:global_step/sec: 275.923
INFO:tensorflow:loss = 0.02161691, step = 30501 (0.362 sec)
INFO:tensorflow:global_step/sec: 316.123
INFO:tensorflow:loss = 0.07072684, step = 30601 (0.319 sec)
INFO:tensorflow:global_step/sec: 264.901
INFO:tensorflow:loss = 0.018758105, step = 30701 (0.374 sec)
INFO:tensorflow:global_st

INFO:tensorflow:global_step/sec: 308.624
INFO:tensorflow:loss = 0.011579474, step = 37901 (0.325 sec)
INFO:tensorflow:global_step/sec: 276.242
INFO:tensorflow:loss = 0.011995622, step = 38001 (0.362 sec)
INFO:tensorflow:global_step/sec: 346.754
INFO:tensorflow:loss = 0.0023810267, step = 38101 (0.287 sec)
INFO:tensorflow:global_step/sec: 333.958
INFO:tensorflow:loss = 0.0010013838, step = 38201 (0.299 sec)
INFO:tensorflow:global_step/sec: 294.362
INFO:tensorflow:loss = 0.038221553, step = 38301 (0.340 sec)
INFO:tensorflow:global_step/sec: 289.071
INFO:tensorflow:loss = 0.018653762, step = 38401 (0.346 sec)
INFO:tensorflow:global_step/sec: 321.818
INFO:tensorflow:loss = 0.021150798, step = 38501 (0.311 sec)
INFO:tensorflow:global_step/sec: 347.266
INFO:tensorflow:loss = 0.007319644, step = 38601 (0.288 sec)
INFO:tensorflow:global_step/sec: 338.486
INFO:tensorflow:loss = 0.0056379535, step = 38701 (0.295 sec)
INFO:tensorflow:global_step/sec: 339.979
INFO:tensorflow:loss = 0.027045483, st

<tensorflow_estimator.python.estimator.canned.dnn.DNNClassifier at 0x1037d92b0>

### evaluate metrics

In [5]:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"X": X_test}, y=y_test, shuffle=False)
eval_results = dnn_clf.evaluate(input_fn=test_input_fn)

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2019-03-23T04:10:52Z
INFO:tensorflow:Graph was finalized.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from /var/folders/1p/9j4_29dj62jf1c6tc1qqmk140000gn/T/tmpbvepjwbn/model.ckpt-44000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2019-03-23-04:10:52
INFO:tensorflow:Saving dict for global step 44000: accuracy = 0.9779, average_loss = 0.10576631, global_step = 44000, loss = 13.388141
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 44000: /var/folders/1p/9j4_29dj62jf1c6tc1qqmk140000gn/T/tmpbvepjwbn/model.ckpt-44000


In [6]:
eval_results

{'accuracy': 0.9779,
 'average_loss': 0.10576631,
 'loss': 13.388141,
 'global_step': 44000}
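
The numbers in `eval_results` can be cross-checked with a little arithmetic. The training-side figures come straight from the cells above. The evaluation-side figures assume that `numpy_input_fn` used its default batch size of 128 and that `loss` is the summed loss per batch averaged over batches; both are assumptions, not something the log states.

```python
import math

# Training: 55,000 instances, mini-batches of 50, 40 epochs
n_train, batch_size, n_epochs = 55000, 50, 40
steps_per_epoch = n_train // batch_size     # 1100 iterations per epoch
total_steps = steps_per_epoch * n_epochs    # 44000, the reported global_step

# Evaluation: 10,000 test instances, assumed default batch size of 128
n_test, eval_batch_size = 10000, 128
n_eval_batches = math.ceil(n_test / eval_batch_size)   # 79 (last batch is partial)
average_loss = 0.10576631                   # per-instance loss from eval_results
loss_per_batch = average_loss * n_test / n_eval_batches

print(total_steps, round(loss_per_batch, 3))
```

Under those assumptions the per-batch figure lands on the reported `loss` of about 13.388, which would explain why `loss` and `average_loss` differ by two orders of magnitude.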

In [7]:
y_pred_iter = dnn_clf.predict(input_fn=test_input_fn)
y_pred = list(y_pred_iter)
y_pred[0]

INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /var/folders/1p/9j4_29dj62jf1c6tc1qqmk140000gn/T/tmpbvepjwbn/model.ckpt-44000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.


{'logits': array([-11.029446 ,   2.6809397,  -5.0362144,   2.0475237, -10.583523 ,
        -11.072868 , -24.23713  ,  19.431154 ,  -5.1649723,   1.053862 ],
       dtype=float32),
 'probabilities': array([5.9037715e-14, 5.3146497e-08, 2.3656833e-11, 2.8208831e-08,
        9.2212838e-14, 5.6529000e-14, 1.0841922e-19, 1.0000000e+00,
        2.0798785e-11, 1.0443431e-08], dtype=float32),
 'class_ids': array([7]),
 'classes': array([b'7'], dtype=object)}
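
The `probabilities` entry is the softmax of the `logits` entry, and `class_ids` is its argmax. A small NumPy sketch using the logits printed above (the `softmax` helper is written here for illustration, not taken from TensorFlow):

```python
import numpy as np

# Logits copied from the prediction output above
logits = np.array([-11.029446, 2.6809397, -5.0362144, 2.0475237, -10.583523,
                   -11.072868, -24.23713, 19.431154, -5.1649723, 1.053862])

def softmax(z):
    """Convert logits to probabilities that sum to 1."""
    exp_z = np.exp(z - z.max())   # subtract the max for numerical stability
    return exp_z / exp_z.sum()

probabilities = softmax(logits)
print(probabilities.argmax(), probabilities.sum())
```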

### dnn via lower level tf api:
-  low level api is used for more control over the architecture of the network
-  Part 1 => construction phase (BUILD TF GRAPH):
    1.  create placeholders for inputs / targets
    2.  create func to build neuron layer
    3.  use neuron layer func to create DNN
    4.  define cost function
    5.  create optimizer
    6.  define perf measure
-  Part 2 => execution phase (RUN GRAPH TO TRAIN MODEL):
    1.  define number of epochs / mini-batch size
    2.  train model
-  **V1 => via custom udf**
-  **V2 => via built-in tf func**

### V1

In [8]:
# define layer sizes (used by the placeholders and layers below)
n_inputs = 28*28  # MNIST features; 28*28 = 784 neurons required
n_hidden1 = 300 # number of neurons
n_hidden2 = 100 # number of neurons
n_outputs = 10 # number of neurons

In [9]:
# 1 => create placeholders for inputs / targets
tf.reset_default_graph()

# shape = None because unsure how many training instances each batch will contain

# X => 2D tensor (matrix) w/ instances as 1st dim and features as 2nd dim
X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X") # input layer during execution phase
# y => 1D tensor w/ 1 entry per instance
y = tf.placeholder(tf.int32, shape=(None), name="y")

In [10]:
# 2 => create func to build neuron layer
def neuron_layer(X, n_neurons, name, activation=None):
    """
    X => inputs
    n_neurons => number of neurons in the layer
    name => name of layer
    activation => activation func (None means no activation)
    """
    with tf.name_scope(name):
        n_inputs = int(X.get_shape()[1])
        stddev = 2 / np.sqrt(n_inputs)
        init = tf.truncated_normal((n_inputs, n_neurons), stddev=stddev) # random init w/ this stddev helps training converge faster
        W = tf.Variable(init, name="kernel")
        b = tf.Variable(tf.zeros([n_neurons]), name="bias")
        Z = tf.matmul(X, W) + b
        if activation is not None:
            return activation(Z)
        else:
            return Z
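
To make the shapes concrete, the sketch below walks the three layers built from `neuron_layer` and prints the kernel shape, bias shape, initialization `stddev`, and parameter count of each. It uses only the layer sizes defined earlier; the loop itself is illustrative and not part of the original notebook.

```python
import math

layer_sizes = [28 * 28, 300, 100, 10]   # n_inputs, n_hidden1, n_hidden2, n_outputs

total_params = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    stddev = 2 / math.sqrt(n_in)        # same formula as in neuron_layer
    n_params = n_in * n_out + n_out     # kernel entries + bias entries
    total_params += n_params
    print(f"W: ({n_in}, {n_out})  b: ({n_out},)  stddev={stddev:.4f}  params={n_params}")

print("total trainable parameters:", total_params)
```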

In [11]:
# 3 => use neuron layer func to create DNN
with tf.name_scope("dnn"):
    hidden1 = neuron_layer(X, n_hidden1, name="hidden1", # X as input
                           activation=tf.nn.relu)
    hidden2 = neuron_layer(hidden1, n_hidden2, name="hidden2", # hidden1 output as input
                           activation=tf.nn.relu)
    logits = neuron_layer(hidden2, n_outputs, name="outputs") # hidden2 output as input; no activation here because softmax is applied inside the loss

In [12]:
# 4 => define cost function
with tf.name_scope("loss"): # cost func for training
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y,
                                                              logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")
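
What the loss cell computes can be reproduced in NumPy: apply softmax to the logits, take the log-probability of the true class, and negate it. The helper below is a sketch of that calculation with hypothetical inputs, not TensorFlow's implementation.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    """Per-instance cross entropy between integer labels and softmax(logits)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_proba = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_proba[np.arange(len(labels)), labels]

logits = np.array([[2.0, 2.0, 2.0],     # uniform prediction over 3 classes
                   [10.0, 0.0, 0.0]])   # confident prediction of class 0
labels = np.array([1, 0])               # integer class ids ("sparse" labels)

xentropy = sparse_softmax_cross_entropy(labels, logits)
loss = xentropy.mean()                  # same role as tf.reduce_mean above
print(xentropy, loss)
```

A uniform prediction costs `log(3)` for three classes, while a confident correct prediction costs almost nothing, which is why this loss penalizes low probability on the target class.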

In [13]:
# 5 => create optimizer
learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate) # GDO for tweaking model parameters to min cost func
    training_op = optimizer.minimize(loss)

In [14]:
# 6 => define perf measure
with tf.name_scope("eval"): # model eval
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [15]:
init = tf.global_variables_initializer() # initialize variables
saver = tf.train.Saver() # save trained model params to disk

In [16]:
# 1 => define number of epochs / mini-batch size
n_epochs = 40
batch_size = 50

In [17]:
def shuffle_batch(X, y, batch_size):
    rnd_idx = np.random.permutation(len(X))
    n_batches = len(X) // batch_size
    for batch_idx in np.array_split(rnd_idx, n_batches):
        X_batch, y_batch = X[batch_idx], y[batch_idx]
        yield X_batch, y_batch

In [18]:
# 2 => train model
with tf.Session() as sess: # open tf session
    init.run() # run init node to initialize all vars
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_val = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
        print(epoch, "Batch accuracy:", acc_batch, "Val accuracy:", acc_val)

    save_path = saver.save(sess, "./my_model_final_v1.ckpt")

0 Batch accuracy: 0.9 Val accuracy: 0.9168
1 Batch accuracy: 0.92 Val accuracy: 0.9332
2 Batch accuracy: 0.96 Val accuracy: 0.9414
3 Batch accuracy: 0.94 Val accuracy: 0.949
4 Batch accuracy: 0.98 Val accuracy: 0.9552
5 Batch accuracy: 0.96 Val accuracy: 0.9582
6 Batch accuracy: 0.98 Val accuracy: 0.9622
7 Batch accuracy: 0.96 Val accuracy: 0.9614
8 Batch accuracy: 0.96 Val accuracy: 0.9636
9 Batch accuracy: 1.0 Val accuracy: 0.9656
10 Batch accuracy: 1.0 Val accuracy: 0.967
11 Batch accuracy: 0.96 Val accuracy: 0.9676
12 Batch accuracy: 1.0 Val accuracy: 0.9684
13 Batch accuracy: 0.98 Val accuracy: 0.9698
14 Batch accuracy: 1.0 Val accuracy: 0.9694
15 Batch accuracy: 1.0 Val accuracy: 0.9716
16 Batch accuracy: 1.0 Val accuracy: 0.973
17 Batch accuracy: 1.0 Val accuracy: 0.9732
18 Batch accuracy: 1.0 Val accuracy: 0.9726
19 Batch accuracy: 0.96 Val accuracy: 0.9742
20 Batch accuracy: 1.0 Val accuracy: 0.974
21 Batch accuracy: 1.0 Val accuracy: 0.9742
22 Batch accuracy: 0.98 Val accurac

In [19]:
# restore saved model
with tf.Session() as sess:
    saver.restore(sess, "./my_model_final_v1.ckpt") # or better, use save_path
    X_new_scaled = X_test[:20]
    Z = logits.eval(feed_dict={X: X_new_scaled})
    y_pred = np.argmax(Z, axis=1)

INFO:tensorflow:Restoring parameters from ./my_model_final_v1.ckpt


In [20]:
print("Predicted classes:", y_pred)
print("Actual classes:   ", y_test[:20])

Predicted classes: [7 2 1 0 4 1 4 9 6 9 0 6 9 0 1 5 9 7 3 4]
Actual classes:    [7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4]
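
A quick check of the two printed rows, in plain Python, shows where the restored model went wrong on these 20 test digits (values copied from the output above):

```python
predicted = [7, 2, 1, 0, 4, 1, 4, 9, 6, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4]
actual    = [7, 2, 1, 0, 4, 1, 4, 9, 5, 9, 0, 6, 9, 0, 1, 5, 9, 7, 3, 4]

correct = [p == a for p, a in zip(predicted, actual)]
accuracy = sum(correct) / len(correct)

# (index, predicted, actual) for every misclassified instance
mistakes = [(i, p, a) for i, (p, a) in enumerate(zip(predicted, actual)) if p != a]
print(accuracy, mistakes)
```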


### V2:
-  same construction and execution phase steps apply
-  only difference is using tf.layers.dense func to build neuron layer

In [29]:
n_inputs = 28*28  # MNIST
n_hidden1 = 300
n_hidden2 = 100
n_outputs = 10

In [30]:
tf.reset_default_graph()

X = tf.placeholder(tf.float32, shape=(None, n_inputs), name="X")
y = tf.placeholder(tf.int32, shape=(None), name="y")

In [31]:
with tf.name_scope("dnn"):
    hidden1 = tf.layers.dense(X, n_hidden1, name="hidden1",
                              activation=tf.nn.relu)
    hidden2 = tf.layers.dense(hidden1, n_hidden2, name="hidden2",
                              activation=tf.nn.relu)
    logits = tf.layers.dense(hidden2, n_outputs, name="outputs")
    y_proba = tf.nn.softmax(logits)

In [24]:
with tf.name_scope("loss"):
    xentropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits)
    loss = tf.reduce_mean(xentropy, name="loss")

In [25]:
learning_rate = 0.01

with tf.name_scope("train"):
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    training_op = optimizer.minimize(loss)

In [26]:
with tf.name_scope("eval"):
    correct = tf.nn.in_top_k(logits, y, 1)
    accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

In [27]:
init = tf.global_variables_initializer()
saver = tf.train.Saver()

In [28]:
n_epochs = 20
batch_size = 50  # mini-batch size (consumed by shuffle_batch below)

with tf.Session() as sess:
    init.run()
    for epoch in range(n_epochs):
        for X_batch, y_batch in shuffle_batch(X_train, y_train, batch_size):
            sess.run(training_op, feed_dict={X: X_batch, y: y_batch})
        acc_batch = accuracy.eval(feed_dict={X: X_batch, y: y_batch})
        acc_valid = accuracy.eval(feed_dict={X: X_valid, y: y_valid})
        print(epoch, "Batch accuracy:", acc_batch, "Validation accuracy:", acc_valid)

    save_path = saver.save(sess, "./my_model_final_v2.ckpt")

0 Batch accuracy: 0.9 Validation accuracy: 0.9036
1 Batch accuracy: 0.9 Validation accuracy: 0.9222
2 Batch accuracy: 0.9 Validation accuracy: 0.9372
3 Batch accuracy: 0.94 Validation accuracy: 0.9428
4 Batch accuracy: 0.94 Validation accuracy: 0.9478
5 Batch accuracy: 0.96 Validation accuracy: 0.9518
6 Batch accuracy: 0.96 Validation accuracy: 0.9566
7 Batch accuracy: 0.94 Validation accuracy: 0.958
8 Batch accuracy: 0.94 Validation accuracy: 0.9598
9 Batch accuracy: 0.96 Validation accuracy: 0.9622
10 Batch accuracy: 0.98 Validation accuracy: 0.965
11 Batch accuracy: 0.92 Validation accuracy: 0.9674
12 Batch accuracy: 1.0 Validation accuracy: 0.9688
13 Batch accuracy: 0.94 Validation accuracy: 0.9682
14 Batch accuracy: 0.98 Validation accuracy: 0.9694
15 Batch accuracy: 0.96 Validation accuracy: 0.9702
16 Batch accuracy: 0.98 Validation accuracy: 0.9714
17 Batch accuracy: 1.0 Validation accuracy: 0.9716
18 Batch accuracy: 1.0 Validation accuracy: 0.9714
19 Batch accuracy: 1.0 Validat

### additional exercises:

https://github.com/ageron/handson-ml/blob/master/10_introduction_to_artificial_neural_networks.ipynb

1. Draw an ANN using the original artificial neurons (like the ones in Figure 10-3) that computes A ⊕ B (where ⊕ represents the XOR operation). Hint: A ⊕ B = (A ∧ ¬ B) ∨ (¬ A ∧ B).
2. Why is it generally preferable to use a Logistic Regression classifier rather than a classical Perceptron (i.e., a single layer of linear threshold units trained using the Perceptron training algorithm)? How can you tweak a Perceptron to make it equivalent to a Logistic Regression classifier?
3. Why was the logistic activation function a key ingredient in training the first MLPs?
4. Name three popular activation functions. Can you draw them?
5. Suppose you have an MLP composed of one input layer with 10 passthrough neurons, followed by one hidden layer with 50 artificial neurons, and finally one output layer with 3 artificial neurons. All artificial neurons use the ReLU activation function.
    -  What is the shape of the input matrix X?
    -  What about the shape of the hidden layer’s weight vector Wh, and the shape of its bias vector bh?
    -  What is the shape of the output layer’s weight vector Wo, and its bias vector bo?
    -  What is the shape of the network’s output matrix Y?
    -  Write the equation that computes the network’s output matrix Y as a function of X, Wh, bh, Wo and bo.
6. How many neurons do you need in the output layer if you want to classify email into spam or ham? What activation function should you use in the output layer? If instead you want to tackle MNIST, how many neurons do you need in the output layer, using what activation function? Answer the same questions for getting your network to predict housing prices as in Chapter 2.
7. What is backpropagation and how does it work? What is the difference between backpropagation and reverse-mode autodiff?
8. Can you list all the hyperparameters you can tweak in an MLP? If the MLP overfits the training data, how could you tweak these hyperparameters to try to solve the problem?
9. Train a deep MLP on the MNIST dataset and see if you can get over 98% precision. Just like in the last exercise of Chapter 9, try adding all the bells and whistles (i.e., save checkpoints, restore the last checkpoint in case of an interruption, add summaries, plot learning curves using TensorBoard, and so on).
