# Tutorial MNIST Simple

Load the libraries and start an interactive session.

In [1]:
import tensorflow as tf
import tensorlayer as tl

sess = tf.InteractiveSession()

`InteractiveSession` installs itself as the default session on construction.

In [3]:
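# prepare data: load MNIST (downloading it on the first run),
# with each 28x28 image flattened into a 784-element vector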
X_train, y_train, X_val, y_val, X_test, y_test = \
                                tl.files.load_mnist_dataset(shape=(-1,784))

Load or Download MNIST > data/mnist/
data/mnist/train-images-idx3-ubyte.gz
Downloading train-labels-idx1-ubyte.gz...113%
Succesfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.
Downloading t10k-images-idx3-ubyte.gz...100%
Succesfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.
data/mnist/t10k-images-idx3-ubyte.gz
Downloading t10k-labels-idx1-ubyte.gz...180%
Succesfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.
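
The `shape=(-1, 784)` argument asks for every 28x28 image as one flat vector of 784 pixels, with `-1` letting the number of images be inferred. Here is a small plain-Python sketch of that mapping. It uses a hypothetical `flatten_image` helper (not part of TensorLayer) and assumes the loader reshapes in row-major (C) order, NumPy's default, so pixel `(row, col)` lands at index `row * 28 + col`.

```python
# Sketch of what shape=(-1, 784) means for a single image: the 28x28 grid is
# flattened row by row (row-major order) into one 784-long vector.
# `flatten_image` is a hypothetical helper, not part of TensorLayer.


def flatten_image(image):
    """Flatten a 28x28 list of rows into a single list of 784 pixels."""
    assert len(image) == 28 and all(len(row) == 28 for row in image)
    return [pixel for row in image for pixel in row]


# Toy "image" whose pixel value encodes its (row, col) position.
image = [[row * 100 + col for col in range(28)] for row in range(28)]
flat = flatten_image(image)

print(len(flat))          # 784
print(flat[3 * 28 + 5])   # pixel at row 3, column 5 -> 305
```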


## Define placeholder

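To train the model we need a way to feed it data, so we first define placeholders.

A `tf.placeholder` is an entry point in the graph that receives the actual training examples at run time: `x` holds a batch of flattened images and `y_` the matching integer labels. `None` in a shape means that dimension (here, the batch size) is not fixed in advance.

For more information, see the TensorFlow guide on [reading data](https://www.tensorflow.org/programmers_guide/reading_data).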

In [4]:
x = tf.placeholder(tf.float32, shape=[None, 784], name='x')
y_ = tf.placeholder(tf.int64, shape=[None, ], name='y_')
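
To make the shape arguments concrete, here is a plain-Python sketch of the rule a placeholder shape expresses. `None` matches any size (here, any batch size) and fixed dimensions must match exactly, so `x` accepts any number of 784-long rows and `y_` (whose shape `[None, ]` is just `[None]`) accepts any number of labels. `shape_of` and `matches_placeholder` are hypothetical helpers; TensorFlow performs its own check when you feed data.

```python
# Sketch of placeholder shape matching: None is a wildcard, fixed sizes must
# match. These helpers are hypothetical and only mimic the idea.


def shape_of(value):
    """Return the shape of a nested list, e.g. 2 rows of 784 -> (2, 784)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def matches_placeholder(value, placeholder_shape):
    """True if `value` has the same rank and every non-None dimension agrees."""
    actual = shape_of(value)
    if len(actual) != len(placeholder_shape):
        return False
    return all(p is None or p == a for p, a in zip(placeholder_shape, actual))


batch_x = [[0.0] * 784 for _ in range(32)]  # 32 flattened images
batch_y = [7] * 32                          # 32 integer labels

print(matches_placeholder(batch_x, [None, 784]))  # True
print(matches_placeholder(batch_y, [None]))       # True
print(matches_placeholder(batch_x, [None, 28]))   # False
```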

## Define the Network

The `InputLayer` class is the starting layer of a TensorLayer network: it wraps the placeholder `x` so that the following layers can be stacked on top of it. This is the first thing we need to prepare.

In [6]:
network = tl.layers.InputLayer(x, name='input_layer')

  [TL] InputLayer  input_layer: (?, 784)


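The `name` argument is optional, but it is better to give every layer a unique, descriptive name: TensorLayer uses it to scope the layer's variables, which is why the parameters printed later are called `relu1/W:0`, `relu1/b:0`, and so on.

Next, we use TensorLayer to build the model.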

This is where TensorLayer steps in. It allows you to define an arbitrarily structured neural network by creating and stacking or merging layers. Since every layer knows its immediate incoming layers, the output layer (or output layers) of a network double as a handle to the network as a whole, so usually this is the only thing we will pass on to the rest of the code.
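
A toy sketch of that idea, assuming nothing about TensorLayer's internals: each hypothetical `ToyLayer` keeps a reference to its incoming layer, so after reassigning `network` layer by layer, the final object alone is enough to recover the whole stack. TensorLayer's real bookkeeping may differ; this only illustrates the "output layer as a handle to the network" point.

```python
# Toy illustration: every layer remembers its incoming layer, so the last
# layer is a handle to the whole network. `ToyLayer` is hypothetical.


class ToyLayer:
    def __init__(self, prev, name):
        self.prev = prev    # incoming layer (None for the input layer)
        self.name = name

    def all_layers(self):
        """Walk back from this layer to the input; return names in order."""
        names = []
        layer = self
        while layer is not None:
            names.append(layer.name)
            layer = layer.prev
        return list(reversed(names))


network = ToyLayer(None, 'input_layer')
for name in ['drop1', 'relu1', 'drop2', 'relu2', 'drop3', 'output_layer']:
    network = ToyLayer(network, name)   # reassign, as the notebook does

print(network.all_layers())
```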

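`tutorial_mnist_simple.py` is a simple TensorLayer example for the MNIST dataset.

We won't go into detail about the layers used here; if you are interested, please see the [TensorLayer API - Layer](https://tensorlayer.readthedocs.io/en/latest/modules/layers.html) documentation. In short, the network below applies dropout to the input (keeping 80% of the pixels), then two fully connected ReLU layers of 800 units, each followed by dropout (keeping 50% of the units).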

In [7]:
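# Input dropout: keep 80% of the pixels during training
network = tl.layers.DropoutLayer(network, keep=0.8, name='drop1')
# First hidden layer: 800 ReLU units, followed by dropout (keep 50%)
network = tl.layers.DenseLayer(network, n_units=800,
                                act=tf.nn.relu, name='relu1')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop2')
# Second hidden layer: 800 ReLU units, followed by dropout (keep 50%)
network = tl.layers.DenseLayer(network, n_units=800,
                                act=tf.nn.relu, name='relu2')
network = tl.layers.DropoutLayer(network, keep=0.5, name='drop3')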

  [TL] DropoutLayer drop1: keep:0.800000 is_fix:False
  [TL] DenseLayer  relu1: 800 relu
  [TL] DropoutLayer drop2: keep:0.500000 is_fix:False
  [TL] DenseLayer  relu2: 800 relu
  [TL] DropoutLayer drop3: keep:0.500000 is_fix:False
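
What these two layer types compute can be sketched on plain Python lists. A `DenseLayer` computes `act(x W + b)`; a `DropoutLayer` with `keep=p` zeroes each unit with probability `1 - p` during training and scales the survivors by `1 / p` (inverted dropout, the scheme `tf.nn.dropout` uses), so every unit keeps the same expected value. The functions `dense` and `dropout` below are illustrative, not TensorLayer's code. As far as we know, in TensorLayer 1.x the keep probabilities are fed through `network.all_drop`, and `tl.utils.fit` and `tl.utils.test` set them to 1 when evaluating, which disables dropout at test time.

```python
import random

# Plain-Python sketch of the two layer types used above (illustrative only):
#   DenseLayer:   out = act(x W + b)
#   DropoutLayer: during training keep each unit with probability `keep` and
#                 scale kept units by 1 / keep (inverted dropout), so each
#                 unit's expected value is unchanged.


def relu(v):
    return max(0.0, v)


def dense(x, W, b, act=relu):
    """x: n_in floats, W: n_in x n_out nested list, b: n_out floats."""
    return [act(sum(x[i] * W[i][j] for i in range(len(x))) + b[j])
            for j in range(len(b))]


def dropout(x, keep, rng):
    """Inverted dropout; keep=1.0 leaves the input unchanged."""
    return [v / keep if rng.random() < keep else 0.0 for v in x]


x = [1.0, -2.0, 3.0]
W = [[0.5, -1.0],
     [0.25, 0.0],
     [0.0, 1.0]]
b = [0.1, -2.5]

print(dense(x, W, b))                    # [0.1, 0.0]  (ReLU clips the negative unit)
print(dense(x, W, b, act=lambda v: v))   # [0.1, -0.5] (identity, like output_layer)

rng = random.Random(0)
print(dropout(x, keep=1.0, rng=rng))     # [1.0, -2.0, 3.0]
dropped = dropout([1.0] * 10000, keep=0.5, rng=rng)
print(sum(dropped) / len(dropped))       # close to 1.0; every entry is 0.0 or 2.0
```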


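The output layer uses `tf.identity` (no activation) because `tl.cost.cross_entropy(y, y_)` applies the softmax internally; fusing the softmax with the cross-entropy is faster and more numerically stable than computing them separately.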

In [8]:
network = tl.layers.DenseLayer(network, n_units=10,
                                act=tf.identity,  # raw logits; softmax is applied in the cost
                                name='output_layer')

  [TL] DenseLayer  output_layer: 10 identity


## Define cost function and metric

`tl.cost.cross_entropy(y, y_)` computes the softmax cross-entropy between the network outputs `y` (unnormalized logits) and the integer labels `y_`. It applies the softmax internally and returns a TensorFlow expression for the loss averaged over the batch.

For background on cross-entropy, see [Wikipedia](https://en.wikipedia.org/wiki/Cross_entropy).

Softmax cross-entropy measures the probability error in discrete classification tasks in which the classes are mutually exclusive (each entry is in exactly one class). For example, each CIFAR-10 image is labeled with one and only one label: an image can be a dog or a truck, but not both. MNIST is the same kind of task, since each image shows exactly one digit.
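
The loss itself is easy to write out. The plain-Python sketch below computes the mean sparse softmax cross-entropy, `-log softmax(logits)[label]` averaged over a batch, using the log-sum-exp trick for numerical stability. To our understanding, `tl.cost.cross_entropy` delegates this to TensorFlow's `tf.nn.sparse_softmax_cross_entropy_with_logits` and takes the mean; the function names here are our own.

```python
import math

# Sketch of mean sparse softmax cross-entropy computed from logits.
# Illustrative only; TensorFlow provides its own fused implementation.


def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]


def sparse_softmax_cross_entropy(batch_logits, labels):
    """Mean of -log p(true class) over the batch; labels are class indices."""
    losses = [-log_softmax(logits)[label]
              for logits, label in zip(batch_logits, labels)]
    return sum(losses) / len(losses)


# Two examples, three classes.
loss_batch = sparse_softmax_cross_entropy([[2.0, 1.0, 0.1],
                                           [0.0, 0.0, 0.0]], [0, 2])
# Uniform logits give -log(1/3) = log(3).
loss_uniform = sparse_softmax_cross_entropy([[0.0, 0.0, 0.0]], [2])
# Huge logits do not overflow thanks to subtracting the max first.
loss_big = sparse_softmax_cross_entropy([[1000.0, 0.0]], [0])

print(loss_batch, loss_uniform, loss_big)
```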

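`tf.equal(a, b)` returns the truth value of `a == b` element-wise, as a `Tensor` of type `bool`. Comparing the predicted class `tf.argmax(y, 1)` with the label `y_`, casting the result to `float32`, and taking the mean gives the classification accuracy.

With these operations we can define the cost function and the accuracy metric.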

In [9]:
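y = network.outputs                                   # logits, shape (?, 10)
cost = tl.cost.cross_entropy(y, y_, name='xentropy')  # mean softmax cross-entropy
correct_prediction = tf.equal(tf.argmax(y, 1), y_)    # bool per example
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))  # accuracy
y_op = tf.argmax(tf.nn.softmax(y), 1)                 # predicted class per example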
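
The same metric in plain Python, as a sketch: `argmax` picks the predicted class, comparing it with the label plays the role of `tf.equal`, and averaging the booleans as floats plays the role of `tf.cast` plus `tf.reduce_mean`. The final check shows that the softmax inside `y_op` does not change the prediction, because softmax preserves the order of the logits. All function names are illustrative.

```python
import math

# Plain-Python sketch of argmax -> equal -> cast -> mean (accuracy).


def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy(batch_logits, labels):
    correct = [argmax(logits) == t for logits, t in zip(batch_logits, labels)]
    return sum(float(c) for c in correct) / len(correct)  # cast + mean


batch_logits = [[0.1, 2.0, -1.0],   # predicts class 1
                [3.0, 0.5, 0.2],    # predicts class 0
                [0.0, 0.3, 0.9],    # predicts class 2
                [1.5, 1.4, 0.0]]    # predicts class 0
labels = [1, 0, 1, 1]

print(accuracy(batch_logits, labels))  # 0.5
print(all(argmax(v) == argmax(softmax(v)) for v in batch_logits))  # True
```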

## Define the optimizer

`tf.train.AdamOptimizer()` implements the Adam algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. Its arguments are:

* learning_rate: A `Tensor` or a floating-point value. The learning rate.
* beta1: A float value or a constant float tensor. The exponential decay rate for the 1st moment estimates.
* beta2: A float value or a constant float tensor. The exponential decay rate for the 2nd moment estimates.
* epsilon: A small constant for numerical stability.
* use_locking: If `True`, use locks for update operations.
* name: Optional name for the operations created when applying gradients (defaults to `"Adam"`).
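
A minimal sketch of the Adam update rule (Kingma and Ba) for a single scalar parameter, in plain Python. `adam_minimize` is a hypothetical helper; it uses the same `beta1`, `beta2` and `epsilon` values as the next cell, but a larger learning rate (0.1 instead of 0.0001) so the toy problem converges within the given number of steps.

```python
import math

# Sketch of Adam on one scalar parameter: moving averages of the gradient
# (1st moment) and squared gradient (2nd moment), bias-corrected, then a
# step scaled by m_hat / sqrt(v_hat).


def adam_minimize(grad, theta, learning_rate=0.1, beta1=0.9, beta2=0.999,
                  epsilon=1e-8, steps=1000):
    m = 0.0  # 1st moment estimate
    v = 0.0  # 2nd moment estimate
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)   # bias correction for the zero init
        v_hat = v / (1 - beta2 ** t)
        theta -= learning_rate * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta


# Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = adam_minimize(lambda th: 2.0 * (th - 3.0), theta=0.0)
print(theta)  # close to 3.0
```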

In [10]:
train_params = network.all_params

train_op = tf.train.AdamOptimizer(learning_rate=0.0001, beta1=0.9, beta2=0.999,
                                  epsilon=1e-08, use_locking=False).minimize(cost, var_list=train_params)

## Initialize all variables in the session

In [11]:
tl.layers.initialize_global_variables(sess)

## Print network information

In [12]:
network.print_params()

  param   0: (784, 800)      (mean: -7.893177826190367e-05, median: -0.00025377183919772506, std: 0.08798402547836304)   relu1/W:0
  param   1: (800,)          (mean: 0.0               , median: 0.0               , std: 0.0               )   relu1/b:0
  param   2: (800, 800)      (mean: 2.5794106477405876e-05, median: -2.705508904909948e-06, std: 0.08786776661872864)   relu2/W:0
  param   3: (800,)          (mean: 0.0               , median: 0.0               , std: 0.0               )   relu2/b:0
  param   4: (800, 10)       (mean: 0.00037536685704253614, median: -7.521080988226458e-05, std: 0.08868692070245743)   output_layer/W:0
  param   5: (10,)           (mean: 0.0               , median: 0.0               , std: 0.0               )   output_layer/b:0
  num of params: 1276810
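
The reported parameter count can be checked by hand: each `DenseLayer` with `n_in` inputs and `n_units` outputs owns a weight matrix `W` of shape `(n_in, n_units)` and a bias `b` of shape `(n_units,)`, while dropout layers have no trainable parameters. A quick arithmetic sketch:

```python
# Count W + b for each dense layer: 784 -> 800 -> 800 -> 10.
layer_sizes = [784, 800, 800, 10]  # input, relu1, relu2, output_layer

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    n_params = n_in * n_out + n_out   # W + b
    print(f"({n_in}, {n_out}) + ({n_out},) -> {n_params}")
    total += n_params

print("num of params:", total)  # 1276810
```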


Print all the layers that we have built for the network.

In [13]:
network.print_layers()

  layer 0: Tensor("drop1/mul:0", shape=(?, 784), dtype=float32)
  layer 1: Tensor("relu1/Relu:0", shape=(?, 800), dtype=float32)
  layer 2: Tensor("drop2/mul:0", shape=(?, 800), dtype=float32)
  layer 3: Tensor("relu2/Relu:0", shape=(?, 800), dtype=float32)
  layer 4: Tensor("drop3/mul:0", shape=(?, 800), dtype=float32)
  layer 5: Tensor("output_layer/Identity:0", shape=(?, 10), dtype=float32)


## Train the network

In [14]:
tl.utils.fit(sess, network, train_op, cost, X_train, y_train, x, y_,
             acc=acc, batch_size=500, n_epoch=500, print_freq=5,
             X_val=X_val, y_val=y_val, eval_train=False)

Start training the network ...
Epoch 1 of 500 took 2.163315s
   val loss: 0.553532
   val acc: 0.825900
Epoch 5 of 500 took 1.254972s
   val loss: 0.282738
   val acc: 0.917800
Epoch 10 of 500 took 1.169460s
   val loss: 0.222307
   val acc: 0.937500
Epoch 15 of 500 took 1.272432s
   val loss: 0.187691
   val acc: 0.947200
Epoch 20 of 500 took 1.416087s
   val loss: 0.163059
   val acc: 0.954700
Epoch 25 of 500 took 1.381155s
   val loss: 0.146343
   val acc: 0.960100
Epoch 30 of 500 took 1.258913s
   val loss: 0.131556
   val acc: 0.963100
Epoch 35 of 500 took 1.570845s
   val loss: 0.121536
   val acc: 0.966500
Epoch 40 of 500 took 1.439608s
   val loss: 0.113252
   val acc: 0.969300
Epoch 45 of 500 took 1.432343s
   val loss: 0.104831
   val acc: 0.970900
Epoch 50 of 500 took 1.372524s
   val loss: 0.099427
   val acc: 0.971600
Epoch 55 of 500 took 1.463206s
   val loss: 0.094483
   val acc: 0.972800
Epoch 60 of 500 took 1.408511s
   val loss: 0.090017
   val acc: 0.973500
Epoch 65 
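
Each epoch above loops over the training set in shuffled minibatches of `batch_size` examples. Below is a plain-Python sketch of that batching with a hypothetical `minibatches` generator: shuffle the indices once per epoch, then yield aligned slices of inputs and targets, dropping a final incomplete batch. As far as we know, `tl.utils.fit` relies on TensorLayer's own `tl.iterate.minibatches` for this; the version below only illustrates the idea.

```python
import random

# Sketch of one epoch of minibatching. `minibatches` is hypothetical.


def minibatches(inputs, targets, batch_size, shuffle=True, rng=None):
    assert len(inputs) == len(targets)
    indices = list(range(len(inputs)))
    if shuffle:
        (rng or random).shuffle(indices)
    # This sketch drops a final batch smaller than batch_size.
    for start in range(0, len(indices) - batch_size + 1, batch_size):
        batch_idx = indices[start:start + batch_size]
        yield ([inputs[i] for i in batch_idx], [targets[i] for i in batch_idx])


X = list(range(10))
y = [10 * v for v in X]
batches = list(minibatches(X, y, batch_size=3, rng=random.Random(42)))

print(len(batches))  # 3 (one leftover example is skipped this epoch)
print(all(yb == [10 * v for v in xb] for xb, yb in batches))  # True: pairs stay aligned
```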

## Evaluation

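We use `tl.utils.test` to evaluate a (non-time-series) network on the test data with the given metric. Passing `cost=cost` also reports the test loss, and `batch_size=None` feeds the whole test set in a single run.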

In [15]:
tl.utils.test(sess, network, acc, X_test, y_test,
              x, y_, batch_size=None, cost=cost)

Start testing the network ...
   test loss: 0.048938
   test acc: 0.987900
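
With `batch_size=None` the whole test set goes through the network in one run, so the reported accuracy is simply the mean over all test examples. If the set were evaluated in chunks instead, per-batch results would have to be weighted by batch size to reproduce that number. A plain-Python sketch with hypothetical `batch_results` and `batched_mean` helpers:

```python
# Sketch: combining per-batch means correctly when the last batch is smaller.


def batch_results(values, batch_size):
    """Split `values` into batches; return (batch_mean, batch_size) pairs."""
    results = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        results.append((sum(batch) / len(batch), len(batch)))
    return results


def batched_mean(values, batch_size):
    """Combine per-batch means, weighting each batch by its size."""
    results = batch_results(values, batch_size)
    total = sum(mean * size for mean, size in results)
    count = sum(size for _, size in results)
    return total / count


correct = [1.0] * 7 + [0.0] * 3        # 1.0 = correctly classified

full = sum(correct) / len(correct)              # one pass: 0.7
batched = batched_mean(correct, batch_size=4)   # batches of 4, 4, 2: 0.7
means = [mean for mean, _ in batch_results(correct, batch_size=4)]
naive = sum(means) / len(means)                 # unweighted: about 0.583
print(full, batched, naive)
```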


## Save the network to .npz file

In [16]:
tl.files.save_npz(network.all_params, name='model.npz')
sess.close()

[*] model.npz saved


# Credit

Credit goes to [TensorLayer Example](https://github.com/zsdonghao/tensorlayer/tree/master/example) for the majority of this code. I merely turned it into a Jupyter notebook to make it more readable and added some personal notes.

If you find any mistakes, please don't hesitate to email me (chengzehua@outlook.com).

Thank you very much.