# Convolutional neural network

A convolutional neural network, also known as a *convnet*, is a well-known
method in computer vision applications. This type of architecture is
dominant for recognizing objects in pictures and videos.

In this tutorial, you will learn how to construct a convnet and how to
use TensorFlow to classify the MNIST dataset of handwritten digits.

## The architecture of a Convolutional Neural Network

Think about Facebook a few years ago: after you uploaded a picture to
your profile, you were asked to manually add a name to each face in the
picture. Nowadays, Facebook uses convnets to tag your friends in the
picture automatically.

A convolutional neural network is not very difficult to understand. An
input image is processed during the convolution phase and is later
assigned a label.

A typical convnet architecture is summarized in the picture below.
First, an image is fed to the network; this is called the
input image. Then, the input image goes through a series of
steps; this is the convolutional part of the network. Finally, the
neural network predicts the digit in the image.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image001.png)

An image is composed of an array of pixels with a height and a width. A
grayscale image has only one channel, while a color image has three
channels (one each for Red, Green, and Blue). The channels are stacked
on top of each other. In this tutorial, you will use grayscale images with
only one channel. Each pixel has a value from 0 to 255 that reflects the
intensity of the color. In the MNIST pictures shown here, a pixel equal to 0
is displayed as white, while a pixel with a value close to 255 is darker.

Let’s look at an image stored in the [MNIST
dataset](http://yann.lecun.com/exdb/mnist/). The picture below shows how
to represent the picture on the left in matrix format. Note that the
original matrix has been rescaled to lie between 0 and 1. For darker
colors, the value in the matrix is about 0.9, while white pixels have a
value of 0.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image002.png)
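The rescaling to the 0 to 1 range can be sketched in NumPy. This is a minimal sketch that assumes raw 8-bit pixel values between 0 and 255 and simply divides by 255; the small `image` array is a made-up patch, not an actual MNIST digit.

```python
import numpy as np

# Hypothetical 3x3 grayscale patch with raw 8-bit intensities (0 to 255)
image = np.array([[0, 128, 255],
                  [0, 64, 230],
                  [0, 0, 255]], dtype=np.uint8)

# Convert to float and divide by the maximum intensity so values lie in [0, 1]
scaled = image.astype(np.float64) / 255.0

print(scaled.min(), scaled.max())  # 0.0 1.0
```

A dark pixel of 230 becomes about 0.9 after rescaling, which is the order of magnitude visible in the matrix above.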

## Convolutional operation

The most critical component in the model is the convolutional layer.
This part aims to reduce the size of the image, which speeds up the
computation of the weights and improves generalization.

During the convolutional part, the network keeps the essential features
of the image and excludes irrelevant noise. For instance, suppose the model is
learning how to recognize an elephant in a picture with a mountain in
the background. If you use a traditional neural network, the model
assigns a weight to every pixel, including those from the mountain,
which are not essential and can mislead the network.

Instead, a convolutional neural network uses a mathematical
technique to extract only the most relevant pixels. This mathematical
operation is called convolution. It allows the network to
learn increasingly complex features at each layer. The convolution
divides the matrix into small pieces to learn the most essential elements
within each piece.

Every convnet has four components:

1.  Convolution

2.  Non-linearity (ReLU)

3.  Pooling or sub-sampling

4.  Classification (fully connected layer)

-   Convolution

The purpose of the convolution is to extract the features of the object
in the image locally. This means the network learns specific patterns
within the picture and is able to recognize them anywhere in the
picture.

Convolution is an element-wise multiplication followed by a sum. The concept
is easy to understand. The computer scans a part of the image, usually with a
dimension of 3x3, multiplies it element-wise with a filter, and adds up the
products to obtain a single value. The grid of values obtained this way is
called a feature map. This step is repeated until the whole image is scanned.
Note that, without padding, the convolution reduces the size of the image.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image003.png)
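To make the scan, multiply, and sum idea concrete, here is a minimal NumPy sketch of a single-channel convolution with stride 1 and no padding. Like convnet layers in deep learning libraries, it does not flip the filter (strictly speaking, this is cross-correlation). The function name `convolve2d_valid` and the input values are illustrative assumptions.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and sum the element-wise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]           # current 3x3 window
            feature_map[i, j] = np.sum(patch * kernel)  # multiply element-wise, then sum
    return feature_map

image = np.arange(25, dtype=np.float64).reshape(5, 5)  # hypothetical 5x5 input
kernel = np.ones((3, 3))                               # hypothetical 3x3 filter
result = convolve2d_valid(image, kernel)
print(result.shape)  # (3, 3): the output is smaller than the input
```

The explicit loops are chosen for readability; real frameworks use vectorized or GPU kernels for the same computation.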

The animation below shows how convolution works in action.

![](https://media.giphy.com/media/fV8esV6419OdEltpbO/giphy.gif)

There are numerous filters available. Below, we list some of them.
You can see that each filter has a specific purpose. Note that in
the picture below, kernel is a synonym of filter.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image005.png)
[Source](https://en.wikipedia.org/wiki/Kernel_(image_processing))
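As an illustration of how different kernels behave, the sketch below applies three classic kernels (identity, box blur, and an edge-detection kernel of the kind listed on the Wikipedia page above) to a small synthetic image. It assumes NumPy 1.20 or later for `sliding_window_view`; the image and the helper `apply_kernel` are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

kernels = {
    "identity": np.array([[0, 0, 0], [0, 1, 0], [0, 0, 0]], dtype=float),
    "box_blur": np.ones((3, 3)) / 9.0,
    "edge": np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=float),
}

# Hypothetical 6x6 image: dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

def apply_kernel(img, kernel):
    # windows has shape (H-2, W-2, 3, 3); multiply each window by the kernel and sum
    windows = sliding_window_view(img, kernel.shape)
    return np.einsum("ijkl,kl->ij", windows, kernel)

outputs = {name: apply_kernel(image, k) for name, k in kernels.items()}
print(outputs["edge"][0])  # non-zero only next to the vertical edge
```

The identity kernel returns the interior of the image unchanged, the box blur averages neighbors, and the edge kernel responds only where intensity changes.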

*Arithmetic behind the convolution*

The convolutional phase applies the filter to a small array of pixels
within the picture. The filter, generally of shape 3x3 or 5x5, moves
along the input image. This means the network slides this
window across the whole input image and computes the convolution. The
image below shows how the convolution operates. The size of the patch is
3x3, and the output matrix is the result of the element-wise
multiplication and sum between the image matrix and the filter.

![](http://machinelearninguru.com/_images/topics/computer_vision/basics/convolutional_layer_1/stride1.gif)



[Source](http://machinelearninguru.com/computer_vision/basics/convolution/convolution_layer.html)

Notice that the width and height of the output can be different from
the width and height of the input. This happens because of the border
effect.

**Border effect**

Consider a 5x5 feature map and a 3x3 filter. There are only 9 tiles
around which the filter can be centered, forming a 3x3 grid. Hence, the
output feature map is 3x3: it shrinks by two tiles along each dimension.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image007.png)

To get the same output dimension as the input dimension, you need to add
padding. Padding consists of adding the right number of rows and columns
(usually filled with zeros) on each side of the matrix. This allows the
filter to be centered on every input tile. In the image below, the input
and output matrices have the same dimension, 5x5.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image008.png)
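Zero-padding can be sketched with `np.pad`: adding one ring of zeros around a 5x5 input lets a 3x3 filter be centered on every original tile, so the output keeps the 5x5 shape. The input and filter values below are illustrative, and `sliding_window_view` assumes NumPy 1.20 or later.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

image = np.arange(1, 26, dtype=np.float64).reshape(5, 5)  # hypothetical 5x5 input
kernel = np.ones((3, 3))                                   # hypothetical 3x3 filter

# For a 3x3 filter with stride 1, one row/column of zeros per side preserves the size
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (7, 7)

windows = sliding_window_view(padded, kernel.shape)  # shape (5, 5, 3, 3)
output = np.einsum("ijkl,kl->ij", windows, kernel)   # multiply each window by the filter and sum
print(output.shape)  # (5, 5): same as the input
```

At the corner, the window covers only four real pixels (1, 2, 6, 7) plus zeros, which is why border values tend to be smaller with zero-padding.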

When you define the network, the convolved features are controlled by
three parameters:

1.  Depth: It defines the number of filters to apply during the
    convolution. In the previous example, you saw a depth of 1, meaning
    only one filter is used. In most cases, there is more than one
    filter. The picture below shows the operations done in a situation
    with three filters.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image009.gif)

2.  Stride: It defines the number of pixels the window jumps between two
    slices. If the stride is equal to 1, the window moves one pixel at a
    time. If the stride is equal to 2, the window jumps by 2 pixels. If you
    increase the stride, you get smaller feature maps.

Example stride 1

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image010.png)

Example stride 2

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image011.png)

3.  Zero-padding: Padding adds the appropriate number of rows and
    columns of zeros on each side of the input feature maps. With the
    right amount of padding, the output has the same dimension as the input.
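Depth, stride, and zero-padding together determine the shape of the convolved output. A common way to write the spatial size along one dimension is `(n + 2p - k) // s + 1`, for an input of size `n`, a filter of size `k`, `p` rows or columns of zero-padding per side, and stride `s`; the depth of the output is the number of filters. The helper `conv_output_shape` below is a hypothetical illustration of that formula for square filters.

```python
def conv_output_shape(height, width, kernel_size, stride=1, padding=0, filters=1):
    """Return (out_height, out_width, depth) for a square filter."""
    out_h = (height + 2 * padding - kernel_size) // stride + 1
    out_w = (width + 2 * padding - kernel_size) // stride + 1
    return out_h, out_w, filters

print(conv_output_shape(5, 5, 3))                       # (3, 3, 1): border effect
print(conv_output_shape(5, 5, 3, padding=1))            # (5, 5, 1): zero-padding keeps the size
print(conv_output_shape(5, 5, 3, stride=2, padding=1))  # (3, 3, 1): larger stride, smaller map
print(conv_output_shape(5, 5, 3, filters=3))            # (3, 3, 3): depth = number of filters
```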

-   Non-linearity (ReLU)

At the end of the convolution operation, the output is passed through an
activation function to add non-linearity. The usual activation
function for convnets is the ReLU: every pixel with a negative value
is replaced by zero.
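The ReLU is a one-line operation on the feature map. This NumPy sketch applies it element-wise to a small hypothetical convolution output.

```python
import numpy as np

def relu(x):
    # Negative values become 0; non-negative values pass through unchanged
    return np.maximum(x, 0)

feature_map = np.array([[-3.0, 2.0],
                        [0.5, -0.1]])  # hypothetical convolution output
print(relu(feature_map))
```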

-   Max-pooling operation

This step is easy to understand. The purpose of pooling is to reduce
the dimensionality of the input image, which reduces the computational
complexity of the operation. With smaller feature maps, the following
layers have fewer weights to compute, which also helps prevent
overfitting.

In this stage, you need to define the size and the stride. A standard
way to pool the input image is to use the maximum value of the feature
map. Look at the picture below. The pooling screens the four 2x2
submatrices of the 4x4 feature map and returns the maximum value of each.
It takes the maximum value of a 2x2 array and then moves this
window by two pixels. For instance, the first sub-matrix is
\[3,1,3,2\], so the pooling returns the maximum, which is 3.

![](https://github.com/thomaspernet/Tensorflow/blob/master/tensorflow/20_CNN_v7_files/image012.png)

There are other pooling operations, such as the mean (average pooling).

This operation aggressively reduces the size of the feature map.
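A 2x2 pooling with stride 2 can be sketched by reshaping the feature map into non-overlapping 2x2 blocks and reducing each block with either the maximum or the mean. The 4x4 values are hypothetical (only the top-left block, 3, 1, 3, 2, mirrors the example above), and the reshape trick assumes the height and width are divisible by 2.

```python
import numpy as np

def pool2x2(feature_map, reduce=np.max):
    """2x2 pooling with stride 2 (non-overlapping windows)."""
    h, w = feature_map.shape
    # Split the map into 2x2 blocks: axes 1 and 3 index positions inside each block
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return reduce(blocks, axis=(1, 3))

# Hypothetical 4x4 feature map; its top-left 2x2 block holds 3, 1, 3, 2
feature_map = np.array([[3, 1, 2, 4],
                        [3, 2, 1, 0],
                        [1, 2, 3, 1],
                        [5, 6, 1, 2]], dtype=float)

max_pooled = pool2x2(feature_map, np.max)    # [[3, 4], [6, 3]]
mean_pooled = pool2x2(feature_map, np.mean)  # [[2.25, 1.75], [3.5, 1.75]]
print(max_pooled)
print(mean_pooled)
```

Either way, the 4x4 map becomes 2x2, a fourfold reduction in the number of values.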

-   Fully connected layers

The last step consists of building a traditional artificial neural
network as you did in the previous tutorial. You connect all neurons
from the previous layer to the next layer. You use a softmax activation
function to classify the number on the input image.
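The softmax turns the raw scores of the last layer into probabilities that sum to 1. A minimal NumPy sketch for a single example with three hypothetical class scores:

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability; this does not change the result
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical scores for three classes
probs = softmax(logits)
print(probs, probs.sum())
```

For MNIST, the same function would be applied to 10 scores, one per digit, and the predicted digit is the one with the highest probability.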

**Recap:** 

A convolutional neural network stacks different layers before making a
prediction. It has:

-   A convolutional layer

-   A ReLU activation function

-   Pooling layer

-   Densely connected layer

The convolutional layers apply different filters to subregions of the
picture. The ReLU activation function adds non-linearity, and the
pooling layers reduce the dimensionality of the feature maps.

All these layers extract essential information from the images. Finally,
the feature maps are fed to a fully connected layer with a
softmax function to make a prediction.

## Train a CNN with TensorFlow

Now that you are familiar with the building blocks of a convnet, you are
ready to build one with TensorFlow. You will use the MNIST dataset.

The data preparation is the same as in the previous tutorial. You can run
the code and jump directly to the architecture of the CNN.

You will follow the steps below:

Step 1: Upload Dataset

Step 2: Input layer

Step 3: Convolutional layer

Step 4: Pooling layer

Step 5: Second Convolutional Layer and Pooling Layer

Step 6: Dense layer

Step 7: Logit Layer


**Step 1**: Upload Dataset

The MNIST dataset is available at this [URL](https://www.dropbox.com/sh/jm9jo0d58oggeb9/AAAZrRHvHFGYdCHssXpEH2o1a?dl=0).

Please download it and store it in your Downloads folder. You can then load it with `fetch_mldata('MNIST original')`. Note that `fetch_mldata` was deprecated in scikit-learn 0.20 and removed in 0.22; on recent versions, `fetch_openml('mnist_784')` is the usual replacement.

**Create a train/test set**

You split the dataset with `train_test_split`.

**Scale the features**

Finally, you scale the features with `MinMaxScaler`.

```
import numpy as np
import tensorflow as tf

from sklearn.datasets import fetch_mldata
```


If you are a Windows user, run the following lines:

```
# Replace USERNAME with the username of your machine
## Windows user
mnist = fetch_mldata('C:\\Users\\USERNAME\\Downloads\\MNIST original')
```

Otherwise, run this line (macOS path shown; adapt it to your system):
```
mnist = fetch_mldata('/Users/USERNAME/Downloads/MNIST original')
```

```
## Mac user
mnist = fetch_mldata('/Users/Thomas/Downloads/MNIST original')
print(mnist.data.shape)
print(mnist.target.shape)
```

```
(70000, 784)
(70000,)
```


You split the dataset into 80 percent for training and 20 percent for testing.

```
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(mnist.data,
                                                    mnist.target,
                                                    test_size=0.2,
                                                    random_state=42)
y_train = y_train.astype(int)
y_test = y_test.astype(int)
batch_size = len(X_train)

print(X_train.shape, y_train.shape, y_test.shape)
```

```
(56000, 784) (56000,) (14000,)
```


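Finally, you scale the data using the min/max scaler of scikit-learn. The scaler is fitted on the training set only and then applied to the test set, so no information from the test data leaks into the preprocessing.

```
# Rescale to [0, 1]
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
# Train: learn the min/max of each pixel and rescale
X_train_scaled = scaler.fit_transform(X_train.astype(np.float64))
# Test: reuse the training min/max (transform, not fit_transform)
X_test_scaled = scaler.transform(X_test.astype(np.float64))

feature_columns = [
      tf.feature_column.numeric_column('x', shape=X_train_scaled.shape[1:])]

X_train_scaled.shape[1:]
```

```
(784,)
```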

**Define the CNN**

A CNN applies filters to the raw pixels of an image to learn detailed local
patterns, whereas a traditional neural net learns global patterns. To construct a
CNN, you need to define:

1. A convolutional layer: Applies n filters to the feature
   map. After the convolution, you use a ReLU activation
   function to add non-linearity to the network.
2. Pooling layer: The next step after the convolution is to downsample
   the feature map. The purpose is to reduce the dimensionality of the
   feature map to prevent overfitting and improve the computation
   speed. Max pooling is the conventional technique, which divides the
   feature maps into subregions (usually with a 2x2 size) and keeps
   only the maximum values.
3. Fully connected layers: All neurons from the previous layers are
   connected to the next layers. The CNN classifies the label
   according to the features extracted by the convolutional layers and
   reduced by the pooling layers.

CNN architecture

- Convolutional Layer: Applies 14 5x5 filters (extracting 5x5-pixel
  subregions), with ReLU activation function
- Pooling Layer: Performs max pooling with a 2x2 filter and stride of
  2 (which specifies that pooled regions do not overlap)
- Convolutional Layer: Applies 36 5x5 filters, with ReLU activation
  function
- Pooling Layer 2: Again, performs max pooling with a 2x2 filter and
  stride of 2
- Dense Layer: 1,764 neurons (`7 * 7 * 36`), with a dropout regularization rate of 0.4 (probability
  of 0.4 that any given element will be dropped during training)
- Dense Layer (Logits Layer): 10 neurons, one for each digit target
  class (0–9).

There are three important functions in the `tf.layers` module to create a CNN:

- `conv2d()`. Constructs a two-dimensional convolutional layer with the
  number of filters, filter kernel size, padding, and activation
  function as arguments.
- `max_pooling2d()`. Constructs a two-dimensional pooling layer using
  the max-pooling algorithm.
- `dense()`. Constructs a dense (fully connected) layer with the given
  number of units.

You will define a function to build the CNN. Let's see in detail how to
construct each building block before wrapping everything together in the
function.

**Step 2**: Input layer

```
def cnn_model_fn(features, labels, mode):
    input_layer = tf.reshape(tensor = features["x"],shape =[-1, 28, 28, 1])
```

You need to define a tensor with the shape of the data. For that, you can use `tf.reshape`, which takes the
tensor to reshape and the target shape. The first argument is the
features of the data, which are passed as an argument of the function.
A picture has a height, a width, and a channel. The MNIST dataset
contains grayscale (single-channel) pictures of size 28x28. We set the batch size to -1 in
the shape argument so that TensorFlow infers it from the size of `features["x"]`.
This makes the batch size a hyperparameter you can tune. If the
batch size is set to 7, then the tensor will contain 5,488 values (`28 * 28 * 7`).
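`tf.reshape` follows the same `-1` convention as NumPy's `reshape` (one dimension is inferred so the total size stays constant), so the behavior can be checked without TensorFlow. This NumPy sketch uses a zero-filled array as a hypothetical stand-in for `features["x"]`.

```python
import numpy as np

batch_size = 7                            # illustrative batch size
flat_batch = np.zeros((batch_size, 784))  # 7 flattened 28x28 images, like features["x"]

# -1 lets NumPy infer the batch dimension from the total number of values
images = flat_batch.reshape(-1, 28, 28, 1)
print(images.shape)  # (7, 28, 28, 1)
print(images.size)   # 5488 = 28 * 28 * 7
```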

**Step 3**: Convolutional layer

```
# first Convolutional Layer
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=14,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
```

The first convolutional layer has 14 filters with a kernel size of 5x5
and `same` padding. `same` padding means that the output tensor and
input tensor have the same height and width: TensorFlow adds
zeros to the rows and columns to preserve the size.
You use the ReLU activation function. The output size will be
`[batch_size, 28, 28, 14]`.

**Step 4**: Pooling layer

The next step after the convolution is the pooling computation. The
pooling computation reduces the dimensionality of the data. You can
use `tf.layers.max_pooling2d` with a size of 2x2 and a stride of 2. You
use the previous layer as input. The output size will be
`[batch_size, 14, 14, 14]`.

```
# first Pooling Layer 
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
```

**Step 5**: Second Convolutional Layer and Pooling Layer

The second convolutional layer has 36 filters, with an output size of
`[batch_size, 14, 14, 36]`. The pooling layer has the same configuration as
before, and its output shape is `[batch_size, 7, 7, 36]`.

```
conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=36,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)
```

**Step 6**: Dense layer

Then, you need to define the fully connected layer. The feature map has
to be flattened before it is connected to the dense layer. You can use
`tf.reshape` with a size of `7 * 7 * 36`.
The dense layer has 1,764 neurons. You add a ReLU activation
function. Besides, you add a dropout regularization term with a rate of
0.4, meaning that 40 percent of the dense layer's outputs are randomly set
to 0. Note that the dropout takes place only during the training phase. The
function `cnn_model_fn` has an argument `mode` to declare whether the model
is being trained or evaluated.

```
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])

dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)
dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)
```
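The tensor shapes through the network can be traced with a few lines of plain Python. This sketch assumes the hyperparameters described above (5x5 kernels with `same` padding and stride 1, 2x2 max pooling with stride 2, 14 and 36 filters) and the default `valid` padding of `tf.layers.max_pooling2d`; the helper functions are hypothetical and the batch dimension is omitted.

```python
def same_conv(shape, filters):
    # padding="same" with stride 1 keeps height and width; depth becomes the filter count
    h, w, _ = shape
    return (h, w, filters)

def max_pool(shape, pool=2, stride=2):
    # "valid" pooling: (n - pool) // stride + 1 along each spatial dimension
    h, w, c = shape
    return ((h - pool) // stride + 1, (w - pool) // stride + 1, c)

shape = (28, 28, 1)           # input layer
shape = same_conv(shape, 14)  # conv1 -> (28, 28, 14)
shape = max_pool(shape)       # pool1 -> (14, 14, 14)
shape = same_conv(shape, 36)  # conv2 -> (14, 14, 36)
shape = max_pool(shape)       # pool2 -> (7, 7, 36)
flat_size = shape[0] * shape[1] * shape[2]
print(shape, flat_size)       # (7, 7, 36) 1764
```

This is where the `7 * 7 * 36` in the `tf.reshape` call comes from.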

**Step 7**: Logit Layer

Finally, you can define the last layer with the prediction of the model.
The output shape is `[batch_size, 10]`, where 10 is the total number of
classes (one per digit).

```
# Logits Layer
logits = tf.layers.dense(inputs=dropout, units=10)
```

You can create a dictionary containing the predicted classes and the probability
of each class. `tf.argmax()` returns the index of the highest value
of the logits layer, i.e., the predicted class. The softmax function returns
the probability of each class.

```
predictions = {
      # Generate predictions
      "classes": tf.argmax(input=logits, axis=1),
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }
```
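What the `predictions` dictionary holds can be illustrated in NumPy for a hypothetical batch of two logit vectors over the 10 digit classes: `argmax` along axis 1 picks the predicted class for each row, and a row-wise softmax turns each row into probabilities. The logit values are made up, and NumPy stands in for the TensorFlow ops.

```python
import numpy as np

logits = np.zeros((2, 10))  # hypothetical batch of 2 examples, 10 classes
logits[0, 3] = 5.0          # first example strongly favours digit 3
logits[1, 7] = 2.0          # second example favours digit 7

# Row-wise softmax (subtract the row max for numerical stability)
exps = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities = exps / exps.sum(axis=1, keepdims=True)

predictions = {
    "classes": np.argmax(logits, axis=1),  # same role as tf.argmax(input=logits, axis=1)
    "probabilities": probabilities,
}
print(predictions["classes"])  # [3 7]
```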

You only want to return the predictions dictionary when `mode` is set to
prediction. You add this code to return the predictions:

```
if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
```

The next step consists of computing the loss of the model. In the last
tutorial, you learned that the loss function for a multiclass model is
cross-entropy. The loss is easily computed with the following code:

```
# Calculate Loss (for both TRAIN and EVAL modes)
loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
```
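Sparse softmax cross-entropy takes integer labels (not one-hot vectors) and averages `-log(softmax(logits)[label])` over the batch. The NumPy sketch below illustrates that definition; it is not TensorFlow's implementation, and the labels and logits are hypothetical. It also shows why the first logged loss in the training run is close to 2.30: with 10 classes and near-uniform probabilities, the loss is about ln(10) ≈ 2.3026.

```python
import numpy as np

def sparse_softmax_cross_entropy(labels, logits):
    # Stable log-softmax: logits - logsumexp(logits), row by row
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick the log-probability of the true class for each example, then average
    return -log_probs[np.arange(len(labels)), labels].mean()

labels = np.array([3, 7])           # hypothetical integer labels
uniform_logits = np.zeros((2, 10))  # untrained model: every class equally likely
loss = sparse_softmax_cross_entropy(labels, uniform_logits)
print(loss)  # ~2.3026, i.e. ln(10)
```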

The final step is to optimize the model, that is, to find the best values
of the weights. For that, you use a gradient descent optimizer with a
learning rate of 0.001. The objective is to minimize the loss.

```
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
```
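`GradientDescentOptimizer` applies the plain update `w = w - learning_rate * gradient` at each step. The toy sketch below runs that update on a hypothetical one-parameter loss `(w - 3)**2`; the learning rate of 0.1 (instead of the 0.001 used above) is only there so the toy example converges in a few steps.

```python
def loss_grad(w):
    # Derivative of the toy loss (w - 3)**2
    return 2.0 * (w - 3.0)

w = 0.0
learning_rate = 0.1
for step in range(100):
    w = w - learning_rate * loss_grad(w)  # one gradient descent step
print(round(w, 4))  # 3.0
```

With a much smaller learning rate, each step moves the weights far less, which is one reason the loss decreases slowly during the short training run below.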

You are done with the CNN. However, you want to display the performance
metrics during the evaluation mode. The usual performance metric for a
multiclass model is accuracy. TensorFlow provides
`tf.metrics.accuracy`, which takes two arguments: the labels and the
predicted values.

```
eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
```
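`tf.metrics.accuracy` accumulates counts across evaluation batches, but the quantity it reports is the fraction of correct predictions. The NumPy sketch computes that fraction for hypothetical labels and predictions.

```python
import numpy as np

labels = np.array([0, 1, 2, 3, 4])     # hypothetical true digits
predicted = np.array([0, 1, 2, 8, 4])  # hypothetical predicted classes

# Accuracy = fraction of examples whose predicted class equals the label
accuracy = np.mean(predicted == labels)
print(accuracy)  # 0.8
```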

That's it. You created your first CNN and you are ready to wrap
everything into a function in order to use it to train and evaluate the
model.

```
def cnn_model_fn(features, labels, mode):
  """Model function for CNN."""
  # Input Layer
  input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

  # Convolutional Layer
  conv1 = tf.layers.conv2d(
      inputs=input_layer,
      filters=14,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)

  # Pooling Layer
  pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

  # Convolutional Layer #2 and Pooling Layer
  conv2 = tf.layers.conv2d(
      inputs=pool1,
      filters=36,
      kernel_size=[5, 5],
      padding="same",
      activation=tf.nn.relu)
  pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

  # Dense Layer
  pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])
  dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)
  dropout = tf.layers.dropout(
      inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

  # Logits Layer
  logits = tf.layers.dense(inputs=dropout, units=10)

  predictions = {
      # Generate predictions (for PREDICT and EVAL mode)
      "classes": tf.argmax(input=logits, axis=1),
      "probabilities": tf.nn.softmax(logits, name="softmax_tensor")
  }

  if mode == tf.estimator.ModeKeys.PREDICT:
    return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

  # Calculate Loss
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

  # Configure the Training Op (for TRAIN mode)
  if mode == tf.estimator.ModeKeys.TRAIN:
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
    train_op = optimizer.minimize(
        loss=loss,
        global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

  # Add evaluation metrics (for EVAL mode)
  eval_metric_ops = {
      "accuracy": tf.metrics.accuracy(
          labels=labels, predictions=predictions["classes"])}
  return tf.estimator.EstimatorSpec(
      mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)
```


The steps below are the same as in the previous tutorials.
First of all, you define an estimator with the CNN model.

```
# Create the Estimator
mnist_classifier = tf.estimator.Estimator(
    model_fn=cnn_model_fn, model_dir="train/mnist_convnet_model")
```

```
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': 'train/mnist_convnet_model', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x114288e48>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
```


A CNN takes a long time to train, so you create a logging hook to
log the values of the softmax layer every 50 iterations.

```
# Set up logging for predictions
tensors_to_log = {"probabilities": "softmax_tensor"}
logging_hook = tf.train.LoggingTensorHook(
      tensors=tensors_to_log, every_n_iter=50)
```

You are ready to train the model. You set a batch size of 100 and
shuffle the data. Note that the code below runs only 500 training steps; a
full training of 16,000 steps can take a lot of time. Be patient.

```
# Train the model
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_train_scaled},
    y=y_train,
    batch_size=100,
    num_epochs=None,
    shuffle=True)
mnist_classifier.train(
    input_fn=train_input_fn,
    steps=500,
    hooks=[logging_hook])
```


```
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into train/mnist_convnet_model/model.ckpt.
INFO:tensorflow:probabilities = [[0.09477627 0.12005997 0.10586331 0.09869653 0.10132864 0.09234226
  0.10487132 0.08388694 0.09745449 0.10072027]
 ...
INFO:tensorflow:loss = 2.297851085662842, step = 1
INFO:tensorflow:global_step/sec: 2.62982
INFO:tensorflow:loss = 2.305206298828125, step = 101 (38.026 sec)
INFO:tensorflow:global_step/sec: 2.49772
INFO:tensorflow:loss = 2.266510486602783, step = 201 (40.037 sec)
INFO:tensorflow:global_step/sec: 2.74781
INFO:tensorflow:loss = 2.254169464111328, step = 301 (36.393 sec)
INFO:tensorflow:global_step/sec: 3.05531
INFO:tensorflow:loss = 2.234375, step = 401 (32.738 sec)
INFO:tensorflow:Saving checkpoints for 500 into train/mnist_convnet_model/model.ckpt.
INFO:tensorflow:Loss for final step: 2.2088098526000977.
```

```
<tensorflow.python.estimator.estimator.Estimator at 0x11427ccc0>
```

Now that the model is trained, you can evaluate it and print the results:

```
# Evaluate the model and print results
eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": X_test_scaled},
    y=y_test,
    num_epochs=1,
    shuffle=False)
eval_results = mnist_classifier.evaluate(input_fn=eval_input_fn)
print(eval_results)
```


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-09-04-12:46:51
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from train/mnist_convnet_model/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-09-04-12:47:05
INFO:tensorflow:Saving dict for global step 500: accuracy = 0.594, global_step = 500, loss = 2.1915197
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 500: train/mnist_convnet_model/model.ckpt-500
{'accuracy': 0.594, 'loss': 2.1915197, 'global_step': 500}


The evaluation above stops after only 500 training steps and reaches an accuracy of 59.4%. With the current architecture trained for many more steps, you get an accuracy of about 97%. You can change the architecture, the batch size and the number of iterations to improve the accuracy. The CNN performs better than the ANN and the logistic regression: in the tutorial on artificial neural networks, you had an accuracy of 96%, which is lower than the CNN's. CNNs are especially effective on larger images, both in terms of computation speed and accuracy.
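The `accuracy` reported by `evaluate` is the share of test images whose predicted digit matches the label. Below is a minimal NumPy sketch of that computation, assuming (as in the standard TensorFlow MNIST estimator example) that the predicted digit is the argmax of the logits; the `accuracy` helper, the logits and the labels are made up for illustration:

```python
import numpy as np


def accuracy(logits, labels):
    """Share of examples whose highest-scoring class equals the true label."""
    predictions = np.argmax(logits, axis=1)
    return float(np.mean(predictions == labels))


# Made-up logits for 4 test images: one-hot rows predicting digits 3, 7, 1 and 1.
logits = np.eye(10)[[3, 7, 1, 1]]
labels = np.array([3, 7, 1, 0])  # the last image is actually a 0

print(accuracy(logits, labels))  # 0.75
```

Three of the four predictions are correct, hence 0.75; an accuracy of 0.594 means that about 59% of the test images were classified correctly.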

## Summary

A convolutional neural network works very well for classifying pictures.
This type of architecture is dominant for recognizing objects in pictures
or video.
To build a CNN, you need to follow six steps:

**Step 1**: Input layer

This step reshapes the data. For a square image, the height and the width
are each equal to the square root of the number of pixels. For instance,
an MNIST picture has 784 pixels, so its shape is 28x28. You also need to
specify the number of channels: 3 for a color (RGB) picture, 1 for a
grayscale one.

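```
# Reshape the flat pixel vectors to [batch_size, height, width, channels];
# -1 lets TensorFlow infer the batch size.
input_layer = tf.reshape(tensor=features["x"], shape=[-1, 28, 28, 1])
```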
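A NumPy sketch of the same reshape on a made-up batch of flattened 28x28 grayscale images. `np.reshape` uses the same row-major layout as `tf.reshape`, and `-1` lets the batch size be inferred:

```python
import math

import numpy as np

# Made-up batch of 5 flattened grayscale images (784 pixels each).
flat_images = np.random.rand(5, 784).astype(np.float32)

# For a square image, height = width = sqrt(number of pixels).
side = math.isqrt(flat_images.shape[1])  # 28
assert side * side == flat_images.shape[1], "images must be square"

channels = 1  # 1 for grayscale, 3 for RGB

# Same layout as tf.reshape(..., [-1, 28, 28, 1]).
input_layer = flat_images.reshape(-1, side, side, channels)
print(input_layer.shape)  # (5, 28, 28, 1)
```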

**Step 2**: Convolutional layer

Next, you need to create the convolutional layers. You apply different
filters so that the network can learn important features. You specify the
size of the kernel and the number of filters.

```
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=14,          # number of filters (feature maps)
    kernel_size=[5, 5],  # size of each filter
    padding="same",      # keep the 28x28 height and width
    activation=tf.nn.relu)
```
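The convolution itself can be sketched in NumPy. This is a naive illustration, not how `tf.layers.conv2d` is implemented: it handles a single grayscale channel, omits the bias that `tf.layers.conv2d` adds by default, and, like TensorFlow, computes a cross-correlation (the kernel is not flipped). The `conv2d_same` name is hypothetical. With `"same"` padding and stride 1, the output keeps the input's height and width:

```python
import numpy as np


def conv2d_same(image, kernels):
    """Naive 'same'-padded 2D convolution (stride 1) followed by ReLU.

    image:   (height, width) array, one grayscale channel
    kernels: (num_filters, k, k) array with an odd kernel size k
    returns: (height, width, num_filters) feature maps
    """
    num_filters, k, _ = kernels.shape
    pad = k // 2
    padded = np.pad(image, pad)  # zero padding keeps the output at height x width
    h, w = image.shape
    out = np.zeros((h, w, num_filters))
    for f in range(num_filters):
        for i in range(h):
            for j in range(w):
                window = padded[i:i + k, j:j + k]
                out[i, j, f] = np.sum(window * kernels[f])
    return np.maximum(out, 0.0)  # ReLU activation


rng = np.random.default_rng(0)
image = rng.random((28, 28))
kernels = rng.standard_normal((14, 5, 5))  # 14 filters of size 5x5, as in conv1
feature_maps = conv2d_same(image, kernels)
print(feature_maps.shape)  # (28, 28, 14)
```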

**Step 3**: Pooling layer

In the third step, you add a pooling layer. This layer decreases the
size of the input by taking the maximum value of each sub-matrix. For
instance, if the sub-matrix is [3,1,3,2], the pooling returns the
maximum, which is 3.

```
# 2x2 max pooling with stride 2 halves the height and width (28x28 -> 14x14)
pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)
```
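A NumPy sketch of 2x2 max pooling with stride 2, assuming the height and width are even. The `max_pool_2x2` name is illustrative. Laying out the [3,1,3,2] sub-matrix from the text as a 2x2 block reproduces the result of 3:

```python
import numpy as np


def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a (height, width, channels) array."""
    h, w, c = feature_map.shape
    # Group pixels into non-overlapping 2x2 blocks, then keep each block's max.
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2, c)
    return blocks.max(axis=(1, 3))


block = np.array([[3, 1], [3, 2]], dtype=float).reshape(2, 2, 1)
print(max_pool_2x2(block)[0, 0, 0])  # 3.0

conv1_like = np.random.rand(28, 28, 14)
print(max_pool_2x2(conv1_like).shape)  # (14, 14, 14)
```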

**Step 4**: Add Convolutional Layer and Pooling Layer

In this step, you can add as many convolutional and pooling layers as you
want. Google uses architectures with more than 20 convolutional layers.
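The summary does not show the second convolutional and pooling layers, but the `7 * 7 * 36` in Step 5 implies their output shape. The plain-Python bookkeeping below assumes a second convolutional layer with 36 filters and `"same"` padding, followed by the same 2x2 pooling as Step 3 (whose default padding is `"valid"`), and shows where 7 x 7 x 36 = 1764 comes from. The helper names are illustrative:

```python
def conv_same_shape(shape, filters):
    """'same' padding with stride 1 keeps height and width; depth becomes `filters`."""
    h, w, _ = shape
    return (h, w, filters)


def pool_shape(shape, pool=2, stride=2):
    """Output shape of a 'valid' max pooling layer."""
    h, w, c = shape
    return ((h - pool) // stride + 1, (w - pool) // stride + 1, c)


shape = (28, 28, 1)                 # input layer
shape = conv_same_shape(shape, 14)  # conv1 -> (28, 28, 14)
shape = pool_shape(shape)           # pool1 -> (14, 14, 14)
shape = conv_same_shape(shape, 36)  # conv2 (assumed 36 filters) -> (14, 14, 36)
shape = pool_shape(shape)           # pool2 -> (7, 7, 36)
flat_size = shape[0] * shape[1] * shape[2]
print(shape, flat_size)  # (7, 7, 36) 1764
```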

**Step 5**: Dense layer

Step 5 flattens the output of the previous layer to create a fully
connected (dense) layer. In this step, you can use different activation
functions and add dropout.

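```
# Flatten pool2 (7x7 pixels, 36 feature maps) into one vector per image
pool2_flat = tf.reshape(pool2, [-1, 7 * 7 * 36])

# Fully connected layer with ReLU activation
dense = tf.layers.dense(inputs=pool2_flat, units=7 * 7 * 36, activation=tf.nn.relu)

# Drop 30% of the units, only while training
dropout = tf.layers.dropout(
    inputs=dense, rate=0.3, training=mode == tf.estimator.ModeKeys.TRAIN)
```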
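`tf.layers.dropout` uses inverted dropout: during training it zeroes a fraction `rate` of the units and scales the kept ones by 1 / (1 - rate), and outside training it returns the input unchanged. A NumPy sketch of that behavior (the `dropout` function here is illustrative, not TensorFlow's code):

```python
import numpy as np


def dropout(x, rate, training, rng=None):
    """Inverted dropout: zero a fraction `rate` of units during training and
    scale the survivors by 1 / (1 - rate); pass inputs through otherwise."""
    if not training:
        return x
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= rate
    return np.where(keep, x / (1.0 - rate), 0.0)


dense = np.ones((2, 1764))  # made-up dense-layer activations
train_out = dropout(dense, rate=0.3, training=True, rng=np.random.default_rng(0))
eval_out = dropout(dense, rate=0.3, training=False)
print((train_out == 0).mean())  # roughly 0.3
```

Scaling the kept units keeps the expected activation the same in training and evaluation, so no rescaling is needed at prediction time.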

**Step 6**: Logit layer

The final step is the prediction: a dense layer with one output (logit)
per class, that is, 10 units for the 10 digits.

```
logits = tf.layers.dense(inputs=dropout, units=10)
```
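Logits are typically turned into class probabilities with a softmax, and the predicted digit is the class with the highest probability. A NumPy sketch with made-up logits; a row of equal logits gives a probability of 0.1 for every digit, close to the values the training log prints early in training:

```python
import numpy as np


def softmax(logits):
    """Turn logits into class probabilities (numerically stable)."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Made-up logits for 2 images and 10 digit classes.
logits = np.array([
    [0.1, 0.0, 2.5, 0.3, 0.0, 0.2, 0.1, 0.0, 0.4, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
])
probabilities = softmax(logits)
predicted_digits = probabilities.argmax(axis=1)
print(predicted_digits)  # [2 0]
```

For the second image all probabilities are equal, and `argmax` returns the first index, 0.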