# Train your first neural network: Image classification for retail products

You will implement a model that classifies images of retail products, assigning each image to one of 10 classes such as dress, coat, or shirt.

Import the necessary libraries and print the TensorFlow version:

In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras

# Helper libraries
import numpy as np
import matplotlib.pyplot as plt

print(tf.__version__)

## Import the Fashion MNIST dataset

We will use the [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist) dataset, which contains 70,000 grayscale images in 10 categories. We will use 60,000 images to train the network and 10,000 images to evaluate how accurately the network learned to classify images. The following are some examples of the images (each image is 28 x 28 pixels):

<table>
  <tr><td>
    <img src="https://tensorflow.org/images/fashion-mnist-sprite.png"
         alt="Fashion MNIST sprite"  width="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> <a href="https://github.com/zalandoresearch/fashion-mnist">Fashion-MNIST samples</a> (by Zalando, MIT License).<br/>&nbsp;
  </td></tr>
</table>

Import and load the data from TensorFlow (the data is already split into training and test sets):

In [None]:
fashion_mnist = keras.datasets.fashion_mnist

(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

Each *image* is a 28 x 28 array of pixel values (28x28=784 values per image), each ranging from 0 to 255. Each value corresponds to a shade of gray:

<table>
  <tr><td>
    <img src="http://what-when-how.com/wp-content/uploads/2012/07/tmp26dc25_thumb2.png"
         alt="Shades of grey" width="400">
  </td></tr>
  <tr><td align="center">
    <b>Figure 2.</b> 256 (not 50) shades of grey.<br/>&nbsp;
  </td></tr>
</table>
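As a minimal sketch of this representation, the snippet below builds a synthetic 28 x 28 array of `uint8` values and shows that flattening it yields 784 values between 0 and 255. The random data and the name `fake_image` are illustrative assumptions standing in for one dataset image.

```python
import numpy as np

# Synthetic stand-in for one grayscale image (hypothetical data, not from the dataset).
rng = np.random.default_rng(seed=0)
fake_image = rng.integers(0, 256, size=(28, 28)).astype(np.uint8)

print(fake_image.shape)               # (28, 28)
print(fake_image.dtype)               # uint8
print(fake_image.reshape(-1).shape)   # (784,)
print(fake_image.min(), fake_image.max())  # both lie in the range 0..255
```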

The *labels* are integers, ranging from 0 to 9. These correspond to the *class* of clothing the image represents:

<table>
  <tr>
    <th>Label</th>
    <th>Class</th> 
  </tr>
  <tr>
    <td>0</td>
    <td>T-shirt/top</td> 
  </tr>
  <tr>
    <td>1</td>
    <td>Trouser</td> 
  </tr>
    <tr>
    <td>2</td>
    <td>Pullover</td> 
  </tr>
    <tr>
    <td>3</td>
    <td>Dress</td> 
  </tr>
    <tr>
    <td>4</td>
    <td>Coat</td> 
  </tr>
    <tr>
    <td>5</td>
    <td>Sandal</td> 
  </tr>
    <tr>
    <td>6</td>
    <td>Shirt</td> 
  </tr>
    <tr>
    <td>7</td>
    <td>Sneaker</td> 
  </tr>
    <tr>
    <td>8</td>
    <td>Bag</td> 
  </tr>
    <tr>
    <td>9</td>
    <td>Ankle boot</td> 
  </tr>
</table>

Assign *class names* to the labels (we will use the class names later when plotting the images):

In [None]:
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
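
Because each label is an integer from 0 to 9, it can be used directly as an index into `class_names`. The sketch below illustrates the lookup with a few made-up labels (`example_labels` is a hypothetical stand-in for entries of `train_labels`):

```python
import numpy as np

class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Hypothetical labels, standing in for a few entries of train_labels.
example_labels = np.array([9, 0, 3])

# Each integer label indexes the list of class names.
example_names = [class_names[label] for label in example_labels]
print(example_names)  # ['Ankle boot', 'T-shirt/top', 'Dress']
```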

## Explore the data

Explore the format of the **training** dataset (60,000 images, each represented as 28 x 28 pixels):

In [None]:
train_images.shape

Explore the labels of the **training** dataset (each label is an integer between 0 and 9):

In [None]:
train_labels

Explore the format of the **test** dataset (10,000 images, each represented as 28 x 28 pixels):

In [None]:
test_images.shape

Explore the labels of the **test** dataset (each label is an integer between 0 and 9):

In [None]:
test_labels
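
Beyond inspecting the raw label arrays, it can help to count how many examples fall into each class. The following sketch does this with `np.bincount` on a small made-up label array (`example_labels` is hypothetical); the same call could be applied to `train_labels` or `test_labels`.

```python
import numpy as np

# Hypothetical labels; in the notebook you could pass train_labels instead.
example_labels = np.array([9, 0, 0, 3, 0, 2, 7, 2, 5, 5])

# counts[k] is the number of examples whose label is k (minlength covers all 10 classes).
counts = np.bincount(example_labels, minlength=10)
print(counts)  # [3 0 2 1 0 2 0 1 0 1]
```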

## Preprocess the data

See what an image looks like (28 by 28 pixels, each pixel having a value between 0 and 255):

In [None]:
plt.figure()
plt.imshow(train_images[0],cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)

Scale the pixel values to a range of 0 to 1 for both the **training** and **test** datasets (this is how the neural network model expects the input):

In [None]:
train_images = train_images / 255.0

test_images = test_images / 255.0

Confirm that the pixel values are now between 0 and 1:

In [None]:
plt.figure()
plt.imshow(train_images[0],cmap=plt.cm.binary)
plt.colorbar()
plt.grid(False)
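
The colorbar gives a visual confirmation; a numeric check is a useful complement. The sketch below applies the same division to a synthetic `uint8` array (`raw` is made-up data standing in for the image arrays) and inspects the resulting type and range.

```python
import numpy as np

# Synthetic stand-in for a batch of two 28 x 28 grayscale images (hypothetical data).
rng = np.random.default_rng(seed=0)
raw = rng.integers(0, 256, size=(2, 28, 28)).astype(np.uint8)

# True division by a float promotes the uint8 values to float64.
scaled = raw / 255.0

print(scaled.dtype)                # float64
print(scaled.min(), scaled.max())  # both lie within [0.0, 1.0]
```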

Display the first 25 images from the *training set* with the class name below each image and verify that the data looks good:

In [None]:
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[train_labels[i]])

## Build the model

Building the neural network requires (1) setting up the layers of the model, then (2) compiling the model.

### Set up the layers

The basic building block of a neural network is the *layer*. Layers extract representations from the data. Create three layers:

In [None]:
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)), # DO NOT CHANGE ANYTHING IN THIS LAYER
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax) # DO NOT CHANGE ANYTHING IN THIS LAYER
])

model.summary()
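
The parameter counts that `model.summary()` reports can be reproduced by hand. The sketch below assumes the layer sizes shown above (784 inputs after flattening, 128 hidden units, 10 outputs); the variable names are illustrative.

```python
# Layer sizes taken from the model defined above.
n_inputs = 28 * 28   # Flatten turns each 28 x 28 image into 784 values (no parameters)
n_hidden = 128
n_outputs = 10

# A Dense layer has one weight per (input, unit) pair plus one bias per unit.
hidden_params = n_inputs * n_hidden + n_hidden     # 100480
output_params = n_hidden * n_outputs + n_outputs   # 1290
total_params = hidden_params + output_params       # 101770

print(hidden_params, output_params, total_params)
```

Adding or widening a layer changes these counts, which is one way to compare the size of different architectures.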


### Compile the model

Define the loss function, optimizer, and performance metric using the model's *compile* step:

In [None]:
model.compile(loss='sparse_categorical_crossentropy',
              optimizer='adam',  # string alias; tf.train.AdamOptimizer() is not available in TensorFlow 2.x
              metrics=['accuracy'])  # the fraction of the images that are correctly classified
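
To make the loss less abstract, here is a minimal NumPy sketch of what sparse categorical cross-entropy measures: the negative log of the probability the model assigns to the true class, averaged over examples. The probabilities and labels are made up, and Keras's own implementation may differ in details such as numerical safeguards.

```python
import numpy as np

# Hypothetical softmax outputs for 2 images over 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 1])  # integer ("sparse") labels, not one-hot vectors

# Pick the probability assigned to the true class of each example.
true_class_probs = probs[np.arange(len(labels)), labels]  # [0.7, 0.3]

# The loss is the negative log of that probability, averaged over examples.
per_example_loss = -np.log(true_class_probs)
mean_loss = per_example_loss.mean()

print(per_example_loss, mean_loss)
```

A confident correct prediction (0.7) contributes a small loss, while a low probability on the true class (0.3) contributes a larger one.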

## Train the model

Train the model for 5 epochs, in batches of 100 images, using the images and their labels in the training set:

In [None]:
model.fit(train_images, train_labels, batch_size=100, epochs=5)
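
As a quick piece of arithmetic, the batch size and epoch count determine how many weight updates the training run performs. The sketch below assumes the 60,000 training images and the settings used in the call above.

```python
import math

n_train = 60000    # number of training images
batch_size = 100
epochs = 5

# One weight update per batch; the last batch may be smaller if the sizes do not divide evenly.
steps_per_epoch = math.ceil(n_train / batch_size)   # 600
total_updates = steps_per_epoch * epochs            # 3000

print(steps_per_epoch, total_updates)
```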

In [None]:
model.get_config()

As the model trains, the loss and accuracy metrics are displayed. This model reaches an accuracy of about 0.88 (or 88%) on the training data.

## Evaluate the model

Evaluate the accuracy of the model on the test dataset:

In [None]:
test_loss, test_acc = model.evaluate(test_images, test_labels)

print('Test accuracy:', test_acc)
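
The accuracy metric is simply the fraction of images whose most probable class matches the true label. A minimal sketch with made-up values (`example_predictions` and `example_true` are hypothetical, not model output):

```python
import numpy as np

# Hypothetical predicted probabilities for 4 images over 3 classes.
example_predictions = np.array([[0.8, 0.1, 0.1],
                                [0.2, 0.5, 0.3],
                                [0.3, 0.3, 0.4],
                                [0.6, 0.3, 0.1]])
example_true = np.array([0, 1, 1, 0])

# The predicted class is the index of the largest probability in each row.
predicted = np.argmax(example_predictions, axis=1)   # [0, 1, 2, 0]

# Accuracy is the mean of the element-wise "is correct" mask.
accuracy = np.mean(predicted == example_true)
print(accuracy)  # 0.75
```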

## Make predictions

Store the predictions for all images in the test dataset:

In [None]:
predictions = model.predict(test_images)

Display the probabilities in the prediction of the first image:

In [None]:
predictions[0]  # try index 222 to see an image that the model may misclassify
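
The 10 values are probabilities because the output layer uses a softmax activation, which maps raw scores to non-negative values that sum to 1. The following is a minimal NumPy sketch of that function applied to made-up scores; it is an illustration, not the code TensorFlow runs.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # avoid overflow in exp
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

example_logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw scores for 3 classes
example_probs = softmax(example_logits)

print(example_probs)        # largest score gets the largest probability
print(example_probs.sum())  # sums to 1 (up to floating-point rounding)
```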

Display the label that is predicted with the highest probability:

In [None]:
np.argmax(predictions[0])

Confirm whether the prediction is correct by checking the actual label of the first image:

In [None]:
test_labels[0]
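
The same comparison can be done for many images at once to locate misclassified examples. The sketch below uses made-up arrays (`example_predictions` and `example_true` are hypothetical); in the notebook, `predictions` and `test_labels` could take their place.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 images over 3 classes.
example_predictions = np.array([[0.9, 0.05, 0.05],
                                [0.2, 0.7, 0.1],
                                [0.6, 0.3, 0.1]])
example_true = np.array([0, 0, 0])

predicted = np.argmax(example_predictions, axis=1)   # [0, 1, 0]

# Indices where the predicted label differs from the true label.
wrong_indices = np.flatnonzero(predicted != example_true)
print(wrong_indices)  # [1]
```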

Define two helper functions (we will use the functions later when plotting the results):

In [None]:
def plot_image(i, predictions_array, true_label, img):
  predictions_array, true_label, img = predictions_array[i], true_label[i], img[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
  
  plt.imshow(img, cmap=plt.cm.binary)

  predicted_label = np.argmax(predictions_array)
  if predicted_label == true_label:
    color = 'blue'
  else:
    color = 'red'
  
  plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                100*np.max(predictions_array),
                                class_names[true_label]),
                                color=color)

def plot_value_array(i, predictions_array, true_label):
  predictions_array, true_label = predictions_array[i], true_label[i]
  plt.grid(False)
  plt.xticks([])
  plt.yticks([])
  thisplot = plt.bar(range(10), predictions_array, color="#777777")
  plt.ylim([0, 1]) 
  predicted_label = np.argmax(predictions_array)
 
  thisplot[predicted_label].set_color('red')
  thisplot[true_label].set_color('blue')

Display the 1st image, its predicted label with its probability, and its true label:

In [None]:
i = 0
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
#plt.subplot(1,2,2)
#plot_value_array(i, predictions,  test_labels)

Display the plot of all predicted probabilities for the 1st image:

In [None]:
plot_value_array(0, model.predict(np.expand_dims(test_images[0],0)), test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)

Display the 223rd image, its predicted label with the probability and its true label:

In [None]:
i = 222
plt.figure(figsize=(6,3))
plt.subplot(1,2,1)
plot_image(i, predictions, test_labels, test_images)
#plt.subplot(1,2,2)
#plot_value_array(i, predictions,  test_labels)

Display the plot of all predicted probabilities for the 223rd image:

In [None]:
# Index 222 selects both the stored prediction and the matching true label
plot_value_array(222, predictions, test_labels)
_ = plt.xticks(range(10), class_names, rotation=45)

Plot several images with their predictions (correct prediction labels are blue and incorrect ones are red):

In [None]:
# Plot the first X test images, their predicted label, and the true label
# Color correct predictions in blue, incorrect predictions in red
num_rows = 6
num_cols = 4
num_images = num_rows*num_cols
plt.figure(figsize=(2*2*num_cols, 2*num_rows))
for i in range(num_images):
  plt.subplot(num_rows, 2*num_cols, 2*i+1)
  plot_image(i, predictions, test_labels, test_images)
  plt.subplot(num_rows, 2*num_cols, 2*i+2)
  plot_value_array(i, predictions, test_labels)


# What to do now?

You will test different neural network structures: for each one, train and evaluate the model and record the **Test accuracy**, which is reported in the **"Evaluate the model"** step. Vary (1) the number and type of network layers and (2) the number of neurons in each layer. Create and test at least 4 combinations, and report your results in a table.

For example, suppose you make the following change in the step where you set up the layers:

From...
```
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

To...
```
model = keras.Sequential([
    keras.layers.Flatten(input_shape=(28, 28)),
    keras.layers.Dense(128, activation=tf.nn.relu),
    keras.layers.Dense(64, activation=tf.nn.sigmoid),
    keras.layers.Dense(10, activation=tf.nn.softmax)
])
```

This adds one more hidden layer to the model, with 64 neurons and a sigmoid activation.
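
One possible way to keep track of your experiments is sketched below. The `experiments` list and the `format_results` helper are hypothetical names introduced here for illustration; the accuracy entries are deliberately left empty so that you fill them in from your own runs.

```python
# Hypothetical record of experiments; set test_acc to the value you measure after each run.
experiments = [
    {"hidden_layers": "Dense(128, relu)", "test_acc": None},
    {"hidden_layers": "Dense(128, relu) + Dense(64, sigmoid)", "test_acc": None},
]

def format_results(rows):
    """Return a Markdown table of architectures and their test accuracy."""
    lines = ["| Hidden layers | Test accuracy |", "| --- | --- |"]
    for row in rows:
        if row["test_acc"] is None:
            acc = "not run yet"
        else:
            acc = "{:.4f}".format(row["test_acc"])
        lines.append("| {} | {} |".format(row["hidden_layers"], acc))
    return "\n".join(lines)

print(format_results(experiments))
```

Pasting the printed text into a Markdown cell renders it as a table.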


---

# Copyright 2018 The TensorFlow Authors.

In [None]:
#@title Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

In [None]:
#@title MIT License
#
# Copyright (c) 2017 François Chollet
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to use, copy, modify, merge, publish, distribute, sublicense,
# and/or sell copies of the Software, and to permit persons to whom the
# Software is furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
# DEALINGS IN THE SOFTWARE.