## Data
In this notebook, we go through several ways to load image data in TensorFlow for training, evaluation, and prediction.
We use the MNIST database of handwritten digits, one of the most widely used datasets in machine learning research. It is a collection of images commonly used to train machine learning and computer vision algorithms. MNIST contains 70,000 grayscale images of handwritten digits from 0 to 9, each 28×28 pixels: 60,000 for training and 10,000 for testing.
![resources/MnistExamples.png](resources/MnistExamples.png)
<sub>Source: https://en.wikipedia.org/wiki/MNIST_database</sub>

tf.keras provides built-in methods to download and load several well-known datasets, including MNIST. We will then learn how to use tf.data.Dataset to pack images and labels together. tf.data.Dataset is the recommended way to build input pipelines, including streaming training data from disk. Datasets are iterables (not iterators) and, in eager mode, work just like other Python iterables.
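
To make the iterable/iterator distinction concrete, here is a minimal sketch. It assumes TensorFlow 2.x with eager execution enabled, and the feature values and labels are toy stand-ins, not MNIST data.

```python
import tensorflow as tf

# Toy stand-ins for images and labels (hypothetical values, not MNIST).
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
labels = [0, 1, 0]

# from_tensor_slices pairs the i-th feature row with the i-th label.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# A Dataset is an iterable: every `for` loop starts again from the first element.
first_pass = [int(label) for _, label in dataset]
second_pass = [int(label) for _, label in dataset]

# iter() returns an iterator, which is consumed as it is advanced.
iterator = iter(dataset)
feature, label = next(iterator)
remaining = [int(lbl) for _, lbl in iterator]

print(first_pass, second_pass, remaining)
```

Because the dataset itself is never exhausted, it can be looped over once per epoch without being rebuilt.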

In [None]:
import tensorflow as tf
import numpy as np
import utils
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
print(tf.__version__)
%matplotlib inline

In [None]:
class_names = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

<font size="5">The first method </font>: Use tensorflow.keras.datasets to load the data as NumPy arrays, which can be used directly for training, evaluation, and prediction.

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
print(x_train.shape, ' ', y_train.shape)
print(x_test.shape, ' ', y_test.shape)

In [None]:
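# Convert to float32 and add a trailing channel axis: (N, 28, 28) -> (N, 28, 28, 1)
x_train = x_train.astype('float32').reshape((-1, 28, 28, 1))
x_test = x_test.astype('float32').reshape((-1, 28, 28, 1))
# Scale pixel values from [0, 255] to [0, 1]
x_train /= 255.0
x_test /= 255.0
# Pair each image with its label in a tf.data.Dataset
trainDataset = tf.data.Dataset.from_tensor_slices((x_train, y_train))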
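
The preprocessing in the cell above can be checked in isolation on synthetic data. The sketch below is one possible way to do that; `preprocess_images` is a hypothetical helper name that does not appear in the notebook, and the random array only stands in for MNIST.

```python
import numpy as np

def preprocess_images(images):
    """Convert uint8 images of shape (N, 28, 28) to float32 in [0, 1] with a channel axis."""
    images = images.astype("float32") / 255.0
    return images.reshape((-1, 28, 28, 1))

# Synthetic stand-in for MNIST: 4 random 28x28 uint8 images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

batch = preprocess_images(fake_images)
print(batch.shape, batch.dtype)
```

The trailing axis of size 1 is the channel dimension that convolutional layers in Keras expect for grayscale input.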

In [None]:
utils.show_images(trainDataset, class_names)
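
Before training, a dataset like this one is typically shuffled and grouped into batches. The following is a rough sketch of that step on synthetic arrays; the buffer size and batch size are arbitrary illustrative values, and `tf.data.AUTOTUNE` assumes TensorFlow 2.4 or later.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for the preprocessed MNIST arrays (10 blank images).
images = np.zeros((10, 28, 28, 1), dtype="float32")
labels = np.arange(10, dtype="int64")

dataset = tf.data.Dataset.from_tensor_slices((images, labels))

# Shuffle over the whole (tiny) dataset, group into batches of 4,
# and let tf.data prepare the next batch while the current one is in use.
batched = dataset.shuffle(buffer_size=10).batch(4).prefetch(tf.data.AUTOTUNE)

batch_shapes = [x.shape.as_list() for x, _ in batched]
print(batch_shapes)
```

With 10 elements and a batch size of 4, the last batch is smaller than the others; passing drop_remainder=True to batch would discard it instead.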

<font size="5">The second method </font>: Build the dataset from file paths when you want to use your own pictures for a machine learning task.

In [None]:
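import os

filenames = []
labels = []
class_names = ['bird','cat','dog']

# Sort the listing so the file order is reproducible across runs
for filename in sorted(os.listdir("./data")):
    # The label is the index in class_names of the prefix of the file name
    if filename.startswith("bird"):
        label = 0
    elif filename.startswith("cat"):
        label = 1
    elif filename.startswith("dog"):
        label = 2
    else:
        continue  # skip files that match none of the classes
    filenames.append(os.path.join("./data", filename))
    labels.append(label)
print(filenames)
print(labels)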
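
The prefix-based labeling above can also be written as a small lookup that scales to more classes. This is only a sketch: `label_from_filename` is a hypothetical helper, and the example paths are made up for illustration rather than taken from ./data.

```python
import os

class_names = ['bird', 'cat', 'dog']

def label_from_filename(path, class_names):
    """Return the index of the class whose name prefixes the file name, or None."""
    name = os.path.basename(path).lower()
    for index, class_name in enumerate(class_names):
        if name.startswith(class_name):
            return index
    return None

# Hypothetical file names used only for illustration.
examples = ['./data/bird_01.jpg', './data/cat_07.jpg', './data/dog_03.jpg', './data/notes.txt']
example_labels = [label_from_filename(p, class_names) for p in examples]
print(example_labels)
```

Returning None for unmatched files lets the caller decide whether to skip them or raise an error, instead of silently assigning them to the last class.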

In [None]:
path_dataset = tf.data.Dataset.from_tensor_slices((filenames, labels))

In [None]:
# Each element is a (file path, label) pair of tensors
for path, label in path_dataset:
    print(path.numpy().decode(), label.numpy())

In [None]:
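def load_and_preprocess_image(filename, label):
    # Read the raw bytes of the file
    raw_image = tf.io.read_file(filename)
    # Decode to a uint8 tensor; channels=3 forces RGB output even for grayscale JPEGs
    image_tensor = tf.image.decode_jpeg(raw_image, channels=3)
    # Resize to a fixed size (returns float32), then scale to [0, 1]
    image_tensor = tf.image.resize(image_tensor, [224, 224])
    image_tensor /= 255.0
    return image_tensor, label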

In [None]:
trainDataset = path_dataset.map(load_and_preprocess_image)
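
The whole path-to-tensor pipeline can be tried end to end without the ./data folder. The sketch below writes two synthetic JPEG files to a temporary directory and maps a loader over them; the file names, image sizes, and the `load_image` helper are assumptions made for illustration, and `tf.data.AUTOTUNE` assumes TensorFlow 2.4 or later.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write two small synthetic JPEG files so the sketch does not depend on ./data.
tmp_dir = tempfile.mkdtemp()
paths, labels = [], []
for index in range(2):
    pixels = np.full((32, 48, 3), 100 * index, dtype=np.uint8)
    path = os.path.join(tmp_dir, f"sample_{index}.jpg")
    tf.io.write_file(path, tf.io.encode_jpeg(pixels))
    paths.append(path)
    labels.append(index)

def load_image(filename, label):
    # Read, decode as RGB, resize to a fixed size, and scale to [0, 1].
    image = tf.image.decode_jpeg(tf.io.read_file(filename), channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    return image, label

dataset = (
    tf.data.Dataset.from_tensor_slices((paths, labels))
    .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)  # decode files in parallel
    .batch(2)
)

images, batch_labels = next(iter(dataset))
print(images.shape, batch_labels.numpy())
```

Because map runs lazily, the files are only read and decoded when the dataset is iterated, which keeps memory use low for large image folders.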

In [None]:
utils.show_images(trainDataset, class_names)