## Data
In this notebook, we go through several methods of loading image data in TensorFlow for training, evaluation, and prediction.
We use CIFAR-10 (named after the Canadian Institute For Advanced Research), one of the most widely used datasets in machine learning research. Its images are commonly used to train machine learning and computer vision algorithms. The dataset contains 60,000 32x32 color images in 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. There are 6,000 images of each class.
![resources/cifar.PNG](resources/cifar.PNG)
<sub>Source: https://www.cs.toronto.edu/~kriz/cifar.html</sub>

tf.keras provides built-in methods to download and load several well-known datasets, including CIFAR-10. We will also use tf.data.Dataset to read some raw images sampled from CIFAR-10. tf.data.Dataset is well suited to streaming training data from disk. Datasets are iterables (not iterators), and work just like other Python iterables in eager mode.
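
To make the "iterables, not iterators" point concrete, here is a minimal sketch, assuming eager execution (the TensorFlow 2.x default). The three-element dataset and the variable names are placeholders chosen for illustration only.

```python
import tensorflow as tf

# A tiny in-memory dataset; the values are placeholders for illustration.
dataset = tf.data.Dataset.from_tensor_slices([1, 2, 3])

# A Dataset is an iterable: every `for` loop starts a fresh pass over the data.
first_pass = [int(x.numpy()) for x in dataset]
second_pass = [int(x.numpy()) for x in dataset]

# An iterator obtained with iter() is consumed as it is read.
iterator = iter(dataset)
first_element = int(next(iterator).numpy())
remaining = [int(x.numpy()) for x in iterator]

print(first_pass, second_pass, first_element, remaining)
```

Because each loop restarts from the beginning, the same dataset object can be reused across training epochs without being rebuilt.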

In [None]:
import tensorflow as tf
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
print(tf.__version__)
%matplotlib inline

Define the names of the 10 CIFAR-10 classes, listed in label order.

In [None]:
class_names_cifar10 = ['Airplane','Automobile','Bird','Cat','Deer',
                       'Dog','Frog','Horse','Ship','Truck']
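
The position of each name in this list matches the integer label used by CIFAR-10 (0 for airplane through 9 for truck). The sketch below shows one way to turn labels into names, assuming the labels arrive as an integer array of shape (N, 1), which is how `keras.datasets.cifar10.load_data()` returns them. `sample_labels` is a hand-written stand-in, not real data.

```python
import numpy as np

class_names_cifar10 = ['Airplane', 'Automobile', 'Bird', 'Cat', 'Deer',
                       'Dog', 'Frog', 'Horse', 'Ship', 'Truck']

# Hypothetical stand-in for y_train[:3]; the values are made up for illustration.
sample_labels = np.array([[6], [9], [3]], dtype=np.uint8)

# Flatten the (N, 1) array to shape (N,) before indexing into the name list.
sample_names = [class_names_cifar10[int(label)] for label in sample_labels.flatten()]
print(sample_names)
```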

<font size="5">The first method </font>: Use tensorflow.keras.datasets to load the data as NumPy arrays, which can be used directly for training, evaluation, and prediction.

In [None]:
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
print(x_train.shape, ' ', y_train.shape)
print(x_test.shape, ' ', y_test.shape)

In [None]:
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255.0
x_test /= 255.0
trainDataset = tf.data.Dataset.from_tensor_slices(x_train)
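
The dataset above holds images only. For training, a common pattern is to slice images and labels together, then shuffle, batch, and prefetch. The sketch below illustrates that pattern under stated assumptions: it uses random synthetic arrays in place of CIFAR-10 so that it runs without a download, the batch size and buffer size are arbitrary, and `tf.data.AUTOTUNE` assumes TensorFlow 2.4 or later.

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in for CIFAR-10 (same per-image shape, random content).
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = rng.integers(0, 10, size=(100, 1), dtype=np.uint8)

# Scale pixel values from [0, 255] to [0, 1].
images = images.astype("float32") / 255.0

train_ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))  # yields (image, label) pairs
    .shuffle(buffer_size=100)     # buffer covers the whole toy dataset
    .batch(32)                    # arbitrary batch size for illustration
    .prefetch(tf.data.AUTOTUNE)   # overlap input preparation with training
)

image_batch, label_batch = next(iter(train_ds))
print(image_batch.shape, label_batch.shape)
```

A dataset built this way can be passed directly to `model.fit`, which reads the (image, label) pairs batch by batch.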

In [None]:
fig, axes = plt.subplots(2, 3)
fig.subplots_adjust(hspace=0.6, wspace=0.4)
# Show the first six training images in a 2x3 grid.
for i, image in enumerate(trainDataset.take(6)):
    axes.flat[i].imshow(image)

<font size="5">The second method </font>: Load image files from disk, for when you want to use your own pictures in machine learning tasks.

In [None]:
import os

filenames = []
for filename in os.listdir("./data"):
    # Match on the full extension so that only .jpg and .png files are kept.
    if filename.endswith((".jpg", ".png")):
        filenames.append(os.path.join("./data", filename))
print(filenames)

In [None]:
path_dataset = tf.data.Dataset.from_tensor_slices(filenames)

In [None]:
for path in path_dataset:
    print(path)

In [None]:
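def load_and_preprocess_image(raw_image):
    # decode_image handles both JPEG and PNG, matching the file filter above.
    # expand_animations=False keeps the result a 3-D tensor so it can be resized.
    image_tensor = tf.io.decode_image(raw_image, channels=3, expand_animations=False)
    image_tensor = tf.image.resize(image_tensor, [32, 32])
    # resize returns float32; scale pixel values to [0, 1].
    image_tensor /= 255.0
    return image_tensor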

In [None]:
raw_dataset = path_dataset.map(tf.io.read_file)
image_dataset = raw_dataset.map(load_and_preprocess_image)
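
The two `map` calls above can also be fused into one function and run in parallel. The following is a self-contained sketch of that variant, not the notebook's own pipeline: it writes four random PNG files to a temporary directory instead of reading `./data`, and the names `load_image` and `demo_ds` are hypothetical. It assumes TensorFlow 2.4 or later for `tf.data.AUTOTUNE`.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Write a few random PNG files so the sketch does not depend on ./data.
tmp_dir = tempfile.mkdtemp()
rng = np.random.default_rng(0)
for i in range(4):
    pixels = rng.integers(0, 256, size=(48, 64, 3), dtype=np.uint8)
    tf.io.write_file(os.path.join(tmp_dir, f"img_{i}.png"), tf.io.encode_png(pixels))

def load_image(path):
    """Read, decode, resize, and scale one image file (hypothetical helper)."""
    raw = tf.io.read_file(path)
    image = tf.io.decode_image(raw, channels=3, expand_animations=False)
    image = tf.image.resize(image, [32, 32])  # returns float32
    return image / 255.0

paths = sorted(os.path.join(tmp_dir, name) for name in os.listdir(tmp_dir))
demo_ds = (
    tf.data.Dataset.from_tensor_slices(paths)
    .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)  # decode files in parallel
    .batch(2)
)

batch_shapes = [tuple(batch.shape) for batch in demo_ds]
print(batch_shapes)
```

Resizing every image to the same shape is what makes batching possible here, since the source files may have different dimensions.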

In [None]:
fig, axes = plt.subplots(2, 3)
fig.subplots_adjust(hspace=0.6, wspace=0.4)
# Limit to six images so the loop never runs past the 2x3 grid.
for i, image in enumerate(image_dataset.take(6)):
    axes.flat[i].imshow(image)

<font size="5">The third method </font>: We can also first convert our image files into TFRecord files. TFRecord is a simple format that TensorFlow uses to pack multiple examples/images into a single file. TensorFlow can then read multiple examples at once, which is especially important for performance when using a remote storage service such as GCS.

In [None]:
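# tf.data.experimental.TFRecordWriter is deprecated in recent TensorFlow
# releases; tf.io.TFRecordWriter writes the same raw image bytes, one per record.
with tf.io.TFRecordWriter('images.tfrec') as writer:
    for raw_image in raw_dataset:
        writer.write(raw_image.numpy())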

In [None]:
image_dataset = tf.data.TFRecordDataset('images.tfrec').map(load_and_preprocess_image)
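
The records above contain only the encoded image bytes. When labels need to travel with the images, a common approach is to wrap each record in a `tf.train.Example`. The sketch below shows that approach as an assumption-laden illustration: the feature keys `"image"` and `"label"`, the helper names, and the random PNG data are all hypothetical and are not part of this notebook's pipeline.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

def make_example(image_bytes, label):
    """Pack encoded image bytes and an integer label into a tf.train.Example."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Write three random 32x32 PNG images with labels 0, 1, 2 to a temporary file.
record_path = os.path.join(tempfile.mkdtemp(), "labeled.tfrec")
rng = np.random.default_rng(0)
with tf.io.TFRecordWriter(record_path) as writer:
    for label in range(3):
        pixels = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
        png_bytes = tf.io.encode_png(pixels).numpy()
        writer.write(make_example(png_bytes, label).SerializeToString())

# The parsing spec must mirror the keys and types used when writing.
feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_png(parsed["image"], channels=3)
    image = tf.cast(image, tf.float32) / 255.0  # scale to [0, 1]
    return image, parsed["label"]

labeled_ds = tf.data.TFRecordDataset(record_path).map(parse_example)
decoded = [(tuple(img.shape), int(lbl.numpy())) for img, lbl in labeled_ds]
print(decoded)
```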

In [None]:
fig, axes = plt.subplots(2, 3)
fig.subplots_adjust(hspace=0.6, wspace=0.4)
# Limit to six images so the loop never runs past the 2x3 grid.
for i, image in enumerate(image_dataset.take(6)):
    axes.flat[i].imshow(image)