## Importing necessary libraries

In [1]:
import os
import requests
from tqdm import tqdm
data_dir = 'data'

# Create the data directory if it does not exist yet
os.makedirs(data_dir, exist_ok=True)


## Using the `tf.data` API to retrieve data

Here we use the `tf.data` API to load a dataset of flower images. The dataset consists of a folder of images and a CSV file that lists each filename with its class label as an integer. We will write a TensorFlow data pipeline that does the following:

* Extract filenames and classes from the CSV
* Read in the images from the extracted filenames and resize them to 64x64
* Convert the class labels to one-hot encoded vectors
* Combine the processed images and one-hot encoded vectors into a single dataset
* Finally, shuffle the data and output as batches
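
The first and third steps can be sketched without TensorFlow, purely to show what the data looks like at each stage. This is a minimal illustration using the standard library, not the pipeline built later in this notebook. The sample rows and the helper names `parse_labels` and `one_hot` are hypothetical, and the column names `file` and `label` are assumed to match the CSV header.

```python
import csv
import io

# Hypothetical sample in the assumed layout of the labels CSV:
# a header row, then one "filename,integer label" pair per line.
sample_csv = "file,label\n0001.png,0\n0002.png,3\n0003.png,9\n"

def parse_labels(text):
    """Return (filenames, labels) parsed from CSV text with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    filenames, labels = [], []
    for row in reader:
        filenames.append(row["file"])
        labels.append(int(row["label"]))
    return filenames, labels

def one_hot(label, depth):
    """Encode an integer class as a list with a single 1.0 at index `label`."""
    return [1.0 if i == label else 0.0 for i in range(depth)]

filenames, labels = parse_labels(sample_csv)
encoded = [one_hot(label, depth=10) for label in labels]
print(filenames)
print(encoded[1])
```

Each label becomes a vector of length 10 with exactly one non-zero entry, which is the format the `categorical_crossentropy` loss expects.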

### Downloading the data
The dataset is available on Kaggle: https://www.kaggle.com/olgabelitskaya/flower-color-images/data

Download the zip file from this URL and place it in the `data` folder inside the `Ch02` folder.

In [10]:
from zipfile import ZipFile

if os.path.exists('data/flower-color-images.zip'):
    zfile = ZipFile('data/flower-color-images.zip')
    zfile.extractall('data')
else:
    print("Did you download the dataset as a zip file and place it in the Ch02/data folder?")
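
A slightly more defensive variant of the extraction step is sketched below. It uses a context manager so that the archive is closed after extraction, and it reports how many entries were unpacked. The helper name `extract_if_present` is hypothetical, and the demonstration builds a tiny throwaway archive in a temporary directory rather than using the real dataset.

```python
import os
import tempfile
from zipfile import ZipFile

def extract_if_present(zip_path, target_dir):
    """Extract zip_path into target_dir.

    Returns the number of archive entries, or None if the file is missing.
    """
    if not os.path.exists(zip_path):
        return None
    with ZipFile(zip_path) as zfile:
        zfile.extractall(target_dir)
        return len(zfile.namelist())

# Demonstration on a throwaway archive (not the flower dataset).
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with ZipFile(demo_zip, "w") as zf:
        zf.writestr("images/a.txt", "placeholder")
        zf.writestr("labels.csv", "file,label\n")
    n_entries = extract_if_present(demo_zip, os.path.join(tmp, "out"))
    extracted_ok = os.path.exists(os.path.join(tmp, "out", "labels.csv"))
    missing = extract_if_present(os.path.join(tmp, "absent.zip"), tmp)

print(n_entries, extracted_ok, missing)
```

Returning `None` for a missing archive lets the caller decide how to report the problem, for example by printing the reminder used in the cell above.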

## Creating a `tf.data.Dataset`

Here we create the `tf.data` pipeline that carries out the steps listed above.

In [48]:
import tensorflow as tf
import os
print(tf.__version__)
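# Directory that holds the images and the labels CSV
data_dir = os.path.join('data','flower_images', 'flower_images') + os.path.sep
assert os.path.exists(data_dir)
# Read (filename, label) rows from the CSV; the defaults ("", -1) set the column types
csv_ds = tf.data.experimental.CsvDataset(
    os.path.join(data_dir,'flower_labels.csv') , ("",-1), header=True
)
# Split the rows into a dataset of filenames and a dataset of labels
fname_ds = csv_ds.map(lambda a,b: a)
label_ds = csv_ds.map(lambda a,b: b)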

def get_image(file_path):
    
    img = tf.io.read_file(data_dir + file_path)
    # convert the compressed string to a 3D uint8 tensor
    img = tf.image.decode_png(img, channels=3)
    # Use `convert_image_dtype` to convert to floats in the [0,1] range.
    img = tf.image.convert_image_dtype(img, tf.float32)
    # resize the image to the desired size.
    return tf.image.resize(img, [64, 64])


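image_ds = fname_ds.map(get_image)
print("The image dataset contains: {}".format(image_ds))
# Convert the integer labels to one-hot vectors of length 10
label_ds = label_ds.map(lambda x: tf.one_hot(x, depth=10))
# Pair each image with its label
data_ds = tf.data.Dataset.zip((image_ds, label_ds))

# Shuffle with a 20-element buffer, then group into batches of 5
data_ds = data_ds.shuffle(buffer_size=20)
data_ds = data_ds.batch(5)
# Print the first batch only
for item in data_ds:
    print(item)
    break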


2.1.0
<MapDataset shapes: (64, 64, 3), types: tf.float32>
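
The `shuffle(buffer_size=20)` call in the cell above does not shuffle the whole dataset uniformly. According to the `tf.data` documentation, it fills a buffer with `buffer_size` elements and then repeatedly samples from that buffer, replacing each emitted element with the next input element. The pure-Python sketch below imitates that behaviour to show its effect on ordering. `buffered_shuffle` is a hypothetical helper written for illustration, not TensorFlow's implementation, and the input size of 210 is assumed to match the number of flower images.

```python
import itertools
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items by sampling from a fixed-size buffer, as a shuffle buffer does."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    it = iter(items)
    # Fill the buffer with the first `buffer_size` elements.
    buffer = list(itertools.islice(it, buffer_size))
    for item in it:
        # Emit a random buffered element and put the new item in its slot.
        idx = rng.randrange(len(buffer))
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in random order.
    rng.shuffle(buffer)
    yield from buffer

order = list(buffered_shuffle(range(210), buffer_size=20, seed=0))
print(order[:10])
```

With this scheme an element can be emitted at most `buffer_size - 1` positions earlier than its original position, so a small buffer only mixes nearby elements. A buffer at least as large as the dataset is needed for a uniform shuffle.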


### Defining and training a model

Here we define a simple convolutional neural network (CNN) and train it on the image data we just retrieved. You don't need to worry about the technical details of CNNs right now; we will discuss them in detail in the next chapter.

In [1]:
from tensorflow.keras.layers import Dense, Conv2D, Flatten
from tensorflow.keras.models import Sequential

model = Sequential([
    Conv2D(64,(5,5), activation='relu', input_shape=(64,64,3)),
    Flatten(),
    Dense(10, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

model.fit(data_ds, epochs=100)
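
To get a feel for the size of the model above, the arithmetic below estimates its layer shapes and parameter counts by hand. It is a rough sketch that assumes the Keras defaults for `Conv2D` (`padding='valid'`, `strides=1`, and a bias term); the helper names `conv2d_output_size` and `count_parameters` are hypothetical.

```python
def conv2d_output_size(size, kernel, stride=1):
    """Spatial output size of a convolution with 'valid' padding."""
    return (size - kernel) // stride + 1

def count_parameters(height, width, channels, filters, kernel, num_classes):
    """Parameter counts for a Conv2D -> Flatten -> Dense stack."""
    out_h = conv2d_output_size(height, kernel)
    out_w = conv2d_output_size(width, kernel)
    # Each filter has kernel*kernel*channels weights plus one bias.
    conv_params = kernel * kernel * channels * filters + filters
    # Flatten turns the (out_h, out_w, filters) volume into one vector.
    flat_units = out_h * out_w * filters
    # Dense layer: one weight per (input unit, class) pair plus one bias per class.
    dense_params = flat_units * num_classes + num_classes
    return {
        "conv_output": (out_h, out_w, filters),
        "conv_params": conv_params,
        "flat_units": flat_units,
        "dense_params": dense_params,
        "total_params": conv_params + dense_params,
    }

sizes = count_parameters(64, 64, 3, filters=64, kernel=5, num_classes=10)
for name, value in sizes.items():
    print(name, value)
```

Under these assumptions, nearly all of the roughly 2.3 million parameters sit in the final `Dense` layer, because the flattened convolution output is very wide. The estimate can be compared against `model.summary()`.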

## Using Keras data generators to retrieve data

Instead of the `tf.data` API, let us use the Keras `ImageDataGenerator` to retrieve the data. As you can see, the `ImageDataGenerator` involves much less code than the `tf.data` API.

In [9]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import os
import pandas as pd

data_dir = os.path.join('data','flower_images')
src_dir = os.path.join(data_dir, 'flower_images')
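img_gen = ImageDataGenerator(
    samplewise_center=True, rotation_range=30,
    # brightness_range holds multiplicative factors (1.0 leaves the image unchanged),
    # so the range is centred on 1.0 rather than 0
    brightness_range=(0.8,1.2))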

print(os.path.join(src_dir, 'flower_labels.csv'))
labels_df = pd.read_csv(os.path.join(src_dir, 'flower_labels.csv'), header=0)

gen_iter = img_gen.flow_from_dataframe(
    dataframe=labels_df, directory=src_dir, x_col='file', y_col='label', class_mode='raw', batch_size=2, target_size=(64,64))

for item in gen_iter:
    print(item)
    break

data\flower_images\flower_images\flower_labels.csv
Found 210 validated image filenames.
(array([[[[-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -7.546631, -8.546631],
         ...,
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631]],

        [[-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         ...,
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631]],

        [[-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         ...,
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631],
         [-8.546631, -8.546631, -8.546631]],

        ...,

        [[-8.546631, -8.546631, -8.546631],
         [-8.
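
The negative values in the output above are a consequence of `samplewise_center=True`, which shifts each image so that its own mean pixel value is zero. The plain-Python sketch below shows that centering step for a single image. The function `samplewise_center` here is a hypothetical stand-in for what the generator does internally, and the pixel values are made up.

```python
def samplewise_center(pixels):
    """Subtract the mean of one image's pixel values from every pixel."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]

# Hypothetical 2x2 single-channel image, flattened: mostly dark, one bright pixel.
image = [0.0, 0.0, 0.0, 8.0]
centered = samplewise_center(image)
print(centered)
```

Pixels darker than the image mean become negative and brighter pixels become positive, so a mostly dark image produces mostly negative values.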

## Using the `tensorflow-datasets` library

Here we will use the `tensorflow-datasets` package. It provides a curated collection of popular datasets for machine learning projects. With this package you can download a dataset in a single line, so you don't have to download, extract, or format the data manually; all of that is done for you when you load data through `tensorflow-datasets`.

### Listing the available datasets

In [10]:
import tensorflow_datasets as tfds
import tensorflow as tf
# See all registered datasets
tfds.list_builders()

['abstract_reasoning',
 'aeslc',
 'aflw2k3d',
 'amazon_us_reviews',
 'arc',
 'bair_robot_pushing_small',
 'beans',
 'big_patent',
 'bigearthnet',
 'billsum',
 'binarized_mnist',
 'binary_alpha_digits',
 'c4',
 'caltech101',
 'caltech_birds2010',
 'caltech_birds2011',
 'cars196',
 'cassava',
 'cats_vs_dogs',
 'celeb_a',
 'celeb_a_hq',
 'cfq',
 'chexpert',
 'cifar10',
 'cifar100',
 'cifar10_1',
 'cifar10_corrupted',
 'citrus_leaves',
 'cityscapes',
 'civil_comments',
 'clevr',
 'cmaterdb',
 'cnn_dailymail',
 'coco',
 'coil100',
 'colorectal_histology',
 'colorectal_histology_large',
 'cos_e',
 'curated_breast_imaging_ddsm',
 'cycle_gan',
 'deep_weeds',
 'definite_pronoun_resolution',
 'diabetic_retinopathy_detection',
 'div2k',
 'dmlab',
 'downsampled_imagenet',
 'dsprites',
 'dtd',
 'duke_ultrasound',
 'dummy_dataset_shared_generator',
 'dummy_mnist',
 'emnist',
 'eraser_multi_rc',
 'esnli',
 'eurosat',
 'fashion_mnist',
 'flic',
 'flores',
 'food101',
 'gap',
 'gigaword',
 'glue',
 'gr

### Downloading the CIFAR-10 dataset and viewing its information

In [11]:
# Load a given dataset by name, along with the DatasetInfo
data, info = tfds.load("cifar10", with_info=True)
print(info)

tfds.core.DatasetInfo(
    name='cifar10',
    version=3.0.0,
    description='The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.',
    homepage='https://www.cs.toronto.edu/~kriz/cifar.html',
    features=FeaturesDict({
        'image': Image(shape=(32, 32, 3), dtype=tf.uint8),
        'label': ClassLabel(shape=(), dtype=tf.int64, num_classes=10),
    }),
    total_num_examples=60000,
    splits={
        'test': 10000,
        'train': 50000,
    },
    supervised_keys=('image', 'label'),
    citation="""@TECHREPORT{Krizhevsky09learningmultiple,
        author = {Alex Krizhevsky},
        title = {Learning multiple layers of features from tiny images},
        institution = {},
        year = {2009}
    }""",
    redistribution_info=,
)



### Exploring the data 

Here we print `data` to see what it provides. We then need to batch the data ourselves, because `tensorflow-datasets` returns individual samples.

In [5]:
print(data)

{'test': <DatasetV1Adapter shapes: {image: (32, 32, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>, 'train': <DatasetV1Adapter shapes: {image: (32, 32, 3), label: ()}, types: {image: tf.uint8, label: tf.int64}>}


In [13]:
# Group the individual samples into batches of 16
train_ds = data["train"].batch(16)

def format_data(x):
    # Return (images, one-hot labels) tuples, the format model.fit expects
    return (x["image"], tf.one_hot(x["label"], depth=10))

train_ds = train_ds.map(format_data)
# Print the first batch only
for item in train_ds:
    print(item)
    break

(<tf.Tensor: shape=(16, 32, 32, 3), dtype=uint8, numpy=
array([[[[143,  96,  70],
         [141,  96,  72],
         [135,  93,  72],
         ...,
         [ 96,  37,  19],
         [105,  42,  18],
         [104,  38,  20]],

        [[128,  98,  92],
         [146, 118, 112],
         [170, 145, 138],
         ...,
         [108,  45,  26],
         [112,  44,  24],
         [112,  41,  22]],

        [[ 93,  69,  75],
         [118,  96, 101],
         [179, 160, 162],
         ...,
         [128,  68,  47],
         [125,  61,  42],
         [122,  59,  39]],

        ...,

        [[187, 150, 123],
         [184, 148, 123],
         [179, 142, 121],
         ...,
         [198, 163, 132],
         [201, 166, 135],
         [207, 174, 143]],

        [[187, 150, 117],
         [181, 143, 115],
         [175, 136, 113],
         ...,
         [201, 164, 132],
         [205, 168, 135],
         [207, 171, 139]],

        [[195, 161, 126],
         [187, 153, 123],
         [186, 151
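
Batching also determines how many steps one epoch takes. The short calculation below is a sketch of that arithmetic. The figure of 50,000 training examples comes from the `DatasetInfo` printed earlier, and 210 is the number of flower images reported by the Keras generator. `batches_per_epoch` is a hypothetical helper, and its `drop_remainder` argument mirrors the option of the same name on `Dataset.batch`.

```python
import math

def batches_per_epoch(num_examples, batch_size, drop_remainder=False):
    """Number of batches produced when a dataset is split into fixed-size batches."""
    if drop_remainder:
        # A final, smaller batch is discarded.
        return num_examples // batch_size
    # A final, smaller batch is kept.
    return math.ceil(num_examples / batch_size)

cifar_batches = batches_per_epoch(50000, 16)
flower_batches = batches_per_epoch(210, 5)
uneven_batches = batches_per_epoch(50000, 64)
uneven_dropped = batches_per_epoch(50000, 64, drop_remainder=True)
print(cifar_batches, flower_batches, uneven_batches, uneven_dropped)
```

When the batch size does not divide the dataset size evenly, the last batch is smaller than the rest unless `drop_remainder=True` is used.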

### Training a simple CNN on the CIFAR-10 data

In [15]:
from tensorflow.keras.layers import Dense, Conv2D, Flatten
from tensorflow.keras.models import Sequential

model = Sequential([
    Conv2D(64,(5,5), activation='relu', input_shape=(32,32,3)),
    Flatten(),
    Dense(10, activation='softmax')
])
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['acc'])

model.fit(train_ds, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x201d3f0bc88>