# About the data

The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. 

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class. 

Three versions are available for download: binary, Python and MATLAB. This tutorial uses the Python version:
http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz



## Python version
The information below is copied from the CIFAR website http://www.cs.toronto.edu/~kriz/cifar.html  
<br />
The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle. Here is a python2 routine which will open such a file and return a dictionary:

def unpickle(file):  
&nbsp;&nbsp;&nbsp;import cPickle  
&nbsp;&nbsp;&nbsp;with open(file, 'rb') as fo:  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dict = cPickle.load(fo)  
&nbsp;&nbsp;&nbsp;return dict  
    
And a python3 version:  
def unpickle(file):  
&nbsp;&nbsp;&nbsp;import pickle  
&nbsp;&nbsp;&nbsp;with open(file, 'rb') as fo:  
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;dict = pickle.load(fo, encoding='bytes')  
&nbsp;&nbsp;&nbsp;return dict
    
Loaded in this way, each of the batch files contains a dictionary with the following elements:
data -- a 10000x3072 numpy array of uint8s. Each row of the array stores a 32x32 colour image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order, so that the first 32 entries of the array are the red channel values of the first row of the image.
labels -- a list of 10000 numbers in the range 0-9. The number at index i indicates the label of the ith image in the array data.

The dataset contains another file, called batches.meta. It too contains a Python dictionary object. It has the following entries:
label_names -- a 10-element list which gives meaningful names to the numeric labels in the labels array described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", etc.
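
The row layout described above can be illustrated with a small sketch. It assumes NumPy is available and uses a synthetic row rather than real CIFAR-10 pixels; the helper name `row_to_image` is hypothetical and is not part of the CIFAR tooling.

```python
import numpy as np

def row_to_image(row):
    """Convert one 3072-element CIFAR-10 row into a 32x32x3 (height, width, channel) image."""
    row = np.asarray(row, dtype=np.uint8)
    # Layout: 1024 red values, then 1024 green, then 1024 blue, each plane in row-major order.
    channels_first = row.reshape(3, 32, 32)
    # Move the channel axis last so the array can be displayed as an RGB image.
    return channels_first.transpose(1, 2, 0)

# Synthetic stand-in for one row of `data` (an assumption, not real dataset content).
fake_row = np.concatenate([
    np.full(1024, 10, dtype=np.uint8),  # red plane
    np.full(1024, 20, dtype=np.uint8),  # green plane
    np.full(1024, 30, dtype=np.uint8),  # blue plane
])
image = row_to_image(fake_row)
print(image.shape)  # (32, 32, 3)
print(image[0, 0])  # [10 20 30]
```

The same reshape followed by a transpose is what `read_cifar10` applies to whole batches further down in this tutorial.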

# Step 0 - Initialization


In [None]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import re
import sys
import tarfile
import time
import numpy as np
import cPickle
import cv2
import h5py
import math

from tensorflow.python.training import moving_averages

from six.moves import urllib
import tensorflow as tf

# Step 1 - Download the CIFAR10 data


The first step is to get the data, which takes about 160 MB. 


Define the function for downloading the data

In [None]:
# The tutorial reads the pickled Python batches, so download the Python archive
url = 'http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz'
homedir = os.path.expanduser('~')
dest_directory = homedir +'/Datasets/cifar10'

def maybe_download_and_extract():
  """Download and extract the tarball from Alex's website."""
  if not os.path.exists(dest_directory):
    os.makedirs(dest_directory)
  filename = url.split('/')[-1]
  filepath = os.path.join(dest_directory, filename)
  if not os.path.exists(filepath):
    def _progress(count, block_size, total_size):
      sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
          float(count * block_size) / float(total_size) * 100.0))
      sys.stdout.flush()
    filepath, _ = urllib.request.urlretrieve(url, filepath, _progress)
    print()
    statinfo = os.stat(filepath)
    print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
  # The Python archive unpacks into cifar-10-batches-py, the folder read in main()
  extracted_dir_path = os.path.join(dest_directory, 'cifar-10-batches-py')
  if not os.path.exists(extracted_dir_path):
    tarfile.open(filepath, 'r:gz').extractall(dest_directory)

Download the data

In [None]:
maybe_download_and_extract()

### Several functions used for processing the cifar data and creating training batches

In this part we implement several helper functions that are not part of the model itself but are needed to get the data into a form the model can consume. To be able to work with larger datasets, we do not want to keep the whole dataset in memory, since that can exceed what the system and the GPU can hold. Instead, the data is saved into an HDF5 file and all batches are read directly from that file. For CIFAR-10 this procedure is not strictly necessary, but it becomes useful when moving to larger datasets. More information about the HDF5 format is available on the official website: https://www.hdfgroup.org/HDF5/

Unpickling of the data from the downloaded files

In [None]:
def unpickle(file):
    with open(file, 'rb') as fo:
        dict = cPickle.load(fo)
    return dict

Changing the labels into a one-hot vector (e.g. 3 is represented by [0,0,0,1,0,0,0,0,0,0]), which is the representation used by most machine learning classifiers.
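
As a sketch of the encoding described above, the following converts a whole list of labels at once. It assumes NumPy; the name `onehot_batch` is hypothetical and is an alternative to calling the tutorial's per-label function in a loop, not a replacement for it.

```python
import numpy as np

def onehot_batch(labels, num_classes):
    """Encode a sequence of integer labels as rows of a one-hot matrix."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # For row i, set column labels[i] to 1.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

encoded = onehot_batch([3, 0, 9], 10)
print(encoded[0])  # [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```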

In [None]:
def onehot(value,out_size):
    output = np.zeros((out_size))
    output[value] = 1
    return output

Reading the data and processing the images and labels

In [None]:
def read_cifar10(path):
    data_batches = []
    labels_batches = []
    for i in xrange(1, 6):
        dict = unpickle(path + "/data_batch_" + str(i))
        data = np.array(dict['data'])
        labels = dict['labels']
        data_batches.extend(data)
        labels_batches.extend(labels)

    labels_batches = [onehot(v, 10) for v in labels_batches]

    data_batches = np.array(data_batches).reshape(-1, 3, 32, 32)
    data_batches = data_batches.transpose(0, 2, 3, 1)

    data_batches = data_batches.astype(np.float32)
    data_batches, mean, std = normalize(data_batches)
    trX = np.array(data_batches[:40000])
    trY = np.array(labels_batches[:40000])

    valX = np.array(data_batches[40000:])
    valY = np.array(labels_batches[40000:])

    dict = unpickle(path + "/test_batch")
    teX, _, _ = normalize(np.array(dict['data']).reshape(-1, 3, 32, 32), mean, std)
    teX = teX.transpose(0, 2, 3, 1)

    teY = dict['labels']
    teY = np.array([onehot(v, 10) for v in teY])

    return trX, trY, teX, teY, valX, valY

Writing the images and the labels into an HDF5 file and reading them back for further use.

In [None]:
def write_hdf5(trX, trY, teX, teY, valX, valY,path):
    # Create the target directory if it does not exist yet
    if not os.path.exists(path):
        os.makedirs(path)
    f = h5py.File(path+"/data.hdf5", "w")
    htrX = f.create_dataset("trX", trX.shape, dtype='f')
    htrX[...]=trX
    htrX.attrs['length']=len(trX)
    htrY = f.create_dataset("trY", trY.shape, dtype='f')
    htrY[...] = trY
    hteX = f.create_dataset("teX", teX.shape, dtype='f')
    hteX[...] = teX
    hteY = f.create_dataset("teY", teY.shape, dtype='f')
    hteY[...] = teY
    hvalX = f.create_dataset("valX", valX.shape, dtype='f')
    hvalX[...] = valX
    hvalY = f.create_dataset("valY", valY.shape, dtype='f')
    hvalY[...] = valY
    # Flush everything to disk before the file is reopened for reading
    f.close()

def get_hdf5():
    path = "./data"
    f = h5py.File(path + "/data.hdf5", "r")
    return f['trX'],f['trY'],f['valX'],f['valY'],f['teX'],f['teY']

Getting the train and the test batch, and also the shape of the image to be sent as a variable to the model later on
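
The batch helpers below sort the selected indices before reading, because h5py expects index lists in increasing order when selecting rows from a dataset. A minimal sketch of that index handling, assuming NumPy only (no HDF5 file is touched, and the generator name `shuffled_sorted_batches` is hypothetical):

```python
import numpy as np

def shuffled_sorted_batches(num_items, batch_size, seed=None):
    """Yield index arrays: membership is random, order inside each batch is increasing."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(num_items)
    # Only full batches are produced; a trailing partial batch is dropped.
    for start in range(0, num_items - batch_size + 1, batch_size):
        yield np.sort(order[start:start + batch_size])

batches = list(shuffled_sorted_batches(num_items=10, batch_size=4, seed=0))
print(len(batches))  # 2
```

Sorting only changes the order of the samples inside a batch, not which samples end up in it, so the shuffling still decorrelates the batches between epochs.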

In [None]:
def get_img_shape():
    path = "./data"
    f = h5py.File(path + "/data.hdf5", "r")
    return list(f['trX'].shape[1:])

def get_test_batch(data,labels, batch_size):
    test_indices = np.arange(len(data))  # Get A Test Batch
    np.random.shuffle(test_indices)
    test_indices = list(np.sort(test_indices[0:batch_size]))
    with data.astype('float32'):
        batch_data = data[test_indices]
    with labels.astype('float32'):
        batch_labels = labels[test_indices]
    return batch_data, batch_labels

def get_train_batch(data,labels,indices):
    tr_ind = list(np.sort(indices))
    with data.astype('float32'):
        batch_data = data[tr_ind]
    with labels.astype('float32'):
        batch_labels = labels[tr_ind]
    # Return the batch, mirroring get_test_batch
    return batch_data, batch_labels

To reduce overfitting and to enlarge the effective training set, the following functions apply random modifications to the images so that they differ from the stored originals. First, each image is flipped horizontally with probability 0.5, which effectively doubles the amount of training data. Second, each image is zero-padded by 5 pixels on every side and a crop of the original size is taken at a random position. This yields a much larger variety of images and also makes the classifier less sensitive to translations of the object: whether it sits higher, lower, to the left or to the right, the classifier should learn that it is the same object.
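
The two augmentations can be sketched for a single image as follows. This is an illustration under stated assumptions, not the tutorial's implementation: it uses NumPy slicing instead of `cv2.flip`, the function name `pad_crop_flip` is hypothetical, and the crop offsets are drawn from the full range 0..2*padding.

```python
import numpy as np

def pad_crop_flip(image, padding, rng):
    """Zero-pad an HxWxC image, take a random HxW crop, and flip it left-right half the time."""
    height, width, _ = image.shape
    padded = np.pad(image, ((padding, padding), (padding, padding), (0, 0)), mode='constant')
    # Valid offsets are 0..2*padding inclusive; randint's upper bound is exclusive.
    top = rng.randint(0, 2 * padding + 1)
    left = rng.randint(0, 2 * padding + 1)
    crop = padded[top:top + height, left:left + width, :]
    if rng.randint(0, 2) == 0:
        crop = crop[:, ::-1, :]  # reverse the width axis, i.e. a horizontal flip
    return crop

rng = np.random.RandomState(0)
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
augmented = pad_crop_flip(image, padding=5, rng=rng)
print(augmented.shape)  # (32, 32, 3)
```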

In [None]:
def horizontal_flip(image, axis):
    '''
    Flip an image with 50% probability
    :param image: a 3 dimensional numpy array representing an image
    :param axis: 0 for vertical flip and 1 for horizontal flip
    :return: 3D image after flip
    '''
    flip_prop = np.random.randint(low=0, high=2)
    if flip_prop == 0:
        image = cv2.flip(image, axis)

    return image

def random_crop_and_flip(batch_data, padding_size):
    '''
    Helper to random crop and random flip a batch of images
    :param padding_size: int. how many layers of 0 padding was added to each side
    :param batch_data: a 4D batch array
    :return: randomly cropped and flipped image
    '''
    pad_width = ((0, 0), (padding_size, padding_size), (padding_size, padding_size), (0, 0))
    batch_data = np.pad(batch_data, pad_width=pad_width, mode='constant', constant_values=0)
    cropped_batch = np.zeros(len(batch_data) * 32 * 32 * 3).reshape(
        len(batch_data), 32, 32, 3)

    for i in range(len(batch_data)):
        x_offset = np.random.randint(low=0, high=2 * padding_size, size=1)[0]
        y_offset = np.random.randint(low=0, high=2 * padding_size, size=1)[0]
        cropped_batch[i, ...] = batch_data[i, ...][x_offset:x_offset+32,
                      y_offset:y_offset+32, :]

        cropped_batch[i, ...] = horizontal_flip(image=cropped_batch[i, ...], axis=1)

    return cropped_batch

Normalize the data.
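
One detail worth illustrating: the mean and standard deviation are computed on the training images only and then reused for the test images, which is how `read_cifar10` calls the function below. A minimal sketch with random stand-in data, assuming NumPy (the names `fit_normalizer` and `apply_normalizer` are hypothetical):

```python
import numpy as np

def fit_normalizer(train_images):
    """Compute a single global mean and standard deviation from the training images."""
    return float(np.mean(train_images)), float(np.std(train_images))

def apply_normalizer(images, mean, std):
    """Normalize any split with statistics that were fitted on the training split."""
    return (images - mean) / std

rng = np.random.RandomState(0)
# Random stand-ins for image batches (not real CIFAR-10 data).
train = rng.uniform(0, 255, size=(100, 32, 32, 3)).astype(np.float32)
test = rng.uniform(0, 255, size=(20, 32, 32, 3)).astype(np.float32)

mean, std = fit_normalizer(train)
train_n = apply_normalizer(train, mean, std)
test_n = apply_normalizer(test, mean, std)  # reuses the training statistics
```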

In [None]:
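def normalize(images, mean=None,std=None):
    # Compute the statistics only when they are not supplied (i.e. on the training data)
    if mean is None or std is None:
        mean = np.mean(images)
        std = np.std(images)
    images = (images - mean) / std
    return images, mean,std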

# Step 2 - Create the Resnet Model

In this step we create the model, which will later on be initialized and trained during the training stage. 

## Introduction to the Resnet model

The ResNet (residual network) model was designed by researchers at Microsoft Research and won several image recognition competitions, including ILSVRC 2015. Its idea is to make existing network designs deeper by adding shortcut connections: the input of each building block is added to the block's output, so the stacked layers only have to learn a residual on top of their input. This gives information (and gradients) a direct path from the shallow layers to the deeper ones. The authors argue that a deeper model built this way should be at least as good as its shallower counterpart, because the extra blocks can fall back to an identity mapping. In practice, this allows deeper models to achieve better results than shallower ones on the same data, without the degradation seen in plain deep networks.
![title](img/w04-ResNet.jpg)
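
The residual idea can be shown with a toy example. The sketch below assumes NumPy and uses small dense layers instead of convolutions, so it is only an illustration of `y = relu(F(x) + x)`, not the block implemented later in this tutorial; all names in it are hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Toy residual block: y = relu(F(x) + x) with F(x) = relu(x.w1).w2."""
    residual = relu(x.dot(w1)).dot(w2)
    # The shortcut adds the unchanged input to the output of the residual branch.
    return relu(residual + x)

x = np.array([[1.0, 2.0, 3.0]])
zero_weights = np.zeros((3, 3))
y = residual_block(x, zero_weights, zero_weights)
print(y)  # [[1. 2. 3.]]
```

With all weights at zero the residual branch contributes nothing and the block passes its (non-negative) input through unchanged, which is the intuition behind a deeper residual network being able to match a shallower one.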


# Model Design

The design is modular: a basic residual building block is stacked on top of the previous blocks, and the gradients backpropagate through the structure as they would in a regular deep convolutional network such as VGG. There are two kinds of blocks, the simple (basic) block and the bottleneck block. In this tutorial we use the simple block, following the Facebook team's Torch design for CIFAR-10 (available on GitHub at https://github.com/facebook/fb.resnet.torch/blob/master/models/resnet.lua).

# <center>simple block</center>
![title](img/resnetb.jpg)
# <center>bottleneck block</center>
![title](img/resnetbottleneck.jpg)

### Implementation of the various standard functions and layers used by the ResNET model

The ResNet model builds its basic block mainly from three of the most common deep learning layers: convolution, batch normalization and a fully connected layer. The following functions are wrappers around the corresponding TensorFlow operations.

Some basic functions to get the shape of the tensor

In [None]:
def _get_shape(x):
    """ Get the shape of a Tensor. """
    return x.get_shape().as_list()
def _get_dims(shape):
    """ Get the fan-in and fan-out of a Tensor. """
    fan_in = np.prod(shape[:-1])
    fan_out = shape[-1]
    return fan_in, fan_out

Creation and initialization of the weights and biases that will be used by the fully connected and the convolutional layers.
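
To make the `he` and `xavier` options of the `weight` function more concrete, here is a small sketch of the scale each one derives from the variable's shape. It uses only the standard library; the helper names are hypothetical, and the example shape (a 3x3 convolution from 16 to 32 channels) is just an illustration.

```python
import math

def fan_in_out(shape):
    """Fan-in is the product of all dimensions except the last; fan-out is the last one."""
    fan_in = 1
    for dim in shape[:-1]:
        fan_in *= dim
    return fan_in, shape[-1]

def he_stddev(shape):
    """Standard deviation of the normal distribution used by He initialization."""
    fan_in, _ = fan_in_out(shape)
    return math.sqrt(2.0 / fan_in)

def xavier_limit(shape):
    """Half-width of the uniform interval used by Xavier (Glorot) initialization."""
    fan_in, fan_out = fan_in_out(shape)
    return math.sqrt(6.0 / (fan_in + fan_out))

conv_shape = [3, 3, 16, 32]  # [kernel_h, kernel_w, channels_in, channels_out]
print(fan_in_out(conv_shape))              # (144, 32)
print(round(he_stddev(conv_shape), 4))     # 0.1179
print(round(xavier_limit(conv_shape), 4))  # 0.1846
```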

In [None]:
def weight(name, shape, init='he', range=0.1, stddev=0.01, init_val=None, group_id=0):
    """ Get a weight variable. """
    if init_val is not None:
        initializer = tf.constant_initializer(init_val)
    elif init == 'uniform':
        initializer = tf.random_uniform_initializer(-range, range)
    elif init == 'normal':
        initializer = tf.random_normal_initializer(stddev=stddev)
    elif init == 'he':
        fan_in, _ = _get_dims(shape)
        std = math.sqrt(2.0 / fan_in)
        initializer = tf.random_normal_initializer(stddev=std)
    elif init == 'xavier':
        fan_in, fan_out = _get_dims(shape)
        range = math.sqrt(6.0 / (fan_in + fan_out))
        initializer = tf.random_uniform_initializer(-range, range)
    else:
        initializer = tf.truncated_normal_initializer(stddev = stddev)

    var = tf.get_variable(name, shape, initializer=initializer)
    tf.add_to_collection('l2_' + str(group_id), tf.nn.l2_loss(var))
    return var

def bias(name, dim, init_val=0.0):
    """ Get a bias variable. """
    dims = dim if isinstance(dim, list) else [dim]
    return tf.get_variable(name, dims, initializer = tf.constant_initializer(init_val))

Fully connected layer

In [None]:
def fully_connected(x, output_size, name, init_w='he', init_b=0, stddev=0.01, group_id=0):
    """ Apply a fully-connected layer (with bias). """
    x_shape = _get_shape(x)
    input_dim = x_shape[-1]

    with tf.variable_scope(name) as scope:
        w = weight('weights', [input_dim, output_size], init=init_w, stddev=stddev, group_id=group_id)
        b = bias('biases', [output_size], init_b)
        z = tf.nn.xw_plus_b(x, w, b)
    return z

Nonlinearity function; one of the three most commonly used ones can be chosen

In [None]:
def nonlinear(x, nl=None):
    """ Apply a nonlinearity layer. """
    if nl == 'relu':
        return tf.nn.relu(x)
    elif nl == 'tanh':
        return tf.tanh(x)
    elif nl == 'sigmoid':
        return tf.sigmoid(x)
    else:
        return x

Batch Normalization Layer
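
Before the TensorFlow version, a NumPy sketch of what a batch normalization layer computes may help. It assumes channels-last input and the same decay (0.99) and epsilon (1e-3) as the code below; the class `BatchNormSketch` is hypothetical and leaves out everything related to training the scale and offset.

```python
import numpy as np

class BatchNormSketch(object):
    """Batch normalization over all axes except the last (channel) axis."""

    def __init__(self, num_channels, decay=0.99, epsilon=1e-3):
        self.moving_mean = np.zeros(num_channels)
        self.moving_var = np.ones(num_channels)
        self.beta = np.zeros(num_channels)   # learned offset, fixed here
        self.gamma = np.ones(num_channels)   # learned scale, fixed here
        self.decay = decay
        self.epsilon = epsilon

    def __call__(self, x, is_train):
        if is_train:
            # Training: normalize with the batch statistics and update the moving averages.
            axes = tuple(range(x.ndim - 1))
            mean, var = x.mean(axis=axes), x.var(axis=axes)
            self.moving_mean = self.decay * self.moving_mean + (1 - self.decay) * mean
            self.moving_var = self.decay * self.moving_var + (1 - self.decay) * var
        else:
            # Inference: normalize with the accumulated moving statistics.
            mean, var = self.moving_mean, self.moving_var
        return self.gamma * (x - mean) / np.sqrt(var + self.epsilon) + self.beta

bn = BatchNormSketch(num_channels=3)
x = np.random.RandomState(0).normal(5.0, 2.0, size=(8, 4, 4, 3))
train_out = bn(x, is_train=True)
eval_out = bn(x, is_train=False)
```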

In [None]:
def batch_norm(x, name, is_train):
    """ Apply a batch normalization layer. """
    with tf.variable_scope(name):
        inputs_shape = x.get_shape()
        axis = list(range(len(inputs_shape) - 1))
        param_shape = int(inputs_shape[-1])

        moving_mean = tf.get_variable('mean', [param_shape], initializer=tf.constant_initializer(0.0), trainable=False)
        moving_var = tf.get_variable('variance', [param_shape], initializer=tf.constant_initializer(1.0), trainable=False)

        beta = tf.get_variable('offset', [param_shape], initializer=tf.constant_initializer(0.0))
        gamma = tf.get_variable('scale', [param_shape], initializer=tf.constant_initializer(1.0))

        def mean_var_with_update():
            mean, var = tf.nn.moments(x, axis)
            update_moving_mean = moving_averages.assign_moving_average(moving_mean, mean, 0.99)
            update_moving_var = moving_averages.assign_moving_average(moving_var, var, 0.99)
            # Tie the moving-average updates to the returned tensors so that they actually run;
            # assigning to an outer list from inside this function would have no effect.
            with tf.control_dependencies([update_moving_mean, update_moving_var]):
                return tf.identity(mean), tf.identity(var)

        def mean_var():
            # At inference time, use the accumulated moving statistics
            return tf.identity(moving_mean), tf.identity(moving_var)

        mean, var = tf.cond(is_train, mean_var_with_update, mean_var)

        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)

    return normed

Convolution layer

In [None]:
def convolution(x, k_h, k_w, c_o, s_h, s_w, name, init_w='he', init_b=0, stddev=0.01, padding='SAME', group_id=0):
    """ Apply a convolutional layer (with bias). """
    c_i = _get_shape(x)[-1]
    convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
    with tf.variable_scope(name) as scope:
        w = weight('weights', [k_h, k_w, c_i, c_o], init=init_w, stddev=stddev, group_id=group_id)
        z = convolve(x, w)
        b = bias('biases', c_o, init_b)
        z = tf.nn.bias_add(z, b)
    return z

Max Pooling layer

In [None]:
def max_pool(x, k_h, k_w, s_h, s_w, name, padding='SAME'):
    """ Apply a max pooling layer. """
    return tf.nn.max_pool(x, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name=name)

The functions above are the standard deep learning building blocks used throughout this tutorial. In the modular Python implementation, these functions are saved in the ops.py file and imported into the model file. There are many very good resources introducing these methods, and reading up on them before proceeding with the model implementation is highly recommended. Once they are understood, building the residual network is straightforward.

### Implementation of the ResNET 20 model itself

We define the class ResNET and initialize all the variables and placeholders that are used throughout the model. The model can be trained with four different optimizers: Momentum, RMSProp, Adagrad and SGD. Their hyperparameters can be added to the initialization, but tuning the optimizers is beyond the scope of this tutorial. The cost function used here is the cross entropy. Since TensorFlow offers a function that applies the softmax internally, we do not need to apply it ourselves to the final output of the network. The last few lines create a checkpoint saver, which saves checkpoints at different times so they can be loaded for further processing and analysis. The saver is also what implements early stopping: the model with the best validation accuracy is saved and reloaded at the end of training, which protects against overfitting.
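
To show what the cost function computes from the raw network outputs (the logits), here is a NumPy sketch of softmax followed by cross entropy. It is a plain re-derivation for illustration, assuming one-hot labels, and is not TensorFlow's implementation; the function name is hypothetical.

```python
import numpy as np

def softmax_cross_entropy(logits, onehot_labels):
    """Per-example cross entropy between softmax(logits) and one-hot labels."""
    # Subtracting the row maximum does not change the softmax but avoids overflow in exp.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(onehot_labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 0.0, 0.0],
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])
losses = softmax_cross_entropy(logits, labels)
cost = losses.mean()  # the scalar that the optimizer would minimize
```

For the second example all three logits are equal, so every class gets probability 1/3 and the loss is log(3).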

In [None]:
class ResNET(object):
    def __init__(self, sess,img_shape=[32,32,3],optimizer='RMSProp'):
        self.sess = sess
        self.is_train = tf.placeholder(tf.bool)
        self.lrn_rate = tf.placeholder(tf.float32)
        self.batch_size= tf.placeholder(tf.int32)
        self.batch_shape = [self.batch_size,128]
        self.img_shape = img_shape
        self.X = tf.placeholder("float", [None, ]+img_shape)
        self.Y = tf.placeholder("float", [None, 10])
        self.optimizer = optimizer
        self.py_x = self.build_resnet20()
        self.out_sum = tf.summary.histogram("out", self.out)
        self.cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.py_x, labels=self.Y))
        tf.summary.scalar('cross_entropy', self.cost)
        if self.optimizer == 'mom':
            self.train_op = tf.train.MomentumOptimizer(self.lrn_rate, 0.9).minimize(self.cost)
        elif self.optimizer == 'RMSProp':
            self.train_op = tf.train.RMSPropOptimizer(self.lrn_rate, 0.9).minimize(self.cost)
        elif self.optimizer == "AdaGrad":
            self.train_op = tf.train.AdagradOptimizer(self.lrn_rate,1e-6).minimize(self.cost)
        elif self.optimizer == 'SGD':
            self.train_op = tf.train.GradientDescentOptimizer(self.lrn_rate).minimize(self.cost)
        else:
            self.train_op = self.own_optimizer(self.cost,self.lrn_rate)
        self.predict_op = tf.argmax(self.py_x, 1)
        with tf.name_scope('accuracy'):
            self.accuracy = tf.reduce_mean(tf.cast(tf.equal(self.predict_op,tf.argmax(self.Y,1)),"float"))
        tf.summary.scalar('accuracy', self.accuracy)
        self.checkpoint_dir = 'checkpoint'
        self.saver = tf.train.Saver()
        # For early stopping
        self.best_acc = 0 # Best Accuracy
        self.max_no_imp = 50 # Maximum epochs without improvement on the validation data
        self.no_imp_counter = 0 # Counter of no improvement epochs

And now, finally, the core part of the tutorial: the ResNet model and its building blocks. As mentioned above, only the simple block is implemented in this tutorial; the bottleneck block is used in deeper models, which we are not going to implement here. The block looks like this:
![title](img/w04-ResNet2.jpg)

There are two ways of adding the input to the block's output: as is (an identity shortcut) or through a 1x1 convolution (a projection shortcut). In this tutorial we use the 1x1 convolution.

In [None]:
class ResNET(ResNET):
    def simple_block(self, input_feats, name1, name2, n, stride=1):
        """ A basic block of ResNets - used for the smaller versions, for more advanced ones, check the bottleneck version"""
        branchSa_feats = convolution(input_feats, 1, 1, 2 * n, stride, stride, name1 + '_branchSa')
        branchSa_feats = batch_norm(branchSa_feats, name2 + '_branchSa',self.is_train)

        branchSb_feats = convolution(input_feats, 3, 3, n, stride, stride, name1 + '_branchSb')
        branchSb_feats = batch_norm(branchSb_feats, name2 + '_branchSb',self.is_train)
        branchSbfeats = nonlinear(branchSb_feats, 'relu')

        branchSc_feats = convolution(branchSbfeats, 3, 3, n*2, 1, 1, name1 + '_branchSc')
        branchSc_feats = batch_norm(branchSc_feats, name2 + '_branchSc',self.is_train)

        output_feats = branchSa_feats + branchSc_feats
        output_feats = nonlinear(output_feats, 'relu')
        return output_feats

Next, these building blocks are stacked on top of one another to create a deeper network. The network can go as deep as requested; for simplicity, this tutorial uses a 20 layer network. After the blocks are stacked, the output is reduced by average pooling with a kernel size of [1,8,8,1] and reshaped into a vector, which in turn is fed into a fully connected layer that outputs a 10-dimensional vector corresponding to our output labels. This output is then passed through a softmax to obtain the probability of each image belonging to one of the 10 categories (this is done in the cost function).
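
It can help to trace the spatial size of the feature maps through the network to see why the 8x8 average pooling leaves one value per channel. The sketch below applies TensorFlow's documented output-size rules for 'SAME' and 'VALID' padding to the strides used in this tutorial; the helper name `conv_output_size` is hypothetical.

```python
import math

def conv_output_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer along one dimension."""
    if padding == 'SAME':
        return int(math.ceil(size / float(stride)))
    if padding == 'VALID':
        return int(math.ceil((size - kernel + 1) / float(stride)))
    raise ValueError('unknown padding: %s' % padding)

size = 32                                        # CIFAR-10 input images are 32x32
size = conv_output_size(size, 3, 1, 'SAME')      # conv1, stride 1         -> 32
size = conv_output_size(size, 3, 1, 'SAME')      # first block, stride 1   -> 32
size = conv_output_size(size, 3, 2, 'SAME')      # second block, stride 2  -> 16
size = conv_output_size(size, 3, 2, 'SAME')      # third block, stride 2   -> 8
pooled = conv_output_size(size, 8, 1, 'VALID')   # 8x8 average pooling     -> 1

channels = 2 * 64                                # the last block outputs 2 * n channels, n = 64
flat_features = pooled * pooled * channels
print(size, pooled, flat_features)               # 8 1 128
```

The result, 128 features per image, matches the second dimension of the shape that the pooled output is reshaped to before the fully connected layer.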

In [None]:
class ResNET(ResNET):
    def build_resnet20(self):
        """ Build the ResNet20 net.

        This recreates the residual network that Facebook designed for CIFAR. The network
        can be made larger, using only the basic blocks, by changing n and stacking more blocks.
        """

        imgs = self.X
        #imgs = tf.placeholder(tf.float32, [self.batch_size] + self.img_shape)
        #is_train = tf.placeholder(tf.bool)
        # TODO: Relu and batch normalization separately!!!
        conv1_feats = convolution(imgs, 3, 3, 16, 1, 1, 'conv1')
        conv1_feats = batch_norm(conv1_feats, 'bn_conv1',self.is_train)
        conv1_feats = nonlinear(conv1_feats, 'relu')

        res2a_feats = self.simple_block(conv1_feats,'res2a','bn2a', 16)
        res3a_feats = self.simple_block(res2a_feats, 'res3a', 'bn3a', 32, 2)
        res4a_feats = self.simple_block(res3a_feats, 'res4a', 'bn4a', 64,2)
        
        # Registration of the output activations for further analysis in tensorboard
        self.res2_sum = tf.summary.histogram("res2", res2a_feats)
        self.res3_sum = tf.summary.histogram("res3", res3a_feats)
        self.res4_sum = tf.summary.histogram("res4", res4a_feats)
        
        # in torch this is spatial average pool
        pool4_feats = tf.nn.avg_pool(res4a_feats, ksize=[1,8,8,1], strides=[1,1,1,1], padding = 'VALID')

        res5a_feats_flat = tf.reshape(pool4_feats,self.batch_shape)

        res6a_fc_feats = fully_connected(res5a_feats_flat, 10, "FCa", init_w='he', init_b=0, stddev=0.01, group_id=0)

        self.out = res6a_fc_feats

        return self.out

### Training procedure

Now that all the building blocks are in place and the model is defined, we can move on to training. The training is done iteratively on the training dataset with a predefined batch size, and the model is evaluated on validation and test batches of larger size.  
The procedure itself is fairly straightforward. After initializing all the variables defined in the TensorFlow graph, we repeat the same procedure for a predefined number of epochs and lower the learning rate as training progresses (this hyperparameter has a large influence on the results and is a good one to tune when trying to get better results with gradient descent based methods). Inside the loop, the indices of the data are shuffled and split into batches, which prevents the model from learning a particular order of the training data and overfitting to it. Each data batch is processed with the utilities defined above in the CIFAR data processing part of the tutorial and is fed into the minimizer. The results are then saved to the logs for further analysis with TensorBoard.  
After the training is over, we load the best checkpoint we have saved and evaluate it on new batches of the training, validation and test data (the test batch size can be set to 10000 to evaluate on all of the test data).
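
The two pieces of bookkeeping in the training loop, the stepwise learning rate and the early stopping counter, can be sketched on their own without TensorFlow. The epoch boundaries (20 and 80) and the rates follow the loop below; the function and class names are hypothetical, and the tutorial itself uses a patience of 50 epochs.

```python
def learning_rate_for_epoch(epoch):
    """Step schedule: 0.1 for the first 20 epochs, then 0.01, then 0.001 from epoch 80."""
    if epoch < 20:
        return 0.1
    if epoch < 80:
        return 0.01
    return 0.001

class EarlyStopping(object):
    """Track the best validation accuracy and signal when it has stopped improving."""

    def __init__(self, patience):
        self.patience = patience
        self.best_acc = 0.0
        self.no_improvement = 0

    def update(self, val_acc):
        """Return (is_best, should_stop) for the latest validation accuracy."""
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.no_improvement = 0
            return True, False  # a new best: this is when a checkpoint would be saved
        self.no_improvement += 1
        return False, self.no_improvement > self.patience

stopper = EarlyStopping(patience=2)
history = [stopper.update(acc) for acc in [0.5, 0.4, 0.4, 0.4]]
print(history)  # [(True, False), (False, False), (False, False), (False, True)]
```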

In [None]:
class ResNET(ResNET):
    def train(self,train_batch=128,val_batch=256, test_batch=1024,n_epochs=500):
        # Need to initialize all variables
        trX, trY, valX, valY, teX, teY = get_hdf5()
        self.merged = tf.summary.merge_all()
        self.train_writer = tf.summary.FileWriter("./logs/train", self.sess.graph)
        self.test_writer = tf.summary.FileWriter("./logs/test", self.sess.graph)

        tf.global_variables_initializer().run()
        lrn_rate = 0.1
        count = 0
        total_time = time.time()
        i = 0
        for i in range(n_epochs):
            if i == 20:
                lrn_rate=0.01
            if i ==80:
                lrn_rate=0.001
            # Adding randomality to reduce overfitting
            train_indices = np.arange(len(trX))
            np.random.shuffle(train_indices)
            training_batch = zip(range(0, len(trX), train_batch),
                                 range(train_batch, len(trX) + 1, train_batch))
            for start, end in training_batch:
                tr_ind = list(np.sort(train_indices[start:end]))
                with trX.astype('float32'):
                    batch_data = trX[tr_ind]
                with trY.astype('float32'):
                    batch_labels=trY[tr_ind]
                #batch_data, batch_labels = get_train_batch(trX,trY,train_indices[start:end])
                # Adding random modification to reduce overfitting
                batch_data = random_crop_and_flip(batch_data,5)
                summary,_,tr_acc, tr_cost = self.sess.run([self.merged, self.train_op, self.accuracy, self.cost], feed_dict={self.X: batch_data, self.Y: batch_labels,
                                                   self.is_train:True, self.lrn_rate:lrn_rate,self.batch_size:train_batch})

                self.train_writer.add_summary(summary, count)
                count+=1
            batch_data, batch_labels = get_test_batch(valX,valY, val_batch)
            summary, val_acc = self.sess.run([self.merged, self.accuracy], feed_dict={self.X: batch_data,
                                                                                 self.Y: batch_labels,
                                                                                 self.is_train: True,
                                                                                 self.batch_size: val_batch})
            batch_data = None
            if val_acc > self.best_acc:
                self.save(self.checkpoint_dir, count)
                self.no_imp_counter = 0
                self.best_acc = val_acc
            else:
                self.no_imp_counter += 1
                if self.no_imp_counter > self.max_no_imp:
                    break

            self.test_writer.add_summary(summary,count)
            #self.train_writer.add_summary(summary, i)
            print('train ', i, tr_acc)
            #self.test_writer.add_summary(summary, i)
            print('validation ',i, val_acc)
            print ('cost - ',tr_cost)
            if (i%10==0):
                batch_data,batch_labels = get_test_batch(teX,teY,test_batch)
                summary, test_acc = self.sess.run([self.merged, self.accuracy], feed_dict={self.X: batch_data,
                                                                       self.Y: batch_labels,
                                                                       self.is_train: True,
                                                                       self.batch_size: test_batch})
                print('test ', i, test_acc)
        total_time = time.time() - total_time
        # Print out the best run (early stopping implementation)
        could_load, checkpoint_counter = self.load(self.checkpoint_dir)
        if could_load:
            batch_data, batch_labels = get_test_batch(teX,teY, test_batch)
            train_data, train_labels = get_test_batch(trX, trY, test_batch)
            val_data, val_labels = get_test_batch(valX, valY, test_batch)

            val_acc = self.sess.run([self.accuracy], feed_dict={self.X: val_data,
                                                                self.Y: val_labels,
                                                                self.is_train: True,
                                                                self.batch_size: test_batch})
            tr_acc = self.sess.run([self.accuracy], feed_dict={self.X: train_data,
                                                                self.Y: train_labels,
                                                                self.is_train: True,
                                                                self.batch_size: test_batch})
            te_acc = self.sess.run([self.accuracy], feed_dict={self.X: batch_data,
                                                               self.Y: batch_labels,
                                                               self.is_train: True,
                                                               self.batch_size: test_batch})
            print ('average time per epoch is: ', total_time / (i + 1))
            print ('best validation accuracy - ',val_acc)
            print ('with train accuracy of - ', tr_acc)
            print ('with test accuracy of - ', te_acc)

Three more small functions that take care of saving and loading the checkpoints
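
When a checkpoint is saved with a global step, the step number is appended to the file name, and the `load` function below recovers it with a regular expression that matches the last run of digits. A standalone sketch of that parsing, using only the standard library (the helper name and the file names are made up for illustration):

```python
import re

def checkpoint_step(ckpt_name):
    """Return the last run of digits in a checkpoint name as an int, or None if there is none."""
    # (\d+) matches a run of digits; (?!.*\d) requires that no digit follows it anywhere.
    match = re.search(r"(\d+)(?!.*\d)", ckpt_name)
    return int(match.group(0)) if match else None

print(checkpoint_step("example.model-3120"))     # 3120
print(checkpoint_step("v2.example.model-3120"))  # 3120
print(checkpoint_step("example.model"))          # None
```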

In [None]:
class ResNET(ResNET):
    @property
    def model_dir(self):
        return "".format()

    def save(self, checkpoint_dir, step):
        model_name = "ResNet.model"
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)

        if not os.path.exists(checkpoint_dir):
            os.makedirs(checkpoint_dir)

        self.saver.save(self.sess,
                        os.path.join(checkpoint_dir, model_name),
                        global_step=step)

    def load(self, checkpoint_dir):
        import re
        print(" [*] Reading checkpoints...")
        checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)

        ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
            counter = int(next(re.finditer(r"(\d+)(?!.*\d)", ckpt_name)).group(0))
            print(" [*] Success to read {}".format(ckpt_name))
            return True, counter
        else:
            print(" [*] Failed to find a checkpoint")
            return False, 0

OK, the tutorial is almost over: the model is implemented, the training procedure is defined and all the parts are in place. We can proceed to the main function, which sets the wheels in motion on our journey into the analysis of the CIFAR-10 images.

In [None]:
flags = tf.app.flags
#flags.DEFINE_integer("n_epochs", 500, "Epoch to train [500]")
#flags.DEFINE_integer("train_size", 128, "The size of batch images [128]")
#flags.DEFINE_integer("val_size", 256, "The size of batch images [128]")
#flags.DEFINE_integer("test_size", 512, "The size of batch images [128]")
#flags.DEFINE_string("optimizer","mom","The name of the optimizer to be used [AdaDelta,mom,RMSProp,SGD]")
#FLAGS = flags.FLAGS
n_epochs = 500
train_size = 128
val_size = 256
test_size = 512
optimizer = "mom"
def main(_):
    homedir = os.path.expanduser('~')
    path = homedir + '/Datasets/cifar10/cifar-10-batches-py'
    hdf5_path = "./data"

    if not os.path.exists(hdf5_path+"/data.hdf5"):
        print ("hdf5 doesn't exist, creating it in data folder")
        trX, trY, teX, teY, valX, valY = read_cifar10(path)
        write_hdf5(trX, trY, teX, teY, valX, valY, hdf5_path)

    run_config = tf.ConfigProto()
    run_config.gpu_options.allow_growth = True
    with tf.Session(config=run_config) as sess:
        # FLAGS is commented out above, so use the module-level optimizer setting
        ResNet_model = ResNET(sess, img_shape=get_img_shape(),optimizer=optimizer)
        ResNet_model.train(n_epochs=n_epochs,train_batch=train_size,val_batch=val_size, test_batch=test_size)

if __name__ == "__main__":
    tf.app.run()

This is it for this tutorial. The test results we were able to achieve on CIFAR-10 are roughly the same as those reported in the original paper (after about 100 epochs, a training accuracy of 93% and a testing accuracy of 92%). These structures can be applied to other datasets, and the basic building blocks of the residual model can be used to create deeper and stronger models. There are many parts of the design that can be modified and perhaps even improved, but that is left as an exercise to the student.