##SparseNet models for Keras.
### Reference
- [Sparsely Connected Convolutional Networks](https://arxiv.org/abs/1801.05895)
- [Github](https://github.com/lyken17/sparsenet)

###Skip Connections:
---
<div align="justify">Predicting detail information of complicated visual scene may require understanding it at multiple levels abstraction, from edges and textures to object categories. As we all know convolution neural network learns increasingly abstract visual representations when going to deeper layers. But training such  deep networks requires back-propogating a signal through all the layers of the given networks which results in loss at the end of network to be noisier due to deeper layers and it becomes worst has we go deep and we also need to store and maintain a feature computed early that network need to reuse, to overcome this we use skip connections which connect multiple outputs from different layer to $l_{th}$ layer which can provide a pathway for assembling  feature that  combines many level of abstraction</div>

### Optimizer:
___

The choice of optimization algorithm indicates how fast  and optimum is your model.  To train a model we need to reduce the loss which is a functon of weight and bias. With the help of back propogation we back propogate the current error in the previous layer and modify the weights and bias such a way that the error is reduced. To modify weights we use optimization algorithm.

Optimization function is usually use to calculate gradiant i.e. the partial derivative of loss function with respect to weights and weghts are modified in opposite direction of calculate gradient. This i cycle is repeated until we reach minima of loss function

To reduce the loss and modify the weights we use hyper parameter learning rate which can be tuned to get optimal results. Choosing a proper learning rate is difficult. If the learning rate is too low it result in slow convergence and may lead to vanishing gradient descent  and if it is to high it may diverge from the minmal and may leading exploding gradient problem




### Why Sparsenet : 
___
<div align="justify">SparseNet is a variant of DenseNets or Resnet. In Densenet we  have a skip connection after every block which are concatenated to next layer and in resnet we do cumulative summation, but both have same problems i.e. as the depth increases, the number of features grows linearly. Later features may corrupt or wash out the information carried by earlier features maps as seen in resnet which result in saturation of resnet performance, in contrast densnet preserves the original format of previous layers due to concatenation, this factor contribute to better parameter performance efficiency over resnet but due to concatenation no of parameters grows at the rate of $0(N^2)$ due to which portion of network is devoted to process previously seen feature map and hence are not able to exploit all the parameters fully and this pitfall are due to the linear growth of feature maps in both densnet and resnet. To overcome this we would like to maintain the power of short gradient paths for training. By aggregating features only from layers with exponential offset the length of the shortest  gradient path between blocks with offset S is bounded  by  $O((c-1)log(S))$. Here, c is again the base of the exponent governing the sparse connection pattern. The total number of output to the $l^{th}$ block is $O(log(l))$ due to exponential offsets. Therefore total no of skip connections is</div>

<center>$\sum_{l=1}^N(log_cl) = O(NlogN)$</center>

<center>N is the number of basic blocks (depth) of the network. </center>

<center>The number of parameters are $O(N log N)$ and $O(N)$, respectively, for aggregation by concatenation and aggregation by summation</center>

<div align="justify">SparseNets have such skip connections only at depths of $2^N$. This allows model to be less memory intensive due to less parameters while still performing equivalent to densnet or better</div>

<center>![alt text](https://lh3.googleusercontent.com/ar5begWFXAGXPVDeIORZB_iD4OrsAe6dR-yyfEjCNhR8fnt-LnnFcRDUrecj7era4845nS8iyolaWmN0GaTCo114I9WmTSTo0cTIGBQnVwzvJ9yrVa0Fm0TYUnxphcHbQC5pAoWe=w2400)</center>

<center>** Densnet/Resnet**</center>

<center>![alt text](https://lh3.googleusercontent.com/pJosUXvPpuMJu87oi0gb351VelsWpLkbcX6TXx5i1qh_QOaMEPgeJS-Ikg3Dilfue6qDNnfOblaOpc8BUJOzgY4yPE23QxOBttS268ojfYJajR7uBGg__cNisOUUIp0f-vNfx7zi=w2400)</center>

<center>![alt text](https://lh3.googleusercontent.com/EnnfenqM7PICFgtcfxgVO0PWJOCbpFFCCq0DDSRhBoZU63ZgZUzqAsGWx0tSZCbULSNwqfBDEr41Q0enZUJXAUON3j2s30aqosQZrrsgBHWHjWJpB4Xo5bBlm-NvmBkkXJNOTRMp=w2400)</center>
<center>** Sparsenet**</center>


##Experiments

<div align="justify">We demonstrate the effectiveness of SparseNets over DenseNets, through image classification tasks on the CIFAR-100 datasets</div>

We Implement our models in Keras

**Datasets**:
CIFAR both the CIFAR-10 and CIFAR-100 datasets have 50,000 training
images and 10,000 testing images with size of 32 × 32 pixels. CIFAR-10 
and CIFAR-100  have 10 and 100 classes respectively. Our experiments
use standard data augmentation

In [0]:
import numpy as np
import warnings
from scipy.misc import imresize, toimage

from keras.models import Model
from keras.layers.core import Dense, Dropout, Activation, Reshape
from keras.layers.convolutional import Conv2D, Conv2DTranspose, UpSampling2D, SeparableConv2D
from keras.layers import AveragePooling2D, MaxPooling2D
from keras.layers import GlobalAveragePooling2D
from keras.layers import Input
from keras.layers.merge import concatenate
from keras.layers.normalization import BatchNormalization
from keras.regularizers import l2
from keras.utils.layer_utils import convert_all_kernels_in_model, convert_dense_weights_data_format
from keras.utils.data_utils import get_file
from keras.engine.topology import get_source_inputs
from keras.applications.imagenet_utils import _obtain_input_shape
from keras.applications.imagenet_utils import decode_predictions
from keras.callbacks import LearningRateScheduler
import keras.backend as K


import os.path
import sklearn.metrics as metrics
from keras.datasets import cifar100
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD
from keras.callbacks import ModelCheckpoint, ReduceLROnPlateau

###Load Datasets

In [0]:
#load data
(trainX, trainY), (testX, testY) = cifar100.load_data()

trainX = trainX.astype('float32')
testX = testX.astype('float32')

cifar_mean = trainX.mean(axis=(0, 1, 2), keepdims=True)
cifar_std = trainX.std(axis=(0, 1, 2), keepdims=True)

trainX = (trainX - cifar_mean) / (cifar_std + 1e-8)
testX = (testX - cifar_mean) / (cifar_std + 1e-8)

Y_train = np_utils.to_categorical(trainY, nb_classes)
Y_test = np_utils.to_categorical(testY, nb_classes)



### Hyper Parameters


In [0]:
dropout_rate = 0.0
growth_rate=24
compression = 0.5
depth = 40
bottleneck=False
weight_decay=1e-4

**Batch Normalization** : Use to normalize output of previous activation layer

**Relu** : Use to convert all the negative values to zero and positive values unchanged

**Conv2D**: Apply kernel with receptive field size of 3x3 with bias zero and padding same to maintain output feature map same as input feature map

**Dropout**: Reducing overfiitting by dropping few node randomly

**Bottleneck Layer** : To improve computational efficiency we can introduce bottleneck layer 3x3 convolution layer to reduce the no of feature maps

**Compression**:  We can also reduce feature maps in transition layer by introducing compression factor whose value lies between $0<\theta\le1$

###Exponential $2^N$
We consider skip connections only at depths of $2^N$.  

**input** : List of processed layers

**returns** : layers which are exponential of $2^N$

In [0]:
def _exponential_index_fetch(x_list):
    count = len(x_list)
    i = 1
    inputs = []
    while i <= count:
        inputs.append(x_list[count - i])
        i *= 2
    return inputs

###Convolution Layer with &amp; Wihtout Bottleneck Layer

###Parameters:

**Input**: previous layer output (ip), num of filter (nb_filter)

**Output** : current layer output (x)

**note**: Both bottleneck layer and dropout are conditional 

In [0]:
def add_conv_block(ip, nb_filter):
    
  concat_axis = -1

  x = BatchNormalization(axis=concat_axis, momentum=0.1, epsilon=1e-5)(ip)
  x = Activation('relu')(x)

  if bottleneck:
    # Obtained from https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua
    inter_channel = nb_filter * 4 

    x = Conv2D(inter_channel, (1, 1), kernel_initializer='he_normal', padding='same', use_bias=False,
               kernel_regularizer=l2(weight_decay))(x)
    x = BatchNormalization(axis=concat_axis, epsilon=1e-5, momentum=0.1)(x)
    x = Activation('relu')(x)

  x = Conv2D(nb_filter, (3, 3), kernel_initializer='he_normal', padding='same', use_bias=False)(x)
  if dropout_rate > 0.0:
    x = Dropout(dropout_rate)(x)

  return x

### Dense Block

###Parameters:

**Input**: 

1.   previous layer output (ip)
2.   no of layers per dense block (nb_layers)
3. rate at which no of filter grow (growth_rate), grow filter exponentially               (grow_nb_filters)

**Output** : 
1. current layer output (x) 
2. no of filter (nb_filter)

###Explanation

In this function we create layers based on no of layers (nb_layers)  and concatenate  output layers from previous layers  which are $2^N$  exponential and also grow no of filter exponentially 


In [0]:
def add_dense_block(x, nb_layers, nb_filter):
    
    concat_axis = -1

    x_list = [x]
    channel_list = [nb_filter]

    for i in range(nb_layers):
      x = add_conv_block(x, growth_rate)
      x_list.append(x)

      fetch_outputs = _exponential_index_fetch(x_list)
      x = concatenate(fetch_outputs, axis=concat_axis)

      channel_list.append(growth_rate)

    nb_filter = sum(_exponential_index_fetch(channel_list))

    return x, nb_filter

###Transition Layer


###Parameters:

**Input**: 

1.   previous layer output (ip)
2. no of filters (nb_filter)
3. reduction ratio of transition layer (compression)

**Output** : 
1. current layer output (x) 

###Explanation

Use to reduce the no of feature maps after each dense block. This is use to control no of parameters that flow to next dense block for better computancy. Using compression we control the reduction ratio of transition layerdef add_transition_block(ip, nb_filter):
  concat_axis = -1
  x = BatchNormalization(axis=concat_axis, epsilon=1e-5, momentum=0.1)(ip)
  x = Activation('relu')(x)
  x = Conv2D(int(nb_filter * compression), (1, 1), kernel_initializer='he_normal', padding='same', use_bias=False,
             kernel_regularizer=l2(weight_decay))(x)
  x = AveragePooling2D((2, 2))(x)

  return x

In [0]:
def add_transition_block(ip, nb_filter):
  concat_axis = -1
  x = BatchNormalization(axis=concat_axis, epsilon=1e-5, momentum=0.1)(ip)
  x = Activation('relu')(x)
  x = Conv2D(int(nb_filter * compression), (1, 1), kernel_initializer='he_normal', padding='same', use_bias=False,
             kernel_regularizer=l2(weight_decay))(x)
  x = AveragePooling2D((2, 2))(x)

  return x

###Create Sparsenet


###Parameters:

**Input**: 

1.   Image Input (img_input)
2.   total no of layer in the given sparsenet model (depth)
4. growth rate

**Output** : 
1. model output (x) 

###Explanation

Based on given depth we divide total no layers into layers in dense block, layer in transition layer and layer in bottleneck layer using the below formula we divide the layers into the given blocks and layers
<center>$Depth = 3N+4$</center>
**note**: by default we consider 3 dense block and last dense block doesn't have transition layer

We convert given no of dense block in our case 3 into a list and also growth rate into list remember both lost should be of dense block size

We assign nb_filter = growth rate for firstlayer which is a convolution layer of kerner receptive field size of 3x3 following this layers is our 3 dense block and 2 transition layer and bottleneck layers if true and finall y the output layer

In [0]:
def create_dense_net(img_input):
   
    global compression
    #channel_last
    concat_axis = -1
    #no of dense_block
    nb_dense_block=3

    # layers in each dense block
    assert (depth - 4) % nb_dense_block == 0, 'Depth must be 3 N + 4'
    count = int((depth - 4) / nb_dense_block)

    if bottleneck:
      count = count // 2
    else:
      compression = 1

    #convert int list
    nb_layers = [count for _ in range(nb_dense_block)]
    final_nb_layer = count

    # compute initial nb_filter
    nb_filter = 2 * growth_rate

    # Initial convolution
    x = Conv2D(nb_filter, (3,3), kernel_initializer='he_normal', padding='same', use_bias=False, kernel_regularizer=l2(weight_decay))(img_input)

    # Add dense blocks
    for i in range(nb_dense_block - 1):
        # add dense block
        x, nb_filter = add_dense_block(x, nb_layers[i], nb_filter)
        # add transition_block
        x = add_transition_block(x, nb_filter)
        nb_filter = int(nb_filter * compression)

    # The last dense_block does not have a transition_block
    x, nb_filter = add_dense_block(x, final_nb_layer, nb_filter)

    x = BatchNormalization(axis=concat_axis, epsilon=1e-5, momentum=0.1)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    x = Dense(nb_classes, activation='softmax')(x)

    return x

###Create Sparse Model
###parameters
**batch_size** :  The datasets which we use is cifar and consist of 50000 train and 10000 test images which we divide into no of batches with each batch size equal to 64 and total no of batches forms 1 epoch

In [0]:
def SparseNet(input_shape):
    
    inputs = Input(shape=input_shape)
    
    output = create_dense_net(inputs)
        
    # Create model.
    model = Model(inputs, output)
    return model

In [0]:
nb_classes = 100
epoch = 150
nb_epoch_24 = 15
nb_epoch_32 = 135

img_rows, img_cols = None, None
img_channels = 3

In [0]:
model = SparseNet((None,None,nb_channels))
print("Model created")

model.summary()

Model created
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_2 (InputLayer)            (None, None, None, 3 0                                            
__________________________________________________________________________________________________
conv2d_40 (Conv2D)              (None, None, None, 4 1296        input_2[0][0]                    
__________________________________________________________________________________________________
batch_normalization_40 (BatchNo (None, None, None, 4 192         conv2d_40[0][0]                  
__________________________________________________________________________________________________
activation_40 (Activation)      (None, None, None, 4 0           batch_normalization_40[0][0]     
_______________________________________________________________________________________________

<b>Image Augmentation</b> If we consider example of dog image in given dataset and all dogs are facing left side then the model may not recognize all the dog facing right side after it its trained for good accuracy which may lead to overfiitting. To reduce over fitting and to improve accuracy we use image augmentation which genrates images from given datasets and produces more images with different angle and flips for better results and also we resize the images into smaller size  as the layers in the intial stages perfoms same pattern detection irrelevant of image size, therfore by reducing the size we can train our model faster and using less computation and then apply the result from this layers to images with large size

In [0]:
generator = ImageDataGenerator(width_shift_range=5. / 32,
                               height_shift_range=5. / 32,
                               horizontal_flip=True)

generator.fit(trainX, seed=0)

## Experiment

<div align="justify">**Objective:** Create a model using <b><i>Sparsenet</i></b> architecture and train the model on <i><b>CIFAR100</b></i> dataset</div>

<div align="justify">In this model i use densnet architecture for our model and replace the concatenation part with exponential $2^N$ function keeping all the hyper parameters and model architecture same. As i have already explained why sparse net exponential term has advantage over denset and resnet i would like to present the facts in below experiment
  
 I use <i><b>Stochastic Gradient Descent (SGD)</b></i> optimizer. I have set the learning rate as 0.1 and i have also use momentum parameter which helps accelerate SGD in right direction. During the experiment i used  3 methods to set learning rate during training  
 1. <b>Time-Based Learning Rate Schedule</b> in which we decay the learning rate by certain value at each epoch let say $$\\text{decay} = \text{learning rate} /\text{ no of epoch}$$
 <br>
 2.<b> Drop-Based Learning Rate Schedule</b> in which we reduce learning rate after certain epochs for example in densnet paper the initial lr=0.1 and reduce to 0.01 when epoch reaches 50% and 0.001 when epoch reaches to 75% 
 
Both the above method are form of <i><b>Adaptive Learning Rate</b></i> in which we reduce learning rate linearly but in both cases what should be the base learning rate since if learning rate is to low it lead to slow convergence and if it is high it may lead to divergence and we may need to train model with diifferent learning rate to find the optimal result
</br></br>

### Experiment Result

### val_acc =  69.620 for 250 Epochs using SGD Optimizer along with Learning Rate scheduler




In [0]:
#optimizer
optimizer = SGD(lr=0.0,decay=0.0, momentum=0.9,nesterov=True)  # Using SGD with Learning Rate Scheduler
model.compile(loss='categorical_crossentropy', optimizer=optimizer, metrics=["accuracy"])
print("Finished compiling")
print("Building model...")


Finished compiling
Building model...


In [0]:
# Learning Rate Schedule
def step_decay(epoch):
  lr = 0.1
  if epoch == int(0.5 * nb_epoch_32):
    lr = np.float32(learning_rate / 10.)
  if epoch == int(0.75 * nb_epoch_32):
    lr = np.float32(learning_rate / 100.)
  return lr

In [0]:
lrate = LearningRateScheduler(step_decay)
model_checkpoint = ModelCheckpoint(weights_file, monitor="val_acc", save_best_only=True, save_weights_only=True, verbose=1)

callbacks = [lrate, model_checkpoint]

In [0]:
#size 24
#the learning rate is kept constant at 0.1
trainX_24 = [imresize(image, (24, 24, 3)) for image in trainX]
trainX_24 = np.array(trainX)

model.fit_generator(generator.flow(trainX_24, Y_train, batch_size=64),
                    steps_per_epoch=len(trainX_24) // batch_size, epochs=nb_epoch_24,
                    callbacks=callbacks,
                    validation_data=(testX, Y_test),
                    validation_steps=testX.shape[0] // batch_size, verbose=1)



  if issubdtype(ts, int):
  elif issubdtype(type(size), float):


Epoch 1/15

Epoch 00001: val_acc improved from -inf to 0.13040, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 2/15
 85/781 [==>...........................] - ETA: 3:08 - loss: 3.5948 - acc: 0.1382


Epoch 00002: val_acc improved from 0.13040 to 0.25160, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 3/15
136/781 [====>.........................] - ETA: 2:54 - loss: 3.0103 - acc: 0.2448


Epoch 00003: val_acc improved from 0.25160 to 0.32840, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 4/15
149/781 [====>.........................] - ETA: 2:50 - loss: 2.5723 - acc: 0.3324


Epoch 00004: val_acc improved from 0.32840 to 0.35710, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 5/15
152/781 [====>.........................] - ETA: 2:48 - loss: 2.2504 - acc: 0.4055


Epoch 00005: val_acc improved from 0.35710 to 0.41550, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 6/15
153/781 [====>.........................] - ETA: 2:49 - loss: 2.0702 - acc: 0.4454


Epoch 00006: val_acc improved from 0.41550 to 0.48630, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 7/15
153/781 [====>.........................] - ETA: 2:49 - loss: 1.8771 - acc: 0.4952


Epoch 00007: val_acc improved from 0.48630 to 0.49790, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 8/15
153/781 [====>.........................] - ETA: 2:49 - loss: 1.7641 - acc: 0.5221


Epoch 00008: val_acc improved from 0.49790 to 0.49860, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 9/15
153/781 [====>.........................] - ETA: 2:49 - loss: 1.6582 - acc: 0.5460


Epoch 00009: val_acc improved from 0.49860 to 0.51100, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 10/15
153/781 [====>.........................] - ETA: 2:48 - loss: 1.5962 - acc: 0.5611


Epoch 00010: val_acc improved from 0.51100 to 0.54120, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 11/15
153/781 [====>.........................] - ETA: 2:49 - loss: 1.5452 - acc: 0.5818


Epoch 00011: val_acc improved from 0.54120 to 0.54680, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 12/15
153/781 [====>.........................] - ETA: 2:49 - loss: 1.4709 - acc: 0.6049


Epoch 00012: val_acc improved from 0.54680 to 0.56470, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 13/15
153/781 [====>.........................] - ETA: 2:49 - loss: 1.4154 - acc: 0.6125


Epoch 00013: val_acc did not improve from 0.56470
Epoch 14/15
182/781 [=====>........................] - ETA: 2:40 - loss: 1.4037 - acc: 0.6126


Epoch 00014: val_acc improved from 0.56470 to 0.57110, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 15/15
160/781 [=====>........................] - ETA: 2:46 - loss: 1.3232 - acc: 0.6368


Epoch 00015: val_acc did not improve from 0.57110


<keras.callbacks.History at 0x7fac9de6fcf8>

In [0]:
lrate = LearningRateScheduler(step_decay)
model_checkpoint = ModelCheckpoint(weights_file, monitor="val_acc", save_best_only=True, save_weights_only=True, verbose=1)

callbacks = [lrate, model_checkpoint]
#size 32
model.fit_generator(generator.flow(trainX, Y_train, batch_size=batch_size),
                    steps_per_epoch=len(trainX) // batch_size, epochs=nb_epoch_32,
                    callbacks=callbacks,
                    validation_data=(testX, Y_test),
                    validation_steps=testX.shape[0] // batch_size, verbose=1)

Epoch 1/135

Epoch 00001: val_acc improved from -inf to 0.66020, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 2/135
 82/781 [==>...........................] - ETA: 3:07 - loss: 0.4533 - acc: 0.8843


Epoch 00002: val_acc improved from 0.66020 to 0.67590, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 3/135
135/781 [====>.........................] - ETA: 2:51 - loss: 0.4288 - acc: 0.8922


Epoch 00003: val_acc did not improve from 0.67590
Epoch 4/135
177/781 [=====>........................] - ETA: 2:40 - loss: 0.4286 - acc: 0.8900


Epoch 00004: val_acc improved from 0.67590 to 0.67740, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 5/135
159/781 [=====>........................] - ETA: 2:46 - loss: 0.4141 - acc: 0.8979


Epoch 00005: val_acc did not improve from 0.67740
Epoch 6/135


Epoch 00006: val_acc improved from 0.67740 to 0.67960, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 7/135
161/781 [=====>........................] - ETA: 2:45 - loss: 0.3839 - acc: 0.9117


Epoch 00007: val_acc did not improve from 0.67960
Epoch 8/135


Epoch 00008: val_acc improved from 0.67960 to 0.68050, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 9/135
161/781 [=====>........................] - ETA: 2:45 - loss: 0.3891 - acc: 0.9032


Epoch 00009: val_acc improved from 0.68050 to 0.68700, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 10/135
155/781 [====>.........................] - ETA: 2:47 - loss: 0.3740 - acc: 0.9071


Epoch 00010: val_acc did not improve from 0.68700
Epoch 11/135
182/781 [=====>........................] - ETA: 2:40 - loss: 0.3786 - acc: 0.9076


Epoch 00011: val_acc did not improve from 0.68700
Epoch 12/135


Epoch 00012: val_acc improved from 0.68700 to 0.68720, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 13/135
162/781 [=====>........................] - ETA: 2:45 - loss: 0.3541 - acc: 0.9169


Epoch 00013: val_acc did not improve from 0.68720
Epoch 14/135


Epoch 00014: val_acc improved from 0.68720 to 0.68790, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 15/135
160/781 [=====>........................] - ETA: 2:45 - loss: 0.3736 - acc: 0.9078


Epoch 00015: val_acc did not improve from 0.68790
Epoch 16/135


Epoch 00016: val_acc did not improve from 0.68790
Epoch 17/135


Epoch 00017: val_acc did not improve from 0.68790
Epoch 18/135


Epoch 00018: val_acc did not improve from 0.68790
Epoch 19/135


Epoch 00019: val_acc improved from 0.68790 to 0.68830, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 20/135
163/781 [=====>........................] - ETA: 2:45 - loss: 0.3465 - acc: 0.9191


Epoch 00020: val_acc did not improve from 0.68830
Epoch 21/135


Epoch 00021: val_acc improved from 0.68830 to 0.69570, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 22/135
160/781 [=====>........................] - ETA: 2:45 - loss: 0.3335 - acc: 0.9231


Epoch 00022: val_acc did not improve from 0.69570
Epoch 23/135


Epoch 00023: val_acc did not improve from 0.69570
Epoch 24/135


Epoch 00024: val_acc did not improve from 0.69570
Epoch 25/135


Epoch 00025: val_acc did not improve from 0.69570
Epoch 26/135


Epoch 00026: val_acc did not improve from 0.69570
Epoch 27/135


Epoch 00027: val_acc did not improve from 0.69570
Epoch 28/135


Epoch 00028: val_acc did not improve from 0.69570
Epoch 29/135


Epoch 00029: val_acc did not improve from 0.69570
Epoch 30/135


Epoch 00030: val_acc did not improve from 0.69570
Epoch 31/135


Epoch 00031: val_acc did not improve from 0.69570
Epoch 32/135


Epoch 00032: val_acc did not improve from 0.69570
Epoch 33/135


Epoch 00033: val_acc did not improve from 0.69570
Epoch 34/135


Epoch 00034: val_acc did not improve from 0.69570
Epoch 35/135


Epoch 00035: val_acc did not improve from 0.69570
Epoch 36/135


Epoch 00036: val_acc did not improve from 0.69570
Epoch 37/135


Epoch 00037: val_acc did not improve from 0.69570
Epoch 38/135


Epoch 00038: val_acc did not improve from 0.69570
Epoch 39/135


Epoch 00039: val_acc did not improve from 0.69570
Epoch 40/135


Epoch 00040: val_acc did not improve from 0.69570
Epoch 41/135


Epoch 00041: val_acc did not improve from 0.69570
Epoch 42/135


Epoch 00042: val_acc did not improve from 0.69570
Epoch 43/135


Epoch 00043: val_acc did not improve from 0.69570
Epoch 44/135


Epoch 00044: val_acc did not improve from 0.69570
Epoch 45/135


Epoch 00045: val_acc did not improve from 0.69570
Epoch 46/135


Epoch 00046: val_acc did not improve from 0.69570
Epoch 47/135


Epoch 00047: val_acc did not improve from 0.69570
Epoch 48/135


Epoch 00048: val_acc did not improve from 0.69570
Epoch 49/135


Epoch 00049: val_acc did not improve from 0.69570
Epoch 50/135


Epoch 00050: val_acc did not improve from 0.69570
Epoch 51/135


Epoch 00051: val_acc did not improve from 0.69570
Epoch 52/135


Epoch 00052: val_acc did not improve from 0.69570
Epoch 53/135


Epoch 00053: val_acc did not improve from 0.69570
Epoch 54/135


Epoch 00054: val_acc did not improve from 0.69570
Epoch 55/135


Epoch 00055: val_acc did not improve from 0.69570
Epoch 56/135


Epoch 00056: val_acc did not improve from 0.69570
Epoch 57/135


Epoch 00057: val_acc did not improve from 0.69570
Epoch 58/135


Epoch 00058: val_acc did not improve from 0.69570
Epoch 59/135


Epoch 00059: val_acc did not improve from 0.69570
Epoch 60/135


Epoch 00060: val_acc did not improve from 0.69570
Epoch 61/135


Epoch 00061: val_acc did not improve from 0.69570
Epoch 62/135


Epoch 00062: val_acc did not improve from 0.69570
Epoch 63/135


Epoch 00063: val_acc did not improve from 0.69570
Epoch 64/135


Epoch 00064: val_acc did not improve from 0.69570
Epoch 65/135


Epoch 00065: val_acc did not improve from 0.69570
Epoch 66/135


Epoch 00066: val_acc did not improve from 0.69570
Epoch 67/135


Epoch 00067: val_acc did not improve from 0.69570
Epoch 68/135


Epoch 00068: val_acc did not improve from 0.69570
Epoch 69/135


Epoch 00069: val_acc did not improve from 0.69570
Epoch 70/135


Epoch 00070: val_acc did not improve from 0.69570
Epoch 71/135


Epoch 00071: val_acc did not improve from 0.69570
Epoch 72/135


Epoch 00072: val_acc did not improve from 0.69570
Epoch 73/135


Epoch 00073: val_acc did not improve from 0.69570
Epoch 74/135


Epoch 00074: val_acc did not improve from 0.69570
Epoch 75/135


Epoch 00075: val_acc did not improve from 0.69570
Epoch 76/135


Epoch 00076: val_acc improved from 0.69570 to 0.69620, saving model to SparseNet-40-24-CIFAR100.h5
Epoch 77/135
163/781 [=====>........................] - ETA: 2:46 - loss: 0.2917 - acc: 0.9278


Epoch 00077: val_acc did not improve from 0.69620
Epoch 78/135


Epoch 00078: val_acc did not improve from 0.69620
Epoch 79/135


Epoch 00079: val_acc did not improve from 0.69620
Epoch 80/135


Epoch 00080: val_acc did not improve from 0.69620
Epoch 81/135


Epoch 00081: val_acc did not improve from 0.69620
Epoch 82/135


Epoch 00082: val_acc did not improve from 0.69620
Epoch 83/135


Epoch 00083: val_acc did not improve from 0.69620
Epoch 84/135


Epoch 00084: val_acc did not improve from 0.69620
Epoch 85/135


Epoch 00085: val_acc did not improve from 0.69620
Epoch 86/135


Epoch 00086: val_acc did not improve from 0.69620
Epoch 87/135


Epoch 00087: val_acc did not improve from 0.69620
Epoch 88/135


Epoch 00088: val_acc did not improve from 0.69620
Epoch 89/135


Epoch 00089: val_acc did not improve from 0.69620
Epoch 90/135


Epoch 00090: val_acc did not improve from 0.69620
Epoch 91/135


Epoch 00091: val_acc did not improve from 0.69620
Epoch 92/135
114/781 [===>..........................] - ETA: 2:59 - loss: 0.2680 - acc: 0.9360