<a href="https://colab.research.google.com/github/xlhaw/academic-kickstart/blob/master/SpeedingUpCNNs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Speeding up Convolutional Neural Networks

>>>![](https://cdn-images-1.medium.com/max/716/1*FjzcTRoe-R680V0hOwYo5A.png)

>>>>> From “Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition” paper

This notebook is meant to support [this](https://medium.com/@alexburlacu1996/speeding-up-neural-networks-convolutions-240beac5e30f) blog post with concrete examples.

In [0]:
import tensorflow as tf
import keras as _k

print(f"Keras version: {_k.__version__}")
print(f"Tensorflow version: {tf.__version__}")

Keras version: 2.2.4
Tensorflow version: 1.13.1


Using TensorFlow backend.


## Models

The notebook explains how different methods of speeding up convolutional layers work, and shows their effect on a LeNet-inspired model and an All-CNN-C model.
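
To make the comparison concrete, here is a rough sketch of how many trainable weights each convolution variant needs. The helper names (`conv2d_params`, `factorized_params`, `separable_params`) are made up for this illustration and are not part of Keras. The formulas assume a bias on every convolution except the depthwise step, which is consistent with the `Param #` columns in the model summaries further down.

```python
def conv2d_params(c_in, c_out, kh, kw, bias=True):
    """Standard convolution: one kh x kw x c_in filter per output channel."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)


def factorized_params(c_in, c_out, k):
    """A (k, 1) convolution followed by a (1, k) convolution, both with c_out filters."""
    return conv2d_params(c_in, c_out, k, 1) + conv2d_params(c_out, c_out, 1, k)


def separable_params(c_in, c_out, k):
    """One k x k depthwise filter per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in          # assumed to have no bias
    pointwise = c_in * c_out + c_out  # 1 x 1 convolution plus bias
    return depthwise + pointwise


# First two layers of the All-CNN-C-like model: 3 -> 96 and 96 -> 96 channels.
for c_in in (3, 96):
    print(
        f"{c_in:>2} -> 96 channels, 3x3: "
        f"std={conv2d_params(c_in, 96, 3, 3)}, "
        f"fac={factorized_params(c_in, 96, 3)}, "
        f"sep={separable_params(c_in, 96, 3)}"
    )
```

Fewer weights do not automatically mean faster execution, which is why the experiments below are still needed.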

In [0]:
import keras.backend as K
from keras.layers import Conv2D, Activation, Lambda, Conv3D, BatchNormalization


ExpandDimension = lambda axis: Lambda(lambda x: K.expand_dims(x, axis))
SqueezeDimension = lambda axis: Lambda(lambda x: K.squeeze(x, axis))

# Layers
def simple_factorized_conv(filters, kernel, *args, **kwargs):
    kwargs["activation"] = None
    def __inner(inp):
        cnn1 = Conv2D(filters, (kernel[0], 1), *args, **kwargs)(inp)
        cnn2 = Conv2D(filters, (1, kernel[1]), *args, **kwargs)(cnn1)
        
        return cnn2

    return __inner
  
def cp_decomposed_conv(filters, kernel, *args, **kwargs):
    """
    Beware: this doesn't work! It's just an approximate demonstration of how
    CP decomposition should be implemented. The issue is in the dimension matching.
    If you find a solution, feel free to post it as a comment, either on the blog post or here.
    You'll get full credit for your finding.
    """
    kwargs["activation"] = None
    rank = filters // 2
    d = kernel[0]
    def __inner(inp):
        first    = Conv2D(rank, kernel_size=(1, 1), **kwargs)(inp)
        
        expanded = ExpandDimension(axis=1)(first)
        mid1     = Conv3D(rank, kernel_size=(d, 1, 1), **kwargs)(expanded)
        mid2     = Conv3D(rank, kernel_size=(1, d, 1), **kwargs)(mid1)
        squeezed = SqueezeDimension(axis=1)(mid2)
        
        last     = Conv2D(filters,  kernel_size=(1, 1), **kwargs)(squeezed)
        
        return last

    return __inner

def bn_relu(layer):
    def __inner(*args, **kwargs):
        l = layer(*args, **kwargs)
        bn = BatchNormalization()(l)
        act = Activation("relu")(bn)
        
        return act # courtesy to Jiun-Kuei Jung

    return __inner
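
Since `cp_decomposed_conv` above is only a non-working demonstration, the following NumPy sketch checks the idea it is after on a tiny example. It is not a fix for the Keras code. It builds a kernel that is exactly rank `R` and compares a plain convolution against the four-step pipeline (1x1, vertical depthwise, horizontal depthwise, 1x1). All names, shapes and the "valid" padding are assumptions chosen for this illustration.

```python
import numpy as np


def conv2d_valid(x, kernel):
    """Naive 'valid' cross-correlation. x: (H, W, S), kernel: (d, d, S, T)."""
    d1, d2, _, t = kernel.shape
    out_h, out_w = x.shape[0] - d1 + 1, x.shape[1] - d2 + 1
    out = np.zeros((out_h, out_w, t))
    for i in range(d1):
        for j in range(d2):
            # (out_h, out_w, S) @ (S, T) -> (out_h, out_w, T)
            out += x[i:i + out_h, j:j + out_w, :] @ kernel[i, j]
    return out


def cp_conv2d_valid(x, k_s, k_y, k_x, k_t):
    """Four cheap steps. Factor shapes: (S, R), (d, R), (d, R), (T, R)."""
    d = k_y.shape[0]
    u = x @ k_s                                            # 1x1 conv: S -> R channels
    out_h = u.shape[0] - d + 1
    v = sum(k_y[i] * u[i:i + out_h] for i in range(d))     # (d, 1) depthwise conv
    out_w = v.shape[1] - d + 1
    w = sum(k_x[j] * v[:, j:j + out_w] for j in range(d))  # (1, d) depthwise conv
    return w @ k_t.T                                       # 1x1 conv: R -> T channels


rng = np.random.default_rng(0)
d, S, T, R = 3, 4, 5, 2
k_s, k_y, k_x, k_t = (rng.normal(size=(n, R)) for n in (S, d, d, T))

# Rank-R kernel: K[i, j, s, t] = sum_r k_y[i, r] * k_x[j, r] * k_s[s, r] * k_t[t, r]
full_kernel = np.einsum("ir,jr,sr,tr->ijst", k_y, k_x, k_s, k_t)
x = rng.normal(size=(8, 8, S))

same = np.allclose(conv2d_valid(x, full_kernel), cp_conv2d_valid(x, k_s, k_y, k_x, k_t))
cp_weights = R * (S + 2 * d + T)
print("outputs match:", same)
print("full kernel weights:", full_kernel.size, "CP weights:", cp_weights)
```

Note that the two middle steps act on each of the `R` channels separately, so they are depthwise convolutions. In Keras, `DepthwiseConv2D` with `(d, 1)` and `(1, d)` kernels might be a candidate for those steps, but that is untested here.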


In [0]:
from keras.regularizers import l2
from keras.models import Model
from keras.layers import (
    Dense, Dropout, Conv2D, SeparableConv2D,
    MaxPool2D, Flatten, GlobalAveragePooling2D
)

class BaseExampleModel:
    def __init__(self, conv_type="std"):
        """
        :conv_type: used as a flag to easily switch the Convolution layer implementation
        """
        self.conv_layer = lambda *args, **kwargs: {
                              "std": bn_relu(Conv2D(*args, **kwargs)),
                              "sep": bn_relu(SeparableConv2D(*args, **kwargs)),
                              "cp" : bn_relu(cp_decomposed_conv(*args, **kwargs)),
                              "fac": bn_relu(simple_factorized_conv(*args, **kwargs))
                          }[conv_type]
        
    def make(self, inp, nb_classes):
        """
        :inp: a reference to the Keras Input layer, to easily switch the input size
              and even the way data is fed into network
              (via standard Keras methods or via TF Dataset API)
        :nb_classes: for the final Dense layer
        """
        raise NotImplementedError

class AllCNNLike(BaseExampleModel):
    """
    inspired from https://arxiv.org/abs/1412.6806
    namely All-CNN-C
    """
    def __init__(self, conv_type="std"):
        super().__init__(conv_type=conv_type)
        
    def make(self, inp, nb_classes):
        conv1 = self.conv_layer(96, (3, 3), padding="same")(inp)
        conv2 = self.conv_layer(96, (3, 3), padding="same")(conv1)
        
        conv3 = bn_relu(Conv2D(96, (3, 3), padding="same", strides=(2, 2)))(conv2)
        
        conv4 = self.conv_layer(192, (3, 3), padding="same")(conv3)
        conv5 = self.conv_layer(192, (3, 3), padding="same")(conv4)
        
        conv6 = bn_relu(Conv2D(192, (3, 3), padding="same", strides=(2, 2)))(conv5)
        
        conv7 = bn_relu(Conv2D(192, (3, 3), padding="same"))(conv6)
        conv8 = bn_relu(Conv2D(192, (1, 1), padding="same"))(conv7)
        conv9 = bn_relu(Conv2D(nb_classes, (1, 1), padding="same"))(conv8)
        
        gap = GlobalAveragePooling2D()(conv9)
        final = Activation("softmax")(gap)
        
        return Model(inputs=inp, outputs=final)
      
class WiderAllCNNLike(BaseExampleModel):
    """
    inspired from https://arxiv.org/abs/1412.6806
    namely All-CNN-C
    """
    def __init__(self, conv_type="std"):
        super().__init__(conv_type=conv_type)
        
    def make(self, inp, nb_classes):
        conv1 = self.conv_layer(192, (3, 3), padding="same")(inp)
        
        conv2 = bn_relu(Conv2D(96, (3, 3), padding="same", strides=(2, 2)))(conv1)
        
        conv3 = self.conv_layer(384, (3, 3), padding="same")(conv2)
        
        conv4 = bn_relu(Conv2D(128, (3, 3), padding="same", strides=(2, 2)))(conv3)
        
        conv5 = bn_relu(Conv2D(256, (3, 3), padding="same"))(conv4)
        conv6 = bn_relu(Conv2D(nb_classes, (1, 1), padding="same"))(conv5)
        
        gap = GlobalAveragePooling2D()(conv6)
        final = Activation("softmax")(gap)
        
        return Model(inputs=inp, outputs=final)

In [0]:
from keras.datasets import cifar100
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical

BATCH_SIZE = 32

def preprocess_data(pair):
    x, y = pair
    return x / 255., to_categorical(y)

(x_train, y_train), (x_test, y_test) = map(preprocess_data, cifar100.load_data())

train_gen = ImageDataGenerator().flow(x_train, y_train, batch_size=BATCH_SIZE)
test_gen = ImageDataGenerator().flow(x_test, y_test, batch_size=BATCH_SIZE)

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz


In [0]:
from keras.metrics import categorical_accuracy, top_k_categorical_accuracy
from keras.optimizers import SGD
from keras.layers import Input

def main(conv_type="std", verbose=False, net_type="simple"):
    inp = Input(shape=(32, 32, 3))
    net = {
        "simple": AllCNNLike(conv_type=conv_type).make(inp, nb_classes=100),
        "wide": WiderAllCNNLike(conv_type=conv_type).make(inp, nb_classes=100)
    }[net_type]

    net.compile(SGD(lr=0.01, decay=0.001, momentum=0.8), "categorical_crossentropy",
                 metrics=[categorical_accuracy, top_k_categorical_accuracy])

    if verbose:
        net.summary()  # summary() prints the table itself and returns None

    net.fit_generator(train_gen, steps_per_epoch=50000 // BATCH_SIZE, epochs=5, verbose=verbose)
    loss, acc, topk_acc = net.evaluate_generator(test_gen, steps=10000 // BATCH_SIZE)

    print(f"Accuracy: {acc}, Top5 Accuracy {topk_acc} and Loss {loss}.")
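
A side note on the optimizer settings in `main`: with `decay=0.001` the learning rate does not stay at `0.01` for long. The sketch below assumes the time-based rule used by the legacy Keras 2.x `SGD` optimizer, `lr_t = lr / (1 + decay * t)`, where `t` counts batch updates and not epochs. `sgd_lr` is a made-up helper for illustration, not a Keras function.

```python
def sgd_lr(iteration, base_lr=0.01, decay=0.001):
    """Assumed time-based decay: base_lr / (1 + decay * number_of_batch_updates)."""
    return base_lr / (1.0 + decay * iteration)


BATCH_SIZE = 32
steps_per_epoch = 50000 // BATCH_SIZE  # same value as passed to fit_generator

for epoch in range(6):
    iteration = epoch * steps_per_epoch
    print(f"after {epoch} epoch(s): lr = {sgd_lr(iteration):.5f}")
```

Under that assumption the learning rate is already halved after 1000 batches, which is worth keeping in mind when comparing models trained for only 5 epochs.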

## Standard Convolutions

In [0]:
# Run it!
main("std", verbose=True)

Instructions for updating:
Colocations handled automatically by placer.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 32, 32, 96)        2688      
_________________________________________________________________
batch_normalization_1 (Batch (None, 32, 32, 96)        384       
_________________________________________________________________
activation_1 (Activation)    (None, 32, 32, 96)        0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 32, 32, 96)        83040     
_________________________________________________________________
batch_normalization_2 (Batch (None, 32, 32, 96)        384       
_________________________________________________________________
acti

## Simply Factorized Convolutions

In [0]:
# You know what to do ;)
main("fac", verbose=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_17 (Conv2D)           (None, 32, 32, 96)        960       
_________________________________________________________________
conv2d_18 (Conv2D)           (None, 32, 32, 96)        27744     
_________________________________________________________________
batch_normalization_16 (Batc (None, 32, 32, 96)        384       
_________________________________________________________________
activation_18 (Activation)   (None, 32, 32, 96)        0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 32, 32, 96)        27744     
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 32, 32, 96)        27744     
__________

## Depthwise Separable Convolutions

In [0]:
# Push that button!
main("sep", verbose=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_3 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
separable_conv2d_13 (Separab (None, 32, 32, 96)        411       
_________________________________________________________________
batch_normalization_31 (Batc (None, 32, 32, 96)        384       
_________________________________________________________________
activation_35 (Activation)   (None, 32, 32, 96)        0         
_________________________________________________________________
separable_conv2d_14 (Separab (None, 32, 32, 96)        10176     
_________________________________________________________________
batch_normalization_32 (Batc (None, 32, 32, 96)        384       
_________________________________________________________________
activation_36 (Activation)   (None, 32, 32, 96)        0         
__________

## Wide Standard Convolutional Neural Network

In [0]:
main("std", net_type="wide", verbose=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_7 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_114 (Conv2D)          (None, 32, 32, 192)       5376      
_________________________________________________________________
batch_normalization_92 (Batc (None, 32, 32, 192)       768       
_________________________________________________________________
activation_104 (Activation)  (None, 32, 32, 192)       0         
_________________________________________________________________
conv2d_115 (Conv2D)          (None, 16, 16, 96)        165984    
_________________________________________________________________
batch_normalization_93 (Batc (None, 16, 16, 96)        384       
_________________________________________________________________
activation_105 (Activation)  (None, 16, 16, 96)        0         
__________

## Wide Separable Convolutional Neural Network

In [0]:
main("sep", net_type="wide", verbose=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_4 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
separable_conv2d_23 (Separab (None, 32, 32, 192)       795       
_________________________________________________________________
batch_normalization_55 (Batc (None, 32, 32, 192)       768       
_________________________________________________________________
activation_62 (Activation)   (None, 32, 32, 192)       0         
_________________________________________________________________
conv2d_68 (Conv2D)           (None, 16, 16, 96)        165984    
_________________________________________________________________
batch_normalization_56 (Batc (None, 16, 16, 96)        384       
_________________________________________________________________
activation_63 (Activation)   (None, 16, 16, 96)        0         
__________

## Wide Simply Factorized Neural Network

In [0]:
main("fac", net_type="wide", verbose=True)

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_5 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_91 (Conv2D)           (None, 32, 32, 192)       1920      
_________________________________________________________________
conv2d_92 (Conv2D)           (None, 32, 32, 192)       110784    
_________________________________________________________________
batch_normalization_70 (Batc (None, 32, 32, 192)       768       
_________________________________________________________________
activation_79 (Activation)   (None, 32, 32, 192)       0         
_________________________________________________________________
conv2d_93 (Conv2D)           (None, 16, 16, 96)        165984    
_________________________________________________________________
batch_normalization_71 (Batc (None, 16, 16, 96)        384       
__________

## Failures aside, now a good architecture

In [0]:
class AdequatelyDesignedAllCNNLike:
    """
    inspired from https://arxiv.org/abs/1412.6806
    namely All-CNN-C
    """
    def __init__(self):
        pass
        
    def make(self, inp, nb_classes):
        # the chunk below can be represented as a single layer with 5x5 kernels
        conv1 = Conv2D(96, (3, 3), padding="same")(inp)
        bn1   = BatchNormalization()(conv1)
        act1  = Activation("relu")(bn1)
        conv2 = Conv2D(96, (3, 3), padding="same")(act1)
        bn2   = BatchNormalization()(conv2)
        act2  = Activation("relu")(bn2)
        
        conv3 = Conv2D(96, (3, 3), padding="same", strides=(2, 2))(act2)
        bn3   = BatchNormalization()(conv3)
        act3  = Activation("relu")(bn3)
        
        conv4 = SeparableConv2D(256, (3, 3), padding="same")(act3)
        bn4   = BatchNormalization()(conv4)
        act4  = Activation("relu")(bn4)
        
        conv5 = Conv2D(192, (3, 3), padding="same", strides=(2, 2))(act4)
        bn5   = BatchNormalization()(conv5)
        act5  = Activation("relu")(bn5)
        
        conv6 = SeparableConv2D(256, (3, 3), padding="same")(act5)
        bn6   = BatchNormalization()(conv6)
        act6  = Activation("relu")(bn6)
        
        conv7 = Conv2D(nb_classes, (1, 1), padding="same")(act6)
        bn7   = BatchNormalization()(conv7)
        act7  = Activation("relu")(bn7)
        
        gap = GlobalAveragePooling2D()(act7)
        final = Activation("softmax")(gap)
        
        return Model(inputs=inp, outputs=final)
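
The comment in `make` about two stacked 3x3 layers being representable as a single 5x5 layer refers to their receptive field. A small sketch of that arithmetic follows. `receptive_field` is a hypothetical helper, and the comparison ignores the BatchNorm and ReLU between the layers, so the two options are not strictly equivalent.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, applied in order."""
    rf, jump = 1, 1
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) input steps
        jump *= stride             # strides make later layers step further in the input
    return rf


two_3x3 = receptive_field([(3, 1), (3, 1)])
one_5x5 = receptive_field([(5, 1)])
print("two stacked 3x3 convolutions:", two_3x3)
print("one 5x5 convolution:", one_5x5)

# First four convolutions of the model above, including the strided conv3.
print("conv1..conv4:", receptive_field([(3, 1), (3, 1), (3, 2), (3, 1)]))
```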

In [0]:
inp = Input(shape=(32, 32, 3))
net = AdequatelyDesignedAllCNNLike().make(inp, 100)
net.compile(SGD(lr=0.01, decay=0.001, momentum=0.8), "categorical_crossentropy",
             metrics=[categorical_accuracy, top_k_categorical_accuracy])


net.summary()

net.fit_generator(train_gen, steps_per_epoch=50000 // BATCH_SIZE, epochs=5)
loss, acc, topk_acc = net.evaluate_generator(test_gen, steps=10000 // BATCH_SIZE)

print(f"Accuracy: {acc}, Top5 Accuracy {topk_acc} and Loss {loss}.")

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_6 (InputLayer)         (None, 32, 32, 3)         0         
_________________________________________________________________
conv2d_100 (Conv2D)          (None, 32, 32, 96)        2688      
_________________________________________________________________
batch_normalization_76 (Batc (None, 32, 32, 96)        384       
_________________________________________________________________
activation_86 (Activation)   (None, 32, 32, 96)        0         
_________________________________________________________________
conv2d_101 (Conv2D)          (None, 32, 32, 96)        83040     
_________________________________________________________________
batch_normalization_77 (Batc (None, 32, 32, 96)        384       
_________________________________________________________________
activation_87 (Activation)   (None, 32, 32, 96)        0         
__________

## Final notes

As you can see, slight alterations can indeed speed up a convolutional network like All-CNN-C by ~7% (see the Wide Standard Convolutional Neural Network section) with only a small drop in accuracy.
Moreover, with a better-designed network (see the `AdequatelyDesignedAllCNNLike` model), the speedup is considerable (~20%) and the accuracy drop is still small!
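
The notebook does not include the timing code behind these percentages, so here is one possible way to measure such a speedup yourself. Both helpers are hypothetical, and "speedup" is assumed to mean the relative reduction in wall-clock time compared to a baseline.

```python
import time


def median_runtime(fn, repeats=5):
    """Call fn() several times and return the median wall-clock time in seconds."""
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]


def speedup_percent(baseline_seconds, candidate_seconds):
    """Relative reduction in runtime: 10 s -> 8 s counts as a 20% speedup."""
    return 100.0 * (baseline_seconds - candidate_seconds) / baseline_seconds


# Stand-in workloads; replace them with e.g. a lambda that calls predict on each model.
slow = median_runtime(lambda: sum(i * i for i in range(200000)))
fast = median_runtime(lambda: sum(i * i for i in range(100000)))
print(f"measured speedup: {speedup_percent(slow, fast):.1f}%")
```

Taking the median over a few repeats makes the number less sensitive to warm-up effects and background load.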

Anyway, remember that there's no silver bullet and you should always experiment in order to get satisfying results.

Hopefully, the methods shown above and described in the blog post will help.