### CNN Case Study: Google Inception Net

AlexNet has 8 layers and VGGNet has 19, while Inception Net (Inception V1) has 22 layers yet only about 1/12 as many parameters as AlexNet. Why use fewer parameters?
* The more parameters a model has, the more complex it is and the more training data it needs, but high-quality data is not readily available.
* The more parameters a model has, the more computing resources it requires.<br>

Features of Inception Net:
* Fewer parameters
* The final fully connected (FC) layers are dropped and replaced with a global average-pooling layer (in AlexNet and VGGNet about 90% of the parameters come from the FC layers, which also causes overfitting)  
* The Inception Module uses parameters more efficiently (based on the Network in Network idea, NIN)
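
To make the saving from dropping the FC layers concrete, here is a rough parameter count. The layer sizes are assumptions taken from the commonly cited VGG-16 head (7x7x512 features into 4096, 4096, 1000 units) and the Inception V1 head (1024 pooled channels into 1000 classes); they are not stated in the notes above.

```python
def fc_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

# Assumed VGG-16-style head: a flattened 7x7x512 feature map and three FC layers.
vgg_head = (fc_params(7 * 7 * 512, 4096)
            + fc_params(4096, 4096)
            + fc_params(4096, 1000))

# Assumed Inception-style head: global average pooling has no weights,
# so only a single 1024 -> 1000 classifier layer remains.
gap_head = fc_params(1024, 1000)

print("FC head parameters: ", vgg_head)
print("GAP head parameters:", gap_head)
print("ratio: %.1f" % (vgg_head / gap_head))
```

Global average pooling itself contributes no parameters, so almost the entire classifier head disappears.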

NIN idea: a small network forms a module, and a larger network is built by stacking many modules. NIN forms its modules as a cascade of conv layers, whereas Inception Net adds parallel branches.<br>
In general, the performance of a conv layer can be improved by increasing its number of output channels, since each filter (kernel) extracts only one kind of feature. However, more channels also bring more computation and a higher risk of overfitting.<br>

MLPConv from NIN integrates information across channels; it is equivalent to a conv layer followed by a 1 by 1 conv layer with ReLU activation.
The Inception Module is more elaborate: it has 4 branches, as the figure below shows.

![title](Inception Net.png)

A 1 by 1 convolution costs far less computation than a 3 by 3 one, so it adds one layer of feature transformation and nonlinearity at low computational cost. It is also used to reduce the number of channels before the more expensive 3 by 3 and 5 by 5 convolutions. With this structure, the Inception Module lets the network grow in both width and depth, which improves accuracy while limiting overfitting.
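
A quick multiplication count shows why a 1 by 1 convolution in front of a larger kernel is cheap. This is only a sketch: the 28x28 feature map and the channel numbers (192 in, 16 after reduction, 32 out) are hypothetical values chosen for illustration.

```python
def conv_mults(h, w, c_in, c_out, k_h, k_w):
    """Multiplications of a stride-1, 'SAME'-padded convolution."""
    return h * w * c_in * c_out * k_h * k_w

h = w = 28    # hypothetical feature-map size
c_in = 192    # hypothetical number of input channels

# 5x5 convolution applied directly to all 192 input channels
direct = conv_mults(h, w, c_in, 32, 5, 5)

# 1x1 convolution reduces 192 -> 16 channels, then the 5x5 convolution runs on 16
reduced = conv_mults(h, w, c_in, 16, 1, 1) + conv_mults(h, w, 16, 32, 5, 5)

print("direct 5x5:", direct)
print("1x1 + 5x5: ", reduced)
print("saving: %.1fx" % (direct / reduced))
```

The bottleneck version also passes through one extra ReLU, so it is deeper as well as cheaper.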

Neurons in the brain are sparsely connected, and the researchers argued that a larger and deeper network should also be sparse, which lowers both overfitting and computation. Inception Net aims to find an optimal Inception Module whose structure is based on the Hebbian principle (cells that fire together, wire together): highly correlated nodes are connected to form a sparse network.<br>

Because a conv layer has many kernels, the outputs at the same spatial position but in different channels are highly correlated. A 1 by 1 convolution integrates features at the same spatial position across channels, which is why 1 by 1 convolutions are used so frequently in Inception Nets.
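
The channel-mixing role of a 1 by 1 convolution can be sketched in plain Python: every pixel is handled independently, and its channel vector is multiplied by the same weight matrix. The function name `conv1x1` and the toy numbers are made up for illustration.

```python
def conv1x1(feature_map, weights):
    """1x1 convolution followed by ReLU.

    feature_map: nested lists of shape [H][W][C_in]
    weights:     nested lists of shape [C_in][C_out]
    """
    c_out = len(weights[0])
    return [
        [
            [max(0.0, sum(pixel[i] * weights[i][o] for i in range(len(pixel))))
             for o in range(c_out)]
            for pixel in row
        ]
        for row in feature_map
    ]

# Toy 2x2 feature map with 3 channels, mixed down to 2 channels
feature_map = [[[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],
               [[1.0, 0.0, -1.0], [2.0, 2.0, 2.0]]]
weights = [[1.0, 0.0],
           [0.0, 1.0],
           [1.0, -1.0]]

out = conv1x1(feature_map, weights)
print(out)
```

The spatial size is unchanged; only the channel dimension is recombined, here from 3 channels to 2.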

In summary, highly correlated nodes are connected through conv layers of different sizes (1 by 1, 3 by 3, 5 by 5) in the 4 branches of the Inception Module, which builds an efficient sparse structure based on the Hebbian principle.

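Inception Net has 22 layers. Besides the final output layer, it is also equipped with auxiliary classifiers, which take the output of an intermediate layer as a classification result and add it to the final result with a weight of 0.3 (in the original paper the auxiliary losses are added to the total training loss with this weight, and the auxiliary branches are discarded at inference). This acts as a form of model ensembling, feeds extra gradient signal into back-propagation, and provides additional regularization, all of which help training.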
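
One way to read the 0.3 weighting is as a weighted sum of losses during training. The sketch below assumes a softmax cross-entropy loss and a single auxiliary classifier; the function names are hypothetical and this is not the training code of the original network.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def total_loss(main_logits, aux_logits, label, aux_weight=0.3):
    """Main loss plus the auxiliary loss scaled by aux_weight."""
    return (cross_entropy(main_logits, label)
            + aux_weight * cross_entropy(aux_logits, label))

main_logits = [2.0, 0.5, -1.0]   # toy output of the final classifier
aux_logits = [1.0, 1.0, 0.0]     # toy output of the auxiliary classifier
print(total_loss(main_logits, aux_logits, label=0))
```

Because the auxiliary term is scaled down, it injects gradient into the middle of the network without dominating the main objective.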

Google Inception Net is a large family (top-5 error rate on ImageNet in parentheses):
1. Inception V1 (6.67%)
2. Inception V2 with Batch Normalization (4.8%)
3. Inception V3 with 42 layers (3.5%)
4. Inception V4 combined with ResNet from Microsoft (3.08%)

   * Batch Normalization: normalizes the data of each mini-batch so that the inputs to a layer follow an N(0,1) distribution, which accelerates training. With BN, Dropout can be reduced or removed, which simplifies the network.
   * Batch Normalization commonly comes with:
       * a larger learning rate and faster learning-rate decay
       * removing Dropout and reducing L2 regularization
       * removing LRN and shuffling the training samples more thoroughly
       * reducing data augmentation (photometric distortions) <br>
   * With these techniques, V2 reaches the accuracy of V1 in about 1/14 of the training steps
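
The normalization step itself is small. Below is a minimal sketch for one channel of a mini-batch, using training-mode batch statistics only (no moving averages); `gamma` and `beta` stand for the learned scale and shift, and `eps=0.001` mirrors the `epsilon` value used in the code later in this section.

```python
def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-3):
    """Normalize one channel of a mini-batch to zero mean and unit variance,
    then apply the scale gamma and the shift beta."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in values]

activations = [1.0, 2.0, 3.0, 4.0, 5.0]   # toy activations of one channel
normalized = batch_norm(activations)
print(normalized)
```

At inference time the batch statistics are replaced by moving averages, which is what the `moving_mean` and `moving_variance` collections in the slim parameters refer to.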

Inception V3 uses 3 types of Inception Module structures, and contrib.slim in TensorFlow is helpful for implementing V3.

In [2]:
import tensorflow as tf
slim = tf.contrib.slim
trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)

def inception_v3_params(weight_decay=0.00004,
                        stddev=0.1,
                        batch_norm_var_collection='moving_vars'):
    
    batch_norm_params = {'decay':0.9997,
                         'epsilon':0.001,
                         'updates_collections':tf.GraphKeys.UPDATE_OPS,
                         'variables_collections':{'beta': None,
                                                  'gamma': None,
                                                  'moving_mean':[batch_norm_var_collection],
                                                  'moving_variance':[batch_norm_var_collection] } }
    
    with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope([slim.conv2d],
                           weights_initializer=tf.truncated_normal_initializer(stddev=stddev),
                           activation_fn=tf.nn.relu,
                           normalizer_fn=slim.batch_norm,
                           normalizer_params=batch_norm_params) as sc:
            return sc
        

def inception_v3_base(inputs, scope=None):
    end_points = {}
    
    # Scope names use 'x' (e.g. '3x3') because '*' is not a valid character in TensorFlow scope names
    with tf.variable_scope(scope, 'InceptionV3',[inputs]):      # Stem: plain conv and pooling layers
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='VALID'):
            net = slim.conv2d( inputs,   32, [3,3], stride=2,       scope='Conv2d_1a_3x3' )
            net = slim.conv2d(    net,   32, [3,3],                 scope='Conv2d_2a_3x3' )
            net = slim.conv2d(    net,   64, [3,3], padding='SAME', scope='Conv2d_2b_3x3' )
            net = slim.max_pool2d(net,       [3,3], stride=2,       scope='MaxPool_3a_3x3')
            net = slim.conv2d(    net,   80, [1,1],                 scope='Conv2d_3b_1x1' )
            net = slim.conv2d(    net,  192, [3,3],                 scope='Conv2d_4a_3x3' )
            net = slim.max_pool2d(net,       [3,3], stride=2,       scope='MaxPool_5a_3x3')
    
    # Inception modules default to stride 1 and 'SAME' padding
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d], stride=1, padding='SAME'):
        
        with tf.variable_scope('Mixed_5b'):  # Group 1, first Inception Module (35x35 grid)
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,48,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,64,[5,5],scope='Conv2d_0b_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,96,[3,3],scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2,96,[3,3],scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,32,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_5c'):  # Group 1, second Inception Module
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,48,[1,1],scope='Conv2d_0b_1x1')
                branch_1 = slim.conv2d(branch_1,64,[5,5],scope='Conv_1_0c_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,96,[3,3],scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2,96,[3,3],scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,64,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_5d'):  # Group 1, third Inception Module
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,48,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,64,[5,5],scope='Conv2d_0b_5x5')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,96,[3,3],scope='Conv2d_0b_3x3')
                branch_2 = slim.conv2d(branch_2,96,[3,3],scope='Conv2d_0c_3x3')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,64,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_6a'):  # Group 2, first Inception Module (reduces the grid from 35x35 to 17x17)
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,384,[3,3],stride=2, padding='VALID', scope='Conv2d_1a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,64,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,96,[3,3],scope='Conv2d_0b_3x3')
                branch_1 = slim.conv2d(branch_1,96,[3,3],stride=2,padding='VALID',scope='Conv2d_1a_1x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(net,[3,3],stride=2,padding='VALID',scope='MaxPool_1a_3x3')
            # This module has only three branches, so branch_3 must not be concatenated here
            net = tf.concat([branch_0, branch_1, branch_2], 3)
            
        with tf.variable_scope('Mixed_6b'):  # Group 2, second Inception Module (17x17 grid)
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,128,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,128,[1,7],scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1,192,[7,1],scope='Conv2d_0c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,128,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,128,[7,1],scope='Conv2d_0b_7x1')
                branch_2 = slim.conv2d(branch_2,128,[1,7],scope='Conv2d_0c_1x7')
                branch_2 = slim.conv2d(branch_2,128,[7,1],scope='Conv2d_0d_7x1')
                branch_2 = slim.conv2d(branch_2,192,[1,7],scope='Conv2d_0e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,192,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_6c'):  # Group 2, third Inception Module
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,160,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,160,[1,7],scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1,192,[7,1],scope='Conv2d_0c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,160,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,160,[7,1],scope='Conv2d_0b_7x1')
                branch_2 = slim.conv2d(branch_2,160,[1,7],scope='Conv2d_0c_1x7')
                branch_2 = slim.conv2d(branch_2,160,[7,1],scope='Conv2d_0d_7x1')
                branch_2 = slim.conv2d(branch_2,192,[1,7],scope='Conv2d_0e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,192,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_6d'):  # Group 2, fourth Inception Module
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,160,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,160,[1,7],scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1,192,[7,1],scope='Conv2d_0c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,160,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,160,[7,1],scope='Conv2d_0b_7x1')
                branch_2 = slim.conv2d(branch_2,160,[1,7],scope='Conv2d_0c_1x7')
                branch_2 = slim.conv2d(branch_2,160,[7,1],scope='Conv2d_0d_7x1')
                branch_2 = slim.conv2d(branch_2,192,[1,7],scope='Conv2d_0e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,192,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_6e'):  # Group 2, fifth Inception Module
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,192,[1,7],scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1,192,[7,1],scope='Conv2d_0c_7x1')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,192,[7,1],scope='Conv2d_0b_7x1')
                branch_2 = slim.conv2d(branch_2,192,[1,7],scope='Conv2d_0c_1x7')
                branch_2 = slim.conv2d(branch_2,192,[7,1],scope='Conv2d_0d_7x1')
                branch_2 = slim.conv2d(branch_2,192,[1,7],scope='Conv2d_0e_1x7')
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,192,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        end_points['Mixed_6e'] = net # saved as the input of the auxiliary classifier
        
        with tf.variable_scope('Mixed_7a'):  # Group 3, first Inception Module (reduces the grid from 17x17 to 8x8)
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
                branch_0 = slim.conv2d(branch_0,320,[3,3],stride=2,padding='VALID',scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,192,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = slim.conv2d(branch_1,192,[1,7],scope='Conv2d_0b_1x7')
                branch_1 = slim.conv2d(branch_1,192,[7,1],scope='Conv2d_0c_7x1')
                branch_1 = slim.conv2d(branch_1,192,[3,3],stride=2, padding='VALID',scope='Conv2d_1a_3x3')
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.max_pool2d(net,[3,3],stride=2, padding='VALID',scope='MaxPool_1a_3x3')
            # This module has only three branches, so branch_3 must not be concatenated here
            net = tf.concat([branch_0, branch_1, branch_2], 3)
            
        with tf.variable_scope('Mixed_7b'):  # Group 3, second Inception Module (8x8 grid)
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,320,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,384,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = tf.concat([
                    slim.conv2d(branch_1,384,[1,3],scope='Conv2d_0b_1x3'),
                    slim.conv2d(branch_1,384,[3,1],scope='Conv2d_0b_3x1')],3)
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,448,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,384,[3,3],scope='Conv2d_0b_3x3')
                branch_2 = tf.concat([
                    slim.conv2d(branch_2,384,[1,3],scope='Conv2d_0c_1x3'),
                    slim.conv2d(branch_2,384,[3,1],scope='Conv2d_0d_3x1')],3)
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,192,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            
        with tf.variable_scope('Mixed_7c'):  # Group 3, third Inception Module
            with tf.variable_scope('Branch_0'):
                branch_0 = slim.conv2d(net,320,[1,1],scope='Conv2d_0a_1x1')
            with tf.variable_scope('Branch_1'):
                branch_1 = slim.conv2d(net,384,[1,1],scope='Conv2d_0a_1x1')
                branch_1 = tf.concat([
                    slim.conv2d(branch_1,384,[1,3],scope='Conv2d_0b_1x3'),
                    slim.conv2d(branch_1,384,[3,1],scope='Conv2d_0b_3x1')],3)
            with tf.variable_scope('Branch_2'):
                branch_2 = slim.conv2d(net,448,[1,1],scope='Conv2d_0a_1x1')
                branch_2 = slim.conv2d(branch_2,384,[3,3],scope='Conv2d_0b_3x3')
                branch_2 = tf.concat([
                    slim.conv2d(branch_2,384,[1,3],scope='Conv2d_0c_1x3'),
                    slim.conv2d(branch_2,384,[3,1],scope='Conv2d_0d_3x1')],3)
            with tf.variable_scope('Branch_3'):
                branch_3 = slim.avg_pool2d(net,[3,3],scope='AvgPool_0a_3x3')
                branch_3 = slim.conv2d(branch_3,192,[1,1],scope='Conv2d_0b_1x1')
            net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
        
        return net, end_points
    

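Review of the structure of Inception V3:
* 5 conv layers + 2 pooling layers
* Inception Module group 1 (Mixed_5b to Mixed_5d)
* Inception Module group 2 (Mixed_6a to Mixed_6e)
* Inception Module group 3 (Mixed_7a to Mixed_7c)

Input $299 \times 299$ -> Output $8 \times 8$ <br>
Channels: 3 (RGB) -> 2048<br>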
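
The shrinking grid can be checked with the usual output-size formulas for 'VALID' and 'SAME' padding. This sketch follows only the size-changing path through the network code above; the layer labels are descriptive and are not the scope names used in the code.

```python
import math

def out_size(size, kernel, stride, padding):
    """Spatial output size of a conv or pooling layer."""
    if padding == 'SAME':
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1   # 'VALID'

# (description, kernel, stride, padding)
layers = [
    ('conv 3x3 /2',                3, 2, 'VALID'),
    ('conv 3x3',                   3, 1, 'VALID'),
    ('conv 3x3 same',              3, 1, 'SAME'),
    ('max pool 3x3 /2',            3, 2, 'VALID'),
    ('conv 1x1',                   1, 1, 'VALID'),
    ('conv 3x3',                   3, 1, 'VALID'),
    ('max pool 3x3 /2',            3, 2, 'VALID'),
    ('Mixed_6a reduction 3x3 /2',  3, 2, 'VALID'),
    ('Mixed_7a reduction 3x3 /2',  3, 2, 'VALID'),
]

size = 299
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print("%-28s -> %d x %d" % (name, size, size))
```

All other Inception modules use stride 1 with 'SAME' padding, so they leave the grid size unchanged.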

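To enhance the expressive power of the network, a typical Inception Module combines four branches:
* Branch 1: simple feature abstraction (typically a single 1 by 1 convolution)
* Branch 2: more complex feature abstraction (typically a 1 by 1 convolution followed by larger or factorized convolutions)
* Branch 3: similar to branch 2 but usually deeper, giving even more complex feature abstraction
* Branch 4: a pooling layer<br>

Together, the 4 kinds of feature abstraction let the network selectively retain different features, which enriches its expressive power.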
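
Since the branches are concatenated along the channel axis, the output depth of a module is simply the sum of its branch depths. The numbers below are read off the network code above for three of the modules.

```python
# Output channels of each branch, taken from the network code above
branch_channels = {
    'Mixed_5b': [64, 64, 96, 32],
    'Mixed_6b': [192, 192, 192, 192],
    # Branches 1 and 2 of Mixed_7c each concatenate a 1x3 and a 3x1 convolution
    'Mixed_7c': [320, 384 + 384, 384 + 384, 192],
}

output_channels = {name: sum(ch) for name, ch in branch_channels.items()}
for name, total in output_channels.items():
    print(name, total)
```

The last module ends with 2048 channels, which matches the channel count quoted in the structure review.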

![title](Inception V3.png)

In [3]:
def inception_v3(inputs,
                num_classes=1000,
                is_training=True,
                dropout_keep_prob=0.8,
                prediction_fn=slim.softmax,
                spatial_squeeze=True,
                reuse=None,
                scope='InceptionV3'):
    
    with tf.variable_scope(scope,'InceptionV3',[inputs, num_classes],reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout],is_training=is_training):
            net, end_points = inception_v3_base(inputs, scope=scope)
            
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],stride=1, padding='SAME'):
                aux_logits=end_points['Mixed_6e']
                
                with tf.variable_scope('AuxLogits'):  # auxiliary classifier on the 17x17 Mixed_6e output
                    aux_logits = slim.avg_pool2d(aux_logits,[5,5],stride=3,padding='VALID',scope='AvgPool_1a_5x5')
                    aux_logits = slim.conv2d(aux_logits,128,[1,1],scope='Conv2d_1b_1x1')
                    aux_logits = slim.conv2d(aux_logits,768,[5,5],weights_initializer=trunc_normal(0.01),
                                             padding='VALID',scope='Conv2d_2a_5x5')
                    aux_logits = slim.conv2d(aux_logits,num_classes,[1,1],activation_fn=None, normalizer_fn=None,
                                             weights_initializer=trunc_normal(0.001),scope='Conv2d_2b_1x1')
                    if spatial_squeeze:
                        aux_logits = tf.squeeze(aux_logits,[1,2],name='SpatialSqueeze')
                    end_points['AuxLogits'] = aux_logits
                    
                with tf.variable_scope('Logits'):  # main classifier: global average pooling, dropout, 1x1 conv
                    net = slim.avg_pool2d(net,[8,8],padding='VALID',scope='AvgPool_1a_8x8')
                    net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
                    end_points['PreLogits'] = net
                    logits = slim.conv2d(net,num_classes,[1,1],activation_fn=None,normalizer_fn=None,scope='Conv2d_1c_1x1')
                    
                    if spatial_squeeze:
                        logits = tf.squeeze(logits,[1,2],name='SpatialSqueeze')
                end_points['Logits'] = logits
                end_points['Predictions'] = prediction_fn(logits, scope='Predictions')
    return logits, end_points

In [None]:
batch_size = 32
height, width = 299, 299
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(inception_v3_params()):
    logits, end_points = inception_v3(inputs, is_training=False)
    
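init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
num_batches = 100
# time_tensorflow_run is a timing helper that is not defined in this section;
# it is assumed to come from an earlier cell of the notebook
time_tensorflow_run(sess, logits, "Forward")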

The low computational cost of Inception Nets has enabled their widespread use. In summary:
1. Factorizing convolutions into smaller ones is efficient: it reduces the number of parameters, mitigates overfitting, and increases the nonlinearity of the network.
2. From input to output, the spatial size decreases while the number of channels increases, which converts spatial information into high-order features.
3. The Inception Module is efficient and gathers high-order features at different levels of abstraction.
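
The parameter saving from factorization in point 1 can be checked with a short count. This sketch ignores biases and Batch Normalization parameters, and the 192 input and output channels are an illustrative choice matching the 17x17 modules above.

```python
def conv_weights(k_h, k_w, c_in, c_out):
    """Number of weights in a convolution kernel (biases ignored)."""
    return k_h * k_w * c_in * c_out

c = 192  # illustrative channel count

# One 7x7 convolution versus a 1x7 followed by a 7x1
full_7x7 = conv_weights(7, 7, c, c)
factored_7 = conv_weights(1, 7, c, c) + conv_weights(7, 1, c, c)

# One 5x5 convolution versus two stacked 3x3 convolutions
full_5x5 = conv_weights(5, 5, c, c)
two_3x3 = 2 * conv_weights(3, 3, c, c)

print("7x7: %d   1x7 + 7x1: %d   ratio %.2f" % (full_7x7, factored_7, full_7x7 / factored_7))
print("5x5: %d   3x3 + 3x3: %d   ratio %.2f" % (full_5x5, two_3x3, full_5x5 / two_3x3))
```

Each factorized version covers the same receptive field with fewer weights and one extra nonlinearity between the two smaller convolutions.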