# Use a pre-trained network to compute bottlenecks

We will use a CNN pre-trained on the ImageNet challenge to compute the bottlenecks of our images.

A bottleneck is simply the output of the last convolutional layer of a convolutional neural network.

By precomputing these outputs, we can try different models on top of the bottlenecks at very small computational cost.
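
As a rough illustration of why this is cheap, the sketch below fits a tiny logistic-regression head on fixed feature vectors using only NumPy. The features are synthetic stand-ins for real bottlenecks, and `train_logistic_head` and `predict` are hypothetical helpers, not part of this notebook.

```python
import numpy as np

def train_logistic_head(features, labels, lr=0.1, epochs=200):
    """Fit a logistic-regression head on fixed features with plain gradient descent."""
    n_samples, n_features = features.shape
    weights = np.zeros(n_features)
    bias = 0.0
    for _ in range(epochs):
        logits = features.dot(weights) + bias
        probs = 1.0 / (1.0 + np.exp(-logits))
        error = probs - labels  # gradient of the log loss w.r.t. the logits
        weights -= lr * features.T.dot(error) / n_samples
        bias -= lr * error.mean()
    return weights, bias

def predict(features, weights, bias):
    """Return 0/1 class predictions from the linear head."""
    return (features.dot(weights) + bias > 0).astype(int)

# Synthetic stand-in for bottlenecks (assumption: 200 samples, 16 features).
rng = np.random.RandomState(0)
labels = rng.randint(0, 2, size=200)
features = rng.normal(size=(200, 16)) + labels[:, None] * 2.0

weights, bias = train_logistic_head(features, labels)
accuracy = (predict(features, weights, bias) == labels).mean()
print("training accuracy:", accuracy)
```

Because the expensive convolutional pass is already done, swapping this head for another model only repeats the cheap part.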

## Create resnet50 body

In [1]:
import sys

In [2]:
sys.path.append("D:\\GitHub\\models")

Check the resnet50.py module

In [3]:
from resnet50 import ResNet50

Using TensorFlow backend.


Download the weights from [this link](https://github.com/fchollet/deep-learning-models/releases/download/v0.2/resnet50_weights_tf_dim_ordering_tf_kernels_notop.h5)

You can increase `input_shape` according to your GPU capacity; a larger input will probably give better accuracy.

I have a 4 GB GPU.
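
To get a feel for the cost of a larger input, here is a minimal sketch of the arithmetic. It assumes that the work and activation memory of the convolutional layers grow roughly in proportion to the number of input pixels, which is a rule of thumb rather than a measurement. `relative_input_cost` is a hypothetical helper, and 224 is the usual ImageNet input side for ResNet50.

```python
def relative_input_cost(side, reference_side=224):
    """Ratio of pixels in a side x side image to a reference_side x reference_side image."""
    return (side * side) / float(reference_side * reference_side)

ratio_300 = relative_input_cost(300)
print("300x300 vs 224x224: %.2fx the pixels" % ratio_300)
```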

In [4]:
%%time
body = ResNet50(input_shape=(300,300,3), weights_path="D:/GitHub/models/resnet50_body.h5")

Wall time: 15.6 s


## Add global average pooling (GAP) to the body

In [6]:
from keras.models import Model
from keras.layers import GlobalAveragePooling2D

In [7]:
head = body.output
head = GlobalAveragePooling2D()(head)
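
For intuition, global average pooling averages every channel over its spatial positions, so each image ends up as one vector with one value per channel. The NumPy sketch below reproduces that computation on random data; the 10x10 spatial size is only illustrative, and `global_average_pool` is a hypothetical helper, not the Keras layer itself.

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel over height and width: (N, H, W, C) -> (N, C)."""
    return feature_maps.mean(axis=(1, 2))

# Random stand-in for convolutional feature maps in channels-last layout.
feature_maps = np.random.RandomState(0).rand(2, 10, 10, 2048)
pooled = global_average_pool(feature_maps)
print(pooled.shape)
```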

In [8]:
model = Model(body.input, head)

In [9]:
model.summary()

____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
input_1 (InputLayer)             (None, 300, 300, 3)   0                                            
____________________________________________________________________________________________________
lambda_1 (Lambda)                (None, 300, 300, 3)   0           input_1[0][0]                    
____________________________________________________________________________________________________
zeropadding2d_1 (ZeroPadding2D)  (None, 306, 306, 3)   0           lambda_1[0][0]                   
____________________________________________________________________________________________________
conv1 (Convolution2D)            (None, 150, 150, 64)  9472        zeropadding2d_1[0][0]            
___________________________________________________________________________________________

## Create train / valid / test generators

In [10]:
model.input_shape[1:3], model.output_shape

((300, 300), (None, 2048))

In [11]:
from keras.preprocessing.image import ImageDataGenerator

In [12]:
gen = ImageDataGenerator()

In [13]:
train_batches = gen.flow_from_directory("train", model.input_shape[1:3], shuffle=False, batch_size=8)
valid_batches = gen.flow_from_directory("valid", model.input_shape[1:3], shuffle=False, batch_size=8)
test_batches = gen.flow_from_directory("test", model.input_shape[1:3], shuffle=False, batch_size=8, class_mode=None)

Found 23000 images belonging to 2 classes.
Found 2000 images belonging to 2 classes.
Found 12500 images belonging to 1 classes.
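
The generators above use `shuffle=False`, presumably so that the rows of the predicted bottlenecks stay in the same order as `train_batches.classes`. The sketch below shows the idea with a toy batch iterator; `iterate_batches` is a hypothetical stand-in, not the Keras implementation.

```python
import numpy as np

def iterate_batches(samples, batch_size, shuffle, seed=0):
    """Yield consecutive batches, optionally after shuffling the sample order."""
    order = np.arange(len(samples))
    if shuffle:
        np.random.RandomState(seed).shuffle(order)
    for start in range(0, len(samples), batch_size):
        yield samples[order[start:start + batch_size]]

samples = np.arange(20)  # sample i stands for the i-th file on disk

ordered = np.concatenate(list(iterate_batches(samples, 8, shuffle=False)))
shuffled = np.concatenate(list(iterate_batches(samples, 8, shuffle=True)))

print("ordered matches file order: ", np.array_equal(ordered, samples))
print("shuffled matches file order:", np.array_equal(shuffled, samples))
```

With shuffling enabled, the outputs would no longer line up with a label array stored in file order.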


## Generate the bottlenecks

In [14]:
%%time
train_bottleneck = model.predict_generator(train_batches, train_batches.nb_sample)

Wall time: 12min 8s


In [15]:
%%time
valid_bottleneck = model.predict_generator(valid_batches, valid_batches.nb_sample)

Wall time: 1min 2s


In [16]:
%%time
test_bottleneck = model.predict_generator(test_batches, test_batches.nb_sample)

Wall time: 6min 27s


## Save bottlenecks

In [17]:
import h5py

In [18]:
# Open in write mode explicitly: the default mode depends on the h5py version.
with h5py.File("300_bottlenecks.h5", "w") as hf:
    hf.create_dataset("train", data=train_bottleneck)
    hf.create_dataset("valid", data=valid_bottleneck)
    hf.create_dataset("test", data=test_bottleneck)
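
Reading the arrays back later is symmetric. The sketch below is a self-contained round trip on a small random array written to a temporary file, so it does not touch `300_bottlenecks.h5`; the file name and array contents are made up for the example.

```python
import os
import tempfile

import h5py
import numpy as np

# Small random stand-in for a bottleneck array.
fake_bottlenecks = np.random.RandomState(0).rand(5, 2048).astype("float32")

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_bottlenecks.h5")

    # Write the array under a named dataset.
    with h5py.File(path, "w") as hf:
        hf.create_dataset("train", data=fake_bottlenecks)

    # Read it back; slicing with [:] loads the dataset into a NumPy array.
    with h5py.File(path, "r") as hf:
        stored_shape = hf["train"].shape
        loaded = hf["train"][:]

print(stored_shape, loaded.dtype)
```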

## Save labels

In [19]:
from keras.utils.np_utils import to_categorical

In [20]:
# Open in write mode explicitly: the default mode depends on the h5py version.
with h5py.File("labels.h5", "w") as hf:
    hf.create_dataset("train", data=to_categorical(train_batches.classes))
    hf.create_dataset("valid", data=to_categorical(valid_batches.classes))
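
`to_categorical` converts integer class indices into one-hot rows. A minimal NumPy equivalent, written here as a hypothetical `one_hot` helper for illustration only, looks like this:

```python
import numpy as np

def one_hot(class_indices, num_classes=None):
    """Turn integer class indices into one-hot rows of shape (N, num_classes)."""
    class_indices = np.asarray(class_indices, dtype=int)
    if num_classes is None:
        num_classes = int(class_indices.max()) + 1
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes)[class_indices]

encoded = one_hot([0, 1, 1, 0])
print(encoded)
```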