# Enter State Farm

In [1]:
%matplotlib inline

import utils
from utils import *
from IPython.display import FileLink

Using TensorFlow backend.


In [2]:
batch_size = 64

In [3]:
%pwd

'/home/ubuntu/kaggle/state-farm-driver-detection/code'

In [4]:
path = "../input/" # "../input/sample/"

## Setup batches

In [5]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)

Found 17940 images belonging to 10 classes.
Found 4484 images belonging to 10 classes.


In [6]:
trn_classes, val_classes, trn_labels, val_labels, filenames, val_filenames, test_filenames = get_classes(path)

Found 17940 images belonging to 10 classes.
Found 4484 images belonging to 10 classes.
Found 79726 images belonging to 1 classes.


Rather than using batches, we could just import all the data into an array to save some processing time. (In most examples I'm using the batches, however - just because that's how I happened to start out.)

In [7]:
trn = get_data(path+'train', target_size=(256, 256))
val = get_data(path+'valid', target_size=(256, 256))

Found 17940 images belonging to 10 classes.
Found 4484 images belonging to 10 classes.


In [8]:
save_array(path+'results/trn', trn)
save_array(path+'results/val', val)

In [9]:
trn = load_array(path+'results/trn')
val = load_array(path+'results/val')
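
The `save_array` and `load_array` helpers come from `utils`, whose implementation is not shown here. As a rough illustration of the same caching idea, the sketch below round-trips an array through disk with plain NumPy. The names `cache_array` and `read_array` are hypothetical stand-ins and are not part of `utils`.

```python
import os
import tempfile

import numpy as np


def cache_array(fname, arr):
    """Write arr to disk in NumPy's .npy format (hypothetical stand-in for save_array)."""
    np.save(fname, arr)


def read_array(fname):
    """Read back an array written by cache_array (hypothetical stand-in for load_array)."""
    return np.load(fname)


# Small fake image batch: 4 images, 3 channels, 8x8 pixels.
demo = np.random.rand(4, 3, 8, 8).astype(np.float32)
demo_path = os.path.join(tempfile.mkdtemp(), "trn_demo.npy")

cache_array(demo_path, demo)
restored = read_array(demo_path)
print(restored.shape, restored.dtype)
```

Once the arrays are cached, the slow `get_data` step can be skipped on later runs and only the load cell needs to be executed.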

## Re-run sample experiments on full dataset

Everything that worked on the sample (see statefarm-sample.ipynb) should work on the full dataset too, only better, because we now have more data. Let's see how they go; the models in this section are exact copies of the sample notebook models.

### Single conv layer

Two conv layers with max pooling, followed by a simple dense network, make a good simple CNN to start with:

In [10]:
def conv1(batches):
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 256, 256)),
        Conv2D(32, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3), strides=(3,3)),
        Conv2D(64, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3), strides=(3,3)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    
    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))
    model.optimizer.lr=0.001
    model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=5, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))
    
    return model
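
The `steps_per_epoch` and `validation_steps` arguments count batches, not images, so the final partial batch has to be rounded up. Here is a quick check of that arithmetic, using the image counts reported above and `batch_size = 64`. The helper name `steps_for` is only for illustration.

```python
from math import ceil

batch_size = 64


def steps_for(n_images, batch_size):
    """Number of batches needed to see every image once (the last batch may be partial)."""
    return ceil(n_images / batch_size)


train_steps = steps_for(17940, batch_size)  # 280 full batches + 1 partial batch
valid_steps = steps_for(4484, batch_size)   # 70 full batches + 1 partial batch
print(train_steps, valid_steps)
```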

In [None]:
model = conv1(batches)

Epoch 1/2
 - 199s - loss: 0.2477 - acc: 0.9371 - val_loss: 0.9141 - val_acc: 0.7310
Epoch 2/2
 - 195s - loss: 0.0141 - acc: 0.9988 - val_loss: 0.0302 - val_acc: 0.9967
Epoch 1/5
 - 195s - loss: 0.0048 - acc: 0.9998 - val_loss: 0.0132 - val_acc: 0.9973
Epoch 2/5
 - 195s - loss: 0.0025 - acc: 0.9999 - val_loss: 0.0110 - val_acc: 0.9975
Epoch 3/5
 - 195s - loss: 0.0016 - acc: 1.0000 - val_loss: 0.0102 - val_acc: 0.9975
Epoch 4/5
 - 195s - loss: 0.0011 - acc: 1.0000 - val_loss: 0.0102 - val_acc: 0.9980
Epoch 5/5
 - 195s - loss: 8.7726e-04 - acc: 1.0000 - val_loss: 0.0095 - val_acc: 0.9980


Interestingly, with no regularization or augmentation we're getting some reasonable results from our simple convolutional model. So with augmentation, we hopefully will see some very good results.

### Data augmentation

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
model = conv1(batches)

Found 17940 images belonging to 10 classes.
Epoch 1/2
 - 353s - loss: 1.3468 - acc: 0.5600 - val_loss: 1.5584 - val_acc: 0.4077
Epoch 2/2
 - 345s - loss: 0.7045 - acc: 0.7847 - val_loss: 0.3522 - val_acc: 0.9195
Epoch 1/5


In [None]:
model.optimizer.lr=0.0001
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=5, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))    

I'm shocked by *how* good these results are! We're regularly seeing 75-80% accuracy on the validation set, which puts us into the top third or better of the competition. With such a simple model and no dropout or semi-supervised learning, this really speaks to the power of this approach to data augmentation.

### Three conv/pooling pairs + dropout

Unfortunately, the results are still very unstable - the validation accuracy jumps from epoch to epoch. Perhaps a deeper model with some dropout would help.

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)

In [None]:
model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 256, 256)),
        Conv2D(32, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((2,2)),
        Conv2D(64, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((2,2)),
        Conv2D(128, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((2,2)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dropout(0.5),
        Dense(10, activation='softmax')
    ])
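
To see how large the input to the first dense layer is, the shape arithmetic can be worked through by hand. The sketch below assumes the Keras defaults that the model code does not override: `'valid'` padding for `Conv2D` and a pooling stride equal to the pool size. The helper names `conv_out` and `pool_out` are illustrative only.

```python
def conv_out(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2, stride=None):
    """Spatial size after max pooling without padding (stride defaults to the pool size)."""
    stride = stride or pool
    return (size - pool) // stride + 1


size = 256
for filters in (32, 64, 128):
    size = pool_out(conv_out(size))
    print(filters, "filters ->", size, "x", size)

# Flatten sees 128 feature maps of size x size each.
flat_units = 128 * size * size
print("Flatten output:", flat_units)
```

With 200 units in the first dense layer, that layer alone holds about 23 million weights (115200 x 200), so nearly all of the model's parameters sit there.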

In [None]:
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

In [None]:
model.optimizer.lr = 0.001

In [None]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=5, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

This is looking quite a bit better - the accuracy is similar, but the stability is higher. There's still some way to go however...

### Imagenet conv features

Since we have so little data, and it is similar to ImageNet images (full-color photos), using pre-trained VGG weights is likely to be helpful. In fact, it seems likely that we won't need to fine-tune the convolutional layer weights much, if at all. So we can pre-compute the output of the last convolutional layer, as we did in lesson 3 when we experimented with dropout. (However, this means that we can't use full data augmentation, since we can't pre-compute something that changes for every image.)

In [None]:
import vgg16bn_ted
from vgg16bn_ted import Vgg16BN

In [None]:
vgg = Vgg16BN(size=(256, 256))
#vgg = Vgg16BN()
model = vgg.model
last_conv_idx = [idx for idx, layer in enumerate(model.layers) if type(layer)==Conv2D][-1]
conv_layers = model.layers[:last_conv_idx+1]

In [None]:
conv_model = Sequential(conv_layers)

In [None]:
# batches shuffle must be set to False when pre-computing features
batches = get_batches(path+'train', shuffle=False, batch_size=batch_size)
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)
test_batches = get_batches(path+'test', shuffle=False, batch_size=batch_size)
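
Predicted features come back in whatever order the iterator yields the images, so they only pair up row by row with `trn_labels` if that order is fixed. A toy simulation in plain Python (no Keras involved; the "feature" of an image is just its index) shows the mismatch that shuffling would cause:

```python
import random

n_images = 1000
labels = [i % 10 for i in range(n_images)]  # label of image i, in directory order


def fake_features(order):
    """Pretend to run a model: the 'feature' of each image is simply its index."""
    return [idx for idx in order]


in_order = fake_features(range(n_images))
shuffled = fake_features(random.Random(0).sample(range(n_images), n_images))

# Row i of the features is paired with labels[i]; count rows paired with the wrong label.
wrong_in_order = sum(labels[feat] != labels[i] for i, feat in enumerate(in_order))
wrong_shuffled = sum(labels[feat] != labels[i] for i, feat in enumerate(shuffled))
print(wrong_in_order, wrong_shuffled)
```

The unshuffled pass pairs every row with its own label, while the shuffled pass leaves most rows attached to the wrong label.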

In [None]:
trn_classes, val_classes, trn_labels, val_labels, filenames, val_filenames, test_filenames = get_classes(path)

In [None]:
conv_features = conv_model.predict_generator(batches, steps=ceil(batches.n/batches.batch_size))
conv_val_features = conv_model.predict_generator(val_batches, steps=ceil(val_batches.n/val_batches.batch_size))
conv_test_features = conv_model.predict_generator(test_batches, steps=ceil(test_batches.n/test_batches.batch_size))

In [None]:
save_array(path+'results/conv_feature.dat', conv_features)
save_array(path+'results/conv_val_feature.dat', conv_val_features)
save_array(path+'results/conv_test_feature.dat', conv_test_features)

In [None]:
conv_features = load_array(path+'results/conv_feature.dat')
conv_val_features = load_array(path+'results/conv_val_feature.dat')
conv_test_features = load_array(path+'results/conv_test_feature.dat')

### Batchnorm dense layers on pretrained conv layers

Since we've pre-computed the output of the last convolutional layer, we need to create a network that takes that as input, and predicts our 10 classes. Let's try using a simplified version of VGG's dense layers.

In [8]:
def get_bn_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p * 0.5),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p * 0.5),
        Dense(128, activation='relu'),
        BatchNormalization(),
        Dropout(p * 0.5),
        Dense(10, activation='softmax')
    ]
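
The size of the input to these dense layers follows from the shape of the last convolutional layer. The sketch below assumes that `Vgg16BN` follows the standard VGG16 layout (four 2x2 poolings before the last conv layer, 512 channels, size-preserving convolutions); the `vgg16bn_ted` source is not shown here, so treat the numbers as an estimate. The helper `vgg_last_conv_shape` is hypothetical.

```python
def vgg_last_conv_shape(image_size, pools_before_last_conv=4, channels=512):
    """Feature-map shape at the last conv layer, assuming standard VGG16 blocks."""
    side = image_size // (2 ** pools_before_last_conv)
    return (channels, side, side)


channels, height, width = vgg_last_conv_shape(256)
pooled = (channels, height // 2, width // 2)    # after the MaxPooling2D in get_bn_layers
flat_units = pooled[0] * pooled[1] * pooled[2]  # input size of the first Dense layer
print((channels, height, width), pooled, flat_units)
```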

In [9]:
p = 0.8

In [10]:
bn_model = Sequential(get_bn_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

NameError: name 'conv_layers' is not defined

In [None]:
bn_model.fit(conv_features, trn_labels, batch_size=batch_size, epochs=2,
            validation_data=(conv_val_features, val_labels))

In [None]:
bn_model.save_weights(path+'models/conv8.h5')

Looking good! Let's try pre-computing 5 epochs' worth of augmented data, so we can experiment with combining dropout and augmentation on the pre-trained model.

### Pre-computed data augmentation + dropout

We'll use our usual data augmentation parameters:

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
# shuffle=False keeps the pre-computed features in the same order as trn_labels
da_batches = get_batches(path+'train', gen_t, shuffle=False, batch_size=batch_size)

We use those to create a dataset of convolutional features 5x bigger than the training set.

In [None]:
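# steps counts batches, so 5 passes over the data take 5 * ceil(n / batch_size) batches
da_conv_features = conv_model.predict_generator(da_batches, steps=5 * ceil(da_batches.n/da_batches.batch_size))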

In [109]:
save_array(path+'results/da_conv_feature1.dat', da_conv_features)

In [78]:
da_conv_features = load_array(path+'results/da_conv_feature1.dat')

Let's include the real training data as well in its non-augmented form.

In [None]:
da_conv_features = np.concatenate([da_conv_features, conv_features])

Since we've now got a dataset 6x bigger than before, we'll need to copy our labels 6 times too.

In [None]:
da_trn_labels = np.concatenate([trn_labels]*6)
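
Tiling the labels only lines up if every pass over the generator yields the images in the same order, partial final batch included. A small plain-Python simulation of an unshuffled batch iterator (an idealized stand-in for the Keras iterator, not its actual implementation) illustrates the bookkeeping:

```python
from math import ceil


def ordered_batches(n_images, batch_size):
    """Endlessly yield index batches in order, restarting after a partial last batch."""
    while True:
        for start in range(0, n_images, batch_size):
            yield list(range(start, min(start + batch_size, n_images)))


n_images, batch_size, passes = 150, 64, 5
steps = passes * ceil(n_images / batch_size)  # number of batches to draw, not images

gen = ordered_batches(n_images, batch_size)
seen = [idx for _ in range(steps) for idx in next(gen)]

labels = [i % 10 for i in range(n_images)]
tiled_labels = labels * passes
print(len(seen), len(tiled_labels))
```

Here `seen` is five back-to-back copies of the directory order, which is exactly the layout that a list of labels repeated five times expects. Appending the unaugmented features at the end adds the sixth copy.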

Based on some experiments, the previous model works well with bigger dense layers.

In [None]:
def get_bn_da_layers(p):
    return [
        MaxPooling2D(input_shape=conv_layers[-1].output_shape[1:]),
        Flatten(),
        Dropout(p * 0.5),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p * 0.5),
        Dense(256, activation='relu'),
        BatchNormalization(),
        Dropout(p * 0.5),
        Dense(10, activation='softmax')
    ]

In [None]:
p = 0.8

In [None]:
bn_model = Sequential(get_bn_da_layers(p))
bn_model.compile(Adam(lr=0.001), loss='categorical_crossentropy', metrics=['accuracy'])

Now we can train the model as usual, with pre-computed augmented data.

In [None]:
bn_model.fit(da_conv_features, da_trn_labels, batch_size=batch_size, epochs=2,
            validation_data=(conv_val_features, val_labels))

In [None]:
bn_model.optimizer.lr=0.01

In [None]:
bn_model.fit(da_conv_features, da_trn_labels, batch_size=batch_size, epochs=4,
            validation_data=(conv_val_features, val_labels))

In [None]:
bn_model.optimizer.lr=0.0001

In [None]:
bn_model.fit(da_conv_features, da_trn_labels, batch_size=batch_size, epochs=4,
            validation_data=(conv_val_features, val_labels))

In [None]:
bn_model.save_weights(path+'models/da_conv8_1.h5')

### Pseudo labeling

We're going to try using a combination of [pseudo labeling](http://deeplearning.net/wp-content/uploads/2013/03/pseudo_label_final.pdf) and [knowledge distillation](https://arxiv.org/abs/1503.02531) to allow us to use unlabeled data (i.e. do semi-supervised learning). For our initial experiment we'll use the validation set as the unlabeled data, so that we can see that it is working without using the test set. At a later date we'll try using the test set.

To do this, we simply calculate the predictions of our model...

In [None]:
val_pseudo = bn_model.predict(conv_val_features, batch_size=batch_size)

...concatenate them with our training labels...

In [None]:
comb_pseudo = np.concatenate([da_trn_labels, val_pseudo])
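
Note that `val_pseudo` holds soft probabilities rather than one-hot vectors, which is where the distillation flavor comes in: categorical cross-entropy accepts any target distribution. A small NumPy sketch with made-up numbers (three classes instead of ten) shows the combined targets and the loss computed against them. The local `categorical_crossentropy` function is written out here for illustration and is not the Keras implementation.

```python
import numpy as np

# Hard one-hot labels for two "training" examples (made-up data).
trn_targets = np.array([[1.0, 0.0, 0.0],
                        [0.0, 0.0, 1.0]])
# Soft model predictions for two "unlabeled" examples, used as pseudo labels.
pseudo_targets = np.array([[0.7, 0.2, 0.1],
                           [0.1, 0.1, 0.8]])

comb_targets = np.concatenate([trn_targets, pseudo_targets])


def categorical_crossentropy(targets, preds, eps=1e-7):
    """Mean cross-entropy between target distributions and predicted distributions."""
    preds = np.clip(preds, eps, 1.0)
    return float(np.mean(-np.sum(targets * np.log(preds), axis=1)))


# A model that predicts the uniform distribution scores log(3) on any valid target.
uniform_preds = np.full_like(comb_targets, 1.0 / 3.0)
loss = categorical_crossentropy(comb_targets, uniform_preds)
print(comb_targets.shape, loss)
```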

In [None]:
comb_features = np.concatenate([da_conv_features, conv_val_features])

...and fine-tune our model using that data.

In [None]:
bn_model.load_weights(path+'models/da_conv8_1.h5')

In [None]:
bn_model.fit(comb_features, comb_pseudo, batch_size=batch_size, epochs=2,
            validation_data=(conv_val_features, val_labels))

In [None]:
bn_model.fit(comb_features, comb_pseudo, batch_size=batch_size, epochs=4,
            validation_data=(conv_val_features, val_labels))

In [None]:
bn_model.optimizer.lr=0.00001

In [None]:
bn_model.fit(comb_features, comb_pseudo, batch_size=batch_size, epochs=4,
            validation_data=(conv_val_features, val_labels))

That's a distinct improvement, even though the validation set isn't very big. This looks encouraging for when we try this on the test set.

In [None]:
bn_model.save_weights(path+'models/bn-ps8.h5')