**What I want to do:** Compete in the State Farm Distracted Driver Detection competition on Kaggle by fine-tuning a pretrained deep learning model, specifically the VGG16 model.

In this notebook, I'll be training and fine-tuning the model. At the end of it, I want to have the saved weights so that I can run predictions on the validation and test sets in subsequent notebooks.

## Admin stuff

In [1]:
%matplotlib inline

In [2]:
from __future__ import division, print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
from datetime import datetime
import re

Load Jeremy's utils.py. I need to wean myself off these utilities (or at least understand them fully).

In [3]:
import utils; reload(utils)
from utils import plots
from utils import save_array, load_array

 https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5103)
Using Theano backend.


In [4]:
# define locations for training, validation, test and sample sets
data_dir = "data"
train_path = "data/train/"
test_path = "data/test/"
validation_path = "data/valid/"
sample_train_path = "data/sample/train/"
sample_validation_path = "data/sample/valid/"
results_path = "data/results/"

## Run the punchline code to train a model on the sample set

In [17]:
batch_size = 64

In [18]:
# import the vgg16 model

#import
import vgg16; reload(vgg16)
from vgg16 import Vgg16

#instantiate
vgg = Vgg16()

In [19]:
# Grab a few images at a time for training and validation.
# NB: They must be in subdirectories named based on their category
batches = vgg.get_batches(sample_train_path, batch_size=batch_size)
val_batches = vgg.get_batches(sample_validation_path, batch_size=batch_size*2)
vgg.finetune(batches)
vgg.fit(batches, val_batches, nb_epoch=5)

Found 171 images belonging to 10 classes.
Found 40 images belonging to 10 classes.
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5
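Since the batches are read from one subdirectory per class, it can help to confirm the layout before training. The following is a minimal sketch, not part of the original notebook: the helper name `count_images_per_class` and the list of accepted file extensions are my own assumptions.

```python
from __future__ import print_function
import os

# Assumption: images are stored as .jpg/.jpeg/.png files.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def count_images_per_class(root):
    """Return {class_name: image_count} for each subdirectory of root."""
    counts = {}
    for name in sorted(os.listdir(root)):
        class_dir = os.path.join(root, name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting next to the class folders
        counts[name] = sum(
            1 for f in os.listdir(class_dir)
            if f.lower().endswith(IMAGE_EXTENSIONS)
        )
    return counts


# Hypothetical usage with the paths defined earlier in the notebook:
# counts = count_images_per_class(sample_train_path)
# print(len(counts), "classes,", sum(counts.values()), "images")
```

If the totals agree with the "Found N images belonging to K classes" lines, the directory structure is what the batch generator expects.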


## Now run on the entire training/validation set

In [20]:
batch_size=64

In [21]:
# import the vgg16 model

#import
import vgg16; reload(vgg16)
from vgg16 import Vgg16

#instantiate
vgg = Vgg16()

In [22]:
# start logging time to execute
starttime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

batches = vgg.get_batches(train_path, batch_size=batch_size)
val_batches = vgg.get_batches(validation_path, batch_size=batch_size*2)

# log time for when the process ends
endtime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print ("timedelta = ", datetime.strptime(endtime,'%Y-%m-%d %H:%M:%S') - datetime.strptime(starttime,'%Y-%m-%d %H:%M:%S'))

Found 17685 images belonging to 10 classes.
Found 4739 images belonging to 10 classes.
timedelta =  0:00:00
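To get a feel for what one epoch means with these image counts, here is a small arithmetic sketch. It assumes the generator yields a final partial batch rather than dropping it; the function names are hypothetical and only for illustration.

```python
def batches_per_epoch(n_images, batch_size):
    """Number of batches needed to see every image once (ceiling division)."""
    return (n_images + batch_size - 1) // batch_size


def last_batch_size(n_images, batch_size):
    """Size of the final batch, which is smaller unless the division is exact."""
    remainder = n_images % batch_size
    return remainder if remainder else batch_size


# Training set: 17685 images at batch_size=64
print(batches_per_epoch(17685, 64), last_batch_size(17685, 64))   # 277 21
# Validation set: 4739 images at batch_size=64*2
print(batches_per_epoch(4739, 128), last_batch_size(4739, 128))   # 38 3
```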


In [23]:
# start logging time to execute
starttime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

vgg.finetune(batches)

# log time for when the process ends
endtime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print ("timedelta = ", datetime.strptime(endtime,'%Y-%m-%d %H:%M:%S') - datetime.strptime(starttime,'%Y-%m-%d %H:%M:%S'))

timedelta =  0:00:00


In [24]:
# start logging time to execute
starttime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

vgg.fit(batches, val_batches, nb_epoch=1)

# log time for when the process ends
endtime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print ("timedelta = ", datetime.strptime(endtime,'%Y-%m-%d %H:%M:%S') - datetime.strptime(starttime,'%Y-%m-%d %H:%M:%S'))

Epoch 1/1
timedelta =  0:09:20
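The start/end timestamp pattern used in the cells above repeats a lot. One possible way to tidy it up is a small context manager; this is only a sketch, and `timed` and `format_elapsed` are names I made up for it. It truncates to whole seconds so the printed value looks like the `timedelta` lines in this notebook.

```python
from __future__ import print_function
from contextlib import contextmanager
from datetime import datetime, timedelta


def format_elapsed(start, end):
    """Return the elapsed time between two datetimes, truncated to whole seconds."""
    elapsed = end - start
    return str(timedelta(seconds=int(elapsed.total_seconds())))


@contextmanager
def timed(label="timedelta"):
    """Print the wall-clock time spent inside the with-block."""
    start = datetime.now()
    try:
        yield
    finally:
        print(label, "= ", format_elapsed(start, datetime.now()))


# Hypothetical usage in a training cell:
# with timed():
#     vgg.fit(batches, val_batches, nb_epoch=1)
```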


In [25]:
# if the results path doesn't exist, create it
if not os.path.isdir(results_path):
    !mkdir $results_path
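The `!mkdir` shell escape only works inside IPython/Jupyter. If this cell ever moves into a plain script, a pure-Python equivalent could look like the sketch below. The helper name `ensure_dir` is hypothetical, and the `errno` check is there so it behaves the same on Python 2 and 3.

```python
import errno
import os


def ensure_dir(path):
    """Create path (and any missing parents); do nothing if it already exists."""
    try:
        os.makedirs(path)
    except OSError as e:
        # Re-raise unless the error is simply "directory already exists".
        if e.errno != errno.EEXIST or not os.path.isdir(path):
            raise


# Hypothetical usage with the path defined earlier in the notebook:
# ensure_dir(results_path)
```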

In [27]:
# save weights of the 1st epoch
vgg.model.save_weights(results_path+'ft_1epoch.h5')

In [28]:
# go another epoch and see how much val acc changes
# start logging time to execute
starttime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')

vgg.fit(batches, val_batches, nb_epoch=1)

# log time for when the process ends
endtime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
print ("timedelta = ", datetime.strptime(endtime,'%Y-%m-%d %H:%M:%S') - datetime.strptime(starttime,'%Y-%m-%d %H:%M:%S'))

Epoch 1/1
timedelta =  0:09:10


In [29]:
# save weights of the 2nd epoch
vgg.model.save_weights(results_path+'ft_2epoch.h5')

In [30]:
vgg.fit(batches, val_batches, nb_epoch=1)

Epoch 1/1


In [31]:
# save weights of the 3rd epoch
vgg.model.save_weights(results_path+'ft_3epoch.h5')
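The fit-then-save cells above follow the same pattern each time, so they could be folded into a loop. The sketch below is one way to do that under my own assumptions: `weights_path` and `train_with_checkpoints` are hypothetical helpers, and the training and saving steps are passed in as callables so the loop itself does not depend on the `Vgg16` class.

```python
import os


def weights_path(results_path, epoch, prefix='ft'):
    """Build a checkpoint filename in the same style as 'ft_1epoch.h5'."""
    return os.path.join(results_path, '{}_{}epoch.h5'.format(prefix, epoch))


def train_with_checkpoints(fit_one_epoch, save_weights, results_path, n_epochs):
    """Run n_epochs single-epoch fits, saving weights after each one."""
    saved = []
    for epoch in range(1, n_epochs + 1):
        fit_one_epoch()                      # train for one epoch
        path = weights_path(results_path, epoch)
        save_weights(path)                   # checkpoint after this epoch
        saved.append(path)
    return saved


# Hypothetical usage with the objects from this notebook:
# train_with_checkpoints(
#     lambda: vgg.fit(batches, val_batches, nb_epoch=1),
#     vgg.model.save_weights,
#     results_path,
#     n_epochs=3,
# )
```

Keeping one weights file per epoch makes it possible to go back to an earlier checkpoint if validation accuracy gets worse later on.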