# State Farm Distracted Driver Detection


[State Farm Distracted Driver Detection](https://www.kaggle.com/c/state-farm-distracted-driver-detection#evaluation)

## Action Plan
### 1. Data Preparation and Preprocessing
### 2. Finetune and Train Model
### 3. Generate and Validate Predictions 
### 4. Submit predictions to Kaggle

# 4. Submit predictions to Kaggle

## Setup 

In [44]:
%cd "~/kaggle/state-farm-driver-detection/code"
%pwd

/home/ubuntu/kaggle/state-farm-driver-detection/code


'/home/ubuntu/kaggle/state-farm-driver-detection/code'

In [45]:
#Create references to important directories we will use over and over
import os, sys
current_dir = os.getcwd()
CODE_HOME_DIR = current_dir
DATA_HOME_DIR = CODE_HOME_DIR + '/../input/'
print(CODE_HOME_DIR)
print(DATA_HOME_DIR)

/home/ubuntu/kaggle/state-farm-driver-detection/code
/home/ubuntu/kaggle/state-farm-driver-detection/code/../input/


In [3]:
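# Import helper modules. The star import from utils supplies np, load_array, onehot,
# get_split_models, do_clip and the eval_* helpers used below.
# from importlib import reload  # uncomment to reload edited modules while developing

import utils
from utils import *

import vgg16bn_ted
from vgg16bn_ted import Vgg16BN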

%matplotlib inline

Using TensorFlow backend.


#### Setup Paths

In [46]:
%cd $DATA_HOME_DIR

# To work on the sample subset instead, append 'sample/' to DATA_HOME_DIR
path = DATA_HOME_DIR  # + 'sample/'
results_path = path + 'results/'
train_path = path + 'train/'
valid_path = path + 'valid/'
test_path = path + 'test/'
model_path = path + 'models/'
os.makedirs(model_path, exist_ok=True)  # create the models/ directory if it is missing

/home/ubuntu/kaggle/state-farm-driver-detection/input


In [47]:
results_path

'/home/ubuntu/kaggle/state-farm-driver-detection/code/../input/results/'

#### Load Data Classes, Labels, and Filenames

In [5]:
trn_classes = np.array(load_array(results_path+'train_classes.bc'))
val_classes = np.array(load_array(results_path+'valid_classes.bc'))
trn_labels = onehot(trn_classes)
val_labels = onehot(val_classes)
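`onehot` comes from the star-imported `utils` module and isn't shown here. Assuming it maps integer class ids to one-hot rows (that is how `val_labels` is used by the evaluation helpers further down), a minimal NumPy equivalent looks like this; `onehot_sketch` is a hypothetical name, not the actual helper.

```python
import numpy as np

def onehot_sketch(classes, num_classes=None):
    """Return a (len(classes), num_classes) float array with a single 1 per row."""
    classes = np.asarray(classes, dtype=int)
    if num_classes is None:
        num_classes = classes.max() + 1
    # Indexing the identity matrix by class id picks out the matching unit vector.
    return np.eye(num_classes)[classes]

labels = onehot_sketch([0, 3, 9], num_classes=10)
print(labels.shape)           # (3, 10)
print(labels.argmax(axis=1))  # [0 3 9]
```

Taking `argmax` along each row recovers the original class ids, which is what an accuracy check relies on.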

In [6]:
trn_filenames = load_array(results_path+'train_filenames.bc')
val_filenames = load_array(results_path+'valid_filenames.bc')
test_filenames = load_array(results_path+'test_filenames.bc')

In [7]:
trn_data = load_array(results_path+'trn_data.bc')
val_data = load_array(results_path+'val_data.bc')

#### Load Precomputed conv_model Feature Maps

In [8]:
conv_trn_features = load_array(results_path+'train_convlayer_features.dat')
conv_val_features = load_array(results_path+'valid_convlayer_features2.dat')
conv_test_features = load_array(results_path+'test_convlayer_features.dat')

# 4. Submit predictions to Kaggle

Here's the format Kaggle requires for new submissions:
```
img,c0,c1,c2,c3,c4,c5,c6,c7,c8,c9
img_1.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_10.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_100.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_1000.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_100000.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_100001.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_100002.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_100003.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
 ...
img_99996.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_99998.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
img_99999.jpg,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1
```

Kaggle wants the image filename followed by the predicted probability of each of the ten driver classes, c0 to c9. Kaggle evaluates submissions with a metric called [Log Loss](http://wiki.fast.ai/index.php/Log_Loss).
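For $N$ images and $M = 10$ classes, multi-class log loss is

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\log p_{ij},$$

where $y_{ij}$ is 1 if image $i$ belongs to class $j$ (0 otherwise) and $p_{ij}$ is the submitted probability. Only the true class contributes to each image's term, so a confident wrong answer (a true-class $p_{ij}$ near 0) is punished very heavily. Here's a minimal NumPy sketch; the `1e-15` guard against `log(0)` is an assumption, and Kaggle's scorer may also rescale each row to sum to 1, so check the evaluation page for the exact rules.

```python
import numpy as np

def log_loss(y_true, y_pred, eps=1e-15):
    """Multi-class log loss: y_true is one-hot (N, M), y_pred holds probabilities (N, M)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # guard against log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

y_true = np.eye(10)[[5, 0]]       # two images, true classes c5 and c0
uniform = np.full((2, 10), 0.1)   # every row is 0.1, like the sample submission above
print(log_loss(y_true, uniform))  # ln(10), about 2.3026
```

So the all-0.1 sample submission scores $\ln 10 \approx 2.303$, a useful baseline for judging real predictions.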

#### Setup Models

In [9]:
vgg = Vgg16BN(size=(256, 256))
model = vgg.model

In [10]:
batch_size=64

In [11]:
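# Adapt the model to the 10 driver classes. In the fast.ai-style Vgg16BN, finetune() replaces
# the output layer and sets vgg.classes from the batches; no epochs are run here, and the
# trained FC weights are loaded from disk below.
vgg.finetune(vgg.get_batches(train_path, shuffle=False, batch_size=batch_size))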

Found 17940 images belonging to 10 classes.


In [12]:
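# Split the network into its convolutional layers (not needed, since conv features are precomputed)
# and its fully connected head
_, fc_model = get_split_models(model)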

In [13]:
model_name = 'Vgg16BN'
sub_model_name = 'fc-model'
fc_model.load_weights(model_path + model_name + '_' + sub_model_name + '.h5')

### Find a good clipping amount using the validation set, prior to submitting

In [14]:
val_preds = fc_model.predict(conv_val_features, batch_size=batch_size)

In [15]:
for max_pred in [0.91, 0.93, 0.95, 0.97, 0.99]:
    clip_val_preds = do_clip(val_preds, max_pred)
    print("clip[{}] loss: {}, accuracy: {}".format(max_pred,
                                                   eval_crossentropy(val_labels, clip_val_preds),
                                                   eval_accuracy(val_labels, clip_val_preds)))

clip[0.91] loss: 0.11299741268157959, accuracy: 0.9953166842460632
clip[0.93] loss: 0.09213834255933762, accuracy: 0.9953166842460632
clip[0.95] loss: 0.07199419289827347, accuracy: 0.9953166842460632
clip[0.97] loss: 0.052834637463092804, accuracy: 0.9953166842460632
clip[0.99] loss: 0.035930994898080826, accuracy: 0.9953166842460632
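`do_clip` is also defined in `utils`. The submission preview further down shows every small probability replaced by 0.007778, which equals (1 - 0.93) / 9, so it appears to clip each probability into the range [(1 - max_pred) / 9, max_pred]. The sketch below reimplements that assumed behaviour as `do_clip_sketch` (a hypothetical name) and shows the trade-off: clipping costs a little on confident correct answers but caps the penalty on confident wrong ones. Because clipping is monotonic and a softmax row can't have two entries above these `max_pred` values, it also leaves each row's argmax unchanged, which is why the accuracy above is identical for every `max_pred`.

```python
import numpy as np

def do_clip_sketch(arr, max_pred, num_classes=10):
    """Assumed behaviour of utils.do_clip: clip to [(1 - max_pred) / (num_classes - 1), max_pred]."""
    floor = (1 - max_pred) / (num_classes - 1)
    return np.clip(arr, floor, max_pred)

def log_loss(y_true, y_pred, eps=1e-15):
    """Mean multi-class log loss for one-hot y_true and probability rows y_pred."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))

# Two over-confident softmax rows that both predict c0 with p = 0.991:
# image 0 really is c0 (confident and right), image 1 is really c1 (confident and wrong).
preds = np.full((2, 10), 0.001)
preds[:, 0] = 0.991
y_true = np.eye(10)[[0, 1]]

for max_pred in [None, 0.99, 0.93]:
    p = preds if max_pred is None else do_clip_sketch(preds, max_pred)
    accuracy = np.mean(p.argmax(axis=1) == y_true.argmax(axis=1))
    print("clip[{}] loss: {:.4f}, accuracy: {}".format(max_pred, log_loss(y_true, p), accuracy))
```

Clipping at 0.93 lowers the mean loss here from about 3.46 to about 2.46, almost entirely by capping the penalty on the wrong image, while accuracy stays at 0.5. Which side of the trade-off wins depends on how often the model is confidently wrong: with the 99.5% validation accuracy above, the small cost on the many correct images dominates, so the validation loss keeps falling as `max_pred` rises.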


### Predict on the Test Set

In [17]:
preds = fc_model.predict(conv_test_features, batch_size=batch_size*2)

In [29]:
subm = do_clip(preds, 0.93)  # clip the test predictions with max_pred = 0.93

In [48]:
subm_name = results_path+'subm.gz'

In [31]:
import pandas as pd
submission = pd.DataFrame(subm, columns=vgg.classes)
# Strip the 8-character test subdirectory prefix (presumably 'unknown/') from each filename
submission.insert(0, 'img', [a[8:] for a in test_filenames])
submission.head()

|   | img | c0 | c1 | c2 | c3 | c4 | c5 | c6 | c7 | c8 | c9 |
|---|-----|----|----|----|----|----|----|----|----|----|----|
| 0 | img_1.jpg | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.93 | 0.007778 | 0.007778 | 0.007778 | 0.007778 |
| 1 | img_10.jpg | 0.007778 | 0.007778 | 0.007778 | 0.016752 | 0.007778 | 0.93 | 0.007778 | 0.007778 | 0.007778 | 0.007778 |
| 2 | img_100.jpg | 0.93 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 |
| 3 | img_1000.jpg | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.93 | 0.007778 |
| 4 | img_100000.jpg | 0.007778 | 0.007778 | 0.007778 | 0.93 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 | 0.007778 |
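If `test_filenames` came from Keras' `flow_from_directory` (as seems likely), each entry includes the subdirectory the test images sit in. The slice `a[8:]` removes that prefix only if it is exactly 8 characters long, such as `unknown/`; that subdirectory name is an assumption, since it isn't shown in this notebook. `os.path.basename` does the same job without the hard-coded length (the sample names below are hypothetical):

```python
import os

sample_names = ['unknown/img_1.jpg', 'unknown/img_10.jpg']  # hypothetical test filenames

sliced = [a[8:] for a in sample_names]               # relies on an 8-character prefix
based = [os.path.basename(a) for a in sample_names]  # drops whatever directory is there
print(sliced)  # ['img_1.jpg', 'img_10.jpg']
print(based)   # ['img_1.jpg', 'img_10.jpg']
```

The two only differ if the prefix length changes (say, a different subdirectory name), in which case the slice silently produces wrong image ids.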


In [49]:
submission.to_csv(subm_name, index=False, compression='gzip')
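Before uploading, it can be worth reading the compressed file back to confirm it has the header Kaggle expects (`img` followed by `c0` to `c9`), no index column, and one row per image. A minimal, self-contained sketch of that check, using a tiny made-up DataFrame and a temporary file (the two rows are illustrative, not real predictions):

```python
import os
import tempfile

import numpy as np
import pandas as pd

classes = ['c%d' % i for i in range(10)]
demo = pd.DataFrame(np.full((2, 10), 0.1), columns=classes)
demo.insert(0, 'img', ['img_1.jpg', 'img_10.jpg'])

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, 'subm.gz')
    demo.to_csv(fname, index=False, compression='gzip')  # same call as the notebook
    check = pd.read_csv(fname, compression='gzip')       # read it back into memory

print(list(check.columns))
print(check.shape)  # (2, 11)
```

In the notebook itself the equivalent check would be `pd.read_csv(subm_name, compression='gzip')`, comparing its row count with `len(test_filenames)`.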

In [53]:
from IPython.display import FileLink
FileLink('../input/results/subm.gz')