**What I want to do: ** After making a few submissions, I want to devise a method to ascertain the accuracy and sensitivity of my predictions based on the validation set. Naturally, I would want to run this first before actually submitting to kaggle, but hey, this isn't real life.

## Admin stuff

In [11]:
%matplotlib inline

In [12]:
from __future__ import division, print_function

import os, json
from glob import glob
import numpy as np
np.set_printoptions(precision=4, linewidth=100)
from matplotlib import pyplot as plt
from datetime import datetime
import re

In [13]:
data_dir = "data_redux"
train_path = "data_redux/train/"
test_path = "data_redux/test/"
validation_path = "data_redux/valid/"
sample_train_path = "data_redux/sample/train/"
sample_validation_path = "data_redux/sample/valid/"
results_path = "data_redux/results/"

In [14]:
# some useful utilities and the vgg16 model
import utils; reload(utils)
from utils import plots
from utils import save_array, load_array

In [44]:
# import the vgg16 model
import vgg16; reload(vgg16)
from vgg16 import Vgg16, image

In [16]:
# instantiate the vgg16 model
vgg = Vgg16()

## Load model weights into the vgg16 model

In [17]:
vgg.model.load_weights(results_path+'fulltraining.h5')

## Visualise results using the validation set

Jeremy's big idea is to run 5 different kinds of visual tests to gain intuition about model performance:<br>
1. get a bunch of labels correctly labeled at random
2. get a bunch of lables incorrectly labeled at random
3. get a bunch of labels that are most correctly labeled (highest probability and turned out to be right)
4. get a bunch of most incorrectly labeled (highest probability and turned out to be incorrect)
5. get a bunch of labels most unsure of (mid range probability)

In [18]:
!ls $sample_validation_path

cat  dog


In [19]:
batch_size = 64

In [20]:
valBatches, preds = vgg.test(sample_validation_path, batch_size=batch_size)

Found 20 images belonging to 2 classes.


In [26]:
preds[:10]

array([[  9.9988e-01,   1.1681e-04],
       [  1.0000e+00,   1.3519e-24],
       [  1.0000e+00,   1.0034e-15],
       [  1.0000e+00,   1.0388e-11],
       [  1.0000e+00,   4.1032e-21],
       [  1.0000e+00,   7.2762e-10],
       [  1.0000e+00,   1.1195e-12],
       [  2.0145e-14,   1.0000e+00],
       [  4.2306e-12,   1.0000e+00],
       [  7.1702e-09,   1.0000e+00]], dtype=float32)

In [27]:
valBatches.filenames

['cat/cat.2326.jpg',
 'cat/cat.3917.jpg',
 'cat/cat.4073.jpg',
 'cat/cat.5143.jpg',
 'cat/cat.533.jpg',
 'cat/cat.7536.jpg',
 'cat/cat.8058.jpg',
 'dog/dog.1070.jpg',
 'dog/dog.1125.jpg',
 'dog/dog.11472.jpg',
 'dog/dog.2875.jpg',
 'dog/dog.4028.jpg',
 'dog/dog.4906.jpg',
 'dog/dog.5354.jpg',
 'dog/dog.5894.jpg',
 'dog/dog.6094.jpg',
 'dog/dog.6350.jpg',
 'dog/dog.7553.jpg',
 'dog/dog.792.jpg',
 'dog/dog.9642.jpg']

### Compare actuals with predicted performance

In [42]:
valBatches.classes[:10]

array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int32)

In [36]:
actual_dog = [1 if 'dog' in file else 0 for file in valBatches.filenames]
actual_dog[:10]

[0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

In [38]:
predicted_dog = np.round(np.dot(preds,[0,1]),1)
predicted_dog[:10]

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  1.,  1.])

In [39]:
import pandas as pd

In [40]:
df_perf = pd.DataFrame({
        'filename':valBatches.filenames,
        'actuals':actual_dog,
        'predicted':predicted_dog
    })
df_perf.head(10)

Unnamed: 0,actuals,filename,predicted
0,0,cat/cat.2326.jpg,0.0
1,0,cat/cat.3917.jpg,0.0
2,0,cat/cat.4073.jpg,0.0
3,0,cat/cat.5143.jpg,0.0
4,0,cat/cat.533.jpg,0.0
5,0,cat/cat.7536.jpg,0.0
6,0,cat/cat.8058.jpg,0.0
7,1,dog/dog.1070.jpg,1.0
8,1,dog/dog.1125.jpg,1.0
9,1,dog/dog.11472.jpg,1.0


## Prepare plots

In [50]:
# number of plots to view at the same time
nView=4

In [71]:
def plot_idx(idx, titles=None, sample=False):
    if sample==True:
        plots([image.load_img(sample_validation_path+filenames[i] for i in idx)],titles=titles)
    else:
        plots([image.load_img(validation_path+filenames[i] for i in idx)],titles=titles)

### Plot where pred = actual

In [62]:
correct = np.where(actual_dog==predicted_dog)[0]
correct

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19])

In [63]:
idx = np.random.choice(correct,nView)
idx

array([11, 17,  3, 18])

In [69]:
titles = [valBatches.filenames[i] for i in idx]
titles

['dog/dog.4028.jpg', 'dog/dog.7553.jpg', 'cat/cat.5143.jpg', 'dog/dog.792.jpg']

In [72]:
plot_idx(idx,titles=titles, sample=True)

AttributeError: 'generator' object has no attribute 'read'