# Enter State Farm

In [1]:
%matplotlib inline

import utils
from utils import *
from IPython.display import FileLink

Using TensorFlow backend.


In [3]:
batch_size = 64

In [4]:
%pwd

'/home/ubuntu/kaggle/state-farm-driver-detection/code'

In [5]:
path = "../input/sample/" # "../Input/"

## Create batches

In [17]:
batches = get_batches(path+'train', batch_size=batch_size)
val_batches = get_batches(path+'valid', shuffle=False, batch_size=batch_size)

Found 1500 images belonging to 10 classes.
Found 944 images belonging to 10 classes.


In [18]:
trn_classes, val_classes, trn_labels, val_labels, filenames, val_filenames, test_filenames = get_classes(path)

Found 1500 images belonging to 10 classes.
Found 944 images belonging to 10 classes.
Found 50 images belonging to 1 classes.


## Basic models

### Linear model

First, we try the simplest model and use default parameters. Note the trick of making the first layer a batchnorm layer - that way we don't have to worry about normalizing the input ourselves.

In [8]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 256, 256)),
    Flatten(),
    Dense(10, activation='softmax')
])
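
To make the batchnorm trick concrete, here is a minimal NumPy sketch of what a channels-first batchnorm layer does to raw pixel values at training time. It is an illustration only: it leaves out the learned scale and shift and the moving averages that the Keras layer also keeps, the `batchnorm_channels_first` name and the random batch are made up for the example, and the epsilon is an assumed value.

```python
import numpy as np

def batchnorm_channels_first(x, eps=1e-3):
    """Normalize each channel (axis=1) to roughly zero mean and unit variance.

    x has shape (batch, channels, height, width). Statistics are pooled over
    every axis except the channel axis, which is what axis=1 selects.
    """
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

# Hypothetical batch of 8 small RGB images with raw 0-255 pixel values
rng = np.random.default_rng(0)
raw = rng.uniform(0, 255, size=(8, 3, 32, 32))

normed = batchnorm_channels_first(raw)
channel_means = normed.mean(axis=(0, 2, 3))
channel_stds = normed.std(axis=(0, 2, 3))
print(channel_means, channel_stds)
```

Because the statistics come from the data itself, the later layers see inputs on a sensible scale without any manual preprocessing step.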

As you can see below, this training is going nowhere...

In [9]:
model.compile(Adam(), loss='categorical_crossentropy', metrics=['accuracy'])

In [10]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/2
 - 19s - loss: 14.2623 - acc: 0.1110 - val_loss: 14.1839 - val_acc: 0.1200
Epoch 2/2
 - 16s - loss: 14.4107 - acc: 0.1057 - val_loss: 14.1839 - val_acc: 0.1200


<keras.callbacks.History at 0x7f52ac13e630>
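
The `steps_per_epoch` and `validation_steps` arguments above are just the number of batches needed to cover every image once. A quick sketch of that arithmetic, using the image counts reported by `get_batches` (the `steps_for` helper is a name introduced here for illustration):

```python
from math import ceil

def steps_for(n_images, batch_size):
    """Number of batches needed to see every image once."""
    return ceil(n_images / batch_size)

train_steps = steps_for(1500, 64)  # 1500 training images in the sample
valid_steps = steps_for(944, 64)   # 944 validation images in the sample

# The final training batch is smaller than the rest
last_train_batch = 1500 - (train_steps - 1) * 64
print(train_steps, valid_steps, last_train_batch)
```

Rounding up rather than down makes sure the partial batch at the end is not dropped.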

Let's first check the number of parameters, to see that there are enough of them to find some useful relationships:

In [11]:
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
batch_normalization_1 (Batch (None, 3, 256, 256)       12        
_________________________________________________________________
flatten_1 (Flatten)          (None, 196608)            0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1966090   
Total params: 1,966,102
Trainable params: 1,966,096
Non-trainable params: 6
_________________________________________________________________


Nearly 2 million parameters - that should be enough. Incidentally, it's worth checking you understand why this is the number of weights in the dense layer (the 10 bias terms account for the rest of its 1,966,090 parameters):

In [12]:
10*3*256*256

1966080
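
For completeness, here is a sketch that reproduces every number in the summary, not just the dense weights. It assumes the usual batchnorm bookkeeping of four values per channel (scale, shift, moving mean, moving variance), with the two moving statistics counted as non-trainable.

```python
channels, height, width, classes = 3, 256, 256, 10

flat_inputs = channels * height * width        # size of the Flatten output
dense_weights = flat_inputs * classes          # one weight per input per class
dense_biases = classes                         # one bias per class
dense_params = dense_weights + dense_biases

bn_params = 4 * channels                       # scale, shift, moving mean, moving variance
bn_non_trainable = 2 * channels                # the two moving statistics

total_params = dense_params + bn_params
trainable_params = total_params - bn_non_trainable
print(flat_inputs, dense_params, total_params, trainable_params)
```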

**Since we have a simple model with no regularization and plenty of parameters, it seems most likely that our learning rate is too high. Perhaps it is jumping to a solution where it predicts one or two classes with high confidence, so that it can give a zero prediction to as many classes as possible - that's the best approach for a model that is no better than random, and it is likely where we would end up with a high learning rate.** So let's check:

In [13]:
np.round(model.predict_generator(batches, steps=ceil(batches.n/batches.batch_size))[:10], 2)

array([[ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.]], dtype=float32)
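
Saturated predictions like these also explain the loss values of around 14 in the training log. The sketch below assumes that Keras clips predicted probabilities to a small epsilon before taking the log (1e-7 is, as far as we know, the default); under that assumption a confidently wrong prediction costs about 16.12, so the mean loss depends only on the accuracy.

```python
import math

KERAS_EPSILON = 1e-7  # assumed clipping value, not read from the notebook

def saturated_loss(accuracy, eps=KERAS_EPSILON):
    """Mean cross-entropy when every prediction is a clipped one-hot vector."""
    loss_if_right = -math.log(1 - eps)  # practically zero
    loss_if_wrong = -math.log(eps)      # about 16.12
    return accuracy * loss_if_right + (1 - accuracy) * loss_if_wrong

# Validation accuracy reported in the log above was 0.12
val_loss_estimate = saturated_loss(0.12)
print(round(val_loss_estimate, 4))
```

The estimate lands on the logged `val_loss` of 14.1839, which supports the idea that the model is predicting a single class with full confidence.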

**Our hypothesis was correct. In the rows shown above it predicts a single class (column index 5) every time, with very high confidence. So let's try a lower learning rate:**

In [19]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 256, 256)),
    Flatten(),
    Dense(10, activation='softmax')
])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/2
 - 20s - loss: 2.0278 - acc: 0.3437 - val_loss: 1.6475 - val_acc: 0.4820
Epoch 2/2
 - 19s - loss: 1.0710 - acc: 0.6569 - val_loss: 0.9277 - val_acc: 0.7214


<keras.callbacks.History at 0x7f5297c68c18>

In [20]:
model.optimizer.lr = 0.001

In [21]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=4, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/4
 - 20s - loss: 0.6927 - acc: 0.8065 - val_loss: 0.7329 - val_acc: 0.7892
Epoch 2/4
 - 19s - loss: 0.4792 - acc: 0.8916 - val_loss: 0.5997 - val_acc: 0.8337
Epoch 3/4
 - 19s - loss: 0.3643 - acc: 0.9249 - val_loss: 0.5398 - val_acc: 0.8570
Epoch 4/4
 - 19s - loss: 0.2731 - acc: 0.9548 - val_loss: 0.5177 - val_acc: 0.8581


<keras.callbacks.History at 0x7f5297b88da0>

We're stabilizing at a validation accuracy of about 0.86, which is a lot better than random. **Before moving on, let's check that our validation set on the sample is large enough that it gives consistent results:**

In [22]:
rnd_batches = get_batches(path+'valid', batch_size=batch_size*2, shuffle=True)

Found 944 images belonging to 10 classes.


In [23]:
val_res = [model.evaluate_generator(rnd_batches, steps=ceil(rnd_batches.n/rnd_batches.batch_size)) for i in range(10)]
np.round(val_res, 2)

array([[ 0.52,  0.86],
       [ 0.52,  0.86],
       [ 0.51,  0.86],
       [ 0.51,  0.86],
       [ 0.54,  0.85],
       [ 0.51,  0.86],
       [ 0.51,  0.86],
       [ 0.53,  0.85],
       [ 0.49,  0.87],
       [ 0.55,  0.85]])

**Yup, pretty consistent - if we see improvements of 3% or more, it's probably not random, based on the above samples.**
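
To put a number on "pretty consistent", here is a small sketch that summarizes the ten evaluations printed above. The values are copied from that output; nothing new is measured.

```python
from statistics import mean

# Columns of the val_res array above: loss and accuracy for each of the 10 runs
val_loss = [0.52, 0.52, 0.51, 0.51, 0.54, 0.51, 0.51, 0.53, 0.49, 0.55]
val_acc = [0.86, 0.86, 0.86, 0.86, 0.85, 0.86, 0.86, 0.85, 0.87, 0.85]

acc_spread = max(val_acc) - min(val_acc)     # widest gap between any two runs
loss_spread = max(val_loss) - min(val_loss)
mean_acc = mean(val_acc)
print(round(mean_acc, 3), round(acc_spread, 2), round(loss_spread, 2))
```

The accuracy never moves by more than about two points between runs, which is why a three-point improvement is a reasonable threshold for "probably not random".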

### L2 regularization

The previous model is over-fitting a lot, but we can't use dropout since we only have one layer. We can try to decrease overfitting in our model by adding [l2 regularization](http://www.kdnuggets.com/2015/04/preventing-overfitting-neural-networks.html/2) (i.e. add the sum of squares of the weights to our loss function):

In [24]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 256, 256)),
    Flatten(),
    Dense(10, activation='softmax', kernel_regularizer=l2(0.01))
])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/2
 - 20s - loss: 2.2352 - acc: 0.3491 - val_loss: 1.6082 - val_acc: 0.5328
Epoch 2/2
 - 19s - loss: 1.3217 - acc: 0.6330 - val_loss: 1.2038 - val_acc: 0.7087


<keras.callbacks.History at 0x7f5295acc940>
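
As a reminder of what the regularizer adds, here is a minimal pure-Python sketch of an L2 penalty: the strength multiplied by the sum of squared weights, added on top of the data loss. The tiny weight matrix, the data-loss value, and the `l2_penalty` name are made up for illustration.

```python
def l2_penalty(weights, strength):
    """Strength times the sum of squared weights, as added to the loss."""
    return strength * sum(w * w for row in weights for w in row)

# Hypothetical 2x2 weight matrix and data loss
weights = [[1.0, -2.0], [0.5, 0.0]]
data_loss = 1.2

penalty = l2_penalty(weights, strength=0.01)
total_loss = data_loss + penalty
print(penalty, total_loss)
```

Large weights are penalized much more than small ones, so the optimizer is nudged towards solutions that spread the work over many small weights. That is also why the losses reported with the regularizer are not directly comparable to the ones from the unregularized model.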

In [25]:
model.optimizer.lr=0.001

In [26]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=4, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/4
 - 20s - loss: 0.9050 - acc: 0.7952 - val_loss: 1.1461 - val_acc: 0.6886
Epoch 2/4
 - 19s - loss: 0.7044 - acc: 0.8790 - val_loss: 0.8585 - val_acc: 0.8008
Epoch 3/4
 - 19s - loss: 0.5473 - acc: 0.9382 - val_loss: 0.7482 - val_acc: 0.8612
Epoch 4/4
 - 19s - loss: 0.4602 - acc: 0.9628 - val_loss: 0.6861 - val_acc: 0.8782


<keras.callbacks.History at 0x7f5295d591d0>

In [27]:
layers = model.layers
dense_idx = [idx for idx, layer in enumerate(layers) if type(layer)==Dense][0]
layers[dense_idx].kernel_regularizer = l2(0.1)

In [28]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/2
 - 20s - loss: 0.4129 - acc: 0.9668 - val_loss: 0.6251 - val_acc: 0.8972
Epoch 2/2
 - 19s - loss: 0.3620 - acc: 0.9847 - val_loss: 0.6058 - val_acc: 0.8951


<keras.callbacks.History at 0x7f5295cfd898>

Looks like we can get to roughly 90% validation accuracy on the sample this way. This will be a good benchmark for our future models - if we can't beat that, then we're not even beating a linear model trained on a sample, so we'll know that's not a good approach.

### Single hidden layer

The next simplest model is to add a single hidden layer.

In [20]:
model = Sequential([
    BatchNormalization(axis=1, input_shape=(3, 256, 256)),
    Flatten(),
    Dense(100, activation='relu'),
    BatchNormalization(),
    Dense(10, activation='softmax', kernel_regularizer=l2(0.01))
])
model.compile(Adam(lr=1e-5), loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/2
76s - loss: 2.3246 - acc: 0.2967 - val_loss: 5.7354 - val_acc: 0.1100
Epoch 2/2
80s - loss: 1.6303 - acc: 0.5384 - val_loss: 2.0633 - val_acc: 0.3600


<keras.callbacks.History at 0x1d68ab5e748>

In [21]:
model.optimizer.lr=0.01
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=5, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Epoch 1/5
58s - loss: 1.1671 - acc: 0.6984 - val_loss: 1.4368 - val_acc: 0.5500
Epoch 2/5
84s - loss: 0.9416 - acc: 0.8051 - val_loss: 1.4365 - val_acc: 0.5500
Epoch 3/5
88s - loss: 0.7385 - acc: 0.8768 - val_loss: 1.1436 - val_acc: 0.7100
Epoch 4/5
81s - loss: 0.5632 - acc: 0.9401 - val_loss: 0.9215 - val_acc: 0.8300
Epoch 5/5
82s - loss: 0.5080 - acc: 0.9518 - val_loss: 0.8652 - val_acc: 0.7900


<keras.callbacks.History at 0x1d68a89eef0>

Not looking very encouraging... which isn't surprising since we know that CNNs are a much better choice for computer vision problems. So we'll try one.

### Single conv layer

Two conv layers with max pooling, followed by a simple dense network, make a good simple CNN to start with:

In [29]:
def conv1(batches):
    model = Sequential([
        BatchNormalization(axis=1, input_shape=(3, 256, 256)),
        Conv2D(32, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3), strides=(3,3)),
        Conv2D(64, (3,3), activation='relu'),
        BatchNormalization(axis=1),
        MaxPooling2D((3,3), strides=(3,3)),
        Flatten(),
        Dense(200, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='softmax')
    ])
    
    model.compile(Adam(lr=1e-4), loss='categorical_crossentropy', metrics=['accuracy'])
    model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=2, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))
    model.optimizer.lr=0.001
    model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=5, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))
    
    return model
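
It helps to know how big the input to the first dense layer is. The sketch below traces the spatial size through the layers, assuming the Keras defaults of unpadded ('valid') convolutions and pooling; `conv_out` is a helper introduced here for illustration.

```python
def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution or pooling window."""
    return (size - kernel) // stride + 1

size = 256
size = conv_out(size, 3)            # Conv2D(32, (3,3))
size = conv_out(size, 3, stride=3)  # MaxPooling2D((3,3), strides=(3,3))
size = conv_out(size, 3)            # Conv2D(64, (3,3))
size = conv_out(size, 3, stride=3)  # MaxPooling2D((3,3), strides=(3,3))

flat_inputs = 64 * size * size      # 64 filters in the last conv layer
dense_weights = flat_inputs * 200   # kernel of Dense(200)
dense_megabytes = dense_weights * 4 / 1e6  # float32 weights
print(size, flat_inputs, dense_weights, round(dense_megabytes, 1))
```

The flattened size matches the `[46656,200]` shape in the error output. That kernel is only tens of megabytes, so an out-of-memory error while creating it suggests (this is a guess, not something the log confirms) that the GPU was already nearly full from the models built earlier in the session.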

In [30]:
conv1(batches)

Epoch 1/2


ResourceExhaustedError: OOM when allocating tensor with shape[46656,200]
	 [[Node: dense_5/kernel/Assign = Assign[T=DT_FLOAT, _class=["loc:@dense_5/kernel"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](dense_5/kernel, dense_5/random_uniform)]]


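When this model does fit in memory (the run above stopped with an out-of-memory error before training started), the training set very rapidly reaches a very high accuracy. So if we could regularize this, perhaps we could get a reasonable result.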

So, what kind of regularization should we try first? As we discussed in lesson 3, we should start with data augmentation.

## Data augmentation

To find the best data augmentation parameters, we can try each type of data augmentation, one at a time. For each type, we can try four very different levels of augmentation, and see which is the best. In the steps below we've only kept the single best result we found. We're using the CNN we defined above, since we have already observed it can model the data quickly and accurately.
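
The search procedure described above can be sketched as a small loop. Everything here is hypothetical scaffolding: the candidate levels are placeholders rather than the values actually tried, and `fake_train_and_score` is a stand-in with made-up preferences so the sketch runs without a GPU. In a real run, `train_and_score` would build an `ImageDataGenerator` from the keyword arguments, train the CNN, and return the final validation accuracy.

```python
def sweep_one_at_a_time(candidates, train_and_score):
    """Try each augmentation type on its own and keep the best level of each."""
    best = {}
    for name, levels in candidates.items():
        scores = {level: train_and_score({name: level}) for level in levels}
        best[name] = max(scores, key=scores.get)
    return best

# Placeholder levels: four very different settings per augmentation type
candidates = {
    "width_shift_range": [0.05, 0.1, 0.2, 0.4],
    "rotation_range": [5, 15, 30, 60],
}

def fake_train_and_score(aug_kwargs):
    """Stand-in scorer with made-up preferences, NOT real training results."""
    (name, level), = aug_kwargs.items()
    made_up_optimum = {"width_shift_range": 0.2, "rotation_range": 30}
    return -abs(level - made_up_optimum[name])

best = sweep_one_at_a_time(candidates, fake_train_and_score)
print(best)
```

Testing one type at a time keeps the number of training runs linear in the number of augmentation types, at the cost of ignoring interactions between them.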

#### Width shift: move the image left and right

In [12]:
gen_t = image.ImageDataGenerator(width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
conv1(batches)

Found 1401 images belonging to 10 classes.
Epoch 1/2
1944s - loss: 2.1249 - acc: 0.2919 - val_loss: 2.4428 - val_acc: 0.1100
Epoch 2/2
1639s - loss: 1.4945 - acc: 0.5242 - val_loss: 1.7059 - val_acc: 0.4200
Epoch 1/5
1615s - loss: 1.1828 - acc: 0.6492 - val_loss: 1.0483 - val_acc: 0.7900
Epoch 2/5
2194s - loss: 0.9654 - acc: 0.7266 - val_loss: 0.8579 - val_acc: 0.7800
Epoch 3/5
4085s - loss: 0.8456 - acc: 0.7635 - val_loss: 0.8808 - val_acc: 0.7900
Epoch 4/5
1593s - loss: 0.7182 - acc: 0.8090 - val_loss: 0.5704 - val_acc: 0.8700
Epoch 5/5
1145s - loss: 0.6184 - acc: 0.8516 - val_loss: 0.6392 - val_acc: 0.7900


<keras.models.Sequential at 0x16818b3f550>

#### Height shift: move the image up and down

In [13]:
gen_t = image.ImageDataGenerator(height_shift_range=0.05)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
conv1(batches)

Found 1401 images belonging to 10 classes.
Epoch 1/2
1176s - loss: 1.8384 - acc: 0.4183 - val_loss: 2.3572 - val_acc: 0.2600
Epoch 2/2
1156s - loss: 0.9802 - acc: 0.7110 - val_loss: 1.1929 - val_acc: 0.6000
Epoch 1/5
1167s - loss: 0.6405 - acc: 0.8239 - val_loss: 0.5394 - val_acc: 0.8500
Epoch 2/5
1187s - loss: 0.4766 - acc: 0.8857 - val_loss: 0.3585 - val_acc: 0.9400
Epoch 3/5
1895s - loss: 0.3868 - acc: 0.9162 - val_loss: 0.4166 - val_acc: 0.9000
Epoch 4/5
1802s - loss: 0.2990 - acc: 0.9432 - val_loss: 0.4091 - val_acc: 0.8800
Epoch 5/5
1966s - loss: 0.2761 - acc: 0.9460 - val_loss: 0.2518 - val_acc: 0.9600


<keras.models.Sequential at 0x16818ff1710>

#### Random shear angles (max in radians)

In [None]:
gen_t = image.ImageDataGenerator(shear_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
conv1(batches)

#### Rotation: max in degrees

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
conv1(batches)

#### Channel shift: randomly changing the R,G,B colors

In [None]:
gen_t = image.ImageDataGenerator(channel_shift_range=20)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
conv1(batches)

#### And finally, putting it all together!

In [None]:
gen_t = image.ImageDataGenerator(rotation_range=15, height_shift_range=0.05, 
                shear_range=0.1, channel_shift_range=20, width_shift_range=0.1)
batches = get_batches(path+'train', gen_t, batch_size=batch_size)
model = conv1(batches)  # keep the returned model so the cells below can continue training it

**At first glance, this isn't looking encouraging, since the validation accuracy is poor and getting worse. But the training set is getting better, and still has a long way to go in accuracy - so we should try annealing our learning rate and running more epochs before we make a decision.**

In [None]:
model.optimizer.lr=0.0001
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=5, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))
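
The manual learning-rate changes add up to a simple step schedule: two warm-up epochs at 1e-4 and five epochs at 1e-3 inside `conv1`, then annealing back to 1e-4. Here is a sketch of that schedule as a plain function; `learning_rate_for` is a name introduced for illustration, and epochs are counted from zero across all the fit calls.

```python
def learning_rate_for(epoch):
    """Step schedule mirroring the manual changes made in this notebook."""
    if epoch < 2:
        return 1e-4  # gentle start inside conv1
    if epoch < 7:
        return 1e-3  # faster phase inside conv1
    return 1e-4      # annealed phase for the later epochs

# First twelve epochs: 2 warm-up, 5 fast, then the first 5 annealed epochs
schedule = [learning_rate_for(epoch) for epoch in range(12)]
print(schedule)
```

A function of this shape could, in principle, be handed to a scheduling callback such as Keras's `LearningRateScheduler` instead of editing the optimizer by hand between fit calls, though that is not what this notebook does.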

Lucky we tried that - we're starting to make progress! Let's keep going.

In [None]:
model.fit_generator(batches, steps_per_epoch=ceil(batches.n/batches.batch_size), epochs=25, verbose=2,
                   validation_data=val_batches, validation_steps=ceil(val_batches.n/val_batches.batch_size))

Amazingly, using nothing but a small sample, a simple (not pre-trained) model with no dropout, and data augmentation, we're getting results that would get us into the top 50% of the competition! This looks like a great foundation for our further experiments.

To go further, we'll need to use the whole dataset: dropout and data volume are closely related, so we can't tune dropout without using all the data.