# Pretrained networks for fashion

The Fashion MNIST dataset is fairly large (60,000 training images), but what if we only had a few dozen examples from each category?

In this notebook and the next we illustrate the benefit of pretraining a network on a large dataset that is only somewhat related to the task you want to solve. In our case we will pretrain on MNIST digits to help classify Fashion MNIST images.

Here we train a small fully connected network from scratch using only 300 examples, to simulate what it is like to have only a small dataset.

In the next notebook we will pretrain a more complex model on MNIST digits and then reuse that trained network for Fashion MNIST, again with only 300 examples. Hopefully that gives better performance than we get here.

In a real-world setting you would take a network pretrained on ImageNet or a similar dataset and apply it to your use case (say rock, paper, scissors). However, training on ImageNet takes too long for a classroom setting, so we make do with these smaller toy datasets.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

from keras.datasets import fashion_mnist
from keras import utils

from sklearn.model_selection import train_test_split


(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# convert the 8-bit pixel values to floats and rescale them to [0, 1]
X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

In [2]:
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=10000,
                                                  random_state=42)
# Splitting in two steps (first hold out 10,000 validation examples, then
# keep only 300 training examples) means the validation set is the same in
# both cases, so validation accuracies can be compared directly.
X_train = X_train[:300]
y_train = y_train[:300]
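
Taking the first 300 examples after a shuffled split does not guarantee that every class is equally represented. A minimal sketch of how one might check the class balance is shown below. The helper `class_counts` and the example labels are hypothetical and not part of the original notebook. In the notebook you would pass the integer labels `y_train` at this point, before they are one-hot encoded.

```python
from collections import Counter


def class_counts(labels, num_classes=10):
    """Return a list with the number of examples for each class index."""
    counts = Counter(int(label) for label in labels)
    return [counts.get(c, 0) for c in range(num_classes)]


# Hypothetical labels for illustration only; in the notebook use y_train.
example_labels = [0, 1, 1, 3, 9, 9, 9]
example_counts = class_counts(example_labels)
print(example_counts)
```

With 300 examples and 10 classes you would expect roughly 30 per class. A class with far fewer examples is likely to be harder for the network to learn.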

In [3]:
X_train.shape

In [4]:
X_val.shape

In [5]:
num_classes = 10
y_train = utils.to_categorical(y_train, num_classes)
y_val = utils.to_categorical(y_val, num_classes)
y_test = utils.to_categorical(y_test, num_classes)
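
To make the encoding step concrete, here is a small NumPy sketch of what one-hot encoding does conceptually. The function `one_hot` is a hypothetical stand-in written for illustration. It is not the Keras implementation of `to_categorical`, and details such as the output dtype may differ.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Turn integer class labels into rows with a single 1 at the label index."""
    labels = np.asarray(labels, dtype=int)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype='float32')[labels]


encoded = one_hot([0, 2, 9], num_classes=10)
print(encoded.shape)
print(encoded[1])
```

Each row has exactly one non-zero entry, which is the format the `categorical_crossentropy` loss expects for its targets.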

## Fully connected network from scratch

Let's see how well a small network trained from scratch on 300 examples performs.

Question: how does the performance compare to what you get when you use the full training set (maybe with a more powerful model, i.e. more 'neurons' per layer)?

In [6]:
from keras.models import Model
from keras.layers import Input, Dense, Activation, Flatten

In [7]:
batch_size = 64

x = Input(shape=(28, 28, ))

h = Flatten()(x)

h = Dense(20)(h)
h = Activation('relu')(h)
h = Dense(20)(h)
h = Activation('relu')(h)
h = Dense(10)(h)
y = Activation('softmax')(h)

net = Model(x, y)
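
It can help to know how many trainable parameters this network has compared to the 300 training examples. The sketch below counts them by hand for the layer sizes used above. The helper `dense_params` is a hypothetical name introduced for illustration. Calling `net.summary()` in the notebook should report the same numbers.

```python
def dense_params(n_inputs, n_units):
    """Number of weights plus biases in one fully connected layer."""
    return n_inputs * n_units + n_units


# Flattened 28x28 input, two hidden layers of 20 units, 10 output classes.
layer_sizes = [28 * 28, 20, 20, 10]
per_layer = [dense_params(n_in, n_out)
             for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
total_params = sum(per_layer)
print(per_layer, total_params)
```

Even this small network has far more parameters than there are training examples, which is one reason to expect overfitting.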

In [8]:
net.compile(loss='categorical_crossentropy',
            optimizer='adam',
            metrics=['accuracy'])

In [9]:
history = net.fit(X_train, y_train,
                  batch_size=batch_size,
                  epochs=20,
                  verbose=1,
                  validation_data=(X_val, y_val))
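
One way to read the result of training is to compare training and validation accuracy per epoch. The sketch below assumes the metrics are stored under the keys `'accuracy'` and `'val_accuracy'`, as in recent Keras versions (older versions use `'acc'` and `'val_acc'`). The helper `summarize_history` and the numbers in `fake_hist` are made up purely to show the call. In the notebook you would pass `history.history` instead.

```python
def summarize_history(hist, metric='accuracy'):
    """Report the best validation epoch and the final train/validation gap."""
    train = hist[metric]
    val = hist['val_' + metric]
    best = max(range(len(val)), key=lambda i: val[i])
    return {
        'best_epoch': best + 1,            # epochs are counted from 1
        'best_val': val[best],
        'final_gap': train[-1] - val[-1],  # large gap suggests overfitting
    }


# Made-up values for illustration only, not results from this notebook.
fake_hist = {'accuracy': [0.30, 0.60, 0.80],
             'val_accuracy': [0.28, 0.55, 0.50]}
summary = summarize_history(fake_hist)
print(summary)
```

A training accuracy that keeps rising while validation accuracy stalls or drops is the typical signature of a model memorising a small training set.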