To create a generator, we'll want numpy and keras. Begin by importing these.

In [None]:
import numpy as np
import keras

Now we can create a class inheriting from the 'keras.utils.Sequence' class. 

There are a few required methods for this class:

\__init\__ - your standard initialisation method

\__len\__ - returning the number of batches per epoch

\__getitem\__ - the core of the class, which returns (from a given index) a batch of data for training, testing and evaluation.

Let's look at an example generator.

In [0]:
class MyGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, ids, train_dir):
        'Initialization'
        self.ids = ids
        self.train_dir = train_dir
    def __len__(self):
        'Denotes the number of batches per epoch'
        return len(self.ids)
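    def __getitem__(self, index):
        'Loads and returns one pre-created batch'
        # Map the position requested by Keras to the identifier of a batch file
        batch_id = self.ids[index]
        # load data (numpy was imported as np, so call np.load rather than numpy.load)
        X = np.load(self.train_dir + '/batch_data/lbatchX_' + str(batch_id) + '.npy')
        Y = np.load(self.train_dir + '/batch_data/lbatchY_' + str(batch_id) + '.npy')
        return X, Y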

This is probably as simple as a generator can get. 

I precreate the batches as individual files 'lbatchX_ZZ.npy' and 'lbatchY_ZZ.npy' (where ZZ defines the index of each batch); 'lbatchX_ZZ.npy' contains the inputs and 'lbatchY_ZZ.npy' contains the outputs. All the generator has to do is load in the right files and spit out the numpy arrays. 
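
As a minimal sketch of that pre-creation step, the snippet below splits an in-memory dataset into fixed-size batches and saves each one as a pair of files following the naming scheme above. The random data, array shapes, batch size, and temporary directory are all assumptions made for illustration; substitute your own data and paths.

```python
import os
import tempfile

import numpy as np

# Hypothetical dataset: 20 samples with 8 features each, one target per sample.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(20, 8))
Y_all = rng.normal(size=(20, 1))

batch_size = 5                  # assumed value; choose what fits in memory
train_dir = tempfile.mkdtemp()  # stand-in for a real training directory
batch_dir = os.path.join(train_dir, 'batch_data')
os.makedirs(batch_dir, exist_ok=True)

# Write one X file and one Y file per batch, indexed from zero.
n_batches = len(X_all) // batch_size
for batch_id in range(n_batches):
    rows = slice(batch_id * batch_size, (batch_id + 1) * batch_size)
    np.save(os.path.join(batch_dir, 'lbatchX_' + str(batch_id) + '.npy'), X_all[rows])
    np.save(os.path.join(batch_dir, 'lbatchY_' + str(batch_id) + '.npy'), Y_all[rows])
```

A generator built with 'range(n_batches)' as its ids and 'train_dir' as its directory would then find every file it asks for.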

When Keras fits, evaluates, or predicts using a generator of this class, it iterates over the indices from zero up to (but not including) the length of the generator, as calculated in the \__len\__ method, and supplies each index as the input to the \__getitem\__ method. If training or evaluating, the generator must return both input and output variables for the model; if predicting, the generator can return only the inputs (but returning both is permissible).
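
The indexing protocol can be pictured with a plain loop. This is a simplified illustration, not Keras's actual implementation: 'ToyGenerator' is a hypothetical stand-in that exposes the same \__len\__ and \__getitem\__ interface but returns placeholder strings instead of loading files, so it runs without Keras or any data on disk.

```python
class ToyGenerator:
    'Hypothetical stand-in with the same interface; not a Keras class'
    def __init__(self, ids):
        self.ids = list(ids)

    def __len__(self):
        # Number of batches per epoch
        return len(self.ids)

    def __getitem__(self, index):
        # index is a position (0, 1, 2, ...), which is mapped to a batch id
        batch_id = self.ids[index]
        # Placeholders standing in for the loaded input and output arrays
        return 'X_' + str(batch_id), 'Y_' + str(batch_id)


gen = ToyGenerator(range(90, 100))

# Roughly what happens during one unshuffled pass over the generator
batches = []
for index in range(len(gen)):
    X, Y = gen[index]
    batches.append((X, Y))
```

Note that the index always starts at zero, even though the batch ids here run from 90 to 99; the lookup in 'self.ids' is what connects the two.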

I give the directory containing the training data as a variable in class initialization as an example of an optional input to the generator.

To use this generator, let's pretend that we've created one hundred batches and that the first ninety are a training set and the last ten are a validation set.

In [0]:
trainGen = MyGenerator(range(90), "trainingDir")
valGen = MyGenerator(range(90, 100), "trainingDir")

Now you can just construct your model, compile it, and fit using the generators created above. Setting 'shuffle = True' shuffles the order of the indices supplied to \__getitem\__ at the start of each epoch. If you supply 'steps_per_epoch', only this number of steps will be carried out in each epoch; otherwise, the whole set of indices will be used.
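
To make the effect of shuffling and 'steps_per_epoch' concrete, here is a rough sketch of which indices might be requested in a single epoch. It only mimics the behaviour described above and is not how Keras is implemented internally; the helper 'epoch_indices' and the value of 30 steps are hypothetical.

```python
import random


def epoch_indices(n_batches, rng, steps_per_epoch=None, shuffle=True):
    'Hypothetical helper: the batch indices visited in one epoch'
    indices = list(range(n_batches))
    if shuffle:
        # A fresh random order is drawn for every epoch
        rng.shuffle(indices)
    if steps_per_epoch is not None:
        # Only the first steps_per_epoch batches are used in this epoch
        indices = indices[:steps_per_epoch]
    return indices


rng = random.Random(0)  # seeded only to make the sketch repeatable
full_epoch = epoch_indices(90, rng)
short_epoch = epoch_indices(90, rng, steps_per_epoch=30)
```

With no 'steps_per_epoch', every one of the ninety training batches is visited exactly once, in a shuffled order; with it, only thirty of them are visited in that epoch.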

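Similarly, you can use 'model.predict_generator' and 'model.evaluate_generator' to replace 'model.predict' and 'model.evaluate'. In newer Keras versions the '_generator' methods are deprecated, and 'model.fit', 'model.predict' and 'model.evaluate' accept a generator of this class directly.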

In [None]:
# 'model' is assumed to be a Keras model that has already been built and compiled
model.fit_generator(generator=trainGen, validation_data=valGen, epochs=100, shuffle=True)
model.predict_generator(trainGen)
model.evaluate_generator(valGen)