Saving and restoring a model is important when we want to resume training from a particular point. Checkpointing lets us save all of the parameters during training and pick up again whenever needed. This notebook uses the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset, keeping only the first 1000 samples so that training is fast enough to demonstrate how model saving works.

In [1]:
# Import needed packages.
import os

import tensorflow as tf
from tensorflow import keras

In [2]:
# Download dataset.
(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()

train_labels = train_labels[:1000]
test_labels = test_labels[:1000]

train_images = train_images[:1000].reshape(-1, 28 * 28) / 255.0
test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0

In [3]:
# Model construction.
def model_create():
    model = keras.models.Sequential([
        keras.layers.Dense(512, activation=tf.nn.relu, input_shape=(784,)),
        keras.layers.Dropout(0.2),
        keras.layers.Dense(10, activation=tf.nn.softmax),
    ])

    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.sparse_categorical_crossentropy,
                  metrics=['accuracy'])
    return model

model = model_create()
model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 512)               401920    
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
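
The parameter counts in the summary above can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output unit, and the dropout layer has no parameters. A minimal sketch of that arithmetic (the helper name `dense_params` is made up for this illustration):

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

hidden = dense_params(784, 512)   # first Dense layer on flattened 28x28 images
output = dense_params(512, 10)    # softmax layer over the 10 digit classes
total = hidden + output

print(hidden, output, total)
```

The three values should agree with the `Param #` column and the total reported by `model.summary()`.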


In [4]:
# Save model checkpoint.
ckpt_path = 'ckpts/model1.ckpt'
ckpt_dir = os.path.dirname(ckpt_path)

# Create checkpoint callback.
ckpt_callback = keras.callbacks.ModelCheckpoint(ckpt_path,
                                                save_weights_only=True,
                                                verbose=1)

model.fit(train_images, train_labels, epochs=10,
          validation_data=(test_images, test_labels),
          callbacks=[ckpt_callback])

Train on 1000 samples, validate on 1000 samples
Epoch 1/10

Epoch 00001: saving model to ckpts/model1.ckpt
Epoch 2/10

Epoch 00002: saving model to ckpts/model1.ckpt
Epoch 3/10

Epoch 00003: saving model to ckpts/model1.ckpt
Epoch 4/10

Epoch 00004: saving model to ckpts/model1.ckpt
Epoch 5/10

Epoch 00005: saving model to ckpts/model1.ckpt
Epoch 6/10

Epoch 00006: saving model to ckpts/model1.ckpt
Epoch 7/10

Epoch 00007: saving model to ckpts/model1.ckpt
Epoch 8/10

Epoch 00008: saving model to ckpts/model1.ckpt
Epoch 9/10

Epoch 00009: saving model to ckpts/model1.ckpt
Epoch 10/10

Epoch 00010: saving model to ckpts/model1.ckpt


<tensorflow.python.keras.callbacks.History at 0x1c6f04d13c8>

Now if we navigate to the Jupyter notebook directory, we will find a new folder called "ckpts" containing three files: "checkpoint", "model1.ckpt.data-00000-of-00001" and "model1.ckpt.index". The next step is to create a new, untrained model and evaluate it on the validation set, then load the checkpoint into it and evaluate again. We expect the new model to have low accuracy at first because its parameters are randomly initialized, and high accuracy once the checkpoint is loaded, since the weights then come from a trained model.
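
As a rough sanity check on what "low accuracy" should mean here: with 10 classes, predictions that carry no information about the labels are right about one time in ten. The sketch below simulates this with uniformly random guesses, which is only an approximation of what a freshly initialized network does; the label and guess lists are synthetic and the seed is arbitrary.

```python
import random

random.seed(0)  # arbitrary seed, only for repeatability

n_samples, n_classes = 1000, 10

# Synthetic stand-ins for the real labels and for an uninformed model's guesses.
labels = [random.randrange(n_classes) for _ in range(n_samples)]
guesses = [random.randrange(n_classes) for _ in range(n_samples)]

chance_acc = sum(l == g for l, g in zip(labels, guesses)) / n_samples
print('Accuracy of random guessing:', chance_acc)
```

An untrained model evaluated on 1000 test samples should land somewhere in this neighbourhood, well below the accuracy of a trained one.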

In [5]:
# Create and evaluate a new model.
new_model = model_create()

loss, acc = new_model.evaluate(test_images, test_labels)
print('Accuracy of the untrained new model is: ', acc)

Accuracy of the untrained new model is:  0.073


In [6]:
# Load the checkpoint and evaluate again.
new_model.load_weights(ckpt_path)

loss, acc = new_model.evaluate(test_images, test_labels)
print('Accuracy of the new model with loaded checkpoint is: ', acc)

Accuracy of the new model with loaded checkpoint is:  0.869


The difference comes from comparing a randomly initialized model with a trained one, which means the checkpoint parameters were loaded successfully into the new model. The next step is to do the same thing but save the model only every 5 epochs, which is more common and useful in deep learning training.

In [7]:
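# Save a checkpoint every 5 epochs; {epoch:02d} puts the epoch number in the file name.
ckpt_path = 'ckpts/model2-{epoch:02d}.ckpt'
ckpt_dir = os.path.dirname(ckpt_path)

# Create checkpoint callback.
# Note: `period` is deprecated in later TensorFlow releases in favor of `save_freq`.
ckpt_callback = keras.callbacks.ModelCheckpoint(ckpt_path,
                                                save_weights_only=True,
                                                verbose=1,
                                                period=5)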

model = model_create()
model.fit(train_images, train_labels, epochs=50,
          validation_data=(test_images, test_labels),
          callbacks=[ckpt_callback],
          verbose=0)


Epoch 00005: saving model to ckpts/model2-05.ckpt

Epoch 00010: saving model to ckpts/model2-10.ckpt

Epoch 00015: saving model to ckpts/model2-15.ckpt

Epoch 00020: saving model to ckpts/model2-20.ckpt

Epoch 00025: saving model to ckpts/model2-25.ckpt

Epoch 00030: saving model to ckpts/model2-30.ckpt

Epoch 00035: saving model to ckpts/model2-35.ckpt

Epoch 00040: saving model to ckpts/model2-40.ckpt

Epoch 00045: saving model to ckpts/model2-45.ckpt

Epoch 00050: saving model to ckpts/model2-50.ckpt


<tensorflow.python.keras.callbacks.History at 0x1c6efcdc1d0>
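
The file names in the log above follow from the path template: the `{epoch:02d}` placeholder is filled with the epoch number, zero-padded to two digits, and with `period=5` only every fifth epoch triggers a save. A small sketch of that naming logic in plain Python (it mimics the behaviour for illustration and is not the callback's actual code):

```python
ckpt_template = 'ckpts/model2-{epoch:02d}.ckpt'
epochs, period = 50, 5

# Epochs are numbered from 1 in the log, so saves happen at 5, 10, ..., 50.
saved_paths = [ckpt_template.format(epoch=e)
               for e in range(1, epochs + 1)
               if e % period == 0]

print(len(saved_paths))
print(saved_paths[0], saved_paths[-1])
```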

Now we have 10 new saved checkpoints in the "ckpts" folder, one for every 5 epochs of training. We can retrieve the most recent checkpoint directly with the following code:

In [8]:
latest_ckpt = tf.train.latest_checkpoint(ckpt_dir)
latest_ckpt

'ckpts\\model2-50.ckpt'
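
As far as we understand, this lookup relies on the small text file named "checkpoint" that sits next to the saved weights and records the most recent checkpoint prefix in a `model_checkpoint_path` entry. The sketch below imitates that lookup in plain Python on a temporary directory. The helper `read_latest_prefix` and the hand-written file contents are illustrative assumptions, not TensorFlow's implementation; `tf.train.latest_checkpoint` remains the call to use in practice.

```python
import os
import re
import tempfile

def read_latest_prefix(ckpt_dir):
    """Return the newest checkpoint prefix recorded in the state file, or None."""
    state_file = os.path.join(ckpt_dir, 'checkpoint')
    if not os.path.exists(state_file):
        return None
    with open(state_file) as f:
        text = f.read()
    # Match only the line that starts with model_checkpoint_path.
    match = re.search(r'^model_checkpoint_path:\s*"(.*)"\s*$', text, re.MULTILINE)
    if match is None:
        return None
    return os.path.join(ckpt_dir, match.group(1))

# Build a throwaway directory with a hand-written state file (assumed format).
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'checkpoint'), 'w') as f:
    f.write('model_checkpoint_path: "model2-50.ckpt"\n')
    f.write('all_model_checkpoint_paths: "model2-50.ckpt"\n')

latest = read_latest_prefix(demo_dir)
print(os.path.basename(latest))
```

Because the directory and file name are joined with the platform's path separator, a backslash in the result would be consistent with the notebook having been run on Windows.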

In [9]:
# Load the latest checkpoint and evaluate the model.
new_model = model_create()
new_model.load_weights(latest_ckpt)

loss, acc = new_model.evaluate(test_images, test_labels)
print('Accuracy of the new model with loaded checkpoint is: ', acc)

Accuracy of the new model with loaded checkpoint is:  0.874


TensorFlow also provides a way to save and load the model weights manually, as shown below (note that the file name does not need an extension):

In [10]:
new_model.save_weights('./ckpts/new_model_manual')  # save weights manually
model = model_create()
model.load_weights('./ckpts/new_model_manual')  # load weights manually

loss, acc = model.evaluate(test_images, test_labels)
print('Accuracy of the new model with loaded checkpoint is: ', acc)

Accuracy of the new model with loaded checkpoint is:  0.874


Furthermore, TensorFlow also provides a way to save the entire model rather than only its parameters. The saved model also contains the optimizer and the model configuration.

In [11]:
model = model_create()

model.fit(train_images, train_labels, epochs=10,
          validation_data=(test_images, test_labels))

model.save('ckpts/entire_model')  # save the entire model

Train on 1000 samples, validate on 1000 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


In [12]:
new_model = keras.models.load_model('ckpts/entire_model')  # load the entire model

new_model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 512)               401920    
_________________________________________________________________
dropout_5 (Dropout)          (None, 512)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 10)                5130      
Total params: 407,050
Trainable params: 407,050
Non-trainable params: 0
_________________________________________________________________
