<a href="https://colab.research.google.com/github/ujjwaltyagi355/Machine-learning/blob/master/Model_Pruning__on_MNIST_Dataset.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Model Pruning on the MNIST Dataset


As ever larger models are proposed to push accuracy and performance, the need to reduce their storage and computational cost grows as well. Model pruning is one way to achieve this.
The most common type of model pruning is weight pruning, which removes (sets to zero) the least useful weights of a trained model; in some variants it is applied to a randomly initialised model instead.
With fewer non-zero parameters, the model takes less memory once compressed and can be cheaper to compute on runtimes that exploit sparsity, at the cost of a minor compromise in accuracy.
Let's see how it's done...
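
Before turning to Keras, the core idea of magnitude-based weight pruning can be sketched in plain NumPy. This is only an illustration under simple assumptions, not how the library implements it: the function name `prune_by_magnitude` and the random example matrix are made up for this sketch.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to zero out
    if k == 0:
        return weights.copy()
    # Indices of the k smallest magnitudes; ties are broken by position.
    drop = np.argsort(flat, kind="stable")[:k]
    mask = np.ones(flat.size, dtype=bool)
    mask[drop] = False
    return weights * mask.reshape(weights.shape)

# Hypothetical 4x5 weight matrix, pruned to 50% sparsity.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5))
w_pruned = prune_by_magnitude(w, sparsity=0.5)
print("zeros before:", int(np.sum(w == 0)), "after:", int(np.sum(w_pruned == 0)))
```

The surviving entries keep their original values; only the mask decides which weights count as removed.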

We will be working with the Keras implementation.
First, install and import the required packages.

In [None]:
pip install -q tensorflow-model-optimization


In [None]:
import tempfile
import os

import tensorflow as tf
import numpy as np

from tensorflow import keras

%load_ext tensorboard

Now we train an MNIST classifier without pruning.
We start with a deliberately simple model.

In [None]:
# Load MNIST dataset
mnist = keras.datasets.mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# Normalize the input images so that each pixel value is between 0 and 1.
train_images = train_images / 255.0
test_images = test_images / 255.0

# Define the model architecture.
model = keras.Sequential([
  keras.layers.InputLayer(input_shape=(28, 28)),
  keras.layers.Reshape(target_shape=(28, 28, 1)),
  keras.layers.Conv2D(filters=12, kernel_size=(3, 3), activation='relu'),
  keras.layers.MaxPooling2D(pool_size=(2, 2)),
  keras.layers.Flatten(),
  keras.layers.Dense(10)
])

# Train the digit classification model
model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model.fit(
  train_images,
  train_labels,
  epochs=5,
  validation_split=0.25,
)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<tensorflow.python.keras.callbacks.History at 0x7f010a223710>

We will now calculate the baseline accuracy and save the model, so that its weights can serve as the pre-trained starting point for pruning later.

In [None]:
_, baseline_model_accuracy = model.evaluate(
    test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy)

_, keras_file = tempfile.mkstemp('.h5')
tf.keras.models.save_model(model, keras_file, include_optimizer=False)
print('Saved baseline model to:', keras_file)

Baseline test accuracy: 0.9771000146865845
Saved baseline model to: /tmp/tmpayddxtd7.h5


Here, we import the TensorFlow Model Optimization module as `tfmot` for pruning.
We then apply sparsity (zeroing out weights, and with them their connections) to the model, starting at 40% and ending at 85%.
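
To get a feel for how the target sparsity moves from 40% to 85%, here is a rough sketch of a polynomial ramp. It assumes the cubic form (`power=3`) that the tfmot documentation describes as the default for `PolynomialDecay`, and it ignores that the library only updates masks at intervals. The function name `polynomial_sparsity` is hypothetical; the batch size, epoch count and validation split are the values used in the pruning cell.

```python
import math

def polynomial_sparsity(step, initial_sparsity, final_sparsity,
                        begin_step, end_step, power=3):
    """Target sparsity at `step`, assuming a polynomial ramp from initial to final."""
    # Clamp progress to [0, 1] so the target stays fixed outside the pruning window.
    progress = min(max((step - begin_step) / (end_step - begin_step), 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power

# 60,000 MNIST training images, 25% held out for validation.
num_train = 60000 * (1 - 0.25)
steps_per_epoch = math.ceil(num_train / 128)  # batch size 128
end_step = steps_per_epoch * 4                # 4 epochs

for step in (0, end_step // 4, end_step // 2, end_step):
    s = polynomial_sparsity(step, 0.40, 0.85, 0, end_step)
    print(f"step {step:4d}: target sparsity {s:.3f}")
```

Under these assumptions most of the sparsity is added early in training, and the ramp flattens out as it approaches the final value.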

In [None]:
import tensorflow_model_optimization as tfmot

prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Compute end step to finish pruning after 4 epochs.
batch_size = 128
epochs = 4
validation_split = 0.25 # 25% of training set will be used for validation set. 

num_images = train_images.shape[0] * (1 - validation_split)
# end_step is the training step at which the schedule reaches its final sparsity:
# steps per epoch (rounded up) times the number of epochs.
end_step = np.ceil(num_images / batch_size).astype(np.int32) * epochs

# Define model for pruning.
pruning_params = {
      'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(initial_sparsity=0.40,
                                                               final_sparsity=0.85,
                                                               begin_step=0,
                                                               end_step=end_step)
}

# Wrap the model defined above so that it is pruned according to pruning_params.
model_for_pruning = prune_low_magnitude(model, **pruning_params)

# `prune_low_magnitude` requires a recompile.
model_for_pruning.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

model_for_pruning.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
prune_low_magnitude_reshape_ (None, 28, 28, 1)         1         
_________________________________________________________________
prune_low_magnitude_conv2d_2 (None, 26, 26, 12)        230       
_________________________________________________________________
prune_low_magnitude_max_pool (None, 13, 13, 12)        1         
_________________________________________________________________
prune_low_magnitude_flatten_ (None, 2028)              1         
_________________________________________________________________
prune_low_magnitude_dense_2  (None, 10)                40572     
Total params: 40,805
Trainable params: 20,410
Non-trainable params: 20,395
_________________________________________________________________


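Looking at the summary, the total parameter count is 40,805, of which 20,410 are trainable. The trainable count is the same as in the original model: wrapping the model does not remove any parameters. The 20,395 non-trainable parameters are bookkeeping variables added by the pruning wrappers, such as the masks that mark which weights are zeroed. Pruning therefore does not shrink the parameter count or speed up the model by itself; its benefit shows up once the zeroed weights are compressed.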
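
A quick arithmetic check reproduces the numbers in the summary above. The layer shapes come from the model definition; the breakdown of the non-trainable variables (one step counter per wrapper, plus a mask and a scalar threshold per pruned kernel) is an assumption about the wrapper's bookkeeping that happens to match the reported totals.

```python
# Kernel and bias sizes for the two layers that carry weights.
conv_kernel = 3 * 3 * 1 * 12      # 3x3 kernel, 1 input channel, 12 filters
conv_bias = 12
dense_kernel = 13 * 13 * 12 * 10  # 2028 flattened features, 10 classes
dense_bias = 10

trainable = conv_kernel + conv_bias + dense_kernel + dense_bias

# Assumed bookkeeping: one step counter for every wrapped layer, plus a mask
# (same shape as the kernel) and a scalar threshold for each pruned kernel.
step_counters = 5                  # reshape, conv2d, max_pool, flatten, dense
masks = conv_kernel + dense_kernel
thresholds = 2
non_trainable = step_counters + masks + thresholds

print("trainable:", trainable)                # 20410
print("non-trainable:", non_trainable)        # 20395
print("total:", trainable + non_trainable)    # 40805
```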


Fine-tune with pruning for four epochs.

`tfmot.sparsity.keras.UpdatePruningStep` is required during training,
and `tfmot.sparsity.keras.PruningSummaries` provides logs for tracking progress and debugging.

In [None]:
logdir = tempfile.mkdtemp()

callbacks = [
  tfmot.sparsity.keras.UpdatePruningStep(),
  tfmot.sparsity.keras.PruningSummaries(log_dir=logdir),
]
  
model_for_pruning.fit(train_images, train_labels,
                  batch_size=batch_size, epochs=epochs, validation_split=validation_split,
                  callbacks=callbacks)

Epoch 1/4
Epoch 2/4
Epoch 3/4
Epoch 4/4


<tensorflow.python.keras.callbacks.History at 0x7f0057437978>

In [None]:
_, model_for_pruning_accuracy = model_for_pruning.evaluate(
   test_images, test_labels, verbose=0)

print('Baseline test accuracy:', baseline_model_accuracy) 
print('Pruned test accuracy:', model_for_pruning_accuracy)

Baseline test accuracy: 0.9771000146865845
Pruned test accuracy: 0.9733999967575073


Comparing the baseline and pruned accuracy, we can see that pruning had only a small impact: test accuracy dropped from about 97.7% to about 97.3%, even though the schedule targets 85% sparsity in the pruned weights.

Now, let's see how to reduce the size of the model on disk: we are going to compress the saved model file.
Both `tfmot.sparsity.keras.strip_pruning`, which removes the pruning wrappers and their extra variables, and applying a standard compression algorithm (e.g. via gzip) are necessary to see the compression benefits of pruning.

Creating a compressible model.


In [None]:
model_for_export = tfmot.sparsity.keras.strip_pruning(model_for_pruning)

_, pruned_keras_file = tempfile.mkstemp('.h5')
tf.keras.models.save_model(model_for_export, pruned_keras_file, include_optimizer=False)
print('Saved pruned Keras model to:', pruned_keras_file)

Saved pruned Keras model to: /tmp/tmpw3dtf_8l.h5
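
To confirm that the stripped model is actually sparse, one could count the exact zeros in each weight array. The sketch below works on a plain list of NumPy arrays, which is what `model_for_export.get_weights()` would return in this notebook; the helper name `report_sparsity` and the two demo arrays are made up for illustration.

```python
import numpy as np

def report_sparsity(weight_arrays):
    """Return the fraction of exact zeros in each array of a weight list."""
    return [float(np.mean(w == 0)) for w in weight_arrays]

# Made-up stand-ins for a pruned kernel (85% zeros) and a bias vector.
kernel = np.zeros(100, dtype=np.float32)
kernel[:15] = 1.0
bias = np.ones(10, dtype=np.float32)

fractions = report_sparsity([kernel, bias])
print(fractions)
```

On the real model, the kernels would be expected to show a fraction near the final sparsity of 0.85, while biases are, as far as I know, left unpruned by default.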


Creating a compressible model for TFLite.

In [None]:
converter = tf.lite.TFLiteConverter.from_keras_model(model_for_export)
pruned_tflite_model = converter.convert()

_, pruned_tflite_file = tempfile.mkstemp('.tflite')

with open(pruned_tflite_file, 'wb') as f:
  f.write(pruned_tflite_model)

print('Saved pruned TFLite model to:', pruned_tflite_file)

Saved pruned TFLite model to: /tmp/tmp3e_u6w8y.tflite


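Creating a function that compresses the model file and returns the size of the compressed file. Despite its name, the helper writes a zip archive with DEFLATE, the same compression algorithm that gzip uses.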

In [None]:
def get_gzipped_model_size(file):
  # Returns size of gzipped model, in bytes.
  import os
  import zipfile

  _, zipped_file = tempfile.mkstemp('.zip')
  with zipfile.ZipFile(zipped_file, 'w', compression=zipfile.ZIP_DEFLATED) as f:
    f.write(file)

  return os.path.getsize(zipped_file)

In [None]:
print("Size of gzipped baseline Keras model: %.2f bytes" % (get_gzipped_model_size(keras_file)))
print("Size of gzipped pruned Keras model: %.2f bytes" % (get_gzipped_model_size(pruned_keras_file)))
print("Size of gzipped pruned TFlite model: %.2f bytes" % (get_gzipped_model_size(pruned_tflite_file)))

Size of gzipped baseline Keras model: 77978.00 bytes
Size of gzipped pruned Keras model: 25630.00 bytes
Size of gzipped pruned TFlite model: 24403.00 bytes


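Here, we see that pruning reduced the compressed size of the Keras model from 77,978 bytes to 25,630 bytes, roughly a 3x reduction. The pruned TFLite model is slightly smaller still, at 24,403 bytes.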
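
Why do zeroed weights compress so well? The uncompressed array is the same size either way, but long runs of identical zero bytes are cheap for DEFLATE to encode. A small sketch with `zlib` on a random array, used here as a made-up stand-in for the Dense kernel rather than the trained weights, shows the effect:

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
dense = rng.normal(size=20280).astype(np.float32)  # same count as the Dense kernel

# Zero the 85% of entries with the smallest magnitude.
threshold = np.quantile(np.abs(dense), 0.85)
pruned = np.where(np.abs(dense) < threshold, 0.0, dense).astype(np.float32)

dense_size = len(zlib.compress(dense.tobytes()))
pruned_size = len(zlib.compress(pruned.tobytes()))
print("raw bytes (both arrays):", dense.nbytes)
print("compressed dense:", dense_size)
print("compressed pruned:", pruned_size)
```

The exact byte counts depend on the random data and the zlib version, so treat them as indicative only.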

The model size could be reduced further, to around 10x smaller, by combining pruning with quantization.
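
As a rough illustration of why the two techniques combine well, the sketch below applies a simple symmetric int8 quantization to a pruned random array. This is a plain NumPy stand-in and not the TFLite quantization path; the array and the scale computation are assumptions made for the example.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=20280).astype(np.float32)
# Prune to roughly 85% zeros by magnitude.
weights[np.abs(weights) < np.quantile(np.abs(weights), 0.85)] = 0.0

# Symmetric int8 quantization: a single float scale, and zero stays exactly zero.
scale = float(np.max(np.abs(weights))) / 127.0
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
restored = quantized.astype(np.float32) * scale

float_size = len(zlib.compress(weights.tobytes()))
int8_size = len(zlib.compress(quantized.tobytes()))
print("float32 pruned, compressed:", float_size)
print("int8 pruned, compressed:", int8_size)
print("max round-trip error:", float(np.max(np.abs(restored - weights))))
```

Each surviving weight now takes one byte instead of four, and the pruned zeros remain zeros, so the compressed size shrinks again at the cost of a small rounding error per weight.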