# Quantization of models

<a target="_blank" href="https://colab.research.google.com/github/toelt-llc/HSLU-WSCS_2025/blob/master/06%20-%20Quantization_of_models_Complete_examples.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

(C) Umberto Michelucci

umberto.michelucci@toelt.ai

www.toelt.ai


In [None]:
!pip3 uninstall tensorflow
!pip3 install tf-nightly

Uninstalling tensorflow-2.1.0rc1:
  Would remove:
    /tensorflow-2.1.0/python3.6/tensorflow-2.1.0rc1.dist-info/*
    /tensorflow-2.1.0/python3.6/tensorflow/*
    /tensorflow-2.1.0/python3.6/tensorflow_core/*
Proceed (y/n)? y
  Successfully uninstalled tensorflow-2.1.0rc1
Collecting tf-nightly
  Downloading https://files.pythonhosted.org/packages/53/c5/b0217554ef2b896042f6b78f0bd6e3dfc707d7c467a20ae219ab178d92cc/tf_nightly-2.2.0.dev20200113-cp36-cp36m-manylinux2010_x86_64.whl (449.5MB)
Collecting tb-nightly<2.3.0a0,>=2.2.0a0
  Downloading https://files.pythonhosted.org/packages/ba/32/0379b65809c879b75a736f667c01cc59a9a0c8d760670b04a12b62078388/tb_nightly-2.2.0a20200106-py3-none-any.whl (3.9MB)
Collecting tf-estimator-nightly
  Downloading https://files.pythonhosted.org/packages/7c/92/a7d06f4d28090d0d8cb0e960942d7eafc50f4de45b6629966dfac8ff47e7/tf_estim

In [None]:
#
# DON'T USE THE FOLLOWING MAGIC COMMAND IF YOU ARE INSTALLING TF-NIGHTLY:
# IT WOULD SELECT THE WRONG VERSION OF TENSORFLOW.
#
#try:
  # %tensorflow_version only exists in Colab.
  #%tensorflow_version 2.x
#except Exception:
#  pass


import tensorflow as tf

In [None]:
print(tf.__version__)

2.12.0


In [None]:
import sys
import os

if sys.version_info.major >= 3:
    import pathlib
else:
    import pathlib2 as pathlib

# Add `models` to the python path.
models_path = os.path.join(os.getcwd(), "models")
sys.path.append(models_path)

In [None]:
saved_models_root = "/tmp/mnist_saved_model"

# Simple MNIST Model

## Fitting and saving the model

In [None]:
import tensorflow.keras as keras
import numpy as np

import matplotlib.pyplot as plt

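We first load the `MNIST` dataset through `tf.keras.datasets` and scale the pixel values to the range [0, 1]. The training images are then reshaped from `(60000, 28, 28)` into a `(60000, 784)` array, so that each image becomes a flat vector of 784 features.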

In [None]:
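mnist = tf.keras.datasets.mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Scale pixel values from [0, 255] to [0, 1].
x_train, x_test = x_train / 255.0, x_test / 255.0
print(x_train.shape)

# Flatten each 28x28 training image into a vector of 784 features.
x_train = x_train.reshape((60000, 784))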

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(60000, 28, 28)
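
The reshape above flattens each 28×28 image row by row, so pixel `(row, col)` ends up at index `row * 28 + col` of the 784-element vector. The following is a minimal pure-Python sketch of that mapping on a tiny made-up 2×3 image; the helper name `flatten_row_major` is illustrative and not part of the notebook.

```python
def flatten_row_major(image):
    """Flatten a 2D list row by row, like NumPy's default (C-order) reshape."""
    return [pixel for row in image for pixel in row]


# Hypothetical 2x3 "image"; the MNIST images are 28x28.
tiny_image = [[0, 1, 2],
              [3, 4, 5]]
flat = flatten_row_major(tiny_image)

n_cols = len(tiny_image[0])
row, col = 1, 2
# Pixel (row, col) lands at index row * n_cols + col in the flat vector.
assert flat[row * n_cols + col] == tiny_image[row][col]
print(flat)
```

For MNIST, `n_cols` is 28 and the flat vector has 28 * 28 = 784 entries. Note that the cell above reshapes only `x_train`; `x_test` would need the same reshape before being fed to the model.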


We build a deliberately simple network: one hidden layer with 128 neurons and `relu` activation, followed by an output layer with 10 neurons (one per class) and a `softmax` activation.

In [None]:
model = tf.keras.models.Sequential([
  tf.keras.layers.Dense(128, activation='relu'),
  tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(loss=keras.losses.sparse_categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])
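
To get a feeling for how large this network is, its trainable parameters can be counted by hand. The sketch below assumes the 784-feature input produced by the reshape above; the helper `dense_param_count` is a hypothetical name introduced only for this illustration.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units


# Input features, hidden units, output classes.
layer_sizes = [784, 128, 10]

total = sum(dense_param_count(n_in, n_out)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))
print(total)  # 101770
```

Almost all of these parameters sit in the first layer (784 * 128 weights), which is why the storage format of the weights dominates the size of the saved model.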

In [None]:
model.fit(x_train, y_train, epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7a518aa258a0>

The following cell will save the model. We will use this saved version in the next sections.

In [None]:
export_dir = "/tmp/mnist"
tf.saved_model.save(model, export_dir)



# Plain Conversion (without quantization)

As a first step, we convert the model without any quantization. This only changes its format to `TFLite`; the weights keep their original 32-bit floating-point values.

In [None]:
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_saved_model(export_dir)
converter.experimental_new_converter = True # otherwise you get a warning
tflite_model = converter.convert()

In [None]:
tflite_models_dir = pathlib.Path("/tmp/mnist_tflite_models/")
tflite_models_dir.mkdir(exist_ok=True, parents=True)

In [None]:
tflite_model_file = tflite_models_dir/"mnist_model.tflite"
tflite_model_file.write_bytes(tflite_model)

408768

In [None]:
from google.colab import files
files.download(tflite_models_dir/"mnist_model.tflite")

## Convert with weight quantization to 8-bits

Setting `optimizations` as follows quantizes the weights to 8 bits, so the converted model is smaller, as the last cell of this section shows. Going from 32-bit to 8-bit weights reduces the model size to roughly 1/4 of the original.
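
To illustrate what storing a weight in 8 bits means, here is a minimal pure-Python sketch of symmetric quantization with a single scale per tensor. It shows the general idea only and is not claimed to be the exact scheme the TFLite converter applies; the weights and the helper names `quantize_int8` and `dequantize` are made up for the example.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Made-up example weights.
weights = [-0.52, -0.1, 0.0, 0.03, 0.27, 0.8]

q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q)
print(f"scale = {scale:.6f}, max error = {max_error:.6f}")
```

Each weight is stored as one byte plus a shared scale, instead of four bytes per weight, at the cost of a small rounding error.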

In [None]:
# Optimize.OPTIMIZE_FOR_SIZE is deprecated and behaves the same as DEFAULT.
converter.optimizations = [tf.lite.Optimize.DEFAULT]

In [None]:
tflite_model_quant = converter.convert()
tflite_model_quant_file = tflite_models_dir/"mnist_model_quant.tflite"
tflite_model_quant_file.write_bytes(tflite_model_quant)



104024

In [None]:
from google.colab import files
files.download(tflite_models_dir/"mnist_model_quant.tflite")

In [None]:
!ls -lh {tflite_models_dir}

total 504K
-rw-r--r-- 1 root root 102K Sep  4 14:12 mnist_model_quant.tflite
-rw-r--r-- 1 root root 400K Sep  4 14:11 mnist_model.tflite
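
The file sizes reported above can be checked against a back-of-the-envelope estimate. The sketch below uses the byte counts returned by `write_bytes` in the earlier cells and assumes 4 bytes per parameter for float32 and 1 byte per parameter for 8-bit weights; what the remaining bytes contain (presumably graph structure and metadata) is an assumption, not something the notebook verifies.

```python
float_size = 408768  # bytes written for mnist_model.tflite
quant_size = 104024  # bytes written for mnist_model_quant.tflite

# Parameters of the 784 -> 128 -> 10 dense network (weights plus biases).
n_params = 784 * 128 + 128 + 128 * 10 + 10

ratio = quant_size / float_size
print(f"size ratio: {ratio:.3f}")
print(f"float32 estimate: {n_params * 4} bytes")
print(f"8-bit estimate:   {n_params * 1} bytes")
```

Both estimates land within a few kilobytes of the actual files, and the ratio is close to the expected 1/4.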


# Evaluation of accuracy of the models

In [None]:
interpreter = tf.lite.Interpreter(model_path=str(tflite_model_file))
interpreter.allocate_tensors()

In [None]:
interpreter_quant = tf.lite.Interpreter(model_path=str(tflite_model_quant_file))
interpreter_quant.allocate_tensors()

In [None]:
image_1 = x_train[0].astype(np.float32)
image_1 = image_1.reshape(1,784)

interpreter.set_tensor(interpreter.get_input_details()[0]["index"], image_1)
interpreter.invoke()
predictions = interpreter.get_tensor(
    interpreter.get_output_details()[0]["index"])

In [None]:
print(np.argmax(predictions))
print(y_train[0])

5
5


Now run the same image through the quantized model. Only the weights are stored as 8-bit values, so the input is still passed as `float32`:

In [None]:
image_1 = x_train[0].astype(np.float32)
image_1 = image_1.reshape(1,784)

interpreter_quant.set_tensor(
    interpreter_quant.get_input_details()[0]["index"], image_1)
interpreter_quant.invoke()
predictions = interpreter_quant.get_tensor(
    interpreter_quant.get_output_details()[0]["index"])

In [None]:
print(np.argmax(predictions))
print(y_train[0])

5
5
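
A single image says little about accuracy. One way to compare the two models more systematically is a small helper that scores any prediction function over a set of examples. The sketch below is pure Python and uses a stand-in predictor with made-up data; `accuracy`, `dummy_predict`, `demo_images`, and `demo_labels` are hypothetical names, not part of the notebook.

```python
def accuracy(predict_fn, images, labels):
    """Fraction of images whose predicted class matches the label."""
    correct = sum(1 for image, label in zip(images, labels)
                  if predict_fn(image) == label)
    return correct / len(labels)


def dummy_predict(scores):
    """Stand-in predictor: return the index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up "images" (already class scores) and labels, for demonstration only.
demo_images = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
demo_labels = [1, 0, 0, 0]

print(accuracy(dummy_predict, demo_images, demo_labels))  # 0.75
```

To apply this to the TFLite models, `predict_fn` would wrap the `set_tensor`, `invoke`, `get_tensor`, and `np.argmax` steps from the cells above, once for `interpreter` and once for `interpreter_quant`. Held-out images from `x_test`, reshaped to `(1, 784)` and cast to `float32`, would give a more meaningful estimate than a training image.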
