# Converting a model with TF Lite converter

<a target="_blank" href="https://colab.research.google.com/github/toelt-llc/HSLU-WSCS_2025/blob/master/05%20-%20Converting_a_model_with_TF_Lite_converter.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

(C) Umberto Michelucci

umberto.michelucci@toelt.ai

www.toelt.ai


First we simply import the packages we need. Note that we want to use `TensorFlow 2.x`. On Google Colab this could be selected with the magic command `%tensorflow_version 2.x`; that command works only in Google Colab, not on a local installation, and the cell below does not use it.

In [None]:
import numpy as np
import tensorflow as tf

It is always a good idea to check which version of `TensorFlow` you are actually using, to make sure you get what you need.

In [None]:
print(tf.__version__)

2.12.0
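If you want the notebook to stop early on an unexpected version, a minimal sketch of such a check follows. The helper name `major_version` is hypothetical, and the version string is hard-coded so that the snippet runs on its own; in the notebook you would pass `tf.__version__` instead.

```python
def major_version(version: str) -> int:
    """Return the leading integer of a version string such as '2.12.0'."""
    return int(version.split(".")[0])


# '2.12.0' is the version printed above; in the notebook pass tf.__version__.
assert major_version("2.12.0") >= 2, "This notebook expects TensorFlow 2.x"
```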


We first download the pretrained `MobileNetV2` network from the package `tf.keras.applications`; this is the model we will use in this example. Note the parameter `weights="imagenet"`: it means we want the entire network with the weights obtained by training on the `imagenet` dataset.

In [None]:
model = tf.keras.applications.MobileNetV2(
    weights="imagenet", input_shape=(224, 224, 3))

Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/mobilenet_v2/mobilenet_v2_weights_tf_dim_ordering_tf_kernels_1.0_224.h5


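Now let's convert the model.

The most basic usage of `TensorFlow Lite` is simply to instantiate a `converter`, optionally set its `optimizations` option, and then convert the model with `converter.convert()`. In the cell below the `optimizations` line is commented out, so the model is converted without them.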

## Model conversion

In [None]:
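converter = tf.lite.TFLiteConverter.from_keras_model(model)
# Uncomment to enable the default optimizations (e.g. weight quantization).
#converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.experimental_new_converter = True
# convert() returns the serialized TFLite model as a bytes object.
tflite_model = converter.convert()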
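The converted model is a serialized byte string, so a common next step is to write it to a `.tflite` file. The following is a minimal sketch of that step; the helper `save_tflite`, the file name, and the placeholder payload are assumptions made so that the snippet runs without TensorFlow. In the notebook you would pass `tflite_model` instead of the placeholder.

```python
import pathlib
import tempfile


def save_tflite(model_bytes: bytes, path: pathlib.Path) -> int:
    """Write the serialized model to `path` and return the number of bytes written."""
    return path.write_bytes(model_bytes)


# Stand-in payload so the snippet runs on its own; in the notebook pass `tflite_model`.
placeholder_model = b"\x00" * 1024

with tempfile.TemporaryDirectory() as tmp_dir:
    target = pathlib.Path(tmp_dir) / "mobilenet_v2.tflite"  # hypothetical file name
    written = save_tflite(placeholder_model, target)
    size_on_disk = target.stat().st_size
    print(written, size_on_disk)
```

A file saved this way can later be loaded with `tf.lite.Interpreter(model_path=...)` instead of `model_content=...`.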



## Inference with the converted model

To run inference with the converted model, we first need to instantiate an `interpreter` and then allocate the tensors it needs (for input and output).

In [None]:
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()

Now we can get information about the model's inputs (for example the shape) and outputs with the following code:

In [None]:
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

In [None]:
print(input_details)

[{'name': 'serving_default_input_1:0', 'index': 0, 'shape': array([  1, 224, 224,   3], dtype=int32), 'shape_signature': array([ -1, 224, 224,   3], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]


The output details printed below show that the output has the shape `[1, 1000]`, which means that the model has been trained on `1000` classes.

In [None]:
print(output_details)

[{'name': 'StatefulPartitionedCall:0', 'index': 177, 'shape': array([   1, 1000], dtype=int32), 'shape_signature': array([  -1, 1000], dtype=int32), 'dtype': <class 'numpy.float32'>, 'quantization': (0.0, 0), 'quantization_parameters': {'scales': array([], dtype=float32), 'zero_points': array([], dtype=int32), 'quantized_dimension': 0}, 'sparsity_parameters': {}}]
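The `quantization` entry printed above is `(0.0, 0)`, which is what a float model reports. For a quantized model this pair holds `(scale, zero_point)`, and a real value is recovered as `(q - zero_point) * scale`. The sketch below illustrates that mapping with plain NumPy; the helper name `dequantize` and the numeric parameters are hypothetical and not taken from this model.

```python
import numpy as np


def dequantize(values: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map quantized integers to floats; a scale of 0.0 is treated as 'not quantized'."""
    if scale == 0.0:
        return values.astype(np.float32)
    return (values.astype(np.float32) - zero_point) * scale


# Hypothetical quantization parameters, chosen only for illustration.
raw = np.array([0, 128, 255], dtype=np.uint8)
real = dequantize(raw, scale=0.5, zero_point=128)
print(real)
```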


The last step before inference is to read the input shape with

`input_details[0]['shape']`

and then, just for testing purposes, to create an array of random values with that shape, pass it to the `interpreter` with `set_tensor()`, and call `invoke()`.

In [None]:
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()
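Random values only exercise the plumbing. For a real image, the input has to be prepared the way the network expects. The sketch below assumes the Keras `MobileNetV2` convention of pixel values scaled to `[-1, 1]` (which is what `tf.keras.applications.mobilenet_v2.preprocess_input` produces) and reproduces it with NumPy; the helper `to_model_input` and the synthetic image are stand-ins.

```python
import numpy as np


def to_model_input(image_uint8: np.ndarray) -> np.ndarray:
    """Scale an HxWx3 uint8 image to [-1, 1] float32 and add a batch axis."""
    scaled = image_uint8.astype(np.float32) / 127.5 - 1.0
    return scaled[np.newaxis, ...]


# Synthetic stand-in image; a real one would first be resized to 224x224.
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

batch = to_model_input(fake_image)
print(batch.shape, batch.dtype)
```

The resulting array has the shape and dtype reported in `input_details`, so it could be passed to `interpreter.set_tensor(...)` in place of `input_data`.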

The next cell retrieves the resulting tensor: an array of 1000 values containing the probabilities that the input belongs to each class.

In [None]:
tflite_results = interpreter.get_tensor(output_details[0]['index'])
print(tflite_results.shape)

(1, 1000)
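In practice you usually want the most likely classes rather than all 1000 values. A minimal sketch of a top-k selection with NumPy follows; the helper `top_k` and the four-element score vector are stand-ins so that the snippet runs on its own. In the notebook you would pass `tflite_results[0]`.

```python
import numpy as np


def top_k(probabilities: np.ndarray, k: int = 5):
    """Return (class_index, probability) pairs for the k largest entries of a 1-D array."""
    order = np.argsort(probabilities)[::-1][:k]
    return [(int(i), float(probabilities[i])) for i in order]


# Stand-in scores; in the notebook pass tflite_results[0].
scores = np.array([0.05, 0.6, 0.1, 0.25], dtype=np.float32)
best = top_k(scores, k=2)
print(best)
```

To turn class indices into human-readable ImageNet labels, `tf.keras.applications.mobilenet_v2.decode_predictions` can be applied to the full `(1, 1000)` array.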


For comparison, we run the original Keras model on the same input.

In [None]:
tf_results = model(tf.constant(input_data))
print(tf_results.shape)

(1, 1000)


The next cell checks that the results from the original model and the converted one agree to 5 decimal places. Note that this will no longer hold if you apply quantization!

In [None]:
for tf_result, tflite_result in zip(tf_results, tflite_results):
  np.testing.assert_almost_equal(tf_result, tflite_result, decimal=5)
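When the check fails (for example after quantization), it helps to know how far apart the two outputs are. The sketch below reports the largest absolute difference and repeats the comparison with an explicit tolerance via `np.testing.assert_allclose`. The helper name and the small arrays are stand-ins; in the notebook you would pass `tf_results` and `tflite_results`.

```python
import numpy as np


def max_abs_difference(a, b) -> float:
    """Largest element-wise absolute difference between two array-likes."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


# Stand-in outputs that differ by a tiny amount, as float models typically do.
reference = np.array([0.1, 0.2, 0.7], dtype=np.float32)
converted = reference + np.float32(1e-7)

difference = max_abs_difference(reference, converted)
print(difference)

# Same idea as the check above, with the tolerance stated explicitly.
np.testing.assert_allclose(converted, reference, atol=1e-5, rtol=0)
```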

## Estimation of model size on disk

In [None]:
import pickle

size_estimate = len(pickle.dumps(tflite_model))
print(size_estimate)

13986741
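Since the converted model is already a byte string, `len(tflite_model)` should give the serialized size directly; pickling adds a small header, so the number above is slightly larger than the model itself. The sketch below measures that overhead on a placeholder payload (an assumption, used so that the snippet runs without the model).

```python
import pickle


def pickled_overhead(payload: bytes) -> int:
    """Number of extra bytes that pickling adds to a bytes object."""
    return len(pickle.dumps(payload)) - len(payload)


# Stand-in payload; in the notebook use `tflite_model`.
placeholder_model = b"\x00" * 4096
overhead = pickled_overhead(placeholder_model)
print(len(placeholder_model), overhead)
```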
