# Export a Quantized Keras Model With the Model Compression Toolkit (MCT)

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_export.ipynb)

## Overview
This tutorial demonstrates how to export a quantized Keras model to the `.keras` and TFLite formats using the Model Compression Toolkit (MCT). It covers loading a Keras model, applying post-training quantization (PTQ) with MCT, and exporting the quantized model to `.keras` and TFLite. The tutorial also shows how to use the exported model for inference.

## Summary
In this tutorial, we will cover:

1. Loading a pretrained Keras model (MobileNetV2) for demonstration purposes.
2. Applying post-training quantization to the model using the Model Compression Toolkit.
3. Exporting the quantized model to the `.keras` and TFLite formats.
4. Using the exported model for inference.

## Setup
Install the relevant packages:

In [None]:
TF_VER = '2.14.0'

!pip install -q tensorflow=={TF_VER}

In [None]:
import importlib.util  # the `util` submodule must be imported explicitly
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [None]:
from keras.applications.mobilenet_v2 import MobileNetV2

float_model = MobileNetV2()

## Quantize the Model with the Model Compression Toolkit
Let's begin by applying quantization using MCT. This process will prepare the model for export.

### Representative Dataset
Post-training quantization with MCT requires a representative dataset, which is used to collect the statistics needed to set the quantization parameters. This example uses random data for demonstration purposes only.

In [None]:
import numpy as np
import model_compression_toolkit as mct

# Quantize the model.
# Notice that here the representative dataset is random for demonstration only.
quantized_exportable_model, _ = mct.ptq.keras_post_training_quantization(float_model,
                                                                         representative_data_gen=lambda: [np.random.random((1, 224, 224, 3))])
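
In practice, the representative dataset is usually written as a generator function that yields a fixed number of batches, each wrapped in a list with one array per model input. The sketch below assumes that convention; `n_iter`, `batch_size`, and the function name are illustrative choices, and the random data should be replaced with real, preprocessed images for meaningful calibration.

```python
import numpy as np

n_iter = 10      # number of calibration batches (illustrative value)
batch_size = 4   # images per batch (illustrative value)

def representative_data_gen():
    # Yield one list per batch; the list holds one array per model input.
    # Random data is a stand-in for real, preprocessed calibration images.
    for _ in range(n_iter):
        yield [np.random.random((batch_size, 224, 224, 3)).astype(np.float32)]

# Materialize the batches once to check their count and shape.
batches = list(representative_data_gen())
print(len(batches), batches[0][0].shape)
```

A function like this could be passed as `representative_data_gen=representative_data_gen` in place of the lambda used above.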

## Keras Export
The model is exported as a TensorFlow `.keras` model in which both weights and activations are represented as float32.
Two quantization formats are available for this export: MCTQ and FAKELY_QUANT.

#### MCTQ

By default, `mct.exporter.keras_export_model` exports the quantized Keras model to a `.keras` model using custom quantizers from the `mct_quantizers` module.

In [None]:
# Path of exported model
keras_file_path = 'exported_model_mctq.keras'

# Export a keras model with mctq custom quantizers.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=keras_file_path)

Note that the exported model's size is unchanged compared to the quantized exportable model, because the weights are still stored as floats.

#### MCTQ - Loading the Exported Model

To load the exported model with MCTQ quantizers, use `mct.keras_load_quantized_model`:

In [None]:
loaded_model = mct.keras_load_quantized_model(keras_file_path)

#### Fakely-Quantized Format
To export a fakely-quantized model, use the `QuantizationFormat.FAKELY_QUANT` option. This format ensures that quantization is simulated but does not alter the data types of the weights and activations during export.

In [None]:
# Path of exported model
keras_file_path = 'exported_model_fakequant.keras'

# The default serialization format (KerasExportSerializationFormat.KERAS)
# produces a .keras model; QuantizationFormat.FAKELY_QUANT selects
# fakely-quantized weights and activations.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=keras_file_path,
                                quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)

Note that the fakely-quantized model has the same size as the quantized exportable model, as the weights are still represented as floats.
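
To see why a fakely-quantized model keeps its size, consider a minimal NumPy sketch of simulated quantization. It assumes a symmetric, per-tensor, 8-bit uniform quantizer, which is a common scheme but not necessarily the one MCT selects for each layer; the helper `fake_quantize` is hypothetical and not part of the MCT API.

```python
import numpy as np

def fake_quantize(x, n_bits=8):
    # Symmetric per-tensor quantization: snap floats to an integer grid,
    # then map back to float32 so the dtype (and storage size) is unchanged.
    # Assumes x contains at least one non-zero value.
    q_max = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(x)) / q_max
    q = np.clip(np.round(x / scale), -q_max - 1, q_max)
    return (q * scale).astype(np.float32)

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)
fq_weights = fake_quantize(weights)

print(fq_weights.dtype)                     # dtype is still float32
print(fq_weights.nbytes == weights.nbytes)  # same number of bytes
print(np.unique(fq_weights).size <= 256)    # at most 2**8 distinct values
```

The values are restricted to a small set of levels, but each one still occupies four bytes, so the saved file does not shrink.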

## TFLite Export
Two quantization formats are available for TFLite export: `INT8` and `FAKELY_QUANT`.

#### INT8 TFLite

The model is exported as a TFLite model in which weights and activations are represented as 8-bit integers.

In [None]:
tflite_file_path = 'exported_model_int8.tflite'

# Use KerasExportSerializationFormat.TFLITE for a TFLite model and QuantizationFormat.INT8 for 8-bit integers.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=tflite_file_path,
                                serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
                                quantization_format=mct.exporter.QuantizationFormat.INT8)

Compare the sizes of the float model and the quantized model:

In [None]:
import os

# Save float model to measure its size
float_file_path = 'exported_model_float.keras'
float_model.save(float_file_path)

print("Float model in MB:", os.path.getsize(float_file_path) / float(2 ** 20))
print("Quantized model in MB:", os.path.getsize(tflite_file_path) / float(2 ** 20))
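
The expected reduction follows from the storage width of each weight: a float32 value takes 4 bytes and an int8 value takes 1 byte. The sketch below illustrates this with a hypothetical parameter count (`n_params` is a rough stand-in, not a value measured for this model). Real files also contain graph structure and quantization parameters, so the measured ratio may differ from 4.

```python
import numpy as np

n_params = 3_500_000  # hypothetical parameter count, for illustration only

float_weights = np.zeros(n_params, dtype=np.float32)
int8_weights = np.zeros(n_params, dtype=np.int8)

# Size of the raw weight buffers, in the same units as the cell above.
float_mb = float_weights.nbytes / 2 ** 20
int8_mb = int8_weights.nbytes / 2 ** 20

print(f"float32 weights: {float_mb:.2f} MB")
print(f"int8 weights:    {int8_mb:.2f} MB")
print(f"ratio:           {float_mb / int8_mb:.1f}x")
```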

#### Fakely-Quantized TFLite

The model is exported as a TFLite model in which weights and activations are quantized but represented with a float data type.

In [None]:
# Path of exported model
tflite_file_path = 'exported_model_fakequant.tflite'

# Use KerasExportSerializationFormat.TFLITE for a TFLite model
# and QuantizationFormat.FAKELY_QUANT for fakely-quantized weights
# and activations.
mct.exporter.keras_export_model(model=quantized_exportable_model,
                                save_model_path=tflite_file_path,
                                serialization_format=mct.exporter.KerasExportSerializationFormat.TFLITE,
                                quantization_format=mct.exporter.QuantizationFormat.FAKELY_QUANT)

Note that the fakely-quantized model has the same size as the quantized exportable model, as the weights are still represented as floats.
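
The exported TFLite files can be run with the standard TensorFlow Lite interpreter. The following is a minimal sketch rather than an MCT-provided utility: the helper names (`quantize_input`, `dequantize_output`, `run_tflite_inference`) are hypothetical, and the code reads the input and output dtypes from the interpreter instead of assuming them, because the exact tensor types of the exported model are not stated here.

```python
import numpy as np

def quantize_input(x, scale, zero_point, dtype):
    # Map float values to the integer range expected by an integer input tensor.
    info = np.iinfo(dtype)
    q = np.round(x / scale + zero_point)
    return np.clip(q, info.min, info.max).astype(dtype)

def dequantize_output(q, scale, zero_point):
    # Map integer outputs back to approximate float values.
    return (q.astype(np.float32) - zero_point) * scale

def run_tflite_inference(model_path, x):
    import tensorflow as tf  # imported here so the helpers above need only NumPy

    interpreter = tf.lite.Interpreter(model_path=model_path)
    interpreter.allocate_tensors()
    inp = interpreter.get_input_details()[0]
    out = interpreter.get_output_details()[0]

    # Integer inputs carry (scale, zero_point); float inputs are only cast.
    if np.issubdtype(inp['dtype'], np.integer):
        scale, zero_point = inp['quantization']
        x = quantize_input(x, scale, zero_point, inp['dtype'])
    else:
        x = x.astype(inp['dtype'])

    interpreter.set_tensor(inp['index'], x)
    interpreter.invoke()
    y = interpreter.get_tensor(out['index'])

    # Convert integer outputs back to floats for easier interpretation.
    if np.issubdtype(out['dtype'], np.integer):
        scale, zero_point = out['quantization']
        y = dequantize_output(y, scale, zero_point)
    return y

# Example call (requires one of the files exported above):
# preds = run_tflite_inference('exported_model_int8.tflite',
#                              np.random.random((1, 224, 224, 3)).astype(np.float32))
```

Reading the dtype and quantization parameters from the interpreter keeps the same helper usable for both the INT8 and the fakely-quantized files.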

Copyright 2024 Sony Semiconductor Solutions, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
