# Model Compression Toolkit (MCT) Wrapper API (Keras)

[Run this tutorial in Google Colab](https://colab.research.google.com/github/SonySemiconductorSolutions/mct-model-optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_mct_wrapper.ipynb)

### Attention

The MCT (Model Compression Toolkit) used in this tutorial requires TensorFlow 2.15 or earlier, which is not compatible with the default Google Colab environment (Python 3.12 or later).

**If you are running this tutorial on Google Colab, you must change the runtime type to use Python 3.11 before proceeding.**  
For detailed instructions, please refer to the [README.md](../../../README.md).
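
As a quick sanity check before installing anything, the interpreter version can be inspected from a notebook cell. The snippet below is a minimal sketch; the helper name `is_supported_python` is made up for this illustration and is not part of MCT or Colab.

```python
import sys

def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is older than Python 3.12."""
    return tuple(version_info[:2]) < (3, 12)

if not is_supported_python():
    # Hypothetical reminder message; adjust to your environment.
    print("Python 3.12 or later detected: switch the runtime to Python 3.11 "
          "before installing TensorFlow 2.15.")
```

If the message is printed, change the runtime type as described above before running the remaining cells.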

## Overview
This notebook explains how to use the MCTWrapper class from the Model Compression Toolkit (MCT).
MCTWrapper exposes several quantization methods through one consistent interface, which makes them easy to compare.
Taking MobileNetV2 as an example, we use MCTWrapper to apply the following quantization techniques:
PTQ (Post-Training Quantization), PTQ with Mixed Precision, GPTQ (Gradient-based PTQ), and GPTQ with Mixed Precision.
Working through these methods shows how MCTWrapper is used in practice
and helps you select a suitable quantization approach for your application.

## Summary
- **Setup**: Import required libraries and configure MCT with MobileNetV2 model
- **Dataset Preparation**: Load and prepare ImageNet validation dataset with representative data generation
- **Model Quantization using MCTWrapper**: Quantize the float model using MCTWrapper with four methods
  - **PTQ**: Perform PTQ
  - **PTQ + Mixed Precision**: Assign optimal quantization bit-width to each layer based on PTQ
  - **GPTQ**: Perform GPTQ
  - **GPTQ + Mixed Precision**: Assign optimal quantization bit-width to each layer based on GPTQ
- **Evaluation**: Evaluate accuracy of all quantization methods

## Setup

In [None]:
TF_VER = '2.15.0'
!pip install -q tensorflow~={TF_VER}
!pip install -q scipy

In [None]:
import importlib.util  # the `util` submodule must be imported explicitly
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [None]:
import keras
import tensorflow as tf
import scipy.io  # scipy.io.loadmat is used below to read the ImageNet devkit metadata
from typing import Tuple, Generator, List, Any
import model_compression_toolkit as mct

Load a pre-trained MobileNetV2 model from Keras, in 32-bit floating-point precision.

In [None]:
from keras.applications.mobilenet_v2 import MobileNetV2

float_model = MobileNetV2()

## Dataset Preparation
### Download ImageNet validation set
Download the ImageNet dataset (validation split only).

This step may take several minutes...

**Note:** For demonstration purposes, we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.

In [None]:
import os
 
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    
    !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \
     mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val

The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders.

In [None]:
import shutil
from pathlib import Path
root = Path('./imagenet')
imgs_dir = root / 'ILSVRC2012_img_val'
target_dir = root / 'val'

def extract_labels():
    mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)
    # Map ILSVRC2012 class IDs to WordNet IDs, keeping only leaf synsets (no children)
    cls_to_nid = {s[0]: s[1] for s in mat['synsets'] if s[4] == 0}
    with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:
        return [cls_to_nid[int(cls)] for cls in f.readlines()]

if not target_dir.exists():
    labels = extract_labels()
    for lbl in set(labels):
        os.makedirs(target_dir / lbl)
    
    for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):
        shutil.move(imgs_dir / img_file, target_dir / lbl)
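
To make the resulting layout concrete, here is a small self-contained sketch of the same idea on a temporary directory. The helper `organize_by_label`, the file names, and the labels are all invented for this illustration; no ImageNet data is touched.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

def organize_by_label(imgs_dir: Path, target_dir: Path, labels: List[str]) -> None:
    """Move the i-th file (in sorted order) into target_dir/<labels[i]>/."""
    files = sorted(p for p in imgs_dir.iterdir() if p.is_file())
    if len(files) != len(labels):
        raise ValueError("number of files and labels must match")
    for img_file, lbl in zip(files, labels):
        (target_dir / lbl).mkdir(parents=True, exist_ok=True)
        shutil.move(str(img_file), str(target_dir / lbl / img_file.name))

# Toy run: three empty placeholder files and made-up synset-style labels.
demo_root = Path(tempfile.mkdtemp())
demo_imgs = demo_root / "imgs"
demo_imgs.mkdir()
for name in ["a.JPEG", "b.JPEG", "c.JPEG"]:
    (demo_imgs / name).touch()

organize_by_label(demo_imgs, demo_root / "val", ["n001", "n002", "n001"])
demo_layout = sorted(
    p.relative_to(demo_root / "val").as_posix()
    for p in (demo_root / "val").rglob("*.JPEG")
)
print(demo_layout)
```

Each label ends up as a sub-folder containing its images, which is the one-folder-per-class layout that directory-based dataset loaders expect.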

These functions generate a `tf.data.Dataset` from image files in a directory.

In [None]:
def imagenet_preprocess_input(images: tf.Tensor, labels: tf.Tensor):
    return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels

def get_dataset(batch_size: int, shuffle: bool):
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory='./imagenet/val',
        batch_size=batch_size,
        image_size=[224, 224],
        shuffle=shuffle,
        crop_to_aspect_ratio=True,
        interpolation='bilinear')
    dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)), num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)
    return dataset
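
For intuition about the preprocessing step: the MobileNetV2 `preprocess_input` function scales pixel values from the range [0, 255] to [-1, 1]. The pure-Python sketch below reproduces only that arithmetic on three sample values; `scale_pixel` is an illustrative name, not a Keras function.

```python
def scale_pixel(value: float) -> float:
    """Map a pixel value from [0, 255] to [-1, 1]."""
    return value / 127.5 - 1.0

# Darkest, middle, and brightest pixel values.
scaled = [scale_pixel(v) for v in (0.0, 127.5, 255.0)]
print(scaled)
```

The three inputs map to -1.0, 0.0, and 1.0 respectively.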

## Representative Dataset
For quantization with MCT, we need to define a representative dataset. This is a generator that, on each iteration, yields a list containing one batch of images:

In [None]:
batch_size = 16
n_iter = 10

dataset = get_dataset(batch_size, shuffle=True)

def representative_dataset_gen():
    for _ in range(n_iter):
        yield [dataset.take(1).get_single_element()[0].numpy()]
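
The contract of such a generator can be sketched without TensorFlow. The factory below is hypothetical (the name `make_representative_dataset` is not an MCT API): it wraps any iterable of `(images, labels)` batches, drops the labels, and yields each image batch wrapped in a list. Strings stand in for real image batches.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Tuple

def make_representative_dataset(
    batches: Iterable[Tuple[Any, Any]], n_iter: int
) -> Callable[[], Iterator[List[Any]]]:
    """Build a zero-argument generator function yielding [images] per step."""
    def representative_dataset_gen() -> Iterator[List[Any]]:
        # Take at most n_iter batches and discard the labels.
        for images, _labels in islice(batches, n_iter):
            yield [images]
    return representative_dataset_gen

# Toy batches: strings stand in for image and label arrays.
toy_batches = [(f"images_{i}", f"labels_{i}") for i in range(5)]
toy_gen = make_representative_dataset(toy_batches, n_iter=3)
toy_output = list(toy_gen())
print(toy_output)
```

With a real `tf.data.Dataset`, the batches would be TensorFlow tensors, so a `.numpy()` conversion like the one in the cell above would still be needed wherever NumPy arrays are required.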

## Model Quantization using MCTWrapper

We implement quantization examples using MCTWrapper for each of the four methods.

By specifying the SDSP converter version, you can select the optimal quantization settings for IMX500.
Here, we use the settings for SDSP Converter 3.14. For other settings, please see [here](https://github.com/SonySemiconductorSolutions/mct-model-optimization/tree/main/model_compression_toolkit/target_platform_capabilities).

**Note:** This tutorial sets the minimum parameters required to run MCTWrapper. For details on omitted parameters, refer to [MCT Documentation](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/classes/Wrapper.html#ug-wrapper).

**Note:** This tutorial uses parameters chosen for a short run time, which results in lower accuracy. To improve accuracy, refer to the other tutorials.

Run PTQ with Keras

In [None]:
def PTQ_Keras(float_model: keras.Model) -> Tuple[bool, keras.Model]:
    """
    Perform PTQ on Keras model.
    
    Args:
        float_model: Original floating-point Keras model
    
    Returns:
        tuple: (success_flag, quantized_model)
    """
    # Configuration
    framework = 'tensorflow'          # Target framework (Keras/TensorFlow)
    method = 'PTQ'                    # Quantization method
    use_mixed_precision = False       # Disable mixed-precision quantization

    # Parameter configuration for PTQ
    param_items = [
        ['sdsp_version', '3.14'],                          # Version of the SDSP converter
        ['save_model_path', './qmodel_PTQ_Keras.keras']    # Path to save quantized model as Keras format
    ]

    # Execute quantization using MCTWrapper
    wrapper = mct.wrapper.mct_wrapper.MCTWrapper()
    flag, quantized_model = wrapper.quantize_and_export(
        float_model=float_model, 
        representative_dataset=representative_dataset_gen,
        framework=framework, 
        method=method, 
        use_mixed_precision=use_mixed_precision, 
        param_items=param_items)
    return flag, quantized_model
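
The `param_items` argument above is a list of `[name, value]` pairs. When assembling such lists by hand, a small check for malformed or duplicated entries can catch mistakes early. The helper below is a hypothetical convenience written for this tutorial; it is not part of MCT and does not validate names against what MCTWrapper actually accepts.

```python
from typing import Any, Dict, List

def param_items_to_dict(param_items: List[List[Any]]) -> Dict[str, Any]:
    """Convert [[name, value], ...] to a dict, rejecting malformed entries."""
    params: Dict[str, Any] = {}
    for item in param_items:
        if len(item) != 2:
            raise ValueError(f"expected [name, value], got {item!r}")
        name, value = item
        if name in params:
            raise ValueError(f"duplicate parameter: {name}")
        params[name] = value
    return params

example_params = param_items_to_dict([
    ['sdsp_version', '3.14'],
    ['save_model_path', './qmodel_PTQ_Keras.keras'],
])
print(example_params)
```

The dictionary form is only for inspection; the list of pairs is what gets passed to `quantize_and_export`.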

Run PTQ + Mixed Precision with Keras

In [None]:
def PTQ_Keras_mixed_precision(float_model: keras.Model) -> Tuple[bool, keras.Model]:
    """
    Perform PTQ with Mixed Precision on Keras model.
    
    Args:
        float_model: Original floating-point Keras model
    
    Returns:
        tuple: (success_flag, quantized_model)
    """
    # Configuration
    framework = 'tensorflow'          # Target framework (Keras/TensorFlow)
    method = 'PTQ'                    # Quantization method
    use_mixed_precision = True        # Enable mixed-precision quantization

    # Parameter configuration
    param_items = [
        ['sdsp_version', '3.14'],                                         # Version of the SDSP converter
        ['num_of_images', 5],                                             # Number of images for Mixed-Precision calibration
        ['weights_compression_ratio', 0.5],                               # Compression ratio of weights for Mixed-Precision
        ['save_model_path', './qmodel_PTQ_Keras_mixed_precision.keras']   # Path to save quantized model as Keras format
    ]

    # Execute quantization using MCTWrapper
    wrapper = mct.wrapper.mct_wrapper.MCTWrapper()
    flag, quantized_model = wrapper.quantize_and_export(
        float_model=float_model, 
        representative_dataset=representative_dataset_gen,
        framework=framework, 
        method=method, 
        use_mixed_precision=use_mixed_precision, 
        param_items=param_items)
    return flag, quantized_model

Run GPTQ (Gradient-based PTQ) with Keras

In [None]:
def GPTQ_Keras(float_model: keras.Model) -> Tuple[bool, keras.Model]:
    """
    Perform GPTQ on Keras model.
    
    Args:
        float_model: Original floating-point Keras model
    
    Returns:
        tuple: (success_flag, quantized_model)
    """
    # Configuration
    framework = 'tensorflow'          # Target framework (Keras/TensorFlow)
    method = 'GPTQ'                   # Quantization method
    use_mixed_precision = False       # Disable mixed-precision quantization

    # Parameter configuration
    param_items = [
        ['sdsp_version', '3.14'],                          # Version of the SDSP converter
        ['n_epochs', 5],                                   # Number of epochs for GPTQ optimization
        ['save_model_path', './qmodel_GPTQ_Keras.keras']   # Path to save quantized model as Keras format
    ]

    # Execute quantization using MCTWrapper
    wrapper = mct.wrapper.mct_wrapper.MCTWrapper()
    flag, quantized_model = wrapper.quantize_and_export(
        float_model=float_model, 
        representative_dataset=representative_dataset_gen,
        framework=framework, 
        method=method, 
        use_mixed_precision=use_mixed_precision, 
        param_items=param_items)
    return flag, quantized_model

Run GPTQ + Mixed Precision with Keras

In [None]:
def GPTQ_Keras_mixed_precision(float_model: keras.Model) -> Tuple[bool, keras.Model]:
    """
    Perform GPTQ with Mixed Precision on Keras model.
    
    Args:
        float_model: Original floating-point Keras model
    
    Returns:
        tuple: (success_flag, quantized_model)
    """
    # Configuration
    framework = 'tensorflow'          # Target framework (Keras/TensorFlow)
    method = 'GPTQ'                   # Quantization method
    use_mixed_precision = True        # Enable mixed-precision quantization

    # Parameter configuration
    param_items = [
        ['sdsp_version', '3.14'],                                          # Version of the SDSP converter
        ['n_epochs', 5],                                                   # Number of epochs for GPTQ optimization
        ['num_of_images', 5],                                              # Number of images for Mixed-Precision calibration
        ['weights_compression_ratio', 0.5],                                # Compression ratio of weights for Mixed-Precision
        ['save_model_path', './qmodel_GPTQ_Keras_mixed_precision.keras']   # Path to save quantized model as Keras format
    ]

    # Execute quantization using MCTWrapper
    wrapper = mct.wrapper.mct_wrapper.MCTWrapper()
    flag, quantized_model = wrapper.quantize_and_export(
        float_model=float_model, 
        representative_dataset=representative_dataset_gen,
        framework=framework, 
        method=method, 
        use_mixed_precision=use_mixed_precision, 
        param_items=param_items)
    return flag, quantized_model

### Run Quantization
Lastly, we quantize our model using the MCTWrapper API.

In [None]:
# Basic PTQ
flag, quantized_model_ptq = PTQ_Keras(float_model)

In [None]:
# PTQ with Mixed Precision
flag, quantized_model_ptq_mixed_precision = PTQ_Keras_mixed_precision(float_model)

In [None]:
# GPTQ
flag, quantized_model_gptq = GPTQ_Keras(float_model)

In [None]:
# GPTQ with Mixed Precision
flag, quantized_model_gptq_mixed_precision = GPTQ_Keras_mixed_precision(float_model)
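
The cells above store the returned `flag` but never inspect it. Assuming a falsy flag means that quantization did not succeed (as the `success_flag` wording in the docstrings suggests), a thin wrapper can stop the notebook early instead of failing later during evaluation. `quantize_or_raise` and the two stand-in functions below are hypothetical and exist only to make the sketch runnable on its own.

```python
from typing import Any, Callable, Tuple

def quantize_or_raise(
    quantize_fn: Callable[[Any], Tuple[bool, Any]], float_model: Any
) -> Any:
    """Run one quantization function and fail loudly if it reports failure."""
    flag, quantized_model = quantize_fn(float_model)
    if not flag:
        raise RuntimeError(f"{quantize_fn.__name__} did not complete successfully")
    return quantized_model

# Stand-ins for functions such as PTQ_Keras, used only to exercise the helper.
def fake_quantize_ok(model: Any) -> Tuple[bool, Any]:
    return True, f"quantized({model})"

def fake_quantize_fail(model: Any) -> Tuple[bool, Any]:
    return False, None

demo_result = quantize_or_raise(fake_quantize_ok, "float_model")
print(demo_result)
```

In the notebook, the same helper could be called as `quantize_or_raise(PTQ_Keras, float_model)`.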

## Evaluation
Create a dataset loader for evaluation, using a larger batch size for efficiency.

In [None]:
val_dataset = get_dataset(batch_size=50, shuffle=False)

Finally, let's evaluate each model.

In [None]:
# Original floating-point Keras model
float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
float_accuracy = float_model.evaluate(val_dataset)
print(f"Float model Accuracy: {(float_accuracy[1] * 100):.2f}%")

In [None]:
# PTQ model
quantized_model_ptq.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
ptq_quantized_accuracy = quantized_model_ptq.evaluate(val_dataset)
print(f"PTQ_Keras Accuracy: {(ptq_quantized_accuracy[1] * 100):.2f}%")

In [None]:
# PTQ + Mixed Precision model
quantized_model_ptq_mixed_precision.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
ptq_mixed_precision_quantized_accuracy = quantized_model_ptq_mixed_precision.evaluate(val_dataset)
print(f"PTQ_Keras_mixed_precision Accuracy: {(ptq_mixed_precision_quantized_accuracy[1] * 100):.2f}%")

In [None]:
# GPTQ model
quantized_model_gptq.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
gptq_quantized_accuracy = quantized_model_gptq.evaluate(val_dataset)
print(f"GPTQ_Keras Accuracy: {(gptq_quantized_accuracy[1] * 100):.2f}%")

In [None]:
# GPTQ + Mixed Precision model
quantized_model_gptq_mixed_precision.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
gptq_mixed_precision_quantized_accuracy = quantized_model_gptq_mixed_precision.evaluate(val_dataset)
print(f"GPTQ_Keras_mixed_precision Accuracy: {(gptq_mixed_precision_quantized_accuracy[1] * 100):.2f}%")
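
In the cells above, `evaluate` returns the loss followed by the accuracy, which is why index `[1]` is printed. With integer labels, that accuracy is the fraction of samples whose highest-scoring class equals the label. The pure-Python sketch below reproduces this computation on a made-up four-sample, two-class example; the scores and labels are arbitrary illustrative values, not model outputs.

```python
from typing import Sequence

def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Fraction of rows whose arg-max index equals the integer label."""
    correct = sum(
        1
        for row, label in zip(scores, labels)
        if max(range(len(row)), key=row.__getitem__) == label
    )
    return correct / len(labels)

# Arbitrary toy values: the third sample is misclassified.
toy_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_labels = [1, 0, 0, 0]
toy_acc = top1_accuracy(toy_scores, toy_labels)
print(f"{toy_acc * 100:.2f}%")  # prints 75.00%
```

Three of the four toy predictions match their labels, giving 75.00%.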

## Conclusion

In this tutorial, we demonstrated how to quantize a pre-trained model using MCTWrapper with a few lines of code.

## Copyrights

Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
