# How to Use the Network Editor to Easily Modify Quantization Configurations in the Model Compression Toolkit (MCT)

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_network_editor.ipynb)

## Overview
In this tutorial, we will demonstrate how to utilize the Model Compression Toolkit (MCT) to quantize a simple Keras model and modify the quantization configuration for specific layers using MCT’s network editor. The example model comprises a `Conv2D` layer followed by a `Dense` layer.

## Summary
In this tutorial, we will cover:

1. Quantizing the model using the default configuration and inspecting bit allocation for each layer.
2. Applying a custom edit rule to adjust the bit-width for the `Conv2D` layer.
3. Showcasing MCT’s flexibility for layer-specific quantization.

## Setup
Install and import the relevant packages:

In [None]:
TF_VER = '2.14.0'
!pip install -q tensorflow~={TF_VER}

In [None]:
import importlib.util  # import the submodule explicitly; `import importlib` alone may not load `importlib.util`
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [None]:
import model_compression_toolkit as mct
import numpy as np
from tensorflow.keras.layers import Input, Conv2D, Dense
from tensorflow.keras.models import Model

Next, we will create a simple Keras model consisting of a `Conv2D` layer followed by a `Dense` layer.

In [None]:
input_shape = (16, 16, 3)

inputs = Input(shape=input_shape)
x = Conv2D(filters=1, kernel_size=(3, 3))(inputs)
x = Dense(units=10)(x)
model = Model(inputs=inputs, outputs=x)

### Representative Dataset
In this tutorial, for demonstration purposes and to expedite the process, we create a simple representative dataset generator using random data. This generator produces batches of random input data that match the model’s input shape.

In [None]:
batch_size = 1
def representative_data_gen():
    yield [np.random.randn(batch_size, *input_shape)]

## Model Quantization with MCT
Let’s define a function that takes a Keras model, a representative data generator, and a core configuration for quantization. The function uses MCT’s post-training quantization (PTQ) API to quantize the model and returns the quantized model.

In [None]:
def quantize_keras_mct(model, representative_data_gen, core_config):
    # The PTQ API returns the quantized model together with quantization info;
    # only the model is needed in this tutorial.
    quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
        in_model=model,
        representative_data_gen=representative_data_gen,
        core_config=core_config
    )
    return quantized_model

We define a function to inspect the bit-width used for quantizing specific layers. The function retrieves and prints the bit-width for the `kernel` attribute in both the `Conv2D` and `Dense` layers.

In [None]:
def print_model_weights_by_layer(model):
    # The layer indices below are specific to the quantized version of this tutorial's model.
    conv2d_layer = model.layers[2]
    conv2d_nbits = conv2d_layer.weights_quantizers['kernel'].get_config()['num_bits']

    dense_layer = model.layers[4]
    dense_nbits = dense_layer.weights_quantizers['kernel'].get_config()['num_bits']

    print(f"Conv2D nbits: {conv2d_nbits}, Dense nbits: {dense_nbits}")

### Quantization
In this section, we start by setting a default core configuration for quantization using MCT’s `CoreConfig`. With this configuration, the model is quantized using the default 8-bit precision for all layer types. Next, we print the bit-width settings to verify the quantization of both the `Conv2D` and `Dense` layers.

In [None]:
# Use default core config for observing baseline quantized model
core_config = mct.core.CoreConfig()
quantized_model = quantize_keras_mct(model, representative_data_gen, core_config)
print_model_weights_by_layer(quantized_model)

## Edit Configuration Using Edit Rules List

Now, let's customize the quantization process for specific layers using MCT’s network editor. We create an `EditRule` with a `NodeTypeFilter` targeting the `Conv2D` layer type.

The associated action changes the kernel attribute’s bit-width to 4 bits instead of the default 8 bits. This rule is then added to an `edit_rules_list`, which is passed to `DebugConfig`.

The custom `DebugConfig` is used to create a `CoreConfig`, enabling `Conv2D` layers to be quantized at 4 bits while other layers retain the default configuration.

In [None]:
edit_rules_list = [
    mct.core.network_editor.EditRule(
        filter=mct.core.network_editor.NodeTypeFilter(Conv2D),
        action=mct.core.network_editor.ChangeCandidatesWeightsQuantConfigAttr(attr_name='kernel', weights_n_bits=4)
    )
]

debug_config = mct.core.DebugConfig(network_editor=edit_rules_list)
core_config_edit_weight_bits = mct.core.CoreConfig(debug_config=debug_config)

Now we will apply this customized quantization configuration to the Keras model.

By calling `quantize_keras_mct` with `core_config_edit_weight_bits`, the core configuration that carries our edit rule, we quantize the `Conv2D` layer using 4 bits as specified. The resulting `quantized_model` reflects these changes, which we verify by inspecting the bit-width used in both the `Conv2D` and `Dense` layers.

The output confirms the effect of the edit rule: the `Conv2D` layer is quantized with 4 bits, while the `Dense` layer retains the default 8-bit setting.

In [None]:
quantized_model = quantize_keras_mct(model, representative_data_gen, core_config_edit_weight_bits)
print_model_weights_by_layer(quantized_model)
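
To build intuition for what dropping a kernel from 8 bits to 4 bits means numerically, the following is a minimal, standalone sketch of a generic uniform symmetric quantizer in plain Python. It is not MCT's quantizer and does not reproduce MCT's threshold selection; the helper names (`quantize_symmetric`, `max_abs_error`) and the `kernel` values are hypothetical and chosen only for illustration.

```python
def quantize_symmetric(values, n_bits):
    """Quantize floats to signed n_bits integer levels and map them back to floats.

    Illustrative only: a generic uniform symmetric scheme, not MCT's implementation.
    """
    max_abs = max(abs(v) for v in values)
    q_max = 2 ** (n_bits - 1) - 1                 # largest positive integer level
    scale = max_abs / q_max if max_abs > 0 else 1.0
    dequantized = []
    for v in values:
        q = round(v / scale)                      # nearest integer level
        q = max(-q_max - 1, min(q_max, q))        # clamp to the signed range
        dequantized.append(q * scale)
    return dequantized


def max_abs_error(values, n_bits):
    """Largest absolute difference between original and dequantized values."""
    dequantized = quantize_symmetric(values, n_bits)
    return max(abs(a - b) for a, b in zip(values, dequantized))


# Hypothetical kernel values, used only to compare the two bit-widths.
kernel = [-0.82, -0.31, -0.05, 0.0, 0.12, 0.47, 0.9]

for n_bits in (8, 4):
    print(f"{n_bits} bits: {2 ** n_bits} levels, "
          f"max abs error = {max_abs_error(kernel, n_bits):.4f}")
```

With fewer bits there are fewer representable levels, so the rounding error per weight grows. This is the trade-off being tuned when an edit rule lowers the bit-width of a single layer type.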

## Edit Z-Threshold for Activation Quantization
In model quantization, the Z-Threshold helps manage outliers in activation data, which can negatively impact the efficiency and accuracy of the quantization process. It sets a boundary to exclude extreme values when determining quantization parameters, improving robustness and model performance.

Adjusting the Z-Threshold is useful for fine-tuning model accuracy and handling outliers. A higher Z-Threshold includes more data, potentially accounting for outliers, while a lower value effectively filters them out.
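
The effect of the threshold can be sketched in plain Python. The example below assumes a z-score style filter: samples whose distance from the mean exceeds `z_threshold` standard deviations are ignored when computing the range. This is an illustration of the idea only, not MCT's internal implementation, and `clipped_range` and the synthetic `activations` are hypothetical.

```python
import random
import statistics


def clipped_range(samples, z_threshold):
    """Return (min, max) of the samples whose |z-score| is within z_threshold.

    Illustrative only: assumes a z-score filter; an infinite threshold keeps everything.
    """
    mean = statistics.fmean(samples)
    std = statistics.pstdev(samples)
    if std == 0 or z_threshold == float("inf"):
        kept = list(samples)
    else:
        kept = [x for x in samples if abs(x - mean) / std <= z_threshold]
    return min(kept), max(kept)


# Synthetic activations: mostly standard-normal values plus one extreme outlier.
rng = random.Random(0)
activations = [rng.gauss(0.0, 1.0) for _ in range(1000)] + [40.0]

no_filter = clipped_range(activations, float("inf"))
filtered = clipped_range(activations, 5)

print("range without outlier removal:", no_filter)
print("range with z_threshold=5:     ", filtered)
```

Without filtering, the single outlier stretches the range to 40, so most quantization levels would cover values that almost never occur. With a finite threshold the outlier is dropped and the range stays close to the bulk of the data.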

The following code demonstrates how to customize the Z-Threshold for specific layer types, such as `Conv2D`, using MCT’s network editor. By default, all layers have an infinite threshold, meaning no outlier removal occurs.

In [None]:
z_threshold_target = 5
edit_rules_list = [
    mct.core.network_editor.EditRule(
        filter=mct.core.network_editor.NodeTypeFilter(Conv2D),
        action=mct.core.network_editor.ChangeCandidatesActivationQuantConfigAttr(z_threshold=z_threshold_target)
    )
]

debug_config = mct.core.DebugConfig(network_editor=edit_rules_list)
core_config_edit_z_threshold = mct.core.CoreConfig(debug_config=debug_config)
quantized_model = quantize_keras_mct(model, representative_data_gen, core_config_edit_z_threshold)

## Conclusion
In this tutorial, we explored how to leverage the Model Compression Toolkit (MCT) for quantizing Keras models and customizing the quantization configuration for specific layers using the network editor. We started by applying the default 8-bit quantization and inspecting the results. Then, we demonstrated how to use the network editor to modify the bit-width for individual layers and fine-tune activation quantization using Z-Threshold adjustments.


Copyright 2024 Sony Semiconductor Solutions, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
