# Enhancing Post-Training Quantization with Z-Score Outlier Handling
[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/keras/example_keras_activation_z_score_threshold.ipynb)

## Overview
This tutorial demonstrates the process used to find the activation z-score threshold, a step that MCT can use during post-training quantization.

In this example, we explore how different Z-score values affect the selected threshold and the resulting accuracy. We first show how to apply the corresponding MCT configuration. We then feed a representative dataset through the model, plot the activation distribution of one activation layer together with the Z-score thresholds computed for it, and finally compare the accuracy of the models quantized with different Z-score values.

## Managing Outliers with Activation Z-Score Thresholding
During the quantization process, thresholds are used to map a distribution of 32-bit floating-point values to their quantized equivalents. Achieving this with minimal data loss while preserving the most representative range is crucial for maintaining the model’s final accuracy.

Some models can exhibit anomalous values when evaluated on a representative dataset. These outliers can negatively impact the range selection, leading to suboptimal quantization. To ensure a more reliable range mapping, it is beneficial to remove these values.

The **Model Compression Toolkit (MCT)** provides an option to filter out such outliers using **Z-score thresholding**, allowing users to exclude values based on how far they deviate from the mean of the activation distribution.

The Z-score of a value is calculated by subtracting the dataset’s mean from the value and then dividing by the standard deviation. This metric indicates how many standard deviations a particular value is away from the mean.
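
As a minimal illustration of this definition, the sketch below computes Z-scores with NumPy for a handful of made-up values (the array is hypothetical and not taken from any model). It uses the population standard deviation, which matches the formula for $\sigma$ given later in this section.

```python
import numpy as np

# Hypothetical activation values; the last one is an obvious outlier.
values = np.array([1.0, 2.0, 3.0, 4.0, 50.0])

mean = values.mean()
std = values.std()  # population standard deviation (divides by n)

# Z-score: number of standard deviations each value lies from the mean.
z_scores = (values - mean) / std

for v, z in zip(values, z_scores):
    print(f"value={v:6.1f}  z-score={z:+.2f}")
```

The outlier receives the largest Z-score, which is what makes it possible to single it out with a threshold.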



The quantization threshold, $t$, is defined as a function of the Z-score threshold, $Z_t$, the mean, $\mu$, and the standard deviation, $\sigma$, of the activation values:

$$
t(Z_t) = \mu + Z_t \cdot \sigma
$$


Where:

- $t(Z_t)$: The calculated quantization threshold based on the Z-score threshold $Z_t$.
- $Z_t$: The chosen Z-score threshold. It indicates how many standard deviations an activation value must be from the mean to qualify for removal or special handling prior to quantization.
- $\mu = \frac{1}{n_s} \sum_{X \in F_l(D)} X$: The mean of the activations in $F_l(D)$.
- $\sigma = \sqrt{\frac{1}{n_s} \sum_{X \in F_l(D)} (X - \mu)^2}$: The standard deviation of the activations in $F_l(D)$.
    where:
    - $F_l(D)$: Represents the distribution of activation values.
    - $X$: An individual activation within the distribution.
    - $n_s$: The number of activation values in $F_l(D)$.


This equation for $t(Z_t)$ enables the identification of activation values that deviate significantly from the mean, helping to remove outliers before the main quantization step. This process results in a more reliable range for mapping floating-point values to quantized representations, ultimately improving quantization accuracy.
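
The sketch below illustrates the idea on synthetic data. It computes $t(Z_t)$, drops the values above it, and compares the error of a simple 8-bit uniform quantizer when its range is taken from all values versus from the filtered values. The helper names (`z_score_threshold`, `quantization_mse`), the synthetic activations, and the quantizer itself are assumptions made for illustration; this is not MCT's internal implementation.

```python
import numpy as np

def z_score_threshold(activations, z_t):
    """t(Z_t) = mu + Z_t * sigma, using the population standard deviation."""
    return activations.mean() + z_t * activations.std()

def quantization_mse(x, threshold, n_bits=8):
    """MSE of a simple unsigned uniform quantizer over [0, threshold).

    Illustrative stand-in only, not the quantizer MCT uses.
    """
    step = threshold / (2 ** n_bits)
    q = np.clip(np.round(x / step), 0, 2 ** n_bits - 1) * step
    return float(np.mean((x - q) ** 2))

rng = np.random.default_rng(0)
# Synthetic non-negative (ReLU-like) activations with a few injected outliers.
acts = np.abs(rng.normal(0.0, 1.0, size=10_000))
acts[:5] = 100.0

t = z_score_threshold(acts, z_t=9)
kept = acts[acts <= t]  # values that survive the Z-score filter

print(f"threshold t(9) = {t:.2f}")
print(f"max with outliers = {acts.max():.2f}, max after filtering = {kept.max():.2f}")
print(f"MSE, range from all values:      {quantization_mse(kept, acts.max()):.6f}")
print(f"MSE, range from filtered values: {quantization_mse(kept, kept.max()):.6f}")
```

In this toy setting, the few extreme values stretch the range so that most quantization levels are spent on an almost empty interval; filtering them first lets the same number of levels cover the bulk of the distribution more finely.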
## Setup
Install the relevant packages:

In [None]:
TF_VER = '2.14.0'
!pip install -q tensorflow~={TF_VER}

In [None]:
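import importlib.util  # the `util` submodule must be imported explicitly

# Install MCT only if it is not already available
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit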

In [None]:
import tensorflow as tf
import keras

Load a pre-trained MobileNetV2 model from Keras, in 32-bit floating-point precision.

In [None]:
from keras.applications.mobilenet_v2 import MobileNetV2

float_model = MobileNetV2()

## Dataset preparation
### Download the ImageNet validation set
Download the ImageNet dataset with only the validation split.
**Note:** For demonstration purposes we use the validation set for the model quantization routines. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.

This step may take several minutes...

In [None]:
import os
 
if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar
    
    !cd imagenet && tar -xzf ILSVRC2012_devkit_t12.tar.gz && \
     mkdir ILSVRC2012_img_val && tar -xf ILSVRC2012_img_val.tar -C ILSVRC2012_img_val

The following code organizes the extracted data into separate folders for each label, making it compatible with Keras dataset loaders.

In [None]:
from pathlib import Path
import shutil

root = Path('./imagenet')
imgs_dir = root / 'ILSVRC2012_img_val'
target_dir = root /'val'

def extract_labels():
    !pip install -q scipy
    import scipy.io  # `scipy.io` must be imported explicitly
    mat = scipy.io.loadmat(root / 'ILSVRC2012_devkit_t12/data/meta.mat', squeeze_me=True)
    # Map each class index to its synset ID (WordNet ID), keeping only leaf classes
    cls_to_nid = {s[0]: s[1] for s in mat['synsets'] if s[4] == 0}
    with open(root / 'ILSVRC2012_devkit_t12/data/ILSVRC2012_validation_ground_truth.txt', 'r') as f:
        return [cls_to_nid[int(cls)] for cls in f.readlines()]

if not target_dir.exists():
    labels = extract_labels()
    for lbl in set(labels):
        os.makedirs(target_dir / lbl)
    
    for img_file, lbl in zip(sorted(os.listdir(imgs_dir)), labels):
        shutil.move(imgs_dir / img_file, target_dir / lbl)


These functions generate a `tf.data.Dataset` from image files in a directory.

In [None]:
def imagenet_preprocess_input(images, labels):
    return tf.keras.applications.mobilenet_v2.preprocess_input(images), labels

def get_dataset(batch_size, shuffle):
    dataset = tf.keras.utils.image_dataset_from_directory(
        directory='./imagenet/val',
        batch_size=batch_size,
        image_size=[224, 224],
        shuffle=shuffle,
        crop_to_aspect_ratio=True,
        interpolation='bilinear')
    dataset = dataset.map(lambda x, y: (imagenet_preprocess_input(x, y)), num_parallel_calls=tf.data.AUTOTUNE)
    dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE)
    return dataset

## Representative Dataset
For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that returns a list of images:

In [None]:
batch_size = 32
n_iter = 10

dataset = get_dataset(batch_size, shuffle=True)

def representative_dataset_gen():
    for _ in range(n_iter):
        yield [dataset.take(1).get_single_element()[0].numpy()]
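
To make the expected shape of the generator's output more concrete, here is a small standalone sketch that follows the same pattern with random NumPy arrays instead of ImageNet batches. The helper `make_representative_dataset`, the array `fake_images`, and the tiny 8x8 image size are hypothetical and chosen only to keep the example fast; the structure (a generator yielding a list with one batch per model input) mirrors the cell above.

```python
import numpy as np

def make_representative_dataset(images, batch_size, n_iter):
    """Return a generator function that yields one [batch] list per iteration."""
    def representative_dataset_gen():
        for i in range(n_iter):
            # Cycle through the image array in consecutive batches.
            start = (i * batch_size) % len(images)
            yield [images[start:start + batch_size]]
    return representative_dataset_gen

# Random stand-in for preprocessed images (not real ImageNet data).
rng = np.random.default_rng(0)
fake_images = rng.uniform(-1.0, 1.0, size=(64, 8, 8, 3)).astype(np.float32)

gen = make_representative_dataset(fake_images, batch_size=32, n_iter=4)
batches = list(gen())

for i, batch in enumerate(batches):
    print(f"iteration {i}: {len(batch)} input(s), batch shape {batch[0].shape}")
```

Each call to `gen()` starts a fresh pass, so the same generator function can be reused across several quantization runs.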

## Target Platform Capabilities
MCT optimizes the model for dedicated hardware. This is done using a Target Platform Capabilities (TPC) object (for more details, please visit our [documentation](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html)). Here, we use the default TensorFlow TPC:

In [None]:
import model_compression_toolkit as mct

# Get a FrameworkQuantizationCapabilities object that models the hardware for the quantized model inference. Here, for example, we use the default platform that is attached to a Keras layers representation.
target_platform_cap = mct.get_target_platform_capabilities('tensorflow', 'default')

## Post-Training Quantization using MCT
In this step, we quantize the model with several Z-score thresholds.
The quantization parameters are predefined, and we use the default values except for the Z-score threshold. Feel free to modify the code below to experiment with other Z-score values.

In [None]:
# Dictionary mapping each Z-score value to its QuantizationConfig
q_configs_dict = {}

# Z-score values to iterate over
z_score_values = [3, 5, 9]

# Iterate and build the QuantizationConfig objects
for z_score in z_score_values:
    q_config = mct.core.QuantizationConfig(
        z_threshold=z_score,
    )
    q_configs_dict[z_score] = q_config

Now we will run post-training quantization for each configuration:

In [None]:
quantized_models_dict = {}

for z_score, q_config in q_configs_dict.items():
    # Create a CoreConfig object with the current quantization configuration
    ptq_config = mct.core.CoreConfig(quantization_config=q_config)

    # Perform MCT post-training quantization
    quantized_model, quantization_info = mct.ptq.keras_post_training_quantization(
        in_model=float_model,
        representative_data_gen=representative_dataset_gen,
        core_config=ptq_config,
        target_platform_capabilities=target_platform_cap
    )

    # Update the dictionary to include the quantized model
    quantized_models_dict[z_score] = {
        "quantization_config": q_config,
        "quantized_model": quantized_model,
        "quantization_info": quantization_info
    }


### Z-Score Threshold and Distribution Visualization
To aid in understanding, we will plot the activation distribution of an activation layer in MobileNetV2. This distribution is generated by running the representative dataset through the model.

To visualize the activations, the model must be rebuilt up to and including the selected layer. Once the activations are extracted, we can calculate their Z-score threshold values manually using the equation provided in the introduction.

Before plotting the distribution, we need to list the layer names. With Keras, this can be done easily using the following code. We determined the index of the layer of interest through a series of checks, which are detailed in the appendix section.

In [None]:
# Print the name of the layer of interest
layer_name = float_model.layers[51].name
print(layer_name)

The example activation layer in the model is named `conv_dw_8_relu`.

We will use this layer name to build a model that ends at `conv_dw_8_relu`.

In [None]:
from tensorflow.keras.models import Model

layer_output = float_model.get_layer(layer_name).output
activation_model_relu = Model(inputs=float_model.input, outputs=layer_output)

Run the representative dataset through this model and store the outputs for further analysis.

In [None]:
import numpy as np

# Collect the layer's activations for every representative batch
activation_batches_relu = []
for images in representative_dataset_gen():
    activations_relu = activation_model_relu.predict(images)
    activation_batches_relu.append(activations_relu)

all_activations_relu = np.concatenate(activation_batches_relu, axis=0).flatten()

We can compute the Z-score threshold for the layer, for each Z-score value, using the formula provided in the introduction.

In [None]:
optimal_thresholds_relu = {}

# Calculate the mean and standard deviation of the activation data
mean = np.mean(all_activations_relu)
std_dev = np.std(all_activations_relu)

# Calculate and store the threshold for each Z-score
for zscore in z_score_values:
    optimal_threshold = zscore * std_dev + mean
    optimal_thresholds_relu[f'z-score {zscore}'] = optimal_threshold
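
A quick way to get a feel for how aggressive each threshold is consists of measuring the fraction of activations that lie above it. The sketch below does this on synthetic ReLU-like data so that it runs on its own; `fraction_above` and `synthetic` are hypothetical names, and with the notebook's data you could pass `all_activations_relu` and `z_score_values` instead.

```python
import numpy as np

def fraction_above(activations, z_values):
    """For each Z-score, return the fraction of values above mean + z * std."""
    mean, std = activations.mean(), activations.std()
    return {z: float(np.mean(activations > mean + z * std)) for z in z_values}

rng = np.random.default_rng(0)
# Synthetic stand-in for ReLU outputs: a normal distribution clipped at zero.
synthetic = np.maximum(rng.normal(0.0, 1.0, size=100_000), 0.0)

fractions = fraction_above(synthetic, [3, 5, 9])
for z, frac in fractions.items():
    print(f"z-score {z}: {frac * 100:.4f}% of values above the threshold")
```

A larger Z-score can only keep more values, so the reported fraction never increases as the Z-score grows.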

### Distribution Plots
In this section, we visualize the activation distribution from the constructed model along with the corresponding Z-score thresholds.
We first read the threshold that MCT selected for the corresponding layer (index 53) in each quantized model, so that it can be plotted next to the manually computed Z-score threshold.

In [None]:
mse_error_thresholds = {
    z_score: data["quantized_model"].layers[53].activation_holder_quantizer.get_config()['threshold'][0]
    for z_score, data in quantized_models_dict.items()
}
print(mse_error_thresholds)

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Plotting
plt.figure(figsize=(10, 6))
plt.hist(all_activations_relu, bins=100, alpha=0.5, label='Activations')
for z_score, threshold in optimal_thresholds_relu.items():
    # Use one random color per Z-score for both of its lines
    random_color = np.random.rand(3,)
    plt.axvline(threshold, linestyle='--', linewidth=2, color=random_color, label=f'{z_score}, z-score threshold: {threshold:.2f}')
    # Keys look like 'z-score 3'; recover the integer used as the key in mse_error_thresholds
    z_score_1 = int(z_score.split(' ')[1])
    error_value = mse_error_thresholds[z_score_1]
    plt.axvline(error_value, linestyle='-', linewidth=2, color=random_color, label=f'{z_score}, MSE error Threshold: {error_value:.2f}')

plt.title('Activation Distribution with Optimal Quantization Thresholds - First ReLU Layer')
plt.xlabel('Activation Value')
plt.ylabel('Frequency')
plt.legend()
plt.show()

The impact of the Z-score on the error threshold is clearly visible here. A lower Z-score, such as 3, decreases the error threshold for the given layer.

## Model Evaluation
Finally, we can demonstrate how varying Z-score thresholds affect the model's accuracy.
In order to evaluate our models, we first need to load the validation dataset.

In [None]:
val_dataset = get_dataset(batch_size=50, shuffle=False)

In [None]:
float_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])
float_accuracy = float_model.evaluate(val_dataset)
print(f"Float model's Top 1 accuracy on the ImageNet validation set: {(float_accuracy[1] * 100):.2f}%")

In [None]:
# Compile and evaluate each quantized model
evaluation_results = {}

for z_score, data in quantized_models_dict.items():
    quantized_model = data["quantized_model"]

    quantized_model.compile(loss=keras.losses.SparseCategoricalCrossentropy(), metrics=["accuracy"])

    results = quantized_model.evaluate(val_dataset, verbose=0)  # Set verbose=0 to suppress the log messages

    evaluation_results[z_score] = results

    # Print the results
    print(f"Results for {z_score}: Loss = {results[0]}, Accuracy = {results[1]}")

We observe only minor improvements when adjusting the Z-score threshold. This pattern is common for most simple models. However, our testing shows that transformer models tend to benefit more from outlier removal. It is advisable to experiment with these parameters if the quantized accuracy is noticeably lower than the float model’s accuracy.

## Conclusion
In this tutorial, we demonstrated the use of Z-score thresholding as a critical step in the quantization process. This technique helps refine activation ranges by removing outliers, ultimately leading to improved quantized model accuracy. You can use the provided code as a starting point to experiment with selecting optimal Z-score thresholds for your own models.

Our testing indicates that the optimal Z-score threshold typically falls between 8 and 12. Setting the threshold above 12 tends to show negligible improvement, while values below 8 may distort the distribution. However, finding the right threshold will require experimentation based on the specific characteristics of your model and use case.

By applying Z-score thresholding thoughtfully, you can mitigate quantization errors and ensure that the quantized model's performance remains as close as possible to that of the original floating-point version.

## Appendix
Below are selected code samples used to identify the most suitable layers for plotting thresholds and distributions.

**Listing Layers Affected by Z-Score Adjustments**

The following code snippet lists the layers whose thresholds change with the Z-score value, helping to determine which layers to focus on when visualizing distributions:

In [None]:
# Initialize a dictionary to hold threshold values for comparison
thresholds_by_index = {}

# Try to access each layer for each quantized model and collect threshold values
for z_score, data in quantized_models_dict.items():
    quantized_model = data["quantized_model"]
    for layer_index in range(len(quantized_model.layers)):
        try:
            # Attempt to access the threshold value for this layer
            threshold = quantized_model.layers[layer_index].activation_holder_quantizer.get_config()['threshold'][0]
            # Store the threshold value for comparison
            if layer_index not in thresholds_by_index:
                thresholds_by_index[layer_index] = set()
            thresholds_by_index[layer_index].add(threshold)
        except Exception:
            # Layers without an activation quantizer or threshold are skipped
            pass

# Find indices where threshold values are not consistent
inconsistent_indices = [index for index, thresholds in thresholds_by_index.items() if len(thresholds) > 1]

print("Inconsistent indices:", inconsistent_indices)



Next, we want to verify which layers correspond to the indices based on the layer names in the original float model. For example, index 52 has no matching layer, as it represents a quantized version of the previous layer. However, checking index 51 reveals that it aligns with the layer named `conv_dw_8_relu`, which we can use to plot the distribution.

In [None]:
target_z_score = 9

# Check once that the target Z-score was quantized, instead of once per layer
if target_z_score not in quantized_models_dict:
    print(f"Z-Score {target_z_score} not found in quantized_models_dict.")
else:
    data = quantized_models_dict[target_z_score]
    for index, layer in enumerate(float_model.layers):
        search_string = str(layer.name)
        # Iterate over each layer of the target quantized model
        for quantized_index, quantized_layer in enumerate(data["quantized_model"].layers):
            # A match means the float layer name appears in the quantized layer's config
            if search_string in str(quantized_layer.get_config()):
                print(f"Float Model Layer Index {index} & Quantized Model Layer Index {quantized_index}: Found match in layer name  {search_string}")


In [None]:
data["quantized_model"].layers[51].get_config()



Copyright 2024 Sony Semiconductor Israel, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
