# Gradient-Based Post-Training Quantization using the Model Compression Toolkit - A Quick-Start Guide

[Run this tutorial in Google Colab](https://colab.research.google.com/github/sony/model_optimization/blob/main/tutorials/notebooks/mct_features_notebooks/pytorch/example_pytorch_mobilenet_gptq.ipynb)

## Overview

This tutorial demonstrates how to quantize a pre-trained model using the **Model Compression Toolkit (MCT)** with **Gradient-based PTQ (GPTQ)**.
GPTQ is an optimization procedure that significantly improves the accuracy of models quantized with post-training quantization.
It does so by running an optimization process after quantization that adjusts the rounding of the quantized weights.
GPTQ is especially effective for low-bit-width quantization and for mixed-precision quantization.
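To make "adjusting the rounding" concrete, here is a toy, stdlib-only sketch; it is not MCT's implementation. It assumes symmetric uniform quantization with a hand-picked scale, and it replaces GPTQ's gradient-based optimization with an exhaustive search over a floor/ceil choice per weight, which is only feasible for a handful of weights. What it illustrates is that round-to-nearest minimizes the error of each weight on its own, while choosing the rounding direction jointly can only match or reduce the error of the layer's output on representative inputs.

```python
import itertools
import math

# Toy layer: y = w . x (a single output neuron with 4 weights).
weights = [0.31, -0.52, 0.74, 0.18]
# Toy "representative" inputs used to measure the output error.
inputs = [
    [1.0, 0.5, -0.3, 2.0],
    [0.2, -1.0, 0.8, 0.4],
    [-0.7, 0.3, 1.5, -1.2],
]

num_bits = 4
scale = 0.1  # hand-picked quantization step (assumption, not derived from data)
q_min, q_max = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1


def dequantize(int_weights):
    """Clamp integer levels to the signed range and map them back to real values."""
    return [scale * min(max(q, q_min), q_max) for q in int_weights]


def output_error(quantized_weights):
    """Sum of squared differences between float and quantized layer outputs."""
    err = 0.0
    for x in inputs:
        y_float = sum(w * xi for w, xi in zip(weights, x))
        y_quant = sum(w * xi for w, xi in zip(quantized_weights, x))
        err += (y_float - y_quant) ** 2
    return err


# Baseline: round each weight to the nearest level independently.
nearest = dequantize([round(w / scale) for w in weights])

# Adjusted rounding: pick floor (0) or ceil (1) per weight to minimize output error.
best = min(
    (
        dequantize([math.floor(w / scale) + up for w, up in zip(weights, ups)])
        for ups in itertools.product((0, 1), repeat=len(weights))
    ),
    key=output_error,
)

print(f"round-to-nearest output error: {output_error(nearest):.5f}")
print(f"adjusted rounding output error: {output_error(best):.5f}")
```

Because round-to-nearest is one of the floor/ceil combinations searched, the adjusted rounding can never be worse on these inputs. GPTQ pursues the same goal with gradients, which scales to millions of weights where exhaustive search cannot.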


## Summary

In this tutorial we will cover:

1. Gradient-Based Post-Training Quantization using MCT.
2. Loading and preprocessing ImageNet's validation dataset.
3. Constructing an unlabeled representative dataset.
4. Accuracy evaluation of the floating-point and the quantized models.

## Setup

Install and import the relevant packages:

In [None]:
!pip install -q torch torchvision onnx
!pip install -q tqdm

In [None]:
import importlib.util
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [None]:
import torch
from torch.utils.data import DataLoader
from torchvision.models import mobilenet_v2, MobileNet_V2_Weights
from torchvision.datasets import ImageNet


Load a pre-trained MobileNetV2 model from torchvision in 32-bit floating-point format.

In [None]:
weights = MobileNet_V2_Weights.IMAGENET1K_V2

float_model = mobilenet_v2(weights=weights)

## Dataset preparation

**Note** that for demonstration purposes we use the validation set for the model quantization and GPTQ optimization. Usually, a subset of the training dataset is used, but loading it is a heavy procedure that is unnecessary for the sake of this demonstration.

This step may take several minutes...

In [None]:
import os

if not os.path.isdir('imagenet'):
    !mkdir imagenet
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_devkit_t12.tar.gz
    !wget -P imagenet https://image-net.org/data/ILSVRC/2012/ILSVRC2012_img_val.tar

Extract the ImageNet validation dataset using the torchvision `datasets` module:

In [None]:
dataset = ImageNet(root='./imagenet', split='val', transform=weights.transforms())

### Representative Dataset

GPTQ is a gradient-based optimization process, which requires a representative dataset to perform inference and compute gradients.

Separate representative datasets can be used for the PTQ statistics collection and for GPTQ. In this tutorial we use the same representative dataset for both.

A complete pass through the representative dataset generator constitutes one epoch (`batch_size` × `n_iter` samples).

In [None]:
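batch_size = 50
n_iter = 10

dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

def representative_dataset_gen():
    # Create a fresh iterator on every call, so each call yields one epoch of n_iter batches
    dataloader_iter = iter(dataloader)
    for _ in range(n_iter):
        # Keep only the images (index 0); labels are not needed for quantization
        yield [next(dataloader_iter)[0]]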
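Because `representative_dataset_gen` creates a new iterator over the shuffled `dataloader` each time it is called, every call yields exactly `n_iter` batches, i.e. one epoch of `batch_size` × `n_iter` samples. The stdlib-only sketch below checks that counting without downloading ImageNet; `FakeLoader` is a hypothetical stand-in for the PyTorch `DataLoader`, not part of torch or MCT, and the block redefines the tutorial's names so it runs on its own.

```python
import random

batch_size = 50
n_iter = 10


class FakeLoader:
    """Hypothetical stand-in for a shuffled DataLoader.

    Each pass yields (images, labels) tuples, where "images" is just a list of
    sample indices, mimicking the (images, labels) batches of ImageNet.
    """

    def __init__(self, num_samples, batch_size):
        self.num_samples = num_samples
        self.batch_size = batch_size

    def __iter__(self):
        order = list(range(self.num_samples))
        random.shuffle(order)  # a new order on every pass, like shuffle=True
        for start in range(0, self.num_samples, self.batch_size):
            batch = order[start:start + self.batch_size]
            yield batch, [0] * len(batch)


dataloader = FakeLoader(num_samples=1000, batch_size=batch_size)


def representative_dataset_gen():
    # A fresh iterator per call, so every call starts a new (reshuffled) pass.
    dataloader_iter = iter(dataloader)
    for _ in range(n_iter):
        # Keep only the images; the representative dataset is unlabeled.
        yield [next(dataloader_iter)[0]]


epoch = list(representative_dataset_gen())
samples = sum(len(batch[0]) for batch in epoch)
print(len(epoch), "batches,", samples, "samples per epoch")  # → 10 batches, 500 samples per epoch
```

This pattern assumes `batch_size` × `n_iter` does not exceed the dataset size (ImageNet's 50,000 validation images leave plenty of room). If it did, `next()` would raise `StopIteration` inside the generator, which Python 3.7+ surfaces as a `RuntimeError`.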


## Gradient-Based Post-Training Quantization using MCT

This is the main part, in which we quantize our model.

First, we create a **GPTQ configuration** that sets the GPTQ optimization options (such as the number of epochs of the optimization process).
MCT will quantize the model and then run the GPTQ process to optimize the model's parameters and quantization parameters.

In addition, we need to define a target platform capabilities (TPC) object, which represents the hardware specifications of the device on which we eventually wish to deploy the quantized model.

In [None]:
import model_compression_toolkit as mct

# Create a GPTQ quantization configuration and set the number of training epochs.
# 50 epochs are sufficient for this tutorial. When GPTQ is run after mixed-precision
# quantization, a higher number of epochs will be required.
gptq_config = mct.gptq.get_pytorch_gptq_config(n_epochs=50)

# Specify the target platform capability (TPC)
tpc = mct.get_target_platform_capabilities("pytorch", 'imx500', target_platform_version='v1')
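As a rough sizing aid, here is back-of-the-envelope arithmetic for the settings used in this tutorial. It assumes that each GPTQ epoch is one full call of `representative_dataset_gen` (the definition of an epoch given earlier) and that MCT takes one optimization step per batch; the latter is an assumption about MCT's internals, not a measurement.

```python
batch_size = 50
n_iter = 10
n_epochs = 50

samples_per_epoch = batch_size * n_iter      # images per pass over the generator
optimization_steps = n_epochs * n_iter       # assumption: one step per batch
images_processed = n_epochs * samples_per_epoch

print(f"{samples_per_epoch} images/epoch, {optimization_steps} steps, "
      f"{images_processed} images processed in total")
```

Under these assumptions, the work grows linearly with both `n_iter` and `n_epochs`, so raising the epoch count for a mixed-precision model increases the optimization time roughly in proportion.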

### Run Gradient-Based Post-Training Quantization
Finally, we quantize our model using MCT's GPTQ API (this may take several minutes).

In [None]:
quantized_model, quantization_info = mct.gptq.pytorch_gradient_post_training_quantization(
    float_model,
    representative_dataset_gen,
    gptq_config=gptq_config,
    target_platform_capabilities=tpc
)

That's it! Our model is now quantized.

## Models evaluation

To evaluate our models, we first create a data loader over the validation dataset (without shuffling):

In [None]:
val_dataloader = DataLoader(dataset, batch_size=50, shuffle=False)

Now, we will create a function for evaluating a model.

In [None]:
from tqdm import tqdm

def evaluate(model, testloader):
    """
    Evaluate a model using a test loader.
    """
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()  # Set the model to evaluation mode
    correct = 0
    total = 0
    with torch.no_grad():
        for data in tqdm(testloader):
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    val_acc = (100 * correct / total)
    print('Accuracy: %.2f%%' % val_acc)
    return val_acc


Let's start with the floating-point model evaluation.

In [None]:
evaluate(float_model, val_dataloader)

Finally, let's evaluate the quantized model:

In [None]:
evaluate(quantized_model, val_dataloader)

You can see that the quantized model's accuracy is only slightly lower than that of the floating-point model, at a 4× compression rate.
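The 4× figure corresponds to storing weights in 8 bits instead of 32. A back-of-the-envelope check, assuming every parameter is stored in 8 bits and ignoring activations, quantization parameters (scales), and file-format overhead. The parameter count is the roughly 3.5 million that torchvision documents for MobileNetV2; it includes batch-norm parameters that are typically folded into the convolutions before quantization, so the sizes are approximate.

```python
# Parameter count of torchvision's MobileNetV2; it can be checked in the notebook with
# sum(p.numel() for p in float_model.parameters())
num_params = 3_504_872

float_bits = 32
quant_bits = 8  # assumption: all weights quantized to 8 bits

float_mb = num_params * float_bits / 8 / 1e6  # bits -> bytes -> megabytes
quant_mb = num_params * quant_bits / 8 / 1e6

print(f"float32: {float_mb:.1f} MB, int8: {quant_mb:.1f} MB, "
      f"ratio: {float_bits / quant_bits:.0f}x")  # → float32: 14.0 MB, int8: 3.5 MB, ratio: 4x
```

With mixed-precision quantization some layers use fewer bits, so the overall ratio can exceed 4×. This is one of the settings where the overview notes that GPTQ is especially useful.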
Now, we can export the model to ONNX:

In [None]:
mct.exporter.pytorch_export_model(quantized_model, save_model_path='qmodel.onnx', repr_dataset=representative_dataset_gen)

## Conclusion
In this tutorial, we demonstrated how to quantize a pre-trained model with gradient-based optimization using MCT in just a few lines of code. We saw that we can achieve a 4× compression ratio with minimal accuracy degradation.

## Copyrights

Copyright 2023 Sony Semiconductor Solutions, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
