# DeepLabv3+ and Mixed-Precision Post-Training Quantization in PyTorch Using the Model Compression Toolkit (MCT)

## Overview
This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize a DeepLabv3+ semantic segmentation model. We will load a pre-trained model and quantize it with MCT using **Mixed-Precision Post-Training Quantization (PTQ)**.

## Summary
In this tutorial, we will cover:

1. Loading and preprocessing the PASCAL VOC dataset.
2. Constructing an unlabeled representative dataset.
3. Post-Training Quantization using MCT.
4. Accuracy evaluation of the floating-point and the quantized models.

## DeepLabV3Plus-Pytorch (Dependent External Repository)
This tutorial uses the [DeepLabV3Plus-Pytorch](https://github.com/VainF/DeepLabV3Plus-Pytorch) repository. Installation instructions are provided in the **Setup** section.  
The model uses MobileNetV2 as its backbone.

### License (DeepLabV3Plus-Pytorch)
MIT License

Copyright (c) 2020 Gongfan Fang

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

## Setup 
First, clone the DeepLabV3Plus-Pytorch repository introduced above.

In [None]:
import os

if not os.path.isdir('DeepLabV3Plus-Pytorch'):
    !git clone https://github.com/VainF/DeepLabV3Plus-Pytorch.git

Next, download the pre-trained model **best_deeplabv3plus_mobilenet_voc_os16.pth** using the Dropbox or Tencent Weiyun link in the README section below, and place it in the current working directory (the notebook loads it from there).  
[1. Available Architectures (DeepLabV3Plus-Pytorch)](https://github.com/VainF/DeepLabV3Plus-Pytorch?tab=readme-ov-file#1-available-architectures)

To make the model IMX500-compatible, we will modify **DeepLabV3Plus-Pytorch/network/modeling.py** so that every ASPP dilation rate is set to 1.  
This modification may slightly reduce accuracy.  
Run the following commands to apply the modification:

In [None]:
!sed -i 's/    aspp_dilate = \[12, 24, 36\]/    aspp_dilate = [1, 1, 1] #aspp_dilate = [12, 24, 36]/g' ./DeepLabV3Plus-Pytorch/network/modeling.py
!sed -i 's/    aspp_dilate = \[6, 12, 18\]/    aspp_dilate = [1, 1, 1] #aspp_dilate = [6, 12, 18]/g' ./DeepLabV3Plus-Pytorch/network/modeling.py

After the modification, the beginning of `_segm_mobilenet` in `modeling.py` looks like this:

```python
def _segm_mobilenet(name, backbone_name, num_classes, output_stride, pretrained_backbone):
    if output_stride==8:
        aspp_dilate = [1, 1, 1] #aspp_dilate = [12, 24, 36]
    else:
        aspp_dilate = [1, 1, 1] #aspp_dilate = [6, 12, 18]
```
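
If `sed` is not available (for example, on Windows), the same substitution can be sketched in Python. This is a minimal sketch rather than part of the original repository: the helper name `patch_aspp_dilation` is hypothetical, and it assumes `modeling.py` still contains the `aspp_dilate = [12, 24, 36]` and `aspp_dilate = [6, 12, 18]` assignments targeted by the `sed` commands above.

```python
import re

# Matches the original dilation assignments, but not copies that are
# already commented out (preceded by '#'), so the patch is idempotent.
_DILATE_PATTERN = re.compile(r"(?<!#)aspp_dilate = \[(?:12, 24, 36|6, 12, 18)\]")


def patch_aspp_dilation(source: str) -> str:
    """Set the ASPP dilation rates to [1, 1, 1], keeping the old value as a comment."""
    return _DILATE_PATTERN.sub(
        lambda m: f"aspp_dilate = [1, 1, 1] #{m.group(0)}", source
    )
```

To apply it, read `./DeepLabV3Plus-Pytorch/network/modeling.py` into a string, pass it through `patch_aspp_dilation`, and write the result back to the same file.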

Install the relevant packages:  
This step may take several minutes...


In [None]:
!pip install torch==2.6.0 torchvision==0.21.0
!pip install onnx==1.16.1
!pip install numpy==1.26.4
!pip install opencv-python==4.9.0.80
!pip install torchmetrics

In [None]:
import importlib.util  # the 'util' submodule must be imported explicitly

if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [None]:
import torch
from torchvision.datasets import VOCSegmentation
from torchvision import transforms
import itertools
from torch.utils.data import DataLoader
from tqdm import tqdm
import sys
from torchmetrics import JaccardIndex
sys.path.append('./DeepLabV3Plus-Pytorch')
import network

### Various Settings
Configure the parameters listed below before running the rest of the notebook.  

#### Parameter setting
- IMG_HEIGHT, IMG_WIDTH  
  Height and width (in pixels) to which input images are resized.
- NUM_WORKERS  
  Number of worker processes used to load data in parallel.
- CALIB_ITER  
  Number of samples used to build the representative dataset for quantization.
- WEIGHTS_COMPRESSION_RATIO  
  Target weights memory for mixed-precision quantization, expressed as a fraction of the weights memory of the 8-bit model (for example, 0.95 targets 95% of the 8-bit size).

In [None]:
# Parameter setting
IMG_HEIGHT = 224
IMG_WIDTH = 224
CALIB_ITER = 10
NUM_WORKERS = 1
WEIGHTS_COMPRESSION_RATIO = 0.95

Load a pre-trained DeepLabv3+ (MobileNetV2 backbone) model.  

In [None]:
model_builder = network.modeling.__dict__["deeplabv3plus_mobilenet"]
float_model = model_builder(output_stride=16)
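# The checkpoint stores the weights under the 'model_state' key.
# map_location="cpu" lets the checkpoint load on machines without a GPU.
checkpoint = torch.load("best_deeplabv3plus_mobilenet_voc_os16.pth", map_location="cpu", weights_only=False)
float_model.load_state_dict(checkpoint['model_state'])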

## Dataset preparation
### Download the PASCAL VOC dataset

**Note**  
In this tutorial, we will use a subset of the PASCAL VOC 2012 dataset for calibration during quantization and for evaluation.

This step may take several minutes...



In [None]:
if not os.path.isdir('VOC_dataset'):
    !mkdir VOC_dataset
    !wget -P VOC_dataset https://datasets.cms.waikato.ac.nz/ufdl/data/pascalvoc2012/VOCtrainval_11-May-2012.tar
    !tar -xf VOC_dataset/VOCtrainval_11-May-2012.tar -C ./VOC_dataset
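
Before building the datasets, it can help to confirm that the archive was extracted to the layout `VOCSegmentation` reads from. The sketch below is an optional sanity check, not part of the original notebook: the helper name `missing_voc_dirs` is hypothetical, and the directory names reflect the layout torchvision is assumed to expect for `year='2012'` (`VOCdevkit/VOC2012` under the dataset root).

```python
import os

# Sub-directories assumed to be read by torchvision's VOCSegmentation.
_EXPECTED_SUBDIRS = (
    "JPEGImages",
    "SegmentationClass",
    os.path.join("ImageSets", "Segmentation"),
)


def missing_voc_dirs(root: str) -> list:
    """Return the expected VOC2012 directories that do not exist under root."""
    base = os.path.join(root, "VOCdevkit", "VOC2012")
    candidates = [os.path.join(base, sub) for sub in _EXPECTED_SUBDIRS]
    return [path for path in candidates if not os.path.isdir(path)]
```

Calling `missing_voc_dirs("./VOC_dataset/")` should return an empty list when the download and extraction succeeded.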

### Prepare the PASCAL VOC dataset

In [None]:
# Preprocessing for the VOC dataset.
transform_img = transforms.Compose(
    [
        # Resize expects the size as (height, width).
        transforms.Resize((IMG_HEIGHT, IMG_WIDTH)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

transform_target = transforms.Compose(
    [
        # Nearest-neighbor keeps class IDs intact; size is (height, width).
        transforms.Resize((IMG_HEIGHT, IMG_WIDTH), interpolation=transforms.InterpolationMode.NEAREST),
        transforms.PILToTensor(),
    ]
)
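
The target transform uses nearest-neighbor interpolation because segmentation masks store class IDs, not intensities. The pure-Python sketch below illustrates the difference on a one-dimensional row of labels. The helper names are hypothetical and the resampling formulas are simplified, so this is not how torchvision implements `Resize`.

```python
def resize_nearest(row, new_len):
    """Resize a 1-D list by copying the nearest source element."""
    scale = len(row) / new_len
    return [row[min(int((i + 0.5) * scale), len(row) - 1)] for i in range(new_len)]


def resize_linear(row, new_len):
    """Resize a 1-D list by linear interpolation between neighbors."""
    scale = len(row) / new_len
    out = []
    for i in range(new_len):
        # Map the output pixel center back to source coordinates and clamp.
        pos = min(max((i + 0.5) * scale - 0.5, 0.0), len(row) - 1.0)
        lo = int(pos)
        hi = min(lo + 1, len(row) - 1)
        frac = pos - lo
        out.append(row[lo] * (1 - frac) + row[hi] * frac)
    return out


labels = [0, 0, 15, 15]  # 0 = background, 15 = person in PASCAL VOC
print(resize_nearest(labels, 8))  # contains only the original class IDs
print(resize_linear(labels, 8))   # contains blended values at the boundary
```

The linear variant produces values between 0 and 15 at the class boundary, which would be read as unrelated classes or non-integer labels during evaluation.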

In [None]:
train_dataset = VOCSegmentation(root="./VOC_dataset/", year='2012', image_set='train', transform = transform_img, target_transform=transform_target)
val_dataset = VOCSegmentation(root="./VOC_dataset/", year='2012', image_set='val',  transform = transform_img, target_transform=transform_target)

In [None]:
# For evaluation (batch size 1)
val_dataloader = DataLoader(
    val_dataset, batch_size=1, shuffle=False,
    num_workers=NUM_WORKERS,
)

# For calibration (no labels required)
calib_loader = DataLoader(
    train_dataset, batch_size=1, shuffle=True,
    num_workers=NUM_WORKERS,
)

print(len(train_dataset))
print(len(val_dataset))

## Representative Dataset
The PTQ algorithm in MCT requires a representative dataset for calibration. It is supplied as a generator function that yields a list of input tensors (here, a single image batch) on each iteration:

In [None]:
def representative_dataset_gen():
    for sample in itertools.islice(itertools.cycle(calib_loader), CALIB_ITER):
        yield [sample[0]]
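
The `islice(cycle(...))` pattern guarantees exactly `CALIB_ITER` iterations even if the loader holds fewer batches. The sketch below shows the same pattern on plain strings standing in for image tensors; `make_representative_gen` and `toy_batches` are hypothetical names used only for illustration.

```python
import itertools


def make_representative_gen(batches, n_iter):
    """Return a generator function that yields exactly n_iter single-input lists."""
    def gen():
        # cycle() restarts from the first batch once the iterable is exhausted.
        for batch in itertools.islice(itertools.cycle(batches), n_iter):
            yield [batch]
    return gen


toy_batches = ["batch_a", "batch_b", "batch_c"]  # stand-ins for image tensors
gen = make_representative_gen(toy_batches, n_iter=5)
print([item[0] for item in gen()])
```

Note that `itertools.cycle` replays cached copies of the items it has already seen, so a shuffled loader is not reshuffled after the first pass within a single call.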

## Target Platform Capabilities (TPC)
In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html)). Here, we use the default PyTorch TPC:

In [None]:
import model_compression_toolkit as mct

tpc = mct.get_target_platform_capabilities('pytorch', 'default')

## Mixed-Precision Configurations
We will create a `MixedPrecisionQuantizationConfig` that defines the search options for Mixed-Precision:


In [None]:
configuration = mct.core.CoreConfig(
    mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=CALIB_ITER))

In [None]:
# Get Resource Utilization information to constrain your model's memory size.
resource_utilization_data = mct.core.pytorch_resource_utilization_data(
    float_model,
    representative_dataset_gen,
    configuration,
    target_platform_capabilities=tpc)

# Create a ResourceUtilization object that caps the weights memory at
# WEIGHTS_COMPRESSION_RATIO times the baseline reported above.
resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * WEIGHTS_COMPRESSION_RATIO)
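
The arithmetic behind the weights-memory target can be sketched with a toy example. The layer sizes and the helper `weights_memory_bytes` are hypothetical, memory is assumed to be measured in bytes, and 4 bits is assumed to be one of the available candidates. MCT derives the real figures from the model graph and runs its own bit-width search, so this only illustrates why some layers must drop below 8 bits to meet the budget.

```python
def weights_memory_bytes(param_counts, bit_widths):
    """Total weights memory in bytes for per-layer parameter counts and bit-widths."""
    return sum(n * bits / 8 for n, bits in zip(param_counts, bit_widths))


# Hypothetical per-layer parameter counts (not taken from DeepLabv3+).
param_counts = [1_000, 50_000, 200_000]

baseline = weights_memory_bytes(param_counts, [8, 8, 8])  # every layer at 8 bits
budget = baseline * 0.95                                  # WEIGHTS_COMPRESSION_RATIO

# Lowering only the smallest layer to 4 bits saves too little memory,
# while lowering the mid-sized layer brings the model under the budget.
small_only = weights_memory_bytes(param_counts, [4, 8, 8])
mid_layer = weights_memory_bytes(param_counts, [8, 4, 8])
print(baseline, budget, small_only <= budget, mid_layer <= budget)
```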

## Post-Training Quantization using MCT
Now for the exciting part! Let's run PTQ on the model.

In [None]:
quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
                                        in_module=float_model,
                                        representative_data_gen=representative_dataset_gen,
                                        target_platform_capabilities=tpc,
                                        core_config=configuration,
                                        target_resource_utilization=resource_utilization)
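
To give an intuition for why lower bit-widths cost accuracy, the sketch below applies a simple symmetric uniform quantizer to a handful of made-up weights. It is an illustration only: `fake_quantize` and `max_abs_error` are hypothetical helpers, and MCT's quantizers and threshold selection are more elaborate than this.

```python
def fake_quantize(weights, n_bits):
    """Symmetric uniform quantization followed by de-quantization.

    Assumes at least one weight is non-zero.
    """
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(abs(w) for w in weights) / q_max
    # Round to the nearest integer level, clip to the signed range, then rescale.
    return [max(-q_max, min(q_max, round(w / scale))) * scale for w in weights]


def max_abs_error(weights, n_bits):
    """Largest absolute difference between the original and quantized weights."""
    return max(abs(w - q) for w, q in zip(weights, fake_quantize(weights, n_bits)))


weights = [-1.0, -0.5, 0.0, 0.3, 0.9]  # made-up values
for bits in (8, 4):
    print(bits, max_abs_error(weights, bits))
```

The worst-case rounding error is bounded by half a quantization step, and the step grows as the bit-width shrinks, which is why the mixed-precision search keeps sensitive layers at higher precision.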

## Model Evaluation
Now, we will create a function for evaluating a model.  
It prints the mean Intersection over Union (mIoU) on the validation set, so the results before and after quantization can be compared.

In [None]:
@torch.no_grad()
def evaluate(model: torch.nn.Module, val_dataloader: DataLoader,
             num_classes: int = 21):
    """
    Evaluation of the PASCAL VOC dataset.

    Args:
        model (torch.nn.Module): Evaluation model.
        val_dataloader (DataLoader): Evaluation dataset.
        num_classes (int): Number of classes (default: 21).
    """
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    miou_metric = JaccardIndex(task="multiclass", num_classes=num_classes, ignore_index=255).to(device)
    for sample in tqdm(val_dataloader, desc="Evaluating"):
        img, target = sample
        img = img.to(device)
        target = target.squeeze(1).long().to(device)

        logits = model(img)
        preds = torch.argmax(logits, dim=1)
        
        miou_metric.update(preds, target)

    miou = miou_metric.compute()
    print(f"VOC2012 val mIoU: {miou.item():.4f}")
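
The metric itself can be sketched in plain Python. The function below computes a macro-averaged IoU over flat label sequences and skips pixels whose target equals `ignore_index`, the value PASCAL VOC uses for object boundaries. It is a simplified illustration with a hypothetical name, `mean_iou`; `JaccardIndex` in torchmetrics may treat classes that never appear differently.

```python
def mean_iou(preds, targets, num_classes, ignore_index=255):
    """Macro-averaged IoU over flat label sequences, skipping ignored pixels."""
    pairs = [(p, t) for p, t in zip(preds, targets) if t != ignore_index]
    ious = []
    for c in range(num_classes):
        intersection = sum(1 for p, t in pairs if p == c and t == c)
        union = sum(1 for p, t in pairs if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(intersection / union)
    return sum(ious) / len(ious) if ious else 0.0


# Six pixels, two classes; the fifth pixel is a boundary pixel and is ignored.
preds = [0, 0, 1, 1, 1, 0]
targets = [0, 1, 1, 1, 255, 0]
print(mean_iou(preds, targets, num_classes=2))
```

In this toy case each class has an intersection of 2 pixels and a union of 3, so both per-class IoUs, and therefore their mean, equal 2/3.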

Let's start with the floating-point model evaluation.  
This step may take several minutes...

In [None]:
print("evaluating float model (VOC mIoU)...")
evaluate(model=float_model, val_dataloader=val_dataloader)

Finally, let's evaluate the quantized model:  
This step may take several minutes...

In [None]:
print("evaluating quantized model (VOC mIoU) ...")
evaluate(model=quantized_model, val_dataloader=val_dataloader)

## Copyrights

Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
