# PoseNet and Mixed-Precision Post-Training Quantization in PyTorch using the Model Compression Toolkit (MCT)

## Overview
This quick-start guide explains how to use the **Model Compression Toolkit (MCT)** to quantize a PoseNet model. We will load a pre-trained model and quantize it with MCT's **Mixed-Precision Post-Training Quantization (PTQ)**.

## Summary
In this tutorial, we will cover:

1. Loading and preprocessing the COCO dataset.
2. Constructing an unlabeled representative dataset.
3. Post-Training Quantization using MCT.
4. Accuracy evaluation of the floating-point and the quantized models.

## posenet-pytorch (Dependent External Repository)
This tutorial uses the repository linked below. Installation instructions are provided in the **Setup** section.  
This repository accesses Google's TensorFlow.js version of the PoseNet model and converts the retrieved model into a PyTorch model.  
The model uses MobileNetV1 as its backbone.  
You can choose from four model depths: 50, 75, 100, and 101. The model is selected with `MODEL_ID` in the **Parameter setting** section described later.  
[posenet-pytorch](https://github.com/michellelychan/posenet-pytorch)

### License (posenet-pytorch)
Copyright 2018 Ross Wightman

Licensed under the Apache License, Version 2.0 (the "License");  
you may not use this file except in compliance with the License.  
You may obtain a copy of the License at  

http://www.apache.org/licenses/LICENSE-2.0

## Setup
First, clone the posenet-pytorch GitHub repository introduced above.

In [138]:
import os

if not os.path.isdir('posenet-pytorch'):
    !git clone https://github.com/michellelychan/posenet-pytorch.git

In the `__init__.py` file of the cloned repository (posenet-pytorch/posenet/\_\_init\_\_.py), the import of `decode_multiple_poses` on line 2 is commented out, so the function is not available from the `posenet` package. Uncomment it with the following command. The listing after the command shows the expected result:

In [139]:
!sed -i '2s/^# *\(from .* import .*\)/\1/' ./posenet-pytorch/posenet/__init__.py

```python
# ./posenet-pytorch/posenet/__init__.py
from posenet.constants import *
from posenet.decode_multi import decode_multiple_poses  # <-- this line is uncommented
from posenet import decode
from posenet.models.model_factory import load_model
from posenet.models import MobileNetV1, MOBILENET_V1_CHECKPOINTS
from posenet.utils import *
```
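
The `sed -i` form used above follows GNU sed syntax; BSD sed (for example on macOS) expects an explicit backup suffix after `-i`. As a portable alternative, the same substitution can be sketched in Python. The helper name `uncomment_import` is hypothetical and not part of posenet-pytorch, and the demo string stands in for the real file contents:

```python
import re

def uncomment_import(source: str, line_no: int = 2) -> str:
    """Strip a leading '# ' from a 'from ... import ...' statement on a 1-based line."""
    lines = source.splitlines(keepends=True)
    idx = line_no - 1
    if 0 <= idx < len(lines):
        # Same pattern as the sed command: keep only the import statement itself.
        lines[idx] = re.sub(r'^# *(from .* import .*)', r'\1', lines[idx])
    return ''.join(lines)

# Demo on an in-memory string (assumed layout of the first two lines of __init__.py).
demo = (
    "from posenet.constants import *\n"
    "# from posenet.decode_multi import decode_multiple_poses\n"
)
fixed = uncomment_import(demo)
print(fixed)
```

To apply it to the real file, read `./posenet-pytorch/posenet/__init__.py`, pass its text through the helper, and write the result back. Running the helper twice is harmless because an already uncommented line no longer matches the pattern.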

Install the relevant packages:  
This step may take several minutes...


In [None]:
!pip install torch==2.6.0 torchvision==0.21.0
!pip install onnx==1.16.1
!pip install numpy==1.26.4
!pip install opencv-python==4.9.0.80
!pip install pycocotools==2.0.10
!pip install requests

In [141]:
import importlib.util  # import the submodule explicitly; `import importlib` alone does not guarantee it is loaded
if not importlib.util.find_spec('model_compression_toolkit'):
    !pip install model_compression_toolkit

In [142]:
import itertools
import json
from typing import List, Dict, Any
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
import os
import sys
sys.path.append('./posenet-pytorch')
sys.path.append('./posenet-pytorch/posenet')
import posenet
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

import cv2

### Various Settings
Here, you can configure the parameters listed below.  

#### File path setting
- SAVE_FLOAT_EVAL_RESULT  
  This parameter sets the filename for outputting inference results before quantization.
- SAVE_QUANT_EVAL_RESULT  
  This parameter sets the filename for outputting inference results after quantization.

#### Parameter setting
- MODEL_ID  
  Selects the model depth to use (50, 75, 100, or 101).
- SCALE_FACTOR  
  Sets the scaling applied to the input image.
- DECODE_MAX_POSES  
  Sets the maximum number of poses detected per image.
- DECODE_MIN_POSE_SCORE  
  Sets the minimum score for a pose to be detected.
- KPT_LAB_THR  
  Sets the keypoint score above which a keypoint is treated as labeled (visibility flag 1).
- KPT_VIS_THR  
  Sets the keypoint score above which a keypoint is treated as visible (visibility flag 2).
- NUM_WORKERS  
  Sets the number of worker processes used to parallelize data loading.
- CALIB_ITER  
  Sets how many samples are used when generating the representative data for quantization.
- IMG_HEIGHT, IMG_WIDTH  
  Set the height and width to which every input image is resized before it is passed to the model.
- WEIGHTS_COMPRESSION_RATIO  
  Sets the target weights memory for mixed-precision quantization as a fraction of the weights memory of the 8-bit model.

In [None]:
# File path setting
SAVE_FLOAT_EVAL_RESULT = "float_eval_result"
SAVE_QUANT_EVAL_RESULT = "quant_eval_result"

# Parameter setting
MODEL_ID = 75
SCALE_FACTOR = 1.0
DECODE_MAX_POSES = 20
DECODE_MIN_POSE_SCORE = 0
KPT_LAB_THR = 0.2
KPT_VIS_THR = 0.5
NUM_WORKERS = 0
CALIB_ITER = 10
IMG_HEIGHT = 480
IMG_WIDTH = 640
WEIGHTS_COMPRESSION_RATIO = 0.70

Load a pre-trained PoseNet (MobileNetV1 backbone) model.  

In [None]:
float_model = posenet.load_model(MODEL_ID)
output_stride = getattr(float_model, 'output_stride', 8)
print(output_stride)

**Note**  
When you run the code for the first time, the model download will begin.  
This step may take several minutes...

## Dataset preparation
### Download the COCO dataset

**Note**  
In this tutorial, we will use a subset of COCO train2017 for calibration during quantization and COCO val2017 for evaluation.

This step may take several minutes...

In [145]:
if not os.path.isdir('COCO_dataset'):
    !mkdir COCO_dataset
    !wget -P COCO_dataset http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    !wget -P COCO_dataset http://images.cocodataset.org/zips/train2017.zip
    !wget -P COCO_dataset http://images.cocodataset.org/zips/val2017.zip
    !unzip COCO_dataset/annotations_trainval2017.zip -d COCO_dataset
    !unzip COCO_dataset/train2017.zip -d COCO_dataset
    !unzip COCO_dataset/val2017.zip -d COCO_dataset

Set the paths to the annotation files and image folders of the downloaded dataset.

In [146]:
COCO_TRAIN_IMG_DIR = "COCO_dataset/train2017/"
COCO_VAL_IMG_DIR = "COCO_dataset/val2017/"
COCO_TRAIN_ANN_JSON = "COCO_dataset/annotations/person_keypoints_train2017.json"
COCO_VAL_ANN_JSON = "COCO_dataset/annotations/person_keypoints_val2017.json"

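The following class prepares the downloaded COCO dataset for calibration during quantization and for evaluation. Each image is resized to `IMG_WIDTH` x `IMG_HEIGHT`, and the ratio between the original and resized dimensions is returned as `output_scale`.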

In [None]:
class CocoPoseNetDataset(Dataset):
    def __init__(self, img_dir: str, ann_json: str, output_stride: int, scale_factor: float = 1.0):
        self.img_dir = img_dir
        self.coco = COCO(ann_json)
        self.img_ids = self.coco.getImgIds(catIds=[1])
        self.output_stride = output_stride
        self.scale_factor = scale_factor

    def __len__(self):
        return len(self.img_ids)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        img_id = self.img_ids[idx]
        img_info = self.coco.loadImgs([img_id])[0]
        img_path = os.path.join(self.img_dir, img_info['file_name'])

        input_image, draw_image, output_scale = posenet.read_imgfile(
            img_path, scale_factor=self.scale_factor, output_stride=self.output_stride
        )

        input_image = np.squeeze(input_image, axis=0)
        input_image = input_image.transpose((1, 2, 0))
        input_image = cv2.resize(input_image,(IMG_WIDTH, IMG_HEIGHT), interpolation=cv2.INTER_LINEAR)
        input_image = input_image.transpose((2, 0, 1)).reshape(1, 3, IMG_HEIGHT, IMG_WIDTH)
        input_image = input_image.astype(np.float32)

        output_scale_height, output_scale_width, _ = draw_image.shape
        output_scale = np.array([output_scale_height / IMG_HEIGHT, output_scale_width / IMG_WIDTH])
            
        input_tensor = torch.from_numpy(input_image)

        sample = {
            'input': input_tensor,
            'img_id': img_id,
            'output_scale': output_scale,
            'file_name': img_info['file_name'],
        }
        return sample
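
The `output_scale` returned by the dataset is what later maps decoded keypoints back to the original image. The sketch below illustrates that mapping under two assumptions taken from how this tutorial uses the values: decoded keypoints arrive as `(y, x)` pairs in network-input pixels, and the source image size is known. The helper name `to_image_xy` and the 960 x 1280 image size are illustrative only:

```python
import numpy as np

IMG_HEIGHT, IMG_WIDTH = 480, 640  # fixed network input size used in this tutorial

def to_image_xy(kpts_yx: np.ndarray, src_hw: tuple) -> np.ndarray:
    """Map (y, x) keypoints in network-input pixels to (x, y) in source-image pixels."""
    # Same ratio as output_scale in the dataset class: [height ratio, width ratio].
    scale = np.array([src_hw[0] / IMG_HEIGHT, src_hw[1] / IMG_WIDTH])
    kpts_xy = np.zeros(kpts_yx.shape, dtype=np.float64)
    kpts_xy[:, 0] = kpts_yx[:, 1] * scale[1]  # x comes from column 1, scaled by width ratio
    kpts_xy[:, 1] = kpts_yx[:, 0] * scale[0]  # y comes from column 0, scaled by height ratio
    return kpts_xy

# A keypoint at the centre of the network input, for a hypothetical 960 x 1280 source image.
mapped = to_image_xy(np.array([[240.0, 320.0]]), src_hw=(960, 1280))
print(mapped)
```

Because both ratios are 2.0 in this example, the centre of the network input maps to the centre of the source image, (640, 480) in `(x, y)` order.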

Convert keypoint coordinates and scores into COCO-style `[x, y, v]` rows, where the visibility flag `v` is derived from the keypoint score.

In [148]:
def coco_kpts_xy_score_to_xyv(
    kpts_xy: np.ndarray, kpts_score: np.ndarray,
    visible_thr: float = 0.5, label_thr: float = 0.2) -> np.ndarray:

    # COCO-style visibility flag v, assigned from the keypoint score:
    # visible_thr: score threshold above which a keypoint counts as visible
    # label_thr: score threshold above which a keypoint counts as labeled

    # v=0: not present in the image (score <= label_thr)
    vis = np.zeros_like(kpts_score, dtype=np.int32)
    # v=2: fully visible (score > visible_thr)
    vis[kpts_score > visible_thr] = 2
    # v=1: labeled but not visible (label_thr < score <= visible_thr)
    vis[(kpts_score > label_thr) & (kpts_score <= visible_thr)] = 1
    # Concatenate into rows of [x, y, v]
    kpts_xyv = np.concatenate([kpts_xy, vis[:, None]], axis=1)
    return kpts_xyv
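
To make the threshold boundaries concrete, here is a small sketch that restates only the flag assignment so it runs on its own. The function name `scores_to_visibility` and the sample scores are illustrative and not part of the tutorial code:

```python
import numpy as np

def scores_to_visibility(scores: np.ndarray,
                         visible_thr: float = 0.5,
                         label_thr: float = 0.2) -> np.ndarray:
    """Assign 0/1/2 flags with the same comparisons as coco_kpts_xy_score_to_xyv."""
    vis = np.zeros_like(scores, dtype=np.int32)
    vis[scores > visible_thr] = 2
    vis[(scores > label_thr) & (scores <= visible_thr)] = 1
    return vis

scores = np.array([0.1, 0.2, 0.35, 0.5, 0.9])
flags = scores_to_visibility(scores)
print(flags.tolist())  # [0, 0, 1, 1, 2]
```

Both comparisons are strict on the lower side, so a score exactly equal to `label_thr` gets flag 0 and a score exactly equal to `visible_thr` gets flag 1.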

Flatten the `[x, y, v]` keypoint rows into a one-dimensional list `[x1, y1, v1, x2, y2, v2, ...]`.

In [149]:
def flatten_xyv(kpts_xyv: np.ndarray) -> List[float]:
    return [float(v) for row in kpts_xyv for v in row]
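
The flattened list is what ends up in the `keypoints` field of each detection passed to the COCO evaluator. The following sketch shows the shape of one such entry, using the same keys that the evaluation function in this tutorial writes. The helper name `make_result`, the image id, and the two-keypoint input are hypothetical; a real COCO person entry carries 17 keypoints, that is 51 values:

```python
from typing import Any, Dict, List

def make_result(image_id: int, kpts_xyv: List[List[float]], score: float) -> Dict[str, Any]:
    """Build one detection entry with a flat [x, y, v, x, y, v, ...] keypoint list."""
    flat = [float(v) for row in kpts_xyv for v in row]
    return {
        "image_id": int(image_id),
        "category_id": 1,  # COCO category id for "person"
        "keypoints": flat,
        "score": float(score),
    }

entry = make_result(42, [[10.0, 20.0, 2], [30.5, 40.5, 1]], 0.87)
print(entry["keypoints"])  # [10.0, 20.0, 2.0, 30.5, 40.5, 1.0]
```

Converting every value to a built-in `float` keeps the entry JSON-serializable, which matters because NumPy scalar types are not accepted by `json.dump`.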

In [None]:
val_dataset = CocoPoseNetDataset(
    img_dir=COCO_VAL_IMG_DIR, ann_json=COCO_VAL_ANN_JSON,
    output_stride=output_stride, scale_factor=SCALE_FACTOR
)

calib_dataset = CocoPoseNetDataset(
    img_dir=COCO_TRAIN_IMG_DIR, ann_json=COCO_TRAIN_ANN_JSON,
    output_stride=output_stride, scale_factor=SCALE_FACTOR
)

# For evaluation (batch size 1)
val_dataloader = DataLoader(
    val_dataset, batch_size=1, shuffle=False,
    num_workers=NUM_WORKERS, collate_fn=lambda x: x[0]
)

# For calibration (no labels required)
calib_loader = DataLoader(
    calib_dataset, batch_size=1, shuffle=True,
    num_workers=NUM_WORKERS, collate_fn=lambda x: x[0]
)

print(len(calib_dataset))
print(len(val_dataset))

## Representative Dataset
For quantization with MCT, we need to define a representative dataset required by the PTQ algorithm. This dataset is a generator that returns a list of images:

In [151]:
def representative_dataset_gen():
    for sample in itertools.islice(itertools.cycle(calib_loader), CALIB_ITER):
        yield [sample['input']]
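
The combination of `itertools.cycle` and `itertools.islice` guarantees that the generator yields exactly `CALIB_ITER` batches, even if the loader holds fewer samples than that. The toy sketch below demonstrates the behaviour with a plain list standing in for the `DataLoader`; `toy_loader` and its string payloads are placeholders, not real tensors:

```python
import itertools

CALIB_ITER = 10

# Stand-in for calib_loader: three "samples" with the same dict layout.
toy_loader = [{"input": f"batch_{i}"} for i in range(3)]

def toy_representative_dataset_gen():
    # cycle() restarts the loader when it is exhausted; islice() stops after CALIB_ITER items.
    for sample in itertools.islice(itertools.cycle(toy_loader), CALIB_ITER):
        yield [sample["input"]]  # one list per batch, one element per model input

batches = list(toy_representative_dataset_gen())
print(len(batches), batches[3])
```

Note that `itertools.cycle` replays a cached copy of the first pass, so with a shuffling loader the order is only random until the loader wraps around. In this tutorial the calibration set is far larger than `CALIB_ITER`, so the wrap-around is never reached.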

## Target Platform Capabilities (TPC)
In addition, MCT optimizes the model for dedicated hardware platforms. This is done using TPC (for more details, please visit our [documentation](https://sonysemiconductorsolutions.github.io/mct-model-optimization/api/api_docs/modules/target_platform_capabilities.html)). Here, we use the default PyTorch TPC:

In [152]:
import model_compression_toolkit as mct

tpc = mct.get_target_platform_capabilities('pytorch', 'default')

## Mixed Precision Configurations
We will create a `MixedPrecisionQuantizationConfig` that defines the search options for mixed-precision:


In [None]:
configuration = mct.core.CoreConfig(
    mixed_precision_config=mct.core.MixedPrecisionQuantizationConfig(num_of_images=CALIB_ITER))

In [None]:
# Get resource utilization information to constrain the model's memory size.
resource_utilization_data = mct.core.pytorch_resource_utilization_data(
    float_model,
    representative_dataset_gen,
    configuration,
    target_platform_capabilities=tpc)

# Create a ResourceUtilization object whose weights memory target is scaled by WEIGHTS_COMPRESSION_RATIO
resource_utilization = mct.core.ResourceUtilization(resource_utilization_data.weights_memory * WEIGHTS_COMPRESSION_RATIO)
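
The target passed to `ResourceUtilization` is plain arithmetic on the reported weights memory. The sketch below works through it with a hypothetical baseline of 1,000,000 bytes (the real value comes from `resource_utilization_data.weights_memory`), and assumes, as described in the parameter list, that the baseline corresponds to 8-bit weights:

```python
WEIGHTS_COMPRESSION_RATIO = 0.70

def target_weights_bytes(baseline_bytes: float, ratio: float) -> float:
    """Weights memory budget handed to the mixed-precision search."""
    return baseline_bytes * ratio

baseline = 1_000_000  # hypothetical 8-bit weights memory in bytes
target = target_weights_bytes(baseline, WEIGHTS_COMPRESSION_RATIO)

# If the baseline stores every weight in 8 bits, the budget allows this many bits on average.
average_bits = 8 * WEIGHTS_COMPRESSION_RATIO
print(f"target: {target:.0f} bytes, average bit-width: {average_bits:.1f}")
```

An average below 8 bits means the search has to assign a lower bit-width to at least some layers, which is the point of the mixed-precision step.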

## Post-Training Quantization using MCT
Now for the exciting part! Let's run PTQ on the model.

In [None]:
quantized_model, quantization_info = mct.ptq.pytorch_post_training_quantization(
                                        in_module=float_model,
                                        representative_data_gen=representative_dataset_gen,
                                        target_platform_capabilities=tpc,
                                        core_config=configuration,
                                        target_resource_utilization=resource_utilization)

## Model Evaluation
Now, we will create a function for evaluating a model.  
The inference results before and after quantization are displayed on the terminal and simultaneously written to a JSON file.

In [None]:
@torch.no_grad()
def evaluate(model: torch.nn.Module,
             val_dataset: CocoPoseNetDataset,
             val_dataloader: DataLoader,
             decode_max_poses: int = 1,
             decode_min_pose_score: float = 0,
             kpt_lab_thr: float = 0.2,
             kpt_vis_thr: float = 0.5) -> float:

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    model.to(device)
    model.eval()

    output_stride = val_dataset.output_stride

    results = []
    for sample in tqdm(val_dataloader, desc="Evaluating"):
        inp = sample['input'].to(device)
        img_id = sample['img_id']
        output_scale = sample['output_scale']

        heat, off, disp_f, disp_b = model(inp)
        heat, off, disp_f, disp_b = heat.squeeze(0), off.squeeze(0), disp_f.squeeze(0), disp_b.squeeze(0)

        # decode
        pose_scores, keypoint_scores, keypoint_coords, pose_offsets = posenet.decode_multiple_poses(
            heat,
            off,
            disp_f,
            disp_b,
            output_stride=output_stride,
            max_pose_detections=decode_max_poses,
            min_pose_score=decode_min_pose_score)

        for p_idx, ps in enumerate(pose_scores):
            if ps == 0.0:
                continue
            kpts_xy = keypoint_coords[p_idx]
            kpts_xy_img = np.zeros_like(kpts_xy)
            kpts_xy_img[:, 0] = kpts_xy[:, 1]* output_scale[1]
            kpts_xy_img[:, 1] = kpts_xy[:, 0]* output_scale[0]
            kpts_sc = keypoint_scores[p_idx]
            kpts_xyv = coco_kpts_xy_score_to_xyv(kpts_xy_img, kpts_sc, visible_thr=kpt_vis_thr, label_thr=kpt_lab_thr)
            keypoint = flatten_xyv(kpts_xyv)
            results.append({
                "image_id": int(img_id),
                "category_id": 1,
                "keypoints": keypoint,
                "score": float(ps)
            })

    if len(results) == 0:
        print("WARNING : No detection results found. Returning AP=0.0.")
        return 0.0
    
    # Save the detections to the file that matches the evaluated model
    save_name = SAVE_FLOAT_EVAL_RESULT if model is float_model else SAVE_QUANT_EVAL_RESULT
    with open(save_name + '.json', 'w') as f:
        json.dump(results, f, ensure_ascii=False, indent=1)

    # evaluation
    coco_gt = val_dataset.coco
    coco_dt = coco_gt.loadRes(results)
    evaluator = COCOeval(coco_gt, coco_dt, iouType='keypoints')
    evaluator.evaluate()
    evaluator.accumulate()
    evaluator.summarize()
    ap = float(evaluator.stats[0])
    print(f"AP (OKS mAP): {ap:.4f}")
    return ap
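
The AP reported by `COCOeval` with `iouType='keypoints'` is based on Object Keypoint Similarity (OKS) rather than box IoU. The following is a simplified sketch of the OKS formula for one ground-truth/detection pair, not a reimplementation of pycocotools: it ignores special cases such as ground truths with no labeled keypoints, and the function name `oks` and the synthetic pose are illustrative only:

```python
import numpy as np

# Per-keypoint constants published for COCO keypoint evaluation (17 keypoints).
COCO_SIGMAS = np.array([.26, .25, .25, .35, .35, .79, .79, .72, .72,
                        .62, .62, 1.07, 1.07, .87, .87, .89, .89]) / 10.0

def oks(gt_xyv: np.ndarray, dt_xy: np.ndarray, area: float) -> float:
    """OKS between one ground-truth pose (x, y, v rows) and one detected pose (x, y rows)."""
    gt_xyv = np.asarray(gt_xyv, dtype=np.float64).reshape(-1, 3)
    dt_xy = np.asarray(dt_xy, dtype=np.float64).reshape(-1, 2)
    labeled = gt_xyv[:, 2] > 0
    if not labeled.any():
        return 0.0  # simplification: no labeled keypoints to compare against
    d2 = np.sum((dt_xy - gt_xyv[:, :2]) ** 2, axis=1)  # squared pixel distances
    variances = (2.0 * COCO_SIGMAS) ** 2
    e = d2 / variances / (area + np.spacing(1)) / 2.0
    return float(np.mean(np.exp(-e[labeled])))

# Synthetic 17-keypoint pose; every keypoint is labeled and visible (v=2).
gt = np.column_stack([np.linspace(100, 260, 17), np.linspace(50, 370, 17), np.full(17, 2)])
perfect = oks(gt, gt[:, :2], area=40000.0)
shifted = oks(gt, gt[:, :2] + 5.0, area=40000.0)
print(f"perfect={perfect:.3f}, shifted={shifted:.3f}")
```

A detection that matches the ground truth exactly scores 1.0, and the score decays as keypoints drift, faster for keypoints with small constants (such as the eyes) and for objects with a small area.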

Let's start with the floating-point model evaluation.  
This step may take several minutes...

In [None]:
print("evaluating float model (COCO mAP)...")
evaluate(float_model,
            val_dataset,
            val_dataloader,
            decode_max_poses=DECODE_MAX_POSES,
            decode_min_pose_score=DECODE_MIN_POSE_SCORE,
            kpt_vis_thr=KPT_VIS_THR,
            kpt_lab_thr=KPT_LAB_THR)

Finally, let's evaluate the quantized model:  
This step may take several minutes...

In [None]:
print("evaluating quantized model (COCO mAP)...")
evaluate(quantized_model,
            val_dataset,
            val_dataloader,
            decode_max_poses=DECODE_MAX_POSES,
            decode_min_pose_score=DECODE_MIN_POSE_SCORE,
            kpt_vis_thr=KPT_VIS_THR,
            kpt_lab_thr=KPT_LAB_THR)

## Copyrights

Copyright 2025 Sony Semiconductor Solutions, Inc. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
