<a href="https://colab.research.google.com/github/tiffcmw/Maker-Portfolio/blob/main/Neural-Network-Quantization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

This is the code I originally intended to include in my research, but pulled out for two reasons:

1. I ran into a backend problem that made the quantized models impossible to deploy. There could be other causes, but I am puzzled to this day. (I would welcome any advice!)
2. The paper had already gone over the page limit before I got the chance to write this section, and explaining it properly would have taken many more words.

The research was a detailed examination of the YOLOv8 object detection model and of the optimal quantization method for it.

Anything that IS included in the research will be in the research supplement, but I thought it would still be interesting to share the quantization process, since it is the actual implementation of the theory and it is interesting to see how the ideas translate into code.

The code here is my process of quantizing the YOLOv8 model, mainly with the PyTorch framework. I did:

* Static Post Training Quantization
* Dynamic Post Training Quantization
* Creating a dataloader

None of this code (except for the boilerplate used to import libraries, packages, and models) was submitted to the conference, so there is no conflict with the blind review policy (in case you're one of the reviewers, who knows).

# Loading and Prepping the model

In [None]:
# install the ultralytics lib (they made yolov8)
!pip install ultralytics

# use wget to download the yolov8n model (it was trained on COCO) from the ultralytics git repository
!wget https://github.com/ultralytics/assets/releases/download/v0.0.0/yolov8n.pt

In [2]:
import ultralytics
import torch
import torch.quantization

# .pt is a PyTorch file, so might as well load it using torch
# checkpoint should stay unmodified for the whole session, because the model may be reused for several purposes
# checkpoint will be used to retrieve the original model
checkpoint = torch.load("/content/yolov8n.pt")

# the checkpoint is a dict; its 'model' entry holds the full model object (not just a state_dict)
model = checkpoint['model']

# set the model to evaluation mode
model = model.float().eval()

In [None]:
# function to create visual image (not needed yet)
def imShow(path):
  import cv2
  import matplotlib.pyplot as plt
  %matplotlib inline

  image = cv2.imread(path)
  height, width = image.shape[:2]
  resized_image = cv2.resize(image,(3*width, 3*height), interpolation = cv2.INTER_CUBIC)

  fig = plt.gcf()
  fig.set_size_inches(18, 10)
  plt.axis("off")
  plt.imshow(cv2.cvtColor(resized_image, cv2.COLOR_BGR2RGB))
  plt.show()

# Getting Calibration Data


## Downloading COCO Val2017

In [3]:
# getting the coco val2017 dataset and annotations as the calibration dataset
# this will take a while!

!wget http://images.cocodataset.org/zips/val2017.zip -O coco_val2017.zip
!wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -O coco_ann2017.zip

--2023-12-24 15:50:32--  http://images.cocodataset.org/zips/val2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 3.5.28.53, 3.5.25.201, 52.217.101.60, ...
Connecting to images.cocodataset.org (images.cocodataset.org)|3.5.28.53|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 815585330 (778M) [application/zip]
Saving to: ‘coco_val2017.zip’


2023-12-24 15:51:34 (12.6 MB/s) - ‘coco_val2017.zip’ saved [815585330/815585330]

--2023-12-24 15:51:34--  http://images.cocodataset.org/annotations/annotations_trainval2017.zip
Resolving images.cocodataset.org (images.cocodataset.org)... 52.217.163.113, 52.216.221.81, 52.216.49.177, ...
Connecting to images.cocodataset.org (images.cocodataset.org)|52.217.163.113|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 252907541 (241M) [application/zip]
Saving to: ‘coco_ann2017.zip’


2023-12-24 15:51:51 (14.6 MB/s) - ‘coco_ann2017.zip’ saved [252907541/252907541]



In [4]:
# unzipping the downloaded files
# I copied this off the internet

from zipfile import ZipFile, BadZipFile
import os

def extract_zip_file(extract_path):
    try:
        with ZipFile(extract_path+".zip") as zfile:
            zfile.extractall(extract_path)
        # remove zipfile
        zfileTOremove=f"{extract_path}"+".zip"
        if os.path.isfile(zfileTOremove):
            os.remove(zfileTOremove)
        else:
            print("Error: %s file not found" % zfileTOremove)
    except BadZipFile as e:
        print("Error:", e)

extract_val_path = "/content/coco_val2017"
extract_ann_path="/content/coco_ann2017"

extract_zip_file(extract_val_path)
extract_zip_file(extract_ann_path)

In [5]:
# Check that I got the correct amount of calibration data (should be 5000 images for the validation dataset)
import os

# folder path
dir_path = r"/content/coco_val2017/val2017"
count = 0

# Iterate directory
for path in os.listdir(dir_path):
    # check if current path is a file
    if os.path.isfile(os.path.join(dir_path, path)):
        count += 1

# should be File count: 5000 if done right
print('File count:', count)

File count: 5000


## Creating Tensor List

The tensor list holds the photos. Images are converted to tensors before they are fed to the neural network; it's just part of image preprocessing.

In [6]:
import os
import torchvision.transforms as transforms
from PIL import Image

# Function to process images with specified range
def process_images(folder_dir, start_index, end_index):
    tensor_list = []

    # define a transform that resizes every image to a uniform resolution of 640 x 640,
    # which is the resolution that YOLOv8 accepts
    transform = transforms.Compose([
        transforms.Resize((640, 640)),
        transforms.ToTensor()
    ])

    # get sorted list of all images
    images = sorted(os.listdir(folder_dir))

    # slice the list for the desired range specified by start and end in the input parameter
    selected_images = images[start_index:end_index]

    for image_name in selected_images:
        img_path = os.path.join(folder_dir, image_name)
        img = Image.open(img_path).convert('RGB')  # Convert to RGB

        # Apply the transformation
        tensor = transform(img)

        # Add another dimension at the front to get NCHW shape
        tensor = tensor.unsqueeze(0)

        # Add the tensorized photo to the list
        tensor_list.append(tensor)

    return tensor_list

# Example usage:
folder_dir = "/content/coco_val2017/val2017"
# extract whatever range of images is desired; I always take just 100 for the tensor list, or else the runtime might crash
# the dataloader will handle larger datasets

start = 0     # slice start (0-indexed): the first image
end = 100     # slice end (exclusive): stops after the 100th image

tensor_list = process_images(folder_dir, start, end)

In [7]:
# I found out that there are images that don't fit the expected shape of [1, 3, 640, 640]:
# 3 channels (RGB) of size 640 x 640
# The for loop iterates through all 100 items in the tensor list and checks each one

# If all the images fit, only 100 is printed
# Tensors with the wrong shape have their size and index printed (for info)
# Internally, the unfit list records the indices of those tensors so they can be removed from the tensor list

# Again, it doesn't matter whether the list ends up with exactly 100 items. The list just provides some images that are easy to access.
# I also created it before I moved to a dataloader for heavier workloads (and the need for backward propagation customisation)

unfit = []

for i in range(len(tensor_list)):
    tensor_list[i] = tensor_list[i].float()
    # check if tensor is the right size
    if tensor_list[i].size() != torch.Size([1, 3, 640, 640]):
        print(tensor_list[i].size())
        print(i)
        unfit.append(i)

# remove the tensors of unfit size, highest index first so earlier pops don't shift the remaining indices
for x in reversed(unfit):
    tensor_list.pop(x)

print(len(tensor_list))

100


## Creating Dataloader

The model's output for a batch of 64 images is a list with three items: the first is a tensor of size torch.Size([64, 144, 80, 80]), the second is torch.Size([64, 144, 40, 40]), and the last is torch.Size([64, 144, 20, 20]).
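
Those three shapes can be reproduced with a little arithmetic. The sketch below assumes the usual YOLOv8 detection head layout, none of which is stated in this notebook: strides of 8, 16 and 32, 80 COCO classes, and `reg_max = 16` distribution bins for each of the four box sides. `head_output_shapes` is a made-up helper name.

```python
# Sketch: derive the three output shapes from the head layout.
# Assumptions (not taken from the notebook): strides 8/16/32,
# 80 classes, reg_max = 16 bins per box side.
def head_output_shapes(batch_size, image_size=640, num_classes=80,
                       reg_max=16, strides=(8, 16, 32)):
    # 4 box sides * reg_max bins, plus one score per class
    channels = 4 * reg_max + num_classes
    return [
        (batch_size, channels, image_size // stride, image_size // stride)
        for stride in strides
    ]


shapes = head_output_shapes(batch_size=64)
for shape in shapes:
    print(shape)
```

Under those assumptions, 144 is 4 * 16 + 80, and the 80, 40 and 20 grids are 640 divided by each stride.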

In [9]:
# Examples of an entry in instances_val2017

import json

# Open the JSON file for reading
with open('/content/coco_ann2017/annotations/instances_val2017.json', 'r') as file:
    # Parse the JSON data into a Python object
    instances_val2017 = json.load(file)

# Access the first image in the 'images' list
first_image = instances_val2017['images'][0]

# Print the first image dictionary
print(first_image)

{'license': 4, 'file_name': '000000397133.jpg', 'coco_url': 'http://images.cocodataset.org/val2017/000000397133.jpg', 'height': 427, 'width': 640, 'date_captured': '2013-11-14 17:02:52', 'flickr_url': 'http://farm7.staticflickr.com/6116/6255196340_da26cf2c9e_z.jpg', 'id': 397133}


In [10]:
image_id = instances_val2017['images'][0]['id']

# there is an annotation dictionary for each object in val2017, and each image has its own image_id.
# the list comprehension iterates through all the entries in the 'annotations' section of the instances_val2017 dictionary
# and keeps the ones whose image_id matches the id picked on the line above
annotations = [item for item in instances_val2017['annotations'] if item['image_id'] == image_id]

# Print the annotations for the first image
for annotation in annotations:
    print(annotation)

# each annotation dictionary describes one object in the same image.
# each object has its own bounding box coordinates (MAPPED TO THE ORIGINAL IMAGE)

# SOOO basically the image size doesn't fit, and the bounding boxes don't match what the output of the neural network needs
# for the loss calculation. sad.
# Therefore, I had to create a function to rescale the bounding boxes. I won't bother with the segmentation data because I don't need it
# I just need to resize the bounding boxes using the original resolution

{'segmentation': [[224.24, 297.18, 228.29, 297.18, 234.91, 298.29, 243.0, 297.55, 249.25, 296.45, 252.19, 294.98, 256.61, 292.4, 254.4, 264.08, 251.83, 262.61, 241.53, 260.04, 235.27, 259.67, 230.49, 259.67, 233.44, 255.25, 237.48, 250.47, 237.85, 243.85, 237.11, 240.54, 234.17, 242.01, 228.65, 249.37, 224.24, 255.62, 220.93, 262.61, 218.36, 267.39, 217.62, 268.5, 218.72, 295.71, 225.34, 297.55]], 'area': 1481.3806499999994, 'iscrowd': 0, 'image_id': 397133, 'bbox': [217.62, 240.54, 38.99, 57.75], 'category_id': 44, 'id': 82445}
{'segmentation': [[292.37, 425.1, 340.6, 373.86, 347.63, 256.31, 198.93, 240.24, 4.02, 311.57, 1.0, 427.0, 291.36, 427.0]], 'area': 54085.6217, 'iscrowd': 0, 'image_id': 397133, 'bbox': [1.0, 240.24, 346.63, 186.76], 'category_id': 67, 'id': 119568}
{'segmentation': [[446.71, 70.66, 466.07, 72.89, 471.28, 78.85, 473.51, 88.52, 473.51, 98.2, 462.34, 111.6, 475.74, 126.48, 484.67, 136.16, 494.35, 157.74, 496.58, 174.12, 498.07, 182.31, 485.42, 189.75, 474.25, 189

In [11]:
from typing import List, Tuple, Dict

def resize_bbox(bbox: List[float], original_size: Tuple[int, int], target_size: Tuple[int, int]) -> List[float]:
    """
    Resize the bounding box coordinates to match the target resolution.

    Parameters:
    # bbox is the original bbox [x_min, y_min, width, height] from the annotation's 'bbox' field
    # original size is (width, height) from the 'images' entry
    # target size is (width, height) of the target resolution, e.g. (80, 80)

    """
    # the x scale is the ratio of the target width to the original width
    # the y scale is the ratio of the target height to the original height
    scale_x = target_size[0] / original_size[0]
    scale_y = target_size[1] / original_size[1]

    # Apply the scale factors to the bbox coordinates [x_min, y_min, width, height]
    resized_bbox = [bbox[0] * scale_x, bbox[1] * scale_y, bbox[2] * scale_x, bbox[3] * scale_y]

    return resized_bbox

In [12]:
# Example usage:

# this is the first annotation printed two cells above:
# one of the bounding boxes of the first image in val2017
original_bbox = annotations[0]['bbox']  # Original bounding box

# 'width' and 'height' in first image in 'images' list from instances_val2017
original_image_size = (first_image['width'], first_image['height'])  # Original image size (width, height)

# Resize bounding box for 80x80 resolution
bbox_80x80 = resize_bbox(original_bbox, original_image_size, (80, 80))

# Resize bounding box for 40x40 resolution
bbox_40x40 = resize_bbox(original_bbox, original_image_size, (40, 40))

# Resize bounding box for 20x20 resolution
bbox_20x20 = resize_bbox(original_bbox, original_image_size, (20, 20))

print("Original BBox: ", original_bbox)
print("80x80 BBox: ", bbox_80x80)
print("40x40 BBox: ", bbox_40x40)
print("20x20 BBox: ", bbox_20x20)

Original BBox:  [217.62, 240.54, 38.99, 57.75]
80x80 BBox:  [27.2025, 45.066042154566745, 4.87375, 10.819672131147541]
40x40 BBox:  [13.60125, 22.533021077283372, 2.436875, 5.409836065573771]
20x20 BBox:  [6.800625, 11.266510538641686, 1.2184375, 2.7049180327868854]


In [None]:
!pip install --upgrade torchvision

In [14]:
# I discovered the COCO API (pycocotools), which indexes the annotation file so I don't have to dig through it by hand
# (the images are still read from the downloaded val2017 folder)
from pycocotools.coco import COCO
from torchvision import transforms
import torchvision.transforms.functional as F
from torch.utils.data import Dataset, DataLoader
from torch.utils.data.dataloader import default_collate
from torch.nn.utils.rnn import pad_sequence
from PIL import Image
import random
import os

def resize_bbox(bbox: List[float], original_size: Tuple[int, int], target_size: Tuple[int, int]) -> List[float]:
    """
    Resize the bounding box coordinates to match the target resolution.

    Parameters:
    # bbox is the original bbox [x_min, y_min, width, height] from the annotation's 'bbox' field
    # original size is (width, height) from the 'images' entry
    # target size is (width, height) of the target resolution, e.g. (80, 80)

    """
    # the x scale is the ratio of the target width to the original width
    # the y scale is the ratio of the target height to the original height
    scale_x = target_size[0] / original_size[0]
    scale_y = target_size[1] / original_size[1]

    # Apply the scale factors to the bbox coordinates [x_min, y_min, width, height]
    resized_bbox = [bbox[0] * scale_x, bbox[1] * scale_y, bbox[2] * scale_x, bbox[3] * scale_y]

    return resized_bbox

class COCODataset(Dataset):
    def __init__(self, coco, image_ids, base_dir, transform=None):
        # store COCO API interface for accessing the val2017 dataset
        self.coco = coco

        # store list of image IDs used in this dataset instance
        self.image_ids = image_ids

        # store base directory of image location
        self.base_dir = base_dir

        # Store the transform(s) that will be applied to the images (e.g., for data augmentation)
        self.transform = transform

        # Generate a list of image file names using the COCO API and the provided image IDs.
        # The COCO API's loadImgs method is called for each image ID (the image IDs are distinct for each image),
        # and the 'file_name' is read from the returned dictionary (the same kind of entry as in the 'images' list).
        self.image_paths = [self.coco.loadImgs(id)[0]['file_name'] for id in self.image_ids]

        # Generate a list of annotations for each image using the COCO API.
        # The COCO API's getAnnIds method retrieves all annotation IDs for a given image ID,
        # and then loadAnns retrieves the actual annotations for those IDs.
        self.annotations = [self.coco.loadAnns(self.coco.getAnnIds(imgIds=id)) for id in self.image_ids]

        # Run a method to filter out non-RGB images and their corresponding IDs and annotations.
        # Ensures that only RGB images are used in the dataset.
        self.image_paths, self.image_ids, self.annotations = self.filter_rgb_images_and_ids()

    # A custom method that removes grayscale images or images with alpha channels
    # I think I added this when I ran into lots of errors having wrong sizes of outputs or something
    def filter_rgb_images_and_ids(self):
      # Initialize empty lists to store paths, IDs, and annotations for RGB images only
      rgb_image_paths = []
      rgb_image_ids = []
      rgb_annotations = []

      # Iterate over all image paths, IDs, and annotations
      for image_path, id, annotation in zip(self.image_paths, self.image_ids, self.annotations):
          # Open the image file to check its mode
          image = Image.open(os.path.join(self.base_dir, image_path))

          # Check if the image is in RGB mode
          if image.mode == 'RGB':
              # If it is RGB, append the image path, ID, and annotation to their respective lists
              rgb_image_paths.append(image_path)
              rgb_image_ids.append(id)
              rgb_annotations.append(annotation)

          # Close the image file to free up resources
          image.close()

      # Return the lists containing only the information for RGB images
      return rgb_image_paths, rgb_image_ids, rgb_annotations

    def __getitem__(self, index):
      # Retrieve the file path for the image at the given index
      image_path = self.image_paths[index]

      # Open the image file using the path
      image = Image.open(os.path.join(self.base_dir, image_path))

      # Get the original size of the image as a (width, height) tuple
      original_size = image.size

      """
      Image Processing: Resize -> Transform to tensor
      """

      # If a transform is set, apply it to the image
      # the image needs to be resized to 640*640 for the input
      if self.transform:
          image = self.transform(image)

      # If the image is not a PyTorch tensor, convert it to one
      if not isinstance(image, torch.Tensor):
          image = F.to_tensor(image)

      """
      Annotation Processing: Resize + get category id
      """

      # Retrieve the annotations for the current image
      annotations = self.annotations[index]

      # Initialize lists to hold the resized annotations for each scale
      annotations_scales = {
          '80': [],
          '40': [],
          '20': []
      }

      # The scales to which the annotations will be resized
      target_scales = [80, 40, 20]

      # Iterate over all annotations to resize them for each scale
      for ann in annotations:
          # Retrieve the original bounding box from the annotation
          bbox = ann['bbox']
          category_id = ann['category_id']

          # Resize the bounding box for each target scale and store them in the annotations_scales dict
          for scale in target_scales:
              resized_bbox = resize_bbox(bbox, original_size, (scale, scale))
              annotations_scales[str(scale)].append({'bbox': resized_bbox, 'category_id': category_id})

      # Return the image tensor and the resized annotations for each scale
      return image, {
          'annotations_80': annotations_scales['80'],
          'annotations_40': annotations_scales['40'],
          'annotations_20': annotations_scales['20']
      }

    def __len__(self):
        # Return the number of images in the dataset
        return len(self.image_ids)

def pad_tensors(tensor_list):
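    # Drop tensors with no elements (images without annotations).
    # Note: this removes rows, so row i of the padded result no longer lines up with image i
    # whenever the batch contains an image without annotations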
    non_empty_tensors = [t for t in tensor_list if t.nelement() != 0]

    if not non_empty_tensors:
        # If all tensors are empty, return a placeholder tensor
        # Adjust the dimensions and type as per your specific requirement
        return torch.zeros((0, 0), dtype=torch.float32)
    else:
        # Pad the non-empty tensors
        return pad_sequence(non_empty_tensors, batch_first=True)

# custom collate function: deals with size issues from batching by padding the annotations to a uniform shape
def my_collate(batch):
    # refer to the return statement of __getitem__ above.
    # the image is the first item
    # the annotations are stored as a dictionary, and the number of annotations depends on how many objects there are
    images = [item[0] for item in batch]
    annotations_batch = [item[1] for item in batch]

    # Prepare lists for bounding boxes for each scale and categories for all scales
    bboxes_80 = []
    bboxes_40 = []
    bboxes_20 = []
    categories = []  # Categories should be the same for all scales within each image

    # Process each item in the batch
    for annotations in annotations_batch:

        # The categories are the same for all scales, because its just the same image with bounding boxes at different scales
        # so take the set of categories from annotations_80
        categories.append(torch.tensor([ann['category_id'] for ann in annotations['annotations_80']], dtype=torch.int64))

        # Process bounding boxes for each scale
        bboxes_80.append(torch.tensor([ann['bbox'] for ann in annotations['annotations_80']], dtype=torch.float32))
        bboxes_40.append(torch.tensor([ann['bbox'] for ann in annotations['annotations_40']], dtype=torch.float32))
        bboxes_20.append(torch.tensor([ann['bbox'] for ann in annotations['annotations_20']], dtype=torch.float32))

    # Use the utility function to pad each set
    bboxes_80_padded = pad_tensors(bboxes_80)
    bboxes_40_padded = pad_tensors(bboxes_40)
    bboxes_20_padded = pad_tensors(bboxes_20)
    categories_padded = pad_tensors(categories)

    # Stack images into a single tensor
    images_stacked = torch.stack(images)

    # Return the collated batch
    return images_stacked, {
        'bboxes_80': bboxes_80_padded,
        'bboxes_40': bboxes_40_padded,
        'bboxes_20': bboxes_20_padded,
        'categories': categories_padded  # Categories are the same for all scales
    }

transform = transforms.Compose([
    transforms.Resize(640),  # Resize the smaller edge to 640
    transforms.CenterCrop((640, 640)),  # Crop the center to make the image 640x640
    transforms.ToTensor()  # Convert to tensor
])
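
One thing to watch in this cell: `resize_bbox` scales boxes as if the whole image were stretched to a square, while `transforms.Resize(640)` followed by `CenterCrop` keeps the aspect ratio and cuts off the overflow, so the boxes and the cropped image can drift apart on non-square photos. Below is a minimal sketch of a box mapping that follows the resize-then-crop geometry instead. `map_bbox_resize_center_crop` is a hypothetical helper, not part of the original pipeline, and it ignores the integer rounding torchvision applies to the resized size and crop offset.

```python
from typing import List, Optional, Tuple


def map_bbox_resize_center_crop(
    bbox: List[float],
    original_size: Tuple[int, int],
    crop_size: int = 640,
    grid_size: int = 80,
) -> Optional[List[float]]:
    """Map a COCO [x_min, y_min, width, height] box through
    'resize the shorter edge to crop_size, then center crop',
    and rescale it to a grid_size x grid_size grid.
    Returns None when the box lies entirely outside the crop."""
    orig_w, orig_h = original_size

    # Resize(crop_size): the shorter edge becomes crop_size, aspect ratio kept
    scale = crop_size / min(orig_w, orig_h)

    # CenterCrop: offset of the crop window inside the resized image
    left = (orig_w * scale - crop_size) / 2
    top = (orig_h * scale - crop_size) / 2

    x_min = bbox[0] * scale - left
    y_min = bbox[1] * scale - top
    x_max = x_min + bbox[2] * scale
    y_max = y_min + bbox[3] * scale

    # clip the box to the crop window
    x_min, y_min = max(x_min, 0.0), max(y_min, 0.0)
    x_max, y_max = min(x_max, crop_size), min(y_max, crop_size)
    if x_max <= x_min or y_max <= y_min:
        return None

    to_grid = grid_size / crop_size
    return [
        x_min * to_grid,
        y_min * to_grid,
        (x_max - x_min) * to_grid,
        (y_max - y_min) * to_grid,
    ]


# first annotation of the first val2017 image (640 x 427)
print(map_bbox_resize_center_crop([217.62, 240.54, 38.99, 57.75], (640, 427)))
```

For a square image the two approaches agree; they only differ when the crop removes part of the longer side.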

In [15]:
# Path to the images
image_directory = '/content/coco_val2017/val2017'

# Full path to the annotations
annotation_file = '/content/coco_ann2017/annotations/instances_val2017.json'

# Creates an instance of the COCO class (imported from pycocotools.coco)
# it parses instances_val2017 so it doesn't have to be accessed manually through dictionaries and lists
coco = COCO(annotation_file)

# Get ids of all images in the dataset instantiated by COCO(annotation file)
image_ids = coco.getImgIds()

# Assume image_ids is a list of all image ids
# random.shuffle(image_ids) # for shuffled order

# Use only the first x images
image_ids_subsampled = image_ids[:100]

# passes the retrieved info into the COCODataset class defined above
dataset = COCODataset(coco, image_ids_subsampled, image_directory, transform=transform)

# uses the dataloader module from torch.utils.data to create a dataloader
"""
batch_size=64: 64 samples for each iteration
num_workers=4: Four parallel workers will load the data to increase speed
pin_memory=True: Performance optimization for data transfer to CUDA-enabled GPUs.(me with cloud gpu subscriptions)
prefetch_factor=2: 2 batches are preloaded in advance. Can improve performance by overlapping data loading with computation.
shuffle=True: The dataset will be shuffled at the start of each epoch, which is good practice for training
collate_fn=my_collate: Uses the custom collate function defined above to handle variable numbers of objects and annotations per image.
"""
dataloader = DataLoader(dataset, batch_size=64, num_workers=4, pin_memory=True, prefetch_factor=2, shuffle=True, collate_fn=my_collate)

loading annotations into memory...
Done (t=0.72s)
creating index...
index created!


# Quantization

## Static Post Training Quantization

In [16]:
import torch
from torch.ao.quantization.observer import MinMaxObserver

# backstory: in my research, I proposed that the optimal clipping range for weights is 3 standard deviations either side of the mean,
# after fitting the values to a Laplace distribution.
# For activations, on the other hand, the distribution is skewed, so using percentiles (0.5th and 99.5th) for the clipping range
# instead of fitting a curve seemed like the better move and gave better accuracy.

# an observer records statistics of the values that pass through it, following the rule it implements

# Define a class PercentileObserver that inherits from MinMaxObserver
class PercentileObserver(MinMaxObserver):
    # Constructor with default percentiles set to 0.5 and 99.5
    def __init__(self, min_percentile=0.5, max_percentile=99.5, **kwargs):
        super().__init__(**kwargs)  # Initialize the base class with any additional keyword arguments
        self.min_percentile = min_percentile  # Set the minimum percentile threshold
        self.max_percentile = max_percentile  # Set the maximum percentile threshold

    # Method called when a batch of data x is passed through the observer
    def forward(self, x):
        if x.numel() == 0:
            return x
        # Flatten a detached copy (reshape also handles non-contiguous tensors)
        flat = x.detach().reshape(-1).float()
        n = flat.numel()
        # kthvalue is 1-indexed, so clamp k to [1, n] to stay valid for small tensors
        k_min = min(max(int(n * self.min_percentile / 100), 1), n)
        k_max = min(max(int(n * self.max_percentile / 100), 1), n)
        # Values at the minimum and maximum percentiles
        min_val, _ = flat.kthvalue(k_min)
        max_val, _ = flat.kthvalue(k_max)
        # Keep the running minimum and maximum across batches
        self.min_val = torch.min(self.min_val, min_val)
        self.max_val = torch.max(self.max_val, max_val)
        # Return the unchanged input data
        return x

# Define a class LaplaceObserver that also inherits from MinMaxObserver
class LaplaceObserver(MinMaxObserver):
    # Constructor with a default number of standard deviations set to 3
    def __init__(self, num_stddev=3, **kwargs):
        super().__init__(**kwargs)  # Initialize the base class with any additional keyword arguments
        self.num_stddev = num_stddev  # Set the number of standard deviations for range estimation

    # Method called when a batch of data x is passed through the observer
    def forward(self, x):
        if x.numel() == 0:
            return x
        data = x.detach().float()
        mean = data.mean()  # Calculate the mean of the input data
        std = data.std()  # Calculate the standard deviation of the input data
        # Clipping range: mean +/- num_stddev * std, but never wider than the data itself
        lower = torch.max(mean - self.num_stddev * std, data.min())
        upper = torch.min(mean + self.num_stddev * std, data.max())
        # min_val starts at +inf and max_val at -inf, so the running range has to be
        # updated with min/max respectively (the other way round it would never change)
        self.min_val = torch.min(self.min_val, lower)
        self.max_val = torch.max(self.max_val, upper)
        # Return the unchanged input data
        return x
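
Whatever rule an observer follows, what it hands on is a clipping range `(min_val, max_val)`, which then gets turned into a scale and a zero point. Here is a pure-Python sketch of that step for an 8-bit affine grid, with a nearest-rank percentile similar in spirit to the `kthvalue` call above. `percentile_range`, `affine_qparams` and `quantize` are illustrative names, not PyTorch APIs, and PyTorch's own rounding details may differ.

```python
def percentile_range(values, min_percentile=0.5, max_percentile=99.5):
    """Nearest-rank percentiles of a list of floats."""
    ordered = sorted(values)
    n = len(ordered)

    def kth(percentile):
        # ranks are 1-indexed; clamp to [1, n] so tiny inputs stay valid
        k = min(max(int(n * percentile / 100), 1), n)
        return ordered[k - 1]

    return kth(min_percentile), kth(max_percentile)


def affine_qparams(min_val, max_val, quant_min=0, quant_max=255):
    """Scale and zero point for an affine (asymmetric) integer grid."""
    # widen the range to include 0.0 so that zero is exactly representable
    min_val, max_val = min(min_val, 0.0), max(max_val, 0.0)
    scale = (max_val - min_val) / (quant_max - quant_min)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range; any positive scale works
    zero_point = quant_min - round(min_val / scale)
    return scale, max(quant_min, min(quant_max, zero_point))


def quantize(x, scale, zero_point, quant_min=0, quant_max=255):
    """Map a float to its clipped integer code."""
    return max(quant_min, min(quant_max, round(x / scale) + zero_point))


# 999 well-behaved activations plus one large outlier
activations = [float(i) for i in range(999)] + [1e6]
lo, hi = percentile_range(activations)
scale, zero_point = affine_qparams(lo, hi)
print(lo, hi, scale, zero_point)
print(quantize(500.0, scale, zero_point), quantize(1e6, scale, zero_point))
```

The outlier is simply clipped to the top code, so it no longer stretches the scale for every other value, which is the point of percentile clipping.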

In [17]:
import torch
import torch.quantization

# Load pre-trained model
checkpoint = torch.load("/content/yolov8n.pt")

# Take the full model object from the checkpoint, cast it to float32 and switch to eval mode
model = checkpoint['model'].float().eval()

model.qconfig = torch.ao.quantization.get_default_qconfig('fbgemm')
# Update the qconfig for activation and weight to use your custom observers
model.qconfig = torch.ao.quantization.qconfig.QConfig(
    activation=PercentileObserver.with_args(min_percentile=0.5, max_percentile=99.5, dtype=torch.qint8),
    weight=LaplaceObserver.with_args(num_stddev=3, dtype=torch.qint8)
)

# Prepare the model for static quantization
model_static_quant = torch.quantization.prepare(model)

# Define a calibration function
def calibrate(model, loader):
    with torch.no_grad():  # No need to track gradients
        for images, _ in loader:
            model(images)

# i could've used less photos but 1.5 min is okay :)
# Calibrate the model using your data loader
calibrate(model_static_quant, dataloader)

# Convert the model to a quantized version
model_static_quant = torch.quantization.convert(model_static_quant)

print(model_static_quant)  # Print the quantized model

DetectionModel(
  (model): Sequential(
    (0): Conv(
      (conv): QuantizedConv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), scale=0.0044291215017437935, zero_point=5, padding=(1, 1), bias=False)
      (bn): QuantizedBatchNorm2d(16, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act): SiLU(inplace=True)
    )
    (1): Conv(
      (conv): QuantizedConv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), scale=0.0473511628806591, zero_point=5, padding=(1, 1), bias=False)
      (bn): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
      (act): SiLU(inplace=True)
    )
    (2): C2f(
      (cv1): Conv(
        (conv): QuantizedConv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), scale=0.07088109105825424, zero_point=25, bias=False)
        (bn): QuantizedBatchNorm2d(32, eps=0.001, momentum=0.1, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): QuantizedConv2d(48, 32, ker



In [18]:
dummy_input = tensor_list[0]
model_static_quant(dummy_input)

# so yeah this is the error I was talking about :(
# the SAME ONE i was seeing for a solid week


NotImplementedError: ignored

## Dynamic Post Training Quantization

PyTorch's dynamic quantization only supports these modules:
- nn.Linear
- nn.LSTM
- nn.GRU
- nn.LSTMCell
- nn.RNNCell
- nn.GRUCell

None of which are used in YOLOv8.



### PyTorch

Hence, the default dynamic quantization has nothing to quantize here.

In the documentation, the examples often fuse modules before quantization to increase efficiency even more, but SiLU is not supported for fusion. Fusing just Conv2d and BatchNorm2d while leaving SiLU on its own seemed odd, so I didn't bother fusing.
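
For reference, fusing a convolution with the batch norm that follows it amounts to folding the normalization into the convolution's weight and bias. Below is a scalar sketch of that arithmetic for a single channel, in plain Python with made-up numbers; `fold_batchnorm` and `conv_then_bn` are illustrative helpers, not PyTorch functions. The SiLU after the pair would still run as a separate op.

```python
import math


def fold_batchnorm(weight, bias, gamma, beta, running_mean, running_var, eps=1e-3):
    """Return (weight', bias') such that weight' * x + bias' equals
    BN(weight * x + bias) in eval mode, for a single channel."""
    factor = gamma / math.sqrt(running_var + eps)
    return weight * factor, (bias - running_mean) * factor + beta


def conv_then_bn(x, weight, bias, gamma, beta, running_mean, running_var, eps=1e-3):
    """Unfused reference: a 1x1 'convolution' followed by eval-mode batch norm."""
    y = weight * x + bias
    return gamma * (y - running_mean) / math.sqrt(running_var + eps) + beta


# made-up parameters for one channel
params = dict(weight=0.8, bias=0.0, gamma=1.5, beta=-0.2,
              running_mean=0.3, running_var=4.0)
folded_weight, folded_bias = fold_batchnorm(**params)

x = 2.0
unfused = conv_then_bn(x, **params)
fused = folded_weight * x + folded_bias
print(unfused, fused)
```

Both paths give the same number up to floating-point error, so folding only the Conv2d and BatchNorm2d pair is arithmetically safe even when the activation stays unfused.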

In [19]:
import torch
import torch.quantization

model = checkpoint['model'].float().eval()

# create a quantized model instance
model_int8 = torch.ao.quantization.quantize_dynamic(
    model,  # the original model (no layer set is passed, so the default set of supported layers is used)
    dtype=torch.qint8)  # the target dtype for quantized weights

In [20]:
print(model_int8)

# none of the modules are quantized
# same as original

DetectionModel(
  (model): Sequential(
    (0): Conv(
      (conv): Conv2d(3, 16, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(16, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): SiLU(inplace=True)
    )
    (1): Conv(
      (conv): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn): BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
      (act): SiLU(inplace=True)
    )
    (2): C2f(
      (cv1): Conv(
        (conv): Conv2d(32, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
      (cv2): Conv(
        (conv): Conv2d(48, 32, kernel_size=(1, 1), stride=(1, 1), bias=False)
        (bn): BatchNorm2d(32, eps=0.001, momentum=0.03, affine=True, track_running_stats=True)
        (act): SiLU(inplace=True)
      )
    

### ONNX Export


In [None]:
!pip install onnx

In [23]:
import torch
import torch.onnx

# Check if CUDA is available
# FP16 export is only possible if CUDA-capable GPU is being used

if torch.cuda.is_available():
    device = torch.device('cuda:0')  # Set the device to GPU (device index 0)

    # Load your model
    model = torch.load('yolov8n.pt')['model'].float().eval()  # Adjust as necessary for your model loading
    model.to(device)  # Move your model to the GPU

    # convert the model to half precision, which is float16 (by default it's float32)
    model.half()

    # Create the tensor on the GPU and change it to half precision
    # (so it can be passed to the FP16 model on the GPU)
    input_tensor = torch.randn(1, 3, 640, 640, device=device).half()

    # Export the model to ONNX
    torch.onnx.export(model,
                      input_tensor,
                      'yolov8n.onnx',
                      export_params=True,
                      opset_version=17,  # The ONNX opset version to export the model with
                      do_constant_folding=True,
                      input_names=['input'],
                      output_names=['output'],
                      dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}})
else:
    print("CUDA is not available. GPU export cannot be performed.")

In [24]:
import onnx

# Load the ONNX model
model = onnx.load("/content/yolov8n.onnx")

# Print a human-readable representation of the model
print(onnx.helper.printable_graph(model.graph))

# Quantized, FLOAT16 is used across the board

graph main_graph (
  %input[FLOAT16, batch_sizex3x640x640]
) initializers (
  %model.22.cv2.0.2.weight[FLOAT16, 64x64x1x1]
  %model.22.cv2.0.2.bias[FLOAT16, 64]
  %model.22.cv2.1.2.weight[FLOAT16, 64x64x1x1]
  %model.22.cv2.1.2.bias[FLOAT16, 64]
  %model.22.cv2.2.2.weight[FLOAT16, 64x64x1x1]
  %model.22.cv2.2.2.bias[FLOAT16, 64]
  %model.22.cv3.0.2.weight[FLOAT16, 80x80x1x1]
  %model.22.cv3.0.2.bias[FLOAT16, 80]
  %model.22.cv3.1.2.weight[FLOAT16, 80x80x1x1]
  %model.22.cv3.1.2.bias[FLOAT16, 80]
  %model.22.cv3.2.2.weight[FLOAT16, 80x80x1x1]
  %model.22.cv3.2.2.bias[FLOAT16, 80]
  %model.22.dfl.conv.weight[FLOAT16, 1x16x1x1]
  %onnx::Conv_986[FLOAT16, 16x3x3x3]
  %onnx::Conv_987[FLOAT16, 16]
  %onnx::Conv_989[FLOAT16, 32x16x3x3]
  %onnx::Conv_990[FLOAT16, 32]
  %onnx::Conv_992[FLOAT16, 32x32x1x1]
  %onnx::Conv_993[FLOAT16, 32]
  %onnx::Conv_995[FLOAT16, 16x16x3x3]
  ......
  %onnx::Conv_1154[FLOAT16, 80x80x3x3]
  %onnx::Conv_1155[FLOAT16, 80]
) {
  %/model.0/conv/Conv_output_0 = Conv[d

I also did it with TensorFlow, but it requires a lower version of Python. I don't want to mess with different dependencies and environments in a single Colab file, so I will not include it here.