## LAPQ
This notebook demonstrates the implementation of the paper [Loss Aware Post-training Quantization](https://arxiv.org/abs/1911.07190).

### Steps to quantize the pretrained model
- Load the dataset and create the dataloaders. A subset of the training data is used for calibration.
- Load the pretrained full precision model.
- Load the configurations from the YAML file.
- Create a `LAPQ` object and pass the full precision model, dataloaders and configurations.
- Quantize the model by calling the `compress_model` method.

In [None]:
USE_COLAB = True

if USE_COLAB:
  from google.colab import drive
  drive.mount("/content/drive")
  base_path = "/content/drive/MyDrive/trail"
else:
  base_path = "../../../.."

library_path = base_path + "/trailmet"
requirements_path = library_path + "/requirements.txt"
config_path = library_path + "/experiments/quantization/LAPQ/lapq_config.yaml"
weights_path = base_path + "/weights/resnet50_cifar100_pretrained.pth"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%pip install -q -r $requirements_path

In [None]:
import sys
sys.path.append(library_path)

In [None]:
import yaml
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from trailmet.datasets.classification import DatasetFactory
from trailmet.models import resnet, mobilenet
from trailmet.algorithms import quantize

## Datasets

### Augmentations

In [None]:
stats = ((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4, padding_mode='reflect'),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(*stats, inplace=True)
])
val_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])

input_transforms = {
    'train': train_transform,
    'val': val_transform,
    'test': test_transform}

target_transforms = {
    'train': None,
    'val': None,
    'test': None}
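As a worked example of what `transforms.Normalize(*stats)` does to each channel, the sketch below applies `(value - mean) / std` to a single RGB pixel in plain Python. The helper names `normalize_pixel` and `denormalize_pixel` are hypothetical and exist only for this illustration; the notebook itself relies on torchvision.

```python
# Per-channel statistics copied from the notebook: (means, stds) for R, G, B.
stats = ((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))


def normalize_pixel(pixel, means, stds):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(pixel, means, stds))


def denormalize_pixel(pixel, means, stds):
    """Invert the normalization: value * std + mean per channel."""
    return tuple(v * s + m for v, m, s in zip(pixel, means, stds))


mid_gray = (0.5, 0.5, 0.5)  # a pixel after ToTensor(), which scales to [0, 1]
normalized = normalize_pixel(mid_gray, *stats)
restored = denormalize_pixel(normalized, *stats)

print("normalized:", normalized)
print("restored:  ", restored)
```

A pixel equal to the channel means maps to zero in every channel, which is a quick sanity check when the statistics are changed for another dataset.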

### Load Datasets

In [None]:
cifar100_dataset = DatasetFactory.create_dataset(
        name = 'CIFAR100',
        root = './data',
        split_types = ['train', 'val', 'test'],
        val_fraction = 0.2,
        transform = input_transforms,
        target_transform = target_transforms)

# Get the size of each split
print('Train samples: ', cifar100_dataset['info']['train_size'])
print('Val samples: ', cifar100_dataset['info']['val_size'])
print('Test samples: ', cifar100_dataset['info']['test_size'])

Downloading https://www.cs.toronto.edu/~kriz/cifar-100-python.tar.gz to ./data/cifar-100-python.tar.gz


100%|██████████| 169001437/169001437 [00:12<00:00, 13715362.31it/s]


Extracting ./data/cifar-100-python.tar.gz to ./data
Files already downloaded and verified
Files already downloaded and verified
Train samples:  40000
Val samples:  10000
Test samples:  10000


### Define Dataloaders

In [None]:
train_loader = DataLoader(
        cifar100_dataset['train'], batch_size=128,
        sampler=cifar100_dataset['train_sampler'],
        num_workers=0)
val_loader = DataLoader(
        cifar100_dataset['val'], batch_size=128,
        sampler=cifar100_dataset['val_sampler'],
        num_workers=0)
test_loader = DataLoader(
        cifar100_dataset['test'], batch_size=128,
        sampler=cifar100_dataset['test_sampler'],
        num_workers=0)

dataloaders = {"train": train_loader, "val": val_loader, "test": test_loader}

print('No. of training batches: ', len(dataloaders['train']))
print('No. of validation batches: ', len(dataloaders['val']))
print('No. of test batches: ', len(dataloaders['test']))

No. of training batches:  313
No. of validation batches:  79
No. of test batches:  79
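The batch counts above follow from the split sizes and the batch size of 128. The short check below reproduces them, assuming each sampler yields every sample of its split once and that `drop_last` keeps its default value of `False`. `num_batches` is a hypothetical helper written for this illustration.

```python
import math


def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches produced when iterating over num_samples items once."""
    if drop_last:
        # The final, smaller batch is discarded.
        return num_samples // batch_size
    # The final, smaller batch is kept.
    return math.ceil(num_samples / batch_size)


# Split sizes as printed by the notebook (val_fraction = 0.2 of 50000 training images).
split_sizes = {"train": 40000, "val": 10000, "test": 10000}

for split, size in split_sizes.items():
    print(split, num_batches(size, batch_size=128))
# prints: train 313, val 79, test 79 (one per line)
```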


### Load Pretrained Model

In [None]:
model = resnet.make_resnet50(100,32)
checkpoint = torch.load(weights_path, map_location='cuda:0')
model.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>

### Load Method Config

In [None]:
with open(config_path, 'r') as f:
    config = yaml.safe_load(f)
    kwargs = config['GENERAL']

# Override selected settings from the YAML file for this run
kwargs['MAX_ITER'] = 2000
kwargs['MAX_FEV'] = 2000
kwargs['A_BITS'] = 7
kwargs['W_BITS'] = 7

kwargs

{'GPU_ID': 0,
 'SEED': 42,
 'W_BITS': 7,
 'A_BITS': 7,
 'ACT_QUANT': True,
 'CALIB_BATCHES': 4,
 'MAX_ITER': 2000,
 'MAX_FEV': 2000,
 'VERBOSE': True}

### Quantization Method: LAPQ

In [None]:
quantizer = quantize.lapq.LAPQ(model, dataloaders, **kwargs)
qmodel = quantizer.compress_model()

==> Using seed: 42 and device: cuda:0
testing pretrained model before quantization


100%|██████████| 79/79 [00:10<00:00,  7.57it/s, acc1=72.5, acc5=91.5]


top-1 acc: 72.52%, top-5 acc: 91.52%


100%|██████████| 79/79 [00:08<00:00,  8.84it/s, acc1=69.5, acc5=90]


==> Quantization (W7A7) accuracy before LAPQ: 69.5115 | 90.0316


100%|██████████| 10/10 [00:20<00:00,  2.00s/it, loss=0.176, p_val=4]


==> using p val : 3.66  with lp-loss : 0.18


100%|██████████| 79/79 [00:10<00:00,  7.89it/s, acc1=72.3, acc5=91.3]


==> Quantization (W7A7) accuracy before Optimization: 72.2508 | 91.3370
==> Starting Powell Optimization


100%|██████████| 2000/2000 [09:21<00:00,  3.56it/s, curr_loss=0.176, min_loss=0.152]


==> Layer-wise Scales :
 [ 0.33818299  0.62134768  0.82564299  1.02090729  0.95200617  0.36689557
  0.56235786  0.83859347  0.14576932  0.61901607  1.22165686  0.47066361
  0.26149886  0.74342126  0.45708764  0.14837253  0.15520184  1.14811294
  0.17855284  0.39177551  1.10277545  0.2693316   0.21350731  0.71349451
  0.36680011  0.12632736  0.51384439  0.2863148   0.12838054  0.18741796
  0.85134318  0.10731246  0.16548798  0.66784387  0.09932491  0.16292682
  0.85598874  0.12488759  0.20705417  0.68615477  0.18071352  0.27441487
  0.54972186  0.30531706  0.2248336   1.19240661  0.4168146   0.04306774
  0.33290503  0.80163794  0.01937079  0.19003982  0.72733849  0.34951356
  1.99468306  1.23356115  0.83784936  0.51849612  0.43591889  0.5504359
  0.84351204  0.81898661  0.38176331  0.57372303  0.48332823  0.81326441
  0.30857745  0.38395615  0.6832149   0.65068425  0.63195158  0.52181466
  0.65721197  0.64087668  0.65638089  0.24518578  0.48958696  0.70771214
  0.7810806   0.36986201  0

100%|██████████| 79/79 [00:08<00:00,  9.15it/s, acc1=72, acc5=91.2]


==> Full quantization (W7A7) accuracy: (72.03322784810126, 91.18868670886076)
swapped <class 'torch.ao.nn.quantized.modules.functional_modules.FloatFunctional'>: <class 'torch.ao.nn.quantized.modules.functional_modules.QFunctional'>
swapped <class 'torch.ao.nn.intrinsic.modules.fused.ConvReLU2d'>: <class 'torch.ao.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>
swapped <class 'torch.ao.nn.intrinsic.modules.fused.ConvReLU2d'>: <class 'torch.ao.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>
swapped <class 'torch.nn.modules.conv.Conv2d'>: <class 'torch.ao.nn.quantized.modules.conv.Conv2d'>
swapped <class 'torch.nn.modules.conv.Conv2d'>: <class 'torch.ao.nn.quantized.modules.conv.Conv2d'>
swapped <class 'torch.ao.nn.quantized.modules.functional_modules.FloatFunctional'>: <class 'torch.ao.nn.quantized.modules.functional_modules.QFunctional'>
swapped <class 'torch.ao.nn.intrinsic.modules.fused.ConvReLU2d'>: <class 'torch.ao.nn.intrinsic.quantized.modules.conv_relu.ConvReLU2d'>
s
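The log above reports a `p val` and an `lp-loss` before the Powell optimization starts. In the LAPQ paper, the initial clipping values are chosen per layer by minimizing an Lp norm of the quantization error. The sketch below illustrates that idea on synthetic Gaussian values with a simple grid search. It is not trailmet's implementation: the symmetric signed quantizer, the grid search, and the helper names `quantize`, `lp_error`, and `best_clip` are all assumptions made for this example.

```python
import random


def quantize(values, clip, bits):
    """Symmetric uniform quantization (assumed scheme): clamp, then round to a grid."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 63 for 7 bits
    scale = clip / qmax         # width of one quantization step
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))  # clip to the representable integer range
        out.append(q * scale)         # map back to the real-valued grid
    return out


def lp_error(values, clip, bits, p):
    """Lp norm of the difference between the values and their quantized version."""
    quantized = quantize(values, clip, bits)
    return sum(abs(a - b) ** p for a, b in zip(values, quantized)) ** (1 / p)


def best_clip(values, bits, p, steps=100):
    """Grid-search the clipping value that minimizes the Lp quantization error."""
    max_abs = max(abs(v) for v in values)
    candidates = [max_abs * (i + 1) / steps for i in range(steps)]
    return min(candidates, key=lambda c: lp_error(values, c, bits, p))


random.seed(42)
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]  # stand-in for one layer

# 4 bits makes the clipping trade-off easy to see; the notebook itself uses 7 bits.
for p in (2.0, 3.66):
    clip = best_clip(weights, bits=4, p=p)
    print(f"p={p}: clip={clip:.3f}, lp_error={lp_error(weights, clip, 4, p):.3f}")
```

A small clipping value reduces rounding error for the bulk of the values but distorts the outliers, while a large one does the opposite. The choice of `p` controls how strongly the outlier error is weighted in that trade-off.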

In [None]:
print('testing quantized model')
qmodel.to(torch.device('cpu'))
acc1, acc5 = quantizer.test(model=qmodel, dataloader=dataloaders['test'], device=torch.device('cpu'))

testing quantized model


100%|██████████| 79/79 [01:15<00:00,  1.04it/s, acc1=72.3, acc5=91.1]


In [None]:
print('testing full precision model')
model.to(torch.device('cpu'))
acc1, acc5 = quantizer.test(model=model, dataloader=dataloaders['test'], device=torch.device('cpu'))

testing full precision model


100%|██████████| 79/79 [02:54<00:00,  2.21s/it, acc1=72.5, acc5=91.5]


In [None]:
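import os

def print_model_size(model):
    """Print the on-disk size of the model's state dict in megabytes."""
    # Serialize to a temporary file, measure it, then clean up.
    torch.save(model.state_dict(), "temp.p")
    print(f'Size: {os.path.getsize("temp.p")/1e6:.2f} MB')
    os.remove("temp.p")

print_model_size(qmodel)  # quantized model
print_model_size(model)   # full precision model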

Size: 23.84 MB
Size: 95.12 MB
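The two sizes above can be turned into a compression ratio with a line of arithmetic. A ratio close to 4 is what one would expect if 32-bit floating point weights are stored in 8-bit integer containers. That storage format is an assumption here, based on the `torch.ao` quantized modules listed in the conversion log. Under it, the 7-bit setting would not shrink the file below the 8-bit size.

```python
# Sizes in MB as printed by print_model_size above.
quantized_mb = 23.84
full_precision_mb = 95.12

ratio = full_precision_mb / quantized_mb
saved_fraction = 1 - quantized_mb / full_precision_mb

print(f"Compression ratio: {ratio:.2f}x")
print(f"Size reduction: {saved_fraction:.1%}")

# Ideal ratio if every parameter shrank from 32 bits to 8 bits (assumed storage).
ideal_ratio = 32 / 8
print(f"Ideal float32 to int8 ratio: {ideal_ratio:.0f}x")
```

The measured ratio falls slightly short of the ideal, which would be consistent with some tensors and metadata (for example, scales and zero points) remaining in floating point.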


In [None]:
torch.save(qmodel.state_dict(), "quantized_res50_c100.pth")