## LAPQ
This notebook demonstrates an implementation of the paper [Loss Aware Post-training Quantization](https://arxiv.org/abs/1911.07190).

### Steps to quantize the pretrained model
- Load the dataset and create dataloader. A subset of training data is used for calibration.
- Load the pretrained full precision model.
- Load the configurations from the YAML file.
- Create a `LAPQ` object, passing it the full precision model, the dataloaders, and the configuration.
- Quantize the model by calling the `compress_model` method.

In [1]:
import sys
sys.path.append("./trail/trailmet/")

In [3]:
import yaml
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from trailmet.datasets.classification import DatasetFactory
from trailmet.models import resnet, mobilenet
from trailmet.algorithms import quantize

## Datasets

### Augmentations

In [4]:
stats = ((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4, padding_mode='reflect'),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(*stats, inplace=True)
])
val_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])

input_transforms = {
    'train': train_transform, 
    'val': val_transform, 
    'test': test_transform}

target_transforms = {
    'train': None, 
    'val': None, 
    'test': None}

### Load Datasets

In [5]:
cifar100_dataset = DatasetFactory.create_dataset(
        name = 'CIFAR100', 
        root = './data',
        split_types = ['train', 'val', 'test'],
        val_fraction = 0.2,
        transform = input_transforms,
        target_transform = target_transforms)

# getting the size of the different splits
print('Train samples: ',cifar100_dataset['info']['train_size'])
print('Val samples: ',cifar100_dataset['info']['val_size'])
print('Test samples: ',cifar100_dataset['info']['test_size'] )

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Train samples:  40000
Val samples:  10000
Test samples:  10000


### Define Dataloaders

In [6]:
train_loader = DataLoader(
        cifar100_dataset['train'], batch_size=128, 
        sampler=cifar100_dataset['train_sampler'],
        num_workers=0)
val_loader = DataLoader(
        cifar100_dataset['val'], batch_size=128, 
        sampler=cifar100_dataset['val_sampler'],
        num_workers=0)
test_loader = DataLoader(
        cifar100_dataset['test'], batch_size=128, 
        sampler=cifar100_dataset['test_sampler'],
        num_workers=0)

dataloaders = {"train": train_loader, "val": val_loader, "test": test_loader}

print('No. of training batches: ', len(dataloaders['train']))
print('No. of validation batches: ', len(dataloaders['val']))
print('No. of test batches: ', len(dataloaders['test']))

No. of training batches:  313
No. of validation batches:  79
No. of test batches:  79
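
The split sizes and batch counts printed above can be checked by hand. The sketch below assumes the validation set is a plain 20% slice of the 50,000 CIFAR-100 training images (which matches the printed sizes) and that the loaders keep the last partial batch, as `DataLoader` does by default. The helper `num_batches` is a name introduced here for illustration.

```python
import math

# CIFAR-100 ships 50,000 training and 10,000 test images;
# val_fraction and batch_size are the values used in the cells above.
total_train, total_test = 50_000, 10_000
val_fraction, batch_size = 0.2, 128

val_size = round(total_train * val_fraction)  # 20% held out for validation
train_size = total_train - val_size           # remainder used for training


def num_batches(n_samples: int, batch_size: int) -> int:
    """Batches yielded when the last partial batch is kept (drop_last=False)."""
    return math.ceil(n_samples / batch_size)


print("samples:", train_size, val_size, total_test)
print("batches:",
      num_batches(train_size, batch_size),
      num_batches(val_size, batch_size),
      num_batches(total_test, batch_size))
```

The training count is not a multiple of 128 (40000 / 128 = 312.5), so the final training batch holds only 64 images.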


### Load Pretrained Model

In [7]:
# Arguments are presumably the number of classes (100) and the input size (32).
r50_model = resnet.make_resnet50(100, 32)
checkpoint = torch.load("./weights/resnet50_cifar100_pretrained.pth", map_location='cuda:0')
r50_model.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>

### Load Method Config

In [8]:
with open('./lapq_config.yaml', 'r') as f:
    config = yaml.safe_load(f)
    kwargs = config['GENERAL']
    
kwargs

{'ARCH': 'ResNet50',
 'DATASET': 'CIFAR100',
 'SAVE_PATH': './scales/',
 'GPU_ID': 0,
 'SEED': 42,
 'W_BITS': 4,
 'A_BITS': 8,
 'ACT_QUANT': True,
 'CALIB_BATCHES': 4,
 'MAX_ITER': 1000,
 'MAX_FEV': 1000,
 'VERBOSE': True}
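
To make the bit-width settings concrete, the following sketch works out what `W_BITS`, `A_BITS`, and `CALIB_BATCHES` imply. It rests on two assumptions not stated in the notebook: the calibration batches use the same batch size of 128 as the dataloaders, and the pretrained checkpoint stores weights as 32-bit floats.

```python
# Values copied from the GENERAL section printed above.
w_bits, a_bits, calib_batches = 4, 8, 4
batch_size = 128  # from the DataLoader cell; assumed to apply to calibration too

weight_levels = 2 ** w_bits       # distinct values a quantized weight can take
activation_levels = 2 ** a_bits   # distinct values a quantized activation can take
calib_samples = calib_batches * batch_size  # images seen during calibration

# Weight storage relative to 32-bit floats (assumed); ignores per-layer scales.
compression = 32 / w_bits

print(f"weight levels:       {weight_levels}")
print(f"activation levels:   {activation_levels}")
print(f"calibration samples: {calib_samples}")
print(f"weight compression:  {compression:.0f}x")
```

Under these assumptions, calibration touches only 512 of the 40,000 training images.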

### Quantization Method: LAPQ

In [9]:
quantizer = quantize.lapq.LAPQ(r50_model, dataloaders, **kwargs)

print('testing pretrained model before quantization')
acc1, acc5 = quantizer.test(model=r50_model, dataloader=dataloaders['test'], device=torch.device('cuda:0'))
print(f'top-1 acc: {acc1:.2f}%, top-5 acc: {acc5:.2f}%')

qmodel = quantizer.compress_model()

==> Using seed: 42 and device: cuda:0
testing pretrained model before quantization


100%|██████████████████████████████████████████████| 79/79 [00:11<00:00,  6.73it/s, acc1=72.5, acc5=91.5]


top-1 acc: 72.52%, top-5 acc: 91.53%


100%|██████████████████████████████████████████████| 79/79 [00:11<00:00,  7.02it/s, acc1=13.7, acc5=33.7]


==> Quantization (W4A8) accuracy before LAPQ: 13.7164 | 33.7322


100%|███████████████████████████████████████████████| 10/10 [00:16<00:00,  1.62s/it, loss=0.971, p_val=4]


==> using p intr : 3.19


100%|███████████████████████████████████| 79/79 [00:10<00:00,  7.32it/s, acc1=65.8, acc5=86.7, loss=1.45]


==> Quantization (W4A8) accuracy before Optimization: 65.8327 | 86.6990
==> Loss after LpNormQuantization: 1.4541
==> Starting Powell Optimization


1460it [06:26,  3.78it/s, curr_loss=0.28, min_loss=0.28]                                                 



==> Loss at end of iter [0] : 0.2799

==> Layer-wise Scales :
 [0.50955792 0.18828929 0.31613244 0.81631171 0.42218429 0.366656
 0.55021008 0.38726476 1.41155748 0.10301812 0.24723457 0.85611573
 2.48589166 0.24058036 0.1148315  0.40290593 0.47664084 0.22061472
 0.07745003 0.0908086  0.34169396 0.44500282 0.10184874 0.1226325
 0.45055139 2.31353931 0.12635786 0.12439413 0.37995522 0.42099514
 0.2080612  0.1430494  0.16357016 1.0591869  0.1679621  0.04993571
 0.08009648 0.37081608 0.32296252 0.06324497 0.171397   0.24212186
 1.04833014 0.07766192 0.09171855 0.34700797 0.90700426 0.09967942
 0.10883904 0.47051399 2.91956117 0.08315149 0.12278894 0.42552245
 1.72949481 0.1811553  0.10039172 0.49262293 0.91781381 0.04820398
 0.02332346 1.01650378 1.05462085 1.97667816 0.01142614 0.19098938]


100%|██████████████████████████████████████████████| 79/79 [00:07<00:00, 11.01it/s, acc1=68.4, acc5=88.8]

==> Full quantization (W4A8) accuracy: (68.40387658227849, 88.76582278481013)
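
The `LpNormQuantization` stage and the reported `p` value in the log suggest that each layer's step size is first chosen by minimizing an Lp norm of the quantization error, before the Powell stage refines all scales jointly against the task loss. The pure-Python sketch below illustrates only that first idea on a toy tensor. It is not trailmet's implementation: the function names, the grid search, and the Gaussian stand-in weights are all assumptions made for illustration.

```python
import random


def quantize(x, step, n_bits):
    """Symmetric uniform quantizer: round to a multiple of `step`, then clip."""
    q_max = 2 ** (n_bits - 1) - 1
    q_min = -(2 ** (n_bits - 1))
    q = max(q_min, min(q_max, round(x / step)))
    return q * step


def lp_error(values, step, n_bits, p):
    """Mean of |x - Q(x)|^p over all values."""
    return sum(abs(v - quantize(v, step, n_bits)) ** p for v in values) / len(values)


def best_step(values, n_bits, p, n_grid=200):
    """Grid-search the step size that minimizes the Lp error (hypothetical helper)."""
    q_max = 2 ** (n_bits - 1) - 1
    full_range_step = max(abs(v) for v in values) / q_max
    # Candidates shrink the full-range step; the last one equals it exactly.
    candidates = [full_range_step * ((i + 1) / n_grid) for i in range(n_grid)]
    return min(candidates, key=lambda s: lp_error(values, s, n_bits, p))


random.seed(42)
weights = [random.gauss(0.0, 1.0) for _ in range(2000)]  # toy stand-in for a layer

n_bits, p = 4, 3.19  # W_BITS from the config, p from the log above
naive = max(abs(w) for w in weights) / (2 ** (n_bits - 1) - 1)  # cover full range
tuned = best_step(weights, n_bits, p)

print(f"naive step {naive:.4f}, Lp error {lp_error(weights, naive, n_bits, p):.5f}")
print(f"tuned step {tuned:.4f}, Lp error {lp_error(weights, tuned, n_bits, p):.5f}")
```

A smaller step clips the largest weights but represents the many small ones more finely, and the exponent `p` controls how heavily large errors are penalized. Because the per-layer choices interact through the network, a per-layer criterion like this one serves only as a starting point, which is consistent with the accuracy rising further after the joint optimization in the log.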





In [10]:
print('testing quantized model')
acc1, acc5 = quantizer.test(model=qmodel, dataloader=dataloaders['test'], device=torch.device('cuda:0'))
print(f'top-1 acc: {acc1:.2f}%, top-5 acc: {acc5:.2f}%')

testing quantized model


100%|██████████████████████████████████████████████| 79/79 [00:07<00:00, 10.12it/s, acc1=68.4, acc5=88.8]

top-1 acc: 68.40%, top-5 acc: 88.77%



