## LAPQ
This notebook demonstrates the implementation of the paper [Loss Aware Post-training Quantization](https://arxiv.org/abs/1911.07190).

### Steps to quantize the pretrained model
- Load the dataset and create dataloader. A subset of training data is used for calibration.
- Load the pretrained full precision model.
- Load the configuration, either from the YAML file or from an inline dictionary.
- Create a `LAPQ` object from the architecture name, the dataloaders and the configuration.
- Quantize the model by passing the full precision model to the `compress_model` method.

In [None]:
USE_COLAB = True

if USE_COLAB:
  from google.colab import drive
  drive.mount("/content/drive")
  base_path = "/content/drive/MyDrive/trail"
else:
  base_path = "../../../.."

library_path = base_path + "/trailmet"
requirements_path = library_path + "/requirements.txt"
config_path = library_path + "/experiments/quantization/LAPQ/lapq_config.yaml"
weights_path = base_path + "/weights/resnet50_cifar100_pretrained.pth"

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
%pip install -q -r $requirements_path

In [None]:
import sys
sys.path.append(library_path)

In [None]:
import yaml
import torch
from torch.utils.data import DataLoader
from torchvision import transforms
from trailmet.datasets.classification import DatasetFactory
from trailmet.models import resnet, mobilenet
from trailmet.algorithms import quantize

## Datasets

### Augmentations

In [None]:
stats = ((0.5071, 0.4867, 0.4408), (0.2675, 0.2565, 0.2761))

train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4, padding_mode='reflect'),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(*stats, inplace=True)
])
val_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])
test_transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(*stats)
])

input_transforms = {
    'train': train_transform,
    'val': val_transform,
    'test': test_transform}

target_transforms = {
    'train': None,
    'val': None,
    'test': None}

### Load Datasets

In [None]:
cifar100_dataset = DatasetFactory.create_dataset(
        name = 'CIFAR100',
        root = './data',
        split_types = ['train', 'val', 'test'],
        val_fraction = 0.2,
        transform = input_transforms,
        target_transform = target_transforms)

# Print the size of each split
print('Train samples: ', cifar100_dataset['info']['train_size'])
print('Val samples: ', cifar100_dataset['info']['val_size'])
print('Test samples: ', cifar100_dataset['info']['test_size'])

Files already downloaded and verified
Files already downloaded and verified
Files already downloaded and verified
Train samples:  40000
Val samples:  10000
Test samples:  10000


### Define Dataloaders

In [None]:
train_loader = DataLoader(
        cifar100_dataset['train'], batch_size=128,
        sampler=cifar100_dataset['train_sampler'],
        num_workers=0)
val_loader = DataLoader(
        cifar100_dataset['val'], batch_size=128,
        sampler=cifar100_dataset['val_sampler'],
        num_workers=0)
test_loader = DataLoader(
        cifar100_dataset['test'], batch_size=128,
        sampler=cifar100_dataset['test_sampler'],
        num_workers=0)

dataloaders = {"train": train_loader, "val": val_loader, "test": test_loader}

print('No. of training batches: ', len(dataloaders['train']))
print('No. of validation batches: ', len(dataloaders['val']))
print('No. of test batches: ', len(dataloaders['test']))

No. of training batches:  313
No. of validation batches:  79
No. of test batches:  79
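
The split sizes and batch counts printed above can be checked by hand. The sketch below assumes that `val_fraction = 0.2` is taken from the 50,000 CIFAR-100 training images and that the dataloaders keep the last partial batch (the `DataLoader` default, `drop_last=False`). How `DatasetFactory` splits the data internally is not shown in this notebook, so treat this as a consistency check rather than a description of its implementation.

```python
import math

total_train_images = 50_000   # size of the official CIFAR-100 training set
test_size = 10_000            # size of the official CIFAR-100 test set
val_fraction = 0.2            # same value passed to create_dataset above
batch_size = 128              # same value passed to each DataLoader above

# Assumed split rule: the validation set is carved out of the training set
val_size = round(total_train_images * val_fraction)
train_size = total_train_images - val_size

def num_batches(num_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

print('Train samples:', train_size, '| batches:', num_batches(train_size, batch_size))
print('Val samples:', val_size, '| batches:', num_batches(val_size, batch_size))
print('Test samples:', test_size, '| batches:', num_batches(test_size, batch_size))
```

These values agree with the 40000/10000/10000 samples and 313/79/79 batches reported by the cells above.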


### Load Pretrained Model

In [None]:
model = resnet.make_resnet50(100,32)
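# Fall back to the CPU when no GPU is available
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load(weights_path, map_location=device)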
model.load_state_dict(checkpoint['state_dict'])

<All keys matched successfully>

### Load Method Config

In [None]:
# To load the configuration from the YAML file instead, uncomment:
# with open(config_path, 'r') as f:
#     config_all = yaml.safe_load(f)
#     config = config_all['GENERAL']

config = {
    'w_bits' : 8,
    'a_bits' : 8,
    'reduce_range': True,
    'act_quant': True,
    'max_iter': 2000,
    'max_fev': 2000,
    'calib_bs': 256,
    'calib_size': 1024,
    'seed': 42,
    'gpu_id': 0
}
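
To give a feel for what a bit width such as `w_bits` or `a_bits` means, here is a minimal sketch of symmetric uniform quantization with a clipping value. It is an illustration in plain Python, not trailmet's implementation: the function `quantize_uniform` and the parameter `alpha` are hypothetical names introduced here, and the exact scheme used by `LAPQ` (including the effect of `reduce_range`) may differ.

```python
def quantize_uniform(values, num_bits, alpha):
    """Fake-quantize values onto a symmetric uniform grid clipped to [-alpha, alpha].

    Hypothetical helper for illustration only; not part of trailmet.
    """
    levels = 2 ** (num_bits - 1) - 1   # e.g. 127 positive levels for 8 bits
    step = alpha / levels              # spacing between neighbouring grid points
    quantized = []
    for v in values:
        clipped = max(-alpha, min(alpha, v))          # saturate values outside the range
        quantized.append(round(clipped / step) * step)  # snap to the nearest grid point
    return quantized

weights = [-1.30, -0.42, 0.0, 0.07, 0.55, 2.10]
q8 = quantize_uniform(weights, num_bits=8, alpha=1.0)
q4 = quantize_uniform(weights, num_bits=4, alpha=1.0)

for w, a, b in zip(weights, q8, q4):
    print(f'{w:+.4f} -> 8 bits: {a:+.4f} | 4 bits: {b:+.4f}')
```

A smaller `alpha` gives a finer grid for the values it keeps but clips more outliers, which is why the choice of clipping value affects accuracy.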

### Quantization Method: LAPQ

In [None]:
quantizer = quantize.lapq.LAPQ('resnet', dataloaders, **config)
qmodel = quantizer.compress_model(model)

100%|██████████| 79/79 [00:15<00:00,  4.99it/s, acc1=72.5, acc5=91.5]


==> Full Precision Model: acc@1 72.518 | acc@5 91.525


100%|██████████| 79/79 [00:08<00:00,  9.10it/s, acc1=71.8, acc5=91.1]


==> Quantization accuracy before LAPQ: acc@1 71.756 | acc@5 91.149


100%|██████████| 20/20 [01:00<00:00,  3.02s/it, loss=0.0998, p_val=3.9]


==> using p-val : 3.481  with lp-loss : 0.093


100%|██████████| 79/79 [00:11<00:00,  6.91it/s, acc1=72.2, acc5=91.5]


==> Quantization accuracy before optimization: acc@1 72.241 | acc@5 91.535
==> Starting Powell Optimization


100%|██████████| 2000/2000 [04:43<00:00,  7.06it/s, curr_loss=0.092, min_loss=0.092]


==> Optimization completed with status: False
==> Optimized alphas :
 [ 0.32192663  0.4921856   0.44087177  0.9391082   0.95024213  0.35209557
  0.56010642  0.8362113   0.14528238  0.6173208   1.22021341  0.46791642
  0.25910885  0.74059974  0.45257379  0.14750679  0.15402703  1.13943211
  0.17752448  0.38750959  1.09536373  0.26694447  0.21156799  0.70860321
  0.36488586  0.12406421  0.50658086  0.28335763  0.12680141  0.18381498
  0.83998023  0.10583398  0.16331125  0.65832013  0.09808584  0.16091051
  0.84497283  0.12403724  0.20360942  0.679278    0.17891735  0.26773851
  0.54199507  1.34550151  0.21835104  1.16213213  0.40100812  0.04182439
  0.32693615  0.75804108  0.01931836  0.23854705  0.77586533  0.35659174
  1.99467082  1.18234275  0.75984839  0.50286948  0.43017     0.54280295
  0.80808552  0.7176247   0.37142049  0.5562059   0.46602321  0.71515874
  0.30534324  0.37335814  0.67148537  0.62658289  0.61451279  0.52567379
  0.63642817  0.61713725  0.63287643  0.24550927  0.49

100%|██████████| 79/79 [00:08<00:00,  8.95it/s, acc1=72, acc5=91.2]


==> Final LAPQ quantization accuracy: 72.004 | 91.228
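
The log above reports a `p-val` with an `lp-loss`, followed by a Powell optimization over a vector of `alphas`. As a rough intuition for the first stage, the sketch below picks a clipping value for a single tensor by minimizing the Lp norm of the quantization error with a grid search. This is a toy example under stated assumptions: the helpers `quantize`, `lp_error` and `best_alpha` are hypothetical, the data is synthetic, and the notebook's `LAPQ` class optimizes the clipping values of all layers against the calibration loss rather than one tensor in isolation.

```python
import random

def quantize(values, alpha, num_bits=4):
    """Symmetric uniform fake-quantization clipped to [-alpha, alpha]."""
    levels = 2 ** (num_bits - 1) - 1
    step = alpha / levels
    return [round(max(-alpha, min(alpha, v)) / step) * step for v in values]

def lp_error(values, alpha, p, num_bits=4):
    """Lp norm of the difference between values and their quantized version."""
    quantized = quantize(values, alpha, num_bits)
    return sum(abs(v - q) ** p for v, q in zip(values, quantized)) ** (1.0 / p)

def best_alpha(values, p, num_bits=4, grid=100):
    """Grid search for the clipping value with the smallest Lp error."""
    max_abs = max(abs(v) for v in values)
    # Candidate clipping values from max_abs / grid up to max_abs (no clipping)
    candidates = [max_abs * i / grid for i in range(1, grid)] + [max_abs]
    return min(candidates, key=lambda a: lp_error(values, a, p, num_bits))

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic stand-in for a weight tensor
max_abs = max(abs(v) for v in data)
alpha = best_alpha(data, p=2.0)

print(f'max |x| = {max_abs:.3f}, chosen alpha = {alpha:.3f}')
print(f'L2 error without clipping: {lp_error(data, max_abs, 2.0):.3f}')
print(f'L2 error with chosen alpha: {lp_error(data, alpha, 2.0):.3f}')
```

Changing `p` shifts the trade-off between clipping error and rounding error, which is one way to read the `p-val` search in the log.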


In [None]:
print('testing quantized model')
qmodel.to(torch.device('cpu'))
acc1, acc5 = quantizer.test(model=qmodel, dataloader=dataloaders['test'], device=torch.device('cpu'))

testing quantized model


100%|██████████| 79/79 [01:26<00:00,  1.10s/it, acc1=72, acc5=91.2]


In [12]:
print('testing full precision model')
model.to(torch.device('cpu'))
acc1, acc5 = quantizer.test(model=model, dataloader=dataloaders['test'], device=torch.device('cpu'))

testing full precision model


100%|██████████| 79/79 [03:25<00:00,  2.61s/it, acc1=72.5, acc5=91.5]


In [13]:
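import os

def print_model_size(model):
    """Print the on-disk size of the model's state_dict in MB."""
    torch.save(model.state_dict(), "temp.p")
    print(f'Size: {os.path.getsize("temp.p")/1e6:.2f} MB')
    os.remove('temp.p')

print_model_size(qmodel)  # quantized model
print_model_size(model)   # full precision model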

Size: 23.84 MB
Size: 95.12 MB
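
The two sizes above give the compression ratio directly. The short calculation below compares it with the ideal ratio for storing 32-bit floats as 8-bit values. The explanation for the small gap (parameters or metadata that remain in full precision) is an assumption, since the notebook does not inspect the saved `state_dict`.

```python
fp32_size_mb = 95.12    # full precision model, from the output above
quant_size_mb = 23.84   # quantized model, from the output above

ratio = fp32_size_mb / quant_size_mb
ideal_ratio = 32 / 8    # 32-bit floats stored as 8-bit values

print(f'Measured compression: {ratio:.2f}x (ideal for 8 bits: {ideal_ratio:.0f}x)')
```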


In [14]:
torch.save(qmodel.state_dict(), "quantized_res50_c100.pth")