Permute, Quantize, and Fine-tune

This repository contains the source code and compressed models for the paper "Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks" (https://arxiv.org/abs/2010.15703).

Our method compresses the weight matrices of the network layers by

1. finding permutations of the weights that result in a functionally equivalent, yet easier-to-compress, network,
2. compressing the weights using product quantization [1], and
3. fine-tuning the codebooks via stochastic gradient descent.
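The first two steps can be illustrated with a small NumPy sketch. It is not the repository's implementation: the permutation search itself and the SGD fine-tuning of step 3 are not shown. The sketch only checks that permuting the hidden units of a toy two-layer network leaves its output unchanged, and then product-quantizes a random weight matrix with plain k-means, assuming one shared codebook per weight matrix. All names (`mlp`, `kmeans`, `pq_compress`, `pq_decompress`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (illustration only): permuting hidden units leaves the network function unchanged.
w1 = rng.standard_normal((16, 8))   # layer 1: 8 inputs -> 16 hidden units
w2 = rng.standard_normal((4, 16))   # layer 2: 16 hidden units -> 4 outputs
x = rng.standard_normal(8)

def mlp(w1, w2, x):
    return w2 @ np.maximum(w1 @ x, 0.0)  # ReLU between the two layers

perm = rng.permutation(16)
# Reorder the rows (outputs) of w1 and the columns (inputs) of w2 with the same permutation.
same = np.allclose(mlp(w1, w2, x), mlp(w1[perm], w2[:, perm], x))

# Step 2: product quantization with one shared codebook of k centroids (assumed setup).
def kmeans(points, k, iters=20, seed=0):
    """Lloyd's k-means; returns (centroids, assignment of each point)."""
    local_rng = np.random.default_rng(seed)
    centroids = points[local_rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members):  # keep the previous centroid if the cluster is empty
                centroids[j] = members.mean(axis=0)
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)

def pq_compress(w, d, k):
    """Cut every row of w into contiguous subvectors of size d and quantize them."""
    assert w.shape[1] % d == 0, "row length must be a multiple of d"
    return kmeans(w.reshape(-1, d), k)

def pq_decompress(codebook, codes, shape):
    """Replace each code by its centroid and restore the original matrix shape."""
    return codebook[codes].reshape(shape)

w = rng.standard_normal((64, 32))
codebook, codes = pq_compress(w, d=4, k=16)   # 512 subvectors -> 16 centroids
w_hat = pq_decompress(codebook, codes, w.shape)
mse = float(np.mean((w - w_hat) ** 2))
print(same, codebook.shape, codes.shape, round(mse, 3))
```

Only the small codebook and one integer code per subvector need to be stored, which is where the compression comes from; the permutation in step 1 is chosen so that the subvectors become easier to quantize.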

We provide code for compressing and evaluating ResNet-18, ResNet-50 and Mask R-CNN.

Requirements

Our code requires Python 3.6 or later. You also need these additional packages:

If Horovod is installed, you can train ResNet on multiple GPUs; without Horovod, the code still runs on a single GPU.

Data

Our experiments require either ImageNet (for classification) or COCO (for detection and segmentation). Set up a data directory with the following structure:

<your_data_path>
├── coco
│   ├── annotations   (contains      6 json files)
│   ├── train2017     (contains 118287 images)
│   └── val2017       (contains   5000 images)
└── imagenet
    ├── train         (contains   1000 folders with images)
    └── val           (contains   1000 folders with images)

Then update the imagenet_path or coco_path field in the config files so that it points to your data.
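Before training, it can help to check that the data directory matches the tree above. The helper below is a hypothetical sketch, not part of the repository; it only checks for the directories listed in the tree and counts ImageNet class folders.

```python
from pathlib import Path

# Expected sub-directories under <your_data_path>, copied from the tree above.
EXPECTED_LAYOUT = {
    "coco": ["annotations", "train2017", "val2017"],
    "imagenet": ["train", "val"],
}

def missing_dirs(data_root, datasets=("coco", "imagenet")):
    """Return 'dataset/split' entries from EXPECTED_LAYOUT that are missing under data_root."""
    root = Path(data_root)
    missing = []
    for dataset in datasets:
        for sub in EXPECTED_LAYOUT[dataset]:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing

def count_class_folders(split_dir):
    """Count class folders in an ImageNet split (the tree above lists 1000 for train and val)."""
    return sum(1 for p in Path(split_dir).iterdir() if p.is_dir())
```

For example, `missing_dirs("<your_data_path>", ["imagenet"])` should return an empty list before you train ResNet, and `count_class_folders` on `imagenet/train` should report 1000.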

Training ResNet

In addition to the ImageNet path, set output_path in the config file (or pass both via the command line), then run:

python -m src.train_resnet --config ../config/train_resnet50.yaml

The output_path key in the config file specifies the directory where all training output is saved. The script creates two subdirectories inside it: tensorboard and trained_models.

Launching TensorBoard on the tensorboard directory allows you to observe the training state and behavior over time:

tensorboard --logdir <your_tensorboard_path> --bind_all --port 6006

The trained_models directory is populated with model checkpoints saved after initialization and after every epoch. The "best" checkpoint (the one with the highest validation accuracy) is also stored separately.
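The checkpointing behavior described above (one checkpoint per epoch plus a separate copy of the best one) can be sketched as follows. This is a hypothetical helper, not the repository's code: file names are assumptions, and JSON stands in for the model state so the example is self-contained, whereas a PyTorch script would typically call `torch.save(model.state_dict(), path)`.

```python
import json
from pathlib import Path

class CheckpointTracker:
    """Save one checkpoint per epoch and keep a separate copy of the best one.

    Hypothetical sketch; 'epoch_<n>.json' and 'best.json' are assumed file names.
    """

    def __init__(self, output_path):
        self.dir = Path(output_path) / "trained_models"
        self.dir.mkdir(parents=True, exist_ok=True)
        self.best_acc = float("-inf")

    def save(self, epoch, state, val_acc):
        """Write the epoch checkpoint; also overwrite best.json if val_acc improved."""
        payload = json.dumps({"epoch": epoch, "val_acc": val_acc, "state": state})
        (self.dir / f"epoch_{epoch}.json").write_text(payload)
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            (self.dir / "best.json").write_text(payload)
            return True
        return False
```

Calling `save(0, ...)` right after initialization and then once per epoch reproduces the described layout; `save` returns `True` whenever the best checkpoint was updated.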

Training Mask R-CNN

Mask R-CNN (with a ResNet-50 backbone) can be trained by running the command:

python -m src.train_maskrcnn --config ../config/train_maskrcnn.yaml

Once again, you need to specify the output_path and the dataset path in the config file before running this.

Pretrained models

We provide the compressed models we learned from running our code at

../compressed_models

All provided models were compressed with k = 256 centroids.

Model (original top-1)	Regime	Comp. ratio	Model size	Top-1 accuracy (%)
ResNet-18 (69.76%)	Small blocks	29x	1.54 MB	66.74
ResNet-18 (69.76%)	Large blocks	43x	1.03 MB	63.33
ResNet-50 (76.15%)	Small blocks	19x	5.09 MB	75.04
ResNet-50 (76.15%)	Large blocks	31x	3.19 MB	72.18
ResNet-50 Semi-Supervised (78.72%)	Small blocks	19x	5.09 MB	77.19
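The compression ratios in the table can be checked with simple arithmetic. The sketch below assumes the parameter counts of the torchvision ResNet-18 and ResNet-50, 4-byte float32 weights, and model sizes in MiB (2^20 bytes); under these assumptions the computed ratios round to the values in the table. It also shows an idealized per-layer bound, assuming small blocks use subvectors of d = 9 weights for 3x3 convolutions: each subvector of 32·d bits is replaced by a log2(k)-bit code. Real ratios are lower because codebooks and uncompressed parameters must also be stored.

```python
import math

# Assumptions: torchvision parameter counts, float32 (4-byte) weights,
# and table sizes interpreted as MiB (2**20 bytes).
PARAMS = {"ResNet-18": 11_689_512, "ResNet-50": 25_557_032}

def fp32_size_mib(n_params):
    return n_params * 4 / 2**20

def compression_ratio(original_size, compressed_size):
    return original_size / compressed_size

def ideal_pq_ratio(d, k):
    """Upper bound for one layer: a d-dim float32 subvector (32*d bits) becomes a log2(k)-bit code."""
    return 32 * d / math.log2(k)

rows = [("ResNet-18", 1.54), ("ResNet-18", 1.03), ("ResNet-50", 5.09), ("ResNet-50", 3.19)]
for arch, size in rows:
    original = fp32_size_mib(PARAMS[arch])
    print(f"{arch}: {original:.2f} MiB / {size} MiB = {compression_ratio(original, size):.1f}x")

print(f"Mask R-CNN: {compression_ratio(169.4, 6.65):.1f}x")
print(f"Ideal, d = 9, k = 256: {ideal_pq_ratio(9, 256):.0f}x")
```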

We also provide a compressed Mask R-CNN model that attains the following results compared to the uncompressed model:

Model	Size	Comp. Ratio	Box AP	Mask AP
Original Mask R-CNN	169.4 MB	-	37.9	34.6
Compressed Mask R-CNN	6.65 MB	25.5x	36.3	33.5

These models can be used as given for evaluation.

Evaluating ResNet

To evaluate ResNet architectures, run the following command from the project root:

python -m src.evaluate_resnet

This will evaluate a ResNet-18 with small blocks by default. To evaluate a ResNet-18 with large blocks, use

python -m src.evaluate_resnet \
    --model.compression_parameters.large_subvectors True \
    --model.state_dict_compressed ../compressed_models/resnet18_large.pth
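The evaluation commands override nested config keys with dotted flags such as --model.compression_parameters.large_subvectors. The repository's actual argument parser is not reproduced here; the following is a hypothetical sketch of how such flags can be mapped onto a nested config dictionary, assuming each value is parsed as a Python literal when possible and kept as a string otherwise.

```python
import ast

def parse_value(text):
    """Interpret 'True', '1024', '8', etc. as Python literals; fall back to the raw string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        return text

def apply_overrides(config, argv):
    """Apply '--a.b.c value' pairs from argv onto a nested dict (in place) and return it."""
    if len(argv) % 2 != 0:
        raise ValueError("expected '--key value' pairs")
    for flag, raw in zip(argv[0::2], argv[1::2]):
        if not flag.startswith("--"):
            raise ValueError(f"unexpected argument: {flag}")
        *parents, leaf = flag[2:].split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})  # create intermediate sections as needed
        node[leaf] = parse_value(raw)
    return config
```

With this scheme, `--model.compression_parameters.large_subvectors True` sets a boolean inside `config["model"]["compression_parameters"]`, while a path such as `../compressed_models/resnet18_large.pth` is not a valid literal and stays a string.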

For ResNet-50 with small blocks, use

python -m src.evaluate_resnet \
    --model.arch resnet50 \
    --model.compression_parameters.layer_specs.fc.k 1024 \
    --model.state_dict_compressed ../compressed_models/resnet50.pth

To evaluate the semi-supervised ResNet-50 instead, replace resnet50.pth with resnet50_ssl.pth; that model was additionally pretrained on unlabeled data.

And for ResNet-50 with large blocks, use

python -m src.evaluate_resnet \
    --model.arch resnet50 \
    --model.compression_parameters.pw_subvector_size 8 \
    --model.compression_parameters.large_subvectors True \
    --model.compression_parameters.layer_specs.fc.k 1024 \
    --model.state_dict_compressed ../compressed_models/resnet50_large.pth
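The "small blocks" and "large blocks" regimes differ in how many weights are grouped into one subvector before quantization. The sketch below assumes block sizes of d = 9 (small) or 18 (large) for 3x3 convolutions and d = 4 (small) or 8 (large, matching pw_subvector_size 8 above) for 1x1 convolutions; these values and the function `conv_subvectors` are illustrative assumptions, not taken from the repository's code.

```python
import numpy as np

def conv_subvectors(weight, large):
    """Split a conv weight of shape (C_out, C_in, kh, kw) into rows of subvectors.

    Assumed block sizes: 3x3 kernels use d = 9 (small) or 18 (large);
    1x1 (pointwise) kernels use d = 4 (small) or 8 (large).
    """
    c_out, c_in, kh, kw = weight.shape
    if (kh, kw) == (1, 1):
        d = 8 if large else 4
    else:
        d = 2 * kh * kw if large else kh * kw
    flat = weight.reshape(c_out, -1)  # each row holds C_in kernels of kh*kw weights, back to back
    assert flat.shape[1] % d == 0, "row length must be a multiple of the subvector size"
    return flat.reshape(-1, d)

conv3x3 = np.zeros((64, 64, 3, 3))
conv1x1 = np.zeros((256, 64, 1, 1))
for name, w in [("3x3", conv3x3), ("1x1", conv1x1)]:
    print(name, conv_subvectors(w, large=False).shape, conv_subvectors(w, large=True).shape)
```

With k = 256, each subvector costs an 8-bit code, so doubling d roughly halves the storage per weight (for example 8/9 versus 8/18 bits for 3x3 kernels), which is consistent with large blocks giving higher compression ratios and lower accuracy in the table of pretrained models.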

Evaluating Mask R-CNN

Simply run the command:

python -m src.evaluate_maskrcnn

to load and evaluate the compressed Mask R-CNN model.

Citation

If you use our code, please cite our work:

@inproceedings{martinez_2020_pqf,
  title={Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks},
  author={Martinez, Julieta and Shewakramani, Jashan and Liu, Ting Wei and B{\^a}rsan, Ioan Andrei and Zeng, Wenyuan and Urtasun, Raquel},
  booktitle={CVPR 2021},
  year={2021}
}


References

[1] Product quantization for nearest neighbor search

[2] And the bit goes down: Revisiting the quantization of neural networks
