Model compression by constrained optimization, using the Learning-Compression (LC) algorithm


LC-model-compression is a flexible, extensible software framework that allows a user to do optimal compression, with minimal effort, of a neural network or other machine learning model using different compression schemes. It is based on the Learning-Compression (LC) algorithm, which performs an iterative optimization of the compressed model by alternating a learning (L) step with a compression (C) step. This decoupling of the "machine learning" and "signal compression" aspects of the problem makes it possible to use a common optimization and software framework to handle any choice of model and compression scheme: all that is needed to compress model X with compression Y is to call the corresponding algorithms in the L and C steps, respectively. The software fully supports this by design, which makes it flexible and extensible.

A number of neural networks and compression schemes are currently supported, and we expect to add more in the future. These include neural networks such as LeNet, ResNet, VGG, NiN, etc. (as well as linear models), and compression schemes such as low-rank and tensor factorization (including automatically learning the layer ranks), various forms of pruning and quantization, and combinations of all of those. For a neural network, the user can choose different compression schemes for different parts of the network.

The LC algorithm is runtime-efficient: compressing a model does not take much longer than training the reference, uncompressed model in the first place. The compressed models perform very competitively, and the framework lets the user easily explore the trade-off between prediction accuracy and compression ratio (which can be defined in terms of memory, inference time, energy or other criteria).
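To make the L/C alternation concrete, here is a minimal, self-contained toy sketch (illustrative only, not the library's code): the "model" is linear least squares and the compression scheme is l0 pruning to k nonzero weights. The L step minimizes the loss plus a quadratic penalty pulling the weights toward their compressed version; the C step projects the current weights onto the set of compressed models; the penalty parameter mu grows over iterations to drive the two together.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 10, 3
X = rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = [3.0, -2.0, 1.5]                 # only 3 nonzero weights
y = X @ w_true + 0.01 * rng.standard_normal(n)

def c_step(w, k):
    """Compression step: closest k-sparse vector to w (l0 pruning)."""
    theta = np.zeros_like(w)
    keep = np.argsort(np.abs(w))[-k:]          # indices of k largest magnitudes
    theta[keep] = w[keep]
    return theta

w = np.linalg.lstsq(X, y, rcond=None)[0]       # reference (uncompressed) model
theta = c_step(w, k)
mu = 1e-3
for _ in range(30):
    # L step: min_w ||Xw - y||^2 + (mu/2) ||w - theta||^2, in closed form
    w = np.linalg.solve(X.T @ X + (mu / 2) * np.eye(d),
                        X.T @ y + (mu / 2) * theta)
    # C step: compress the current weights
    theta = c_step(w, k)
    mu *= 1.5                                  # penalty schedule drives w -> theta
```

After the loop, `theta` is the compressed (3-sparse) model; on this toy problem it recovers the true support and weight values. In the real framework the L step is SGD on the network loss and the C step is the chosen scheme's compression routine, but the alternation has exactly this shape.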

LC-model-compression is written in Python and PyTorch, and has been extensively tested since 2017 in research projects at UC Merced. You can find some of these in the examples below, or in our papers about the LC algorithm.

Features

LC-model-compression supports various compression schemes and allows the user to combine them in a mix-and-match way. Some examples:

  • a single compression per layer (e.g. low-rank compression of layer 1 with maximum rank 5)
  • a single compression shared over multiple layers (e.g. prune 5% of the weights in layers 1 and 3, jointly)
  • mixing multiple compressions (e.g. quantize layer 1 and jointly prune layers 2 and 3)
  • additive combinations of compressions (e.g. represent a layer as quantized values plus an additive sparse correction)

At present, we support the following compression schemes:

| Scheme       | Formulation                                   | LC-model-compression class                     |
|--------------|-----------------------------------------------|------------------------------------------------|
| Quantization | Adaptive quantization (with learned codebook) | `AdaptiveQuantization`                         |
|              | Binarization into {-1, 1} and {-c, c}         | `BinaryQuantization`, `ScaledBinaryQuantization` |
|              | Ternarization into {-c, 0, c}                 | `ScaledTernaryQuantization`                    |
| Pruning      | l0/l1 constraint pruning                      | `ConstraintL0Pruning`, `ConstraintL1Pruning`   |
|              | l0/l1 penalty pruning                         | `PenaltyL0Pruning`, `PenaltyL1Pruning`         |
| Low-rank     | Low-rank compression to a given rank          | `LowRank`                                      |
|              | Low-rank with automatic rank selection        | `RankSelection`                                |
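As an illustration of what a quantization C step computes, the adaptive-quantization case (learned codebook) reduces to one-dimensional k-means on the weights: the codebook entries are the cluster centroids, and each weight is replaced by its nearest entry. A minimal self-contained numpy sketch, not the library's code:

```python
import numpy as np

def quantization_c_step(w, K=2, iters=20):
    """C step for adaptive quantization: fit a K-entry codebook to the
    weights by 1D k-means (Lloyd's algorithm) and return the codebook
    together with the quantized weights (nearest codebook entry)."""
    w = np.asarray(w, dtype=float)
    # initialize the codebook at evenly spaced quantiles of the weights
    codebook = np.quantile(w, np.linspace(0.0, 1.0, K))
    for _ in range(iters):
        # assign each weight to its nearest codebook entry
        assign = np.argmin(np.abs(w[:, None] - codebook[None, :]), axis=1)
        for j in range(K):
            if np.any(assign == j):
                codebook[j] = w[assign == j].mean()   # centroid update
    return codebook, codebook[assign]

weights = np.array([0.9, 1.1, -1.0, -0.8, 1.05, -1.2])
codebook, quantized = quantization_c_step(weights, K=2)
```

With K = 2 and a fixed scale this specializes toward the binarization schemes in the table; the library's actual implementations handle the various constrained codebooks (e.g. {-c, 0, c}) as special cases.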

Examples of use

If you want to compress your own models, you can use the following examples as a guide:

Low-rank AlexNet models

We have made available our low-rank AlexNet models from our CVPR2020 paper.
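For low-rank schemes such as the one behind these models, the C step is a truncated SVD: by the Eckart-Young theorem, keeping the top r singular values gives the closest rank-r matrix to the weight matrix in the Frobenius norm. A minimal numpy sketch (illustrative, not the library's code):

```python
import numpy as np

def lowrank_c_step(W, r):
    """C step for low-rank compression: return the closest rank-r matrix
    to W in the Frobenius norm, via truncated SVD (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

# a nearly rank-1 weight matrix plus small noise
rng = np.random.default_rng(1)
W = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, -1.0]) + 0.01 * rng.standard_normal((3, 3))
W_compressed = lowrank_c_step(W, 1)
```

Storing the factors `U[:, :r] * s[:r]` and `Vt[:r]` instead of `W` is what yields the compression; `RankSelection` additionally learns a suitable `r` per layer rather than taking it as given.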

Installation

We recommend installing the dependencies through conda into a new environment:

conda create -n lc_package python==3.7  
conda install -n lc_package numpy==1.16.2 scipy==1.2.1 scikit-learn==0.20.2  

You will need to install PyTorch v1.1 into the same conda environment. In principle, newer PyTorch versions should work too, but we have not fully tested them. The exact installation command may differ from system to system; confirm with the official PyTorch site. On our system we used the following:

conda install -n lc_package pytorch==1.1.0 torchvision==0.3.0 cudatoolkit=10.0 -c pytorch 

Once the requirements are installed, within the conda environment:

conda activate lc_package
git clone https://github.com/UCMerced-ML/LC-model-compression
pip install -e ./LC-model-compression

Citation

If you find this code useful, please cite it as:

Yerlan Idelbayev and Miguel Á. Carreira-Perpiñán: 
"A flexible, extensible software framework for model compression based on the LC algorithm".
arXiv:2005.07786, May 15, 2020.
http://arxiv.org/abs/2005.07786
