Model-Compression-DAQ

This repo implements DAQ (Divide-and-Quantize), a method that divides large weight matrices into flexible chunks and quantizes each chunk separately.
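
A minimal PyTorch sketch of the divide-and-quantize idea (illustrative only; per-chunk int8 scalar quantization is an assumption here, not necessarily the exact scheme used in this repo):

import torch

def divide_and_quantize(weight, chunk_size=256):
    # Divide a weight matrix into fixed-size chunks and quantize each chunk
    # to int8 with its own scale (illustrative scalar quantization).
    flat = weight.flatten()
    quantized, scales = [], []
    for chunk in flat.split(chunk_size):
        scale = chunk.abs().max().clamp(min=1e-8) / 127.0
        quantized.append(torch.round(chunk / scale).clamp(-127, 127).to(torch.int8))
        scales.append(scale)
    return quantized, scales

def dequantize(quantized, scales, shape):
    # Reconstruct the (approximate) full-precision matrix from the chunks.
    flat = torch.cat([q.float() * s for q, s in zip(quantized, scales)])
    return flat.reshape(shape)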

Usage

Installation

To install from source and develop locally:

git clone https://github.com/Luccadoremi/Model-Compression-DAQ.git
cd Model-Compression-DAQ
pip install --editable .
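
Before training, it is worth confirming that PyTorch can see your GPUs, since new model training requires NVIDIA GPUs and NCCL (see Dependencies). A quick check:

import torch

print(torch.__version__)           # should be >= 1.0.0
print(torch.cuda.is_available())   # True if CUDA-capable GPUs are visible
print(torch.cuda.device_count())   # useful for choosing --update-freq (128 / #GPUs for WMT)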

Training

1. Train a SuperTransformer (HAT [paper])

The SuperTransformer is a supernet that contains many SubTransformers with weight-sharing. By default, we train WMT tasks on 8 GPUs. Please adjust --update-freq according to the number of GPUs (128/x for x GPUs; e.g., --update-freq=32 on 4 GPUs, as in the second example below). For IWSLT, we train on a single GPU with --update-freq=1.

python train-our.py --configs=configs/[task_name]/supertransformer/[search_space].yml
# for example
python train-our.py --configs=configs/wmt14.en-de/supertransformer/space0.yml
# another example
CUDA_VISIBLE_DEVICES=0,1,2,3 python train-our.py --configs=configs/wmt14.en-fr/supertransformer/space0.yml --update-freq=32

The --configs file specifies the SuperTransformer model architecture, the SubTransformer search space, and the training settings.

2. Train a Searched SubTransformer (Training with Quantization Noise for Extreme Model Compression [paper])

For details please check the script.

# run with default arguments 
./train.sh

# for example, this runs SubTransformer training with quantization noise
./train.sh our quant_noise

# this quantizes all weights; for details, check the corresponding .yml files
./train.sh our post_quant-quant_noise-n5

# to provide a model.yml for a dataset, train.sh can be run as follows
# ./train.sh <ARCH> <COMMON.YML-TYPE> <GPUs> <DATASET> <MODEL.YML>
./train.sh our post_quant-quant_noise-n5 0,1 iwslt14.de-en HAT_iwslt14deen_titanxp@168.8ms_bleu@34.8.yml
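
As background, quantization noise (Quant-Noise) quantizes only a random subset of weight blocks on each forward pass, so the model learns to be robust to the quantization applied later. A minimal sketch of the idea (illustrative only, not this repo's implementation; block-wise int8 quantization is an assumption here):

import torch

def quant_noise_step(weight, p=0.5, block_size=8):
    # Quantize a random fraction p of weight blocks (with per-block scales),
    # keep the rest in full precision, and use a straight-through estimator so
    # gradients flow to the original full-precision weights.
    flat = weight.flatten()
    n_blocks = flat.numel() // block_size
    blocks = flat[: n_blocks * block_size].view(n_blocks, block_size)
    scale = blocks.abs().max(dim=1, keepdim=True)[0].clamp(min=1e-8) / 127.0
    quantized = torch.round(blocks / scale).clamp(-127, 127) * scale
    mask = (torch.rand(n_blocks, 1, device=weight.device) < p).float()
    noisy = mask * quantized + (1.0 - mask) * blocks
    out = blocks + (noisy - blocks).detach()  # straight-through estimator
    result = flat.clone()
    result[: n_blocks * block_size] = out.flatten()
    return result.view_as(weight)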

Test BLEU (SacreBLEU) score:

For details please check the script.

# run with default arguments
./test.sh

# Calculate BLEU score for non-quantized model
./test.sh our quant_noise

# Calculate BLEU score for a quantized model (you need to provide quantization config path)
./test.sh our post_quant-quant_noise-n5 configs/iwslt14.de-en/subtransformer/pq-quantization-n5.yml
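
To score translations outside of test.sh, the sacrebleu Python package can be used directly. A minimal example (the file names hyp.detok.txt and ref.detok.txt are placeholders for your detokenized system outputs and references, aligned line by line):

import sacrebleu

with open("hyp.detok.txt") as f:
    hypotheses = [line.strip() for line in f]
with open("ref.detok.txt") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])  # references: a list of reference streams
print(bleu.score)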

Dependencies

  • Python >= 3.6
  • PyTorch >= 1.0.0
  • configargparse >= 0.14
  • New model training requires NVIDIA GPUs and NCCL
  • sklearn

Roadmap

License

This repository is released under the MIT license. See LICENSE for more information.

Acknowledgements
