Description
System information
- TensorFlow version (you are using): TF 2.3.0
- Are you willing to contribute it (Yes/No): Yes
Motivation
Sparse (pruned), clustered, and quantized models reduce compressed model size for deployment to devices constrained by memory and processing power. These optimizations are already supported by the Model Optimization Toolkit, and their benefits are documented elsewhere. However, combining them in a single model can currently only be done at the post-training stage, at a cost in accuracy. Fine-tuning the resulting model to recover that accuracy modifies the weights, so the sparsity and clustering benefits are lost.
The use case is to perform Quantization-Aware Training (QAT) without losing the sparsity or clustering properties of models that have already been optimized by those techniques, while still benefiting from the improved accuracy that fine-tuning provides.
Describe the feature
This feature proposes the addition of new Quantizer classes (as used in Quantization-Aware Training) along with their associated QuantizeConfig classes. Specifically, the following Quantizer-derived classes should be added (a sketch of one follows the list):
- Prune Quantizer - To preserve zero weights during the training process.
- Cluster Quantizer - To preserve the number of unique weights (centroids) during training.
- Prune and Cluster Quantizer - To preserve the zero weights and the number of unique weights during training.
- QuantizeConfig-derived classes for each of the above.
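As a rough illustration only, a prune-preserving quantizer could subclass the existing tfmot.quantization.keras.quantizers.LastValueQuantizer, record a sparsity mask when it is built, and re-apply that mask on every call so that zero weights stay zero through fine-tuning. The class name PrunePreserveQuantizer and the masking details below are assumptions for this sketch, not a final design; a cluster-preserving variant would similarly look up and re-apply centroid assignments.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot


class PrunePreserveQuantizer(
    tfmot.quantization.keras.quantizers.LastValueQuantizer):
  """Sketch of a prune-preserving quantizer (hypothetical name).

  Captures the sparsity pattern of the already-pruned weights once,
  then re-applies it on every call so pruned (zero) weights remain
  zero throughout quantization-aware training.
  """

  def build(self, tensor_shape, name, layer):
    quantize_vars = super().build(tensor_shape, name, layer)
    # Assumes `layer` is the quantize wrapper and `name` resolves to
    # the weight attribute of the wrapped layer (e.g. 'kernel').
    weights = getattr(layer.layer, name)
    # divide_no_nan(w, w) yields a 0/1 mask: 0 where the pruned
    # weight is zero, 1 everywhere else.
    quantize_vars['sparsity_mask'] = tf.math.divide_no_nan(weights, weights)
    return quantize_vars

  def __call__(self, inputs, training, weights, **kwargs):
    # Zero out pruned positions before the fake-quant op runs.
    masked_inputs = tf.multiply(inputs, weights['sparsity_mask'])
    return super().__call__(masked_inputs, training, weights, **kwargs)
```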
Describe how the feature helps achieve the use case
Quantizer classes that preserve the sparsity and clustering of a model already optimized by these techniques allow the model to be fine-tuned during the quantization process, helping to maintain its accuracy. A usage sketch follows.
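To make the intended workflow concrete, here is a hedged sketch using the existing annotate/apply API. PrunePreserveQuantizeConfig is a hypothetical name for one of the proposed QuantizeConfig classes, and `pruned_model`, `train_images`, and `train_labels` are placeholders for an already-pruned model and its training data.

```python
import tensorflow as tf
import tensorflow_model_optimization as tfmot

quantize = tfmot.quantization.keras

# `pruned_model` is assumed to be an already-pruned Keras model
# (stripped with tfmot.sparsity.keras.strip_pruning).
def annotate(layer):
  # Attach the proposed config to the layers of interest; Dense is
  # used here purely as an example.
  if isinstance(layer, tf.keras.layers.Dense):
    return quantize.quantize_annotate_layer(
        layer, quantize_config=PrunePreserveQuantizeConfig())
  return layer

annotated_model = tf.keras.models.clone_model(
    pruned_model, clone_function=annotate)

# Custom QuantizeConfig classes must be visible inside quantize_scope.
with quantize.quantize_scope(
    {'PrunePreserveQuantizeConfig': PrunePreserveQuantizeConfig}):
  qat_model = quantize.quantize_apply(annotated_model)

# Fine-tune: the proposed quantizers keep pruned weights at zero.
qat_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
qat_model.fit(train_images, train_labels, epochs=1)
```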
Describe how existing APIs don't satisfy your use case (optional if obvious)
It is not currently possible to preserve the sparsity or clustering of a model during the quantization-aware training process: the existing Quantizers place no constraints on how weights change during fine-tuning, so these properties are destroyed.