
Conversation

@Ruomei (Contributor) commented Apr 30, 2021

This PR adds support for Pruning-Clustering-preserving Quantization Aware Training (PCQAT). To preserve the sparsity and the unique clustered weights in the optimized output model, the pruning masks are kept fixed and stochastic updates of the clustering training variables are enabled during quantization-aware training.

User API:

  # Annotate the pruned-and-clustered Keras model for quantization-aware training,
  # then apply PCQAT via the cluster-preserving scheme with sparsity preservation on.
  preserve_sparsity = True
  quant_aware_annotate_model = quantize.quantize_annotate_model(pruned_clustered_model)
  pcqat_model = quantize.quantize_apply(
      quant_aware_annotate_model,
      scheme=default_8bit_cluster_preserve_quantize_scheme
      .Default8BitClusterPreserveQuantizeScheme(preserve_sparsity))
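
After quantize_apply, the returned model is fine-tuned like a regular Keras model. Below is a minimal sketch of that step, assuming import tensorflow as tf; the optimizer, loss, and training data are placeholders rather than part of this PR:

  # PCQAT fine-tuning: compile and train the quantization-aware model as usual.
  pcqat_model.compile(
      optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=['accuracy'])
  pcqat_model.fit(train_images, train_labels, epochs=1, validation_split=0.1)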

Main changes:

  • cluster_preserve_integration_test: covers edge cases (e.g. passing non-pruned models or models with uniform weights) and checks that trainable variables are updated between epochs
  • mnist_prune_cluster_preserve_qat_test: a minimal end-to-end MNIST example showing the benefits of PCQAT with the two most common configurations
  • cluster_preserve_quantize_registry: the main implementation of the PCQAT algorithm

Initial results (pruning sparsity: 50%, number of clusters: 8 (DS-CNN-L), 16 (Mobilenet_v1)):

| Model | Items | Baseline | Pruned Model | QATed | Pruned_Clustered Model | PCQATed Model |
| --- | --- | --- | --- | --- | --- | --- |
| DS-CNN-L | FP32 Top1 Accuracy | 95.06% | 94.07% | (Fake INT8) 94.85% | 93.76% | (Fake INT8) 94.28% |
| DS-CNN-L | INT8 full integer quantization | 94.35% | 93.80% | 94.82% | 93.21% | 94.06% |
| DS-CNN-L | INT8 .tflite gzip compression (bytes) | 506400 -> 425006 | 506400 -> 317937 | 507296 -> 424368 | 506400 -> 205333 | 507296 -> 201744 |
| Mobilenet_v1 (ImageNet) | FP32 Top1 Accuracy | 70.98% | 70.49% | (Fake INT8) 70.88% | 67.64% | (Fake INT8) 67.80% |
| Mobilenet_v1 (ImageNet) | INT8 full integer quantization | 70.37% | 69.85% | 70.87% | 66.89% | 68.63% |
| Mobilenet_v1 (ImageNet) | INT8 .tflite gzip compression (bytes) | 4665552 -> 3886236 | 4665552 -> 2909148 | 4569416 -> 3808781 | 4665552 -> 2013010 | 4569472 -> 1943957 |
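
For reference, below is a minimal sketch of how the INT8 .tflite gzip numbers above can be measured; it is not the exact script used for these results, and the pcqat_model variable and temp-file handling are assumptions:

  import gzip
  import os
  import tempfile

  import tensorflow as tf

  # Convert the PCQAT-trained Keras model to an INT8 TFLite model; with a QAT
  # model, Optimize.DEFAULT uses the quantization parameters learned in training.
  converter = tf.lite.TFLiteConverter.from_keras_model(pcqat_model)
  converter.optimizations = [tf.lite.Optimize.DEFAULT]
  tflite_model = converter.convert()

  # Report the gzip-compressed size of the .tflite flatbuffer, as in the table.
  _, tflite_file = tempfile.mkstemp('.tflite')
  with open(tflite_file, 'wb') as f:
    f.write(tflite_model)
  with gzip.open(tflite_file + '.gz', 'wb') as f:
    f.write(tflite_model)
  print('gzipped .tflite size (bytes):', os.path.getsize(tflite_file + '.gz'))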

@google-cla bot added the cla: yes label Apr 30, 2021
@github-actions bot added the technique:clustering and technique:qat labels Apr 30, 2021
@akarmi (Contributor) left a comment

Thanks, a minor request below.

@Ruomei force-pushed the toupstream/pcqat branch from 3268faf to ea49b27 on May 24, 2021
@akarmi self-requested a review May 25, 2021
@akarmi (Contributor) left a comment

Thank you.

@akarmi requested a review from daverim May 27, 2021
@Ruomei (Contributor, Author) commented Jun 3, 2021

Hi @daverim, we have recently removed this PR's dependency, so it is ready for review at any time. Could you please take a look when you have a moment?
Thanks!

@daverim (Collaborator) left a comment

Some small linting issues.

@Ruomei force-pushed the toupstream/pcqat branch from ea49b27 to 0cbc0c2 on June 4, 2021
@daverim added the ready to pull label Jun 7, 2021
@Xhark (Member) commented Jun 10, 2021

Hi, just curious: why is the INT8 .tflite gzip compression (bytes) for the Pruned_Clustered Mobilenet_v1 (ImageNet) model so small? Is the PCQAT model really larger than the pruned-clustered model, or is a digit missing in this case?

@wwwind (Contributor) commented Jun 10, 2021

Hi @Xhark, yes, this is a mistake: the last digit is missing. The compression ratio was around 2.3 in our experiments.
Thanks for noticing.
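
(For context: with the corrected value now in the table above, 4665552 -> 2013010 bytes, the ratio works out to 4665552 / 2013010 ≈ 2.3, consistent with the figure quoted here.)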

@Ruomei (Contributor, Author) commented Jun 10, 2021

Thanks, @Xhark, the number is now updated.

@Ruomei (Contributor, Author) commented Jun 10, 2021

Hi, @daverim and @Xhark, could you please also let us know whether there is anything we can do to help with the failed internal checks shown in this PR?
@akarmi @wwwind for visibility
Thanks all!

@daverim (Collaborator) commented Jun 11, 2021 via email

@Ruomei (Contributor, Author) commented Jun 11, 2021

> Merging was blocked by build file strict dependencies -- resubmitting with it fixed myself now, should be merged today.

Brill, thanks a lot, David.

@copybara-service bot merged commit 812ea04 into tensorflow:master Jun 13, 2021