Enable differentiable training and update cluster indices #519
Conversation
Looks good to me.
Hi @alanchiao and @akarmi, I have just filled in all the results in the description. Could you please take a look at the PR and let me know your thoughts? Also, not sure how long the description should be?
Force-pushed from 16ad7a9 to a248f89.
Thank you. Looks good to me.
As noted in the call, I'll take a look at this, but after it merges when I have the time.
Force-pushed from a248f89 to 51d4c22.
@alanchiao, could you check what is holding up this PR please?
Yes I am.
Closed this PR since it had already been merged by a different commit.
Motivation:
In the current clustering implementation, the original weights of the clustered layers and the cluster indices are not updated during each training step. Although training alters the values of the cluster centroids (changes in the other, non-clustered layers are reflected in the gradients), the non-updated original weights will not always match the constantly updated cluster centroids, and this mismatch creates problems during training. To fix this issue, we update the original weights after backpropagation; in the next training step, the indices are then re-generated using the updated centroids. This PR makes those changes and adds unit tests for them.
Details of the implementation:
As shown in the figure below, in the forward pass of our current clustering implementation, the centroids (c) for the weights of each layer are first initialized using density-based or linear methods. The original set of weights (W) is then grouped into several clusters using the centroid values. Afterwards, the association between the weights and the centroids is computed from c and W and stored as indices. Finally, within a single cluster, the centroid value is shared among all the weights and used in the forward pass instead of the original weights.
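A minimal sketch of this lookup in plain TensorFlow, with illustrative values (the variable names here are for exposition only, not the toolkit's actual API):

```python
import tensorflow as tf

# Illustrative example: six weights of a layer and three cluster centroids.
original_weights = tf.constant([0.1, -0.4, 0.35, 0.9, -0.38, 0.12])
cluster_centroids = tf.constant([-0.4, 0.1, 0.9])

# Associate every weight with its nearest centroid; these are the "indices".
distances = tf.abs(tf.expand_dims(original_weights, axis=-1) -
                   tf.expand_dims(cluster_centroids, axis=0))
indices = tf.math.argmin(distances, axis=-1)

# In the forward pass, each weight is replaced by the centroid of its cluster.
clustered_weights = tf.gather(cluster_centroids, indices)

print(indices.numpy())            # [1 0 1 2 0 1]
print(clustered_weights.numpy())  # [ 0.1 -0.4  0.1  0.9 -0.4  0.1]
```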
In the current backpropagation, the clustered weights receive the gradients from the layer being wrapped. These gradients are fed into the `gather` node, which groups them by indices and accumulates them as the gradients of the centroids. However, because of the non-differentiable `tf.math.argmin` node, no gradients are calculated for the original weights W by automatic differentiation in TensorFlow. A small modification of the training graph (a gradient approximation using the straight-through estimator [1]) is used to override the gradient during backpropagation:

`clustered_weights = tf.gather(cluster_centroids, indices) * tf.sign(original_weights + 1e+6)`

In the forward pass, the multiply does not change the output of the graph (`tf.sign` yields a tensor of ones), but in the backpropagation, via `tf.custom_gradient`, the multiply is treated as an add and the `tf.sign` as an identity. Essentially, the graph becomes:

`clustered_weights = tf.gather(cluster_centroids, indices) + tf.identity(original_weights + 1e+6)`

In this way, the original weights can be updated by automatic differentiation in TensorFlow.
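For illustration, here is one possible way to express the override described above with `tf.custom_gradient`. This is a simplified sketch; the helper names are made up and it is not the toolkit's actual implementation:

```python
import tensorflow as tf

@tf.custom_gradient
def _sign_as_identity(x):
  # Forward: tf.sign(x + 1e+6) is a tensor of ones for any realistic weight.
  # Backward: pass the incoming gradient straight through to x (identity).
  def grad(dy):
    return dy
  return tf.sign(x + 1e+6), grad

@tf.custom_gradient
def _mul_as_add(a, b):
  # Forward: an ordinary element-wise multiply.
  # Backward: behave like an add, so both operands receive dy unchanged.
  def grad(dy):
    return dy, dy
  return a * b, grad

def clustered_forward(original_weights, cluster_centroids, indices):
  # Output equals tf.gather(cluster_centroids, indices), but gradients now
  # flow both to the centroids (through tf.gather) and to the original
  # weights (through the straight-through estimator).
  return _mul_as_add(tf.gather(cluster_centroids, indices),
                     _sign_as_identity(original_weights))
```

Wrapping a loss around `clustered_forward(...)` inside a `tf.GradientTape` then yields non-`None` gradients for both `cluster_centroids` and `original_weights`.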
Indices are not differentiable themselves and are calculated only in the forward pass during training. Therefore, they are updated with `tf.assign` specifically in the forward pass, in the `call` function. This will require some extra changes for using `tf.distribute`, which have not been covered in this PR.
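A rough sketch of how this index refresh inside `call` might look, assuming the indices live in a non-trainable variable. The layer below is purely illustrative (and omits the straight-through gradient shown earlier for brevity):

```python
import tensorflow as tf

class ClusteredDense(tf.keras.layers.Layer):
  """Illustrative layer holding original weights, centroids, and indices."""

  def __init__(self, units, num_clusters, **kwargs):
    super().__init__(**kwargs)
    self.units = units
    self.num_clusters = num_clusters

  def build(self, input_shape):
    self.original_weights = self.add_weight(
        name='original_weights', shape=(input_shape[-1], self.units))
    self.cluster_centroids = self.add_weight(
        name='cluster_centroids', shape=(self.num_clusters,))
    # Indices are not differentiable, so they are kept in a non-trainable
    # variable and refreshed by assignment in the forward pass.
    self.indices = self.add_weight(
        name='indices', shape=(input_shape[-1], self.units),
        dtype=tf.int32, trainable=False,
        initializer=tf.zeros_initializer())

  def call(self, inputs, training=None):
    if training:
      # Re-associate the (updated) original weights with the (updated)
      # centroids at the start of every training step.
      distances = tf.abs(tf.expand_dims(self.original_weights, -1) -
                         tf.reshape(self.cluster_centroids, (1, 1, -1)))
      new_indices = tf.cast(tf.math.argmin(distances, axis=-1), tf.int32)
      self.indices.assign(new_indices)
    clustered_kernel = tf.gather(self.cluster_centroids, self.indices)
    return tf.matmul(inputs, clustered_kernel)
```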
Result table:
As shown in the table below, the changes in this PR significantly improve the accuracy when the number of clusters is small and give limited benefit for other configurations.
Reference:
[1] Y. Bengio, N. Leonard, and A. Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.