
@SaoirseARM (Contributor) commented Jun 26, 2020

Added k-means++ centroid initialization to the clustering API as an alternative, configurable initialization method alongside the existing density-based, linear and random methods.
The heuristic is allowed to determine the number of retries (num_retries_per_sample=-1), and a seed of 9 is used so that the behaviour closely matches the default scikit-learn k-means++ implementation.
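
Below is a minimal usage sketch of selecting the new initializer through the existing clustering API. It assumes the option is exposed as `CentroidInitialization.KMEANS_PLUS_PLUS` alongside the existing `LINEAR`, `RANDOM` and `DENSITY_BASED` values; treat the exact enum name as an assumption rather than the finalized API.

```python
# Sketch (not the authoritative API): clustering a Keras model with the new
# k-means++ centroid initialization. KMEANS_PLUS_PLUS is the assumed enum name
# for the option added by this PR; the rest follows the existing
# tfmot.clustering.keras API.
import tensorflow as tf
import tensorflow_model_optimization as tfmot

cluster_weights = tfmot.clustering.keras.cluster_weights
CentroidInitialization = tfmot.clustering.keras.CentroidInitialization

base_model = tf.keras.applications.MobileNetV2(weights="imagenet")

clustering_params = {
    "number_of_clusters": 32,
    "cluster_centroids_init": CentroidInitialization.KMEANS_PLUS_PLUS,  # assumed name
}

clustered_model = cluster_weights(base_model, **clustering_params)
clustered_model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

# Fine-tune clustered_model as usual, then strip the clustering wrappers
# before converting to .tflite:
# final_model = tfmot.clustering.keras.strip_clustering(clustered_model)
```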

Experimental results:

| Model | Initialization | # of clusters | Top-1 accuracy (%) | Configuration | Size of compressed .tflite (MB) |
|---|---|---|---|---|---|
| MobileNetV2 | Linear | 32 | 69.33 | Full model (except DepthwiseConv2D layers) | 2.60 |
| MobileNetV2 | kmeans++ | 32 | 71.462 | Full model (except DepthwiseConv2D layers) | 3.13 |
| MobileNetV2 | Linear | 256,256,32 | 72.31 | Selective, last 3 Conv2D layers | 7.00 |
| MobileNetV2 | kmeans++ | 256,256,32 | 72.29 | Selective, last 3 Conv2D layers | 7.14 |
| MobileNetV1 | Linear | 64 | 66.07 | Full model (except DepthwiseConv2D layers) | 2.98 |
| MobileNetV1 | kmeans++ | 64 | 70.53 | Full model (except DepthwiseConv2D layers) | 4.19 |
| MobileNetV1 | Linear | 256,256,32 | 70.62 | Selective, last 3 Conv2D layers | 8.42 |
| MobileNetV1 | kmeans++ | 256,256,32 | 70.792 | Selective, last 3 Conv2D layers | 8.91 |
| DS-CNN | Linear | 32 | 94.7 | Full model | 0.35 |
| DS-CNN | kmeans++ | 32 | 94.7 | Full model | 0.30 |

MobileNet-V2 Linear vs. Kmeans++ convergence comparison
[Figure: mbv2 convergence plot]

@googlebot added the "cla: yes" (PR contributor has signed CLA) label Jun 26, 2020
@alanchiao commented Jun 26, 2020

@SaoirseARM: for ease of comparison, could you also copy the prior linear initialization results from here into your PR description? I'm assuming that the results in the current index.md are with linear initialization (@arovir01).

@SaoirseARM (Contributor, Author) commented:
Thanks @alanchiao, I have updated the description above.
Please note that the MobileNetV1 results for the full model cannot be directly compared, as k-means++ was run with 32 clusters rather than 64. I can run this experiment as well if needed.

@alanchiao commented:

@akarmi, @arovir01

@arovir01 (Contributor) commented Jun 29, 2020

> MobileNetV1 results for the full model cannot be directly compared as k-means++ was run with 32 clusters rather than 64. I can run this experiment as well if needed.

I believe we should run the experiment with the same number of clusters to get comparable results. For consistency, it might be better to get the results for MobileNetV1 with 32 clusters using the linear initialization, instead of 64 with k-means++; that way, all the full-model clustering experiments will have 32 clusters across the board.

@Ruomei (Contributor) commented Jun 29, 2020

@SaoirseARM I suspect we did not use the same scripts for testing mobilenet_v1. Please let me know if you want to talk about this.
@akarmi Sorry that I really don't have enough time to check this change in detail just now. Will get back to it as soon as I can.

@Ruomei (Contributor) commented Jul 1, 2020

@akarmi We have talked and the tests are in sync now. As mentioned by Stanley yesterday, the speed of convergence and the memory-usage benefit are also worth checking for k-means++. Saoirse is talking to him, I believe.

@SaoirseARM (Contributor, Author) commented:
Hi, I have updated the description to reflect the updated test results and the speed-of-convergence graph. Thanks.

@akarmi self-requested a review July 20, 2020 13:47
@akarmi (Contributor) left a review comment:

Thank you. I think we should also add the relevant API documentation, e.g. to cluster_weights(). It looks good to me otherwise.

As discussed off-line, we are likely to hold off merging this change until after the initial release. In the meantime, @alanchiao, do you have any other comments on this?
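
As an illustrative sketch of the kind of API documentation being requested, the snippet below shows how the new option might be described in the cluster_weights() docstring. The wording, and the exact KMEANS_PLUS_PLUS name, are assumptions for illustration, not the final documentation.

```python
# Illustrative docstring excerpt for cluster_weights(); wording is assumed.
def cluster_weights(to_cluster, number_of_clusters, cluster_centroids_init, **kwargs):
    """Modifies a Keras layer or model so that its weights are clustered during training.

    Arguments:
      to_cluster: a single Keras layer, list of Keras layers, or a Keras model.
      number_of_clusters: the number of cluster centroids to form per layer.
      cluster_centroids_init: a CentroidInitialization value specifying how the
        cluster centroids are initialized: LINEAR, RANDOM, DENSITY_BASED, or
        (added by this PR, name assumed) KMEANS_PLUS_PLUS.
    """
    ...
```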

@SaoirseARM force-pushed the toupstream/cluster_kmeans_plus_plus branch from 704f9ab to 5e3cbf8 on July 20, 2020 14:56
@alanchiao commented:

Looks good!

@SaoirseARM force-pushed the toupstream/cluster_kmeans_plus_plus branch 2 times, most recently from 163dea8 to 4042968 on July 22, 2020 11:16
@SaoirseARM force-pushed the toupstream/cluster_kmeans_plus_plus branch from 4042968 to efe9af4 on July 22, 2020 11:27
@alanchiao commented Aug 7, 2020

Please go ahead and attach the "ready to pull" label. We'll try using that as the process going forward for when the reviewer(s) consider the PR ready to merge.

@alanchiao self-requested a review August 7, 2020 17:35
@akarmi added the "ready to pull" (Working to get PR submitted to internal repository, after which merging to GitHub happens) label Aug 12, 2020
@alanchiao commented:

@SaoirseARM: I've noticed that this change didn't come with an update to RELEASE.md (top-level directory) for release notes. I will merge now, but please send a PR to update them.
In the future, at least with the current process, anything significant in terms of features/bugs should go there, together with the relevant PRs.

@alanchiao commented:

It'd also be good to follow up with documentation changes. Keep in mind, though, that https://www.tensorflow.org/model_optimization/guide/clustering and other pages currently are not version specific, so the k-means++ functionality won't be available to readers unless they install from source.
