
Conversation

MatteoArm
Contributor

@MatteoArm MatteoArm commented Aug 18, 2020

The changes:

  • Implemented the zero-centroid initialization for all clustering methods
  • Implemented the sparsity masks for forward and backward propagation
  • Added preserve_sparsity class member to ClusterWeights to make
    sparsity preservation optional for all clustering methods
  • Refactored AbstractCentroidsInitialisation to include zero-centroid
    initialization for all init types
  • Added unit tests around the new changes

Description: this PR introduces a feature called Sparsity-Aware Clustering

In the model optimization workflow, clustering typically follows pruning. The clustering operation therefore needs to ensure that the sparsity introduced by pruning is not destroyed during fine-tuning.

To avoid sparsity degradation, we introduced two new operations:

  1. We insert sparsity-preserving nodes consisting of an element-wise multiplication of the weights with a binary sparsity mask, which ensures that zero weights stay at zero during re-training.
    The sparsity masks are initialized when the clustering wrapper is created (right after pruning) and kept constant during re-training.
    The masks are applied whenever the weights are updated, in both the forward and backward passes.
  2. Sparsity preservation also needs to be considered during cluster initialization. For that, we introduce a simple but effective centroid initialization scheme.
    First, one centroid is explicitly set to zero so that sparsity can be preserved during clustering. The remaining centroids are then allocated proportionally to the negative and positive weight ranges using the selected initialization method (linear, density-based, etc.). We refer to this technique as sparsity-aware centroid initialization.

The idea is that the zero-point centroid will be assigned to the weights that have been set to zero by pruning, while the sparsity masks will keep those zero weights constant throughout re-training.
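
The mask mechanism can be pictured with a minimal TensorFlow sketch (this is illustrative only, not the actual ClusterWeights code; the tensor values and loss are made up):

```python
import tensorflow as tf

# Pruned kernel of a toy layer: the zeros were introduced by pruning.
kernel = tf.Variable([[0.0, 1.2], [-0.7, 0.0]], dtype=tf.float32)

# Binary mask computed once, when the clustering wrapper is created,
# and kept constant afterwards: 1 where the weight is non-zero, 0 elsewhere.
sparsity_mask = tf.cast(tf.math.not_equal(kernel, 0.0), tf.float32)

with tf.GradientTape() as tape:
    # Element-wise multiplication keeps pruned weights at exactly zero
    # in the forward pass ...
    masked_kernel = kernel * sparsity_mask
    loss = tf.reduce_sum(tf.square(masked_kernel - 1.0))

# ... and, because the mask is a constant factor, the gradient flowing back
# to the pruned positions is zero as well, so they stay zero during re-training.
grads = tape.gradient(loss, [kernel])
print(grads[0].numpy())  # zeros at the pruned positions
```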

Implementation details:

A new boolean parameter, "preserve_sparsity", has been added to the clustering API to enable or disable sparsity preservation. Setting it to True enables both the new zero-centroid initialization and the application of the sparsity masks during forward and backward propagation.
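
A hedged usage sketch of the flag (as eventually merged, it is reached through the experimental entry point discussed further down; the toy model, cluster count, and init choice are placeholders):

```python
import tensorflow_model_optimization as tfmot
from tensorflow import keras

# The flag lives behind the experimental entry point (see the discussion
# below), so treat this call as illustrative rather than the stable API.
cluster_weights = tfmot.clustering.keras.experimental.cluster_weights
CentroidInitialization = tfmot.clustering.keras.CentroidInitialization

# Placeholder for a model that has already been pruned and stripped.
pruned_model = keras.Sequential([keras.layers.Dense(16, input_shape=(10,))])

clustered_model = cluster_weights(
    pruned_model,
    number_of_clusters=16,
    cluster_centroids_init=CentroidInitialization.LINEAR,
    preserve_sparsity=True)  # enables zero centroid + sparsity masks
```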

When the cluster wrapper is applied to a layer, a new sanity check is performed: if sparsity preservation is enabled, the minimum allowed number of clusters is 2 (instead of 1), so that there is at least one non-zero centroid in addition to the newly reserved zero centroid.
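
In pseudocode, the check amounts to something like this (the helper name and error message are illustrative, not the actual implementation):

```python
def _check_number_of_clusters(number_of_clusters, preserve_sparsity):
  # One centroid is reserved for zero when preserving sparsity, so at least
  # one more cluster is needed to represent the non-zero weights.
  minimum = 2 if preserve_sparsity else 1
  if number_of_clusters < minimum:
    raise ValueError(
        f"number_of_clusters must be at least {minimum}, "
        f"got {number_of_clusters}")
```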

If sparsity preservation is enabled, a different approach is followed to initialize the centroids: one centroid is always set to zero, and the remaining clusters are allocated proportionally between negative and positive values, depending on the layer's initial weight distribution.
For example, if the selected number of clusters is 32, the chosen initialization strategy is 'linear', and 40% of the weights are negative while the rest are positive, then 12 centroids will be linearly distributed between the minimum weight value and zero (excluded), followed by the zero centroid, and the remaining 19 centroids will be linearly distributed between zero (excluded) and the maximum weight value.
After the centroids have been initialized and the weights have been clustered, the sparsity masks are generated from the locations of the zero-valued clustered weights.
The sparsity masks are then stored in the wrapper to be used later during the re-training process.
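
The worked example above, together with the mask generation, can be reproduced with a small NumPy sketch (a simplified stand-in for the real initializer, assuming the linear strategy; all names and data are illustrative):

```python
import numpy as np

def sparsity_aware_linear_centroids(weights, number_of_clusters):
  """Zero centroid plus a proportional linear split of the remaining ones."""
  negative = weights[weights < 0]
  positive = weights[weights > 0]
  remaining = number_of_clusters - 1  # one centroid is reserved for zero
  num_negative = int(round(remaining * negative.size /
                           (negative.size + positive.size)))
  num_positive = remaining - num_negative
  # Linearly spaced centroids on each side, excluding zero itself.
  neg_centroids = np.linspace(weights.min(), 0.0, num_negative + 1)[:-1]
  pos_centroids = np.linspace(0.0, weights.max(), num_positive + 1)[1:]
  return np.concatenate([neg_centroids, [0.0], pos_centroids])

# Toy pruned layer: 50% zeros, and 40% of the non-zero weights are negative,
# so with 32 clusters we get 12 negative centroids, 1 zero and 19 positive.
rng = np.random.default_rng(42)
weights = np.concatenate([np.zeros(5000),
                          rng.uniform(-1.0, -1e-3, 2000),
                          rng.uniform(1e-3, 1.0, 3000)])
centroids = sparsity_aware_linear_centroids(weights, 32)

# Nearest-centroid assignment; the mask is then derived from the zero-valued
# clustered weights (small non-zero weights may also fall into the zero
# cluster, which can slightly increase the measured sparsity).
clustered = centroids[np.argmin(np.abs(weights[:, None] - centroids), axis=1)]
sparsity_mask = (clustered != 0.0).astype(np.float32)
```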

Experimental results

MobileNet v2: 50% pruning + 32 clusters (all but depthwise conv2d + linear centroid init + NO sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 72.36% | +0.00 | 13.99 MB | 12.99 MB |
| TFLite pruned float | 66.37% | -5.99 | 13.99 MB | 7.93 MB |
| TFLite pruned, clustered float | 68.30% | -4.06 | 13.99 MB | 2.65 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_0 50.0 0.0 27
CONV_2D_2 50.0 0.0 25
CONV_2D_3 50.0 0.0 25
CONV_2D_5 50.0 0.0 32
CONV_2D_6 50.0 0.0 29
CONV_2D_8 50.0 0.0 30
CONV_2D_10 50.0 0.0 26
CONV_2D_12 50.0 0.0 30
CONV_2D_13 50.0 0.0 28
CONV_2D_15 50.0 0.0 27
CONV_2D_17 50.0 0.0 32
CONV_2D_19 49.99 0.02 29
CONV_2D_21 50.0 0.0 31
CONV_2D_23 50.0 0.0 28
CONV_2D_24 49.99 0.01 31
CONV_2D_26 49.99 0.03 31
CONV_2D_28 49.99 0.02 30
CONV_2D_30 50.0 0.01 27
CONV_2D_32 50.0 0.01 28
CONV_2D_34 49.99 0.03 28
CONV_2D_36 49.99 0.02 28
CONV_2D_38 49.98 0.03 30
CONV_2D_39 49.98 0.04 27
CONV_2D_41 49.98 0.05 32
CONV_2D_43 49.99 0.02 31
CONV_2D_45 49.97 0.05 31
CONV_2D_47 49.98 0.04 27
CONV_2D_49 49.97 0.06 32
CONV_2D_50 49.95 0.11 31
CONV_2D_52 49.95 0.1 25
CONV_2D_54 49.95 0.1 26
CONV_2D_56 49.95 0.1 30
CONV_2D_58 49.95 0.1 31
CONV_2D_60 49.91 0.19 29
CONV_2D_61 49.86 0.28 27
CONV_2D_63 49.53 0.94 32

Notes: Notice how the sparsity is destroyed by clustering

MobileNet v2: 50% pruning + 32 clusters (all but depthwise conv2d + linear centroid init + sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 72.36% | +0.00 | 13.99 MB | 12.99 MB |
| TFLite pruned float | 66.37% | -5.99 | 13.99 MB | 7.93 MB |
| TFLite pruned, clustered float | 64.92% | -7.44 | 13.99 MB | 2.32 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_0 50.0 50.0 27
CONV_2D_2 50.0 50.0 26
CONV_2D_3 50.0 50.0 25
CONV_2D_5 50.0 50.0 32
CONV_2D_6 50.0 50.0 29
CONV_2D_8 50.0 50.0 30
CONV_2D_10 50.0 50.0 26
CONV_2D_12 50.0 50.0 30
CONV_2D_13 50.0 50.0 29
CONV_2D_15 50.0 50.0 27
CONV_2D_17 50.0 50.0 32
CONV_2D_19 49.99 49.99 30
CONV_2D_21 50.0 50.0 32
CONV_2D_23 50.0 50.0 29
CONV_2D_24 49.99 49.99 31
CONV_2D_26 49.99 49.99 32
CONV_2D_28 49.99 49.99 31
CONV_2D_30 50.0 50.0 28
CONV_2D_32 50.0 50.0 29
CONV_2D_34 49.99 49.99 27
CONV_2D_36 49.99 49.99 27
CONV_2D_38 49.98 49.98 30
CONV_2D_39 49.98 49.98 28
CONV_2D_41 49.98 49.98 32
CONV_2D_43 49.99 49.99 31
CONV_2D_45 49.97 49.97 31
CONV_2D_47 49.98 49.98 28
CONV_2D_49 49.97 49.97 32
CONV_2D_50 49.95 49.95 31
CONV_2D_52 49.95 49.95 25
CONV_2D_54 49.95 49.95 27
CONV_2D_56 49.95 49.95 30
CONV_2D_58 49.95 49.95 31
CONV_2D_60 49.91 49.91 29
CONV_2D_61 49.86 49.86 27
CONV_2D_63 49.53 49.53 32

Notes: Unlike the previous case, notice how the sparsity is preserved during clustering while keeping the same number of clusters

MobileNet v2: 50% pruning + 32 clusters (all but depthwise conv2d + kmeans++ centroid init (#443) + NO sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 72.36% | +0.00 | 13.99 MB | 12.99 MB |
| TFLite pruned float | 66.37% | -5.99 | 13.99 MB | 7.93 MB |
| TFLite pruned, clustered float | 66.56% | -5.80 | 13.99 MB | 2.69 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_0 50.0 0.0 32
CONV_2D_2 50.0 0.0 32
CONV_2D_3 50.0 0.0 32
CONV_2D_5 50.0 0.0 32
CONV_2D_6 50.0 0.0 32
CONV_2D_8 50.0 0.0 32
CONV_2D_10 50.0 0.0 32
CONV_2D_12 50.0 0.0 32
CONV_2D_13 50.0 0.0 32
CONV_2D_15 50.0 0.0 32
CONV_2D_17 50.0 0.0 32
CONV_2D_19 49.99 0.02 32
CONV_2D_21 50.0 0.0 32
CONV_2D_23 50.0 0.0 32
CONV_2D_24 49.99 0.01 32
CONV_2D_26 49.99 0.03 32
CONV_2D_28 49.99 0.02 32
CONV_2D_30 50.0 0.01 32
CONV_2D_32 50.0 0.01 32
CONV_2D_34 49.99 0.03 32
CONV_2D_36 49.99 0.02 32
CONV_2D_38 49.98 0.03 32
CONV_2D_39 49.98 0.04 32
CONV_2D_41 49.98 0.05 32
CONV_2D_43 49.99 0.02 32
CONV_2D_45 49.97 0.05 32
CONV_2D_47 49.98 0.04 32
CONV_2D_49 49.97 0.06 32
CONV_2D_50 49.95 0.11 32
CONV_2D_52 49.95 0.1 32
CONV_2D_54 49.95 0.1 32
CONV_2D_56 49.95 0.1 32
CONV_2D_58 49.95 0.1 32
CONV_2D_60 49.91 0.19 32
CONV_2D_61 49.86 0.28 32
CONV_2D_63 49.53 0.94 32

Notes: Notice how the sparsity is destroyed by clustering

MobileNet v2: 50% pruning + 32 clusters (all but depthwise conv2d + kmeans++ centroid init (#443) + sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 72.36% | +0.00 | 13.99 MB | 12.99 MB |
| TFLite pruned float | 66.37% | -5.99 | 13.99 MB | 7.93 MB |
| TFLite pruned, clustered float | 65.91% | -6.45 | 13.99 MB | 2.71 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_0 50.0 55.09 32
CONV_2D_2 50.0 54.69 32
CONV_2D_3 50.0 50.0 32
CONV_2D_5 50.0 50.0 32
CONV_2D_6 50.0 50.0 32
CONV_2D_8 50.0 50.0 32
CONV_2D_10 50.0 50.03 32
CONV_2D_12 50.0 50.0 32
CONV_2D_13 50.0 50.0 32
CONV_2D_15 50.0 50.0 32
CONV_2D_17 50.0 50.0 32
CONV_2D_19 49.99 49.99 32
CONV_2D_21 50.0 50.07 32
CONV_2D_23 50.0 50.0 32
CONV_2D_24 49.99 49.99 32
CONV_2D_26 49.99 49.99 32
CONV_2D_28 49.99 49.99 32
CONV_2D_30 50.0 50.0 32
CONV_2D_32 50.0 50.01 32
CONV_2D_34 49.99 49.98 32
CONV_2D_36 49.99 49.98 32
CONV_2D_38 49.98 49.98 32
CONV_2D_39 49.98 49.98 32
CONV_2D_41 49.98 49.98 32
CONV_2D_43 49.99 49.99 32
CONV_2D_45 49.97 49.97 32
CONV_2D_47 49.98 49.98 32
CONV_2D_49 49.97 49.97 32
CONV_2D_50 49.95 49.95 32
CONV_2D_52 49.95 49.95 32
CONV_2D_54 49.95 49.95 32
CONV_2D_56 49.95 49.95 32
CONV_2D_58 49.95 49.95 32
CONV_2D_60 49.91 49.91 32
CONV_2D_61 49.86 49.86 32
CONV_2D_63 49.53 49.53 32

Notes: Unlike the previous case, notice how the sparsity is preserved during clustering while keeping the same number of clusters

MobileNet v1: 50% pruning + 64 clusters (all but depthwise conv2d + linear centroid init + sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 70.77% | +0.00 | 16.12 MB | 15.70 MB |
| TFLite pruned float | 66.80% | -3.97 | 16.12 MB | 9.10 MB |
| TFLite pruned, clustered float | 64.95% | -5.82 | 16.12 MB | 2.63 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_0 50.0 50.0 50
CONV_2D_2 50.0 50.0 41
CONV_2D_4 50.0 50.0 45
CONV_2D_5 50.0 50.0 44
CONV_2D_8 49.99 49.99 52
CONV_2D_10 49.98 49.98 56
CONV_2D_12 49.97 49.97 50
CONV_2D_14 49.93 49.93 56
CONV_2D_16 49.91 49.91 61
CONV_2D_18 49.93 49.93 44
CONV_2D_20 49.92 49.92 42
CONV_2D_22 49.92 49.92 53
CONV_2D_24 49.84 49.84 49
CONV_2D_26 49.66 49.66 55
CONV_2D_28 49.65 49.65 59

MobileNet v1: 50% pruning + 64 clusters (all but depthwise conv2d + kmeans++ centroid init (#443) + sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 70.77% | +0.00 | 16.91 MB | 15.70 MB |
| TFLite pruned float | 66.80% | -3.97 | 16.12 MB | 9.10 MB |
| TFLite pruned, clustered float | 67.07% | -3.70 | 16.12 MB | 3.35 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_0 50.0 50.0 64
CONV_2D_2 50.0 51.56 64
CONV_2D_4 50.0 50.0 64
CONV_2D_5 50.0 50.0 64
CONV_2D_8 49.99 49.99 64
CONV_2D_10 49.98 49.98 64
CONV_2D_12 49.97 49.97 64
CONV_2D_14 49.93 49.93 64
CONV_2D_16 49.91 49.91 64
CONV_2D_18 49.93 49.93 64
CONV_2D_20 49.92 49.92 64
CONV_2D_22 49.92 49.92 64
CONV_2D_24 49.84 49.84 64
CONV_2D_26 49.66 49.66 64
CONV_2D_28 49.65 49.65 64

Notes: Notice how the sparsity is preserved during clustering

DS-CNN-L: 50% pruning + 32 clusters (linear centroid init + sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 95.03% | +0.00 | 1.57 MB | 1.50 MB |
| TFLite pruned float | 94.55% | -0.48 | 1.57 MB | 0.90 MB |
| TFLite pruned, clustered float | 94.62% | -0.41 | 1.57 MB | 0.29 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_1 50.0 50.0 29
CONV_2D_3 49.97 49.97 31
CONV_2D_5 49.98 49.98 32
CONV_2D_7 49.98 49.98 31
CONV_2D_9 49.96 49.96 32
CONV_2D_11 49.97 49.97 29

DS-CNN-L: 50% pruning + 32 clusters (kmeans++ centroid init (#443) + sparsity-aware clustering)

Accuracy and size results:

| Optimization | Accuracy | Delta | Unzipped | Zipped |
| --- | --- | --- | --- | --- |
| TFLite float | 95.03% | +0.00 | 1.57 MB | 1.50 MB |
| TFLite pruned float | 94.55% | -0.48 | 1.57 MB | 0.90 MB |
| TFLite pruned, clustered float | 94.35% | -0.68 | 1.57 MB | 0.33 MB |

Sparsity and clustering results:

Layer name Sparsity after pruning Sparsity after clustering Number of unique clusters
CONV_2D_1 50.0 50.0 32
CONV_2D_3 49.97 49.97 32
CONV_2D_5 49.98 49.98 32
CONV_2D_7 49.98 49.98 32
CONV_2D_9 49.96 49.96 32
CONV_2D_11 49.97 49.97 32

Notes: Notice how the sparsity is preserved during clustering

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

1 similar comment from @googlebot

@googlebot googlebot added the cla: no PR contributor has not signed CLA label Aug 18, 2020
@github-actions github-actions bot added the technique:clustering Regarding tfmot.clustering.keras APIs and docs label Aug 18, 2020
@MatteoArm
Contributor Author

@googlebot I signed it!

@akarmi
Contributor

akarmi commented Aug 18, 2020

@googlebot I signed it!

@alanchiao, could you please check the CLA bot again? Matteo has contributed to TF before so the check should be passing.

@MatteoArm
Contributor Author

Regarding the CLA, it seemed like there was an issue on our side. Now sorted

@MatteoArm
Contributor Author

@googlebot I signed it!

@alanchiao alanchiao added the api-review API review needed label Aug 20, 2020
@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes PR contributor has signed CLA and removed cla: no PR contributor has not signed CLA labels Aug 20, 2020
@alanchiao

@MatteoArm

It's clear that this PR does enable us to preserve sparsity, which is nice.

Related to #513:
However, I'm looking at the results and trying to see significant evidence that, by combining sparsity and clustering, we can achieve a better compression rate for the same accuracy (considering only the size benefits, not latency with specialized kernels) than by pruning aggressively on its own or clustering aggressively on its own. This isn't something I've seen strong evidence for in any paper yet, and it's not obvious here either.

As an example, for pruned + clustered MobileNet V2 with sparsity-awareness, we see a 7.44% accuracy drop to get to 2.32 MB, or a 6.45% drop to get to 2.71 MB. However, with clustering alone, there are results showing a ~3% accuracy drop at 2.98 MB and a <1% drop at 3.13 MB. It may be interesting to pick an approximate target model size, based on what would let you fit in a smaller memory, and compare all three.

Until there is evidence, I suggest creating an experimental API (e.g. experimental.cluster_weights) that we can encourage people to run experiments through and use to try to produce better-compressed models at the same accuracy. I imagine the results also don't factor in Ruomei's PR with the improved gradients.

@alanchiao

alanchiao commented Aug 26, 2020

To elaborate on the discussion in the call:

For the approach of creating something under clustering.experimental, it should only require a couple of lines of changes on top of what you have currently. The starting point is the nice property that the path of the public API is independent of where the code itself lives. The definition of tfmot.clustering.keras is here. For the hypothetical change of moving the existing cluster_scope API into experimental, we could just create another subdirectory at api/clustering/keras/experimental, move the cluster_scope import into an __init__.py file there, and change nothing outside of core/api.
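
For illustration, the experimental namespace could be wired up with an `__init__.py` along these lines, following the pattern later used for the experimental cluster_weights entry point (the exact paths and file contents are an assumption, not the merged file):

```python
# Sketch of
# tensorflow_model_optimization/python/core/api/clustering/keras/experimental/__init__.py
# (illustrative only; the real file may differ)
"""Module containing the experimental clustering API for Keras."""

from tensorflow_model_optimization.python.core.clustering.keras.experimental.cluster import cluster_weights
```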

With the actual code living outside of core/api, we can add a thin layer by renaming the current cluster_weights function to _cluster_weights with all of the flags, and then have both experimental.cluster_weights and cluster_weights just call into that. It's probably better to have experimental.cluster_weights live under python/core/clustering/keras/experimental as well, but you do have flexibility.
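
A minimal sketch of that thin layer, under the assumption that the real signatures carry more arguments than shown here:

```python
# python/core/clustering/keras/cluster.py (sketch)

def _cluster_weights(to_cluster, number_of_clusters, cluster_centroids_init,
                     preserve_sparsity, **kwargs):
  """Private implementation carrying all flags, including preserve_sparsity."""
  ...  # existing clustering logic

def cluster_weights(to_cluster, number_of_clusters, cluster_centroids_init,
                    **kwargs):
  """Stable API: behaves exactly as before, no sparsity preservation."""
  return _cluster_weights(to_cluster, number_of_clusters,
                          cluster_centroids_init,
                          preserve_sparsity=False, **kwargs)


# python/core/clustering/keras/experimental/cluster.py (sketch)
# from ...keras.cluster import _cluster_weights

def cluster_weights(to_cluster, number_of_clusters, cluster_centroids_init,
                    preserve_sparsity=False, **kwargs):
  """Experimental API: exposes the preserve_sparsity flag."""
  return _cluster_weights(to_cluster, number_of_clusters,
                          cluster_centroids_init,
                          preserve_sparsity=preserve_sparsity, **kwargs)
```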

This approach is very safe and equally maintainable imo. It could be that just adding something like experimental_preserve_sparsity=False to the current stable cluster_weights API suffices, but a separate function gives a better degree of separation, and I haven't fully thought through the implications of adding a flag yet. From the documentation perspective, it's also nice: people who just want to focus on clustering only see the stable API when they go to the API docs from the "Clustering" tutorials.

Independently, we can publicize this in a page under tensorflow.org/model_optimization focused on combining techniques, with a link to a Github issue where we encourage people to try it out / share results and model files and see what the response is.

@akarmi
Contributor

akarmi commented Aug 27, 2020

Thanks Alan. Agree. We will enable this feature via an experimental API and reach out to the community to have a go at combining two optimization techniques. In the meantime, we will also continue such experiments ourselves.

@MatteoArm MatteoArm force-pushed the feature/sparsity_aware_clustering branch 2 times, most recently from 996c0a9 to 5aa8748 on August 28, 2020 10:54
@MatteoArm
Contributor Author

I've updated the changes to make the new feature experimental, as per the suggestions.
The relevant code is now a single method in experimental/cluster.py that calls a generic private method _cluster_weights (not part of the API) in the original cluster.py file.
To keep things working for users not interested in sparsity-aware clustering, the public API method calls the private _cluster_weights with the new 'preserve_sparsity' argument forced to 'False'.
The API init files have been updated accordingly.
One last remark: I kept the cluster_scope() method where it was (in cluster.py) instead of moving it to experimental/cluster.py as suggested, since I'm not quite sure that's the proper thing to do. It makes more sense to me for cluster_scope() to be consumed from cluster.py rather than from an experimental file (but maybe I misunderstood the suggestion).

def cluster_weights(to_cluster,
                    number_of_clusters,
                    cluster_centroids_init,
                    preserve_sparsity=False,
Contributor

I'm wondering: since preserve_sparsity is available through experimental.cluster, shouldn't preserve_sparsity be fixed inside this function rather than exposed as a function argument?

Contributor Author

That's actually a mistake! In fact, we don't have it in our internal master. I've now removed preserve_sparsity from the method's signature.

@MatteoArm MatteoArm force-pushed the feature/sparsity_aware_clustering branch from 5aa8748 to 3236ce4 on September 8, 2020 08:28
Contributor

@akarmi akarmi left a comment

Looks good to me. Just a minor thing to consider.

def testSparsityIsPreservedDuringTraining(self):
  """Verifies that training a clustered model does not destroy the sparsity of the weights."""
  original_model = keras.Sequential([
      layers.Dense(5, input_shape=(5,)),
Contributor

Are we guaranteed to have zeros in this model? It may be beneficial to make the initial weights deterministic in this test, with a known number and location of zeros.
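
One possible way to do what is suggested here, as a hedged sketch (the layer shape follows the test snippet above; the values and the zero pattern are arbitrary):

```python
import numpy as np
from tensorflow import keras

# Same toy layer as in the test, but with its kernel overwritten by
# deterministic values containing a known pattern of zeros.
layer = keras.layers.Dense(5, input_shape=(5,))
model = keras.Sequential([layer])

kernel = np.arange(25, dtype=np.float32).reshape(5, 5)
kernel[::2, 0] = 0.0          # zeros at known positions (rows 0, 2, 4)
bias = np.zeros(5, dtype=np.float32)
layer.set_weights([kernel, bias])

# The test could then assert that exactly these positions stay zero after
# clustering and re-training.
expected_zero_mask = (kernel == 0.0)
```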

Contributor Author

Fixed in the latest revision. I've set the random seed to a value that guarantees that some of the weights are zero for that test (there are now 5 zero weights on each test run).

Contributor

Thanks!

@akarmi akarmi added the ready to pull Working to get PR submitted to internal repository, after which merging to Github happens. label Sep 9, 2020
@MatteoArm MatteoArm force-pushed the feature/sparsity_aware_clustering branch from 63cb72e to 9936522 on September 23, 2020 11:14
@akarmi
Contributor

akarmi commented Sep 24, 2020

@nutsiepully, @alanchiao, can you please help progress this PR?

 * Implemented the zero-centroid initialization for all clustering methods
 * Implemented the sparsity masks for forward and backward propagation
 * Added preserve_sparsity class member to ClusterWeights to make
   sparsity preservation optional for all clustering methods
 * Refactored AbstractCentroidsInitialisation to include zero-centroid
   initialization for all init types
 * Added unit tests around the new changes
…rimental)

 * Created new experimental API for sparsity-aware clustering
 * Kept the original API implementation
 * Moved the new feature to a new experimental package, making the
   original implementation private
 * Updated the unit tests accordingly
 * Created init and BUILD files for the new experimental package
…rimental)

 * Fixed the signature of the public cluster_weights method
…rimental)

 * Set the random seed in the sparsity preservation test to a specific value to
   make sure that some of the weights are null
@MatteoArm MatteoArm force-pushed the feature/sparsity_aware_clustering branch from 9936522 to a12e20e on October 6, 2020 11:09
@akarmi
Contributor

akarmi commented Oct 7, 2020

@nutsiepully, any comments on this change?

@akarmi akarmi requested a review from nutsiepully October 7, 2020 08:50
@akarmi akarmi removed the ready to pull Working to get PR submitted to internal repository, after which merging to Github happens. label Nov 16, 2020
@akarmi
Contributor

akarmi commented Nov 16, 2020

@alanchiao, @nutsiepully, please help progress this change. We are happy with it on our side. The remaining bit is to align on how this API should be exposed. Following the comments above, we have exposed this API as experimental. If there are no further comments at the moment, it would be good to merge it and proceed as discussed.

@alanchiao

alanchiao commented Dec 1, 2020

Looks good to me, given the exposure as an experimental API (tfmot.clustering.keras.experimental.cluster_weights). The most important thing right now is to gauge community interest in this feature by explicitly soliciting for it in an issue, and by seeing whether any models produced become useful enough on the accuracy/size curve that they'd practically be used. If the interest isn't large enough, we can remove the feature given its experimental nature; if it is, we can finalize how it's exposed.

In other threads, @nutsiepully had suggested exposing features like this not as a part of the individual clustering / quantization APIs, but rather as a module targeted towards different forms of combining techniques.

Suppose we had this module (e.g. tfmot.cascade or tfmot.combine or tfmot.optimize); the main thing that's weird is that, for certain use cases, users may end up having to use APIs from both tfmot.thismodule and tfmot.clustering (e.g. some configuration class like the init enum).

@akarmi
Contributor

akarmi commented Dec 2, 2020

Thanks Alan.

In other threads, @nutsiepully had suggested exposing features like this not as a part of the individual clustering / quantization APIs, but rather as a module targeted towards different forms of combining techniques.

Suppose we had this module (e.g. tfmot.cascade or tfmot.combine or tfmot.optimize); the main thing that's weird is that, for certain use cases, users may end up having to use APIs from both tfmot.thismodule and tfmot.clustering (e.g. some configuration class like the init enum).

Yes, it would be good to put all techniques that attempt to combine others in one place. However, I agree that the "mixed" API exposure seems weird, and this particular change demonstrates it well. We will come back to this question a bit later; it would be good to come up with a more elegant solution, if possible.

@akarmi akarmi added the ready to pull Working to get PR submitted to internal repository, after which merging to Github happens. label Dec 2, 2020
@MatteoArm
Contributor Author

Hi @alanchiao,

There is an internal check that failed (the "feedback/copybara" one); it seems to be about the last commit I pushed, but I'm not sure what the problem is. Is it an internal Google procedure, or is there anything I can do to help fix it? Thanks

@MatteoArm
Contributor Author

Hi @nutsiepully,

I'm forwarding you the request I made to Alan some days ago, as he has now left the team.
Could you please review these changes and perhaps have a look at what's blocking this PR (the "feedback/copybara" step)? Thanks

@daverim daverim added ready to pull Working to get PR submitted to internal repository, after which merging to Github happens. and removed ready to pull Working to get PR submitted to internal repository, after which merging to Github happens. labels Jan 5, 2021
@MatteoArm
Contributor Author

Hi @daverim,

Just a friendly nudge.

These changes have been sitting here untouched for quite a while now, do you think they need any more work or can they be merged?

To recap: the idea was to keep this feature aside, in an "experimental" sub-directory (as it is now), for users to try out. If the feedback is positive, we can then consider adding it to the main API.

Thanks and Regards
Matteo

@copybara-service copybara-service bot merged commit e4a5200 into tensorflow:master Jan 13, 2021