
Conversation

@Ruomei (Contributor) commented Mar 31, 2020

This PR adds the Jupyter notebook for clustering.

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.



ℹ️ Googlers: Go here for more info.

1 similar comment from @googlebot

googlebot added the "cla: no" label (PR contributor has not signed CLA) on Mar 31, 2020
@review-notebook-app

Check out this pull request on ReviewNB.

You'll be able to see the Jupyter notebook diff and discuss changes. Powered by ReviewNB.

@Ruomei (Contributor Author) commented Mar 31, 2020

@googlebot I signed it!


@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

1 similar comment from @googlebot

googlebot added the "cla: yes" label (PR contributor has signed CLA) and removed the "cla: no" label (PR contributor has not signed CLA) on Mar 31, 2020
@TamasArm (Contributor) commented on the notebook diff, Jun 18, 2020:


why is apply_clustering_to_first named as such?



Contributor Author (@Ruomei):

How should I improve it? The updated version now uses the name apply_clustering_to_dense.

Contributor (@TamasArm):

The new apply_clustering_to_dense name seems good to me!

Contributor Author (@Ruomei):

Sorry, I am confused. What was your thought on apply_clustering_to_first?

Contributor (@TamasArm):

I just didn't understand what "first" stood for in the name: the function seems to apply clustering to a layer if it's in the list of layers to be clustered and leave it unchanged otherwise, so it wasn't clear what "first" referred to.

@Ruomei (Contributor Author) commented Jul 6, 2020

@alanchiao @TamasArm @akarmi Nearly all of the review comments are addressed. There are at least two TODOs: 1) find a version that I can "pip install" which includes the newly merged PRs; 2) change the way cluster_weights is imported once the clustering APIs are exposed to the public.
Thanks a lot for reviewing and please let me know if there is anything else.

@alanchiao commented on the notebook diff, Jul 13, 2020:

Nit: "tips" -> "use cases"

Not all of these are tips - some are just "how to"s, and the emphasis is still on use cases: do they care about deployment (might not - they could just be doing some training experiments), do they want to checkpoint (might not for quick experiments), etc.


Another comment on the notebook diff:

"The optimal number of clusters per layer can be found via hyperparameter tuning."

Don't think this tip is useful, since it seems like a given: all parameters are things to tune in ML.

The rest are a bit more specific to compression.


Another comment on the notebook diff:

determine if you should use it ->

determine if you should use it (including what's supported)

since some people may already assume they want to use weight clustering for its compression benefits but have forgotten to first consider what's supported (e.g. custom Keras layers).


@alanchiao commented on the notebook diff, Jul 13, 2020:

Please see how the pruning example doesn't use tf.nn.softmax on the last layer directly, and instead uses SparseCategoricalCrossentropy with from_logits.

The reasoning is that it's not numerically stable to compute log(softmax(..)) directly, as described in places such as https://blog.feedly.com/tricks-of-the-trade-logsumexp/. The log part is in the `sparse_categorical_crossentropy` loss, which operates on the softmax from tf.nn.softmax.

SparseCategoricalCrossentropy(from_logits=True) can use the numerically stable code. In the current code, with softmax and log in separate functions, it's hard for the framework to always use the numerically stable path (by fusing them and replacing them with a stable version).
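For illustration, a minimal sketch of the suggested pattern (the model architecture here is an assumption, not the notebook's exact code): keep the final layer as raw logits and let the loss apply the softmax internally.

```python
import tensorflow as tf

# Last layer outputs raw logits; no tf.nn.softmax applied here.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10)
])

# from_logits=True lets Keras compute the fused, numerically stable
# log-softmax instead of log(softmax(...)) in two separate steps.
model.compile(
    optimizer='adam',
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'])
```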

Contributor Author (@Ruomei):

Thanks! This is interesting.

By overcoming numerical instability, do you mean we need to stabilize the softmax function against underflow and overflow? If yes, I agree that we need to use the TensorFlow implementation.

Side questions:

1) If we use SparseCategoricalCrossentropy with from_logits=False combined with a softmax last layer, we end up with this clipping in Keras, which bounds the output probability to [1e-7, 1 - 1e-7]. However, will this clipping work if the input values are already NaNs? I'd assume not. But if it does, is the bound what keeps the output of log(softmax(...)) stable?

2) In general, what use cases is the Keras clipping suited for?
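For reference, the clipping being discussed is roughly equivalent to the following (a hedged paraphrase of the Keras backend behavior, not the actual source):

```python
import tensorflow as tf

def clipped_sparse_crossentropy(probs, labels, epsilon=1e-7):
    # Keras clips the softmax output into [epsilon, 1 - epsilon] before
    # taking the log, so log(0) can never occur for finite inputs.
    probs = tf.clip_by_value(probs, epsilon, 1.0 - epsilon)
    picked = tf.gather(probs, labels, batch_dims=1)  # prob of the true class
    return -tf.math.log(picked)
```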

Member (@MarkDaoust):

re 1): The probability calculation never generates nans. But it can saturate to 0.0 or 1.0, which will generate nans in the loss calculation, so clipping prevents the nans from being generated.

Also note that in graph mode Keras can bypass the softmax and execute the from_logits=True path, so the results can be different depending on how you're using the model. That's bad.

re 2): None. It keeps beginners out of trouble, but for any serious application you shouldn't use it. Any example that hits the saturation gets a gradient of zero => is ignored by the training procedure. For a small number of classes this will be all your worst-classified examples (which would be an important signal). For a large number of classes this could be a significant fraction of your data.

If you want to export a model that outputs probabilities, add a softmax layer immediately before exporting.
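A sketch of that export pattern (hedged: `model` stands for the trained logits model, and the save path is illustrative):

```python
import tensorflow as tf

# Attach a softmax head only at export time; training keeps using the
# numerically stable from_logits=True path.
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
probability_model.save('/tmp/probability_model')  # path is illustrative
```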

Contributor Author (@Ruomei):

> The probability calculation never generates nans

What is the probability here?

NaNs can be generated by the softmax function (when the denominator of the softmax becomes 0), and that output is the input for the Keras clipping, AFAIK.

> Also note that in graph mode Keras can bypass the softmax and execute the from_logits=True path,

Did not know that. Will look for the code at some point. Thanks!

> re 2):

Agreed. Thanks!

Member (@MarkDaoust):

> What is the probability here?

The softmax output.

> denominator of the softmax becomes 0

How do you make the denominator 0? It's the sum of exponentials. Sure, some of those exponentials could underflow, but any good implementation subtracts the largest logit from all the logits (logits = logits - max(logits)), so it's impossible for everything to underflow... unless all your logits are -inf? If you have infs or nans in your logits, then you have a different problem.

Right?
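A small sketch of the max-subtraction trick being described (illustrative NumPy, not TensorFlow's actual implementation):

```python
import numpy as np

def stable_softmax(logits, axis=-1):
    # Shifting by the max makes the largest exponent exp(0) = 1, so the
    # denominator is always >= 1 and can never underflow to 0; the shift
    # cancels out mathematically, leaving the softmax value unchanged.
    shifted = logits - np.max(logits, axis=axis, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=axis, keepdims=True)
```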

@Ruomei (Contributor Author) replied Jul 22, 2020:

Thanks!

> How do you make the denominator 0?

I was thinking that when x is very, very negative (not sure whether it has to be -inf), exp(x) will underflow.

> Sure, some of those exponentials could underflow, but any good implementation subtracts the largest logit from all the logits

Yes, indeed. I did not know implementations already do this. Will check the details myself.

@alanchiao

Left a few remaining comments. Looks good to me otherwise and we can merge this as an initial version.
A future PR can update the imports once we have some kind of pip release.

Added @MarkDaoust as a final reviewer. I'm adding the ready to pull label so I can more easily test run this Colab, but will wait until everything is addressed first. Once it is, please squash the commits.

FYI @lamberta.

@alanchiao requested a review from MarkDaoust on July 13, 2020 16:51
@alanchiao added the "ready to pull" label (Working to get PR submitted to internal repository, after which merging to GitHub happens) on Jul 13, 2020
@alanchiao

I ran the e2e tutorial on TF 2.2 and saw the following error message. @arovir01, please address.

[Screenshot: error message from the e2e tutorial run on TF 2.2, Jul 13, 2020]

@Ruomei (Contributor Author) commented Jul 14, 2020

Hi both,

> I ran the e2e tutorial on TF 2.2 and saw the following error message. @arovir01, please address.

Did you use the dev package or the ! pip install -q tensorflow-model-optimization at the beginning of this notebook?

I tried it locally with the top commit in tfmot (939bed8) with this end-to-end example, and it works.

When I tried it in Colab, it failed because of the package installed at the beginning of the file: the latest tfmot master includes the fix for the function strip_clustering(), while the released tfmot package does not have it, AFAIK.

If we decide to use the latest tfmot dev package in this notebook, there is also the change for 'cluster_centroids_init': cluster_config.CentroidInitialization.LINEAR, which requires the import: from tensorflow_model_optimization.python.core.clustering.keras import cluster_config. I will need to update that.
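For concreteness, a sketch of the import and config change being described (assuming the dev-build import paths mentioned in this thread; `model` is an existing Keras model, and the cluster count is a hypothetical placeholder):

```python
from tensorflow_model_optimization.python.core.clustering.keras import cluster
from tensorflow_model_optimization.python.core.clustering.keras import cluster_config

clustering_params = {
    'number_of_clusters': 16,  # hypothetical value, for illustration only
    'cluster_centroids_init': cluster_config.CentroidInitialization.LINEAR,
}

# Wrap an existing Keras model so its weights are clustered during fine-tuning.
clustered_model = cluster.cluster_weights(model, **clustering_params)
```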

@alanchiao:

> Did you use the dev package or the ! pip install -q tensorflow-model-optimization at the beginning of this notebook?

Ah, my mistake - I ran it with the latest TFMOT release instead of a more recent commit. Looks good to me then.

@MarkDaoust (Member) commented on the notebook diff, Jul 20, 2020:

You've got a TODO here, I guess because of the internal-looking import?


Contributor Author (@Ruomei):

Yes.

@MarkDaoust (Member) commented on the notebook diff, Jul 20, 2020:

Reading the available docs, it's not clear what's happening inside, so I'm really left guessing.

I assume:

  • The clustering tools replace weights with a smaller weight-cluster array, plus indices into that array (can't that be gzipped to show the size difference directly?)
  • strip_clustering then resets to the original style of direct-weight array, but using the clustered weights?

Either I don't understand how this all works, or why strip_clustering is necessary to show the size difference here.

So maybe it could use a little more explanation.


Reply:

Good point - I'll make the equivalent change on the pruning side.

I think something along the lines of "strip_clustering removes any tf.Variable that clustering only needs during training, which would otherwise add to model size during serving" would work. This avoids any references to implementation details (e.g. the wrapper).
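A rough sketch of the flow that explanation describes (hedged: `clustered_model` is assumed to come from cluster_weights as above, and the file handling is illustrative):

```python
import gzip
import os
import tempfile

from tensorflow_model_optimization.python.core.clustering.keras import cluster

# Remove the training-only variables (e.g. the clustering wrapper state),
# leaving ordinary weight tensors whose values are clustered.
final_model = cluster.strip_clustering(clustered_model)

def gzipped_size(model):
    # Clustered weights contain few unique values, so gzip compresses the
    # stripped model far better than the original dense model.
    _, keras_file = tempfile.mkstemp('.h5')
    model.save(keras_file, include_optimizer=False)
    with open(keras_file, 'rb') as src, gzip.open(keras_file + '.gz', 'wb') as dst:
        dst.write(src.read())
    return os.path.getsize(keras_file + '.gz')

print('Gzipped size after strip_clustering:', gzipped_size(final_model))
```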

@MarkDaoust (Member) commented on the notebook diff, Jul 20, 2020:

Here I'm getting an error in Colab:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-20-01035994e382> in <module>()
      1 clustered_tflite_file = '/tmp/clustered_mnist.tflite'
      2 converter = tf.lite.TFLiteConverter.from_keras_model(final_model)
----> 3 tflite_clustered_model = converter.convert()
      4 with open(clustered_tflite_file, 'wb') as f:
      5   f.write(tflite_clustered_model)

3 frames
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/convert_to_constants.py in _get_tensor_data(func)
    215         data = map_index_to_variable[idx].numpy()
    216       else:
--> 217         data = val_tensor.numpy()
    218       tensor_data[tensor_name] = {
    219           "data": data,

AttributeError: 'Tensor' object has no attribute 'numpy'



Reply:

@Ruomei: for testing, you can install the new package with:

pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple tensorflow-model-optimization==0.4.0.dev2

@MarkDaoust (Member)

Also, don't forget to add this to the _book.yaml, or it won't be correctly visible on the site.

You could also add the overview (https://www.tensorflow.org/model_optimization/guide/clustering) at the same time.

@Ruomei (Contributor Author) commented Jul 21, 2020

All done, thanks @alanchiao @MarkDaoust
I will squash the commits once everything looks okay.

@Ruomei (Contributor Author) commented Jul 21, 2020

> Also, don't forget to add this to the _book.yaml, or it won't be correctly visible on the site.
>
> You could also add the overview (https://www.tensorflow.org/model_optimization/guide/clustering) at the same time.

Thanks. @arovir01 was addressing this today afaik.

@alanchiao commented Jul 21, 2020

> All done, thanks @alanchiao @MarkDaoust
> I will squash the commits once everything looks okay.

Looks good to me Ruomei. Thanks! Once Mark finishes the review with you and you've squashed it, I'll merge it.

@Ruomei force-pushed the toupstream/clustering_jupyter_notebook branch from 19e9550 to e616f48 on July 22, 2020 12:56
@Ruomei (Contributor Author) commented Jul 22, 2020

> Once Mark finishes the review

@alanchiao @MarkDaoust Somehow I did not see this line. The commits are now squashed, but it is fine - I can do it again if necessary.

@MarkDaoust (Member) left a review comment:

Thanks, I think all my concerns are addressed.


@alanchiao added and removed the "ready to pull" label (Working to get PR submitted to internal repository, after which merging to GitHub happens) on Jul 22, 2020
copybara-service bot merged commit 9b12529 into tensorflow:master on Jul 23, 2020