
Conversation

@benkli01 (Contributor) commented Oct 1, 2020

Refactor the clustering example for better readability and overview. These changes were originally meant for #508, which had to be abandoned because of technical issues.

@googlebot added the cla: yes (PR contributor has signed CLA) label Oct 1, 2020
@github-actions bot added the technique:clustering (Regarding tfmot.clustering.keras APIs and docs) label Oct 1, 2020
# Train model
model = train_model(model, x_train, y_train, x_test, y_test)
# Cluster and fine-tune model
clustered_model = cluster_model(model, x_train, y_train, x_test, y_test)
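
For readers following along, a minimal sketch of what a cluster_model helper of this shape might contain is shown below, using the public tfmot.clustering.keras API. The cluster count, optimizer settings, and epoch count are illustrative assumptions, not values taken from this PR.

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def cluster_model(model, x_train, y_train, x_test, y_test):
    # Wrap the trained model so each layer's weights are constrained to a
    # small set of shared centroid values.
    clustering_params = {
        'number_of_clusters': 16,  # illustrative choice, not from the PR
        'cluster_centroids_init':
            tfmot.clustering.keras.CentroidInitialization.DENSITY_BASED,
    }
    clustered = tfmot.clustering.keras.cluster_weights(model, **clustering_params)

    # Fine-tune with a low learning rate so the centroids adapt without
    # destroying the pre-trained accuracy.
    clustered.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
        loss='sparse_categorical_crossentropy',
        metrics=['accuracy'])
    clustered.fit(x_train, y_train, epochs=1,
                  validation_data=(x_test, y_test))
    return clustered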
@Ruomei (Contributor) commented Oct 1, 2020

More of a question than a comment:
Have you tried applying the clustering wrapper to the inference graph rather than the training graph of a small example like mnist? If so, what does the training-loss curve look like?

@benkli01 (Contributor, Author) replied:

I'm not sure what you mean by "inference graph" and "training graph". Could you explain this a bit further?

@Ruomei (Contributor) replied:

Hi, yes. An inference graph usually refers to a training graph after modifications targeted at inference. For instance, you can take a look at a legacy TensorFlow tool that documents some typical graph transforms (e.g. fold_batch_norms, merge_duplicate_nodes, remove_control_dependencies). I will write a summary on the internal site soon.

In our application context, I was just curious whether clustering on a toy model behaves differently with an inference graph than with a training graph, similar to what we see with a realistic model. If so, it could help future debugging of upcoming features, together with all the nice visualization you have done. We can check this later; it is not essential for this PR.
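
For context, the legacy transform tool referred to above is typically driven roughly as sketched below. This is a hedged illustration of the TF 1.x-era graph_transforms Python wrapper; the file path and input/output node names are placeholders, and the module may not be available in recent TensorFlow releases.

import tensorflow as tf
# Legacy TF 1.x graph transform tool; not part of the TF 2.x public API.
from tensorflow.tools.graph_transforms import TransformGraph

# Load a frozen GraphDef of the trained model (path is a placeholder).
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile('frozen_model.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())

# Apply the inference-oriented transforms mentioned above.
inference_graph_def = TransformGraph(
    graph_def,
    ['input'],    # placeholder input node name
    ['output'],   # placeholder output node name
    ['fold_batch_norms',
     'merge_duplicate_nodes',
     'remove_control_dependencies'])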

@akarmi (Contributor) commented Oct 2, 2020

Just a status note: while testing this, I discovered that clustering_callbacks.py is not included in the pip package. I will review this again after the fix for that problem is available.

@akarmi self-requested a review October 2, 2020 08:40
@wwwind (Contributor) commented Oct 2, 2020

This example doesn't demonstrate the main benefit of clustering: that the model size is reduced.

@benkli01 (Contributor, Author) commented Oct 2, 2020

> This example doesn't demonstrate the main benefit of clustering: that the model size is reduced.

Unless the model is extremely small, clustering should still be beneficial.

I just ran a small test with the example. It looks like we still get a decent compression ratio despite the reduced model size. Here are some model sizes, in bytes, at different stages of the example (note: clustered_model is the unstripped clustered model):

362832 clustered_model.h5
247566 clustered_model.zip
271400 model.h5
233869 model.zip
98144 stripped_model.h5
13804 stripped_model.zip
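
These numbers can be reproduced with the measurement pattern used throughout the tfmot tutorials: save each model to an .h5 file, zip it, and compare sizes. Below is a minimal sketch of that pattern, assuming the model and clustered_model objects from the example above; the exact script behind the figures in this comment is not shown in the thread.

import os
import tempfile
import zipfile

import tensorflow as tf
import tensorflow_model_optimization as tfmot

def zipped_size(path):
    # Size in bytes of a zip archive containing the given file.
    _, zip_path = tempfile.mkstemp('.zip')
    with zipfile.ZipFile(zip_path, 'w', compression=zipfile.ZIP_DEFLATED) as z:
        z.write(path)
    return os.path.getsize(zip_path)

# Stripping the clustering wrappers is what exposes the repeated centroid
# values to a standard compressor.
stripped_model = tfmot.clustering.keras.strip_clustering(clustered_model)

for name, m in [('model', model),
                ('clustered_model', clustered_model),
                ('stripped_model', stripped_model)]:
    h5_file = name + '.h5'
    tf.keras.models.save_model(m, h5_file, include_optimizer=False)
    print(name, os.path.getsize(h5_file), 'bytes (.h5),',
          zipped_size(h5_file), 'bytes (.zip)')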

@wwwind (Contributor) commented Oct 2, 2020

@benkli01 Thanks!
Sorry, I didn't explain myself clearly in my previous comment. I think it is worth adding model size metrics to the example, because they show the main purpose of clustering: model vs. stripped_model.
The numbers mentioned above are very good!

@benkli01 (Contributor, Author) commented Oct 2, 2020

@wwwind Ah, OK, sorry, I misunderstood your message. Presenting the benefits in the example makes sense to me. I would add this in a separate PR, though. What do you think, @akarmi?

@akarmi (Contributor) left a comment

Agree. Showing the compression benefits of clustering would add a new "feature" to this sample, and is not the purpose of this PR, if I understand it correctly.

On the other hand, I think the pruning samples use a dedicated end-to-end example to show off the effect on model size. However, if I recall correctly, it has been replaced by the end-to-end tutorial and should probably be deprecated/removed. @alanchiao, what are your thoughts: should we remove the end-to-end example from pruning, or add a similar one to clustering?

@akarmi added the ready to pull (Working to get PR submitted to internal repository, after which merging to GitHub happens) label Oct 2, 2020
@alanchiao commented

I'd agree that the end-to-end tutorials and comprehensive guides (for whatever model optimization technique) mostly replace the need for the Python scripts.

  • If you're an end user trying to play with it, it's much easier to use the Colab / Jupyter environment (which also provides a way of downloading the resulting models).
  • If you're a developer, creating or modifying unit tests can also give a sense of how changes to the library affect behavior.

There are cases when a script may be easier for a developer (to share early-stage examples such as the current pruning mnist_e2e.py for sparse TFLite kernels, or to iterate on a feature that is not easily testable, such as a TensorBoard visualization).

It would make sense to remove certain aspects of the examples (e.g. there isn't much benefit in pruning to enumerating all model types) and perhaps rename examples/ to scripts/, deprecating them as something for end users and keeping them mostly for the convenience of developers in some cases.

What do you all think? It's been a while since I've personally used anything under examples/ but I know there has been a minor amount of usage elsewhere.

@copybara-service bot merged commit b894380 into tensorflow:master Oct 5, 2020
@wwwind (Contributor) commented Oct 6, 2020

Hi @alanchiao,

Thanks for your comment.

It would be nice to have a uniform experience across the techniques: currently, if I install tfmot, I get an examples/ directory containing only a quantization/ subdirectory with scripts. It would be better to ship either no scripts or scripts for all techniques.
If we include all these scripts in the package, then they should also be covered by CI so that they do not become outdated; this increases their maintenance cost.

As a developer, for debugging purposes I personally use integration tests, as they are much faster.
For experiments, as you pointed out, Jupyter notebooks are more convenient.

Re: discussion regarding examples in the tfmot package:
@akarmi @Ruomei @psunn @benkli01

@alanchiao commented Oct 6, 2020

@wwwind: yes, the experience should be uniform. Including examples/quantization was a mistake; people are looking to install the library, not the examples themselves. core/api doesn't seem to import any of the symbols from there, so there is probably an accidental extra BUILD line (or spurious dependency) on examples/quantization.
