Skip to content

Conversation

arovir01
Copy link
Contributor

@arovir01 arovir01 commented May 11, 2020

This PR alters the configuration object used in weight clustering in accordance with the RFC proposal. It contains the following changes:

  • Added enum CentroidInitialization to capture possible centroid initialization methods
  • Configuration object modified to use this enum to specify the centroid initialization technique to be used (instead of the string-based descriptor used until this point)
  • Updated serialization and unit tests accordingly

@googlebot googlebot added the cla: yes PR contributor has signed CLA label May 11, 2020
@akarmi
Copy link
Contributor

akarmi commented May 12, 2020

Thanks @arovir01!

@nutsiepully and @alanchiao, please review this change, introduced following the discussion at the clustering RFC design review.

@alanchiao alanchiao self-requested a review May 18, 2020 16:12
@alanchiao
Copy link

alanchiao commented May 20, 2020

@arovir01

It'd be good to first consider this in the context of something like the feature request template here and more clearly define the motivation.

This change would ultimately expose four more symbols:

  1. LinearCentroidsInitialisation
  2. RandomCentroidsInitialisation
  3. DensityBasedCentroidsInitialisation
  4. AbstractCentroidsInitialisation with __init__, get_cluster_centroids and public weights and number_of_clusters attributes that would be subject to backwards-compatibility requirements.

So from the template, what's the motivation of this, relative to the original string-based parameter for configuration?

Has there been explicit interest to extend this list by others, with initialization methods that perform better than before for their use cases? And are there other types of configuration besides initialization that you foresee, which maintaining these new symbols with backwards-compatibility requirements would make harder?

If it's just for experimentation purposes, for now, we can always suggest that people fork the code and modify the implementation of one of the existing initializations to their arbitrary algorithm, to see how it can do better, until there is wider interest. There can even be a Github issue tracking this to wait for people to express interest.

Would an enum be better if there is just an interest to avoid strings?

@arovir01 arovir01 force-pushed the toupstream/new_clustering_params branch from 910ea99 to 57dcd9f Compare May 29, 2020 11:35
* Added enum CentroidInitialization to be used in clustering params
  to specify centroid initialization method (instead of a string)
* Updated serialization and unit tests accordingly
@arovir01 arovir01 force-pushed the toupstream/new_clustering_params branch from 57dcd9f to 2e70a9e Compare May 29, 2020 12:18
@arovir01
Copy link
Contributor Author

@alanchiao

Changed to enum as requested. Is there anything else I should change?

@alanchiao
Copy link

API change itself looks good.

  1. Is there a way to update CentroidsInitializerFactory so that init_is_supported checks if the init_method is one of the enums, instead of maintaining a second copy of the initializers list?
  2. Please update the PR description to reflect the most recent changes. Particularly since it's an API change, the motivation for the change should be clear.

@alanchiao
Copy link

Getting another person to also take a look at this and will formalize how many people should generally be reviewing smaller public API changes, according to the ownership RFC, going forward.

@arovir01
Copy link
Contributor Author

arovir01 commented Jun 17, 2020

1. Is there a way to update `CentroidsInitializerFactory` so that `init_is_supported` checks if the `init_method` is one of the enums, instead of maintaining a second copy of the initializers list?

We must keep the _initialisers dict in CentroidsInitializerFactory, as that is where the links between the enum values and the implementation classes are defined. I can't really think of a way of doing this differently without exposing the implementation classes in the public API (which would in essence be equivalent to the pre-enum solution from my previous commit).

2. Please update the PR description to reflect the most recent changes. Particularly since it's an API change, the motivation for the change should be clear.

Done.

@alanchiao
Copy link

We must keep the _initialisers dict in CentroidsInitializerFactory, as that is where the links between the enum values and the implementation classes are defined. I can't really think of a way of doing this differently without exposing the implementation classes in the public API (which would in essence be equivalent to the pre-enum solution from my previous commit).

Got it.

@nutsiepully
Copy link
Contributor

The change looks good to me on it's own. I don't particularly see a benefit in moving from strings to enums, since python doesn't have much type safety.

But if you prefer enum, that works just as well. Like @alanchiao said, this changes the API surface so just ensure any clients are aware of it.

Copy link

@alanchiao alanchiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. @arovir01: please take one more look and will merge after you @ me.

@arovir01
Copy link
Contributor Author

Approving. @arovir01: please take one more look and will merge after you @ me.

@nutsiepully I agree, it is not a huge improvement, but I prefer enums over strings even in Python. I find it a cleaner practice.

@alanchiao I think we can proceed with the merge, there's no further change we wish to make to the code at this stage.

@alanchiao alanchiao added the ready to pull Working to get PR submitted to internal repository, after which merging to Github happens. label Jun 19, 2020
@copybara-service copybara-service bot merged commit a396089 into tensorflow:master Jun 22, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla: yes PR contributor has signed CLA ready to pull Working to get PR submitted to internal repository, after which merging to Github happens.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants