rename extract_cluster and extract_cluster_assignments #27

Closed
kbodwin opened this issue Mar 18, 2022 · 3 comments

Comments

@kbodwin
Collaborator

kbodwin commented Mar 18, 2022

As I work with these, I don't like "extract" as the verb, especially because we might consider extraction-style clustering methods in the future...

Being nitpicky, I also think extract_cluster should be wordier. (Is the "cluster" the centroids? The observations?)

For the sake of starting conversation, I'll propose:

get_cluster_centers - with future optional arguments for different notions of centers beyond geometric means

I'm also still on the fence as to whether extract_cluster_assignments needs to exist at all. In tidymodels, the training data is labelled using predict. Since we want predict to always return the original cluster assignments, it may be all we need. Although the dedicated function does avoid duplicate computation of cluster assignments, I suppose...
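
To make the naming proposal a bit more concrete, here is a rough base R sketch of what a get_cluster_centers with an optional "notion of center" argument might look like. The signature, the `type` argument, and its `"mean"`/`"median"` options are assumptions for illustration only, not the package's API.

```r
# Hypothetical sketch: not the package's actual API.
# Summarise each cluster's "center" under different notions of center.
get_cluster_centers <- function(data, assignments, type = c("mean", "median")) {
  type <- match.arg(type)
  summarise_fn <- switch(type, mean = mean, median = stats::median)
  # aggregate() applies summarise_fn to every column within each cluster
  stats::aggregate(data, by = list(cluster = assignments), FUN = summarise_fn)
}

set.seed(27)
x <- iris[, 1:4]
fit <- stats::kmeans(x, centers = 3)

get_cluster_centers(x, fit$cluster)                   # geometric means (centroids)
get_cluster_centers(x, fit$cluster, type = "median")  # per-column medians
```

With `type = "mean"` this should reproduce the centroids that `kmeans()` already stores in `fit$centers`; other values of `type` are where a wordier name would pay off.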

@EmilHvitfeldt
Member

I picked the extract_ verb because we already have a number of extract_*() functions in tidymodels: extract_mold(), extract_fit_parsnip(), etc. I would be okay with switching if we can find a more appropriate verb.

> Being nitpicky, I also think extract_cluster should be wordier. (Is the "cluster" the centroids? The observations?)

Agreed! Right now it returns the centroids.

For now I'd like to keep extract_cluster_assignments() around because it removes the need for duplicate calculations, especially since the model objects already have that information embedded in them. It seems weird not to have a way to extract it.
That being said, the result of extract_cluster_assignments(model) and predict(model, training_data) should always be the same.
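
As a rough illustration of that invariant, the base R sketch below compares the assignments stored in a fitted `stats::kmeans()` object with assignments recomputed on the training data. The helper `predict_nearest_centroid()` is a hypothetical stand-in for predict, assumed here to mean "nearest centroid by Euclidean distance"; it is not the package's implementation.

```r
set.seed(27)
x <- as.matrix(iris[, 1:4])
fit <- stats::kmeans(x, centers = 3)

# Hypothetical stand-in for predict(): assign each row to its nearest centroid.
predict_nearest_centroid <- function(centers, new_data) {
  # Squared Euclidean distance from every row to every centroid (n x k matrix)
  d <- sapply(seq_len(nrow(centers)), function(k) {
    rowSums(sweep(new_data, 2, centers[k, ])^2)
  })
  # Index of the smallest distance in each row
  max.col(-d, ties.method = "first")
}

stored <- unname(fit$cluster)                            # embedded in the model object
recomputed <- predict_nearest_centroid(fit$centers, x)   # recalculated from scratch

identical(stored, recomputed)
```

The stored vector costs nothing to return, while the recomputed one needs a full pass over the data, which is the duplicate calculation a dedicated extractor avoids. The two should agree whenever the fit has converged.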

@kbodwin
Collaborator Author

kbodwin commented Mar 19, 2022

Hmmm, that's a good argument for keeping extract_, but then we'd need a new word for when you identify a single cluster on each run of the algorithm. I'll mull this over.

Re: cluster assignments - I definitely take your point, and I know I've advocated for an easier way to extract info from supervised models too, so I guess this falls under that umbrella.

@github-actions

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 10, 2023