Clustering: Add a step_kmeans() #77

mdancho84 · 2021-03-17T14:41:02Z

Love embed! It would be super awesome if there were a step_kmeans() or step_cluster() that added cluster assignments to a data frame.

Why?

Cluster assignments are super important for segmentation. K-Means and similar algorithms (e.g. K-modes) can help us to identify customer groups.

Embed

Embed is a good spot for this. step_umap() is a similar algorithm that I often use in combination with K-Means.

Let me know what you think.

Thanks, Matt


topepo · 2021-03-17T18:11:03Z

This was previously discussed in tidymodels/recipes#399; I wasn't sold on what the poster wanted to return and they added a step function to their own package.

You might also want to take a look at tidymodels/planning#12. For non-preprocessing needs, I think that @kbodwin's thoughts are spot-on.

It would be a good addition as long as the output is a factor variable denoting the cluster each sample belongs to. For new data, we can assign samples with a nearest-centroid (or medoid) approach, although this depends on the clustering method.
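To make the nearest-centroid idea concrete, here is a minimal language-agnostic sketch written in Python. It is not embed or recipes code, and every name in it (`fit_kmeans`, `assign_clusters`, `nearest_centroid`, the `cluster_` prefix) is hypothetical. It assumes plain k-means (Lloyd's algorithm) with a deterministic "first k rows" initialization for simplicity. Centroids are estimated once on training data, the equivalent of a step's prep, and new rows are labeled by their nearest centroid, the equivalent of bake. String labels stand in for the factor levels an R step would return.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(point: Vector, centroids: Sequence[Vector]) -> int:
    """Index of the centroid closest to `point` (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def fit_kmeans(data: Sequence[Vector], k: int, n_iter: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm. Assumption: initialize with the first k rows
    (a simplification; real implementations use random starts or k-means++)."""
    centroids = [list(row) for row in data[:k]]
    for _ in range(n_iter):
        # Assignment step: each training row joins its nearest centroid.
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for row in data:
            groups[nearest_centroid(row, centroids)].append(row)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for j, members in enumerate(groups):
            if members:
                dim = len(members[0])
                new_centroids.append([sum(m[d] for m in members) / len(members) for d in range(dim)])
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids


def assign_clusters(new_data: Sequence[Vector], centroids: Sequence[Vector], prefix: str = "cluster_") -> List[str]:
    """'bake'-style step: label each new row with its nearest centroid's name."""
    return [f"{prefix}{nearest_centroid(row, centroids) + 1}" for row in new_data]


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
    centroids = fit_kmeans(train, k=2)
    print(centroids)
    print(assign_clusters([[1.0, 1.0], [9.0, 9.0]], centroids))
```

Storing only the centroids is what makes the step usable on new data: nothing is refit at prediction time. For methods without centroids (e.g. hierarchical clustering) this assignment rule does not carry over directly, which is the method dependence mentioned above.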

I'd support this but don't have the time to do it; you'd have to start a PR.

topepo · 2021-04-16T14:56:37Z

I'll close but please add a PR if this is important.

github-actions · 2021-05-01T01:05:21Z

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

juliasilge added the feature a feature request or enhancement label Mar 17, 2021

topepo closed this as completed Apr 16, 2021

github-actions bot locked and limited conversation to collaborators May 1, 2021
