kmeans and kmedoids step functions #399
Comments
A PR would be good but it might be better in the A few things:
|
Thanks for the feedback.
Feel free to let me know your preference about which package is the best fit for these. They are most similar to Best regards. |
The suggested changes have been made in this commit. |
But not for new data (that was not involved in the analysis that generated the clusters). In your example, I assumed that, since you are doing clustering, the main output of that analysis would be the qualitative cluster membership (as opposed to returning functions of the centroids). I see the projection that you are doing instead, but I'm not sure I would associate it with the output of clustering. I'll look at it more in a few days. |
You might be thinking about the more common application of clustering, in which cases/samples are the clustering units (the things being clustered). I know I was when initially learning about these approaches. Here, it is the other way around. The variables are the clustering units – note that kmeans/pam are applied to the transposed training matrices in the implementations. The approaches only require that a new dataset has the same set of variables as the training dataset; the cases can and usually will differ. Put another way, cluster membership is in terms of the variables rather than the cases. Below is an example:

library(recipes)

# step_kmeans() is the proposed step from the source code linked in the issue
rec <- recipe(rating ~ ., data = attitude)
kmeans_rec <- rec %>%
  step_center(all_predictors()) %>%
  step_scale(all_predictors()) %>%
  step_kmeans(all_predictors(), num_comp = 3)
kmeans_prep <- prep(kmeans_rec, training = attitude)

# Compare: bake 10 cases directly vs. bake all cases and keep the first 10
bake(kmeans_prep, attitude[1:10, ])
bake(kmeans_prep, attitude)[1:10, ]
|
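To make the "variables are the clustering units" point concrete without the proposed step, here is a minimal base R sketch. It assumes the averaging is a plain row mean over the columns in each cluster; the linked implementation may differ in its details. `project()` is a hypothetical helper written for this illustration, not part of recipes or the linked source.

```r
# Sketch: cluster the variables (columns), not the cases (rows).
set.seed(1)
train <- scale(attitude[, -1])        # predictors only, centered and scaled
km <- kmeans(t(train), centers = 3)   # rows of t(train) are the variables

# Hypothetical helper: average the columns that share a cluster,
# giving one component per cluster for each case.
project <- function(x, clusters) {
  sapply(sort(unique(clusters)), function(k) {
    rowMeans(x[, clusters == k, drop = FALSE])
  })
}

# The projection works row by row, so it applies to any set of cases
# that has the same variables as the training data.
new_cases <- train[1:10, ]
direct <- project(new_cases, km$cluster)
subset_of_full <- project(train, km$cluster)[1:10, ]
all.equal(direct, subset_of_full)
```

Because `km$cluster` is indexed by variable name, the only requirement on the new cases is that they carry the same columns.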
It looks like this was implemented here |
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue. |
@topepo : I've written a couple step functions (described below) that you might consider for inclusion in the package.
step_kmeans: conversion of numeric variables to a reduced set by averaging within a k-means cluster partitioning of them.
step_kmedoids: conversion of numeric variables to a reduced set by selecting the medoids from a k-medoids cluster partitioning.
Both are dimension reduction techniques that can be viewed as projections to a reduced number of components (num_comp), like step_pca. k_medoids can additionally be viewed as variable selection, like step_corr.
The source code is available here.
Feel free to let me know if you are open to a PR for these and, if so, whether you have any questions on or suggested changes to the implementations.
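As a rough illustration of the k-medoids variant acting as variable selection, the sketch below uses `pam()` from the cluster package on the transposed predictor matrix and keeps only the medoid variables. It is an assumption about the general approach, not the linked implementation; the object names are made up for the example.

```r
library(cluster)  # recommended package that ships with R; provides pam()

train <- scale(attitude[, -1])   # predictors only, centered and scaled
pm <- pam(t(train), k = 3)       # variables are the clustering units

# id.med holds the row indices of the medoids in t(train),
# which are column indices in train.
medoid_vars <- colnames(train)[pm$id.med]

# Reducing any set of cases then amounts to selecting those columns.
new_cases <- train[1:10, ]
reduced <- new_cases[, medoid_vars, drop = FALSE]
colnames(reduced)
```

Unlike the averaging sketch, every retained component here is one of the original variables, which is what makes the result comparable to a filter such as step_corr.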