Sparse pca steps #83

Merged Jul 3, 2021 (18 commits)

topepo (Member) commented Jun 8, 2021

@dgrtwo

This PR adds steps for two types of sparse PCA. step_pca_sparse() can zero out individual values anywhere in the loading matrix. step_pca_sparse_bayes() does the same, but is encouraged to do so in a way that can eliminate all of the loadings for a predictor.

All feedback welcome.

An example:

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(embed)
tidymodels_prefer()
data(pd_speech)
rec_sparse <-
  recipe(pd_speech) %>%
  update_role(-class, new_role = "predictor") %>%
  update_role(class, new_role = "outcome") %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca_sparse(all_numeric_predictors(), id = "pca", predictor_prop = 1/2) %>%
  prep()

tidy(rec_sparse, id = "pca") %>%
  select(-id) %>%
  ggplot(aes(x = component, y = terms, fill = value)) +
  geom_raster() +
  scale_fill_gradient2() +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  ggtitle("irlba")

rec_sparse_bayes <- 
  recipe(pd_speech) %>% 
  update_role(-class, new_role = "predictor") %>% 
  update_role(class, new_role = "outcome") %>% 
  step_zv(all_predictors()) %>% 
  step_normalize(all_numeric_predictors()) %>% 
  step_pca_sparse_bayes(all_numeric_predictors(), id = "pca") %>% 
  prep()

tidy(rec_sparse_bayes, id = "pca") %>% 
  select(-id) %>% 
  ggplot(aes(x = component, y = terms, fill = value)) +
  geom_raster() + 
  scale_fill_gradient2() + 
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) + 
  ggtitle("VBsparsePCA")

Created on 2021-06-07 by the reprex package (v2.0.0)
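
The difference between the two steps (zeros anywhere versus whole predictors zeroed out) can be checked numerically as well as visually. The following is a minimal sketch, not part of the PR: `loading_sparsity()` is a hypothetical helper name, and the toy table is made-up data that only mimics the `terms`, `value`, and `component` columns used in the plots above.

```r
library(dplyr)

# Hypothetical helper (not in embed): for each predictor, count how many of
# its loadings are exactly zero and flag predictors whose loadings are all zero.
loading_sparsity <- function(loadings) {
  loadings %>%
    group_by(terms) %>%
    summarize(
      n_zero = sum(value == 0),
      all_zero = all(value == 0),
      .groups = "drop"
    )
}

# Made-up loadings in the same long format as the tidy() output plotted above.
toy <- tibble(
  terms = rep(c("x1", "x2", "x3"), times = 2),
  value = c(0.8, 0, 0, 0.6, 0, -0.7),
  component = rep(c("PC1", "PC2"), each = 3)
)

sparsity <- loading_sparsity(toy)
sparsity
```

Assuming the recipes above have been prepped, the same helper could be applied as `loading_sparsity(tidy(rec_sparse_bayes, id = "pca") %>% select(-id))` to see how many predictors were dropped entirely.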

@juliasilge (Member)

Related to #73 and #82

dgrtwo commented Jun 8, 2021

Excited to see this progress!

It looks like this is solving the problem of sparse PCA with regularization. I think this is important, but I'd suggest distinguishing it from my suggestion in #82 of truncated PCA. Truncated PCA is in my view an easier problem; it doesn't need hyperparameters like predictor_prop since it doesn't do regularization; it's just a (much) faster way to approximate the first num_comp components.

(This is IMO a good recipe step! But I agree with Alex's comment in #82 that this doesn't solve the problem of step_pca's prohibitive slowness)
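
To illustrate the distinction, here is a minimal sketch of truncated PCA. It assumes the irlba package is installed and uses `irlba::prcomp_irlba()` as one possible way to compute only the leading components; this is not necessarily the approach proposed in #82. The data are simulated for illustration only. No regularization is involved, so the leading components should match those of a full `prcomp()` up to sign.

```r
library(irlba)

set.seed(1)
n <- 200; p <- 50; k <- 3

# Simulated data: a rank-3 signal plus a little noise, so the first three
# components are well separated from the rest.
scores <- matrix(rnorm(n * k), n, k) %*% diag(c(10, 6, 3))
loadings <- qr.Q(qr(matrix(rnorm(p * k), p, k)))
x <- scores %*% t(loadings) + matrix(rnorm(n * p, sd = 0.1), n, p)

# Full PCA computes all p components.
full <- prcomp(x, center = TRUE, scale. = FALSE)

# Truncated PCA computes only the first k components.
trunc <- prcomp_irlba(x, n = k, center = TRUE, scale. = FALSE)

# Standard deviations should agree, and loadings should agree up to sign.
max_sdev_diff <- max(abs(full$sdev[1:k] - trunc$sdev))
max_loading_diff <- max(abs(abs(full$rotation[, 1:k]) - abs(trunc$rotation)))
```

Unlike the sparse steps in this PR, nothing here is shrunk to zero, which is why no tuning parameter such as predictor_prop is needed.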

@topepo topepo marked this pull request as ready for review June 27, 2021 02:18
@topepo topepo requested a review from hfrick June 27, 2021 02:18
Resolved review threads: R/pca_sparse.R (5, of which 4 outdated), R/pca_sparse_bayes.R (4, of which 3 outdated), R/tunable.R (1)
topepo and others added 2 commits June 30, 2021 11:05
Co-authored-by: Hannah Frick <hfrick@users.noreply.github.com>
Co-authored-by: Hannah Frick <hfrick@users.noreply.github.com>
hfrick (Member) commented Jun 30, 2021

Looks good! tidy.step_pca_sparse_bayes() is still defined twice (once in R/pca_sparse.R and once in R/pca_sparse_bayes.R), no?

@topepo topepo merged commit 6bc4c8f into master Jul 3, 2021
@topepo topepo deleted the sparse-pca branch July 3, 2021 19:35
@github-actions

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 18, 2021
4 participants