Sparse pca steps #83

topepo · 2021-06-08T02:05:18Z

This PR adds steps for two types of sparse PCA. step_pca_sparse() can zero out values anywhere in the loading matrix. step_pca_sparse_bayes() does the same, but it is encouraged to do so in a way that can eliminate all of the loadings for a predictor, so that the predictor no longer contributes to any component.
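The difference between the two sparsity patterns can be sketched with two hand-made loading matrices. The values and the helper `dropped_predictors()` are hypothetical and exist only for illustration. Neither comes from the new steps, and the sketch does not call embed at all. Both matrices contain the same number of zeros, but only the predictor-wise pattern lets whole predictors drop out.

```r
# Hypothetical 6 x 3 loading matrices (rows = predictors, columns = components).
# The numbers are made up purely to illustrate the two sparsity patterns.
predictors <- paste0("x", 1:6)
components <- paste0("PC", 1:3)

# Element-wise sparsity: zeros can fall anywhere in the matrix
elementwise <- matrix(
  c(0.6, 0.0, 0.3,
    0.0, 0.5, 0.0,
    0.4, 0.0, 0.0,
    0.0, 0.0, 0.7,
    0.2, 0.6, 0.0,
    0.0, 0.3, 0.4),
  nrow = 6, byrow = TRUE,
  dimnames = list(predictors, components)
)

# Predictor-wise sparsity: some predictors have every loading equal to zero
predictorwise <- matrix(
  c(0.6, 0.2, 0.3,
    0.0, 0.0, 0.0,
    0.4, 0.5, 0.1,
    0.0, 0.0, 0.0,
    0.2, 0.6, 0.5,
    0.0, 0.0, 0.0),
  nrow = 6, byrow = TRUE,
  dimnames = list(predictors, components)
)

# Hypothetical helper: a predictor can be dropped only if all of its loadings are zero
dropped_predictors <- function(loadings) {
  rownames(loadings)[rowSums(loadings != 0) == 0]
}

sum(elementwise == 0)               # 9 zeros
sum(predictorwise == 0)             # also 9 zeros
dropped_predictors(elementwise)     # character(0): every predictor still contributes
dropped_predictors(predictorwise)   # "x2" "x4" "x6"
```

With the element-wise pattern every predictor still has to be measured to compute the components. With the predictor-wise pattern, x2, x4, and x6 could be skipped entirely.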

All feedback welcome.

An example:

library(tidymodels)
#> Registered S3 method overwritten by 'tune':
#>   method                   from   
#>   required_pkgs.model_spec parsnip
library(embed)
tidymodels_prefer()

data(pd_speech)

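# Sparse PCA via step_pca_sparse(); predictor_prop = 1/2 allows at most half
# of the predictors to have non-zero loadings in each component
rec_sparse <-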
  recipe(pd_speech) %>%
  update_role(-class, new_role = "predictor") %>%
  update_role(class, new_role = "outcome") %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca_sparse(all_numeric_predictors(), id = "pca", predictor_prop = 1/2) %>%
  prep()

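# Heatmap of the estimated loadings: one row per predictor, one column per
# component; zero loadings show up as white
tidy(rec_sparse, id = "pca") %>%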
  select(-id) %>%
  ggplot(aes(x = component, y = terms, fill = value)) +
  geom_raster() +
  scale_fill_gradient2() +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  ggtitle("irlba")

# Bayesian sparse PCA via step_pca_sparse_bayes(), other arguments at their defaults
rec_sparse_bayes <-
  recipe(pd_speech) %>%
  update_role(-class, new_role = "predictor") %>%
  update_role(class, new_role = "outcome") %>%
  step_zv(all_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca_sparse_bayes(all_numeric_predictors(), id = "pca") %>%
  prep()

# Same loadings heatmap for the Bayesian version
tidy(rec_sparse_bayes, id = "pca") %>%
  select(-id) %>%
  ggplot(aes(x = component, y = terms, fill = value)) +
  geom_raster() +
  scale_fill_gradient2() +
  theme(
    axis.title.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank()
  ) +
  ggtitle("VBsparsePCA")

Created on 2021-06-07 by the reprex package (v2.0.0)

juliasilge · 2021-06-08T03:00:45Z

Related to #73 and #82

dgrtwo · 2021-06-08T04:45:10Z

Excited to see this progress!

It looks like this is solving the problem of sparse PCA with regularization. I think this is important, but I'd suggest distinguishing it from my suggestion in #82 of truncated PCA. Truncated PCA is, in my view, an easier problem: it doesn't do regularization, so it doesn't need hyperparameters like predictor_prop. It's just a (much) faster way to approximate the first num_comp components.

(This is IMO a good recipe step! But I agree with Alex's comment in #82 that this doesn't solve the problem of step_pca's prohibitive slowness.)
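To illustrate the distinction between truncated and sparse PCA, here is a minimal sketch of truncated PCA using `irlba::prcomp_irlba()`, assuming the irlba package is installed. The simulated data are made up. The call asks for only the first `n = 3` components and applies no penalty, so there is no sparsity hyperparameter; its loadings should agree with the first three columns of a full `prcomp()` fit up to sign and a small numerical tolerance.

```r
# Minimal sketch of truncated PCA; assumes the irlba package is installed.
library(irlba)

set.seed(1)
n_obs <- 200
n_pred <- 50

# Simulated data: three strong latent directions plus a little noise
latent <- matrix(rnorm(n_obs * 3), n_obs, 3) %*% diag(c(10, 5, 2))
weights <- matrix(rnorm(3 * n_pred), 3, n_pred)
x <- latent %*% weights + matrix(rnorm(n_obs * n_pred, sd = 0.5), n_obs, n_pred)

# Full PCA: computes all 50 components, then we keep the first 3
full <- prcomp(x, center = TRUE, scale. = TRUE)
full_loadings <- full$rotation[, 1:3]

# Truncated PCA: only approximates the first 3 components, with no penalty,
# so none of the loadings are forced to zero
trunc <- prcomp_irlba(x, n = 3, center = TRUE, scale. = TRUE)
trunc_loadings <- trunc$rotation

# Loadings are only defined up to sign, so compare absolute values
max_diff <- max(abs(abs(full_loadings) - abs(trunc_loadings)))
print(max_diff)
```

On a matrix this small the speed difference is negligible; the benefit of the truncated approach appears when there are many predictors and only a few components are needed.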

hfrick · 2021-06-30T17:50:13Z

looks good! tidy.step_pca_sparse_bayes() is still defined twice (once in R/pca_sparse.R and once in R/pca_sparse_bayes.R), no?

github-actions · 2021-07-18T01:01:11Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

topepo added 9 commits February 23, 2021 10:36

some initial work on sparse PCA

ac2511a

some PCA refinements

88ff0d0

document data

49e0a5c

clean up docs

4e11c17

increased robustness of test case

ec8c1e9

keep_original_cols addition

489c780

expand num_comp default to 10

7e4b47a

doc updates

eeacc66

more to suggests

dfcd52f

topepo added 4 commits June 26, 2021 16:52

Merge branch 'master' into sparse-pca

a09cd81

un-do attempts to standardize PCA loading signs

225630c

doc and testing updates

4bb050b

use absolute values for PCA loading comparison

ce938ca

topepo marked this pull request as ready for review June 27, 2021 02:18

topepo requested a review from hfrick June 27, 2021 02:18

hfrick reviewed Jun 30, 2021

topepo and others added 2 commits June 30, 2021 11:05

Update R/pca_sparse.R

80c96f6

Co-authored-by: Hannah Frick <hfrick@users.noreply.github.com>

Apply suggestions from code review

22418a0

Co-authored-by: Hannah Frick <hfrick@users.noreply.github.com>

topepo added 2 commits July 3, 2021 12:14

fied tidy method for sparse_pca

4feb771

rearrange tunable docs

589f877

topepo mentioned this pull request Jul 3, 2021

new parameter functions tidymodels/dials#168

Closed

fix man files

c67daf3

topepo merged commit 6bc4c8f into master Jul 3, 2021

topepo deleted the sparse-pca branch July 3, 2021 19:35

github-actions bot locked and limited conversation to collaborators Jul 18, 2021
