Use irlba's truncated SVD to speed up step_pca #82
Comments
Related to #73. We are definitely interested in functionality like this! This is mostly implemented already, so we'll get a draft PR ready and would love some feedback on it and/or more contributions. We are fairly sure we want to include this in embed, along with a Bayesian implementation of sparse PCA. |
In general I'm a firm believer that PCA should default to a truncated SVD implementation (either |
Also cc @topepo https://github.com/DataSlingers/MoMA is a high quality sparse PCA implementation by Michael Weylandt (of the high quality |
Did this issue get resolved in #83, or should it be kept open for more step variants? |
I don't think this is resolved, since
But maybe this issue belongs in the recipes package, since that's where step_pca lives? |
I'd add an alternate PCA step here. Those package dependencies are a pita and I'd keep them here. |
`step_pca` is very useful, but is slow and memory-intensive when run on more than a few hundred features, even if `num_comp` is much smaller than p. (In my experience this makes it especially time-intensive to tune the `num_comp` training parameter, which requires running the SVD preparation step many times).

As a solution, this step could use the irlba package for truncated SVD, which is much faster and more memory efficient when the number of components is small compared to p.

I could imagine the step either automatically using irlba when `num_comp` is far smaller than p, or doing so only when the user requests something like `truncated = TRUE`, but in any case it would be very helpful!

Reproducible example, if we were trying to build a model to identify which Jane Austen book a line of text came from:
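
Separately from that reprex, here is a minimal sketch of the idea on synthetic data (not the Jane Austen text). It assumes the irlba package is installed and compares `stats::prcomp()` with `irlba::prcomp_irlba()`. The helper `fit_pca()`, its `truncated` argument, and the 10% cutoff are all hypothetical and only illustrate the "automatic or on request" switch described above; none of them is part of recipes or embed.

```r
library(irlba)  # assumed to be installed; provides prcomp_irlba()

# Hypothetical helper, for illustration only (not a recipes/embed function).
# truncated = NULL means "decide automatically"; the 10% cutoff is an
# arbitrary assumption, not a recommendation from the irlba authors.
fit_pca <- function(x, num_comp, truncated = NULL) {
  if (is.null(truncated)) {
    truncated <- num_comp < 0.1 * min(dim(x))
  }
  if (truncated) {
    # Truncated SVD: only the leading num_comp components are computed.
    prcomp_irlba(x, n = num_comp, center = TRUE, scale. = FALSE)
  } else {
    # Full decomposition; rank. only limits what is returned.
    prcomp(x, center = TRUE, scale. = FALSE, rank. = num_comp)
  }
}

# Synthetic low-rank signal plus noise, so the leading components are
# well separated and both methods should agree closely.
set.seed(1)
n <- 500; p <- 300; k <- 5
scores   <- matrix(rnorm(n * k), n, k) %*% diag(seq(10, 2, length.out = k))
loadings <- qr.Q(qr(matrix(rnorm(p * k), p, k)))
x <- scores %*% t(loadings) + matrix(rnorm(n * p, sd = 0.1), n, p)

full_fit  <- fit_pca(x, num_comp = k, truncated = FALSE)
trunc_fit <- fit_pca(x, num_comp = k)  # k is well under 10% of p, so irlba is used

# Standard deviations of the leading k components should match.
max_sdev_diff <- max(abs(full_fit$sdev[1:k] - trunc_fit$sdev))
# Loadings are only defined up to sign, so compare absolute values.
max_rot_diff <- max(abs(abs(full_fit$rotation) - abs(trunc_fit$rotation)))
```

Wrapping each call in `system.time()` on a wider matrix is one way to check whether the speedup holds for a given dataset; the gain depends on how small `num_comp` is relative to p.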