tidy.prcomp() returns incorrect result with method = "rotation" #923

Closed
clauswilke opened this issue Sep 3, 2020 · 5 comments

clauswilke commented Sep 3, 2020

When using tidy() on a prcomp object with matrix = "rotation" to extract the rotation matrix, the returned output is incorrect. The problem is that variables and PCs are paired incorrectly. We would expect each variable in the original dataset to be paired exactly once with each PC. Instead, the first variable is paired multiple times with the first PC, the second variable is paired multiple times with the second PC, and so on. Reprex follows below.

library(broom)
library(dplyr)
library(tidyr)

iris_pca <- iris %>%
  select(-Species) %>%
  scale() %>%
  prcomp()

# output generated by tidy.prcomp()
tidy(iris_pca, matrix = "rotation")
#> # A tibble: 16 x 3
#>    column          PC   value
#>    <chr>        <dbl>   <dbl>
#>  1 Sepal.Length     1  0.521 
#>  2 Sepal.Width      2 -0.377 
#>  3 Petal.Length     3  0.720 
#>  4 Petal.Width      4  0.261 
#>  5 Sepal.Length     1 -0.269 
#>  6 Sepal.Width      2 -0.923 
#>  7 Petal.Length     3 -0.244 
#>  8 Petal.Width      4 -0.124 
#>  9 Sepal.Length     1  0.580 
#> 10 Sepal.Width      2 -0.0245
#> 11 Petal.Length     3 -0.142 
#> 12 Petal.Width      4 -0.801 
#> 13 Sepal.Length     1  0.565 
#> 14 Sepal.Width      2 -0.0669
#> 15 Petal.Length     3 -0.634 
#> 16 Petal.Width      4  0.524

# expected output would be something like this
iris_pca$rotation %>%
  as.data.frame() %>%
  mutate(column = row.names(.)) %>%
  pivot_longer(PC1:PC4) %>%
  arrange(name, column)
#> # A tibble: 16 x 3
#>    column       name    value
#>    <chr>        <chr>   <dbl>
#>  1 Petal.Length PC1    0.580 
#>  2 Petal.Width  PC1    0.565 
#>  3 Sepal.Length PC1    0.521 
#>  4 Sepal.Width  PC1   -0.269 
#>  5 Petal.Length PC2   -0.0245
#>  6 Petal.Width  PC2   -0.0669
#>  7 Sepal.Length PC2   -0.377 
#>  8 Sepal.Width  PC2   -0.923 
#>  9 Petal.Length PC3   -0.142 
#> 10 Petal.Width  PC3   -0.634 
#> 11 Sepal.Length PC3    0.720 
#> 12 Sepal.Width  PC3   -0.244 
#> 13 Petal.Length PC4   -0.801 
#> 14 Petal.Width  PC4    0.524 
#> 15 Sepal.Length PC4    0.261 
#> 16 Sepal.Width  PC4   -0.124

Created on 2020-09-03 by the reprex package (v0.3.0)
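
One way to confirm the mismatch without comparing the two tables by eye is to look up every row's loading directly in the rotation matrix. The sketch below uses base R only. loadings_match() is a hypothetical helper, not part of broom, and it assumes the long table has the column, PC, and value columns shown in the output above, with PC numeric.

```r
# Hypothetical helper (not part of broom): TRUE if every row of a long
# table carries the loading stored at rotation[column, PC].
loadings_match <- function(long, rotation) {
  # Two-column index matrix: (row position of the variable, PC number)
  idx <- cbind(match(long$column, rownames(rotation)), long$PC)
  isTRUE(all.equal(unname(rotation[idx]), long$value))
}

iris_pca <- prcomp(scale(iris[, 1:4]))
rot <- iris_pca$rotation

# Reference long table built in base R; as.table() keeps the dimnames,
# so each value stays attached to its own variable and PC.
ref <- as.data.frame(as.table(rot), stringsAsFactors = FALSE)
names(ref) <- c("column", "PC", "value")
ref$PC <- as.integer(sub("PC", "", ref$PC))

# A deliberately mislabelled copy, to show what a failure looks like
bad <- ref
bad$column <- rev(bad$column)

loadings_match(ref, rot)
loadings_match(bad, rot)
```

Passing tidy(iris_pca, matrix = "rotation") as the first argument should return FALSE on a broom version affected by this issue.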

@clauswilke
Author

I suspect a fix would be to write label = rep(labels, each = ncomp) in place of this line:

label = rep(labels, times = ncomp),

Though that assumes a specific behavior of tidyr::pivot_longer() and may break in the future. A safer implementation would add the variable names to the data frame before pivoting, as shown in my reprex above.
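
A minimal sketch of the two options, assuming tidyr is installed. The names labels and ncomp mirror the snippet quoted above, but this is an illustration only and not broom's actual implementation. It attaches the variable names as a column before pivoting, then compares the result with both rep() variants.

```r
library(tidyr)

rot <- prcomp(scale(iris[, 1:4]))$rotation
labels <- rownames(rot)  # variable names
ncomp <- ncol(rot)       # number of principal components

# Safer approach: carry the variable names through the pivot as a column,
# so the pairing does not depend on the order of the pivoted rows.
df <- as.data.frame(rot)
df$column <- rownames(rot)
safe <- pivot_longer(df, cols = -column, names_to = "PC", values_to = "value")

# pivot_longer() expands each input row into consecutive output rows,
# so positional labels only line up with `each`, not with `times`.
identical(rep(labels, each = ncomp), safe$column)
identical(rep(labels, times = ncomp), safe$column)
```

Carrying the names through the pivot keeps each loading tied to its variable even if the row order produced by pivot_longer() were to change.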

clauswilke added a commit to clauswilke/broom that referenced this issue Sep 4, 2020
@simonpcouch
Collaborator

The thoroughness is much appreciated!

Related to #924 and #910. Closing in favor of #910.

@clauswilke
Author

Thanks! Will there be another broom release this fall? I'm hoping to use this feature in a class I'll be teaching next spring.

@simonpcouch
Collaborator

I'm hoping to wrap up patches from 0.7.0 in the next week or two and then will move towards revdepchecks! I'm hesitant to give any explicit timeline, as I've only had one go through the broom release process, but a fall release is definitely the goal. :-)

@github-actions

github-actions bot commented Mar 7, 2021

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
