When using tidy() on a prcomp object with matrix = "rotation" to extract the rotation matrix, the returned output is incorrect: the variables and PCs are paired incorrectly. We would expect each variable in the original dataset to be paired exactly once with each PC. Instead, the first variable is paired multiple times with the first PC, the second variable is paired multiple times with the second PC, and so on, while all other variable/PC pairs are missing. Reprex follows below.
library(broom)
library(dplyr)
library(tidyr)
iris_pca <- iris %>%
  select(-Species) %>%
  scale() %>%
  prcomp()
# output generated by tidy.prcomp()
tidy(iris_pca, matrix = "rotation")
#> # A tibble: 16 x 3
#> column PC value
#> <chr> <dbl> <dbl>
#> 1 Sepal.Length 1 0.521
#> 2 Sepal.Width 2 -0.377
#> 3 Petal.Length 3 0.720
#> 4 Petal.Width 4 0.261
#> 5 Sepal.Length 1 -0.269
#> 6 Sepal.Width 2 -0.923
#> 7 Petal.Length 3 -0.244
#> 8 Petal.Width 4 -0.124
#> 9 Sepal.Length 1 0.580
#> 10 Sepal.Width 2 -0.0245
#> 11 Petal.Length 3 -0.142
#> 12 Petal.Width 4 -0.801
#> 13 Sepal.Length 1 0.565
#> 14 Sepal.Width 2 -0.0669
#> 15 Petal.Length 3 -0.634
#> 16 Petal.Width 4 0.524
# expected output would be something like this
iris_pca$rotation %>%
  as.data.frame() %>%
  mutate(column = row.names(.)) %>%
  pivot_longer(PC1:PC4) %>%
  arrange(name, column)
#> # A tibble: 16 x 3
#> column name value
#> <chr> <chr> <dbl>
#> 1 Petal.Length PC1 0.580
#> 2 Petal.Width PC1 0.565
#> 3 Sepal.Length PC1 0.521
#> 4 Sepal.Width PC1 -0.269
#> 5 Petal.Length PC2 -0.0245
#> 6 Petal.Width PC2 -0.0669
#> 7 Sepal.Length PC2 -0.377
#> 8 Sepal.Width PC2 -0.923
#> 9 Petal.Length PC3 -0.142
#> 10 Petal.Width PC3 -0.634
#> 11 Sepal.Length PC3 0.720
#> 12 Sepal.Width PC3 -0.244
#> 13 Petal.Length PC4 -0.801
#> 14 Petal.Width PC4 0.524
#> 15 Sepal.Length PC4 0.261
#> 16 Sepal.Width PC4 -0.124
Created on 2020-09-03 by the reprex package (v0.3.0)
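As a possible stopgap, the long-format rotation table can be built from the matrix directly in base R. This is only a sketch and not how broom implements or fixes tidy.prcomp(); the name rotation_long is made up for illustration. It relies on as.vector() reading a matrix column by column, so the variable names have to cycle fastest while each PC number is repeated once per variable.

```r
# Base R only; nothing about broom's internals is assumed here.
iris_pca <- prcomp(scale(iris[, 1:4]))
rot <- iris_pca$rotation  # rows = variables, columns = PC1..PC4

# as.vector() flattens a matrix column by column, so the labels must
# follow the same order: all variables for PC1, then all for PC2, etc.
rotation_long <- data.frame(
  column = rep(rownames(rot), times = ncol(rot)),
  PC     = rep(seq_len(ncol(rot)), each = nrow(rot)),
  value  = as.vector(rot)
)

# Sanity check: every (variable, PC) pair should occur exactly once.
stopifnot(all(table(rotation_long$column, rotation_long$PC) == 1))
```

The same table() check can be applied to the output of tidy(iris_pca, matrix = "rotation") to tell whether an installed broom version is affected.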