cross_df() is extremely slow compared to expand.grid() and crossing() #406
Comments
I'm not sure where they belong. Would data frame stuff belong in vctrs?

A profvis run suggests major overhead due to garbage collection.

I think the idea was to take up less memory when crossing atomic vectors with a filtering function. We should revisit.
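Code to profile:

```r
library(magrittr)  # provides %>%
k <- 11
# 11 copies of 1:3, named A..K, so the cross product has 3^11 combinations
x <- rep(list(1:3), k) %>% setNames(LETTERS[1:k])
profvis::profvis(purrr::cross(x))
```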
… how `tidyr::crossing()` is implemented, I got the feeling that it is really slow with large data sets.
The problem is still here:

```r
suppressPackageStartupMessages(library(tidyverse))
system.time(cross_df(list(a = 1:300, b = 1:300)))
#> user system elapsed
#> 0.81 0.01 0.83
system.time(expand.grid(a = 1:300, b = 1:300))
#> user system elapsed
#> 0 0 0
```

I wanted to create an issue but found this old one.
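One plausible reason for the gap, consistent with the profvis note about garbage collection: `expand.grid()` builds each output column with a single vectorised `rep()` call, whereas `cross_df()` appears to go through `cross()`, which materialises one small list per combination before transposing. The sketch below illustrates the vectorised approach with a hypothetical helper, `cross_rep()`, which is not part of purrr or tidyr. It assumes named atomic vectors as inputs and mirrors `expand.grid()`'s row order (first column varies fastest).

```r
# Hypothetical helper (not part of purrr or tidyr): vectorised Cartesian
# product of named atomic vectors, returned as a data frame.
cross_rep <- function(...) {
  args <- list(...)
  sizes <- lengths(args)
  total <- prod(sizes)
  out <- vector("list", length(args))
  names(out) <- names(args)
  each <- 1
  for (i in seq_along(args)) {
    # Repeat each value `each` times, then recycle that block to `total`
    # rows, so the first column varies fastest, as in base::expand.grid().
    out[[i]] <- rep(rep(args[[i]], each = each), length.out = total)
    each <- each * sizes[[i]]
  }
  # One allocation per column; no per-row lists are created.
  as.data.frame(out)
}

# Same grid as in the benchmark: 300 * 300 = 90,000 rows
grid <- cross_rep(a = 1:300, b = 1:300)
nrow(grid)
#> [1] 90000
```

In practice, base `expand.grid()` or `tidyr::expand_grid()` (mentioned in this thread) already work this way; the helper is only meant to show where the speed comes from.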
Maybe it's a strange suggestion, but it would be great to somehow mark functions that have known efficiency problems, a kind of "questioning" label but for efficiency. I lost a couple of hours just to learn that the current implementation of …

A turtle label? We should probably just deprecate …

@lionel- "We should probably just deprecate `cross_df()` in favour of `tidyr::expand_grid()`"

Probably not.

We'll likely deprecate …
The `cross_df()` function seems to be extremely slow compared to the `expand.grid()` function from the base package and the `crossing()` function from the tidyr package. Example:

As you see, `cross_df()` takes almost a minute performing basically the same operation that `expand.grid()` and `crossing()` do in less than a second. (The outputs of the three functions are essentially the same, though they differ in returning a tibble or a data frame, and in the order of the rows.)