Closed
Labels: feature (a feature request or enhancement); pivoting ♻️ (pivot rectangular data to different "shapes")
Description
The fill() function applied after group_by() is more than 10x slower than mutate() with na.locf() from the zoo package (roughly 317 s vs. 22 s elapsed in the reprex below), especially when the number of groups is large, yet both give identical results. Am I missing something, or is there another way to perform the same operation?
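To make the operation concrete, here is a minimal sketch on a hypothetical five-row tibble (`toy` is illustrative data, not from the benchmark) with genuine `NA` values. Within each group, both approaches carry the last non-missing value downward, and a leading `NA` in a group stays `NA` because nothing precedes it in that group.

```r
library(dplyr)
library(tidyr)
library(zoo)

# Hypothetical toy data: two groups, with real missing values (NA, not "NA")
toy <- tibble(
  a = c("x", "x", "x", "y", "y"),
  c = c("A", NA, NA, NA, "B")
)

# tidyr: fill() respects the grouping and fills downward by default
filled_tidyr <- toy %>% group_by(a) %>% fill(c)

# zoo: the same last-observation-carried-forward, applied per group
filled_zoo <- toy %>% group_by(a) %>% mutate(c = na.locf(c, na.rm = FALSE))

filled_tidyr$c  # c("A", "A", "A", NA, "B")
filled_zoo$c    # same values
```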
```r
library(dplyr)
library(tidyr)
library(zoo)
library(tibble)

n <- 1e6
df <- tibble(a = sample(paste("id", 1:(n/4)), n, replace = TRUE),
             b = sample(c("2012", "2013", "2014"), n, replace = TRUE),
             # note: "NA" here is the string "NA", not a missing value
             c = sample(c("NA", "A", "B", "C"), n, replace = TRUE))

# Approach 1: zoo::na.locf() inside a grouped mutate()
t1 <- system.time(df1 <-
  df %>%
    arrange(a, b) %>%
    group_by(a) %>%
    mutate(c = na.locf(c, na.rm = FALSE))
)
print(t1)
#>    user  system elapsed
#>   21.45    0.06   21.81

# Approach 2: tidyr::fill() on the same grouped data
t2 <- system.time(df2 <-
  df %>%
    arrange(a, b) %>%
    group_by(a) %>%
    fill(c)
)
print(t2)
#>    user  system elapsed
#>  313.37    0.47  316.75

print(identical(df1, df2))
#> [1] TRUE
```

Created on 2018-12-07 by the reprex package (v0.2.1)
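One possible alternative that avoids calling a function once per group is a vectorized grouped LOCF in base R. This is a sketch under stated assumptions, not part of tidyr or zoo: `locf_by_group()` is a hypothetical helper, it assumes rows are already sorted so each group's rows are contiguous (as `arrange(a, b)` guarantees above), and it assumes the grouping column has no missing values. The trick is to mark "anchor" rows (non-missing values plus the first row of every group), replace every other row's index with 0, and let `cummax()` propagate the index of the most recent anchor.

```r
# Hypothetical helper: grouped last-observation-carried-forward without
# per-group function calls. Assumes x is sorted by g (contiguous groups)
# and that g contains no NA values.
locf_by_group <- function(x, g) {
  n <- length(x)
  if (n == 0L) return(x)
  # TRUE at the first row of each group
  group_start <- c(TRUE, g[-1L] != g[-n])
  # Anchors: non-missing values, plus every group start so that a value
  # is never carried across a group boundary
  anchor <- !is.na(x) | group_start
  idx <- ifelse(anchor, seq_len(n), 0L)
  # cummax() maps each row to the index of the most recent anchor
  x[cummax(idx)]
}

x <- c("A", NA, NA, NA, "B", NA)
g <- c("x", "x", "x", "y", "y", "y")
locf_by_group(x, g)  # returns c("A", "A", "A", NA, "B", "B")
```

Applied to the data above, a call could look like `df %>% arrange(a, b) %>% mutate(c = locf_by_group(c, a))`, with no `group_by()` step. The result is ungrouped, so compare columns (e.g. `identical(df1$c, df3$c)`) rather than whole objects. Because it makes a single vectorized pass, it may scale better with many groups, but that is an expectation, not a measured result.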