
Performance of fill() after group_by() #520

@albertotb

When used after group_by(), especially with a large number of groups, fill() is more than 10x slower (about 14x in the timings below) than mutate() with na.locf() from the zoo package, yet it gives identical results. Am I missing something, or is there another way to perform this same operation?
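For reference, the operation being compared is a grouped "last observation carried forward": within each group, every missing value takes the most recent non-missing value above it, and leading missing values stay missing. A tiny sketch with real `NA` values (the `toy` data is made up for illustration) shows the two approaches agreeing:

```r
library(dplyr)
library(tidyr)
library(zoo)
library(tibble)

# Toy data with real missing values (NA, not the string "NA")
toy <- tibble(
  a = c("id1", "id1", "id1", "id2", "id2"),
  c = c("A",   NA,    "B",   NA,    "C")
)

# tidyr: fill() respects the grouping and fills downwards by default
filled_tidyr <- toy %>% group_by(a) %>% fill(c)

# zoo: na.rm = FALSE keeps the leading NA in id2 instead of dropping it
filled_zoo <- toy %>% group_by(a) %>% mutate(c = na.locf(c, na.rm = FALSE))

# Both give c("A", "A", "B", NA, "C"): the NA in id1 takes "A",
# while the leading NA in id2 has nothing to carry forward
filled_tidyr$c
filled_zoo$c
```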

library(dplyr)
library(tidyr)
library(zoo)
library(tibble)

n <- 1e6
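df <- tibble(a = sample(paste("id", 1:(n/4)), n, replace = TRUE),
             b = sample(c("2012", "2013", "2014"), n, replace = TRUE),
             # note: "NA" below is the string "NA", not a missing value,
             # so column c contains no gaps for fill() or na.locf() to fill
             c = sample(c("NA", "A", "B", "C"), n, replace = TRUE))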


t1 <- system.time(df1 <-
                    df %>% 
                      arrange(a, b) %>% 
                      group_by(a) %>% 
                      mutate(c = na.locf(c, na.rm = FALSE))
                  )

print(t1)
#>    user  system elapsed 
#>   21.45    0.06   21.81

t2 <- system.time(df2 <-
                    df %>%
                      arrange(a, b) %>% 
                      group_by(a) %>%
                      fill(c)
                  )
print(t2)
#>    user  system elapsed 
#>  313.37    0.47  316.75
print(identical(df1, df2))
#> [1] TRUE

Created on 2018-12-07 by the reprex package (v0.2.1)
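As one possible workaround, here is a minimal sketch of a grouped fill that avoids per-group function calls entirely. It is not how tidyr or zoo implement it: `locf_by_group()` is a hypothetical helper using a cumulative-maximum index trick, and it assumes the rows are already sorted so each group is contiguous (as `arrange(a, b)` guarantees) and that the group key has no `NA`s. Because the reprex above uses the string `"NA"`, the usage below generates real `NA` values so there is actually something to fill. No timings are claimed here.

```r
library(dplyr)
library(tibble)

# Hypothetical helper (not part of dplyr, tidyr, or zoo): grouped
# last-observation-carried-forward in one vectorised pass.
# Assumes rows are sorted so each group's rows are contiguous,
# and that `g` contains no NA values.
locf_by_group <- function(x, g) {
  n <- length(x)
  if (n == 0L) return(x)
  # TRUE at the first row of every run of equal group keys
  group_start <- c(TRUE, g[-1L] != g[-n])
  # Rows that can act as a fill source: non-missing values, plus group
  # starts (so a fill never reaches back into the previous group)
  idx <- seq_len(n)
  idx[!(group_start | !is.na(x))] <- 0L
  # Each row takes the value of the most recent source row at or before it
  x[cummax(idx)]
}

# Usage on data shaped like the reprex, but with real missing values
set.seed(1)
n <- 1e6
df <- tibble(a = sample(paste("id", 1:(n/4)), n, replace = TRUE),
             b = sample(c("2012", "2013", "2014"), n, replace = TRUE),
             c = sample(c(NA, "A", "B", "C"), n, replace = TRUE))

# No group_by() needed: the helper detects group boundaries itself
df3 <- df %>%
  arrange(a, b) %>%
  mutate(c = locf_by_group(c, a))
```

Since the result is ungrouped, compare it with the fill() output column-wise (e.g. `identical(df3$c, df2$c)`) rather than with `identical()` on the whole tibble.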

Metadata

Assignees: none

Labels: feature (a feature request or enhancement), pivoting ♻️ (pivot rectangular data to different "shapes")