Currently select() always takes a deep copy. We can instead drop columns by reference if an implicit or explicit copy has already occurred in the pipe chain.
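To illustrate the difference at the data.table level, here is a minimal sketch. It is not dtplyr's generated code; the table `dt` and the explicit `copy()` are stand-ins for whatever copy an earlier step in the chain would have produced.

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6, c = 7:9)

# Selecting columns with j builds a new data.table, copying `a` and `b`.
selected <- dt[, .(a, b)]

# Assumption: an earlier step has already copied the data (made explicit here).
dt_copy <- copy(dt)

# Dropping `c` by reference modifies dt_copy in place; `a` and `b` are not copied again.
dt_copy[, "c" := NULL]

names(selected)  # "a" "b"
names(dt_copy)   # "a" "b"
names(dt)        # "a" "b" "c" (the original is untouched)
```

Both routes end with the same columns, but the by-reference route only pays for the copy that already happened.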
Thinking about this more, we would also need to account for select() being able to reorder columns. The remove_vars case might also need a call to setcolorder().
Ex: df %>% mutate() %>% select(z, x)
We would also have to account for columns that are renamed.
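As a rough sketch of what reordering and renaming could look like when done by reference, the following uses plain data.table calls. The table, the `keep` vector, and the `new_z` name are hypothetical and only stand in for something like select(new_z = z, x); this is not how dtplyr currently translates select().

```r
library(data.table)

dt <- data.table(x = 1:3, y = 4:6, z = 7:9)

# Assumption: an earlier step such as mutate() has already taken a copy.
work <- copy(dt)

keep <- c("z", "x")                  # columns requested, in the requested order
drop <- setdiff(names(work), keep)   # everything else gets removed

work[, (drop) := NULL]               # drop unwanted columns by reference
setcolorder(work, keep)              # reorder by reference
setnames(work, "z", "new_z")         # rename by reference

names(work)  # "new_z" "x"
names(dt)    # "x" "y" "z"
```

All three steps avoid copying the remaining columns, so the cost stays with the single copy taken earlier in the chain.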
And some benchmarks:
pacman::p_load(dplyr, dtplyr, stringi, data.table)

data_size <- 10000000
df <- tibble(
  a = sample(stri_rand_strings(100, 4), data_size, TRUE),
  b = sample(stri_rand_strings(100, 4), data_size, TRUE),
  c = sample(1:100, data_size, TRUE)
) %>%
  lazy_dt()

remove_vars <- dtplyr:::remove_vars

bench::mark(
  old = df %>%
    mutate(d = 1) %>%
    select(a, b, d) %>%
    as.data.table(),
  new = df %>%
    mutate(d = 1) %>%
    remove_vars("c") %>%
    as.data.table(),
  check = FALSE, iterations = 30
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 old         406.1ms  461.7ms      2.12     728MB     2.19
#> 2 new          51.3ms   58.9ms     10.2      267MB     4.09
Waiting on #366