Mutate and summarize speed related to parentheses #6681
Comments
If you have an example of a 15x slowdown, we'd like to see it.
I think I've isolated this particular slowdown to:
library(rlang)
bench::mark(
  as_label(quo((1 + 1)))
)
#> # A tibble: 1 × 6
#>   expression                  min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 as_label(quo((1 + 1)))   73.6µs     78µs    11784.     128KB     14.4
bench::mark(
  as_label(quo(1 + 1))
)
#> # A tibble: 1 × 6
#>   expression                min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>           <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 as_label(quo(1 + 1))   1.34ms   1.66ms      599.     495KB     17.3
Being called in Line 91 in e00f546
This actually happens to be the same exact slowdown as #6674, CC @lionel-
From I'm under the impression that we only pay this cost 1 time per expression in
We can use the opt-out temporary global option that @lionel- is going to make for the
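To make the comparison above easier to poke at without bench, here is a minimal sketch that assumes only that rlang is installed. It prints the label each form produces and times repeated `as_label()` calls with base `system.time()`. The repetition count `n_rep` is an arbitrary choice, and the timings will depend on the installed rlang version, so no particular ratio should be expected.

```r
library(rlang)

# The two quosures from the benchmark above: same arithmetic,
# differing only in the outer parentheses.
q_paren <- quo((1 + 1))
q_bare  <- quo(1 + 1)

# as_label() returns a single string for each quosure.
label_paren <- as_label(q_paren)
label_bare  <- as_label(q_bare)
print(c(paren = label_paren, bare = label_bare))

# Rough timing over many repetitions (n_rep is an arbitrary choice).
n_rep <- 200
time_paren <- system.time(
  for (i in seq_len(n_rep)) as_label(q_paren)
)[["elapsed"]]
time_bare <- system.time(
  for (i in seq_len(n_rep)) as_label(q_bare)
)[["elapsed"]]
print(c(paren = time_paren, bare = time_bare))
```

Both quosures evaluate to the same value; only the labelling step sees a different expression shape.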
It also happens with
library(rlang)
library(dplyr, warn.conflicts = FALSE)
# Another expression that does it (it's the infix `+` operator causing the issue)
bench::mark(
  as_label(quo((x + 1))),
  as_label(quo(x + 1)),
  check = FALSE
)
#> # A tibble: 2 × 6
#>   expression                  min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>             <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 as_label(quo((x + 1)))  73.69µs   89.9µs    10482.     128KB     16.7
#> 2 as_label(quo(x + 1))     1.44ms   1.86ms      527.     591KB     14.9
# Noticeable on small data
df <- tibble(x = 1:5 + 0L)
bench::mark(
  slow = mutate(df, y = x + 1),
  fast = mutate(df, y = (x + 1))
)
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slow         2.77ms   3.37ms      303.    1.78MB     17.6
#> 2 fast         1.43ms   1.71ms      574.   22.41KB     12.8
# Mostly goes away at any scale
df <- tibble(
  x = 1:50000000 + 0L,
  g = sample(10, length(x), replace = TRUE)
)
bench::mark(
  slow = mutate(df, y = x + 1, .by = g),
  fast = mutate(df, y = (x + 1), .by = g),
  iterations = 10
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 slow          1.33s     1.4s     0.708    1.99GB     1.42
#> 2 fast          1.32s    1.39s     0.720    1.99GB     1.44
Created on 2023-02-07 with reprex v2.0.2.9000
Here's an example with a larger hit. The more terms there are on the RHS, the bigger the slowdown.
df <- tibble::tibble(x = 1:10000)
bench::mark(
  b1 = dplyr::summarize(
    df,
    res = (mean(x) + mean(x) + mean(x) + mean(x) + mean(x) + mean(x) + mean(x) + mean(x))
  ),
  b2 = dplyr::summarize(
    df,
    res = mean(x) + mean(x) + mean(x) + mean(x) + mean(x) + mean(x) + mean(x) + mean(x)
  )
)
#> # A tibble: 2 x 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 b1            2.1ms   2.21ms      433.     3.8MB     15.2
#> 2 b2             54ms  53.95ms      18.5   698.2KB     167.
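The claim that the cost grows with the number of terms can be probed directly on `as_label()`, which earlier comments identify as the slow step. The sketch below is only an illustration under assumptions: `make_sum_expr()` and `time_label()` are hypothetical helpers written for this example, the term counts and repetition count are arbitrary, and it does not run dplyr at all. Whether the growth shows up depends on the installed rlang version.

```r
library(rlang)

# Hypothetical helper: build `mean(x) + mean(x) + ...` with n_terms terms,
# optionally wrapped in one pair of outer parentheses.
make_sum_expr <- function(n_terms, wrap = FALSE) {
  text <- paste(rep("mean(x)", n_terms), collapse = " + ")
  if (wrap) text <- paste0("(", text, ")")
  parse_expr(text)
}

# Hypothetical helper: elapsed seconds for n_rep calls to as_label().
time_label <- function(expr, n_rep = 20) {
  q <- new_quosure(expr, env = global_env())
  system.time(for (i in seq_len(n_rep)) as_label(q))[["elapsed"]]
}

term_counts <- c(1, 2, 4, 8)
timings <- data.frame(
  terms = term_counts,
  bare = vapply(
    term_counts,
    function(n) time_label(make_sum_expr(n)),
    numeric(1)
  ),
  wrapped = vapply(
    term_counts,
    function(n) time_label(make_sum_expr(n, wrap = TRUE)),
    numeric(1)
  )
)
print(timings)
```

Comparing the `bare` and `wrapped` columns row by row gives a rough picture of how each form scales with the number of `+` terms.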
Ah that is very helpful thanks!
I've added a private option to disable the slow part of
Using dplyr 1.1.0 and vctrs 0.5.2, I'm noticing speed issues with mutate and summarize related to parentheses on the RHS.
On dplyr 1.0.10 with vctrs 0.5.0, all give the same result, ~1.5 ms. I'm using tibble 3.1.8 for both tests. With more complicated aggregation expressions, I've seen 15x slowdowns.
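Since the benchmarks in this thread show the parenthesized form running faster on dplyr 1.1.0, wrapping the RHS in parentheses may serve as a stopgap. The sketch below is a minimal check, assuming named outputs as in the examples here, that the two spellings return identical results, so that only the timing differs. It is not a statement about how dplyr handles the expressions internally.

```r
library(dplyr, warn.conflicts = FALSE)

df <- tibble(x = 1:5 + 0L)

# Same computation, with and without outer parentheses on the RHS.
bare    <- mutate(df, y = x + 1)
wrapped <- mutate(df, y = (x + 1))
same_mutate <- identical(bare, wrapped)

sum_bare    <- summarize(df, res = mean(x) + mean(x))
sum_wrapped <- summarize(df, res = (mean(x) + mean(x)))
same_summarize <- identical(sum_bare, sum_wrapped)

print(c(mutate = same_mutate, summarize = same_summarize))
```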