map* functions are slow when .f is a string #820

Closed
wch opened this issue Mar 5, 2021 · 4 comments · Fixed by #899
Labels: feature (a feature request or enhancement), pluck 🍐

Member

wch commented Mar 5, 2021

For example, map_chr() is 10x slower than vapply():

library(purrr)
x <- lapply(1:10000, function(name) list(a = "A"))

microbenchmark::microbenchmark(
  map_chr(x, 'a'),
  vapply(x, `[[`, 'a', FUN.VALUE = "")
)
#> Unit: milliseconds
#>                                  expr       min        lq      mean    median        uq      max neval
#>                       map_chr(x, "a") 28.079812 30.615371 31.434516 31.281444 31.856803 42.96536   100
#>  vapply(x, `[[`, "a", FUN.VALUE = "")  3.080842  3.346405  3.826315  3.640201  3.894849 13.50321   100

A bit slower would be OK, but this is way slower than the base alternative. For my particular use case, I was surprised to find that a single map_chr call was the most expensive part of my code (taking about 78% of the time); after switching to vapply, it was much faster.

Contributor

mgirlich commented Apr 7, 2021

I reported basically the same issue (#749) some time ago (although I didn't compare to base).
The biggest issue is the call to list2() in pluck(). But even after that it still takes 3 to 4 times as long.

library(purrr)
x <- lapply(1:10000, function(name) list(a = "A"))

pluck_impl <- purrr:::pluck_impl
pluck2 <- function(.x, index, .default = NULL) {
  .Call(
    pluck_impl,
    x = .x,
    index = index,
    missing = .default,
    strict = FALSE
  )
}

microbenchmark::microbenchmark(
  purrr = map_chr(x, 'a'),
  purrr2 = map_chr(x, pluck2, index = list('a')),
  base = vapply(x, `[[`, 'a', FUN.VALUE = "")
)
#> Unit: milliseconds
#>    expr       min        lq      mean    median        uq       max neval
#>   purrr 23.668865 25.666429 27.828165 27.157205 29.682467 36.531673   100
#>  purrr2  9.491911 10.316131 11.255159 11.192060 12.000602 14.780147   100
#>    base  2.400959  2.751333  3.118299  3.052053  3.271311  5.760484   100

Created on 2021-04-07 by the reprex package (v2.0.0)

hadley added the feature (a feature request or enhancement) and pluck 🍐 labels Aug 23, 2022
Member

hadley commented Aug 27, 2022

I think the root cause is that [[ is 10x faster than pluck():

library(purrr)
x <- lapply(1:10000, function(.) list(a = "A"))

bench::mark(
  pluck(x, 500, "a"),
  x[[500]][["a"]],
)
#> # A tibble: 2 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 pluck(x, 500, "a")   2.88µs   3.21µs   300832.    7.77KB     30.1
#> 2 x[[500]][["a"]]       250ns    293ns  3106710.        0B      0

Created on 2022-08-27 by the reprex package (v2.0.1)

I'm not sure how much we can do here given that [[ gets a boost from being .Primitive.
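A quick way to see the distinction being drawn here, using only base R: `[[` is a primitive, so it is evaluated directly in C without the overhead of an R-level closure call, whereas ordinary functions such as vapply() are closures. This is just an illustrative check, not a measurement of where pluck() spends its time.

```r
# `[[` is a primitive (implemented directly in C, no closure call overhead).
subset_is_primitive <- is.primitive(`[[`)

# An ordinary R function, shown for contrast, is a closure.
vapply_is_primitive <- is.primitive(vapply)

print(subset_is_primitive)
print(vapply_is_primitive)
print(typeof(vapply))
```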

Member

hadley commented Aug 27, 2022

OTOH most of the time we call as_mapper() once and pluck many, many times, so maybe there's some way to do the dots expansion once. Maybe we could have a pluck_internal() that as_mapper() uses? That would give a nice boost for relatively little work. Oh, that's exactly what @mgirlich suggested. I think it's worth doing.
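The "expand the dots once" idea can be sketched in base R. This is only an illustration of the pattern, not purrr's implementation: make_plucker() is a hypothetical helper invented here, and it skips the extra behaviour pluck() has (such as .default handling and accessor functions).

```r
# Hypothetical helper (not part of purrr): the index is collected from `...`
# once, when the extractor is built, instead of on every element.
make_plucker <- function(...) {
  index <- list(...)  # dots expanded a single time
  function(x) {
    # Walk down the nested structure one index at a time.
    for (i in index) {
      x <- x[[i]]
    }
    x
  }
}

x <- lapply(1:1000, function(.) list(a = "A"))

# Build the extractor once, then reuse it for every element.
get_a <- make_plucker("a")
res <- vapply(x, get_a, FUN.VALUE = "")
```

The per-element work is then just the loop over an already-built list, which is the saving suggested above for the as_mapper() case.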

hadley added this to the 0.4.0 milestone Aug 27, 2022
hadley added a commit that referenced this issue Aug 28, 2022
Member

hadley commented Aug 28, 2022

With changes in #899:

library(purrr)
x <- lapply(1:1000, function(.) list(a = "A"))
bench::mark(
  map_chr(x, 'a'),
  map_chr(x, `[[`, "a"),
  vapply(x, `[[`, 'a', FUN.VALUE = "")
)
#> # A tibble: 3 × 6
#>   expression                                min   median `itr/sec` mem_alloc
#>   <bch:expr>                           <bch:tm> <bch:tm>     <dbl> <bch:byt>
#> 1 map_chr(x, "a")                        1.11ms   1.16ms      838.   60.16KB
#> 2 map_chr(x, `[[`, "a")                553.25µs 593.62µs     1643.  194.49KB
#> 3 vapply(x, `[[`, "a", FUN.VALUE = "") 219.29µs 230.46µs     4266.    7.86KB
#> # … with 1 more variable: `gc/sec` <dbl>

Created on 2022-08-28 by the reprex package (v2.0.1)

So still not amazing, but twice as fast is nothing to sneeze at. I'm a little surprised that map_chr() with [[ is so much slower than vapply(), especially given that it's all written in C.

hadley closed this as completed in 469f257 Aug 30, 2022