Summarising logical variables is very slow #3189

Closed
vpanfilov opened this issue Nov 4, 2017 · 2 comments

vpanfilov commented Nov 4, 2017

Hello! I found out that summarising a logical variable over a grouped tibble takes a lot more time than summarising a double variable. I'm not really sure whether this is a dplyr problem or not, but the behaviour seems counter-intuitive to me.

Here is a simple example, where summarising a logical variable is about 100 times slower than summarising a double:

library(dplyr)
library(microbenchmark)

# create test tibble with logical and double variables
testTibble <- tibble(v = 1:10000 %>% rep(5),
                     a = 1,
                     b = TRUE,
                     c = 1L) %>%
  group_by(v)

testTibble %>% glimpse()
#> Observations: 50,000
#> Variables: 4
#> $ v <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 1...
#> $ a <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
#> $ b <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, ...
#> $ c <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...

microbenchmark(
  # summarizing double variable
  testTibble %>% summarise(s = sum(a)),
  # summarizing logical variable <- ISSUE IS HERE
  testTibble %>% summarise(s = sum(b)),
  # summarizing integer variable
  testTibble %>% summarise(s = sum(c)),
  # compare with simple sums without grouping
  testTibble %>% pull(a) %>% sum(),
  testTibble %>% pull(b) %>% sum(),
  testTibble %>% pull(c) %>% sum()
)
#> Unit: milliseconds
#>                                  expr        min         lq       mean
#>  testTibble %>% summarise(s = sum(a))   1.872467   2.049959   2.217249
#>  testTibble %>% summarise(s = sum(b)) 268.395932 272.521836 277.941838
#>  testTibble %>% summarise(s = sum(c))   1.938750   2.108582   2.296201
#>      testTibble %>% pull(a) %>% sum()   3.053631   3.165402   3.328143
#>      testTibble %>% pull(b) %>% sum()   3.101226   3.165035   3.315196
#>      testTibble %>% pull(c) %>% sum()   3.066030   3.157654   3.329642
#>      median         uq        max neval
#>    2.169340   2.300528   3.591446   100
#>  273.824589 279.503995 335.747118   100
#>    2.195381   2.336596   6.696211   100
#>    3.233151   3.363945   4.896729   100
#>    3.251053   3.381626   5.105280   100
#>    3.253313   3.391950   4.732548   100
Session info
devtools::session_info()
#> Session info -------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.4.2 (2017-09-28)
#>  system   x86_64, linux-gnu           
#>  ui       X11                         
#>  language (EN)                        
#>  collate  ru_RU.UTF-8                 
#>  tz       Europe/Moscow               
#>  date     2017-11-05
#> Packages -----------------------------------------------------------------
#>  package        * version date       source        
#>  assertthat       0.2.0   2017-04-11 CRAN (R 3.4.2)
#>  backports        1.1.1   2017-09-25 CRAN (R 3.4.2)
#>  base           * 3.4.2   2017-09-28 local         
#>  bindr            0.1     2016-11-13 CRAN (R 3.4.2)
#>  bindrcpp       * 0.2     2017-06-17 CRAN (R 3.4.2)
#>  colorspace       1.3-2   2016-12-14 CRAN (R 3.4.2)
#>  compiler         3.4.2   2017-09-28 local         
#>  datasets       * 3.4.2   2017-09-28 local         
#>  devtools         1.13.3  2017-08-02 CRAN (R 3.4.2)
#>  digest           0.6.12  2017-01-27 CRAN (R 3.4.2)
#>  dplyr          * 0.7.4   2017-09-28 CRAN (R 3.4.2)
#>  evaluate         0.10.1  2017-06-24 CRAN (R 3.4.2)
#>  ggplot2          2.2.1   2016-12-30 CRAN (R 3.4.2)
#>  glue             1.2.0   2017-10-29 CRAN (R 3.4.2)
#>  graphics       * 3.4.2   2017-09-28 local         
#>  grDevices      * 3.4.2   2017-09-28 local         
#>  grid             3.4.2   2017-09-28 local         
#>  gtable           0.2.0   2016-02-26 CRAN (R 3.4.2)
#>  htmltools        0.3.6   2017-04-28 CRAN (R 3.4.2)
#>  knitr            1.17    2017-08-10 CRAN (R 3.4.2)
#>  lazyeval         0.2.1   2017-10-29 CRAN (R 3.4.2)
#>  magrittr         1.5     2014-11-22 CRAN (R 3.4.2)
#>  memoise          1.1.0   2017-04-21 CRAN (R 3.4.2)
#>  methods        * 3.4.2   2017-09-28 local         
#>  microbenchmark * 1.4-2.1 2015-11-25 CRAN (R 3.4.2)
#>  munsell          0.4.3   2016-02-13 CRAN (R 3.4.2)
#>  pkgconfig        2.0.1   2017-03-21 CRAN (R 3.4.2)
#>  plyr             1.8.4   2016-06-08 CRAN (R 3.4.2)
#>  R6               2.2.2   2017-06-17 CRAN (R 3.4.2)
#>  Rcpp             0.12.13 2017-09-28 CRAN (R 3.4.2)
#>  rlang            0.1.2   2017-08-09 CRAN (R 3.4.2)
#>  rmarkdown        1.6     2017-06-15 CRAN (R 3.4.2)
#>  rprojroot        1.2     2017-01-16 CRAN (R 3.4.2)
#>  scales           0.5.0   2017-08-24 CRAN (R 3.4.2)
#>  stats          * 3.4.2   2017-09-28 local         
#>  stringi          1.1.5   2017-04-07 CRAN (R 3.4.2)
#>  stringr          1.2.0   2017-02-18 CRAN (R 3.4.2)
#>  tibble           1.3.4   2017-08-22 CRAN (R 3.4.2)
#>  tools            3.4.2   2017-09-28 local         
#>  utils          * 3.4.2   2017-09-28 local         
#>  withr            2.1.0   2017-11-01 CRAN (R 3.4.2)
#>  yaml             2.1.14  2016-11-12 CRAN (R 3.4.2)
krlmlr (Member) commented Dec 12, 2017

Thanks. This is probably because we currently don't have a hybrid implementation for sum() for logical input. I think this would be very useful to have.
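
Until such a hybrid implementation exists, one possible workaround is to convert the logical column to integer before grouping, since the benchmark above suggests that `sum()` over an integer column is fast. The following is only a sketch under that assumption: the names `testTibbleInt` and `b_int` are made up for illustration, and the speed-up has not been measured here.

```r
library(dplyr)

# Same shape as the test tibble in the report, with the logical column
# converted to integer *before* grouping (TRUE -> 1L, FALSE -> 0L).
# `testTibbleInt` and `b_int` are hypothetical names used only here.
testTibbleInt <- tibble(v = rep(1:10000, 5),
                        b = TRUE) %>%
  mutate(b_int = as.integer(b)) %>%
  group_by(v)

# Sum the integer copy instead of the logical column
res_int <- testTibbleInt %>% summarise(s = sum(b_int))

# Reference result computed directly on the logical column
res_lgl <- testTibbleInt %>% summarise(s = sum(b))

# Both routes should give the same per-group count of TRUE values
stopifnot(all(res_int$s == res_lgl$s))
```

The conversion is done on the ungrouped tibble so that `as.integer()` runs once over the whole column rather than once per group. The two `summarise()` calls can be wrapped in `microbenchmark()` to check whether the integer route actually helps on a given dplyr version.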


lock bot commented Oct 30, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

lock bot locked and limited conversation to collaborators Oct 30, 2018