[BUG] lead() inside summarise() gives incorrect results #1434

Closed
npjc opened this issue Oct 1, 2015 · 4 comments

@npjc
Contributor

@npjc npjc commented Oct 1, 2015

Calling lead() inside summarise() does not behave as expected:

library(dplyr)
mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n(),
            leadn = lead(n))
#> Source: local data frame [3 x 3]
#> 
#>     cyl     n leadn
#>   (dbl) (int) (int)
#> 1     4    11     7
#> 2     6     7     1
#> 3     8    14     1
mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n(),
            leadcyl = lead(cyl))
#> Source: local data frame [3 x 3]
#> 
#>     cyl     n leadcyl
#>   (dbl) (int)   (dbl)
#> 1     4    11       6
#> 2     6     7       6
#> 3     8    14       4

Calling mutate() after summarise() gives the intended result:

mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n()) %>% 
  mutate(leadn = lead(n))
#> Source: local data frame [3 x 3]
#> 
#>     cyl     n leadn
#>   (dbl) (int) (int)
#> 1     4    11     7
#> 2     6     7    14
#> 3     8    14    NA
mtcars %>% 
  group_by(cyl) %>% 
  summarise(n = n()) %>% 
  mutate(leadcyl = lead(cyl))
#> Source: local data frame [3 x 3]
#> 
#>     cyl     n leadcyl
#>   (dbl) (int)   (dbl)
#> 1     4    11       6
#> 2     6     7       8
#> 3     8    14      NA

provenance:

devtools::session_info()
#> Session info --------------------------------------------------------------
#>  setting  value                       
#>  version  R version 3.2.2 (2015-08-14)
#>  system   x86_64, darwin13.4.0        
#>  ui       RStudio (0.99.700)          
#>  language (EN)                        
#>  collate  en_CA.UTF-8                 
#>  tz       America/Los_Angeles         
#>  date     2015-09-30
#> Packages ------------------------------------------------------------------
#>  package    * version    date       source                         
#>  assertthat   0.1        2013-12-06 CRAN (R 3.2.0)                 
#>  clipr        0.1.1      2015-09-04 CRAN (R 3.2.0)                 
#>  colorspace   1.2-6      2015-03-11 CRAN (R 3.2.0)                 
#>  DBI          0.3.1      2014-09-24 CRAN (R 3.2.0)                 
#>  devtools     1.9.1      2015-09-11 CRAN (R 3.2.0)                 
#>  digest       0.6.8      2014-12-31 CRAN (R 3.2.0)                 
#>  dplyr      * 0.4.3.9000 2015-09-27 Github (hadley/dplyr@dd10ccd)  
#>  evaluate     0.8        2015-09-18 CRAN (R 3.2.0)                 
#>  formatR      1.2.1      2015-09-18 CRAN (R 3.2.0)                 
#>  ggplot2      1.0.1      2015-03-17 CRAN (R 3.2.0)                 
#>  gtable       0.1.2      2012-12-05 CRAN (R 3.2.0)                 
#>  htmltools    0.2.6      2014-09-08 CRAN (R 3.2.0)                 
#>  knitr        1.11       2015-08-14 CRAN (R 3.2.2)                 
#>  lazyeval     0.1.10     2015-01-02 CRAN (R 3.2.0)                 
#>  magrittr     1.5        2014-11-22 CRAN (R 3.2.0)                 
#>  MASS         7.3-43     2015-07-16 CRAN (R 3.2.2)                 
#>  memoise      0.2.1      2014-04-22 CRAN (R 3.2.0)                 
#>  munsell      0.4.2      2013-07-11 CRAN (R 3.2.0)                 
#>  plyr         1.8.3      2015-06-12 CRAN (R 3.2.0)                 
#>  proto        0.3-10     2012-12-22 CRAN (R 3.2.0)                 
#>  R6           2.1.1      2015-08-19 CRAN (R 3.2.0)                 
#>  Rcpp         0.12.1     2015-09-10 CRAN (R 3.2.0)                 
#>  reprex       0.0.0.9001 2015-09-26 Github (jennybc/reprex@1d6584a)
#>  reshape2     1.4.1      2014-12-06 CRAN (R 3.2.0)                 
#>  rmarkdown    0.8        2015-08-30 CRAN (R 3.2.2)                 
#>  scales       0.3.0      2015-08-25 CRAN (R 3.2.0)                 
#>  stringi      0.5-5      2015-06-29 CRAN (R 3.2.0)                 
#>  stringr      1.0.0      2015-04-30 CRAN (R 3.2.0)
@npjc npjc changed the title [BUG] lead() inside summaries() gives incorrect results [BUG] lead() inside summarise() gives incorrect results Oct 1, 2015
@romainfrancois romainfrancois self-assigned this Oct 1, 2015
@romainfrancois romainfrancois added this to the 0.5 milestone Oct 1, 2015
@hadley
Member

@hadley hadley commented Oct 1, 2015

Using lead() inside summarise doesn't make a lot of sense to me - are you sure you don't want nth()?
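
For readers following the nth() suggestion, here is a minimal sketch of what it might look like. The choice of the mpg column and position 2 is an assumption made purely for illustration; nothing in the thread says which value was actually wanted.

```r
library(dplyr)

# nth() returns a single value per group, which is the shape summarise() expects.
# Picking the second mpg value in each group is an illustrative assumption.
second_by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(n = n(),
            second_mpg = nth(mpg, 2))
```

Unlike lead(), nth() reduces each group's vector to one element, so it fits a summary.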

@romainfrancois
Member

@romainfrancois romainfrancois commented Oct 1, 2015

There are two issues here, I guess. The internal lead is confused and gives garbage. That I can handle.

> mtcars %>% group_by(cyl) %>% summarise(n = n(), lead_n = lead(n))
Source: local data frame [3 x 3]

    cyl     n lead_n
  (dbl) (int)  (int)
1     4    11      7
2     6     7  32697
3     8    14  32697

But also, as @hadley says, this does not make much sense. Even if it worked, lead() would only ever see one n per evaluation, because the expression is evaluated once per group. Consider this:

> lead_ <- function(x){ message("x") ; print(x); message( "lead(x)" ); print(lead(x)); message("---"); lead(x) }
>
> mtcars %>% group_by(cyl) %>% summarise(n = n(), lead_n = lead_(n))
x
[1] 11
lead(x)
[1] NA
---
x
[1] 7
lead(x)
[1] NA
---
x
[1] 14
lead(x)
[1] NA
---
Source: local data frame [3 x 3]

    cyl     n lead_n
  (dbl) (int)  (lgl)
1     4    11     NA
2     6     7     NA
3     8    14     NA

which makes sense:

> lead(7)
[1] NA
> lead(11)
[1] NA
> lead(14)
[1] NA
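
The difference between the two call sites can be sketched outside of summarise(). This is only an illustration: the vector `counts` is a hypothetical stand-in built from the group sizes reported earlier in this thread, and sapply() is used here to mimic one evaluation per group, not to describe how dplyr works internally.

```r
library(dplyr)

# Group sizes reported in this thread for cyl = 4, 6, 8 (illustrative vector)
counts <- c(11L, 7L, 14L)

# Applied to the whole column, as in mutate() after summarise(),
# lead() shifts the values up by one and pads the end with NA
whole <- lead(counts)

# Applied to one value at a time, as in a per-group evaluation,
# there is never a next element, so every result is NA
per_group <- sapply(counts, function(x) lead(x))
```

Here `whole` matches the leadn column from the mutate() version, while `per_group` is all NA, as in the lead_() trace above.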

@npjc
Contributor Author

@npjc npjc commented Oct 1, 2015

@hadley @romainfrancois you are correct that it doesn't really make sense (I actually discovered this by mistake -- I wasn't trying to do it). But I figured I would file this observation here regardless, as it did give 'garbage'.

I hope this is ok!

@romainfrancois
Member

@romainfrancois romainfrancois commented Nov 1, 2015

Yes, thanks. We definitely want to know about that kind of thing.

@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018