Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistency between dplyr and dbplyr summarise_at when variable names reused #3547

Closed
tslumley opened this issue Apr 27, 2018 · 6 comments
Closed
Labels
bug an unexpected problem or unintended behavior

Comments

@tslumley
Copy link

When summarise_at uses a variable in .vars that it also uses in .funs, the result is different in dplyr and dbplyr, and I think the dbplyr result is the right one -- though there's a case for just making it an error.

library(dplyr)
library(MonetDBLite)
library(DBI)
mydb <- dbConnect(MonetDBLite::MonetDBLite())
DBI::dbWriteTable(mydb, "mtcarsdb", mtcars)
mtcars.db = tbl(mydb, "mtcarsdb")

## wrong
mtcars %>% summarise_at(vars(cyl,disp,hp,drat), funs(sum(. *  cyl * mpg)))
## right
mtcars.db %>% summarise_at(vars(cyl,disp,hp,drat), funs(sum(. *  cyl * mpg)))
## also right
mtcars %>% mutate(cyl1=cyl) %>% summarise_at(vars(cyl,disp,hp,drat), funs(sum(. *  cyl1 * mpg)))

# intended result
with(mtcars, cbind(cyl * cyl * mpg,
                     disp * cyl * mpg,
                     hp * cyl * mpg,
                     drat * cyl * mpg) %>% colSums())

It looks as though in R the variable cyl is updated before it is used for the 2nd through 4th sums, but in SQL it isn't.

@romainfrancois
Copy link
Member

romainfrancois commented Apr 28, 2018

🤔 I can see how this is confusing, but this is by design, i.e. the summarise_at call essentially generate that:

library(dplyr)
mtcars %>% 
  summarise( 
    cyl   = sum(cyl *  cyl * mpg), 
    disp  = sum(disp *  cyl * mpg), 
    hp    = sum(hp *  cyl * mpg), 
    drat  = sum(drat *  cyl * mpg)
  )
#>       cyl       disp         hp     drat
#> 1 23197.6 2985648964 1957012170 55216714

So the first expression overwrites cyl and this is what subsequent expressions use. One way around it would be to move it to last:

library(dplyr)
mtcars %>% 
  summarise_at(vars(disp,hp,drat,cyl), funs(sum(. * cyl * mpg))) 
#>       disp       hp     drat     cyl
#> 1 859455.2 546183.6 13246.21 23197.6

or first create the cyl*mpg as a separate variable in a mutate.

library(dplyr)

mtcars %>% 
  mutate( ..cylmpg.. = cyl * mpg) %>% 
  summarise_at(vars(cyl,disp,hp,drat), funs(sum(. * ..cylmpg..))) 
#>       cyl     disp       hp     drat
#> 1 23197.6 859455.2 546183.6 13246.21

Another approach could be to give a name to the function:

library(dplyr)
mtcars %>% 
  summarise_at(vars(cyl,disp,hp,drat), funs(f = sum(. * cyl * mpg))) 
#>     cyl_f   disp_f     hp_f   drat_f
#> 1 23197.6 859455.2 546183.6 13246.21

@hadley any thoughts on the issue ?

@romainfrancois
Copy link
Member

Related to this test case #138

test_that("summarise can refer to variables that were just created (#138)", {
  res <- summarise(tbl_df(mtcars), cyl1 = mean(cyl), cyl2 = cyl1 + 1)
  expect_equal(res$cyl2, mean(mtcars$cyl) + 1)

  gmtcars <- group_by(tbl_df(mtcars), am)
  res <- summarise(gmtcars, cyl1 = mean(cyl), cyl2 = cyl1 + 1)
  res_direct <- summarise(gmtcars, cyl2 = mean(cyl) + 1)
  expect_equal(res$cyl2, res_direct$cyl2)
})

@hadley
Copy link
Member

hadley commented Apr 28, 2018

It’s definitely intended behaviour - this matches how summarise and tibble work (which you might not like, but we’re definitely locked into now).

It is supposed to behave the same way in dbplyr too - I’ll have to look into what’s going wrong.

@tslumley
Copy link
Author

I can cope with it either way, but having it different in dplyr and dbplyr is more of a problem.

@hadley hadley added the bug an unexpected problem or unintended behavior label May 20, 2018
@ghost ghost deleted a comment from krlmlr Jun 25, 2018
@ghost
Copy link

ghost commented Jun 25, 2018

This issue was moved by krlmlr to tidyverse/dbplyr/issues/116.

@ghost ghost closed this as completed Jun 25, 2018
@lock
Copy link

lock bot commented Dec 22, 2018

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

@lock lock bot locked and limited conversation to collaborators Dec 22, 2018
This issue was closed.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants