-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
inconsistency between dplyr and dbplyr summarise_at when variable names reused #3547
Comments
🤔 I can see how this is confusing, but this is by design, i.e. the library(dplyr)
mtcars %>%
summarise(
cyl = sum(cyl * cyl * mpg),
disp = sum(disp * cyl * mpg),
hp = sum(hp * cyl * mpg),
drat = sum(drat * cyl * mpg)
)
#> cyl disp hp drat
#> 1 23197.6 2985648964 1957012170 55216714 So the first expression overwrites library(dplyr)
mtcars %>%
summarise_at(vars(disp,hp,drat,cyl), funs(sum(. * cyl * mpg)))
#> disp hp drat cyl
#> 1 859455.2 546183.6 13246.21 23197.6 or first create the library(dplyr)
mtcars %>%
mutate( ..cylmpg.. = cyl * mpg) %>%
summarise_at(vars(cyl,disp,hp,drat), funs(sum(. * ..cylmpg..)))
#> cyl disp hp drat
#> 1 23197.6 859455.2 546183.6 13246.21 Another approach could be to give a name to the function: library(dplyr)
mtcars %>%
summarise_at(vars(cyl,disp,hp,drat), funs(f = sum(. * cyl * mpg)))
#> cyl_f disp_f hp_f drat_f
#> 1 23197.6 859455.2 546183.6 13246.21 @hadley any thoughts on the issue ? |
Related to this test case #138 test_that("summarise can refer to variables that were just created (#138)", {
res <- summarise(tbl_df(mtcars), cyl1 = mean(cyl), cyl2 = cyl1 + 1)
expect_equal(res$cyl2, mean(mtcars$cyl) + 1)
gmtcars <- group_by(tbl_df(mtcars), am)
res <- summarise(gmtcars, cyl1 = mean(cyl), cyl2 = cyl1 + 1)
res_direct <- summarise(gmtcars, cyl2 = mean(cyl) + 1)
expect_equal(res$cyl2, res_direct$cyl2)
}) |
It’s definitely intended behaviour - this matches how summarise and tibble work (which you might not like, but we’re definitely locked into now). It is supposed to behave the same way in dbplyr too - I’ll have to look into what’s going wrong. |
I can cope with it either way, but having it different in dplyr and dbplyr is more of a problem. |
This issue was moved by krlmlr to tidyverse/dbplyr/issues/116. |
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/ |
When
summarise_at
uses a variable in.vars
that it also uses in.funs
, the result is different in dplyr and dbplyr, and I think the dbplyr result is the right one -- though there's a case for just making it an error.It looks as though in R the variable
cyl
is updated before it is used for the 2nd through 4th sums, but in SQL it isn't.The text was updated successfully, but these errors were encountered: