`mutate()` is incorrectly inlined after `distinct()` #1119
Moved to dbplyr.
More minimal reprex:

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

memdb_frame(x = 1:2) %>%
  distinct(x) %>%
  mutate(x = 0) %>%
  collect()
#> # A tibble: 1 × 1
#>       x
#>   <dbl>
#> 1     0
```

Created on 2023-02-02 with reprex v2.0.2
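Since `distinct(x)` keeps both rows of `x = 1:2`, the expected result is two rows, both with `x = 0`. The base-R sketch below (no database needed) shows why inlining changes the answer. It assumes, without reproducing dbplyr's exact SQL, that the inlined query behaves like `SELECT DISTINCT 0 AS x FROM df`, whereas the correct query keeps the `DISTINCT` in a subquery. Deduplicating before overwriting `x` keeps both rows; overwriting first collapses them into the single row the reprex returns.

```r
# Base-R sketch of the two possible evaluation orders for
# memdb_frame(x = 1:2) %>% distinct(x) %>% mutate(x = 0)
df <- data.frame(x = 1:2)

# Correct order: deduplicate first, then overwrite x.
# Roughly SELECT 0 AS x FROM (SELECT DISTINCT x FROM df)
correct <- unique(df["x"])
correct$x <- 0

# Inlined order (assumed): overwrite x first, then deduplicate.
# Roughly SELECT DISTINCT 0 AS x FROM df
tmp <- df
tmp$x <- 0
inlined <- unique(tmp["x"])

nrow(correct)  # 2 rows, as expected
nrow(inlined)  # 1 row, matching the buggy reprex output
```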
@vadim-cherepanov Thanks for filing the issue. This should be fixed in the dev version 😄
Thank you for taking care of this. I am just wondering what changes behind the scenes caused this inlining of `mutate()`, and whether we are sure the same changes did not break something else in other cases.
The queries generated by dbplyr used to consist of a lot of subqueries. For version 2.3.0 we tried to reduce the number of subqueries. This improves readability, often quite a lot, and can also result in faster queries (e.g. in the case of multiple joins).
I just updated to dplyr v1.1.0 + dbplyr 2.3.0, and my code broke. Upon investigation, it seems that an incorrect SQL query is now generated. In my case, I have `distinct()` before the `group_by()` block. If I add `compute()` between them, i.e. the result of `distinct()` is actually computed and stored in a temporary table, `n()` returns values as expected.

This is the SQL query generated by dplyr v1.1.0 + dbplyr 2.3.0:

Note how it extended `distinct` onto the subsequent query.

And this one is generated (as expected) by dplyr v1.0.10 + dbplyr 2.2.1:
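A similar base-R sketch for the `distinct()`-then-`group_by()` report. The data frame, the column names `g` and `v`, and the use of a per-group count (as with a grouped `mutate(n = n())`) are hypothetical, since the original queries are not shown. If the count is computed before `DISTINCT` is applied, the duplicate row is still counted:

```r
# Hypothetical data: the ("a", 1) row appears twice.
df <- data.frame(g = c("a", "a", "b"), v = c(1, 1, 2))

# Intended order: distinct() first, then count rows per group.
correct <- unique(df)
correct$n <- ave(correct$v, correct$g, FUN = length)

# Inlined order (assumed): count per group on the raw rows,
# then apply DISTINCT to the result, so the duplicate is counted.
inlined <- df
inlined$n <- ave(inlined$v, inlined$g, FUN = length)
inlined <- unique(inlined)

correct$n  # 1 1
inlined$n  # 2 1
```

The same inflation would occur with a grouped `summarise(n = n())`, because a `DISTINCT` applied to already-aggregated rows cannot undo the double counting. Materialising the `distinct()` result first, as the `compute()` workaround does, restores the intended order.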