-
Notifications
You must be signed in to change notification settings - Fork 179
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Query text is not deterministic due to generated subquery aliases #336
Comments
This also causes the package's tests to fail when re-run a second time, if the package is not reloaded to reset the global alias counter first. |
You can temporarily set the library(tidyverse)
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
lazy_frame(a = 1) %>%
select(b = a) %>%
mutate(c = b)
#> <SQL>
#> SELECT `b`, `b` AS `c`
#> FROM (SELECT `a` AS `b`
#> FROM `df`) `dbplyr_001`
lazy_frame(a = 1) %>%
select(b = a) %>%
mutate(c = b)
#> <SQL>
#> SELECT `b`, `b` AS `c`
#> FROM (SELECT `a` AS `b`
#> FROM `df`) `dbplyr_002`
rlang::with_options(
dbplyr_table_num = 0,
lazy_frame(a = 1) %>% select(b = a) %>% mutate(c = b) %>% print()
)
#> <SQL>
#> SELECT `b`, `b` AS `c`
#> FROM (SELECT `a` AS `b`
#> FROM `df`) `dbplyr_001` Created on 2020-05-19 by the reprex package (v0.3.0) |
What if we just reset the counter prior at the start of query generation? I think that would be relatively simple to do, and shouldn't have far reaching effects. |
Also need to combine |
Ok, the real problem is that |
Some database engines like Redshift rely on exact query text matches for result caching. Currently, dbplyr will generate a rolling ID for subquery aliases that means any query comprising a subquery will consistently result in cache misses. See below for a small repro case.
Last year, I created a minimal fork to fix this for an internal use-case but I'd like to upstream this properly :) This basically just set all user-inaccessible subquery aliases to
"a"
, but this probably isn't ideal, and I imagine using some dbplyr-specific string similar to the sequential ones would be more sensible. That said, this has been running in a production setting for over a year with no issues that I'm aware of.Personally I'm only really familiar with the Postgres/Redshift implementation of the spec and so I'm not sure what effect this could have on other DB engines. I'm happy to create a PR myself with a revised change if people think this is reasonable, but is it worth making it specific to each backend, or perhaps gated behind an option? I'd appreciate any suggestions or thoughts here.
Created on 2019-07-14 by the reprex package (v0.3.0)
The text was updated successfully, but these errors were encountered: