Some database engines like Redshift rely on exact query text matches for result caching. Currently, dbplyr will generate a rolling ID for subquery aliases that means any query comprising a subquery will consistently result in cache misses. See below for a small repro case.
Last year, I created a minimal fork to fix this for an internal use-case but I'd like to upstream this properly :) This basically just set all user-inaccessible subquery aliases to "a", but this probably isn't ideal, and I imagine using some dbplyr-specific string similar to the sequential ones would be more sensible. That said, this has been running in a production setting for over a year with no issues that I'm aware of.
Personally I'm only really familiar with the Postgres/Redshift implementation of the spec and so I'm not sure what effect this could have on other DB engines. I'm happy to create a PR myself with a revised change if people think this is reasonable, but is it worth making it specific to each backend, or perhaps gated behind an option? I'd appreciate any suggestions or thoughts here.
pg_table<- lazy_frame(data.frame(x=1, y=2), simulate_postgres())
#> <SQL>#> SELECT *#> FROM (SELECT *#> FROM `df`#> WHERE (`x` = 1.0)) `dbplyr_001`#> WHERE (`y` = 2.0)pg_table %>%
#> <SQL>#> SELECT *#> FROM (SELECT *#> FROM `df`#> WHERE (`x` = 1.0)) `dbplyr_002`#> WHERE (`y` = 2.0)
Ok, the real problem is that unique_name() should really be unique_subquery_name(), it's the subquery names that need to be reset, not the table names (which are primarily used for temporary tables, i.e. testing).