Skip to content

dbplyr >= 2.3.0 incorrectly combines distinct() and select(), giving wrong results #1141

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
tilo-aok opened this issue Feb 8, 2023 · 1 comment · Fixed by #1142
Closed
Milestone

Comments

@tilo-aok
Copy link

tilo-aok commented Feb 8, 2023

Since dbplyr 2.3.0, combinations of select() and distinct() are combined to erronous sql-queries, giving incorrect results.

The queries df |> select() |> distinct() and df |> distinct() |> select() should produce different sql queries, instead they are the same. Both queries would result in output x=1:

library(tidyverse)
library(dbplyr)
#> 
#> Attache Paket: 'dbplyr'
#> Die folgenden Objekte sind maskiert von 'package:dplyr':
#> 
#>     ident, sql
lazy_frame(x = 1, y = 1:2) |> select(x) |> distinct() |> print()
#> <SQL>
#> SELECT DISTINCT `x`
#> FROM `df`
lazy_frame(x = 1, y = 1:2) |> distinct() |> select(x) |> print()
#> <SQL>
#> SELECT DISTINCT `x`
#> FROM `df`

Before version 2.3.0, the correct output was as follows. The first query would result in x=1, the second query would result in x=c(1, 1)

library(tidyverse)
library(dbplyr)
#> 
#> Attache Paket: 'dbplyr'
#> Die folgenden Objekte sind maskiert von 'package:dplyr':
#> 
#>     ident, sql
lazy_frame(x = 1, y = 1:2) |> select(x) |> distinct() |> print()
#> <SQL>
#> SELECT DISTINCT `x`
#> FROM `df`
lazy_frame(x = 1, y = 1:2) |> distinct() |> select(x) |> print()
#> <SQL>
#> SELECT `x`
#> FROM (
#>   SELECT DISTINCT *
#>   FROM `df`
#> ) `q01`

I consider this issue to break the functionality of dbplyr

@mgirlich
Copy link
Collaborator

mgirlich commented Feb 9, 2023

Thanks for filing this issue. It is now fixed in the dev version (together with a couple of other issues with incorrect SQL). You can install it via devtools::install_github("tidyverse/dbplyr").

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants