Skip to content

.keep_all=TRUE breaks distinct for MS SQL Server #1053

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
carlganz opened this issue Nov 23, 2022 · 5 comments · Fixed by #1267
Closed

.keep_all=TRUE breaks distinct for MS SQL Server #1053

carlganz opened this issue Nov 23, 2022 · 5 comments · Fixed by #1267
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Milestone

Comments

@carlganz
Copy link

It seems to work with SQLite but the use of row_number in distinct(.keep_all=TRUE) isn't quite right for MS SQL Server

library(dplyr)
library(dbplyr)

### need to setup your own SQL Server connection
con <- DBI::dbConnect(odbc::odbc)
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars")

### distinct works as expected
mtcars2 %>% distinct()

### gives error "The function 'ROW_NUMBER' must have an OVER clause with ORDER BY"
mtcars2 %>% distinct(carb, .keep_all = TRUE)
@hadley
Copy link
Member

hadley commented Dec 5, 2022

Minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

lf <- lazy_frame(x = 1, y = 2, con = simulate_mysql())

lf |> distinct(x)
#> <SQL>
#> SELECT DISTINCT `x`
#> FROM `df`
lf |> distinct(x, .keep_all = TRUE)
#> <SQL>
#> SELECT `x`, `y`
#> FROM (
#>   SELECT *, ROW_NUMBER() OVER (PARTITION BY `x`) AS `q01`
#>   FROM `df`
#> ) `q01`
#> WHERE (`q01` = 1)

Created on 2022-12-04 with reprex v2.0.2

@hadley
Copy link
Member

hadley commented Dec 5, 2022

I suspect this will work in very few places since ROW_NUMBER() will always need an ORDER BY. Maybe we should just make this an error?

@hadley hadley added bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL labels Dec 5, 2022
@mgirlich
Copy link
Collaborator

Apparently, it works in most databases according to modern SQL - of course it doesn't work in MS SQL and Oracle DB... If we want to support this, we could add a dummy window order. @hadley what do you think?

@hadley
Copy link
Member

hadley commented Apr 26, 2023

Yeah, it would be nice to add a nice dummy window order, I think.

@mgirlich
Copy link
Collaborator

@carlganz As I don't have an SQL Server database I can't test this. Can you install the PR version via remotes::install_github("tidyverse/dbplyr#1267") and test whether this now works?

@mgirlich mgirlich added this to the 2.4.0 milestone Jun 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants