Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"ORDER BY is ignored in subqueries without LIMIT" without a subquery #562

Closed
andrew-schulman opened this issue Dec 15, 2020 · 6 comments
Closed
Labels
reprex needs a minimal reproducible example

Comments

@andrew-schulman
Copy link

andrew-schulman commented Dec 15, 2020

With dplyr 1.0.3 and dbplyr 2.0.0, I create a query:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
dbpool <- pool::dbPool(RPostgres::Postgres(), ...)
query <- dplyr::tbl(dbpool, dbplyr::ident_q('echo_dfr.x_cwa_limits')) %>%
dplyr::select(external_permit_nmbr) %>%
dplyr::distinct() %>%
dplyr::arrange(external_permit_nmbr)

This works fine, and gives the SQL I expect:

dplyr::show_query(query)
#> <SQL>
#> SELECT DISTINCT "external_permit_nmbr"
#> FROM echo_dfr.x_cwa_limits
#> ORDER BY "external_permit_nmbr"

So far, so good. But when I run the query, I get a warning about a subquery:

result <- dplyr::pull(query)
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> i Do you need to move arrange() later in the pipeline or use window_order() instead?

This doesn't seem to make sense, because the generated SQL doesn't include a subquery. But indeed the result isn't sorted:

result[1:5]
#> [1] "WY0050091" "IL0034592" "WVG414531" "VAG110338" "WVG415225"

If I submit the above SQL to Postgres directly using pool::dbGetQuery(), without using dbplyr, it works as expected:

result <- pool::dbGetQuery(dbpool, 
'SELECT DISTINCT "external_permit_nmbr" FROM echo_dfr.x_cwa_limits ORDER BY "external_permit_nmbr"')

The result is sorted, and I don't get a warning.

@andrew-schulman
Copy link
Author

andrew-schulman commented Dec 15, 2020

I just verified that I get the same warning when the database driver is odbc::odbc(), although the generated SQL is slightly different (table name is in parentheses):

dplyr::show_query(query)
#> <SQL>
#> SELECT DISTINCT "EXTERNAL_PERMIT_NMBR"
#> FROM (echo_dfr.x_cwa_limits) 
#> ORDER BY "EXTERNAL_PERMIT_NMBR"
result <- dplyr::pull(query)
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> i Do you need to move arrange() later in the pipeline or use window_order() instead?

@hadley
Copy link
Member

hadley commented Jan 19, 2021

Could you please make a reprex, following the advice in https://dbplyr.tidyverse.org/articles/reprex.html?

@hadley hadley added the reprex needs a minimal reproducible example label Jan 19, 2021
@hadley
Copy link
Member

hadley commented Jan 21, 2021

Ok, here's a reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = "test",
  host = "localhost",
  user = ""
)

db <- copy_to(con, data.frame(x = 1:2), "test")

db2 <- db %>% 
  select(x) %>% 
  distinct() %>% 
  arrange(x)

db2 %>% collect()
#> # A tibble: 2 x 1
#>       x
#>   <int>
#> 1     1
#> 2     2
db2 %>% pull()
#> Warning: ORDER BY is ignored in subqueries without LIMIT
#> ℹ Do you need to move arrange() later in the pipeline or use window_order() instead?
#> [1] 1 2

Created on 2021-01-21 by the reprex package (v0.3.0.9001)

The problem seems to be specifically with pull()

@hadley
Copy link
Member

hadley commented Jan 21, 2021

And the problem is that pull() itself does a select to ensure that you're only retrieving a single column. I think the easiest fix is for pull() not to do that additional select when there's already only one column in the result.

@hadley
Copy link
Member

hadley commented Jan 21, 2021

Fixed in e2c383a

@hadley hadley closed this as completed Jan 21, 2021
@andrew-schulman
Copy link
Author

Wow. Thanks. I was going to find time today to make a reprex...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
reprex needs a minimal reproducible example
Projects
None yet
Development

No branches or pull requests

2 participants