MSSQL connection. Errors in dplyr select() after arrange #94
@hadley commented on Nov 2, 2017, 8:47 PM UTC:
Minimal reprex:
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mf <- copy_to(con, data.frame(x = 1:5, y = 5:1), name = "test")
mf %>%
arrange(x) %>%
select(y) %>%
show_query()
#> <SQL>
#> SELECT `y`
#> FROM (SELECT *
#> FROM `test`
#> ORDER BY `x`)
DBI::dbGetQuery(con, "SELECT y FROM test ORDER BY x")
#> y
#> 1 5
#> 2 4
#> 3 3
#> 4 2
#> 5 1
Ideally this would only generate one query because conceptually the select happens after the arrange. I think that implies we can fix this issue by reordering
present <- c(
where = length(x$where) > 0,
group_by = length(x$group_by) > 0,
having = length(x$having) > 0,
select = !identical(x$select, sql("*")),
distinct = x$distinct,
order_by = length(x$order_by) > 0,
limit = !is.null(x$limit)
)
Currently
And indeed if we move
SELECT `y`
FROM `test`
ORDER BY `x`
It remains to consider if this is actually correct, i.e. are there situations when this change would yield invalid SQL?
@hadley commented on Nov 2, 2017, 9:13 PM UTC:
Ah I think the problem with performing this optimisation is this query:
memdb_frame(x = 1:2) %>%
arrange(x) %>%
mutate(x = -x)
This should return
which yields
But this is a
memdb_frame(x = 1:2, y = 3:2) %>%
arrange(x) %>%
select(x = y) %>%
show_query()
#> SELECT `y` AS `x`
#> FROM `bmvfznmfws`
#> ORDER BY `x`
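The collapsed query above sorts by a different column than the pipeline asked for. The following is a minimal sketch of that difference, assuming an in-memory SQLite database and a hypothetical table named `t` (neither the table name nor the qualified `t.x` comparison query comes from the thread). In SQLite, a bare `ORDER BY x` resolves to the output alias `x` (the renamed `y`), while `ORDER BY t.x` refers to the original column.

```r
library(DBI)

# Assumption: in-memory SQLite stands in for the database used in the thread.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Same data as the memdb_frame() example: x = 1:2, y = 3:2.
dbWriteTable(con, "t", data.frame(x = 1:2, y = c(3L, 2L)))

# Intended semantics: sort by the original column x, then expose y as x.
intended <- dbGetQuery(con, "SELECT y AS x FROM t ORDER BY t.x")

# Collapsed single query: the bare name x now refers to the alias (i.e. y).
collapsed <- dbGetQuery(con, "SELECT y AS x FROM t ORDER BY x")

print(intended$x)
print(collapsed$x)

dbDisconnect(con)
```

The two results come back in opposite row order, which is why merging the `ORDER BY` into the outer `SELECT` is not safe whenever the select renames a column onto a name used for sorting.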
Reprex of original problem:
library(DBI)
library(dplyr)
con <- dbConnect(odbc::odbc(), "SQL Server", database = "airontime")
x <- tbl(con, "airlines")
x %>%
arrange(carrier) %>%
select(name) %>%
head()
#> Error in new_result(connection@ptr, statement) :
#> nanodbc/nanodbc.cpp:1344: HY000: The ORDER BY clause is invalid in views, inline functions,
#> derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is
#> also specified., Statement(s) could not be prepared.,
Because this is the generated SQL:
SELECT TOP 6 "name" FROM
(SELECT * FROM "airlines" ORDER BY "carrier") "jmausnbmvw"
so another option might be to try and push
Note that the problem also occurs for other queries:
x %>%
arrange(carrier) %>%
select(name) %>%
mutate(name = substr(name, 1, 1)) %>%
collect()
i.e. it's the
Having read through the MS SQL docs, I don't think there's anything dbplyr can do about this; you just need to make sure that
It would be nice if we could give a better error message here, but there's no easy way to do it, and given that no one else has commented on this issue, it seems unlikely to be a common problem.
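One workaround reported in the original issue is to reverse the `select()` and `arrange()` calls so the sort comes last. The sketch below shows that pipeline shape under stated assumptions: it uses an in-memory SQLite database as a stand-in, and the three-row `airlines` data is invented for illustration. It therefore does not verify SQL Server behaviour, and the SQL printed by `show_query()` may differ between dbplyr versions. Note that the sort column has to be kept in the `select()` for the later `arrange()` to see it.

```r
library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

# Assumption: SQLite stands in for the SQL Server connection in the thread.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample rows; the real airlines table is not shown in the thread.
airlines <- copy_to(
  con,
  data.frame(
    carrier = c("UA", "AA", "DL"),
    name = c("United", "American", "Delta"),
    stringsAsFactors = FALSE
  ),
  name = "airlines"
)

# Select first (keeping the sort column), sort as the final step.
sorted_last <- airlines %>%
  select(carrier, name) %>%
  arrange(carrier)

show_query(sorted_last)
result <- collect(sorted_last)

DBI::dbDisconnect(con)
```

As the original report points out, this only helps when the sort does not need to happen before later steps in the pipe.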
@pssguy commented on Aug 29, 2017, 10:22 PM UTC:
I am attempting to use an MSSQL connection and hitting this issue
I first replicate the example from the dbplyr intro
This works fine
Now with an MSSQL connection
Reversing the select and arrange commands
This works in a simple example but I will sometimes need to arrange data prior to other processes in a pipe
When I look at the problem code, it does not exactly replicate the error, i.e. there is no mention of TOP 1000
Trying several alternatives in SQL
something that looks like error code
replacing top 1000 in sub-query produces the desired output
Not sure if this is an error or just something that has not yet been addressed for MSSQL.
p.s. Why no issues option under dbplyr?
This issue was moved by hadley from tidyverse/dplyr/issues/3062.