MSSQL connection. Errors in dplyr select() after arrange #3062

pssguy · 2017-08-29T22:22:15Z

I am attempting to use an MSSQL connection and hitting this issue

I first replicate the example from the dbplyr intro

library(odbc)
library(DBI)

library(tidyverse)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "iris", iris)
iris2 <- tbl(con, "iris")

iris2 %>% 
  arrange(Species) %>% 
  select(Sepal.Length)

This works fine

Now with an MSSQL connection

con2 <- dbConnect(odbc::odbc(), DSN = "premier")
DBI::dbWriteTable(con2, "iris", iris, overwrite=TRUE)
iris9 <- tbl(con2, "iris")

test1 <- iris9 %>%
  arrange(Species) %>% 
  select(Sepal.Length)

test1

# Error: <SQL> 'SELECT TOP 1000 "Sepal.Length" AS "Sepal.Length" FROM (SELECT * FROM "iris" ORDER BY "Species") "odeiuzmtqh"' nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.

So the error appears to be related to the use of TOP

Reversing the select and arrange commands

test2 <- iris9 %>%
  select(Sepal.Length) %>% 
  arrange(Species) 

test2 # No error

This works in a simple example but I will sometimes need to arrange data prior to other processes in a pipe

When I inspect the query generated for the failing code, it does not exactly match the query in the error message, i.e. there is no mention of TOP 1000

test1 %>% show_query()
# <SQL>
# SELECT "Sepal.Length" AS "Sepal.Length"
# FROM (SELECT *
# FROM "iris"
# ORDER BY "Species") "omlcgfsrjt"

Trying several alternatives in SQL


SELECT "Sepal.Length" AS "Sepal.Length"
 FROM (SELECT *
 FROM "iris"
 ORDER BY "Species") "omlcgfsrjt"

nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. 
Failed to execute SQL chunk

Something that looks like the query in the error message


SELECT TOP 1000 "Sepal.Length" AS "Sepal.Length"
 FROM (SELECT *
 FROM "iris"
 ORDER BY "Species") "omlcgfsrjt"

  nanodbc/nanodbc.cpp:1587: 42000: [Microsoft][SQL Server Native Client 11.0][SQL Server]The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified. 
Failed to execute SQL chunk

Moving TOP 1000 into the sub-query produces the desired output


SELECT  "Sepal.Length" AS "Sepal.Length"
 FROM (SELECT TOP 1000 *
 FROM "iris"
 ORDER BY "Species") "omlcgfsrjt"

Not sure if this is an error or just something that has not yet been addressed for MSSQL.

p.s. Why no issues option under dbplyr?


edgararuiz-zz · 2017-09-06T00:46:40Z

Hi @pssguy, it looks like this is a quirk of the queries MSSQL accepts. For it to work, the query from test1, which currently translates to this:

# <SQL>
# SELECT "Sepal.Length" AS "Sepal.Length"
# FROM (SELECT *
# FROM "iris"
# ORDER BY "Species") "omlcgfsrjt"

Should actually translate to this:

# <SQL>
# SELECT "Sepal.Length" AS "Sepal.Length"
# FROM "iris"
# ORDER BY "Species"

We will need to figure out a way to optimize this query.

pssguy · 2017-09-07T22:48:05Z

@edgararuiz Thanks. It's not just a matter of optimizing, of course: currently an error is thrown.

edgararuiz-zz · 2017-09-07T23:17:42Z

Right, "optimize" was probably the wrong word choice. I meant that we'll need to figure out a way to merge the two SQL query layers to prevent the error from happening.

imanuelcostigan · 2017-10-18T21:04:56Z

@edgararuiz you may want to look at RSQLServer's sql_select() method, which deals with a number of SQL Server's SELECT idiosyncrasies.

edgararuiz-zz · 2017-10-19T00:43:29Z

Thanks @imanuelcostigan! I'll take a look

hadley · 2017-10-23T16:41:27Z

@edgararuiz do you want to work on a PR for this issue?

edgararuiz-zz · 2017-10-23T16:51:52Z

Yes, I'll be happy to

hadley · 2017-11-02T20:47:53Z

Minimal reprex:

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE) 

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
mf <- copy_to(con, data.frame(x = 1:5, y = 5:1), name = "test")

mf %>%
  arrange(x) %>%
  select(y) %>%
  show_query()
#> <SQL>
#> SELECT `y`
#> FROM (SELECT *
#> FROM `test`
#> ORDER BY `x`)

DBI::dbGetQuery(con, "SELECT y FROM test ORDER BY x")
#>   y
#> 1 5
#> 2 4
#> 3 3
#> 4 2
#> 5 1

Ideally this would only generate one query because conceptually the select happens after the arrange.

I think that implies we can fix this issue by reordering select_query_clauses(), which currently looks like this:

  present <- c(
    where =    length(x$where) > 0,
    group_by = length(x$group_by) > 0,
    having =   length(x$having) > 0,
    select =   !identical(x$select, sql("*")),
    distinct = x$distinct,
    order_by = length(x$order_by) > 0,
    limit    = !is.null(x$limit)
  )

Currently select comes before arrange when really it should come afterwards.

And indeed if we move select to the end then we get:

SELECT `y`
FROM `test`
ORDER BY `x`

It remains to consider whether this is actually correct, i.e. are there situations where this change would yield invalid SQL?

hadley · 2017-11-02T21:13:32Z

Ah I think the problem with performing this optimisation is this query:

memdb_frame(x = 1:2) %>%
    arrange(x) %>%
    mutate(x = -x)

This should return c(-1, -2), but if we collapse the query as described above we generate:

SELECT -`x` AS `x`
FROM `gatlqlicge`
ORDER BY `x`

which yields c(-2, -1) because the ORDER BY clause uses aliases defined in SELECT (as described in https://sqlbolt.com/lesson/select_queries_order_of_execution). This means that this optimisation is not possible in general.
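The alias behaviour can be checked directly against SQLite. This is a minimal sketch, assuming DBI and RSQLite are installed; the table name `t` is made up for the illustration and stands in for the random name that memdb_frame() generates.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "t", data.frame(x = 1:2))  # "t" is a hypothetical table name

# What arrange(x) %>% mutate(x = -x) should give: sort first, then negate.
expected <- -sort(c(1, 2))

# The collapsed single-level query: ORDER BY `x` binds to the alias, i.e. -x.
collapsed <- dbGetQuery(con, "SELECT -`x` AS `x` FROM `t` ORDER BY `x`")$x

expected
#> [1] -1 -2
collapsed
#> [1] -2 -1

dbDisconnect(con)
```

The two results differ only in row order, which is exactly what arrange() is supposed to control.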

But this is a mutate() and the motivating issue is a select(). Can we perform the optimisation at a higher level? I think the answer is no, because select()s can rename variables and this SQL would still be incorrect:

memdb_frame(x = 1:2, y = 3:2) %>%
  arrange(x) %>%
  select(x = y) %>%
  show_query()
#> SELECT `y` AS `x`
#> FROM `bmvfznmfws`
#> ORDER BY `x`
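The renaming case can be run against SQLite in the same way. Again a rough sketch, assuming DBI and RSQLite are available; `t` and `df` are names invented for the example.

```r
library(DBI)

df <- data.frame(x = 1:2, y = 3:2)  # same data as the memdb_frame() above

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "t", df)  # "t" is a hypothetical table name

# What arrange(x) %>% select(x = y) should give: y, in order of the original x.
expected <- df$y[order(df$x)]

# The collapsed query sorts on the alias `x`, which now holds the values of y.
collapsed <- dbGetQuery(con, "SELECT `y` AS `x` FROM `t` ORDER BY `x`")$x

expected
#> [1] 3 2
collapsed
#> [1] 2 3

dbDisconnect(con)
```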

hadley · 2017-11-02T21:18:06Z

But maybe we can just do the optimisation when the select doesn't create any aliases
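As a rough illustration of the condition only, the check "the select creates no aliases" could look something like the base R sketch below. The helper `creates_aliases()` and the shape of its input (a character vector of input column names, optionally named with the output names) are assumptions made for this example; this is not dbplyr's internal representation and not a working fix.

```r
# Hypothetical helper: TRUE if any output column gets a name that differs
# from the input column it is taken from.
creates_aliases <- function(vars) {
  out <- names(vars)
  if (is.null(out)) {
    return(FALSE)  # no names at all, so nothing is renamed
  }
  # An empty name means the column keeps its original name
  unnamed <- out == ""
  out[unnamed] <- vars[unnamed]
  any(out != unname(vars))
}

creates_aliases(c(y = "y"))  # like select(y)
#> [1] FALSE
creates_aliases(c(x = "y"))  # like select(x = y)
#> [1] TRUE
```

Only when such a check returns FALSE would ORDER BY be guaranteed to see the original column rather than an alias.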

hadley · 2017-11-02T21:33:19Z

I tried that and couldn't get it to work 😢

ghost · 2018-06-07T23:40:36Z

This issue was moved by hadley to tidyverse/dbplyr/issues/94.

lock · 2018-12-05T00:38:29Z

This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/

pssguy changed the title ~~MSSQL connection. Errors in dplyr:select after arrange~~ MSSQL connection. Errors in dplyr select() after arrange Aug 29, 2017

hadley added bug an unexpected problem or unintended behavior database labels Oct 23, 2017

hadley added the wip work in progress label Oct 23, 2017

edgararuiz-zz mentioned this issue Oct 25, 2017

Fixes MS SQL issue of select after arrange tidyverse/dbplyr#46

Closed

hadley removed the wip work in progress label Jun 7, 2018

ghost mentioned this issue Jun 7, 2018

MSSQL connection. Errors in dplyr select() after arrange tidyverse/dbplyr#94

Closed

ghost deleted a comment from hadley Jun 7, 2018

ghost closed this as completed Jun 7, 2018

lock bot locked and limited conversation to collaborators Dec 5, 2018

This issue was closed.
