dplyr::arrange() with the .by_group=TRUE parameter set produces SQL with an error #115

Closed
ghost opened this issue Jun 25, 2018 · 2 comments
Labels: bug (an unexpected problem or unintended behavior), verb trans 🤖 (Translation of dplyr verbs to SQL), wip (work in progress)

ghost commented Jun 25, 2018

@CerebralMastication commented on Apr 15, 2018, 1:56 PM UTC:

This is a cross post from the RStats Community, where I initially posted it: https://community.rstudio.com/t/dplyr-arrange-by-group-true-fails-with-sql-backend/7232

I've either got a misunderstanding or a bug... I think it's a bug.

It seems that dplyr::arrange() with the .by_group=TRUE parameter set produces SQL with an error. Here's how to reprex it:

On the DB (Redshift in my case) set up a dummy table:

DROP TABLE IF EXISTS sandbox.testorder;

CREATE TABLE sandbox.testorder (
    grp varchar(255),
    n  DOUBLE PRECISION
);

INSERT INTO sandbox.testorder (grp , n) VALUES ('a',3.3);
INSERT INTO sandbox.testorder (grp , n) VALUES ('a',1.1);
INSERT INTO sandbox.testorder (grp , n) VALUES ('b',2.2);
INSERT INTO sandbox.testorder (grp , n) VALUES ('b',4.4);

Then from R (presuming a connection to the DB called con, and dbplyr already loaded):

testorder <- tbl(con, "testorder")

out_test <- testorder %>%
  group_by(grp) %>%
  arrange(n, .by_group = TRUE)

show_query(out_test)

which generates the following SQL:

SELECT *
FROM "testorder"
ORDER BY "n", TRUE

which fails if I try to collect(out_test) with the following error:

Error in postgresqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not Retrieve the result : ERROR:  non-integer constant in ORDER BY

The rub seems to be the , TRUE there at the end. If I remove it, I get the following runnable SQL:

SELECT *
FROM "testorder"
ORDER BY "n"

My guess is that the routine that generates the SQL has a glitch. Looks like it's just passing , TRUE instead of adding in the group by variables.
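
For comparison, here is a small sketch of what .by_group = TRUE does on a local (in-memory) table, which is presumably the behavior the SQL translation is meant to mirror. The local_df name and the choice to rebuild the four rows locally are assumptions made for illustration; nothing here touches the database.

```r
library(dplyr, warn.conflicts = FALSE)

# Local copy of the four rows inserted into sandbox.testorder above
# (local_df is an illustrative name, not part of the original report).
local_df <- tibble(
  grp = c("a", "a", "b", "b"),
  n   = c(3.3, 1.1, 2.2, 4.4)
)

# With .by_group = TRUE, the grouping variable is sorted first, then n.
sorted <- local_df %>%
  group_by(grp) %>%
  arrange(n, .by_group = TRUE)

sorted$n
#> [1] 1.1 3.3 2.2 4.4
```

By analogy, one would expect the SQL backend to order by "grp" and then "n", rather than emitting a literal TRUE in the ORDER BY clause.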

This issue was moved by krlmlr from tidyverse/dplyr/issues/3515.

ghost commented Jun 25, 2018

@hadley commented on May 20, 2018, 1:43 PM UTC:

Minimal reprex

library(dplyr, warn.conflicts = FALSE)
lf <- dbplyr::lazy_frame(x = 1, y = 1, src = dbplyr::simulate_dbi())

lf %>%
  group_by(x) %>%
  arrange(y, .by_group = TRUE) %>%
  show_query()
#> <SQL> SELECT *
#> FROM "df"
#> ORDER BY "y", TRUE

Created on 2018-05-20 by the reprex package (v0.2.0).

@hadley added the labels bug (an unexpected problem or unintended behavior) and verb trans 🤖 (Translation of dplyr verbs to SQL) on Jan 2, 2019
hadley commented Jan 2, 2019

@edgararuiz do you want to take this one? It should require adding an additional argument to arrange.tbl_lazy(), saving the boolean in the op_arrange object, and then tweaking op_sort.op_arrange()
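
The shape of that suggestion can be sketched in plain R, without relying on dbplyr internals. Everything below is hypothetical: sketch_op_arrange(), sketch_sort_vars() and sketch_order_by() are stand-in names invented for this illustration, not dbplyr's actual functions, and the real fix may well look different. The idea is only that the boolean is stored on the operation and consulted when the sort variables are assembled.

```r
# Hypothetical stand-in for an arrange operation that remembers .by_group.
# This is NOT dbplyr's op_arrange; it only mirrors the idea in the comment above.
sketch_op_arrange <- function(order_vars, group_vars, by_group = FALSE) {
  structure(
    list(order_vars = order_vars, group_vars = group_vars, by_group = by_group),
    class = "sketch_op_arrange"
  )
}

# Assemble the sort variables: when by_group is TRUE, the grouping variables
# come first; unique() avoids repeating a column that appears in both sets.
sketch_sort_vars <- function(op) {
  if (isTRUE(op$by_group)) {
    unique(c(op$group_vars, op$order_vars))
  } else {
    op$order_vars
  }
}

# Render a naive ORDER BY clause with double-quoted identifiers.
sketch_order_by <- function(op) {
  paste0("ORDER BY ", paste0('"', sketch_sort_vars(op), '"', collapse = ", "))
}

op <- sketch_op_arrange(order_vars = "y", group_vars = "x", by_group = TRUE)
sketch_order_by(op)
#> [1] "ORDER BY \"x\", \"y\""
```

With by_group = FALSE the same call would produce ORDER BY "y" only, so in neither case does the boolean itself leak into the generated SQL.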
