Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend sql_vector or add possibility to treat over() differently #1060

Closed
tomasgreif opened this issue Apr 7, 2015 · 2 comments
Closed

Extend sql_vector or add possibility to treat over() differently #1060

tomasgreif opened this issue Apr 7, 2015 · 2 comments
Labels
bug an unexpected problem or unintended behavior
Milestone

Comments

@tomasgreif
Copy link

Some SQL engines (namely Oracle) do not support currently generated SQL for window functions where more partitions or order by expressions are used, for example:

win_test <- data.frame(A = c('a', 'a', 'b', 'b', 'c', 'c'),
                       B = c('d', 'd', 'd', 'e', 'e', 'e'),
                       C = c(1:6)
                       )
copy_to(my_db, win_test, name = 'WIN_TEST')
win_test_ora <- tbl(my_db, from = 'WIN_TEST')

x <- mutate(group_by(win_test_ora, A, B), 
            leadf  = order_by(c(A, C), lead(A, 1, 'x'))

)

This generates SQL:

<SQL>
SELECT "A", "B", "C", "leadf"
FROM (SELECT "A", "B", "C", LEAD("A", 1.0, 'x') OVER (PARTITION BY ("A", "B")  ORDER BY ("A", "C") AS "leadf"
FROM "WIN_TEST") "_W17"

The only issue is that parentheses are used to enclose each set of expressions. This performed by sql_vector function.

@mwillumz
Copy link

mwillumz commented Apr 9, 2015

Greenplum acts the same way. My current approach is to fork dplyr on github and modify the sql_vector() to parens=FALSE for the partition and order vectors. As you mention this seems to be the only issue. Not ideal but I've not been able to find a better temporary fix. Also, Greenplum is built off of Postgres so I'm a bit surprised that the syntax is different. Would postgres work without the parentheses?

mtcars2 <- copy_to(src_postgres(), mtcars)

Test <- group_by(mtcars2,cyl,gear) %>%
  arrange(mpg) %>%
  mutate(low=first(mpg))

Test$query

Test

But if I add parens=FALSE via fixInNamespace(over,'dplyr') the following runs as expected.

mtcars2 <- copy_to(src_postgres(), mtcars)

Test <- group_by(mtcars2,cyl,gear) %>%
  arrange(mpg) %>%
  mutate(low=first(mpg))

Test$query

@hadley hadley added this to the 0.5 milestone May 19, 2015
@piccolbo
Copy link

Add HiveQL and SparkSQL to the list of engines who can not handle the extra parentheses.

@hadley hadley added bug an unexpected problem or unintended behavior SQL labels Oct 22, 2015
@hadley hadley closed this as completed in fe0d396 Mar 10, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants