Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend sql_vector or add possibility to treat over() differently #1060

Closed
tomasgreif opened this issue Apr 7, 2015 · 2 comments
Closed

Extend sql_vector or add possibility to treat over() differently #1060

tomasgreif opened this issue Apr 7, 2015 · 2 comments
Labels
Milestone

Comments

@tomasgreif
Copy link

@tomasgreif tomasgreif commented Apr 7, 2015

Some SQL engines (namely Oracle) do not support currently generated SQL for window functions where more partitions or order by expressions are used, for example:

win_test <- data.frame(A = c('a', 'a', 'b', 'b', 'c', 'c'),
                       B = c('d', 'd', 'd', 'e', 'e', 'e'),
                       C = c(1:6)
                       )
copy_to(my_db, win_test, name = 'WIN_TEST')
win_test_ora <- tbl(my_db, from = 'WIN_TEST')

x <- mutate(group_by(win_test_ora, A, B), 
            leadf  = order_by(c(A, C), lead(A, 1, 'x'))

)

This generates SQL:

<SQL>
SELECT "A", "B", "C", "leadf"
FROM (SELECT "A", "B", "C", LEAD("A", 1.0, 'x') OVER (PARTITION BY ("A", "B")  ORDER BY ("A", "C") AS "leadf"
FROM "WIN_TEST") "_W17"

The only issue is that parentheses are used to enclose each set of expressions. This performed by sql_vector function.

@mwillumz
Copy link

@mwillumz mwillumz commented Apr 9, 2015

Greenplum acts the same way. My current approach is to fork dplyr on github and modify the sql_vector() to parens=FALSE for the partition and order vectors. As you mention this seems to be the only issue. Not ideal but I've not been able to find a better temporary fix. Also, Greenplum is built off of Postgres so I'm a bit surprised that the syntax is different. Would postgres work without the parentheses?

mtcars2 <- copy_to(src_postgres(), mtcars)

Test <- group_by(mtcars2,cyl,gear) %>%
  arrange(mpg) %>%
  mutate(low=first(mpg))

Test$query

Test

But if I add parens=FALSE via fixInNamespace(over,'dplyr') the following runs as expected.

mtcars2 <- copy_to(src_postgres(), mtcars)

Test <- group_by(mtcars2,cyl,gear) %>%
  arrange(mpg) %>%
  mutate(low=first(mpg))

Test$query

@hadley hadley added this to the 0.5 milestone May 19, 2015
@piccolbo
Copy link

@piccolbo piccolbo commented Aug 12, 2015

Add HiveQL and SparkSQL to the list of engines who can not handle the extra parentheses.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants