Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow calls to some window function without frame/partition/order #1061

Closed
tomasgreif opened this issue Apr 7, 2015 · 5 comments
Closed

Allow calls to some window function without frame/partition/order #1061

tomasgreif opened this issue Apr 7, 2015 · 5 comments
Labels
Milestone

Comments

@tomasgreif
Copy link

@tomasgreif tomasgreif commented Apr 7, 2015

Currently, it is not possible to call window functions without one of frame, partition, or order. However, for many database systems (namely PostgreSQL, and Oracle), such calls are valid (and often useful), for example:

select avg(number) over() from table
select min(number) over() from table
select max(number) over() from table
select sum(number) over() from table
select count(number) over() from table

All types of win_recycled functions as defined in translate-sql-base.r can be called with no frame, partition, and order clause. Note that according to (SQL:2003)[http://en.wikipedia.org/wiki/Select_%28SQL%29#Window_function] both partition and order clause are optional.

@hadley hadley added this to the 0.5 milestone May 19, 2015
@krlmlr
Copy link
Member

@krlmlr krlmlr commented Nov 12, 2015

Which dplyr code is supposed to produce these SQL constructs?

@tomasgreif
Copy link
Author

@tomasgreif tomasgreif commented Nov 12, 2015

In PostgreSQL

create table dual (dummy character varying);
insert into dual values ('X');
select * from dual;

In R:

> my_db <- src_postgres(host = "10.81.130.1", user = "tgreif", password = "tgreif", port = 6632, db = 'new_clients')
> my_tbl <- tbl(my_db, "dual")
> mutate(my_tbl, min_dual = min(dummy)) %>% show_query()
<SQL>
SELECT "dummy", "min_dual"
FROM (SELECT "dummy", min("dummy") OVER (ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS "min_dual"
FROM "dual") AS "zzz10"
> my_tbl <- mutate(my_tbl, min_dual = min(dummy))
> collect(my_tbl)
Source: local data frame [1 x 2]

  dummy min_dual
  (chr)    (chr)
1     X        X

In PostgreSQL database, this works. However, in Oracle database it does not, claiming that "missing ORDER by expression in the window specification". Using select min(dummy) over() from dual; it works both in PostgreSQL and Oracle. The window specification ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is not necessary (and from my point of view it should not be allowed even in PostgreSQL - there is no inherent order of rows in db table and therefore you cannot tell what rows are preceding/following without explicit order by expression).

@hadley
Copy link
Member

@hadley hadley commented Mar 10, 2016

@tomasgreif this should be a straightforward fix at the over() level.

Do you think it's a reasonable principle to throw an error if frame is specified but order is not?

@hadley
Copy link
Member

@hadley hadley commented Mar 10, 2016

@tomasgreif do you think it's reasonable to error if the frame is specified, but the order is not? (I'll at least warn)

@hadley
Copy link
Member

@hadley hadley commented Mar 10, 2016

Currently this works without a warning:

tbl %>%
  group_by(g) %>%
  filter(cumsum(x) > 3)

@hadley hadley closed this in 19ae0ce Mar 10, 2016
@lock lock bot locked as resolved and limited conversation to collaborators Jun 9, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
3 participants