Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

last() and nth() don't work correctly for SQLite and perhaps other databases #366

Closed
krlmlr opened this issue Oct 8, 2019 · 3 comments · Fixed by #372
Closed

last() and nth() don't work correctly for SQLite and perhaps other databases #366

krlmlr opened this issue Oct 8, 2019 · 3 comments · Fixed by #372
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL

Comments

@krlmlr
Copy link
Member

krlmlr commented Oct 8, 2019

Window frame window_frame() is currently ignored for last() and nth():

library(tidyverse)
library(dbplyr)
tbl <- memdb_frame(a = 4:1, g = rep(1:2, 2))

print.tbl_dbi <- function(x, ...) {
  message(sql_render(x))
  NextMethod()
}

# first() seems to work.
tbl %>%
  arrange(a) %>%
  group_by(g) %>%
  mutate(l = first(a))
#> SELECT `a`, `g`, FIRST_VALUE(`a`) OVER (PARTITION BY `g` ORDER BY `a`) AS `l`
#> FROM (SELECT *
#> FROM `dbplyr_001`
#> ORDER BY `a`)
#> # Source:     lazy query [?? x 3]
#> # Database:   sqlite 3.29.0 [:memory:]
#> # Groups:     g
#> # Ordered by: a
#>       a     g     l
#>   <int> <int> <int>
#> 1     2     1     2
#> 2     4     1     2
#> 3     1     2     1
#> 4     3     2     1

# last() doesn't:
tbl %>%
  group_by(g) %>%
  arrange(a) %>%
  mutate(l = last(a))
#> SELECT `a`, `g`, LAST_VALUE(`a`) OVER (PARTITION BY `g` ORDER BY `a`) AS `l`
#> FROM (SELECT *
#> FROM `dbplyr_001`
#> ORDER BY `a`)
#> # Source:     lazy query [?? x 3]
#> # Database:   sqlite 3.29.0 [:memory:]
#> # Groups:     g
#> # Ordered by: a
#>       a     g     l
#>   <int> <int> <int>
#> 1     2     1     2
#> 2     4     1     4
#> 3     1     2     1
#> 4     3     2     3

# We need "ROWS BETWEEN CURRENT AND UNBOUNDED FOLLOWING":
tbl %>%
  mutate(l = sql(!!win_over(sql("LAST_VALUE(a)"), "g", "a", c(0, Inf), con = tbl$src$con)))
#> SELECT `a`, `g`, LAST_VALUE(a) OVER (PARTITION BY `g` ORDER BY `a` ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS `l`
#> FROM `dbplyr_001`
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.29.0 [:memory:]
#>       a     g     l
#>   <int> <int> <int>
#> 1     2     1     4
#> 2     4     1     4
#> 3     1     2     3
#> 4     3     2     3

Created on 2019-10-08 by the reprex package (v0.3.0)

@hannesmuehleisen: Do you know if the default range specification for window functions is standardized across databases? SQLite has:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS

which explains why the example fails, but does it necessarily fail for other databases? Should we always pass an unbounded range to mimic dplyr semantics?

@hannes
Copy link
Contributor

hannes commented Oct 9, 2019

@krlmlr Don't know about the standard, but at least SQLite, Postgres, SQL Server and DuckDB default to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

@krlmlr
Copy link
Member Author

krlmlr commented Oct 9, 2019

Thanks. So we really need RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING for last(), and RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for nth() ?

@krlmlr
Copy link
Member Author

krlmlr commented Oct 9, 2019

For last() it seems we can use first() and desc():

library(tidyverse)
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

tbl <- memdb_frame(a = 4:1, g = rep(1:2, 2))

print.tbl_dbi <- function(x, ...) {
  message(sql_render(x))
  NextMethod()
}

tbl %>%
  group_by(g) %>%
  mutate(l = first(a, order_by = desc(a)))
#> SELECT `a`, `g`, FIRST_VALUE(`a`) OVER (PARTITION BY `g` ORDER BY `a` DESC) AS `l`
#> FROM `dbplyr_001`
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.29.0 [:memory:]
#> # Groups:   g
#>       a     g     l
#>   <int> <int> <int>
#> 1     4     1     4
#> 2     2     1     4
#> 3     3     2     3
#> 4     1     2     3

Created on 2019-10-09 by the reprex package (v0.3.0)

@hadley hadley added feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL labels Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature a feature request or enhancement func trans 🌍 Translation of individual functions to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants