
last() and nth() don't work correctly for SQLite and perhaps other databases #366

@krlmlr


The window frame set via window_frame() is currently ignored by last() and nth():

library(tidyverse)
library(dbplyr)
tbl <- memdb_frame(a = 4:1, g = rep(1:2, 2))

print.tbl_dbi <- function(x, ...) {
  message(sql_render(x))
  NextMethod()
}

# first() seems to work.
tbl %>%
  arrange(a) %>%
  group_by(g) %>%
  mutate(l = first(a))
#> SELECT `a`, `g`, FIRST_VALUE(`a`) OVER (PARTITION BY `g` ORDER BY `a`) AS `l`
#> FROM (SELECT *
#> FROM `dbplyr_001`
#> ORDER BY `a`)
#> # Source:     lazy query [?? x 3]
#> # Database:   sqlite 3.29.0 [:memory:]
#> # Groups:     g
#> # Ordered by: a
#>       a     g     l
#>   <int> <int> <int>
#> 1     2     1     2
#> 2     4     1     2
#> 3     1     2     1
#> 4     3     2     1

# last() doesn't:
tbl %>%
  group_by(g) %>%
  arrange(a) %>%
  mutate(l = last(a))
#> SELECT `a`, `g`, LAST_VALUE(`a`) OVER (PARTITION BY `g` ORDER BY `a`) AS `l`
#> FROM (SELECT *
#> FROM `dbplyr_001`
#> ORDER BY `a`)
#> # Source:     lazy query [?? x 3]
#> # Database:   sqlite 3.29.0 [:memory:]
#> # Groups:     g
#> # Ordered by: a
#>       a     g     l
#>   <int> <int> <int>
#> 1     2     1     2
#> 2     4     1     4
#> 3     1     2     1
#> 4     3     2     3

# We need "ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING":
tbl %>%
  mutate(l = sql(!!win_over(sql("LAST_VALUE(a)"), "g", "a", c(0, Inf), con = tbl$src$con)))
#> SELECT `a`, `g`, LAST_VALUE(a) OVER (PARTITION BY `g` ORDER BY `a` ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS `l`
#> FROM `dbplyr_001`
#> # Source:   lazy query [?? x 3]
#> # Database: sqlite 3.29.0 [:memory:]
#>       a     g     l
#>   <int> <int> <int>
#> 1     2     1     4
#> 2     4     1     4
#> 3     1     2     3
#> 4     3     2     3

Created on 2019-10-08 by the reprex package (v0.3.0)
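To see why LAST_VALUE() returns each row's own value here, the following is a minimal base-R sketch (no database needed) that emulates LAST_VALUE(a) OVER (PARTITION BY g ORDER BY a) on the same data under two frames: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which is SQLite's default when ORDER BY is present, and a frame that reaches UNBOUNDED FOLLOWING. It assumes no ties in a, so peer rows can be ignored. The names last_default and last_full are illustrative only.

```r
# Data from the reprex: a = 4, 3, 2, 1 and g = 1, 2, 1, 2
a <- 4:1
g <- rep(1:2, 2)

# Default frame (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW):
# the frame holds the rows of the same partition whose a is <= the current a,
# so the last row of the frame (ordered by a) is the largest a in it,
# i.e. the current row itself when there are no ties.
last_default <- vapply(seq_along(a), function(i) {
  frame <- a[g == g[i] & a <= a[i]]
  max(frame)
}, integer(1))

# A frame ending at UNBOUNDED FOLLOWING (as in the win_over() workaround)
# ends at the last row of the partition, so the whole partition gives the
# same last value.
last_full <- vapply(seq_along(a), function(i) {
  frame <- a[g == g[i]]
  max(frame)
}, integer(1))

data.frame(a, g, last_default, last_full)
```

last_default reproduces the l column of the failing last() query (every row gets its own a), while last_full matches the output of the win_over() workaround.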

@hannesmuehleisen: Do you know whether the default frame specification for window functions is standardized across databases? SQLite uses:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW EXCLUDE NO OTHERS

which explains why the example fails, but does it necessarily fail on other databases? Should we always pass an unbounded frame to mimic dplyr semantics?
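The same default frame should also affect nth(), which the report lists alongside last() although the reprex does not show it. NTH_VALUE() returns NULL whenever the frame holds fewer than n rows, so under RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW the first n - 1 rows of each partition get NULL. Below is a base-R sketch emulating NTH_VALUE(a, 2) OVER (PARTITION BY g ORDER BY a) on the reprex data, again assuming no ties in a; nth_in_frame() and the other names are hypothetical helpers for illustration, not dbplyr functions.

```r
# Data from the reprex
a <- 4:1
g <- rep(1:2, 2)
n <- 2L

# Hypothetical helper mimicking NTH_VALUE: the n-th row of the frame
# ordered by a, or NA (SQL NULL) if the frame has fewer than n rows.
nth_in_frame <- function(frame, n) {
  frame <- sort(frame)
  if (length(frame) >= n) frame[n] else NA_integer_
}

# Default frame: rows of the same partition with a <= the current a
nth_default <- vapply(seq_along(a), function(i) {
  nth_in_frame(a[g == g[i] & a <= a[i]], n)
}, integer(1))

# Unbounded frame: the whole partition
nth_full <- vapply(seq_along(a), function(i) {
  nth_in_frame(a[g == g[i]], n)
}, integer(1))

data.frame(a, g, nth_default, nth_full)
```

In nth_default the rows with a = 2 and a = 1 (the first row of each group) get NA, while nth_full gives every row of a group its second-smallest a, the value dplyr::nth(a, 2) returns within each group after arrange(a).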

Labels: feature (a feature request or enhancement), func trans 🌍 (translation of individual functions to SQL)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions