Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue when using last() in a grouped mutate() #1063

Closed
moodymudskipper opened this issue Dec 5, 2022 · 1 comment · Fixed by #1064
Closed

Issue when using last() in a grouped mutate() #1063

moodymudskipper opened this issue Dec 5, 2022 · 1 comment · Fixed by #1064

Comments

@moodymudskipper
Copy link

From: https://stackoverflow.com/questions/74677042/how-to-use-last-when-mutating-by-group-with-dbplyr

Are these expected ? I would expect a grouped last() to always return a single value along the window, and I would expect the 2nd case to reliably give me the result I'd have with a local data frame.

library(dbplyr)
library(dplyr, w = F)
remote_data <- memdb_frame(
  grp = c(2, 2, 2, 1, 3, 1, 1),
  win = c("B", "C", "A", "B", "C", "A", "C"),
  id = c(1,3,5,7,2,4,6),
)

remote_data %>% 
  arrange(grp, win) %>% 
  group_by(grp) %>% 
  mutate(last_id = last(id)) %>% 
  ungroup() %>% 
  print() %>% 
  show_query()
#> # Source:     SQL [7 x 4]
#> # Database:   sqlite 3.39.4 [:memory:]
#> # Ordered by: grp, win
#>     grp win      id last_id
#>   <dbl> <chr> <dbl>   <dbl>
#> 1     1 A         4       4
#> 2     1 B         7       7
#> 3     1 C         6       6
#> 4     2 A         5       5
#> 5     2 B         1       1
#> 6     2 C         3       3
#> 7     3 C         2       2
#> <SQL>
#> SELECT
#>   *,
#>   LAST_VALUE(`id`) OVER (PARTITION BY `grp` ORDER BY `grp`, `win`) AS `last_id`
#> FROM `dbplyr_001`
#> ORDER BY `grp`, `win`

remote_data %>% 
  arrange(grp, win) %>% 
  group_by(grp) %>% 
  window_order(win) %>% 
  mutate(last_id = last(id)) %>% 
  ungroup() %>% 
  print() %>% 
  show_query()
#> # Source:     SQL [7 x 4]
#> # Database:   sqlite 3.39.4 [:memory:]
#> # Ordered by: win
#>     grp win      id last_id
#>   <dbl> <chr> <dbl>   <dbl>
#> 1     1 A         4       4
#> 2     1 B         7       7
#> 3     1 C         6       6
#> 4     2 A         5       5
#> 5     2 B         1       1
#> 6     2 C         3       3
#> 7     3 C         2       2
#> <SQL>
#> SELECT *, LAST_VALUE(`id`) OVER (PARTITION BY `grp` ORDER BY `win`) AS `last_id`
#> FROM `dbplyr_001`
#> ORDER BY `grp`, `win`

remote_data %>% 
  arrange(grp, win) %>% 
  group_by(grp) %>% 
  window_order(grp) %>% 
  mutate(last_id = last(id)) %>% 
  ungroup() %>% 
  print() %>% 
  show_query()
#> # Source:     SQL [7 x 4]
#> # Database:   sqlite 3.39.4 [:memory:]
#> # Ordered by: grp
#>     grp win      id last_id
#>   <dbl> <chr> <dbl>   <dbl>
#> 1     1 A         4       6
#> 2     1 B         7       6
#> 3     1 C         6       6
#> 4     2 A         5       5
#> 5     2 B         1       5
#> 6     2 C         3       5
#> 7     3 C         2       2
#> <SQL>
#> SELECT *, LAST_VALUE(`id`) OVER (PARTITION BY `grp` ORDER BY `grp`) AS `last_id`
#> FROM `dbplyr_001`
#> ORDER BY `grp`, `win`
@mgirlich
Copy link
Collaborator

mgirlich commented Dec 6, 2022

I just learned/realised that the default window frame depends on whether ORDER BY is omitted or not:

  • no ORDER BY: ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
  • with ORDER BY: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW

Therefore, the current translation works for first() but not for last().

Thanks for opening this issue 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants