Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request for time series function (lead, lag, rolling) #604

Closed
yiugn opened this issue Feb 21, 2021 · 7 comments
Closed

Request for time series function (lead, lag, rolling) #604

yiugn opened this issue Feb 21, 2021 · 7 comments

Comments

@yiugn
Copy link

yiugn commented Feb 21, 2021

As I tested dbplyr, dbplyr doesn't provided functions required for time series analysis, such as leading, lagging, and rolling (rolling mean, rolling std..etc).

Please support these functions in dbplyr

@hadley
Copy link
Member

hadley commented Apr 1, 2021

It does support these functions. If you're having problems using them, I'd recommend starting with a reprex.

@hadley hadley closed this as completed Apr 1, 2021
@richy1996
Copy link

@yiugn did you ever figure out how to do a rolling function in dbplyr?

If not, @hadley I'm trying to accomplish the same thing myself, looking at this StackOverflow post Rolling mean, standard deviation in dbplyr, is that the intended way to use rolling functions in dbplyr?

A sample SQL query line I'm trying to replicate in dbplyr is:

AVG(close) OVER (PARTITION BY close, ORDER BY date ASC ROWS 19 PRECEDING) as sma20

An equivalent w/ tidyquant is:

tq_mutate(select = close,
           mutate_fun = runMean,
           n = 20,
           col_rename = 'sma20')

Greatly appreciate your help on this.

@mgirlich
Copy link
Collaborator

Have a look at the article on function translation. You need group_by(), window_order(), and window_frame() for your translation.

@richy1996
Copy link

Care to expand on that at all? sorry I'm not gleaning much from that article

@mgirlich
Copy link
Collaborator

Here are some examples to get you started

library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)

lazy_frame(group = 1, date = 1, x = 1) %>% 
  mutate(sma20 = mean(x, na.rm = TRUE))
#> <SQL>
#> SELECT `group`, `date`, `x`, AVG(`x`) OVER () AS `sma20`
#> FROM `df`

lazy_frame(group = 1, date = 1, x = 1) %>% 
  group_by(group) %>% 
  mutate(sma20 = mean(x, na.rm = TRUE))
#> <SQL>
#> SELECT `group`, `date`, `x`, AVG(`x`) OVER (PARTITION BY `group`) AS `sma20`
#> FROM `df`

lazy_frame(group = 1, date = 1, x = 1) %>% 
  window_frame(-19) %>% 
  mutate(sma20 = mean(x, na.rm = TRUE))
#> Warning: Windowed expression `AVG(`x`)` does not have explicit order.
#> ℹ Please use `arrange()` or `window_order()` to make deterministic.
#> <SQL>
#> SELECT
#>   `group`,
#>   `date`,
#>   `x`,
#>   AVG(`x`) OVER (ROWS BETWEEN 19 PRECEDING AND UNBOUNDED FOLLOWING) AS `sma20`
#> FROM `df`

Created on 2022-05-12 by the reprex package (v2.0.1)

Does that help you?

@richy1996
Copy link

richy1996 commented May 12, 2022

Absolutely, thanks a lot!

@vektor8891
Copy link

vektor8891 commented Jul 22, 2022

@mgirlich The function translation approach doesn't work for sd(). Here is a reprex:

# setup
library(dplyr, warn.conflicts = FALSE)

con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars)
mtcars2 <- tbl(con, "mtcars") %>% 
  dbplyr::window_order(mpg) %>% 
  dplyr::group_by(cyl) %>% 
  dbplyr::window_frame(from = -1, to = 0)

# rolling mean works
dplyr::mutate(mtcars2, RollingMeanMpg = mean(mpg, na.rm = TRUE))

# rolling sd doesn't work
dplyr::mutate(mtcars2, RollingSDevMpg = sd(mpg, na.rm = TRUE))

The error message is: Error: STDEV() may not be used as a window function.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants