-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Named window expressions #631
Conversation
* make two translation passes in `sql_build.op_select()` * register windows in `win_over()` * add `window` to `select_query()` * suport `window` in `print.select_query()` and `sql_render.select_query()` * add `sql_clause_window()` * add argument `window` to `sql_query_select.*()`
Could you give me a couple of examples of before and after? I want to think about this a bit. |
ExamplesI formatted the examples a bit for better readability and comparison. Window used only oncelf <- lazy_frame(
col1 = runif(3),
col2 = runif(3),
col3 = runif(3),
col4 = runif(3),
part = c("a", "a", "b"),
ord = 3:1
) %>%
group_by(part) %>%
window_order(ord)
lf %>%
transmute(
across(c(col1), sum, na.rm = TRUE),
across(c(col3), ~ order_by(desc(ord), cumsum(.x)))
) Before = After SELECT
SUM(`col1`) OVER (PARTITION BY `part`) AS `col1`,
SUM(`col3`) OVER (PARTITION BY `part` ORDER BY `ord` DESC ROWS UNBOUNDED PRECEDING) AS `col3`,
FROM `df` Two windowslf %>%
transmute(
across(c(col1, col2), sum, na.rm = TRUE),
across(c(col3, col4), ~ order_by(desc(ord), cumsum(.x)))
) Before SELECT
SUM(`col1`) OVER (PARTITION BY `part`) AS `col1`,
SUM(`col2`) OVER (PARTITION BY `part`) AS `col2`,
SUM(`col3`) OVER (PARTITION BY `part` ORDER BY `ord` DESC ROWS UNBOUNDED PRECEDING) AS `col3`,
SUM(`col4`) OVER (PARTITION BY `part` ORDER BY `ord` DESC ROWS UNBOUNDED PRECEDING) AS `col4`
FROM `df` After SELECT SUM(`col1`) OVER `win1` AS `col1`,
SUM(`col2`) OVER `win1` AS `col2`,
SUM(`col3`) OVER `win2` AS `col3`,
SUM(`col4`) OVER `win2` AS `col4`
FROM `df`
WINDOW `win1` AS (PARTITION BY `part`),
`win2` AS (PARTITION BY `part` ORDER BY `ord` DESC ROWS UNBOUNDED PRECEDING) mode.com examplelazy_frame(
start_terminal = "Munich",
duration_seconds = 3913L,
start_time = "2012-01-08 11:36:09"
) %>%
filter(start_time < "2012-01-08") %>%
group_by(start_terminal) %>%
# Bug: `window_order()` does not work for `ntile()`
# window_order(duration_seconds) %>%
transmute(
start_terminal,
quartile = ntile(duration_seconds, n = 4),
quintile = ntile(duration_seconds, n = 5),
percentile = ntile(duration_seconds, n = 100)
) Before SELECT `start_terminal`,
NTILE(4) OVER (PARTITION BY `start_terminal` ORDER BY `duration_seconds`) AS `quartile`,
NTILE(5) OVER (PARTITION BY `start_terminal` ORDER BY `duration_seconds`) AS `quintile`,
NTILE(100) OVER (PARTITION BY `start_terminal` ORDER BY `duration_seconds`) AS `percentile`
FROM `df`
WHERE (`start_time` < '2012-01-08') After SELECT `start_terminal`,
NTILE(4) OVER `win1` AS `quartile`,
NTILE(5) OVER `win1` AS `quintile`,
NTILE(100) OVER `win1` AS `percentile`
FROM `df`
WHERE (`start_time` < '2012-01-08')
WINDOW `win1` AS (PARTITION BY `start_terminal` ORDER BY `duration_seconds`) |
It would be nice if the user could name the window but that is probably difficult to understand/use (e.g. in example 2 with WINDOW `win_part_1` AS (PARTITION BY `part`),
`win_part_order_1` AS (PARTITION BY `part` ORDER BY `ord` DESC ROWS UNBOUNDED PRECEDING) or WINDOW `win_agg_1` AS (PARTITION BY `part`),
`win_over_1` AS (PARTITION BY `part` ORDER BY `ord` DESC ROWS UNBOUNDED PRECEDING) so that one can more easily distinguish between |
Were you thinking including the variable names or the component names? (It's hard to tell from your example). I'd have a mild preference for including variable names, but that does raise the question of what do for long variable names or compound expressions. I think I'd be more tempted just to stick with indices than something that requires a bunch of work (since most people won't read the generated SQL, I don't think we should invest too much effort here). |
Now I realize that my original idea doesn't work anyway, so I would also just stick with simple numbering. |
Do you want to try and get this into the release too? |
Conflicts: R/backend-access.R R/backend-mssql.R R/backend-oracle.R R/backend-teradata.R R/db-sql.R R/sql-clause.R R/verb-select.R
Would be nice to also have this in the release but I don't mind if it doesn't make it into the release. |
R/lazy-select-query.R
Outdated
win_reset() | ||
win_register_deactivate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should be in an on.exit()
/defer()
for safety
R/lazy-select-query.R
Outdated
named_windows <- win_register_names() | ||
if (nrow(named_windows) > 0 && supports_window_clause(con)) { | ||
# need to translate again and use registered windows names | ||
win_register_deactivate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this come immediately after translate_select_sql()
?
Fixes #624.
After I also wished for named windows a couple of times I had a first go at it. The idea is relatively simple:
In
sql_build.op_select()
have (possibly) two runs of translating the variables. In the first runwin_over()
registers every window it uses. After this first translation create a name for every window that is used more than once. Then translate the variables again but this time use the window name if one is available.Example SQL
Created on 2021-04-06 by the reprex package (v2.0.0)
Do you like the overall approach?
check window alias support for
[X] oracle: documentationThis was pointing to the wrong place.With a method
supports_window_clause.<backend>()
it would be quite simple to only use named windows if they are supported.