Hi, this is my first issue so hopefully I have provided enough of a reprex to work with - please let me know if I can provide additional information.
Brief description of the problem
Rendering a query for Oracle from dplyr, I rename columns in a SELECT statement before doing a GROUP BY. The SELECT is tricky because I rename my grouping column to take the name of one of the non-grouping columns (which is also renamed). This appears to confuse dbplyr: it groups on what should be the non-grouping column and drops my intended grouping column.
Expected output
I would expect the SQL for dbplyr to handle the renaming before the grouping, and thus to group by (and retain when summarizing) the just-renamed column. This is the behavior observed in the non-remote dplyr example on mtcars below.
Please note that as far as I can tell, the query ran as expected with version 2.1.1 of dbplyr.
library(tidyverse)
library(odbc)
library(DBI)
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
library(dplyr)
packageVersion("tidyverse")
#> [1] '1.3.1'
packageVersion("odbc")
#> [1] '1.3.2'
packageVersion("DBI")
#> [1] '1.1.1'
packageVersion("dbplyr")
#> [1] '2.2.1'
packageVersion("dplyr")
#> [1] '1.0.9'
# Query run on Oracle DB
con <- DBI::dbConnect(odbc::odbc(),
Driver = "Oracle in OraClient64_home1",
DBQ = *redacted*,
UID = *redacted*,
PWD = *redacted*)
licsfull <- tbl(con, in_schema("GLBL_MAIN", "G_MATL_LIC")) %>%
select(MATL_ID = REF_MATL_ID, JOIN_ID = MATL_ID, VALID_TO_DT) %>%
group_by(MATL_ID) %>%
summarise(JOIN_ID = max(JOIN_ID, na.rm = TRUE),
VALID_TO_DT = max(VALID_TO_DT, na.rm = TRUE))
# The generated SQL drops the desired grouping column and joins on a different column
licsfull %>% show_query()
#> <SQL>
#> SELECT MAX("JOIN_ID") AS "JOIN_ID", MAX("VALID_TO_DT") AS "VALID_TO_DT"
#> FROM (
#> SELECT "REF_MATL_ID" AS "MATL_ID", "MATL_ID" AS "JOIN_ID", "VALID_TO_DT"
#> FROM ("GLBL_MAIN"."G_MATL_LIC")
#> ) "q01"
#> GROUP BY "JOIN_ID"
# Note that this is different from the behavior in vanilla dplyr on a local tibble:
mtcars %>%
select(vs = cyl, new_vs = vs, gear) %>%
group_by(vs) %>%
summarise(new_vs = max(new_vs),
gear = max(gear))
#> # A tibble: 3 x 3
#> vs new_vs gear
#> <dbl> <dbl> <dbl>
#> 1 4 1 5
#> 2 6 1 5
#> 3 8 0 5
Created on 2022-06-30 by the reprex package (v2.0.1)
Hi, this is my first issue so hopefully I have provided enough of a reprex to work with - please let me know if I can provide additional information.
Brief description of the problem
Rendering a query for Oracle from
dplyr, I rename columns in a SELECT statement before doing a GROUP BY. The SELECT is tricky because I rename my grouping column to take the name of one of the non-grouping columns (which is also renamed). This appears to confusedbplyr: it groups on what should be the non-grouping column and drops my intended grouping column.Expected output
I would expect the SQL for
dbplyrto handle the renaming before the grouping, and thus to group by (and retain when summarizing) the just-renamed column. This is the behavior observed in the non-remotedplyrexample onmtcarsbelow.Please note that as far as I can tell, the query ran as expected with version 2.1.1 of
dbplyr.Created on 2022-06-30 by the reprex package (v2.0.1)