Closed
Description
Hello--it appears that across in dbplyr returns different results from dplyr when both these happen...
- a formula contains another variable from the data (e.g.
~ .x / mpg
) - an early operation in across overrides the variable's name (e.g. result gets named
mpg
)
When this happens, operations on other selected variables use the overridden column, rather than the original (see this dplyr test, which I think would fail for dbplyr).
This is shown in the reprex below. It looks like the issue is a subquery is created during the across..
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# dplyr ----
cars_small <- mtcars %>%
select(mpg, cyl, gear) %>%
head(2)
# all columns divided by original mpg
cars_small %>%
mutate(across(c(mpg, cyl, gear), ~ .x / mpg))
#> mpg cyl gear
#> Mazda RX4 1 0.2857143 0.1904762
#> Mazda RX4 Wag 1 0.2857143 0.1904762
# dbplyr ----
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, cars_small)
# mpg re-assigned in first operation, then used in later ones
q <- tbl(con, "cars_small") %>%
mutate(across(everything(), ~ .x / mpg))
q
#> # Source: SQL [2 x 3]
#> # Database: sqlite 3.38.5 [:memory:]
#> mpg cyl gear
#> <dbl> <dbl> <dbl>
#> 1 1 6 4
#> 2 1 6 4
show_query(q)
#> <SQL>
#> SELECT `mpg`, `cyl` / `mpg` AS `cyl`, `gear` / `mpg` AS `gear`
#> FROM (
#> SELECT `mpg` / `mpg` AS `mpg`, `cyl`, `gear`
#> FROM `cars_small`
#> )
Created on 2022-09-28 by the reprex package (v2.0.1)