Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

across support seems very limited #534

Closed
mgirlich opened this issue Nov 9, 2020 · 8 comments · Fixed by #578
Closed

across support seems very limited #534

mgirlich opened this issue Nov 9, 2020 · 8 comments · Fixed by #578
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL

Comments

@mgirlich
Copy link
Collaborator

mgirlich commented Nov 9, 2020

the translation of across() seems to be currently quite limited and buggy. Basically, it only seems to work with a single function.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

lf <- lazy_frame(g = 1, a = 1, b = 2, c = 3)

# works fine
mutate(lf, across(everything(), mean))
#> <SQL>
#> SELECT AVG(`g`) OVER () AS `g`, AVG(`a`) OVER () AS `a`, AVG(`b`) OVER () AS `b`, AVG(`c`) OVER () AS `c`
#> FROM `df`

# works fine
mutate(lf, across(everything(), list(mean = mean)))
#> <SQL>
#> SELECT AVG(`g`) OVER () AS `g`, AVG(`a`) OVER () AS `a`, AVG(`b`) OVER () AS `b`, AVG(`c`) OVER () AS `c`
#> FROM `df`

# self defined functions do not error but also don't get translated
# this is a bit dangerous as one might not notice this directly
mutate(lf, across(everything(), function(x) mean(x)))
#> <SQL>
#> SELECT `g`, `a`, `b`, `c`
#> FROM `df`

# formulas error
mutate(lf, across(everything(), ~ mean(.x)))
#> Error: Unsupported `.fns` for dbplyr::across()

# multiple functions result in an error
mutate(lf, across(everything(), list(mean = mean, sd = sd)))
#> Error: Result 2 must be a single string, not NULL of length 0

Created on 2020-11-09 by the reprex package (v0.3.0)

Session info
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value                       
#>  version  R version 4.0.2 (2020-06-22)
#>  os       macOS Catalina 10.15.7      
#>  system   x86_64, darwin17.0          
#>  ui       X11                         
#>  language (EN)                        
#>  collate  en_US.UTF-8                 
#>  ctype    en_US.UTF-8                 
#>  tz       UTC                         
#>  date     2020-11-09                  
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date       lib source        
#>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.0.0)
#>  backports     1.2.0   2020-11-02 [1] CRAN (R 4.0.2)
#>  blob          1.2.1   2020-01-20 [1] CRAN (R 4.0.0)
#>  callr         3.5.1   2020-10-13 [1] CRAN (R 4.0.2)
#>  cli           2.1.0   2020-10-12 [1] CRAN (R 4.0.2)
#>  crayon        1.3.4   2017-09-16 [1] CRAN (R 4.0.0)
#>  DBI           1.1.0   2019-12-15 [1] CRAN (R 4.0.0)
#>  dbplyr      * 2.0.0   2020-11-03 [1] CRAN (R 4.0.2)
#>  desc          1.2.0   2018-05-01 [1] CRAN (R 4.0.0)
#>  devtools      2.3.2   2020-09-18 [1] CRAN (R 4.0.2)
#>  digest        0.6.27  2020-10-24 [1] CRAN (R 4.0.2)
#>  dplyr       * 1.0.2   2020-08-18 [1] CRAN (R 4.0.1)
#>  ellipsis      0.3.1   2020-05-15 [1] CRAN (R 4.0.0)
#>  evaluate      0.14    2019-05-28 [1] CRAN (R 4.0.0)
#>  fansi         0.4.1   2020-01-08 [1] CRAN (R 4.0.0)
#>  fs            1.5.0   2020-07-31 [1] CRAN (R 4.0.2)
#>  generics      0.1.0   2020-10-31 [1] CRAN (R 4.0.2)
#>  glue          1.4.2   2020-08-27 [1] CRAN (R 4.0.2)
#>  highr         0.8     2019-03-20 [1] CRAN (R 4.0.0)
#>  htmltools     0.5.0   2020-06-16 [1] CRAN (R 4.0.1)
#>  knitr         1.30    2020-09-22 [1] CRAN (R 4.0.2)
#>  lifecycle     0.2.0   2020-03-06 [1] CRAN (R 4.0.0)
#>  magrittr      1.5     2014-11-22 [1] CRAN (R 4.0.0)
#>  memoise       1.1.0   2017-04-21 [1] CRAN (R 4.0.0)
#>  pillar        1.4.6   2020-07-10 [1] CRAN (R 4.0.1)
#>  pkgbuild      1.1.0   2020-07-13 [1] CRAN (R 4.0.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.0.0)
#>  pkgload       1.1.0   2020-05-29 [1] CRAN (R 4.0.0)
#>  prettyunits   1.1.1   2020-01-24 [1] CRAN (R 4.0.0)
#>  processx      3.4.4   2020-09-03 [1] CRAN (R 4.0.2)
#>  ps            1.4.0   2020-10-07 [1] CRAN (R 4.0.2)
#>  purrr         0.3.4   2020-04-17 [1] CRAN (R 4.0.1)
#>  R6            2.5.0   2020-10-28 [1] CRAN (R 4.0.2)
#>  remotes       2.2.0   2020-07-21 [1] CRAN (R 4.0.1)
#>  rlang         0.4.8   2020-10-08 [1] CRAN (R 4.0.2)
#>  rmarkdown     2.5     2020-10-21 [1] CRAN (R 4.0.2)
#>  rprojroot     1.3-2   2018-01-03 [1] CRAN (R 4.0.0)
#>  sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 4.0.0)
#>  stringi       1.5.3   2020-09-09 [1] CRAN (R 4.0.2)
#>  stringr       1.4.0   2019-02-10 [1] CRAN (R 4.0.0)
#>  testthat      3.0.0   2020-10-31 [1] CRAN (R 4.0.2)
#>  tibble        3.0.4   2020-10-12 [1] CRAN (R 4.0.2)
#>  tidyselect    1.1.0   2020-05-11 [1] CRAN (R 4.0.0)
#>  usethis       1.6.3   2020-09-17 [1] CRAN (R 4.0.2)
#>  vctrs         0.3.4   2020-08-29 [1] CRAN (R 4.0.2)
#>  withr         2.3.0   2020-09-22 [1] CRAN (R 4.0.2)
#>  xfun          0.19    2020-10-30 [1] CRAN (R 4.0.2)
#>  yaml          2.2.1   2020-02-01 [1] CRAN (R 4.0.0)
#> 
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library
@hadley

This comment has been minimized.

@mgirlich
Copy link
Collaborator Author

Makes sense to me but then an explicit error message would be great. But I think there should be a way to translate "one-liners" like in filter_at():

tbl_lazy(mtcars) %>% 
  dplyr::filter_at(
    dplyr::vars(mpg, cyl),
    dplyr::all_vars(!is.na(.))
  )

I do not see a way to replace the filter_at() with filter() + across() at the moment. It would be great if this were possible.

@hadley
Copy link
Member

hadley commented Nov 11, 2020

That will (probably) become something like:

mtcars %>%
  filter(if_all(c(mpg, cy), ~ !is.na(.x))

@soupofday

This comment has been minimized.

@hadley

This comment has been minimized.

@alex-m-ffm
Copy link

I think @mgirlich is not completely correct with the problem of multiple standard functions in a list. mean, max and min work for me in a list. It appears more to be an issue of specific functions, among them sd.

library(dplyr)
library(dbplyr)

con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

copy_to(con, nycflights13::flights, "flights",
        temporary = FALSE, 
        indexes = list(
          c("year", "month", "day"), 
          "carrier", 
          "tailnum",
          "dest"
        )
)

flights_db <- tbl(con, "flights")

#this works fine
flights_db %>% group_by(carrier, dest) %>% 
  summarise(across(c(dep_delay, arr_delay, air_time, distance),
                   list(mean, max, min), na.rm = T)
            )

# this does not throw an error but also doesn`t evaluate`
flights_db %>% group_by(carrier, dest) %>% 
  summarise(across(c(dep_delay, arr_delay, air_time, distance),
                   sd)
  )

# outside a list sd works fine for each of the variables...
flights_db %>% group_by(carrier, dest) %>% 
  summarise(sd = sd(distance, na.rm = T))

# with sd included in the list I get an error
flights_db %>% group_by(carrier, dest) %>% 
  summarise(across(c(dep_delay, arr_delay, air_time, distance),
                   list(mean, max, min, sd), na.rm = T)
            )
#> Error: Result 4 must be a single string, not NULL of length 0

Would be great if that could be fixed (besides the formula support)!

@hadley
Copy link
Member

hadley commented Jan 21, 2021

Minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db <- memdb_frame(x = 1:10, y = 2:11)
# Need to add support for formulas:
db %>% summarise(across(everything(), ~ mean(.x)))
#> Error: Unsupported `.fns` for dbplyr::across()

# Weird failure mode for sd
db %>% summarise(across(everything(), sd))
#> Error: Query contains no columns
db %>% summarise(across(everything(), list(sd)))
#> Error: Result 1 must be a single string, not NULL of length 0

Created on 2021-01-21 by the reprex package (v0.3.0.9001)

@hadley hadley added bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL labels Jan 21, 2021
@hadley
Copy link
Member

hadley commented Jan 21, 2021

I'll leave #525 for the lack of formula support so this issue can focus on the problem with sd, which arises because I forgot to check the return value of find_fun() which can sometimes be NULL.

hadley added a commit that referenced this issue Jan 21, 2021
Now handles `across()` using NSE approach that's more in keeping with the keeping with the rest of dbplyr's translation, so that functions without native translation are passed along. It also gains support for `.fns = NULL` and for translating functions.

Fixes #525. Fixes #534. Fixes #554.
hadley added a commit that referenced this issue Jan 21, 2021
Now handles across() using NSE approach that's more in keeping with the keeping with the rest of dbplyr's translation, so that functions without native translation are passed along. It also gains support for .fns = NULL and for translating functions.

Fixes #525. Fixes #534. Fixes #554.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants