-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
across (via across_funs) errors if .fns is a list and uses standard evaluation #662
Comments
You were nearly there: library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
db <- memdb_frame(iris)
# Note: a list of characters (currently) produces ugly names
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list("mean", "sd")))
#> Warning: Missing values are always removed in SQL.
#> Use `mean(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> `Sepal.Length_"mean"` `Sepal.Length_"sd"`
#> <dbl> <dbl>
#> 1 5.84 0.828
# and they are different for local data frames
iris %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list("mean", "sd")))
#> Sepal.Length_1 Sepal.Length_2
#> 1 5.843333 0.8280661
# you need to explicitly use `list()`
st <- list("mean", "sd")
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list(!!!st)))
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> `Sepal.Length_"mean"` `Sepal.Length_"sd"`
#> <dbl> <dbl>
#> 1 5.84 0.828
# nicer names when using symbols instead of character
st2 <- lapply(st, sym)
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list(!!!st2)))
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> Sepal.Length_mean Sepal.Length_sd
#> <dbl> <dbl>
#> 1 5.84 0.828
# or simply use a character vector
st_char <- unlist(st)
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = !!st_char))
#> Warning: Missing values are always removed in SQL.
#> Use `sd(x, na.rm = TRUE)` to silence this warning
#> This warning is displayed only once per session.
#> # Source: lazy query [?? x 2]
#> # Database: sqlite 3.35.5 [:memory:]
#> Sepal.Length_mean Sepal.Length_sd
#> <dbl> <dbl>
#> 1 5.84 0.828 Created on 2021-06-11 by the reprex package (v2.0.0) |
Thanks for the work around. The character list was just a very simplified example for reprex purposes (I noticed the odd names, but it generally seems like fewer lines of code is better for reprex, so I ignored that because it wasn't really important here). That said, the comment was less "I could find a work around" than "the behavior is inconsistent". If anything, the error is incredibly misleading in the case of unquoting an assigned list. #> Error: `.fns` argument to dbplyr::across() must be a NULL, a function name, formula, or list Ideally using NSE and unquoting SE should result in the same behavior in most situations (although there are obvious exceptions, such as needing to use symbols for lazy evaluation in SE rather than using the function call, especially with dbplyr where the names are translated). But I couldn't see any reason why st <- c("mean", "sd")
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list("mean", "sd")))
db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = !!st)) would result in different behavior. In any case, I even went the base-R way to see if that would work (ignoring known environment issues with the base R eval/substitute), but that caused exactly the same problem. library(dplyr, warn.conflicts = FALSE)
library(dbplyr, warn.conflicts = FALSE)
db <- memdb_frame(iris)
st <- syms(list("mean", "sd"))
expr <- substitute(db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = stats)), list(stats = st))
expr
#> db %>% summarize(across(.cols = all_of("Sepal.Length"), .fns = list(
#> mean, sd)))
eval(expr)
#> Error: `.fns` argument to dbplyr::across() must be a NULL, a function name, formula, or list Created on 2021-06-11 by the reprex package (v2.0.0) So across/across_funs is forcing .fns = list() explicitly ("list" must be NSE and won't work with substitution, even if the stuff inside can be unquoted SE, which your work around showed), which seems antithetical to the purpose of tidyeval. |
I am trying to write an API where the user can specify one or more aggregation functions to apply to one or more columns of a database. Therefore, the function must be a programmed argument and ideally can be a list of character strings. Running this on the tbl itself works (see below), but running it on the database (sqlite, mysql, etc.) errored. There might be a work-around (although I haven't figured it out), but this behavior still feels unintended.
Created on 2021-06-10 by the reprex package (v2.0.0)
The text was updated successfully, but these errors were encountered: