
Numerous basic functions failing translation when including a summarize as a lazy query for evaluation on server side #490

Closed
schradj opened this issue Aug 4, 2020 · 1 comment


schradj commented Aug 4, 2020

I was working on new features and updating some old ones for a package I've developed, which relies on data summarizations through the dbplyr backend. The package uses summarize_at() a LOT, and that function is now superseded by across(). While updating to across(), I started getting an odd mix of errors for different functions. I've gone through the release notes for the latest versions of dplyr as best I can and searched for other reports on this. The only thing I've found is #480, which confirmed that something is indeed broken when across() is used with a lazy query.

I loaded a small in-memory table and tried numerous lazy evaluations; only sum, mean, min, and max seem to work correctly. I know summarize_at() is superseded, but since across() is not yet supported, these examples continue to use it for now. Though not included below, I also tried the more verbose form required by plain summarize(), e.g. summarize(z = sum(z)), and got the same errors. Whenever any of the collect() calls shown below is uncommented, everything works fine.

I'm guessing that dbplyr is just lagging behind the massive changes in the latest versions of tidyr and dplyr. Hope this helps. I'd also like to know whether this can be replicated, to confirm that I'm not doing something incorrectly and don't have something weird going on with my system. Thanks!

library(dplyr)
mf <- dbplyr::memdb_frame(
  x = c(sapply(c('a', 'b','c'),rep,3)),
  y = rep(c("first", "second", "last"),3),
  z = 1:9)

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), sum) #works
  #summarize(across(z, sum)) # not yet supported (issue #480)

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), mean, na.rm = TRUE) #works

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), sd, na.rm = TRUE) #error, unless collected

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), IQR, na.rm = TRUE) #error, unless collected

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), mad, na.rm = TRUE) #error, unless collected

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), median) #error, unless collected

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), min, na.rm = TRUE) #works

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(z), max, na.rm = TRUE) #works

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(y,z), first) #error, unless collected

mf %>% #collect() %>% 
  group_by(x) %>% 
  summarize_at(vars(y,z), last) #error, unless collected

Most of the failing calls stop with "object 'z' not found" or "object 'y' not found".
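
Until these summaries translate, the collect() workaround hinted at in the comments above can be written out explicitly. This is only a minimal sketch: it rebuilds the same toy table (using rep(each = 3) instead of sapply(), which is my substitution), and the name local_medians is made up for illustration. It pulls every row into R before summarizing, so it assumes the table is small enough to fit in memory.

```r
library(dplyr)

# Same toy data as in the report; rep(each = 3) is an illustrative substitute
# for the sapply() construction used there.
mf <- dbplyr::memdb_frame(
  x = rep(c("a", "b", "c"), each = 3),
  y = rep(c("first", "second", "last"), 3),
  z = 1:9
)

# collect() downloads the rows into a local tibble, so the summary below is
# computed by R rather than translated to SQL.
local_medians <- mf %>%
  collect() %>%
  group_by(x) %>%
  summarize_at(vars(z), median)

print(local_medians)
```

The trade-off is that the database no longer does the aggregation, which is what the lazy query was meant to achieve in the first place.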

median is the exception and fails with a different error: 'Error: near "(": syntax error'
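
Since that message comes from the database rather than from R, it may help to look at the SQL that dbplyr generates. The sketch below does this for sum, one of the cases reported as working; swapping in another function is left as an experiment, and I am not asserting what SQL (or error) the failing cases produce. The names lazy_sum and generated_sql are made up for illustration.

```r
library(dplyr)

# Reduced version of the toy table from the report (illustrative).
mf <- dbplyr::memdb_frame(
  x = rep(c("a", "b", "c"), each = 3),
  z = 1:9
)

# Build the lazy query without running it.
lazy_sum <- mf %>%
  group_by(x) %>%
  summarize(z = sum(z, na.rm = TRUE))

# Print the SQL that would be sent to SQLite.
show_query(lazy_sum)

# Keep the SQL as a plain string, e.g. to paste into a bug report.
generated_sql <- as.character(dbplyr::sql_render(lazy_sum))
```

Comparing the rendered SQL for a working function with that of a failing one should show whether the problem lies in the translation or in the database.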

Here's my session info:
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] dplyr_1.0.1 magrittr_1.5

loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 rstudioapi_0.11 xml2_1.3.2 knitr_1.29 bit_4.0.3 tidyselect_1.1.0 R6_2.4.1
[8] rlang_0.4.7 fansi_0.4.1 stormtracker_0.11.0 blob_1.2.1 tools_4.0.2 data.table_1.13.0 xfun_0.16
[15] utf8_1.1.4 cli_2.0.2 DBI_1.1.0 dbplyr_1.4.4 commonmark_1.7 htmltools_0.5.0 ellipsis_0.3.1
[22] bit64_4.0.2 yaml_2.2.1 digest_0.6.25 assertthat_0.2.1 tibble_3.0.3 lifecycle_0.2.0 crayon_1.3.4
[29] purrr_0.3.4 vctrs_0.3.2 memoise_1.1.0 glue_1.4.1 evaluate_0.14 RSQLite_2.2.0 rmarkdown_2.3
[36] compiler_4.0.2 pillar_1.4.6 generics_0.0.2 pkgconfig_2.0.3

hadley (Member) commented Sep 16, 2020

Duplicate of #480.

Superseded means that the functions will still be around for a long time, so you shouldn't worry that they will soon disappear.

@hadley hadley closed this as completed Sep 16, 2020