Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[question] expected output of x %>% dbplyr::compute() %>% dbplyr::remote_name() #639

Closed
yitao-li opened this issue Apr 13, 2021 · 7 comments · Fixed by #850
Closed

[question] expected output of x %>% dbplyr::compute() %>% dbplyr::remote_name() #639

yitao-li opened this issue Apr 13, 2021 · 7 comments · Fixed by #850
Milestone

Comments

@yitao-li
Copy link
Contributor

yitao-li commented Apr 13, 2021

For example, consider

df1 <- memdb_frame(x = 1)
df2 <- df1 %>% mutate(y = 0)
dfc <- df2 %>% compute()
print(dfc %>% remote_name())

With dbplyr version 2.1.1 the output is NULL, but I seem to remember some earlier version of dbplyr would produce a non-NULL output, and also based on how I understood the dbplyr documentation, a non-NULL output is expected, right?

BTW the aforementioned NULL / non-NULL outputs also apply to Spark dataframes in sparklyr or other DB backends, not just memdb_frames.

@yitao-li
Copy link
Contributor Author

In particular, now if I run the following with sparklyr:

library(sparklyr)

sc <- spark_connect(master = "local")
sdf <- sdf_len(sc, 5)
print(dbplyr::remote_name(sdf))
sdf <- sdf %>% dplyr::compute()
print(dbplyr::remote_name(sdf))

I see the table had a remote name to begin with, but then after dplyr::compute() the remote name is not there any more, which seems counter-intuitive, and I don't think this happened with an earlier version of dbplyr.

@yitao-li
Copy link
Contributor Author

yitao-li commented Apr 15, 2021

And same with the memdb_frame example too (i.e., if I simplify the example I mentioned in the issue to something matching the sparklyr example from the above):

library(dplyr)
library(dbplyr)

df <- memdb_frame(x = 1)
print(df %>% remote_name()) # has remote name
df <- df %>% compute()
print(df %>% remote_name()) # remote name becomes NULL now :/

@yitao-li
Copy link
Contributor Author

^^ but the examples above do return a non-NULL remote name after compute() with dbplyr 2.0, and I believe there was some change in dbplyr 2.1.x that caused the NULL output

@yitao-li
Copy link
Contributor Author

yitao-li commented Apr 20, 2021

Proposed change: #649 (i.e., I think remote_name() should ignore the "trailing" bunch of op_ungroup operations -- let me know if it makes sense)

@mgirlich
Copy link
Collaborator

I think it would make more sense to make remote_name() a generic. And then also op_group_by should be ignored. For example:

remote_name <- function(x) {
  UseMethod("remote_name")
}

#' @export
remote_name.tbl_lazy <- function(x) {
  remote_name(x$ops)
}

#' @export
remote_name.op_base <- function(x) {
  x$x
}

#' @export
remote_name.op_group_by <- function(x) {
  remote_name(x$x)
}

#' @export
remote_name.op_ungroup <- function(x) {
  remote_name(x$x)
}

#' @export
remote_name.default <- function(x) {
  return()
}

@yitao-li
Copy link
Contributor Author

@mgirlich That's a great idea! I have revised my PR accordingly.

@krlmlr
Copy link
Member

krlmlr commented Apr 24, 2021

FWIW it looks like this was introduced in 4355916 (#616).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants