dplyr disallows mutating grouping variables. However, mutate.tbl_lazy() doesn't seem to check whether a mutated variable is a grouping variable, and it lets the call proceed without any warning.
The implication is that the seemingly successful mutation actually corrupts the grouping information: it introduces NA values into the grouping via this line of code. Note that this line of code still runs. It only emits a warning about vector replacement, which doesn't alert the end user to the actual problem.
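As a rough base R sketch of what seems to happen in that assignment: the contents of `old2new` below are my assumption, inferred from the warning and the `"x" NA` result in the reprex, not taken from dbplyr's internals. The second half only shows an aligned form of the same indexing for comparison; it is not a proposed patch.

```r
# Grouping variables before the mutate()
grps <- c("x", "y")

# ASSUMPTION: old2new maps only the untouched columns, so the mutated
# column "x" is missing from its names. Inferred from the reprex output.
old2new <- c(y = "y", z = "z")

# Same shape as the assignment quoted in the warning: the left side
# selects one element ("y"), but the right side has two (NA for "x",
# then "y"), so R warns and recycles the first value, which is NA.
bad <- grps
suppressWarnings(bad[bad %in% names(old2new)] <- old2new[bad])
bad
#> [1] "x" NA

# For comparison: index the right side with the same subset, so both
# sides have the same length and no NA is introduced.
good <- grps
hit <- good %in% names(old2new)
good[hit] <- old2new[good[hit]]
good
#> [1] "x" "y"
```

Without `suppressWarnings()`, the first assignment emits the same "number of items to replace is not a multiple of replacement length" warning as in the reprex.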
When the user then calls mutate() again, the call fails because of the NAs in the grouping: op_grps() is called in this line of code, and the check that all selected variables are symbols (in this line of code) fails on the NA value with an Only strings can be converted to symbols error.
This error is quite subtle to diagnose, as the final error message gives no hint of the root cause.
To be consistent with dplyr's mutate() behavior, perhaps mutate.tbl_lazy() in dbplyr should check whether a mutated variable is part of the current grouping and disallow the mutation if it is.
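A minimal sketch of the kind of check I have in mind, written in base R so it runs on its own. The helper name `check_no_group_mutation()` and the error wording are hypothetical; nothing like this exists in dbplyr as far as I know. Inside mutate.tbl_lazy() the two inputs would presumably be the names of the mutate expressions and the result of op_grps().

```r
# Hypothetical helper (not part of dbplyr): stop if any variable being
# mutated is also a grouping variable.
check_no_group_mutation <- function(new_vars, group_vars) {
  clash <- intersect(new_vars, group_vars)
  if (length(clash) > 0) {
    stop(
      "Can't mutate grouping variable(s): ",
      paste0("`", clash, "`", collapse = ", "),
      call. = FALSE
    )
  }
  # Return the input invisibly so the check can sit in a pipeline
  invisible(new_vars)
}

# Mutating a non-grouping column passes silently
check_no_group_mutation("z", group_vars = c("x", "y"))

# Mutating a grouping column fails early with a pointed message
msg <- tryCatch(
  check_no_group_mutation("x", group_vars = c("x", "y")),
  error = function(e) conditionMessage(e)
)
msg
```

Failing at this point would surface the problem at the first mutate() call, rather than at a later call with an unrelated symbol-conversion error.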
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#>     filter, lag
#> The following objects are masked from 'package:base':
#>
#>     intersect, setdiff, setequal, union
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#>     ident, sql

# Create a test local lazy tibble
t <- lazy_frame(x = 1, y = 2, z = 3)

# There are no groupings
op_grps(t)
#> character(0)

# Add groupings
t.grp <- group_by(t, x, y)

# Grouping is x, y
op_grps(t.grp)
#> [1] "x" "y"

# Calling mutate() on grouped column runs successfully without warning
t.grp.2 <- mutate(t.grp, x = 2)

# Grouping now has an NA value
op_grps(t.grp.2)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> [1] "x" NA

# Subsequent calling of mutate() now returns error
mutate(t.grp.2, x = 3)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> Error: Only strings can be converted to symbols
jarodmeng changed the title from "mutate.tbl.lazy() allows mutation on grouping variables but creates downstream errors" to "mutate.tbl_lazy() allows mutation on grouping variables but creates downstream errors" on Jan 17, 2020
library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db <- lazy_frame(x = 1, y = 2, z = 3)
db2 <- db %>%
  group_by(x, y) %>%
  mutate(x = 2)

op_grps(db2)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> [1] "x" NA

mutate(db2, x = 3)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> Error: Only strings can be converted to symbols
Created on 2020-01-16 by the reprex package (v0.3.0)