`dplyr` disallows mutating grouping variables. However, `mutate.tbl_lazy()` does not appear to check whether a mutated variable is a grouping variable, so the call proceeds without any warning.
As a result, the seemingly successful mutation corrupts the grouping information: an `NA` is introduced into the grouping by the assignment `grps[grps %in% names(old2new)] <- old2new[grps]`, which is evaluated when `op_grps()` is called. That line still runs; it only emits a warning about vector replacement, which does not alert the user to the actual problem.
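The length mismatch behind that warning can be reproduced in base R. This is a minimal sketch that assumes, for illustration only, that `old2new` maps just the columns passed through unchanged (`y` and `z`); its real contents inside dbplyr may differ. Because `x` is not a name in `old2new`, `old2new[grps]` has length 2 while only one position is being replaced, so R warns and writes the first value, `NA`, into the grouping. The second half shows a length-consistent variant that indexes both sides with the same subset.

```r
# Hypothetical inputs: grouping is x, y; old2new is ASSUMED to contain
# only the columns passed through unchanged (y and z).
grps <- c("x", "y")
old2new <- c(y = "y", z = "z")

# Pattern from the warning: the left side selects 1 position (only "y"
# matches), but old2new[grps] has length 2 because old2new["x"] is NA.
# R warns and uses only the first replacement value, which is NA.
grps_buggy <- grps
grps_buggy[grps_buggy %in% names(old2new)] <- old2new[grps_buggy]
print(grps_buggy)
# → [1] "x" NA

# Length-consistent variant: index the right side with the same subset.
grps_fixed <- grps
matched <- grps_fixed %in% names(old2new)
grps_fixed[matched] <- old2new[grps_fixed[matched]]
print(grps_fixed)
# → [1] "x" "y"
```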
When the user then calls `mutate()` again, the call fails: `op_grps()` returns the grouping with the `NA` in it, and the code that checks whether all selected variables are symbols fails on that `NA` value with the error `Only strings can be converted to symbols`.
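The final error can be reproduced directly with rlang, assuming (not confirmed from the dbplyr source) that the failing check converts the grouping names with `rlang::sym()` or `rlang::syms()`. rlang only converts non-missing strings to symbols, so the `NA` in the corrupted grouping is rejected. The exact error text depends on the installed rlang version.

```r
# Assumes the rlang package is installed (dbplyr itself depends on it).
library(rlang)

# Corrupted grouping as printed in the reprex: "x" NA
grps <- c("x", NA)

# A valid name converts to a symbol without problems.
x_sym <- sym(grps[[1]])
print(x_sym)

# The NA element cannot be converted, so syms() raises an error.
# The reprex shows the message "Only strings can be converted to symbols";
# other rlang versions may word it differently.
result <- tryCatch(syms(grps), error = function(e) e)
print(inherits(result, "error"))
```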
This error is hard to diagnose because the final message gives no hint of the root cause.
To be consistent with `dplyr`'s `mutate()` behavior, `mutate.tbl_lazy()` could check whether a mutated variable is part of the current grouping and raise an error if it is.
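A minimal sketch of the proposed check. `check_no_grouping_mutation()` is a hypothetical helper, not part of dbplyr or dplyr, and its error wording is illustrative rather than dplyr's exact message. It takes the current grouping (for example, the result of `op_grps()`) and the names of the variables being mutated, and stops if the two overlap.

```r
# Hypothetical helper (not part of dbplyr or dplyr): fail early when a
# mutate() call targets one or more grouping variables.
check_no_grouping_mutation <- function(groups, new_vars) {
  clash <- intersect(new_vars, groups)
  if (length(clash) > 0) {
    stop(
      "Can't modify grouping variable(s): ",
      paste0("`", clash, "`", collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Values from the reprex: grouping is x, y.
check_no_grouping_mutation(c("x", "y"), "z")  # passes silently
err <- tryCatch(
  check_no_grouping_mutation(c("x", "y"), "x"),
  error = function(e) e
)
print(conditionMessage(err))
# → [1] "Can't modify grouping variable(s): `x`"
```

Inside `mutate.tbl_lazy()`, such a helper could hypothetically be called as `check_no_grouping_mutation(op_grps(.data), names(dots))` before the new operation is built, with `dots` standing for the captured, named `mutate()` expressions. Failing at that point gives the user an error that names the actual problem instead of a later, unrelated one.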
```r
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(dbplyr)
#>
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#>
#> ident, sql
# Create a test local lazy tibble
t <- lazy_frame(x = 1, y = 2, z = 3)
# There are no groupings
op_grps(t)
#> character(0)
# Add groupings
t.grp <- group_by(t, x, y)
# Grouping is x, y
op_grps(t.grp)
#> [1] "x" "y"
# Calling mutate() on a grouping column runs without any warning
t.grp.2 <- mutate(t.grp, x = 2)
# The grouping now contains an NA value
op_grps(t.grp.2)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> [1] "x" NA
# A subsequent call to mutate() now fails with an error
mutate(t.grp.2, x = 3)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> Error: Only strings can be converted to symbols
```
Created on 2020-01-16 by the reprex package (v0.3.0)