mutate.tbl_lazy() allows mutation on grouping variables but creates downstream errors #396

@jarodmeng

dplyr disallows mutation on grouping variables. However, it seems that mutate.tbl_lazy doesn't check whether a mutated variable is a grouping variable and allows the function call to proceed without warnings.

The consequence is that the seemingly successful mutation corrupts the grouping information: it introduces an NA value into the grouping via this line of code. Note that this line of code still runs; it only emits a warning about vector replacement lengths, which doesn't alert the end user to the actual problem.
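To illustrate how the NA can arise, here is a minimal base R sketch of the assignment that appears in the warning message. The contents of `old2new` are an assumption on my part (that it only maps columns passed through unchanged, so the overwritten column x is missing from it); the real object inside dbplyr may be built differently.

```r
# Grouping variables before the mutate() call
grps <- c("x", "y")

# Assumed lookup table: only untouched columns are present.
# x is absent because mutate(x = 2) replaced it with a new expression.
old2new <- c(y = "y", z = "z")

# Same shape as the assignment quoted in the warning.
# The left side selects one element ("y"), but old2new[grps] has length 2
# and its first element is NA because "x" is not a name in old2new.
grps[grps %in% names(old2new)] <- old2new[grps]

grps
#> [1] "x" NA
```

The left-hand side and the right-hand side are indexed differently, so R recycles the first replacement value (the NA) into the position of y and only warns about the length mismatch.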

When the user then calls mutate() again, the call fails because of the NA in the grouping. op_grps() is called in this line of code, and an Only strings can be converted to symbols error is triggered in this line of code, which tests whether all selected variables are symbols and fails on the NA value.

This error is quite subtle to diagnose as the final error message doesn't suggest anything related to the root cause.

To be consistent with dplyr's mutate() behavior, perhaps mutate.tbl_lazy() should check whether a mutated variable is part of the current grouping and disallow the mutation if it is.
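A rough sketch of what such a check could look like, in base R only. `check_not_grouping()` is a hypothetical helper invented for illustration, not an existing dbplyr function, and the error wording is only an approximation of what dplyr reports.

```r
# Hypothetical helper: stop if any mutated column is a grouping variable.
# `mutated` is a character vector of column names being created or modified,
# `groups` is the current grouping, e.g. the result of op_grps().
check_not_grouping <- function(mutated, groups) {
  bad <- intersect(mutated, groups)
  if (length(bad) > 0) {
    stop(
      "Column(s) ", paste0("`", bad, "`", collapse = ", "),
      " can't be modified because they are grouping variables",
      call. = FALSE
    )
  }
  invisible(mutated)
}

# Passes silently: z is not a grouping variable
check_not_grouping("z", c("x", "y"))

# Fails early with a message that names the offending column
res <- try(check_not_grouping("x", c("x", "y")), silent = TRUE)
```

Failing at this point would surface the root cause at the first mutate() call instead of at a later, unrelated-looking error.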

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

# Create a test local lazy tibble
t <- lazy_frame(x = 1, y = 2, z = 3)
# There are no groupings
op_grps(t)
#> character(0)

# Add groupings
t.grp <- group_by(t, x, y)
# Grouping is x, y
op_grps(t.grp)
#> [1] "x" "y"

# Calling mutate() on grouped column runs successfully without warning
t.grp.2 <- mutate(t.grp, x = 2)
# Grouping now has an NA value
op_grps(t.grp.2)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> [1] "x" NA

# A subsequent call to mutate() now fails with an error
mutate(t.grp.2, x = 3)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> Error: Only strings can be converted to symbols

Created on 2020-01-16 by the reprex package (v0.3.0)

Labels: bug (an unexpected problem or unintended behavior), dplyr verbs 🤖 (Translation of dplyr verbs to SQL)