Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mutate.tbl_lazy() allows mutation on grouping variables but creates downstream errors #396

Closed
jarodmeng opened this issue Jan 17, 2020 · 1 comment
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL

Comments

@jarodmeng
Copy link

dplyr disallows mutation on grouping variables. However, it seems that mutate.tbl_lazy doesn't check whether a mutated variable is a grouping variable and allows the function call to proceed without warnings.

The implication of this is that the seemingly successful mutation actually messed up the grouping information and introduced NA values into the grouping via this line of code. Note this line of code still runs. It just emits a warning about vector replacement that doesn't alert the end user of the actual problem.

When the user tries to call mutate() again, the call fails because of the NAs in grouping in this line of code when op_grps is called and triggers an Only strings can be converted to symbols error in this line of code when it tries to test if all select variables are symbol and fails on the NA value.

This error is quite subtle to diagnose as the final error message doesn't suggest anything related to the root cause.

To be consistent with dplyr's mutate() behavior, perhaps dbplyr should check if a mutated variable is included in the current grouping and disallow if it is in mutate.tbl_lazy().

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(dbplyr)
#> 
#> Attaching package: 'dbplyr'
#> The following objects are masked from 'package:dplyr':
#> 
#>     ident, sql

# Create a test local lazy tibble
t <- lazy_frame(x = 1, y = 2, z = 3)
# There are no groupings
op_grps(t)
#> character(0)

# Add groupings
t.grp <- group_by(t, x, y)
# Grouping is x, y
op_grps(t.grp)
#> [1] "x" "y"

# Calling mutate() on grouped column runs successfully without warning
t.grp.2 <- mutate(t.grp, x = 2)
# Grouping now has an NA value
op_grps(t.grp.2)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> [1] "x" NA

# Subsequent calling of mutate() now returns error
mutate(t.grp.2, x = 3)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> Error: Only strings can be converted to symbols

Created on 2020-01-16 by the reprex package (v0.3.0)

@jarodmeng jarodmeng changed the title mutate.tbl.lazy() allows mutation on grouping variables but creates downstream errors mutate.tbl_lazy() allows mutation on grouping variables but creates downstream errors Jan 17, 2020
@hadley
Copy link
Member

hadley commented Apr 14, 2020

Somewhat more minimal reprex:

library(dbplyr)
library(dplyr, warn.conflicts = FALSE)

db <- lazy_frame(x = 1, y = 2, z = 3)

db2 <- db %>% 
  group_by(x, y) %>% 
  mutate(x = 2)

op_grps(db2)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> [1] "x" NA

mutate(db2, x = 3)
#> Warning in grps[grps %in% names(old2new)] <- old2new[grps]: number of items to
#> replace is not a multiple of replacement length
#> Error: Only strings can be converted to symbols

Created on 2020-04-14 by the reprex package (v0.3.0)

@hadley hadley added bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL labels Apr 14, 2020
@hadley hadley closed this as completed in 0bf2176 Sep 24, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior verb trans 🤖 Translation of dplyr verbs to SQL
Projects
None yet
Development

No branches or pull requests

2 participants