purrr's lambda function doesn't work correctly in dplyr::mutate() #1447

Closed
yutannihilation opened this issue Oct 11, 2015 · 6 comments

@yutannihilation yutannihilation commented Oct 11, 2015

(I originally reported this issue in tidyverse/purrr#103)

When I tried this example of purrr, I found the code sometimes fails with the following error:

#> Error: invalid term in model formula

The error above seems to occur when ggplot2 is loaded. Minimal code to reproduce the issue is below.

library(dplyr)
library(purrr)
d <- data_frame(training = list(mtcars, mtcars * 2))

# before ggplot2 is loaded, no error occurs
d %>%
  mutate(lm_result = map(training, ~ lm(mpg ~ wt, data = .)))
#> Source: local data frame [2 x 2]
#> 
#>               training lm_result
#>                 (list)    (list)
#> 1 <data.frame [32,11]>   <S3:lm>
#> 2 <data.frame [32,11]>   <S3:lm>

# after ggplot2, the very same code results in the error of `lm()`
library(ggplot2)
d %>%
  mutate(lm_result = map(training, ~ lm(mpg ~ wt, data = .)))
#> Error: invalid term in model formula

After some investigation, I found that .f is translated differently depending on whether ggplot2 is loaded, but I couldn't identify the cause.

debug_map <- function(.x, .f, ...) {print(.f); return(TRUE)}

library(dplyr)
library(purrr)
d <- data_frame(training = list(mtcars, mtcars * 2))

d %>%
  mutate(lm_result = debug_map(training, ~ lm(mpg ~ wt, data = .)))
#> ~lm(mpg ~ wt, data = .)
#> ...snip...

library(ggplot2)
d %>%
  mutate(lm_result = debug_map(training, ~ lm(mpg ~ wt, data = .)))
#> ~lm(list(manufacturer = c("audi", "audi", ...), model = c("a4", "a4", ...), displ = c(1.8, 1.8, ...), ...) ~ wt, data = .)
#> ...snip...

# If I use function(), it works fine
d %>%
  mutate(lm_result = debug_map(training, function(x) lm(mpg ~ wt, data = x)))
#> function(x) lm(mpg ~ wt, data = x)
#> ...snip...
@yutannihilation yutannihilation changed the title from "purrr's lambda function don't work correctly in dplyr::mutate()" to "purrr's lambda function doesn't work correctly in dplyr::mutate()" Oct 11, 2015

@yutannihilation yutannihilation commented Oct 18, 2015

I now understand this a bit better. The problem is that the mpg column of mtcars in

d %>%
  mutate(lm_result = debug_map(training, ~ lm(mpg ~ wt, data = .)))

is accidentally substituted with ggplot2::mpg.

@yutannihilation yutannihilation commented Oct 18, 2015

Here's a simpler example of this name-conflict problem. The problem occurs when

  1. the data.frame to be processed has a list column whose elements are data.frames,
  2. there is an object in scope that has the same name as one of the inner data.frames' columns,
  3. and purrr's lambda notation is used.

library(dplyr)
library(purrr)

df_xy <- data_frame(x = runif(100), y = runif(100))
d <- data_frame(dataframes = replicate(df_xy, n = 3, simplify = FALSE))

# no error occurs
d %>%
  mutate(lm_result = map(dataframes, ~ lm(y ~ x, data = .)))
#> ...snip...

# if there is an object whose name is the same as one of df_xy's column names
x <- iris

# Error!
d %>%
  mutate(lm_result = map(dataframes, ~ lm(y ~ x, data = .)))
#> Error: invalid model formula in ExtractVars

# an ordinary function(a) works
d %>%
  mutate(lm_result = map(dataframes, function(a) lm(y ~ x, data = a)))
#> ...snip...

It seems that lazyeval cannot recognize the scope of the lambda function. Is there a way to avoid this? Should I file this issue on lazyeval's repo?
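
Since an ordinary function is reported above to work, one possible workaround is to move the model formula into a named helper defined outside mutate(). The sketch below uses only base R; the helper name `fit_xy` is hypothetical, and lapply() stands in for map(). It only shows that the formula inside such a helper resolves `x` against the data argument even when an unrelated object named `x` exists.

```r
# Hypothetical helper (not from the thread): the model formula lives in the
# function body, so a caller only ever passes around the symbol `fit_xy`.
fit_xy <- function(df) lm(y ~ x, data = df)

set.seed(1)
df_xy <- data.frame(x = runif(100), y = runif(100))
dataframes <- replicate(3, df_xy, simplify = FALSE)

# An unrelated object whose name matches one of df_xy's columns.
x <- iris

# lm() looks up `x` and `y` in `data` first, so the global `x` is ignored.
fits <- lapply(dataframes, fit_xy)
names(coef(fits[[1]]))
#> [1] "(Intercept)" "x"
```

In the setting of this issue the equivalent call would presumably be `mutate(lm_result = map(dataframes, fit_xy))`; that form is an assumption and has not been checked against the dplyr version discussed here.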

@hadley hadley commented Oct 19, 2015

This is a dplyr issue. We'll take a look when we're next working on dplyr.

@yutannihilation yutannihilation commented Oct 19, 2015

Sure. Thanks for the notice!

@hadley hadley added the bug label Oct 21, 2015
@hadley hadley added this to the 0.5 milestone Oct 21, 2015

@hadley hadley commented Oct 21, 2015

I suspect this is because the hybrid evaluator is inlining variable names in ~, when it should never touch the contents of a ~ call.
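
To make the suspected mechanism concrete, here is a rough base-R illustration of what inlining a variable inside a `~` call does to the expression. It is not dplyr's actual code: substitute() stands in for the hybrid evaluator, and the names `lambda` and `inlined` are made up for this sketch.

```r
# The lambda as an unevaluated expression.
lambda <- quote(~ lm(y ~ x, data = .))

# An object in scope that shares its name with a column of the inner data.
x <- data.frame(a = 1:3)

# Replace every occurrence of the symbol `x` with its value, including the
# one inside the nested model formula (assumed to mimic the inlining).
inlined <- do.call(substitute, list(lambda, list(x = x)))

# Before: the right-hand side of the model formula is the symbol `x`.
is.symbol(lambda[[2]][[2]][[3]])
#> [1] TRUE

# After: it is the data frame itself, which lm() cannot use as a term.
is.data.frame(inlined[[2]][[2]][[3]])
#> [1] TRUE
```

This is consistent with the debug_map() output earlier in the thread, where `mpg` inside the formula was printed as `list(manufacturer = ..., ...)`.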

@yutannihilation yutannihilation commented Nov 1, 2015

Thanks!
