Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnest functions fail with lists of exprs #848

Closed
md0u80c9 opened this issue Jan 3, 2020 · 11 comments
Closed

Unnest functions fail with lists of exprs #848

md0u80c9 opened this issue Jan 3, 2020 · 11 comments
Labels
bug an unexpected problem or unintended behavior rectangling 🗄️ converting deeply nested lists into tidy data frames

Comments

@md0u80c9
Copy link

md0u80c9 commented Jan 3, 2020

Hi,

I have some code which uses a fair bit of meta programming (so a table of audit measures basically - where we have lists of the audit measure function being applied). I spotted a neat-looking optimisation to use tidyr to unnest the list.

Unfortunately, when using unnest where the list elements are functions, we get errors within the function (and presumably unpredictable behaviour). Ironically it's presumably because internally tidyr is doing exactly what my code is trying to do with the results of the function!

To demonstrate this, I used the tidyr example and converted the 'films' list-column to have functions rather than film titles (imagine we're going to pluck those functions and insert them into a later dplyr query on a table).


Brief description of the problem

df <- tibble(
   character = c("Toothless", "Dory"),
   metadata = list(
     list(
       species = "dragon",
       color = "black",
       films = c(
         rlang::expr(.data[["title"]] == "How to Train Your Dragon"),
         rlang::expr(.data[["title"]] == "How to Train Your Dragon 2"),
         rlang::expr(.data[["title"]] == "How to Train Your Dragon: The Hidden World")
        )
     ),
     list(
       species = "clownfish",
       color = "blue",
       films = c(rlang::expr(.data[["title"]] == "Finding Nemo"),
         rlang::expr(.data[["title"]] == "Finding Dory"))
     )
   )
 )

tidyr::unnest_wider(df, metadata)

tidyr::unnest_longer(df, films)

The last command to unnest films results in:

Error in eval_tidy(enquo(var), var_env) : object 'films' not found

@hadley

This comment has been minimized.

@hadley hadley added the reprex needs a minimal reproducible example label Jan 3, 2020
@md0u80c9

This comment has been minimized.

@hadley

This comment has been minimized.

@md0u80c9

This comment has been minimized.

@md0u80c9

This comment has been minimized.

@hadley

This comment has been minimized.

@md0u80c9

This comment has been minimized.

@hadley
Copy link
Member

hadley commented Jan 4, 2020

Yeah, it makes sense. Here's a simpler reprex:

library(tidyr)

df <- tibble(x = 1:2, y = list(list(quote(x)), list(quote(x), quote(y))))
df %>% unnest_longer(y)
#> Error: `x` must be a vector, not a symbol

Created on 2020-01-04 by the reprex package (v0.3.0)

@md0u80c9
Copy link
Author

md0u80c9 commented Jan 8, 2020

The error seems to be occurring because the simplify_col function in tidyr tries to call vctrs::vec_size on each item in the nested lists - so we can distill the problem, to the following expression which fails:

vec_size(quote(x))

Is this in turn a bug in vctrs? I presume the size of a vector of expressions should be calculable: length(quote(x)) returns 1, which I think is the answer I would expect here.

@hadley
Copy link
Member

hadley commented Jan 8, 2020

The heart of the problem is that quote(x) is not a vector; i.e. it's not something that belongs in the column of a data frame.

@md0u80c9
Copy link
Author

md0u80c9 commented Jan 8, 2020

I've been looking at the simplify_col function in rectangle.r which is where the error occurs (from line 324).

Here's my reasoning of what the function is doing - please feel free to clarify it if I'm wrong.

We're taking x as our input parameter to simplify_col, which must be either a list or a data frame.

We exit early if x is a list of lists.

We are then trying to establish the size of all contents of x by mapping over each item in the list or data frame and calling vec_size. In our case this is a list, each containing a single quote().

But when we are calling map, our vector quote(x) is being simplified to an expression, rather than a vector of expressions of length 1.

The error is because it now sees quote(x) is not a vector, rather than as a vector of length 1.
So I think the failure is occurring here because c(quote(x)) is being wrongly simplified to quote(x).

Testing this:

vec_size(quote(x)) fails with an error.
length(quote(x)) returns 1.
vec_size(c(quote(x)) returns 1 (because it's a vector).
vec_size(c(quote(x), quote(y)) returns 2.

I think we can fix this by modifying line 344 to instead read:

n <- map_int(x, ~ vec_size(c(.x)))

However, we then hit a similar problem a few lines later:

vec_c(!!!x)

Where we're again expecting all of the items in the list to be vectors to be spliced; yet we've had them forced to be expressions again. Is there a way to solve this latter case?

@hadley hadley added bug an unexpected problem or unintended behavior rectangling 🗄️ converting deeply nested lists into tidy data frames and removed reprex needs a minimal reproducible example labels May 19, 2020
@hadley hadley closed this as completed in 04ebe35 May 19, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug an unexpected problem or unintended behavior rectangling 🗄️ converting deeply nested lists into tidy data frames
Projects
None yet
Development

No branches or pull requests

2 participants