Rectanglers over-eager to get rid of emptiness #806

Closed
jennybc opened this issue Nov 20, 2019 · 3 comments
Labels: bug (an unexpected problem or unintended behavior); rectangling 🗄️ (converting deeply nested lists into tidy data frames)

jennybc commented Nov 20, 2019

Adapting the Toothless & Dory example, but with the films metadata removed for Toothless. When I hoist() or unnest_wider(), films is destined to become a list-column. I think the entry for Toothless should be NULL, not an unspecified vector of length 1. Or maybe even an unspecified vector of length 0? Or a character vector of length 0?

library(tidyverse)

df <- tibble(
  character = c("Toothless", "Dory"),
  metadata = list(
    list(
      species = "dragon",
      color = "black"
    ),
    list(
      species = "clownfish",
      color = "blue",
      films = c("Finding Nemo", "Finding Dory")
    )
  )
)
df
#> # A tibble: 2 x 2
#>   character metadata        
#>   <chr>     <list>          
#> 1 Toothless <named list [2]>
#> 2 Dory      <named list [3]>

df %>% 
  hoist(metadata, films = "films")
#> # A tibble: 2 x 3
#>   character films     metadata        
#>   <chr>     <list>    <list>          
#> 1 Toothless <???>     <named list [2]>
#> 2 Dory      <chr [2]> <named list [2]>

(df2 <- df %>% 
  unnest_wider(metadata))
#> # A tibble: 2 x 4
#>   character species   color films    
#>   <chr>     <chr>     <chr> <list>   
#> 1 Toothless dragon    black <???>    
#> 2 Dory      clownfish blue  <chr [2]>
df2$films
#> [[1]]
#> <unspecified> [1]
#> 
#> [[2]]
#> [1] "Finding Nemo" "Finding Dory"
lengths(df2$films)
#> [1] 1 2

Created on 2019-11-20 by the reprex package (v0.3.0.9001)
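
For comparison, here is a minimal sketch of the result being asked for, built by hand with base R instead of the rectangling functions. It only illustrates the desired shape of the films list-column (an empty entry for Toothless); it is not how hoist() or unnest_wider() work internally.

```r
# Same metadata as the reprex above, as a plain list (base R only).
metadata <- list(
  list(species = "dragon", color = "black"),
  list(
    species = "clownfish",
    color = "blue",
    films = c("Finding Nemo", "Finding Dory")
  )
)

# `[[` on a list returns NULL when the name is absent, so the missing
# entry stays empty instead of becoming a length-1 placeholder.
films <- lapply(metadata, function(m) m[["films"]])

is.null(films[[1]])
#> [1] TRUE
lengths(films)
#> [1] 0 2
```

Here lengths() reports 0 for Toothless, in contrast to the 1 reported for df2$films above.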


DavisVaughan commented Nov 20, 2019

Should hoist() and unnest_wider() (and maybe unnest_longer() and unnest_auto()) have a keep_empty argument that they pass on to their corresponding usages of simplify_col? Right now it is hard coded as TRUE. That is where it changes from NULL to unspecified(1).

At least for this one example, flipping that to FALSE retains NULL for the Toothless films element.

keep_empty = TRUE

keep_empty = TRUE,

Additional note: unnest() also already has a keep_empty argument, and it is defaulted to FALSE.
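
To make the existing unnest() behaviour concrete, here is a small sketch. The tibble is made up for illustration and is not from the issue. With the default keep_empty = FALSE, the row holding the size-0 element is dropped; with keep_empty = TRUE, it is kept as a row with a missing value.

```r
library(tidyr)
library(tibble)

# Illustrative data: the first row has an empty (NULL) list element.
df <- tibble(x = 1:2, y = list(NULL, c("a", "b")))

# Default: the row with the empty element disappears.
dropped <- unnest(df, y)
nrow(dropped)
#> [1] 2

# keep_empty = TRUE: the empty element becomes one row with NA.
kept <- unnest(df, y, keep_empty = TRUE)
nrow(kept)
#> [1] 3
```

Note that in unnest() the argument decides whether a row survives, which is a different question from whether an empty list element should stay NULL after hoist() or unnest_wider().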


jennybc commented Nov 20, 2019

I think setting keep_empty = FALSE fixes it for the wrong reason. It happens to bypass the call to init_col(). Plus keep_empty = FALSE is an odd way to express that we want to retain emptiness.

hadley added the bug and rectangling 🗄️ labels on Nov 24, 2019

hadley commented Nov 24, 2019

I think there's something wrong with the logic in init_col() — we only want to use unspecified() if the result is going to be a vector. Maybe we need a can_simplify() that we can use earlier in the function?
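
One way to read this suggestion, as a rough base R sketch. Both can_simplify() and fill_if_simplifiable() below are hypothetical helpers written for illustration, not tidyr's actual code, and plain NA stands in for vctrs' unspecified(). The idea is to check up front whether the column could collapse to a vector, and only then fill empty slots with a placeholder.

```r
# Hypothetical helper (not tidyr's implementation): TRUE when every
# element is NULL or an atomic vector of length 1, i.e. the list could
# be collapsed into an ordinary vector.
can_simplify <- function(x) {
  all(vapply(
    x,
    function(el) is.null(el) || (is.atomic(el) && length(el) == 1L),
    logical(1)
  ))
}

# Hypothetical: replace NULL with a placeholder only if the result is
# going to be a vector; otherwise leave the list, and its NULLs, alone.
fill_if_simplifiable <- function(x) {
  if (!can_simplify(x)) {
    return(x)
  }
  x[vapply(x, is.null, logical(1))] <- list(NA)
  x
}

films <- list(NULL, c("Finding Nemo", "Finding Dory"))
colors <- list("black", NULL)

# films has a length-2 element, so it stays a list and keeps its NULL.
is.null(fill_if_simplifiable(films)[[1]])
#> [1] TRUE

# colors can become a vector, so the gap is filled with NA.
unlist(fill_if_simplifiable(colors))
#> [1] "black" NA
```

Under these assumptions the Toothless films entry would remain NULL, while a column of scalars with a gap would still simplify to a vector with a missing value.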
