unnest_wider() drops column if it contains empty lists #1125

Closed
RoelVerbelen opened this issue Jun 10, 2021 · 4 comments · Fixed by #1200
Labels: bug (an unexpected problem or unintended behavior), rectangling 🗄️ (converting deeply nested lists into tidy data frames)

Comments

@RoelVerbelen

An inconsistency in the output format came up when processing nested lists that might be empty:

library(tidyr)
df <- tibble(
  x = list(list(z = list(1)), list(z = list(2))), 
  y = 1:2
  )
# column z as expected
df %>% unnest_wider(x)
#> # A tibble: 2 x 2
#>   z              y
#>   <list>     <int>
#> 1 <list [1]>     1
#> 2 <list [1]>     2


df <- tibble(
  x = list(list(z = list()), list(z = list())), 
  y = 1:2
)
# column z not present
df %>% unnest_wider(x)
#> # A tibble: 2 x 1
#>       y
#>   <int>
#> 1     1
#> 2     2

Created on 2021-06-10 by the reprex package (v1.0.0)

hadley added the bug and rectangling 🗄️ labels on Aug 23, 2021
@DavisVaughan
Member

DavisVaughan commented Nov 3, 2021

A similar example:

library(tidyr)

# Mix of empty types
col <- list(
  list(a = list()),
  list(a = integer())
)

df <- tibble(col = col)
df
#> # A tibble: 2 × 1
#>   col             
#>   <list>          
#> 1 <named list [1]>
#> 2 <named list [1]>

# Lost the `a` col
unnest_wider(df, col)
#> # A tibble: 0 × 0


# Mix of empty types and character vectors
col <- list(
  list(a = "x"),
  list(a = list()),
  list(a = integer()),
  list(a = "y")
)

df <- tibble(col = col)
df
#> # A tibble: 4 × 1
#>   col             
#>   <list>          
#> 1 <named list [1]>
#> 2 <named list [1]>
#> 3 <named list [1]>
#> 4 <named list [1]>

# WAT. How could those be combined with the character vectors?
unnest_wider(df, col, simplify = TRUE)
#> # A tibble: 4 × 1
#>   a    
#>   <chr>
#> 1 x    
#> 2 <NA> 
#> 3 <NA> 
#> 4 y

# They could be combined because they were replaced with `NULL`.
unnest_wider(df, col, simplify = FALSE)
#> # A tibble: 4 × 1
#>   a        
#>   <list>   
#> 1 <chr [1]>
#> 2 <NULL>   
#> 3 <NULL>   
#> 4 <chr [1]>

Created on 2021-11-03 by the reprex package (v2.0.1)

This happens because of this compact() call, which removes size-zero elements (as well as NULL elements). I feel like that is wrong, since it can allow some unnesting that should not be possible (as seen above):

x <- purrr::compact(x)
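
For illustration, here is a minimal sketch of what compact() does to a single row's elements. The input list and the name `kept` are made up for this example and are not taken from the tidyr source; the point is only that empty lists, empty typed vectors, and NULL are all treated the same way.

```r
library(purrr)

# One hypothetical "row" with a real value and three kinds of empty element
x <- list(a = "x", b = list(), c = integer(), d = NULL)

# compact() keeps only elements with length > 0, so the empty list,
# the empty integer vector, and the NULL are all dropped together
kept <- compact(x)
names(kept)
```

Because the dropped elements leave no trace, a component that is empty in every row never becomes a column, and a component that is empty in only some rows is later filled in as if it had been missing.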

@DavisVaughan
Member

DavisVaughan commented Nov 3, 2021

This actually affects the vignette for rectangling, which relies on the current behavior. It can be summed up as follows:

library(tidyr)
library(repurrrsive)

chars <- tibble(char = got_chars)
head(chars)
#> # A tibble: 6 × 1
#>   char             
#>   <list>           
#> 1 <named list [18]>
#> 2 <named list [18]>
#> 3 <named list [18]>
#> 4 <named list [18]>
#> 5 <named list [18]>
#> 6 <named list [18]>

# Inside each row is an `aliases` element.
# It is always a character vector except in this one case:
chars$char[[18]]$aliases
#> [1] "Catelyn Tully"     "Lady Stoneheart"   "The Silent Sistet"
#> [4] "Mother Mercilesr"  "The Hangwomans"
chars$char[[19]]$aliases
#> list()
chars$char[[20]]$aliases
#> [1] "Ned"            "The Ned"        "The Quiet Wolf"

# Unnest, then look at that aliases column
chars2 <- chars %>% unnest_wider(char)

# Should be:
# <chr  [5]>
# <list [0]>
# <chr  [3]>
chars2[18:20, "aliases"]
#> # A tibble: 3 × 1
#>   aliases  
#>   <list>   
#> 1 <chr [5]>
#> 2 <NULL>   
#> 3 <chr [3]>

# Current implementation means this works, but I don't
# think it should
unnest_longer(chars2[18:20, "aliases"], aliases)
#> # A tibble: 9 × 1
#>   aliases          
#>   <chr>            
#> 1 Catelyn Tully    
#> 2 Lady Stoneheart  
#> 3 The Silent Sistet
#> 4 Mother Mercilesr 
#> 5 The Hangwomans   
#> 6 <NA>             
#> 7 Ned              
#> 8 The Ned          
#> 9 The Quiet Wolf

# This should have been identical to:
vctrs::vec_c(
  chars$char[[18]]$aliases,
  chars$char[[19]]$aliases,
  chars$char[[20]]$aliases
)
#> Error: Can't combine `..1` <character> and `..2` <list>.

Created on 2021-11-03 by the reprex package (v2.0.1)

@mgirlich
Contributor

mgirlich commented Nov 4, 2021

Note that, unfortunately, empty lists are a bit special because they easily appear when working with JSON:

json <- jsonlite::toJSON(
  list(
    list(x = 1:3),
    list(x = integer()),
    list(x = 4:5)
  )
)
json
#> [{"x":[1,2,3]},{"x":[]},{"x":[4,5]}]

jsonlite::fromJSON(json, simplifyDataFrame = FALSE)
#> [[1]]
#> [[1]]$x
#> [1] 1 2 3
#> 
#> 
#> [[2]]
#> [[2]]$x
#> list()
#> 
#> 
#> [[3]]
#> [[3]]$x
#> [1] 4 5
tibble::tibble(jsonlite::fromJSON(json, simplifyDataFrame = TRUE))
#> # A tibble: 3 × 1
#>   x        
#>   <list>   
#> 1 <int [3]>
#> 2 <int [0]>
#> 3 <int [2]>

Created on 2021-11-04 by the reprex package (v2.0.1)

And I guess the rectangling functions are typically used when working with JSON.

So, it might make sense to allow (via an argument?) empty lists to be combined with other vectors.
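
Until something like that exists, the same effect can be approximated on the user side. The sketch below assumes a hypothetical helper, `empty_list_to_null()`, which is not part of tidyr or vctrs; it replaces each empty, untyped list() with NULL so that vctrs::vec_c() skips it when combining.

```r
library(vctrs)

# Hypothetical helper (not a tidyr/vctrs function): replace each empty,
# untyped list() with NULL so it no longer blocks combining.
empty_list_to_null <- function(x) {
  is_empty_list <- vapply(
    x,
    function(el) is.list(el) && length(el) == 0L,
    logical(1)
  )
  # Assigning list(NULL) keeps the positions and sets their values to NULL
  x[is_empty_list] <- list(NULL)
  x
}

# Shape of the `x` values after parsing JSON without simplification
parsed <- list(1:3, list(), 4:5)

# Combining directly fails: <integer> and <list> have no common type
try(do.call(vec_c, parsed))

# After the replacement, vec_c() ignores the NULL and combines the rest
combined <- do.call(vec_c, empty_list_to_null(parsed))
combined
```

This only touches zero-length lists, so a typed empty vector such as integer() passes through unchanged and still combines normally.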

@DavisVaughan
Member

Yeah, I was thinking the same thing! I'll likely add an argument for some form of that.
