Allow for NULL in list column when unnesting? #436

Closed
kendonB opened this issue Mar 9, 2018 · 5 comments
Labels
feature: a feature request or enhancement
rectangling 🗄️: converting deeply nested lists into tidy data frames

Comments

@kendonB

kendonB commented Mar 9, 2018

library(tidyverse)
tibble(list = list(NULL, tibble(x = 1))) %>% 
  unnest()
#> Error: Each column must either be a list of vectors or a list of data frames [list]

I would have expected:

#> # A tibble: 2 x 1
#>       x
#>   <dbl>
#> 1 NA   
#> 2  1.00

@markdly
Contributor

markdly commented Mar 25, 2018

I've been thinking about this issue too. To me, it feels like this fits in with the discussion happening over at #358 ...

@billdenney
Contributor

I have a use case for this where:

I have multiple datasets in a clinical trial. Some data have one or more rows for each subject; some data may have zero rows for each subject. Specifically, lab measures from blood concentrations have at least one measure for each subject; adverse events (aka side effects) have zero or more rows per subject.

The number of rows in each dataset is the number of observations, which matters for adverse events: imputing an empty row would complicate counting adverse events and cause problems in much of the downstream processing.

What I want to do is make nested datasets for both, merge them by subject number, and be able to unnest either individually later. As an example:

library(tidyverse)                    
#> -- Attaching packages ------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1     v purrr   0.2.5
#> v tibble  1.4.2     v dplyr   0.7.5
#> v tidyr   0.8.1     v stringr 1.3.1
#> v readr   1.1.1     v forcats 0.3.0
#> Warning: package 'tidyr' was built under R version 3.4.4
#> Warning: package 'purrr' was built under R version 3.4.4
#> Warning: package 'dplyr' was built under R version 3.4.4
#> Warning: package 'stringr' was built under R version 3.4.4
#> -- Conflicts ---------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
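# Adverse events: only subject 1 has a row (subject 2 has none)
d_adverse <-
  data.frame(SUBJID = 1, AE = "nausea") %>%
  as_tibble() %>%
  nest(-SUBJID, .key = "adverse")
# Lab measures: one row for each of subjects 1 and 2
d_lab <-
  data.frame(SUBJID = 1:2, labname = "cholesterol") %>%
  as_tibble() %>%
  nest(-SUBJID, .key = "lab")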
#> Warning: package 'bindrcpp' was built under R version 3.4.4
d_total <- full_join(d_adverse, d_lab)
#> Joining, by = "SUBJID"
d_total %>%                           
  select(-lab) %>%                      
  unnest()                              
#> Error: Each column must either be a list of vectors or a list of data frames [adverse]

@markdly
Contributor

markdly commented Jun 28, 2018

Hi @billdenney, if you need a temporary workaround, perhaps modifying the adverse list column to replace NULL with an empty tibble could help. Unnesting then returns the original adverse event counts:

library(tidyverse) 
d_total %>% 
  mutate(adverse = map_if(adverse, is.null, ~ tibble())) %>% 
  select(-lab) %>%                      
  unnest()  
#> # A tibble: 1 x 2
#>   SUBJID AE    
#>    <dbl> <fct> 
#> 1      1 nausea
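
To see why this keeps the counts intact, here is a minimal sketch of just the replacement step, outside of any data frame. The `adverse` list below is made up for illustration (one subject with an event, one with none); it is not the `d_total` object from the example above. A `NULL` entry becomes a zero-row tibble, which adds no rows when the column is later unnested.

```r
library(purrr)
library(tibble)

# Hypothetical list column: subject 1 has one adverse event, subject 2 has none
adverse <- list(tibble(AE = "nausea"), NULL)

# Replace each NULL with an empty (zero-row) tibble, as in the workaround
patched <- map_if(adverse, is.null, ~ tibble())

# Rows per subject: the patched NULL contributes zero rows, not an imputed one
counts <- vapply(patched, nrow, integer(1))
counts
```

Since no element is `NULL` any more, the column is a list of data frames throughout, which is what the error message asks for.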

hadley added the "feature" (a feature request or enhancement) and "rectangling 🗄️" (converting deeply nested lists into tidy data frames) labels on Jan 4, 2019
@hadley
Member

hadley commented Jan 4, 2019

Supporting NULL values seems reasonable to me.

hadley closed this as completed in 64ee16b on Mar 7, 2019
@hadley
Member

hadley commented Mar 7, 2019

Fixed with a quick hack; will hopefully naturally fall out when I rewrite unnest() to use vctrs.
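
For anyone landing here later, a minimal sketch of how `NULL` elements seem to behave after the rewrite, assuming tidyr 1.0.0 or newer, where `unnest()` takes `cols` and `keep_empty` arguments. The `id` column and the values are made up for illustration, and this is not the code from the commit above.

```r
library(tidyr)
library(tibble)

# Hypothetical data: the first list element is NULL, the second is a one-row tibble
df <- tibble(id = 1:2, data = list(NULL, tibble(x = 1)))

# Default: a NULL element has size zero, so its row is dropped from the output
dropped <- unnest(df, cols = data)

# keep_empty = TRUE keeps a row for the NULL element, filled with NA
kept <- unnest(df, cols = data, keep_empty = TRUE)
```

The default suits the adverse-event use case, where a subject with no events should contribute no rows, while `keep_empty = TRUE` matches the output expected in the original report.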
