Allow for NULL in list column when unnesting? #436

Closed
kendonB opened this issue Mar 9, 2018 · 5 comments

@kendonB commented Mar 9, 2018

library(tidyverse)
tibble(list = list(NULL, tibble(x = 1))) %>% 
  unnest()
#> Error: Each column must either be a list of vectors or a list of data frames [list]

I would have expected:

#> # A tibble: 2 x 1
#>       x
#>   <dbl>
#> 1 NA   
#> 2  1.00

@markdly commented Mar 25, 2018

I've been thinking about this issue too. To me, it feels like this fits in with the discussion happening over at #358 ...

@billdenney commented Jun 27, 2018

I have a use case for this:

I have multiple datasets in a clinical trial. Some data have one or more rows for each subject; some data may have zero rows for each subject. Specifically, lab measures from blood concentrations have at least one measure for each subject; adverse events (aka side effects) have zero or more rows per subject.

The number of rows in the data is the number of observations, which matters for adverse events: imputing an empty row would make counting adverse events more complex and cause problems in much of the downstream processing.

What I want to do is make nested datasets for both, merge them by subject number, and be able to unnest either individually later. As an example:

library(tidyverse)
#> -- Attaching packages ---------------------------------- tidyverse 1.2.1 --
#> v ggplot2 2.2.1     v purrr   0.2.5
#> v tibble  1.4.2     v dplyr   0.7.5
#> v tidyr   0.8.1     v stringr 1.3.1
#> v readr   1.1.1     v forcats 0.3.0
#> Warning: package 'tidyr' was built under R version 3.4.4
#> Warning: package 'purrr' was built under R version 3.4.4
#> Warning: package 'dplyr' was built under R version 3.4.4
#> Warning: package 'stringr' was built under R version 3.4.4
#> -- Conflicts ------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag()    masks stats::lag()
d_adverse <-
  data.frame(SUBJID = 1, AE = "nausea") %>%
  as_tibble() %>%
  nest(-SUBJID, .key = "adverse")
d_lab <-
  data.frame(SUBJID = 1:2, labname = "cholesterol") %>%
  as_tibble() %>%
  nest(-SUBJID, .key = "lab")
#> Warning: package 'bindrcpp' was built under R version 3.4.4
d_total <- full_join(d_adverse, d_lab)
#> Joining, by = "SUBJID"
d_total %>%
  select(-lab) %>%
  unnest()
#> Error: Each column must either be a list of vectors or a list of data frames [adverse]

@markdly commented Jun 28, 2018

Hi @billdenney, if you need a temporary workaround, perhaps modifying the adverse list column to replace NULL with an empty tibble could help. Unnesting then returns the original adverse event counts:

library(tidyverse)
d_total %>%
  mutate(adverse = map_if(adverse, is.null, ~ tibble())) %>%
  select(-lab) %>%
  unnest()
#> # A tibble: 1 x 2
#>   SUBJID AE    
#>    <dbl> <fct> 
#> 1      1 nausea
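
The snippet above relies on `d_total` from the earlier comment. A self-contained sketch of the same workaround might look like the following; the hand-built `d_total` here is an assumption standing in for the joined data (the `lab` column is left out), and `unnest(adverse)` names the column explicitly rather than unnesting every list column:

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical stand-in for the joined data: subject 2 has no adverse
# events, so its list-column entry is NULL.
d_total <- tibble(
  SUBJID = c(1, 2),
  adverse = list(tibble(AE = "nausea"), NULL)
)

# Replace each NULL entry with an empty tibble, then unnest.
# Empty entries contribute no rows, so adverse event counts are preserved.
result <- d_total %>%
  mutate(adverse = map_if(adverse, is.null, ~ tibble())) %>%
  unnest(adverse)

nrow(result)
```

Because the empty tibble has zero rows, subject 2 drops out of the result rather than gaining an imputed row.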

@hadley commented Jan 4, 2019

Supporting NULL values seems reasonable to me.

@hadley closed this in 64ee16b Mar 7, 2019

@hadley commented Mar 7, 2019

Fixed with a quick hack; will hopefully naturally fall out when I rewrite unnest() to use vctrs.
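
As a rough illustration of the two behaviours discussed in this thread, and assuming tidyr 1.0.0 or later (not the 0.8.1 release used in the examples above), `unnest()` accepts `NULL` entries and has a `keep_empty` argument that chooses between dropping those rows and keeping them as `NA`. The column names `id` and `data` are made up for this sketch:

```r
library(tibble)
library(tidyr)

# A list column with one NULL entry and one one-row tibble.
df <- tibble(id = 1:2, data = list(NULL, tibble(x = 1)))

# Default: the row whose list entry is NULL is dropped.
dropped <- unnest(df, data)

# keep_empty = TRUE: the NULL entry becomes a single row of missing values.
kept <- unnest(df, data, keep_empty = TRUE)

nrow(dropped)
nrow(kept)
```

The default matches the adverse-events use case (no imputed rows), while `keep_empty = TRUE` matches the output expected in the original report.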
