
Preserve missing rows when unnesting #358

@leungi

Hi,

Suppose a tibble looks like this (columns separated by ' | '):

index | text | polarity | polarity_confidence | aspects
1 | blah1 | positive | 0.579939 | list()
2 | blah2 | negative | 0.693546 | list()
3 | blah3 | negative | 0.676733 | list()
4 | blah4 | positive | 0.756442 | list()
5 | blah5 | positive | 0.815249 | list()
6 | blah6 | positive | 0.72212 | list()
7 | blah7 | negative | 0.808398 | list(a = value, b = value, c = value)
8 | blah8 | negative | 0.63281 | list()
9 | blah9 | negative | 0.709047 | list()
10 | blah10 | negative | 0.912631 | list()
11 | blah11 | negative | 0.752882 | list(a = value, b = value, c = value)

Issue:
tibble %>%
  unnest(aspects)

## This drops every row except 7 and 11 (i.e. those with a non-empty list); '.drop = FALSE' doesn't help.

My current workaround is as follows:

  1. By row, determine whether the list is empty (using length()).
  2. If the list is empty, substitute a dummy non-empty list (using if_else).
  3. Then unnest.

Workaround code:
tibble %>%
  mutate(listLength = map_int(aspects, length)) %>%
  mutate(aspects = if_else(listLength <= 0, list(data.frame("NA")), aspects)) %>%
  unnest(aspects)
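
For readers who want to try the three steps without any packages, here is a minimal base R sketch of the same idea. It is not the tidyr implementation and not the exact code above. The three-row data frame, the placeholder list, and the names `df`, `is_empty`, `placeholder`, `wide` and `result` are assumptions made up for illustration. It also assumes every non-empty element of `aspects` is a named list with the same names (a, b, c).

```r
# Hypothetical miniature of the tibble above (illustrative values only).
df <- data.frame(
  index = 1:3,
  text = c("blah1", "blah2", "blah3"),
  stringsAsFactors = FALSE
)
df$aspects <- list(list(), list(a = "x", b = "y", c = "z"), list())

# Step 1: by row, flag the elements whose list is empty.
is_empty <- lengths(df$aspects) == 0

# Step 2: substitute a placeholder with the same names, filled with NA.
placeholder <- list(a = NA_character_, b = NA_character_, c = NA_character_)
df$aspects[is_empty] <- list(placeholder)

# Step 3: "unnest" by turning each element into a one-row data frame
# and stacking them, then binding the result to the other columns.
wide <- do.call(
  rbind,
  lapply(df$aspects, as.data.frame, stringsAsFactors = FALSE)
)
result <- cbind(df[c("index", "text")], wide)

print(result)
```

Because the placeholder carries the same names as the real elements, the rows that started out empty end up with NA in a, b and c rather than being dropped, which matches the shape of the desired output below.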

Desired output:
index | text | polarity | polarity_confidence | a | b | c
1 | blah1 | positive | 0.579939 | NA | NA | NA
2 | blah2 | negative | 0.693546 | NA | NA | NA
3 | blah3 | negative | 0.676733 | NA | NA | NA
4 | blah4 | positive | 0.756442 | NA | NA | NA
5 | blah5 | positive | 0.815249 | NA | NA | NA
6 | blah6 | positive | 0.72212 | NA | NA | NA
7 | blah7 | negative | 0.808398 | value | value | value
8 | blah8 | negative | 0.63281 | NA | NA | NA
9 | blah9 | negative | 0.709047 | NA | NA | NA
10 | blah10 | negative | 0.912631 | NA | NA | NA
11 | blah11 | negative | 0.752882 | value | value | value

Am I missing something?

Looking forward to your insights. Thanks in advance.

Metadata

Labels: feature (a feature request or enhancement), rectangling 🗄️ (converting deeply nested lists into tidy data frames)
