Directional unnesting #418

@hadley

Copying notes from a file lingering on my desktop:

# Some interesting challenges from Jenny
# https://github.com/tidyverse/googledrive/blob/519452fc4d3257354079324e4afede777604848f/data-raw/discovery-doc-prep.R#L118-L124

library(tidyverse)
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 3, 5),
  y = c("x", "y", "z")
)
df

# One missing case: z = list(1, 2, 3)
# There you just want to simplify the list-col to a vector


# rows and cols change; current nest behaviour
# df %>% nest(x, y, .key = "data")
nest_both <- tribble(
  ~g,  ~data,
  "a", tibble(x = c(1, 3), y = c("x", "y")),
  "b", tibble(x = 5, y = "z")
)
nest_both

# rows change; cols don't
# get list of vectors
# df %>% nest_rows(x, y)
nest_rows <- tribble(
  ~g, ~x,       ~y,
  "a", c(1, 3), c("x", "y"),
  "b", 5,       "z"
)

# cols change; rows don't
# use lists, not tibbles to convey intent.
# df %>% nest_cols(x, y, .key = "data")
nest_cols <- tribble(
  ~g,  ~data,
  "a", list(x = 1, y = "x"),
  "a", list(x = 3, y = "y"),
  "b", list(x = 5, y = "z")
)
nest_cols

# unnest ------------------------------------------------------------------

# All of these should be able to automatically determine the
# unnesting direction: data frame = both; named vector = col;
# unnamed vector = row; anything else or a mix = error.

unnest(nest_both, data)
unnest(nest_cols, data)
unnest(nest_rows, x, y)

# Lengths must be consistent (otherwise would have to cross?)
# nest_rows %>% unnest_row()
nest_rows %>% unnest(x, y)

# bind_rows() handles name/type consistency
# nest_cols %>% unnest_col()
nest_cols %>%
  mutate(data = data %>% map(as_tibble)) %>%
  unnest(data)

# What happens if we try to do the "wrong" direction?
nest_rows %>% unnest_col()
nest_cols %>% unnest_row()

# needs column names
# can you supply multiple columns? (yes, but how to supply names? provide numbers by default?)
# can you provide maximum number? need to handle potential raggedness
# (this is starting to feel like separate)
nest_rows %>% unnest_col()

# needs option to capture names
# how to manage types of data col? here would be mix of character and integer
# use purrr::simplify? uses unlist() but guarantees length will be ok

nest_cols %>% unnest(data)
nest_cols %>% unnest(.id = "name") # not picking up name

# would simplify to integer or die trying?
nest_cols %>% unnest(.id = "name", .type = "integer")
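
The direction rule in the notes above (data frame = both; named vector = col; unnamed vector = row; anything else or a mix = error) could be sketched roughly as follows. This is only an illustration in base R: `element_direction()` and `guess_direction()` are hypothetical helper names, not tidyr functions, and treating a named list the same as a named vector is an assumption based on how `nest_cols` is built.

```r
# Hypothetical helper: classify one element of a list-column.
element_direction <- function(x) {
  # Data frames are lists too, so check for them first.
  if (is.data.frame(x)) {
    return("both")
  }
  if (is.atomic(x) || is.list(x)) {
    nms <- names(x)
    if (is.null(nms)) {
      return("row")  # unnamed vector: unnest into rows
    }
    if (!anyNA(nms) && all(nms != "")) {
      return("col")  # fully named vector or list: unnest into columns
    }
  }
  # Partially named vectors and non-vector objects end up here.
  stop("Can't determine unnesting direction for this element.", call. = FALSE)
}

# Hypothetical helper: every element of the list-column must agree.
guess_direction <- function(col) {
  stopifnot(is.list(col))
  dirs <- unique(vapply(col, element_direction, character(1)))
  if (length(dirs) != 1) {
    stop("List-column is empty or mixes unnesting directions.", call. = FALSE)
  }
  dirs
}

guess_direction(list(data.frame(x = c(1, 3)), data.frame(x = 5)))  # "both"
guess_direction(list(c(1, 3), 5))                                  # "row"
guess_direction(list(list(x = 1, y = "x"), list(x = 3, y = "y")))  # "col"
```

Erroring on a mix keeps the rule strict, as the notes suggest; a real implementation would still need to decide how to handle `NULL` elements and zero-length vectors, which this sketch does not address.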

Labels: feature (a feature request or enhancement), rectangling 🗄️ (converting deeply nested lists into tidy data frames)
