Directional unnesting #418

Closed
hadley opened this issue Feb 16, 2018 · 3 comments

@hadley hadley commented Feb 16, 2018

Copying notes from lingering file on my desktop:

# Some interesting challenges from Jenny
# https://github.com/tidyverse/googledrive/blob/519452fc4d3257354079324e4afede777604848f/data-raw/discovery-doc-prep.R#L118-L124

library(tidyverse)
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 3, 5),
  y = c("x", "y", "z")
)
df

# One missing case: z = list(1, 2, 3)
# There you just want to simplify the list-col to a vector


# rows and cols change; current nest behaviour
# df %>% nest(x, y, .key = "data")
nest_both <- tribble(
  ~g,  ~data,
  "a", tibble(x = c(1, 3), y = c("x", "y")),
  "b", tibble(x = 5, y = "z")
)
nest_both

# rows change; cols don't
# get list of vectors
# df %>% nest_rows(x, y)
nest_rows <- tribble(
  ~g, ~x,       ~y,
  "a", c(1, 3), c("x", "y"),
  "b", 5,       "z"
)

# cols change; rows don't
# use lists, not tibbles to convey intent.
# df %>% nest_cols(x, y, .key = "data")
nest_cols <- tribble(
  ~g,  ~data,
  "a", list(x = 1, y = "x"),
  "a", list(x = 3, y = "y"),
  "b", list(x = 5, y = "z")
)
nest_cols

# unnest ------------------------------------------------------------------

# All of these should be able to automatically determine the
# unnested direction: data frame = both; named vector = col;
# unnamed vector = row; anything else or mix = error.

unnest(nest_both, data)
unnest(nest_cols, data)
unnest(nest_rows, x, y)

# Lengths must be consistent (otherwise would have to cross?)
# nest_rows %>% unnest_row()
nest_rows %>% unnest(x, y)

# bind_rows() handles name/type consistency
# nest_cols %>% unnest_col()
nest_cols %>%
  mutate(data = data %>% map(as_tibble)) %>%
  unnest(data)

# What happens if we try to do the "wrong" direction?
nest_rows %>% unnest_col()
nest_cols %>% unnest_row()

# needs column names
# can you supply multiple columns? (yes, but how to supply names? provide numbers by default?)
# can you provide maximum number? need to handle potential raggedness
# (this is starting to feel like separate)
nest_rows %>% unnest_col()

# needs option to capture names
# how to manage types of data col? here would be mix of character and integer
# use purrr::simplify? uses unlist() but guarantees length will be ok

nest_cols %>% unnest(data)
nest_cols %>% unnest(.id = "name") # not picking up name

# would simplify to integer or die trying?
nest_cols %>% unnest(.id = "name", .type = "integer")
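
The direction rule from the unnest notes above (data frame = both, named vector = col, unnamed vector = row, anything else or a mix = error) could be sketched roughly as follows. `element_direction()` and `guess_direction()` are hypothetical names used only for illustration, not tidyr functions, and the sketch assumes that lists count as vectors, as they do in `nest_cols`.

```r
# Hypothetical helper (not part of tidyr): classify one element of a list-col.
element_direction <- function(x) {
  if (is.data.frame(x)) {
    # Check data frames first: they are also named lists.
    "both"
  } else if (is.atomic(x) || is.list(x)) {
    # Named vector/list spreads into columns; unnamed one into rows.
    if (!is.null(names(x))) "col" else "row"
  } else {
    stop("Cannot unnest an element of class ", class(x)[1], call. = FALSE)
  }
}

# Hypothetical helper: classify a whole list-col; a mix of directions is an error.
guess_direction <- function(col) {
  directions <- unique(vapply(col, element_direction, character(1)))
  if (length(directions) != 1) {
    stop(
      "List-col does not have a single direction: ",
      paste(directions, collapse = ", "),
      call. = FALSE
    )
  }
  directions
}

guess_direction(list(data.frame(x = c(1, 3)), data.frame(x = 5)))   # like nest_both$data
guess_direction(list(list(x = 1, y = "x"), list(x = 3, y = "y")))   # like nest_cols$data
guess_direction(list(c(1, 3), 5))                                   # like nest_rows$x
```

Checking the whole column rather than the first element is a design assumption here; it is what makes the "mix = error" case detectable before any rows are rewritten.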

@hadley hadley commented Mar 8, 2019

Probably should refocus unnest() on data frames, and create new unnest_long() and unnest_wide(). unnest() would warn once per session when given a vector.

I think unnest_long() and unnest_wide() would still have to handle data frames (if it doesn't add too much complexity).
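
One way the "warn once per session when given a vector" behaviour might work is sketched below in base R. This is only an illustration under stated assumptions: `warned`, `warn_once()` and `check_unnest_input()` are hypothetical names, the warning text is invented, and nothing here reflects how tidyr actually implements it.

```r
# Hypothetical session-level registry of warnings that have already been shown.
warned <- new.env(parent = emptyenv())

# Emit `msg` as a warning only the first time `id` is seen in this session.
warn_once <- function(id, msg) {
  if (is.null(warned[[id]])) {
    warned[[id]] <- TRUE
    warning(msg, call. = FALSE)
    return(invisible(TRUE))
  }
  invisible(FALSE)
}

# Hypothetical input check: returns TRUE if every element is a data frame,
# otherwise warns (once) that the vector-specific functions should be used.
check_unnest_input <- function(col) {
  all_df <- all(vapply(col, is.data.frame, logical(1)))
  if (!all_df) {
    warn_once(
      "unnest_vector",
      "unnest() is intended for data frames; use unnest_long() or unnest_wide() for vectors."
    )
  }
  invisible(all_df)
}

check_unnest_input(list(c(1, 3), 5))  # warns the first time
check_unnest_input(list(c(1, 3), 5))  # silent afterwards
```

Keeping the registry in an environment rather than a global variable means the state is modified by reference and lasts for the session, which matches the "once per session" wording.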
