Allow unnest with list columns of differing lengths? #328

Closed
karldw opened this Issue Jul 17, 2017 · 3 comments

@karldw
Contributor

karldw commented Jul 17, 2017

unnest currently can't handle multiple list columns with different lengths. If the user asks to unnest one list column from a data frame that has several, unnest fails whenever the list columns' element counts differ. Would it be possible for unnest to copy the other list columns, just as it copies values from standard, atomic columns?

In particular, the current behavior means unnest doesn't work with sf data, since the geometry column is already a list column.

library(sf)
library(dplyr)
library(tidyr)
nc <- st_read(system.file("shape/nc.shp", package = "sf")) %>% 
  slice(1:3) %>%
  select(NAME) %>%
  mutate(y = strsplit(c("a", "d,e,f", "g,h"), ","))

nc
#> Simple feature collection with 3 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> # A tibble: 3 x 3
#>        NAME         y          geometry
#>      <fctr>    <list>  <simple_feature>
#> 1      Ashe <chr [1]> <MULTIPOLYGON...>
#> 2 Alleghany <chr [3]> <MULTIPOLYGON...>
#> 3     Surry <chr [2]> <MULTIPOLYGON...>


# Current behavior:
unnest(nc, y, .drop = FALSE)
#>  Error: All nested columns must have the same number of elements.


# Expected behavior: values in geometry column copied for the newly-created rows
unnest(nc, y, .drop = FALSE)
#> # A tibble: 6 x 3
#>        NAME     y           geometry
#>      <fctr> <chr>   <simple_feature>
#> 1      Ashe     a  <MULTIPOLYGON...>
#> 2 Alleghany     d  <MULTIPOLYGON...>
#> 3 Alleghany     e  <MULTIPOLYGON...>
#> 4 Alleghany     f  <MULTIPOLYGON...>
#> 5     Surry     g  <MULTIPOLYGON...>
#> 6     Surry     h  <MULTIPOLYGON...>

Ref: r-spatial/sf#426
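
For illustration, the requested row-copying behavior can be sketched in base R, without sf or tidyr. This is only a sketch under stated assumptions, not how unnest is implemented: the `geometry` column here is a plain list standing in for an sf geometry column, and the values in it are made up.

```r
# Toy data frame with two list columns of differing lengths.
df <- data.frame(NAME = c("Ashe", "Alleghany", "Surry"), stringsAsFactors = FALSE)
df$y <- strsplit(c("a", "d,e,f", "g,h"), ",")
df$geometry <- list(1:2, 3:5, 6L)  # assumption: stand-in for an sf geometry column

# Repeat each row index once per element of y, so every other column
# (including the list column `geometry`) is copied onto the new rows.
idx <- rep(seq_len(nrow(df)), lengths(df$y))
out <- df[idx, c("NAME", "geometry"), drop = FALSE]
out$y <- unlist(df$y)
rownames(out) <- NULL

out$NAME
#> [1] "Ashe"      "Alleghany" "Alleghany" "Alleghany" "Surry"     "Surry"
```

Each of the three Alleghany rows carries the same `geometry` element, which is the copying behavior the expected output above describes.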

hongyuanjia added a commit to hongyuanjia/eplusr that referenced this issue Nov 1, 2017

@hadley hadley added the feature label Nov 15, 2017

@hadley


Member

hadley commented Nov 15, 2017

Oooh good idea!

@hadley hadley added the trees 🌲 label Nov 16, 2017

@hadley


Member

hadley commented Nov 20, 2017

I think this will need some extra syntax, maybe something like unnest(df, y, preserve = x)

@hadley hadley closed this in 82ef03a Nov 20, 2017

@karldw


Contributor

karldw commented Nov 20, 2017

Thank you! I think there's still an issue when the list variables aren't specified, but .preserve is.

I think the solution is to add something like:

if (is_empty(quos)) {
  list_cols <- names(data)[map_lgl(data, is_list)]
  list_cols <- tidyselect::vars_select(list_cols, -!!! enquo(.preserve))  # deselect .preserve vars
  quos <- syms(list_cols)
}

but the line above is wrong because I don't have the unquotation right.
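
One way to sidestep the unquotation problem might be to resolve the preserved columns to plain names first and then drop them with `setdiff()`. The sketch below shows only that deselection step in base R. The helper `cols_to_unnest()` and its `preserve` argument (a character vector of column names) are hypothetical and not part of tidyr.

```r
# Hypothetical helper: names of the list columns that should be unnested,
# i.e. every list column of `data` that is not named in `preserve`.
cols_to_unnest <- function(data, preserve = character()) {
  is_list_col <- vapply(data, is.list, logical(1))
  setdiff(names(data)[is_list_col], preserve)
}

# Toy data; `geometry` is a plain list standing in for an sf geometry column.
df <- data.frame(NAME = c("Ashe", "Alleghany", "Surry"), stringsAsFactors = FALSE)
df$y <- strsplit(c("a", "d,e,f", "g,h"), ",")
df$geometry <- list(1:2, 3:5, 6L)

cols_to_unnest(df)
#> [1] "y"        "geometry"
cols_to_unnest(df, preserve = "geometry")
#> [1] "y"
```

Inside unnest, the resulting names could then be passed to `syms()` as in the snippet above, assuming `.preserve` has already been turned into column names.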
