Allow unnest with list columns of differing lengths? #328

karldw opened this issue Jul 17, 2017 · 3 comments

@karldw karldw commented Jul 17, 2017

unnest() currently can't handle a data frame that has multiple list columns whose elements differ in length. Even if the user asks to unnest only one of the list columns, unnest() fails when the other list columns have a different number of elements. Would it be possible for unnest() to copy the other list columns, just as it copies values from standard, atomic columns?

In particular, the current behavior means unnest doesn't work with sf data, since the geometry column is already a list column.

library(sf)
library(dplyr)
library(tidyr)

nc <- st_read(system.file("shape/nc.shp", package = "sf")) %>% 
  slice(1:3) %>%
  select(NAME) %>%
  mutate(y = strsplit(c("a", "d,e,f", "g,h"), ","))

#> Simple feature collection with 3 features and 2 fields
#> geometry type:  MULTIPOLYGON
#> dimension:      XY
#> bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
#> epsg (SRID):    4267
#> proj4string:    +proj=longlat +datum=NAD27 +no_defs
#> # A tibble: 3 x 3
#>        NAME         y          geometry
#>      <fctr>    <list>  <simple_feature>
#> 1      Ashe <chr [1]> <MULTIPOLYGON...>
#> 2 Alleghany <chr [3]> <MULTIPOLYGON...>
#> 3     Surry <chr [2]> <MULTIPOLYGON...>

# Current behavior:
unnest(nc, y, .drop = FALSE)
#>  Error: All nested columns must have the same number of elements.

# Expected behavior: values in geometry column copied for the newly-created rows
unnest(nc, y, .drop = FALSE)
#> # A tibble: 6 x 3
#>        NAME     y           geometry
#>      <fctr> <chr>   <simple_feature>
#> 1      Ashe     a  <MULTIPOLYGON...>
#> 2 Alleghany     d  <MULTIPOLYGON...>
#> 3 Alleghany     e  <MULTIPOLYGON...>
#> 4 Alleghany     f  <MULTIPOLYGON...>
#> 5     Surry     g  <MULTIPOLYGON...>
#> 6     Surry     h  <MULTIPOLYGON...>

Ref: r-spatial/sf#426
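
Until unnest() supports this, the expected result above can be approximated by hand. The following is only a sketch in base R under stated assumptions: `geometry` is a plain list column with made-up values standing in for the sf geometry, and whether the same row indexing keeps a real sf object valid is not checked here.

```r
# Toy stand-in for the sf example: `geometry` is an ordinary list column.
df <- data.frame(NAME = c("Ashe", "Alleghany", "Surry"), stringsAsFactors = FALSE)
df$y <- strsplit(c("a", "d,e,f", "g,h"), ",")
df$geometry <- list(1:2, 1:5, 1:3)  # made-up values with differing lengths

# Repeat each row index once per element of y, so every other column
# (including the other list column) is copied for the new rows.
idx <- rep(seq_len(nrow(df)), lengths(df$y))
out <- df[idx, ]

# Replace the list column y with its flattened values.
out$y <- unlist(df$y)
rownames(out) <- NULL
```

The key step is `rep(seq_len(nrow(df)), lengths(df$y))`, which builds the row index that unnest() would otherwise derive internally.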

hongyuanjia added a commit to hongyuanjia/eplusr that referenced this issue Nov 1, 2017
@hadley hadley added the feature label Nov 15, 2017

@hadley hadley commented Nov 15, 2017

Oooh good idea!

@hadley hadley commented Nov 20, 2017

I think this will need some extra syntax, maybe something like unnest(df, y, preserve = x)
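
For readers on a newer tidyr, a hedged note: as far as I know, from tidyr 1.0.0 onward unnest() takes the columns to unnest through `cols` and leaves any other list columns in place, so a separate preserve argument should not be needed there. The sketch below assumes such a version is installed; the data and column names are made up for illustration.

```r
library(tidyr)
library(tibble)

# Two list columns whose elements differ in length.
df <- tibble(
  id = 1:3,
  x = list(1:2, 1:5, 1:3),
  y = strsplit(c("a", "d,e,f", "g,h"), ",")
)

# Unnest only y; x is expected to stay a list column, repeated per new row.
out <- unnest(df, cols = y)
```

On the tidyr version this issue was filed against, the equivalent call would instead need the extra argument discussed in this thread.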

@hadley hadley closed this in 82ef03a Nov 20, 2017
Contributor Author

@karldw karldw commented Nov 20, 2017

Thank you! I think there's still an issue when the list variables aren't specified, but .preserve is.

I think the solution is to add something like:

if (is_empty(quos)) {
  list_cols <- names(data)[map_lgl(data, is_list)]
  list_cols <- tidyselect::vars_select(list_cols, -!!! enquo(.preserve))  # deselect .preserve vars
  quos <- syms(list_cols)
}

but the vars_select() line is wrong because I don't have the unquotation right.
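
One possible way around the unquotation problem is to resolve `.preserve` to column names first and then drop those names with `setdiff()`, instead of negating inside the selection. This is a sketch, not the fix that was merged: `list_cols_to_unnest()` is a hypothetical helper, and it assumes a tidyselect version that provides `eval_select()`.

```r
library(rlang)
library(tidyselect)

# Hypothetical helper: names of the list columns that should be unnested
# when the user names none explicitly, minus anything in `.preserve`.
list_cols_to_unnest <- function(data, .preserve = NULL) {
  # Capture the unevaluated selection and resolve it to column names.
  preserve <- names(eval_select(enquo(.preserve), data))
  is_list_col <- vapply(data, is.list, logical(1))
  setdiff(names(data)[is_list_col], preserve)
}

# Made-up data with two list columns.
df <- data.frame(id = 1:2)
df$x <- list(1:2, 3)
df$y <- list("a", c("b", "c"))

to_unnest <- list_cols_to_unnest(df, .preserve = x)
all_lists <- list_cols_to_unnest(df)
```

The result could then be passed through `syms()` as in the snippet above. Resolving the selection separately avoids having to splice a negated quosure into `vars_select()`.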
