
Provide rbind solution that can add list element names as a variable in the output #22

Closed
jennybc opened this issue Aug 22, 2014 · 15 comments


@jennybc
Member

jennybc commented Aug 22, 2014

Problem: you have a list of data.frames and the element names convey information. You want to row bind them together and, in the new data.frame, you want a variable for the list element each observation originated in.


Demo: fragment subset of iris into separate data.frames, stored as list.
Note: Species info carried only via list names

my_list <- lapply(split(subset(iris, select = -Species),
                        iris$Species), "[", 1:2, )

Simple rbind-y calls cannot recover Species:

do.call("rbind", my_list) # rownames have never looked so good ...
##               Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa.1               5.1         3.5          1.4         0.2
## setosa.2               4.9         3.0          1.4         0.2
## versicolor.51          7.0         3.2          4.7         1.4
## versicolor.52          6.4         3.2          4.5         1.5
## virginica.101          6.3         3.3          6.0         2.5
## virginica.102          5.8         2.7          5.1         1.9
dplyr::rbind_all(my_list)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          7.0         3.2          4.7         1.4
## 4          6.4         3.2          4.5         1.5
## 5          6.3         3.3          6.0         2.5
## 6          5.8         2.7          5.1         1.9

Current workaround: prep with mapply() to restore Species, then rbind (thanks @kara_woo for this snippet)

my_list2 <-
  mapply(`[<-`, my_list, 'Species', value = names(my_list), SIMPLIFY = FALSE)
dplyr::rbind_all(my_list2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica
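The same preparation step can also be done in base R by pairing each data frame with its name through Map() and cbind(), then row binding. This is a minimal sketch that rebuilds the demo list (using df[1:2, ] in place of the "[" idiom). The column name Species is simply the one used in this example.

```r
# Rebuild the demo list: first two rows of each species, Species column dropped
my_list <- lapply(split(subset(iris, select = -Species), iris$Species),
                  function(df) df[1:2, ])

# Map() calls cbind(df, Species = name) for each element/name pair;
# the single name is recycled down the rows of that data frame
with_species <- Map(cbind, my_list, Species = names(my_list))

# Base rbind stacks the pieces; reset the "setosa.1"-style row names
combined <- do.call(rbind, with_species)
rownames(combined) <- NULL
combined
```

Unlike the mapply(`[<-`, ...) version, this needs no backquoted replacement function, but the result is the same shape: one new column holding the list names.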

hadley closed this as completed in b44eeb6 Aug 22, 2014
@hadley
Member

hadley commented Aug 22, 2014

No documentation yet, but unnest() should now do what you want.

I'm still undecided whether unnest() should work on both lists and data frame columns that are lists. That might be confusing, but it's basically the same behaviour, albeit with slightly different output.

@jennybc
Member Author

jennybc commented Aug 23, 2014

Thanks -- yes, unnest() does it!

As a happy user of the grammar that runs through your packages, the words this task evokes for me are rbind and/or gather. Maybe unnest will seem more natural when I've worked with it more.

Morally, this operation seems like gathering variables into key-value pairs, just with very different mechanics. Instead of different levels of a factor represented as separate variables in a data.frame, we've got them represented as separate data.frames.

@hadley
Member

hadley commented Aug 23, 2014

I think it'll be more obvious why it's called unnest() once I sketch out the other pieces, and when you see what nest() does - they're fundamentally about lists of data frames and vectors, whereas spread() and gather() deal with columns.
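The nest()/unnest() relationship can be seen in a short round trip. This is a minimal sketch assuming tidyr 1.0 or later; the 2014 versions of these functions took different arguments. nest() packs the rows for each species into a list-column of data frames, and unnest() expands them back into ordinary rows.

```r
library(tidyr)

# nest(): one row per species; `data` is a list-column where each
# element is a data frame holding the other four columns for that species
nested <- nest(iris, data = -Species)

# unnest(): expand the list-column back into one row per observation
unnested <- unnest(nested, cols = data)

nrow(nested)    # 3
nrow(unnested)  # 150
```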

@jennybc
Member Author

jennybc commented Aug 23, 2014

I have every faith that it will be the most natural thing when you're done. :)

@dwinsemius

This is what I would imagine to be the outcome of do.call(rbind.fill, df_list), but that's not what the code actually uses. It's an S3 method that seems to do an rbind operation for the unmatched columns first, followed by (v)applying append column-wise, with columns matched by name. I wasn't sure how append_df would succeed, since it looked just like base::append with some attribute management (presumably to handle Dates, datetimes, and factors), but there is no append.data.frame. I was expecting something like lapply(list, append_df), but it appears to succeed nonetheless, so it's probably just my confusion.

@hadley
Member

hadley commented May 18, 2015

FWIW, I've removed this experimental method because I'm now pretty sure it's a bad idea (and dplyr::bind_rows() should do the equivalent in the next version)

@hadley
Member

hadley commented May 18, 2015

Hmmm, but maybe unnest() needs to handle the case where a column of the data frame is a list of data frames:

data_frame(
  x = c(1, 2),
  y = c(3, 4),
  z = list(data_frame(a = 1), data_frame(a = 1:3))
)

(and those data frames could contain list-columns themselves, but I think you'd need a second unnest to handle that)

I'm not certain whether or not this is useful (it might crop up as an alternative way of handling relational data), but it is interesting.
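With current packages this case is handled directly. This is a minimal sketch assuming tidyr 1.0 or later, with tibble::tibble() standing in for data_frame(), which has since been deprecated in its favour. unnest() repeats each row's x and y once per row of its nested data frame.

```r
library(tibble)
library(tidyr)

df <- tibble(
  x = c(1, 2),
  y = c(3, 4),
  z = list(tibble(a = 1), tibble(a = 1:3))
)

# unnest() replaces the list-column z with its column a,
# repeating x and y for every row of the nested data frame
flat <- unnest(df, cols = z)
flat
```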

@hadley
Member

hadley commented May 18, 2015

Oh that's #58

@dlebauer

dlebauer commented Jan 6, 2017

So... in case anyone else finds this while searching for a way to add a new column with bind_rows() (or rbind_list(), or do.call(rbind, ...)) without inelegant convulsions, I'll finish out the example to demonstrate.

Regarding

dplyr::bind_rows() should do the equivalent in the next version

what I gather is that (using dplyr 0.5.0) the argument .id can be used to specify the new column name, so, following from the example in #22 (comment):

bind_rows(my_list, .id = 'species')

returns

     species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa          5.1         3.5          1.4         0.2
2     setosa          4.9         3.0          1.4         0.2
3 versicolor          7.0         3.2          4.7         1.4
4 versicolor          6.4         3.2          4.5         1.5
5  virginica          6.3         3.3          6.0         2.5
6  virginica          5.8         2.7          5.1         1.9
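The .id labels come from the list names. When the list has no names, the bind_rows() documentation says a sequence is used instead, so each row records the position of the data frame it came from. This is a minimal sketch with made-up data, assuming dplyr 1.0 or later.

```r
library(dplyr)

pieces <- list(
  data.frame(value = 1:2),
  data.frame(value = 3L)
)

# The list is unnamed, so the "source" column holds element positions
stacked <- bind_rows(pieces, .id = "source")
stacked
```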

P.S. thanks again for all of the great work you and your team are doing!

@d8aninja

Is there a solution for cases where bind_rows() will not accomplish this task because the constituent lists are of different lengths? In those cases I usually fall back on do.call(rbind, df), but there is obviously no .id argument to go along with that solution...
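If "different lengths" means the data frames have different sets of columns (an assumption, since the question has no example), bind_rows() may already cover it. It matches columns by name, fills any column a piece lacks with NA, and .id works as before. This is a minimal sketch with made-up data, assuming dplyr 1.0 or later.

```r
library(dplyr)

parts <- list(
  a = data.frame(x = 1:2, y = c("p", "q")),
  b = data.frame(x = 3L)   # no y column in this piece
)

# Columns are matched by name; y is NA for the row that came from b
out <- bind_rows(parts, .id = "source")
out

# Base do.call(rbind, parts) stops with an error instead,
# because the two data frames have different numbers of columns
```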

@jennybc
Member Author

jennybc commented Sep 28, 2017

@d8aninja Do you want to make a little example that shows exactly what you mean (I'm not entirely sure) and ask it over in the tidyverse section of community.rstudio.com? This question is a good fit.

@d8aninja

d8aninja commented Oct 3, 2017

@jennybc will do, thanks!

@dwinsemius

dwinsemius commented Oct 12, 2017 via email

@agilebean

Thanks -- yes, unnest() does it!

@jennybc can you show in code how this works?

I tried this, which throws an error:

> my_list %>% unnest
Error in UseMethod("unnest_") : 
  no applicable method for 'unnest_' applied to an object of class "list"
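As the error message indicates, unnest() has no method for a bare list; it works on a data frame that contains a list-column. This is a minimal sketch of one way to bridge that gap, assuming tibble and tidyr 1.0 or later. tibble::enframe() turns the named list into a two-column tibble, and unnest() then expands the list-column. The column name species is just an illustrative choice.

```r
library(tibble)
library(tidyr)

my_list <- lapply(split(subset(iris, select = -Species), iris$Species),
                  function(df) df[1:2, ])

# enframe(): one row per list element; the names go into `species`,
# the data frames go into the list-column `data`
framed <- enframe(my_list, name = "species", value = "data")

# unnest(): expand each data frame into rows, repeating its species label
result <- unnest(framed, cols = data)
result
```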

@poidstotal

poidstotal commented Mar 19, 2020

The built-in (base R) approach that works for me is to loop through the list and add the item name as a constant column.

for (i in seq_along(mylist)) {
  # store the element's name; wrap it in as.numeric() only if the names are numbers
  mylist[[i]]$newvar <- names(mylist)[i]
}

After that you can apply the usual do.call as follows:

mylist <- do.call(rbind, mylist)
