Provide rbind solution that can add list element names as a variable in the output #22

jennybc · 2014-08-22T17:53:57Z

Problem: you have a list of data.frames and the element names convey information. You want to row bind them together and, in the new data.frame, you want a variable for the list element each observation originated in.

Demo: fragment subset of iris into separate data.frames, stored as list.
Note: Species info carried only via list names

my_list <- lapply(split(subset(iris, select = -Species),
                        iris$Species), "[", 1:2, )

Simple rbind-y calls cannot recover Species:

do.call("rbind", my_list) # rownames have never looked so good ...

##               Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa.1               5.1         3.5          1.4         0.2
## setosa.2               4.9         3.0          1.4         0.2
## versicolor.51          7.0         3.2          4.7         1.4
## versicolor.52          6.4         3.2          4.5         1.5
## virginica.101          6.3         3.3          6.0         2.5
## virginica.102          5.8         2.7          5.1         1.9

dplyr::rbind_all(my_list)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          7.0         3.2          4.7         1.4
## 4          6.4         3.2          4.5         1.5
## 5          6.3         3.3          6.0         2.5
## 6          5.8         2.7          5.1         1.9

Current workaround: prep with mapply() to restore Species, then rbind (thanks @kara_woo for this snippet)

my_list2 <-
  mapply(`[<-`, my_list, 'Species', value = names(my_list), SIMPLIFY = FALSE)
dplyr::rbind_all(my_list2)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica


hadley · 2014-08-22T21:27:52Z

No documentation yet, but unnest() should do now what you want.

I'm still undecided about whether unnest() should work on both plain lists and list-columns of data frames. That might be confusing, but it's basically the same behaviour, albeit with slightly different output.

jennybc · 2014-08-23T16:23:27Z

Thanks -- yes, unnest() does it!

As a happy user of the grammar that runs through your packages, the words this task evokes for me are rbind and/or gather. Maybe unnest will seem more natural when I've worked with it more.

Morally, this operation seems like gathering variables into key-value pairs, with really different mechanics. Instead of different levels of a factor represented as separate variables in a data.frame, we've got them represented as separate data.frames.

hadley · 2014-08-23T16:43:50Z

I think it'll be more obvious why it's called unnest() once I sketch out the other pieces, and when you see what nest() does - they're fundamentally about lists of data frames and vectors, where spread() and gather() deal with columns.

jennybc · 2014-08-23T17:10:27Z

I have every faith that it will be the most natural thing when you're done. :)

dwinsemius · 2014-09-20T18:17:52Z

This is what I would imagine to be the outcome of do.call( rbind.fill, df_list) but that's not what the code actually uses. It's an S3 method that seems to be first an rbind operation for the unmatched columns followed by (v)applying append column-wise matched on the basis of names. I wasn't really sure how the append_df would succeed, since it looked just like base::append with some attribute management (to handle Dates, datetimes, and factors presumably) but there is no append.data.frame. I was expecting some lapply(list, append_df), but it appears to be succeeding nonetheless, so probably it's just my confusion.

hadley · 2015-05-18T20:54:19Z

FWIW, I've removed this experimental method because I'm now pretty sure it's a bad idea (and dplyr::bind_rows() should do the equivalent in the next version)

hadley · 2015-05-18T21:33:03Z

Hmmm, but maybe unnest() needs to handle the case where a column of the data frame is a list of data frames:

data_frame(
  x = c(1, 2),
  y = c(3, 4),
  z = list(data_frame(a = 1), data_frame(a = 1:3))
)

(and those data frames could contain list-columns themselves, but I think you'd need a second unnest to handle that)

I'm not certain whether or not this is useful (it might crop up as an alternative way of handling relational data), but it is interesting.
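A minimal sketch of that case, assuming a recent tidyr (1.0.0 or later) where unnest() takes a `cols` argument, and using tibble() in place of the now-deprecated data_frame(). The names `df` and `flat` are illustrative only:

```r
library(tibble)
library(tidyr)

# Same shape as the example above: z is a list-column of data frames.
df <- tibble(
  x = c(1, 2),
  y = c(3, 4),
  z = list(tibble(a = 1), tibble(a = 1:3))
)

# Each row of df is repeated once per row of its nested data frame.
flat <- unnest(df, cols = z)
print(flat)
```

The outer columns x and y are recycled to match the number of rows in each nested data frame.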

hadley · 2015-05-18T21:57:55Z

Oh that's #58

dlebauer · 2017-01-06T19:30:00Z

So ... in case anyone else finds this in their search for adding a new column with bind_rows (or rbind_list, or do.call(rbind, ...)) without inelegant convulsions, I'll finish out the example to demonstrate:

By "dplyr::bind_rows() should do the equivalent in the next version", what I gather is that (using dplyr 0.5.0) the .id argument can be used to specify the new column name, so, following from the example in #22 (comment):

bind_rows(my_list, .id = 'species')

returns

     species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa          5.1         3.5          1.4         0.2
2     setosa          4.9         3.0          1.4         0.2
3 versicolor          7.0         3.2          4.7         1.4
4 versicolor          6.4         3.2          4.5         1.5
5  virginica          6.3         3.3          6.0         2.5
6  virginica          5.8         2.7          5.1         1.9

P.S. thanks again for all of the great work you and your team are doing!

d8aninja · 2017-09-28T17:14:21Z

Is there a solution for the cases when bind_rows will not accomplish this task, i.e. when the constituent lists are of different lengths? In these cases I usually appeal to do.call(rbind, df), but there is obviously no .id argument to go along with that solution...

jennybc · 2017-09-28T19:16:41Z

@d8aninja Do you want to make a little example that shows exactly what you mean (I'm not entirely sure) and ask it over in the tidyverse section of community.rstudio.com? This question is a good fit.

d8aninja · 2017-10-03T13:54:09Z

@jennybc will do, thanks!

dwinsemius · 2017-10-12T17:08:08Z

On Sep 28, 2017, at 10:14 AM, Jeff wrote:

Is there a solution for the cases when bind_rows will not accomplish this task, i.e. when the constituent lists are of different lengths? In these cases I usually appeal to do.call(rbind, df), but there is obviously no .id argument to go along with that solution...

The dplyr package has a `bind_rows` function. Missing items are filled with NA. Also see: rbind.fill in plyr-pkg.
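A small sketch of that behaviour, assuming dplyr is installed. The list `df_list` and its element names are made up for illustration; they are not from the thread:

```r
library(dplyr)

# Hypothetical list: the two data frames share only column `a`.
df_list <- list(
  first  = data.frame(a = 1:2, b = c("x", "y")),
  second = data.frame(a = 3L, c = TRUE)
)

# .id stores the list element names in a new column; columns that are
# missing from an element are filled with NA.
combined <- bind_rows(df_list, .id = "source")
print(combined)
```

So the .id argument and the NA filling can be combined in a single call.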

…

-- David Winsemius Alameda, CA, USA 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law

agilebean · 2019-06-01T10:21:55Z

Thanks -- yes, unnest() does it!

@jennybc can you show in code how this works?

I tried this, which throws an error:

> my_list %>% unnest
Error in UseMethod("unnest_") : 
  no applicable method for 'unnest_' applied to an object of class "list"
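One possible route with current packages, sketched under the assumption that tibble and tidyr (1.0.0 or later) are installed: turn the named list into a two-column tibble first, then unnest the list-column. This is not necessarily what the original experimental method did; `nested` and `combined` are illustrative names.

```r
library(tibble)
library(tidyr)

# Rebuild the example list: two rows per species, Species only in the names.
my_list <- lapply(split(subset(iris, select = -Species), iris$Species),
                  head, n = 2)

# enframe() puts the list names in one column and the data frames in a
# list-column; unnest() then expands that list-column into rows.
nested <- enframe(my_list, name = "Species", value = "data")
combined <- unnest(nested, cols = data)
print(combined)
```

The same result is available in one step with dplyr::bind_rows(my_list, .id = "Species").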

poidstotal · 2020-03-19T22:59:06Z

The built-in method that worked for me was to loop through the list and add the item name as a constant column.

for (i in seq_along(mylist)) {
  # as.numeric() assumes the list names are numeric strings; drop it for text labels
  mylist[[i]]$newvar <- as.numeric(names(mylist)[i])
}

After that you can apply the usual do.call as follows:
mylist <- do.call(rbind, mylist)
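A base R variation on the loop above that keeps the names as text, sketched with the iris example from the top of the thread. It uses Map() instead of a for loop; the names `labelled` and `combined` are illustrative:

```r
# Rebuild the example list: two rows per species, Species only in the names.
my_list <- lapply(split(subset(iris, select = -Species), iris$Species),
                  head, n = 2)

# Map() pairs each data frame with its list name; the name is recycled
# down the rows as a character column.
labelled <- Map(function(df, nm) {
  df$Species <- nm
  df
}, my_list, names(my_list))

combined <- do.call(rbind, labelled)
rownames(combined) <- NULL  # drop the "setosa.1"-style row names
print(combined)
```

This needs no packages, at the cost of rbind's stricter requirement that all data frames have the same columns.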

hadley closed this as completed in b44eeb6 Aug 22, 2014
