Provide rbind solution that can add list element names as a variable in the output #22
Problem: you have a list of data.frames and the element names convey information. You want to row bind them together and, in the new data.frame, you want a variable for the list element each observation originated in.
Demo: fragment subset of iris into separate data.frames, stored as list.
my_list <- lapply(split(subset(iris, select = -Species), iris$Species), "[", 1:2, )
Simple rbind-y calls cannot recover Species:
do.call("rbind", my_list) # rownames have never looked so good ...
Current workaround: prep with mapply() to restore Species, then rbind (thanks @kara_woo for this snippet)
my_list2 <- mapply(`[<-`, my_list, 'Species', value = names(my_list), SIMPLIFY = FALSE)
dplyr::rbind_all(my_list2)
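For readers who land here later: rbind_all() has since been deprecated in favour of bind_rows(), so the snippet above may not run on a current dplyr. Here is a minimal base-R sketch of the same workaround, with no dplyr dependency. The helper argument names d and nm and the result name combined are just illustrative, and the list is rebuilt with an explicit anonymous function rather than the "[" shorthand from the demo.

```r
# Rebuild the demo list: first two rows per species, Species column dropped
my_list <- lapply(
  split(subset(iris, select = -Species), iris$Species),
  function(d) d[1:2, , drop = FALSE]
)

# Restore Species from the list element names, one data.frame at a time
my_list2 <- Map(function(d, nm) { d$Species <- nm; d }, my_list, names(my_list))

# Row bind with base R, then drop the composite row names rbind creates
combined <- do.call(rbind, my_list2)
rownames(combined) <- NULL
```

This keeps the two-step shape of the workaround (tag each piece, then bind), which is the prep step the issue is asking to avoid.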
No documentation yet, but
I'm still struggling if
Thank -- yes
As a happy user of the grammar that runs through your packages, the words this task evokes for me are
Morally, this operation seems like
This is what I would imagine to be the outcome of
Hmmm, but maybe
data_frame(
  x = c(1, 2),
  y = c(3, 4),
  z = list(data_frame(a = 1), data_frame(a = 1:3))
)
(and those data frames could contain lists-columns themselves, but I think you'd need a second unnest to handle that)
I'm not certain whether or not this is useful (it might crop up as an alternative way of handling relational data), but it is interesting.
So ... in case anyone else finds this in their search for adding a new column with
What I gather is that (using dplyr 0.5.0) the argument
bind_rows(my_list, .id = 'species')
P.S. thanks again for all of the great work you and your team are doing!
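To make the bind_rows() call above concrete, here is a small sketch applied to the demo list from the top of the thread. It assumes dplyr 0.5.0 or later is installed; the result name combined is only for illustration.

```r
library(dplyr)

# Rebuild the demo list: first two rows per species, Species column dropped
my_list <- lapply(
  split(subset(iris, select = -Species), iris$Species),
  function(d) d[1:2, , drop = FALSE]
)

# .id names the new column that records which list element each row came from
combined <- bind_rows(my_list, .id = "species")
```

The new species column comes out as character rather than factor, so convert it afterwards if the factor levels matter.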
On Sep 28, 2017, at 10:14 AM, Jeff ***@***.***> wrote:
Is there a solution for the cases when rbind_rows will not accomplish this task if the constituent lists are of different length? In these cases I usually appeal to do.call(rbind, df) but there is obviously no .id argument to go along with that solution...
The dplyr package has a `bind_rows` function. Missing items are filled with NA. Also see: rbind.fill in plyr-pkg.…
-- David Winsemius Alameda, CA, USA 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
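A quick sketch of the NA-filling behaviour mentioned in the reply above, assuming "different length" means the pieces have different sets of columns. The list pieces, its element names a and b, and the column names are all made up for illustration.

```r
library(dplyr)

# Hypothetical inputs whose columns only partly overlap
pieces <- list(
  a = data.frame(x = 1:2, y = c("u", "v"), stringsAsFactors = FALSE),
  b = data.frame(x = 3L)
)

# Columns are matched by name; y is filled with NA for the row from `b`,
# and .id still records the originating list element
out <- bind_rows(pieces, .id = "source")
```

So, under that reading of the question, the .id argument stays available even when do.call(rbind, ...) would fail on mismatched columns.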