bind_rows(): accept lists? #1104

Closed
jennybc opened this issue Apr 25, 2015 · 6 comments

@jennybc
Member

jennybc commented Apr 25, 2015

This tweet reminded me of this wish.

Sometimes I am disappointed that bind_rows() insists that its input already be data.frames. In that sense, it's not a drop-in replacement for data.table::rbindlist(). Any chance bind_rows() might accept a list of lists?

> my_list <- list(list(x = 1, y = 'a'), list(x = 2, y = 'b'))
> data.table::rbindlist(my_list) %>% str
Classes ‘data.table’ and 'data.frame':  2 obs. of  2 variables:
 $ x: num  1 2
 $ y: chr  "a" "b"
 - attr(*, ".internal.selfref")=<externalptr> 

> dplyr::bind_rows(my_list)
Error: object at index 1 not a data.frame
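For comparison, here is a minimal base R sketch of what the requested behaviour amounts to for this input. It assumes every inner list holds length-one, named elements, and it is only a workaround, not how bind_rows() or rbindlist() work internally.

```r
# Same input as in the example above.
my_list <- list(list(x = 1, y = "a"), list(x = 2, y = "b"))

# Turn each inner list into a one-row data frame, then stack the rows.
# stringsAsFactors = FALSE keeps y as character on R versions before 4.0.
rows <- lapply(my_list, as.data.frame, stringsAsFactors = FALSE)
result <- do.call(rbind, rows)

str(result)
```

This builds one small data frame per element, so it is likely to be much slower than a dedicated implementation on long lists.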
@romainfrancois
Member

I can probably relax that a bit, starting from here:

List rbind_all( StrictListOf<DataFrame, NULL_or_Is<DataFrame> > dots ){
    return rbind__impl(dots) ;
}

Maybe instead of a StrictListOf<DataFrame, NULL_or_Is<DataFrame> > I could use something like a StrictListOf<Bindable>, where Bindable is allowed to be NULL, a data.frame, or a list subject to some constraints, e.g. that all its components have equal lengths.

@hadley
Member

hadley commented Apr 25, 2015

I think it would be reasonable to accept:

  • NULL (ignore)
  • data frame (as currently)
  • a named list where each element is the same length

Maybe call it dataframeable or something like that? dataframeish?
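The three accepted cases could be sketched as a predicate in base R. The name is_dataframeish is hypothetical (borrowed from the naming suggestion here) and is not an existing dplyr function; treating an empty list as not acceptable is an additional assumption.

```r
# Hypothetical predicate, not part of dplyr: TRUE for inputs that the
# proposal would accept as one "table-like" element.
is_dataframeish <- function(x) {
  # NULL (to be ignored) and data frames are accepted outright.
  if (is.null(x) || is.data.frame(x)) {
    return(TRUE)
  }
  # Anything else must be a non-empty list (assumption: empty lists fail).
  if (!is.list(x) || length(x) == 0) {
    return(FALSE)
  }
  # Every element needs a non-empty name.
  nms <- names(x)
  fully_named <- !is.null(nms) && all(!is.na(nms) & nms != "")
  # Every element must have the same length.
  same_length <- length(unique(vapply(x, length, integer(1)))) == 1
  fully_named && same_length
}

is_dataframeish(list(x = 1:2, y = c("a", "b")))
is_dataframeish(list(x = 1:2, y = "a"))
```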

(If we accept lists and NULL here, we should consider doing it for other verbs too, although maybe it's not so important because bind_rows() is usually done once, early in the data import process.)

@lionel-
Member

lionel- commented Apr 25, 2015

But then how would bind_rows() tell the difference between a list to be treated as a data frame and a list of data frames, if that list happens to be named? Wouldn't the data frames be taken as components of a list-column inside one dataframeable list, rather than as a list of data frames to bind together?
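The ambiguity can be reproduced in base R. The helper looks_like_columns below is a hypothetical restatement of the "named list with equal-length elements" rule, and the tie-break at the end is only one possible assumption, not something decided in this thread.

```r
# Two data frames in a named list, as one might pass to bind_rows().
dfs <- list(
  a = data.frame(x = 1:2, y = 3:4),
  b = data.frame(x = 5:6, y = 7:8)
)

# Hypothetical helper (not dplyr API): a plain list that is fully named
# and whose elements all have the same length.
looks_like_columns <- function(x) {
  nms <- names(x)
  is.list(x) && !is.data.frame(x) && length(x) > 0 &&
    !is.null(nms) && all(nms != "") &&
    length(unique(vapply(x, length, integer(1)))) == 1
}

# length() of a data frame is its number of columns, so both elements
# have length 2 and the naive rule also matches a list of data frames.
ambiguous <- looks_like_columns(dfs)

# One possible tie-break (an assumption): if every element is itself a
# data frame or NULL, treat the input as a list of tables to bind.
is_list_of_tables <- all(vapply(
  dfs,
  function(el) is.null(el) || is.data.frame(el),
  logical(1)
))

ambiguous
is_list_of_tables
```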

Also linked to #992 if it gets merged.

@Mullefa

Mullefa commented Apr 30, 2015

+1 for accepting NULLs.

As an example, it would be convenient if both these cases returned the iris data set (similar functionality to rbind() in this respect):

bind_rows(iris, NULL)
bind_rows(list(iris, NULL))
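Until NULLs are accepted, one workaround is to drop them before binding. This is a base R sketch of the idea using rbind(), not a description of how bind_rows() would implement it.

```r
# A list containing a data frame and a NULL, as in the second case above.
inputs <- list(iris, NULL)

# Drop the NULL elements, then bind whatever is left.
kept <- Filter(Negate(is.null), inputs)
bound <- do.call(rbind, kept)

dim(bound)
```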

romainfrancois self-assigned this Apr 30, 2015

@jennybc
Member Author

jennybc commented Apr 30, 2015

Related wish for the new and improved bind_rows(): an ID variable. If you are row binding a list of data.frames or conformable lists, there's a high chance you want the names from the original list to come in as a variable in the result. This is one of the very best things about, e.g., plyr::ldply(), which I still resort to often in dplyr-ish projects.
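A rough base R sketch of the ID-variable idea follows. The function bind_rows_with_id and its id argument are hypothetical names chosen for illustration; the sketch assumes a fully named list whose data frames share the same columns.

```r
# A named list of data frames with matching columns.
dfs <- list(
  first  = data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE),
  second = data.frame(x = 3L, y = "c", stringsAsFactors = FALSE)
)

# Hypothetical helper: prepend each list name as a column, then bind.
bind_rows_with_id <- function(dfs, id = "source") {
  # Ignore NULL elements, as proposed earlier in the thread.
  dfs <- Filter(Negate(is.null), dfs)
  tagged <- Map(function(df, nm) {
    # One-column data frame repeating the list name for every row.
    id_col <- data.frame(rep(nm, nrow(df)), stringsAsFactors = FALSE)
    cbind(setNames(id_col, id), df)
  }, dfs, names(dfs))
  # unname() so list names are not matched against rbind()'s arguments.
  out <- do.call(rbind, unname(tagged))
  rownames(out) <- NULL
  out
}

combined <- bind_rows_with_id(dfs)
combined
```

The result carries the original list names in the source column, which is the part of plyr::ldply() being asked for here.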

@lionel-
Member

lionel- commented Apr 30, 2015

I wrote a PR for this, #825, which will need to be adapted to what I did in #992 (if it is still relevant).
