dplyr bind_rows coerces inputs where data.table rbindlist does not #1829

Closed
tchakravarty opened this issue May 15, 2016 · 10 comments
Labels
feature a feature request or enhancement

@tchakravarty

tchakravarty commented May 15, 2016

@hadley I am calling bind_rows on a list of data.frames that have identical sets of columns with identical attributes (at least as far as class can tell). The columns that get coerced to numeric have the class attribute c("chron", "dates", "times"). data.table::rbindlist handles these types correctly. Is there a reason for the difference? Is dplyr doing the right thing and data.table not? The data.table result certainly seems the more desirable outcome.

Here is a reproducible example (apologies for the formatting of the dput output):

li_dplyr = structure(
  list(
    df1 = structure(
      list(
        start_datetime = structure(
          c(16802.0833333333, 16802.0833333333, 16802.092025463, 16802.0922337963, 
            16802.1046643519, 16802.1048726852, 16802.1153819444, 16802.1155902778, 
            16802.1261574074), 
          format = structure(
            c("m/d/y", "h:m:s"), 
            .Names = c("dates", "times")
          ), 
          origin = structure(
            c(1, 1, 1970), 
            .Names = c(
              "month", "day", "year"
            )
          ), 
          class = c("chron", "dates", "times")
        )
      ), 
      class = "data.frame", 
      row.names = c(NA,  -9L), 
      .Names = c("start_datetime")
    ), 
    df2 = structure(
      list(
        start_datetime = structure(
          c(16809.0833333333, 16809.0833333333, 
            16809.084849537, 16809.0851041667, 16809.0921180556, 16809.0922569444, 
            16809.1040046296, 16809.1041435185, 16809.115787037), 
          format = structure(
            c("m/d/y", "h:m:s"), 
            .Names = c("dates", "times")
          ), 
          origin = structure(
            c(1, 1, 1970), 
            .Names = c(
              "month", "day", "year"
            )
          ), 
          class = c("chron", "dates", "times")
        )
      ), 
      class = "data.frame", 
      row.names = c(NA,  -9L), 
      .Names = c("start_datetime")
    )
  ), 
  .Names = c("df1", "df2")
)

# original column classes
lapply(li_dplyr, function(x) lapply(x, class))

# dplyr
lapply(dplyr::bind_rows(li_dplyr), class)

# data.table
lapply(data.table::rbindlist(li_dplyr), class)
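The claim that both inputs carry identical attributes can be checked directly rather than by eye. Below is a minimal base R sketch; `same_attributes()` and `make_df()` are hypothetical helpers written for this illustration (they are not part of dplyr, data.table, or chron), and the shortened values stand in for the ones in the reprex above.

```r
# Hypothetical helper (not part of dplyr or data.table): TRUE when column `col`
# carries identical attributes in every data frame of the list `dfs`.
same_attributes <- function(dfs, col) {
  attrs <- lapply(dfs, function(df) attributes(df[[col]]))
  all(vapply(attrs[-1], identical, logical(1), attrs[[1]]))
}

# Build a one-column data frame the same way the dput() output does, so that
# the chron package does not need to be installed.
make_df <- function(values, cls = c("chron", "dates", "times")) {
  col <- structure(
    values,
    format = c(dates = "m/d/y", times = "h:m:s"),
    origin = c(month = 1, day = 1, year = 1970),
    class = cls
  )
  structure(
    list(start_datetime = col),
    class = "data.frame",
    row.names = c(NA, -length(values))
  )
}

dfs <- list(
  df1 = make_df(c(16802.08, 16802.09)),
  df2 = make_df(c(16809.08, 16809.09))
)
same_attributes(dfs, "start_datetime")
```

If this returns TRUE for the real `li_dplyr`, any class change after binding comes from the binding function and not from a mismatch between the inputs.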
@tchakravarty tchakravarty changed the title dplyr bind_rows coerces inputs where data.table does not dplyr bind_rows coerces inputs where data.table rbindlist does not May 15, 2016
@hadley hadley added the reprex needs a minimal reproducible example label May 26, 2016
@hadley
Member

hadley commented May 26, 2016

Please provide a minimal reproducible example.

@tchakravarty
Author

@hadley Does the example included work for you?

@hadley
Member

hadley commented May 26, 2016

It doesn't even fit all on one screen - please make it MINIMAL.

@tchakravarty
Author

@hadley Haha. Sure thing. Now painstakingly hand-formatted for your perusal.

@hadley
Member

hadley commented May 26, 2016

It's still not exactly minimal - the shorter it is the more likely that I can quickly locate the bug...

@tchakravarty
Author

@hadley Updated to what I would say is minimal.

@hadley
Member

hadley commented May 26, 2016

Here is a minimal reprex:

library(dplyr)
l <- list(data_frame(x = chron::times(1)), data_frame(x = chron::times(2)))
bind_rows(l)
#> Source: local data frame [2 x 1]
#> 
#>       x
#>   <dbl>
#> 1     1
#> 2     2
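The `<dbl>` column in that output is what a classed double looks like once its attributes are gone. As a rough base R analogy (an illustration only, not a description of how bind_rows is implemented), combining two classed vectors with `c()` when no method exists for the class also drops the class and leaves a plain numeric vector. The `"times"` objects here are built by hand as stand-ins, so chron does not need to be installed.

```r
# Stand-ins for chron::times(1) and chron::times(2), built without chron.
a <- structure(1, class = "times")
b <- structure(2, class = "times")

class(a)                 # the input still carries its class

# With no c() method registered for "times", the default method is used and
# every attribute except names is dropped.
combined <- c(a, b)
class(combined)
attributes(combined)
```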

@hadley hadley added feature a feature request or enhancement data frame and removed reprex needs a minimal reproducible example labels May 26, 2016
@hadley hadley added this to the future milestone May 26, 2016
@tchakravarty
Author

@hadley Thanks for taking this on as an enhancement request. Just wondering what the issue is?

@gavinsimpson

The problem seems more pervasive than just chron objects. For example, a "yearmon" column is coerced to "numeric" by bind_rows():

l <- list(data_frame(x = zoo::as.yearmon(2016 + (c(0:3) / 12))),
          data_frame(x = zoo::as.yearmon(2016 + (c(0:3) / 12))))
bind_rows(l)
# A tibble: 8 × 1
         x
     <dbl>
1 2016.000
2 2016.083
3 2016.167
4 2016.250
5 2016.000
6 2016.083
7 2016.167
8 2016.250

The problem occurs whether the data frames are supplied as a list or as individual data frames.

It's quite an impediment to have to check and re-coerce variable types after binding.
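Until the binding function keeps the class itself, the re-coercion step can at least be made mechanical. The sketch below is one possible workaround in base R, not something dplyr provides: `restore_attributes()` is a hypothetical helper that copies the class and other non-structural attributes from a template data frame (for example the first input) onto the same-named columns of the bound result. It assumes every input shares the same attributes for a given column, so it is not safe for factors whose levels differ between inputs. The bound result is simulated with a plain data frame so the example runs without dplyr or zoo.

```r
# Hypothetical helper: copy attributes (except names/dim/dimnames) from the
# columns of `template` onto same-named columns of `bound`.
restore_attributes <- function(bound, template) {
  skip <- c("names", "dim", "dimnames")
  for (nm in intersect(names(bound), names(template))) {
    attrs <- attributes(template[[nm]])
    attrs <- attrs[setdiff(names(attrs), skip)]
    for (a in names(attrs)) {
      attr(bound[[nm]], a) <- attrs[[a]]
    }
  }
  bound
}

# Template: one of the inputs, with a hand-built "yearmon" column.
template <- data.frame(id = 1:2)
template$x <- structure(c(2016, 2016.25), class = "yearmon")

# Simulated result of a bind that dropped the class.
bound <- data.frame(id = 1:4, x = c(2016, 2016.25, 2016.5, 2016.75))

fixed <- restore_attributes(bound, template)
class(fixed$x)
```

Columns without attributes in the template (such as `id` here) are left untouched.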

@hadley
Member

hadley commented Feb 16, 2017

Now part of #2432

@hadley hadley closed this as completed Feb 16, 2017
@lock lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018