Issue with bind_rows and types #5358

Fablepongiste · 2020-06-25T14:10:30Z

Should this really fail?

bind_rows() is extremely strict about types, probably too strict?

df1 <- structure(list(TEAM_ID = c("1", "2", "3", "4", "5", "6"), TEAM_ABBREVIATION = c("DEN", 
"DAL", "NYK", "ATL", "CHA", "MIA"), TEAM_NAME = c("Denver Nuggets", 
"Dallas Mavericks", "New York Knicks", "Atlanta Hawks", "Charlotte Hornets", 
"Miami Heat"), GAME_ID = c("1", "2", "3", "4", "5", "6"), GAME_DATE = c("2020-03-11", 
"2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11"
)), row.names = c(NA, 6L), class = "data.frame")

df2 <- structure(list(TEAM_ID = logical(0), TEAM_ABBREVIATION = logical(0), 
    TEAM_NAME = logical(0), GAME_ID = logical(0), GAME_DATE = logical(0)), class = "data.frame", row.names = integer(0))

df <- bind_rows(df1, df2)

Error: Can't combine `..1$TEAM_ID` <character> and `..2$TEAM_ID` <logical>.

When a source returns no data, column types can sometimes default to logical. Shouldn't bind_rows() take either the type from the first data frame or, better, the type from the data frame that actually has data?

This has only happened since the switch to vctrs in dplyr 1.0.

It might be on purpose, in which case that's fine, I guess.

hadley · 2020-06-25T16:56:47Z

Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run it in a local session.

Fablepongiste · 2020-06-25T22:32:12Z

library(dplyr)

df1 <- structure(list(TEAM_ID = c("1", "2", "3", "4", "5", "6"),
                      TEAM_ABBREVIATION = c("DEN", 
                                            "DAL", "NYK", "ATL", "CHA", "MIA"), 
                      TEAM_NAME = c("Denver Nuggets", 
                                    "Dallas Mavericks", "New York Knicks", "Atlanta Hawks", "Charlotte Hornets", 
                                    "Miami Heat"), 
                      GAME_ID = c("1", "2", "3", "4", "5", "6"), 
                      GAME_DATE = c("2020-03-11", 
                                    "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11"
                      )), 
                 row.names = c(NA, 6L), class = "data.frame")

df2 <- structure(list(TEAM_ID = logical(0), TEAM_ABBREVIATION = logical(0), 
                      TEAM_NAME = logical(0), GAME_ID = logical(0), GAME_DATE = logical(0)), 
                 class = "data.frame", row.names = integer(0))

df <- bind_rows(df1, df2)
#> Error in bind_rows(df1, df2): could not find function "bind_rows"

Created on 2020-06-25 by the reprex package (v0.3.0)

lionel- · 2020-06-26T05:45:23Z

@Fablepongiste Thank you. Can you please make sure the error message in the reprex corresponds to the one you're reporting? It looks like you're missing a library(dplyr).

romainfrancois · 2020-06-29T13:02:36Z

@Fablepongiste:

library(dplyr, warn.conflicts = FALSE)

df1 <- structure(list(TEAM_ID = c("1", "2", "3", "4", "5", "6"),
                      TEAM_ABBREVIATION = c("DEN", 
                                            "DAL", "NYK", "ATL", "CHA", "MIA"), 
                      TEAM_NAME = c("Denver Nuggets", 
                                    "Dallas Mavericks", "New York Knicks", "Atlanta Hawks", "Charlotte Hornets", 
                                    "Miami Heat"), 
                      GAME_ID = c("1", "2", "3", "4", "5", "6"), 
                      GAME_DATE = c("2020-03-11", 
                                    "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11"
                      )), 
                 row.names = c(NA, 6L), class = "data.frame")

df2 <- structure(list(TEAM_ID = logical(0), TEAM_ABBREVIATION = logical(0), 
                      TEAM_NAME = logical(0), GAME_ID = logical(0), GAME_DATE = logical(0)), 
                 class = "data.frame", row.names = integer(0))

df <- bind_rows(df1, df2)
#> Error: Can't combine `..1$TEAM_ID` <character> and `..2$TEAM_ID` <logical>.
#> Backtrace:
#>     █
#>  1. ├─dplyr::bind_rows(df1, df2)
#>  2. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id) /Users/romainfrancois/git/tidyverse/dplyr/R/bind.r:122:2
#>  3. └─vctrs::vec_default_ptype2(...)
#>  4.   └─vctrs::stop_incompatible_type(...)
#>  5.     └─vctrs:::stop_incompatible(...)
#>  6.       └─vctrs:::stop_vctrs(...)

Created on 2020-06-29 by the reprex package (v0.3.0.9001)

romainfrancois · 2020-06-29T13:07:17Z

One option, along the lines of #5366, could be to ignore data frames with 0 rows in bind_rows(). That would at least solve the issue about "When you get no data..."

romainfrancois · 2020-06-29T13:14:30Z

btw @Fablepongiste part of making a reprex is simplifying the example so that it's easier for us to help you, e.g.

library(dplyr, warn.conflicts = FALSE)

df1 <- data.frame(x = c("a", "b"))
df2 <- data.frame(x = logical())

df <- bind_rows(df1, df2)
#> Error: Can't combine `..1$x` <character> and `..2$x` <logical>.
#> Backtrace:
#>     █
#>  1. ├─dplyr::bind_rows(df1, df2)
#>  2. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id) /Users/romainfrancois/git/tidyverse/dplyr/R/bind.r:122:2
#>  3. └─vctrs::vec_default_ptype2(...)
#>  4.   └─vctrs::stop_incompatible_type(...)
#>  5.     └─vctrs:::stop_incompatible(...)
#>  6.       └─vctrs:::stop_vctrs(...)

Created on 2020-06-29 by the reprex package (v0.3.0.9001)

romainfrancois · 2020-06-29T13:25:57Z

Ignoring 0-row data frames in bind_rows() caused many problems, so I don't think this is a viable option.

hadley · 2020-06-29T13:37:55Z

I wonder if we should treat a logical() as unspecified? But that would be a big change, and I have a vague feeling that we tried that and it had some other major negative consequence.

In this case, if the root cause is reading a 0-row csv file, I think the solution is to fix the problem upstream by (e.g.) using col_types in readr::read_csv() to ensure that even empty data frames get the correct column types.
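As a minimal sketch of that upstream fix, assuming the empty input is a header-only CSV (the temporary file and the column name x are made up for illustration), declaring the column type at read time lets the later bind go through:

library(dplyr, warn.conflicts = FALSE)
library(readr)

# Hypothetical header-only CSV standing in for a source that returned no rows
path <- tempfile(fileext = ".csv")
writeLines("x", path)

# Declaring col_types means the 0-row result still has a character column
empty <- read_csv(path, col_types = cols(x = col_character()))

full <- tibble(x = c("a", "b"))
combined <- bind_rows(full, empty)

This only helps when the expected column types are known in advance, which is the case the comment above has in mind.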

Fablepongiste · 2020-06-29T13:41:32Z

Sure, point taken about the example @romainfrancois, sorry about that.

It seems to me there are other cases, not just reading a csv, where you can get empty data frames, and it is hard to always anticipate them. Scraping is a good example.

Similar cases do not fail in base R or in data.table, which is why I find it weird.

hadley · 2020-06-29T14:23:34Z

We are stricter in dplyr/vctrs because we believe it's safer. Sure, it's a bit annoying here, but it protects you from accidents like this:

rbind(
  data.frame(x = 1),
  data.frame(x = "b")
)
#>   x
#> 1 1
#> 2 b

Created on 2020-06-29 by the reprex package (v0.3.0)

This is a deliberate design decision so I'm going to close this issue.
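For comparison, a small sketch of how the same pair of inputs behaves with bind_rows() in dplyr 1.0 or later. The tryCatch() wrapper and the result variable are only there so the snippet can be run without stopping the session:

library(dplyr, warn.conflicts = FALSE)

# bind_rows() refuses to mix a double column with a character column,
# where rbind() silently converts the numbers to text
result <- tryCatch(
  bind_rows(data.frame(x = 1), data.frame(x = "b")),
  error = function(e) conditionMessage(e)
)

# result now holds the "Can't combine" error message instead of a data frame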

rdatasculptor · 2020-09-04T06:14:53Z

I am not aiming to reopen this issue, since I completely understand the deliberate design. The thing is, this new behaviour of bind_rows() causes a lot of "Can't combine" errors in my code. Because it affects a huge part of my automated report scripts, I was wondering if there is an easy workaround (other than using rbind() or switching back to an earlier version of dplyr) until I have updated my scripts? Any ideas? Thanks in advance!

lionel- · 2020-09-04T07:29:57Z

Sorry, there is no easy workaround for allowing character coercions, so pinning dplyr to an older version seems the best way.

rdatasculptor · 2020-09-04T07:49:29Z

Or maybe something like this is possible (and yes I know it's quite ugly and I haven't tried it yet)?

bind_rows_workaround <- function(df1, df2) {
  df1 <- df1 %>% mutate(across(where(is.logical), as.character))
  df2 <- df2 %>% mutate(across(where(is.logical), as.character))
  bind_rows(df1, df2)
}

lionel- · 2020-09-04T08:46:52Z

Oh yes, that could be a good starting point to update your scripts.

Ljupch0 · 2022-01-26T11:41:44Z

Or maybe something like this is possible (and yes I know it's quite ugly and I haven't tried it yet)?
bind_rows_workaround <- function(df1, df2) {
  df1 <- df1 %>% mutate(across(where(is.logical), as.character))
  df2 <- df2 %>% mutate(across(where(is.logical), as.character))
  bind_rows(df1, df2)
}

This works as a workaround but it would also convert columns that deserve to be logical. The issue is getting slapped on the wrist when a column becomes logical when it's all NA. Converting an all NA logical column to any other type by default should not count as a type conversion. I think a new column type is needed for these cases, something like the unspecified() @hadley mentioned.
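One way to narrow the workaround along those lines, as a rough sketch rather than a recommended fix: drop only the logical columns that carry no information (zero-length or all NA) before binding, and let bind_rows() fill them back in with NA of whatever type the other inputs use. The helper names is_blank_logical, drop_blank_logical and bind_rows_blank_safe are invented here, and the idea has only been thought through for the simple case shown.

library(dplyr, warn.conflicts = FALSE)

# TRUE for logical columns carrying no information: zero-length or all NA
is_blank_logical <- function(col) is.logical(col) && all(is.na(col))

# Remove those columns so they cannot clash with a typed column elsewhere
drop_blank_logical <- function(df) select(df, !where(is_blank_logical))

# Hypothetical wrapper: clean every input, then bind as usual
bind_rows_blank_safe <- function(...) {
  bind_rows(lapply(list(...), drop_blank_logical))
}

df1 <- data.frame(x = c("a", "b"), flag = c(TRUE, FALSE))
df2 <- data.frame(x = logical(), flag = logical())
out <- bind_rows_blank_safe(df1, df2)

Columns that really are logical, like flag in df1, are left alone. The trade-off is that a column which is blank in every input disappears from the result, and column order can change.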

klin333 · 2022-08-02T00:34:22Z

perhaps a workaround for now

library(dplyr, warn.conflicts = FALSE)

# dplyr 1.0+ refuses to bind character columns with empty logical columns.
# The workaround is to remove 0-row tibbles before calling bind_rows().
# Only works for data frames and lists of data frames (can't handle a list that could itself be a data frame).
bind_rows_legacy <- function(..., .id = NULL) {
  fallback <- tibble() # best efforts fall back column spec
  args <- list(...)
  processed <- list()
  for(item in args) { # can't use purrr::map because of side effects on fallback
    if (is.data.frame(item)) {
      if (nrow(item) == 0) {
        fallback <- bind_cols(
          fallback, 
          item %>% select(-any_of(colnames(fallback)))
        )
        item <- tibble()
      } 
    } else if (is.list(item)) {
      item <- do.call(bind_rows_legacy, c(item, list(.id = .id)))
      if (nrow(item) == 0) {
        fallback <- bind_cols(
          fallback, 
          item %>% select(-any_of(colnames(fallback)))
        )
        item <- tibble()
      } 
    } else {
      stop("unsupported")
    }
    processed <- c(processed, list(item))
  }

  binded <- do.call(dplyr::bind_rows, c(processed, list(.id = .id)))
  
  binded <- bind_rows(
    binded,
    fallback %>% select(-any_of(colnames(binded)))
  )
  
  binded
}

> bind_rows_legacy(tibble(x = logical(0)), tibble(x = 'b'))
# A tibble: 1 x 1
  x    
  <chr>
1 b    

> bind_rows_legacy(list(tibble(x = 'a'), tibble(x = 'b')), tibble(x = logical(0)), tibble(x = 'c', y = 1))
# A tibble: 3 x 2
  x         y
  <chr> <dbl>
1 a        NA
2 b        NA
3 c         1

> bind_rows_legacy(list(tibble(x = 'a'), tibble(x = 'b')), tibble(x = logical(0)), tibble(x = 'c', y = TRUE))
# A tibble: 3 x 2
  x     y    
  <chr> <lgl>
1 a     NA   
2 b     NA   
3 c     TRUE 

> bind_rows_legacy(tibble(x = logical(0)))
# A tibble: 0 x 1
# ... with 1 variable: x <lgl>
# i Use `colnames()` to see all variable names

> bind_rows_legacy(tibble(x = logical(0), y = character(0)), tibble(x = 'b'))
# A tibble: 1 x 2
  x     y    
  <chr> <chr>
1 b     NA

hadley closed this as completed Jun 29, 2020

scrameri mentioned this issue Dec 21, 2021

qc_read_collection() error: Can't combine <double> and <character> kassambara/fastqcr#22

Open
