Issue with bind_rows and types #5358
Comments
Could you please rework your reproducible example to use the reprex package? That makes it easier to see both the input and the output, formatted in such a way that I can easily re-run in a local session.
library(dplyr)
df1 <- structure(list(TEAM_ID = c("1", "2", "3", "4", "5", "6"),
TEAM_ABBREVIATION = c("DEN",
"DAL", "NYK", "ATL", "CHA", "MIA"),
TEAM_NAME = c("Denver Nuggets",
"Dallas Mavericks", "New York Knicks", "Atlanta Hawks", "Charlotte Hornets",
"Miami Heat"),
GAME_ID = c("1", "2", "3", "4", "5", "6"),
GAME_DATE = c("2020-03-11",
"2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11"
)),
row.names = c(NA, 6L), class = "data.frame")
df2 <- structure(list(TEAM_ID = logical(0), TEAM_ABBREVIATION = logical(0),
TEAM_NAME = logical(0), GAME_ID = logical(0), GAME_DATE = logical(0)),
class = "data.frame", row.names = integer(0))
df <- bind_rows(df1, df2)
#> Error in bind_rows(df1, df2): could not find function "bind_rows"
Created on 2020-06-25 by the reprex package (v0.3.0)
@Fablepongiste Thank you. Can you please make sure the error message in the reprex corresponds to the one you're reporting? It looks like you're missing a |
library(dplyr, warn.conflicts = FALSE)
df1 <- structure(list(TEAM_ID = c("1", "2", "3", "4", "5", "6"),
TEAM_ABBREVIATION = c("DEN",
"DAL", "NYK", "ATL", "CHA", "MIA"),
TEAM_NAME = c("Denver Nuggets",
"Dallas Mavericks", "New York Knicks", "Atlanta Hawks", "Charlotte Hornets",
"Miami Heat"),
GAME_ID = c("1", "2", "3", "4", "5", "6"),
GAME_DATE = c("2020-03-11",
"2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11", "2020-03-11"
)),
row.names = c(NA, 6L), class = "data.frame")
df2 <- structure(list(TEAM_ID = logical(0), TEAM_ABBREVIATION = logical(0),
TEAM_NAME = logical(0), GAME_ID = logical(0), GAME_DATE = logical(0)),
class = "data.frame", row.names = integer(0))
df <- bind_rows(df1, df2)
#> Error: Can't combine `..1$TEAM_ID` <character> and `..2$TEAM_ID` <logical>.
#> Backtrace:
#> █
#> 1. ├─dplyr::bind_rows(df1, df2)
#> 2. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id) /Users/romainfrancois/git/tidyverse/dplyr/R/bind.r:122:2
#> 3. └─vctrs::vec_default_ptype2(...)
#> 4. └─vctrs::stop_incompatible_type(...)
#> 5. └─vctrs:::stop_incompatible(...)
#> 6. └─vctrs:::stop_vctrs(...)
Created on 2020-06-29 by the reprex package (v0.3.0.9001)
One thing along the lines of #5366 could be to ignore data frames with 0 rows when |
btw @Fablepongiste part of making a reprex is simplifying the example so that it's easier for us to help you, e.g.
library(dplyr, warn.conflicts = FALSE)
df1 <- data.frame(x = c("a", "b"))
df2 <- data.frame(x = logical())
df <- bind_rows(df1, df2)
#> Error: Can't combine `..1$x` <character> and `..2$x` <logical>.
#> Backtrace:
#> █
#> 1. ├─dplyr::bind_rows(df1, df2)
#> 2. │ └─vctrs::vec_rbind(!!!dots, .names_to = .id) /Users/romainfrancois/git/tidyverse/dplyr/R/bind.r:122:2
#> 3. └─vctrs::vec_default_ptype2(...)
#> 4. └─vctrs::stop_incompatible_type(...)
#> 5. └─vctrs:::stop_incompatible(...)
#> 6. └─vctrs:::stop_vctrs(...)
Created on 2020-06-29 by the reprex package (v0.3.0.9001)
ignoring 0 rows data frames when |
I wonder if we should treat a In this case, if the root cause is reading a 0-row csv file, I think the solution is to fix the problem upstream by (e.g.) using |
Sure, sorry about the example, @romainfrancois. It seems to me there are other cases, not just reading a csv, where you can get empty data frames, and it is hard to always anticipate them. Scraping is a good example. Similar cases do not crash in base R or in data.table, which is why I find it weird.
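For sources such as scraping, where the empty data frame cannot easily be typed in advance, one option in the spirit of the "ignore 0-row data frames" idea raised in this thread is to drop empty inputs before binding. The sketch below is only an illustration: `bind_rows_nonempty()` is a hypothetical helper, not part of dplyr, and it assumes that losing the columns of the empty inputs is acceptable.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a dplyr function): bind only the inputs that have rows.
# Assumption: every argument is a data frame.
bind_rows_nonempty <- function(...) {
  dfs <- list(...)
  has_rows <- vapply(dfs, function(d) nrow(d) > 0, logical(1))
  bind_rows(dfs[has_rows])
}

df1 <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)
df2 <- data.frame(x = logical())

# df2 is dropped, so the character and logical columns are never combined
df <- bind_rows_nonempty(df1, df2)
```

The trade-off is that a column appearing only in an empty input is absent from the result, and if every input is empty the result has no columns at all.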
We are stricter in dplyr/vctrs because we believe it's safer. Sure, it's a bit annoying here, but it protects you from accidents like this:
rbind(
data.frame(x = 1),
data.frame(x = "b")
)
#> x
#> 1 1
#> 2 b
Created on 2020-06-29 by the reprex package (v0.3.0)
This is a deliberate design decision, so I'm going to close this issue.
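The upstream fix suggested earlier in the thread, for the case where the empty data frame comes from reading a header-only csv, can be sketched by declaring the column type at read time. The exact call the maintainers had in mind is not preserved in this thread; the version below is an assumption that uses base R's `read.csv()` with `colClasses`, and the single column `x` is made up for illustration.

```r
library(dplyr, warn.conflicts = FALSE)

# A header-only csv. Without type information, read.csv() cannot infer
# a type for the empty column and typically falls back to logical.
empty_default <- read.csv(text = "x")

# Declaring the class up front keeps the empty column as character.
empty_typed <- read.csv(text = "x", colClasses = "character")

df1 <- data.frame(x = c("a", "b"), stringsAsFactors = FALSE)

# Both inputs now have a character column, so bind_rows() has nothing to reject.
df <- bind_rows(df1, empty_typed)
```

This only helps when the reading step is under your control and the expected column types are known ahead of time.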
I am not aiming at reopening this issue since I completely understand the deliberate design. The thing is, this new behaviour of |
Sorry there is no easy workaround for allowing character coercions. So pinning dplyr to an older version seems the best way. |
Or maybe something like this is possible (and yes I know it's quite ugly and I haven't tried it yet)?
|
oh yes that could be a good starting point to update your scripts. |
This works as a workaround but it would also convert columns that deserve to be logical. The issue is getting slapped on the wrist when a column becomes logical when it's all NA. Converting an all NA logical column to any other type by default should not count as a type conversion. I think a new column type is needed for these cases, something like the |
perhaps a workaround for now
library(dplyr, warn.conflicts = FALSE)
# dplyr 1.0+ prevents bind_rows between character columns and empty logical columns;
# the workaround is to remove 0-row tibbles from the bind_rows
# only works for data frames and lists of data frames (can't do list that could be a data frame)
bind_rows_legacy <- function(..., .id = NULL) {
  fallback <- tibble() # best-effort fallback column spec, collected from the 0-row inputs
  args <- list(...)
  processed <- list()
  for (item in args) { # can't use purrr::map because of side effects on fallback
    if (is.data.frame(item)) {
      if (nrow(item) == 0) {
        # remember the columns of the empty data frame, then drop it from the bind
        fallback <- bind_cols(
          fallback,
          item %>% select(-any_of(colnames(fallback)))
        )
        item <- tibble()
      }
    } else if (is.list(item)) {
      # a list of data frames: bind its elements recursively first
      item <- do.call(bind_rows_legacy, c(item, list(.id = .id)))
      if (nrow(item) == 0) {
        fallback <- bind_cols(
          fallback,
          item %>% select(-any_of(colnames(fallback)))
        )
        item <- tibble()
      }
    } else {
      stop("unsupported")
    }
    processed <- c(processed, list(item))
  }
  binded <- do.call(dplyr::bind_rows, c(processed, list(.id = .id)))
  # add back any columns that only appeared in the 0-row inputs
  binded <- bind_rows(
    binded,
    fallback %>% select(-any_of(colnames(binded)))
  )
  binded
}
|
Should that really crash?
bind_rows is extremely strict with types, probably too strict?
Error: Can't combine `..1$TEAM_ID` <character> and `..2$TEAM_ID` <logical>.
When you get no data, types can sometimes default to logical. Shouldn't it take either the type of the first data frame or, better, the type of the data frame that has data?
This has only happened since the change to vctrs in dplyr 1.0.
It might be on purpose, in which case fine, I guess.