Unexpected lubridate interval behavior when filtering dataframe via dplyr #3206
Hi, I guess this is the same kind of bug as #2568, as suggested on SO, and #1581 is an even closer match. If you use `str()`, you can see that `dplyr::filter()` subsets the interval lengths but not the `start` slot. Let's set our hope on vctrs...

```r
dplyr::filter(dat_mod, .data$id == 7000) %>%
  pull(interval) %>%
  str
#> Formal class 'Interval' [package "lubridate"] with 3 slots
#> ..@ .Data: num 1209600
#> ..@ start: POSIXct[1:7632], format: "2016-10-11" NA NA NA ...
#> ..@ tzone: chr "UTC"

dat_mod[dat_mod$id == 7000, ] %>%
  pull(interval) %>%
  str
#> Formal class 'Interval' [package "lubridate"] with 3 slots
#> ..@ .Data: num 1209600
#> ..@ start: POSIXct[1:1], format: "2017-08-02"
#> ..@ tzone: chr "UTC"
```
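For readers without `dat_mod` (the original data chunk is not reproduced in this thread), the bracket side of this comparison can be sketched with a small hypothetical data frame. The ids and the first two dates below are invented for illustration; only the 14-day length (1209600 seconds) and the 2017-08-02 start are taken from the `str()` output above. Base `[` subsetting of a data frame calls `[` on each column, which dispatches lubridate's method for `Interval`, so `start` stays aligned with the lengths.

```r
library(lubridate)

# Hypothetical stand-in for `dat_mod`; ids and most dates are made up.
starts <- as.POSIXct(c("2016-10-11", "2017-01-15", "2017-08-02"), tz = "UTC")
dat <- data.frame(id = c(5000, 6000, 7000))
# Each interval is 14 days (1209600 seconds) long, as in the str() output.
dat$interval <- interval(starts, starts + days(14))

# `[.data.frame` subsets each column with `[`, which dispatches lubridate's
# Interval method, so the `start` slot is subset together with the lengths.
sub <- dat[dat$id == 7000, ]
str(sub$interval)
int_start(sub$interval)
```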
Thanks @yutannihilation! Since this seems to be a much more pervasive issue (and one that apparently lacks a simple solution, given how long ago some of these issues were posted), perhaps we could, for now, just get a warning based on the class of the columns? Maybe something like this would work for my current problem?

```r
filter <- function(.data, ...) {
  # count the columns that hold lubridate Interval objects
  num_interval_cols <- length(which(unlist(lapply(.data, class)) == "Interval"))
  if (num_interval_cols > 0) {
    warning("S4 Interval class not currently supported inside filter(). Results may not be accurate.")
  }
  # hand off to dplyr's own filter() so its registered data frame methods are used
  dplyr::filter(.data, ...)
}
```
Thanks!
I think so. One idea is to add a row ID column, filter, and then overwrite the S4 columns of the result with the original data frame subsetted by the row IDs that survived. I don't think my code is great, though, so SO is a good place to ask for a better version :)

```r
library(dplyr)

myfilter <- function(d, ...) {
  preds <- rlang::quos(...)
  result <- d %>%
    # add row IDs to identify which original rows survive the filter
    tibble::rowid_to_column(var = "rowid") %>%
    dplyr::filter(!!! preds)
  # overwrite S4 columns with data properly subsetted by `[`
  cols_S4 <- colnames(result)[purrr::map_lgl(result, isS4)]
  result[, cols_S4] <- d[result$rowid, cols_S4]
  # remove the row ID column
  dplyr::select(result, -rowid)
}
```
Would the tidyverse team entertain a pure R solution to this? I believe the filter method is written in C, correct?
Ah, if you took the code above as a "solution", it isn't one. I only intended to post a temporary workaround until the vctrs package does the right thing.
Thanks. Do you think #2432 would fix this issue as well?
I'm not quite sure how #2432 will be implemented, but I think so. The root cause is the lack of a good way to dispatch the correct method for non-base types, which is the same root cause as the other issues listed there.
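To illustrate the dispatch point with lubridate alone (no dplyr involved): an `Interval` is an S4 object whose underlying base vector holds only the lengths in seconds, while the anchoring dates live in the separate `start` slot. Subsetting through lubridate's `[` method keeps both in step; subsetting only the base numeric part keeps the lengths and loses the start dates, which matches the symptom in the `str()` output earlier in the thread. This is a simplified sketch with made-up dates, not a description of dplyr's actual C internals.

```r
library(methods)
library(lubridate)

starts <- as.POSIXct(c("2016-10-11", "2017-08-02"), tz = "UTC")
iv <- interval(starts, starts + days(14))
keep <- c(FALSE, TRUE)

# An Interval is an S4 object built on a numeric vector of lengths in seconds.
isS4(iv)

# Dispatching `[` on the class subsets the lengths and the `start` slot together.
good <- iv[keep]
int_start(good)

# Working on the base numeric vector alone keeps the 14-day length (in seconds),
# but the result is a plain number with no start date attached.
bare <- iv@.Data[keep]
bare
```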
I'll close this now as a duplicate of #2432. At the moment, we have a workaround in place to essentially refuse to deal with
This is of course temporary until we deal with #2432, but at least it is less surprising.
This old issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with reprex) and link to this issue. https://reprex.tidyverse.org/
Background
I posted this on SO as well, since I'm not entirely certain this is a bug rather than my own ignorance.
I am creating a `lubridate` interval vector using a dataset similar to the one in the following chunk:

Problem
If I try to filter the object via `dplyr`, I get an unexpected interval returned:

My own investigation
Every interval (viewed via `dplyr::filter`) appears to have the same length as the expected interval, but all of them are (incorrectly) anchored to 2016-10-11 UTC.

I suspect this is a `dplyr::filter` bug, as I can subset with bracket notation and get the expected result.

Also, this does not appear to be an artifact of what is displayed in the output. If I assign the filtered object to a new object, the incorrect interval is preserved: