filter on lubridate interval does not filter start slot #1581

ajholguin · 2015-12-11T01:06:11Z

It looks like filter is sub-setting the ".Data" slot, but not the corresponding "start" slot.

library(dplyr)
library(lubridate)

dat <- data.frame(id = 1:2,
                  intervals = c(interval(ymd(20090201), ymd(20090202)), interval(ymd(20090202), ymd(20090203))))

dat_filt <- dat %>% 
  filter(id == 1)

# data slot is filtered, but start slot is not
str(dat_filt$intervals)

result:

Formal class 'Interval' [package "lubridate"] with 3 slots
  ..@ .Data: num 86400
  ..@ start: POSIXct[1:2], format: "2009-02-01" "2009-02-02"
  ..@ tzone: chr "UTC"

length(dat_filt$intervals)         # --> 1
length(dat_filt$intervals@start)   # --> 2

hadley · 2016-03-01T15:20:55Z

@romainfrancois could you please take a look at what it would take to fix this?

krlmlr · 2016-09-25T08:42:12Z

Perhaps we can call [ from the C++ code in the general case, this will also get us POSIXlt support for free.

hadley · 2016-09-25T13:48:59Z

I think we need specialised methods for performance, but maybe we can fall back for unknown types. I'm planning on moving this code in r-lib/vctrs#7 (at least the low level vector stuff, then the data frame assembly can remain in dplyr, or possibly move to tibble)

krlmlr · 2016-09-25T19:16:35Z

r-lib/vctrs#7 is about coercion, but I agree it's good to have that logic available in C++ and fall back to R only as a last resort.

lelandbr · 2016-10-27T21:51:26Z

Just fyi, what I'm assuming is this same issue also affects slice, and arrange, as well as summarize with nth, first, or last

Here is a test df I made for #2214, which I'll include here bc I think I might be seeing a slightly different phenotype from @ajholguin above.

Each column is a different class. Either a duration (1:6 days) or a date (1st through 6th), so that incorrect row subsetting or ordering will be easily apparent. lubridate columns are on the right.

df <- tibble(grps          = c("a","a","b","b","c","c"), 
             to.fill       = c(1,NA,3,NA,5,NA),
             to.arrange    = 6:1,
             base.Date     = as.Date(0:5,origin = "1970-01-01"),
             base.POSIXct  = as.POSIXct(as.character(base.Date)),
             # base.POSIXlt: tibble does not allow POSIXlt columns
             base.difftime = as.difftime(1:6,units="days"),
             date.date     = date::as.date(1:6), #doesn't display very well in a data.frame
             chron.dates   = chron::chron(0:5),
             chron.times   = chron::chron(times. = (1:6)),
             lubr.Interval = lubridate::interval( as_date(0:5), as_date(10:15) ),
             lubr.Period   = lubridate::period(days=1:6),
             lubr.Duration = lubridate::ddays(1:6)
             )

And here are what I assume must be the same or at least very similar to the filter issue:

# slice is also affected, in addition to filter:
df %>% filter(to.fill %in% c(1,3,5))
df %>% slice(c(1,3,5))
df[c(1,3,5),] # this should be the correct output

# same with arrange:
df %>% arrange(to.arrange)
df[6:1,] # this should be the correct output

# sample_n seems to work though
df %>% sample_n(3)

# group_by() %>% summarize(nth), same problem?
df %>% group_by(grps) %>% summarize_all( funs(nth(.,1)) )

# Similar problem with first() and last(), 
# but with a twist ... there are some NAs and 0s introduced
df %>% group_by(grps) %>% summarize_all(first) 
df %>% group_by(grps) %>% summarize_all(last)

krlmlr · 2016-11-07T23:26:24Z

Should be taken care of in r-lib/vctrs#27.

hadley · 2017-02-16T20:48:05Z

Now part of #2432

hadley added the bug an unexpected problem or unintended behavior label Mar 1, 2016

hadley mentioned this issue Mar 12, 2016

bind_rows does not preserve lubridate interval class #1708

Closed

hadley added the data frame label May 26, 2016

hadley added this to the future milestone May 26, 2016

hadley mentioned this issue Jun 7, 2016

Windowing operations on intervals #1879

Closed

hadley mentioned this issue Aug 10, 2016

Bug with mutate() involving POSIXct values (only when using group_by()) #2061

Closed

krlmlr mentioned this issue Nov 7, 2016

filter() on dataframe containing lubridate Interval: corrupt data frame #2221

Closed

hadley mentioned this issue Feb 16, 2017

Improve support non-base types #2432

Closed

hadley closed this as completed Feb 16, 2017

yutannihilation mentioned this issue Nov 14, 2017

Unexpected lubridate interval behavior when filtering dataframe via dplyr #3206

Closed

vspinu mentioned this issue Nov 27, 2017

Changes in calculated intervals when left_joining tidyverse/lubridate#614

Closed

lock bot locked as resolved and limited conversation to collaborators Jun 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

filter on lubridate interval does not filter start slot #1581

filter on lubridate interval does not filter start slot #1581

ajholguin commented Dec 11, 2015

hadley commented Mar 1, 2016

krlmlr commented Sep 25, 2016

hadley commented Sep 25, 2016

krlmlr commented Sep 25, 2016

lelandbr commented Oct 27, 2016

krlmlr commented Nov 7, 2016

hadley commented Feb 16, 2017

filter on lubridate interval does not filter start slot #1581

filter on lubridate interval does not filter start slot #1581

Comments

ajholguin commented Dec 11, 2015

hadley commented Mar 1, 2016

krlmlr commented Sep 25, 2016

hadley commented Sep 25, 2016

krlmlr commented Sep 25, 2016

lelandbr commented Oct 27, 2016

krlmlr commented Nov 7, 2016

hadley commented Feb 16, 2017