Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter on lubridate interval does not filter start slot #1581

Closed
ajholguin opened this issue Dec 11, 2015 · 7 comments
Closed

filter on lubridate interval does not filter start slot #1581

ajholguin opened this issue Dec 11, 2015 · 7 comments
Labels
bug an unexpected problem or unintended behavior
Milestone

Comments

@ajholguin
Copy link

It looks like filter is sub-setting the ".Data" slot, but not the corresponding "start" slot.

library(dplyr)
library(lubridate)

dat <- data.frame(id = 1:2,
                  intervals = c(interval(ymd(20090201), ymd(20090202)), interval(ymd(20090202), ymd(20090203))))

dat_filt <- dat %>% 
  filter(id == 1)

# data slot is filtered, but start slot is not
str(dat_filt$intervals)

result:

Formal class 'Interval' [package "lubridate"] with 3 slots
  ..@ .Data: num 86400
  ..@ start: POSIXct[1:2], format: "2009-02-01" "2009-02-02"
  ..@ tzone: chr "UTC"
length(dat_filt$intervals)         # --> 1
length(dat_filt$intervals@start)   # --> 2
@hadley
Copy link
Member

hadley commented Mar 1, 2016

@romainfrancois could you please take a look at what it would take to fix this?

@krlmlr
Copy link
Member

krlmlr commented Sep 25, 2016

Perhaps we can call [ from the C++ code in the general case, this will also get us POSIXlt support for free.

@hadley
Copy link
Member

hadley commented Sep 25, 2016

I think we need specialised methods for performance, but maybe we can fall back for unknown types. I'm planning on moving this code in r-lib/vctrs#7 (at least the low level vector stuff, then the data frame assembly can remain in dplyr, or possibly move to tibble)

@krlmlr
Copy link
Member

krlmlr commented Sep 25, 2016

r-lib/vctrs#7 is about coercion, but I agree it's good to have that logic available in C++ and fall back to R only as a last resort.

@lelandbr
Copy link

Just fyi, what I'm assuming is this same issue also affects slice, and arrange, as well as summarize with nth, first, or last

Here is a test df I made for #2214, which I'll include here bc I think I might be seeing a slightly different phenotype from @ajholguin above.

Each column is a different class. Either a duration (1:6 days) or a date (1st through 6th), so that incorrect row subsetting or ordering will be easily apparent. lubridate columns are on the right.

df <- tibble(grps          = c("a","a","b","b","c","c"), 
             to.fill       = c(1,NA,3,NA,5,NA),
             to.arrange    = 6:1,
             base.Date     = as.Date(0:5,origin = "1970-01-01"),
             base.POSIXct  = as.POSIXct(as.character(base.Date)),
             # base.POSIXlt: tibble does not allow POSIXlt columns
             base.difftime = as.difftime(1:6,units="days"),
             date.date     = date::as.date(1:6), #doesn't display very well in a data.frame
             chron.dates   = chron::chron(0:5),
             chron.times   = chron::chron(times. = (1:6)),
             lubr.Interval = lubridate::interval( as_date(0:5), as_date(10:15) ),
             lubr.Period   = lubridate::period(days=1:6),
             lubr.Duration = lubridate::ddays(1:6)
             )

And here are what I assume must be the same or at least very similar to the filter issue:

# slice is also affected, in addition to filter:
df %>% filter(to.fill %in% c(1,3,5))
df %>% slice(c(1,3,5))
df[c(1,3,5),] # this should be the correct output

# same with arrange:
df %>% arrange(to.arrange)
df[6:1,] # this should be the correct output

# sample_n seems to work though
df %>% sample_n(3)

# group_by() %>% summarize(nth), same problem?
df %>% group_by(grps) %>% summarize_all( funs(nth(.,1)) )

# Similar problem with first() and last(), 
# but with a twist ... there are some NAs and 0s introduced
df %>% group_by(grps) %>% summarize_all(first) 
df %>% group_by(grps) %>% summarize_all(last) 

@krlmlr
Copy link
Member

krlmlr commented Nov 7, 2016

Should be taken care of in r-lib/vctrs#27.

@hadley
Copy link
Member

hadley commented Feb 16, 2017

Now part of #2432

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

4 participants