interval() and dplyr::filter() display incorrect intervals #621

Closed
giovannotti opened this issue Jan 4, 2018 · 4 comments

@giovannotti

Lubridate's interval() function seems to produce incorrect results for negative intervals when I combine it with dplyr::filter().
This is the data frame to test:

library(lubridate)
library(dplyr)
test <- tibble::tibble(id = 1:2,
                       date1 = c(as.Date("1990-01-01", origin = "1970-01-01"), 
                                 as.Date("1993-01-01", origin = "1970-01-01")),
                       date2 = c(as.Date("1991-02-02", origin = "1970-01-01"), 
                                 as.Date("1992-02-02", origin = "1970-01-01")))

When I filter on a negative-interval condition with dplyr::filter(), the result shows the wrong interval but, interestingly, the correct interval start:

test %>%
  mutate(int = interval(date1, date2), intstart = int_start(interval(date1, date2))) %>%
  filter(as.numeric(int) < 0)
  
#   # A tibble: 1 x 5
#      id      date1      date2                            int   intstart
#   <int>     <date>     <date>                 <S4: Interval>     <dttm>
# 1     2 1993-01-01 1992-02-02 1990-01-01 UTC--1989-02-01 UTC 1993-01-01

Without filter(), everything is fine:

test %>%
  mutate(int = interval(date1, date2), inststart = int_start(interval(date1, date2)))
  
#  A tibble: 2 x 5
#      id      date1      date2                            int  inststart
#   <int>     <date>     <date>                 <S4: Interval>     <dttm>
# 1     1 1990-01-01 1991-02-02 1990-01-01 UTC--1991-02-02 UTC 1990-01-01
# 2     2 1993-01-01 1992-02-02 1993-01-01 UTC--1992-02-02 UTC 1993-01-01

The same is true when using base subsetting instead of filter():

test %>%
  mutate(int = interval(date1, date2), inststart = int_start(interval(date1, date2))) %>%
  .[as.numeric(.$int) < 0,]
  
#  A tibble: 1 x 4
#      id      date1      date2                            int  inststart
#   <int>     <date>     <date>                 <S4: Interval>     <dttm>
# 1     2 1993-01-01 1992-02-02 1993-01-01 UTC--1992-02-02 UTC 1993-01-01

When I then try to extract the interval start with int_start() after filtering with dplyr::filter(), I get an error suggesting that filter() doesn't actually filter the interval column int:

test %>%
  mutate(int = interval(date1, date2)) %>%
  filter(as.numeric(int) < 0) %>%
  mutate(intstart = int_start(int))
  
  # Error in mutate_impl(.data, dots) : 
  # Column `intstart` must be length 1 (the number of rows), not 2
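
A possible workaround, offered as a sketch under assumptions rather than a fix confirmed in this thread: an interval is negative exactly when its end lies before its start, so the condition can be applied to the plain Date columns before any Interval object exists. The sketch never stores an S4 Interval column in the tibble; int_start() and int_length() return ordinary POSIXct and numeric vectors, so filter() only ever subsets ordinary columns. It assumes current CRAN versions of lubridate and dplyr, and the names `negative` and `intlength` are illustrative only.

```r
library(lubridate)
library(dplyr)

test <- tibble::tibble(
  id    = 1:2,
  date1 = as.Date(c("1990-01-01", "1993-01-01")),
  date2 = as.Date(c("1991-02-02", "1992-02-02"))
)

# An interval is negative exactly when its end precedes its start,
# so filter on the underlying Date columns instead of the Interval.
negative <- test %>%
  filter(date2 < date1) %>%
  # Derive interval properties only after the rows are selected;
  # int_start() returns POSIXct and int_length() returns seconds,
  # so no S4 Interval column is ever kept in the tibble.
  mutate(intstart  = int_start(interval(date1, date2)),
         intlength = int_length(interval(date1, date2)))

print(negative)
```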

Session Output:

sessionInfo()

# R version 3.4.3 (2017-11-30)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows >= 8 x64 (build 9200)

# Matrix products: default

# locale:
#  [1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252 LC_NUMERIC=C                   
#  [5] LC_TIME=German_Germany.1252    

# attached base packages:
#  [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
#  [1] dplyr_0.7.4     lubridate_1.7.1

# loaded via a namespace (and not attached):
#  [1] compiler_3.4.3   assertthat_0.2.0 R6_2.2.2         magrittr_1.5     tools_3.4.3      bindrcpp_0.2     glue_1.2.0       tibble_1.3.4    
#  [9] yaml_2.1.16      Rcpp_0.12.14     stringi_1.1.6    stringr_1.2.0    pkgconfig_2.0.1  rlang_0.1.4      bindr_0.1
@yutannihilation
Member

This is an issue on dplyr's side: tidyverse/dplyr#3206

@giovannotti
Author

OK, sorry, I didn't realize that. Thanks.

@vspinu
Member

vspinu commented Jan 28, 2018

Yes. It's a well-known issue, not specific to lubridate (tidyverse/dplyr#2432).

@vspinu vspinu closed this as completed Jan 28, 2018
@yutannihilation
Member

yutannihilation commented Jan 29, 2018

> It's a well known issue

I feel this is a lesser-known known issue, yet one that quite a few people may run into, unfortunately... Given that the issue is not going to be solved soon, some effort is needed to make it really "known" (e.g. a "Known Issues" section in some documentation?). I'm not saying lubridate is responsible for that, though. ¯\_(ツ)_/¯


3 participants