A/a in orders argument produces unexpected failures interacting with presence of B/b #1104

Closed
jmobrien opened this issue Dec 15, 2022 · 1 comment

When A/a are explicitly provided in the orders argument, parsing can fail even though the specification is otherwise correct:

  • With full weekday names, having either "A" or "a" results in a failure to parse if "B" is used instead of "b" (similar to #972, "Both B and b now call C parser 0b in guess_formats()", but seemingly unrelated).
  • With abbreviated weekday names, the same failure occurs, and in addition an "Ab" combination fails to parse.

While I realize that, in practice, neither "A" nor "a" actually needs to be present in the default case, it does seem strange that including them causes such failures.

require(tidyverse, quietly = TRUE)
require(lubridate, quietly = TRUE)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union

# Build data frame for testing:

# Date with full and abbreviated weekday name:
full <- "Wednesday Apr 24 13:45:07 GMT-0500 2019"
abbrev <- "Wed Apr 24 13:45:07 GMT-0500 2019"
entries <- 
  data.frame(
    type = c("fullname", "abbrev"),
    datestring = c(full, abbrev)
  )

# Order options:
orders <- 
  c(
    # "A" & "a" (which docs say should behave identically to each other), plus "B"
    "aBdHMSzY","aBdHMSzY","ABdHMSzY","ABdHMSzY",
    # Same, but with "b" instead (which, again, should behave identically to "B")
    "abdHMSzY","abdHMSzY","AbdHMSzY","AbdHMSzY",
    # Versions without A/a
    "BdHMSzY","BdHMSzY","BdHMSzY","BdHMSzY",
    "bdHMSzY","bdHMSzY","bdHMSzY","bdHMSzY"
  )

# Testing df:
testing_df <- 
  expand_grid(entries, orders)

# Testing what works for parsing or not:
test_parse <- 
  testing_df |> 
  mutate(
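    # Note: map2_dbl() returns seconds since the Unix epoch, so the matching
    # origin is "1970-01-01"; with "1960-01-01" the dates printed below are
    # shifted back by 3653 days (2019-04-24 shows as 2009-04-23).
    date_parse = 
      suppressWarnings(map2_dbl(datestring, orders, parse_date_time)) |> 
      as.POSIXct(origin = "1960-01-01")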
  )

print(test_parse, n =  Inf)
#> # A tibble: 32 × 4
#>    type     datestring                              orders   date_parse         
#>    <chr>    <chr>                                   <chr>    <dttm>             
#>  1 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 aBdHMSzY NA                 
#>  2 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 aBdHMSzY NA                 
#>  3 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 ABdHMSzY NA                 
#>  4 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 ABdHMSzY NA                 
#>  5 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 abdHMSzY 2009-04-23 13:45:07
#>  6 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 abdHMSzY 2009-04-23 13:45:07
#>  7 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 AbdHMSzY 2009-04-23 13:45:07
#>  8 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 AbdHMSzY 2009-04-23 13:45:07
#>  9 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 BdHMSzY  2009-04-23 13:45:07
#> 10 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 BdHMSzY  2009-04-23 13:45:07
#> 11 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 BdHMSzY  2009-04-23 13:45:07
#> 12 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 BdHMSzY  2009-04-23 13:45:07
#> 13 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 bdHMSzY  2009-04-23 13:45:07
#> 14 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 bdHMSzY  2009-04-23 13:45:07
#> 15 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 bdHMSzY  2009-04-23 13:45:07
#> 16 fullname Wednesday Apr 24 13:45:07 GMT-0500 2019 bdHMSzY  2009-04-23 13:45:07
#> 17 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       aBdHMSzY NA                 
#> 18 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       aBdHMSzY NA                 
#> 19 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       ABdHMSzY NA                 
#> 20 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       ABdHMSzY NA                 
#> 21 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       abdHMSzY 2009-04-23 13:45:07
#> 22 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       abdHMSzY 2009-04-23 13:45:07
#> 23 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       AbdHMSzY NA                 
#> 24 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       AbdHMSzY NA                 
#> 25 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       BdHMSzY  2009-04-23 13:45:07
#> 26 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       BdHMSzY  2009-04-23 13:45:07
#> 27 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       BdHMSzY  2009-04-23 13:45:07
#> 28 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       BdHMSzY  2009-04-23 13:45:07
#> 29 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       bdHMSzY  2009-04-23 13:45:07
#> 30 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       bdHMSzY  2009-04-23 13:45:07
#> 31 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       bdHMSzY  2009-04-23 13:45:07
#> 32 abbrev   Wed Apr 24 13:45:07 GMT-0500 2019       bdHMSzY  2009-04-23 13:45:07

# Testing what the guessing tool gives back:
test_guess <- 
  testing_df |> 
  mutate(
    guess = 
      map2(datestring, orders, guess_formats)
  ) |> 
  select(-datestring) |> 
  unnest(guess) |> 
  group_by(across(-guess)) |> 
  mutate(count = row_number(),
         .before = guess) |> 
  ungroup()


print(test_guess, n = Inf)
#> # A tibble: 66 × 4
#>    type     orders   count guess                              
#>    <chr>    <chr>    <int> <chr>                              
#>  1 fullname aBdHMSzY     1 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#>  2 fullname aBdHMSzY     2 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#>  3 fullname ABdHMSzY     1 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#>  4 fullname ABdHMSzY     2 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#>  5 fullname abdHMSzY     1 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#>  6 fullname abdHMSzY     2 %A %b %d %H:%M:%S GMT%Oz %Y        
#>  7 fullname abdHMSzY     3 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#>  8 fullname abdHMSzY     4 %A %b %d %H:%M:%S GMT%Oz %Y        
#>  9 fullname AbdHMSzY     1 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 10 fullname AbdHMSzY     2 %A %b %d %H:%M:%S GMT%Oz %Y        
#> 11 fullname AbdHMSzY     3 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 12 fullname AbdHMSzY     4 %A %b %d %H:%M:%S GMT%Oz %Y        
#> 13 fullname BdHMSzY      1 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 14 fullname BdHMSzY      2 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 15 fullname BdHMSzY      3 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 16 fullname BdHMSzY      4 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 17 fullname BdHMSzY      5 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 18 fullname BdHMSzY      6 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 19 fullname BdHMSzY      7 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 20 fullname BdHMSzY      8 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 21 fullname bdHMSzY      1 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 22 fullname bdHMSzY      2 %A %b %d %H:%M:%S GMT%Oz %Y        
#> 23 fullname bdHMSzY      3 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 24 fullname bdHMSzY      4 Wednesday %b %d %H:%M:%S GMT%Oz %Y 
#> 25 fullname bdHMSzY      5 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 26 fullname bdHMSzY      6 %A %b %d %H:%M:%S GMT%Oz %Y        
#> 27 fullname bdHMSzY      7 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 28 fullname bdHMSzY      8 Wednesday %b %d %H:%M:%S GMT%Oz %Y 
#> 29 fullname bdHMSzY      9 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 30 fullname bdHMSzY     10 %A %b %d %H:%M:%S GMT%Oz %Y        
#> 31 fullname bdHMSzY     11 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 32 fullname bdHMSzY     12 Wednesday %b %d %H:%M:%S GMT%Oz %Y 
#> 33 fullname bdHMSzY     13 %A %Ob %d %H:%M:%S GMT%Oz %Y       
#> 34 fullname bdHMSzY     14 %A %b %d %H:%M:%S GMT%Oz %Y        
#> 35 fullname bdHMSzY     15 Wednesday %Ob %d %H:%M:%S GMT%Oz %Y
#> 36 fullname bdHMSzY     16 Wednesday %b %d %H:%M:%S GMT%Oz %Y 
#> 37 abbrev   aBdHMSzY     1 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 38 abbrev   aBdHMSzY     2 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 39 abbrev   abdHMSzY     1 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 40 abbrev   abdHMSzY     2 %a %b %d %H:%M:%S GMT%Oz %Y        
#> 41 abbrev   abdHMSzY     3 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 42 abbrev   abdHMSzY     4 %a %b %d %H:%M:%S GMT%Oz %Y        
#> 43 abbrev   BdHMSzY      1 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 44 abbrev   BdHMSzY      2 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 45 abbrev   BdHMSzY      3 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 46 abbrev   BdHMSzY      4 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 47 abbrev   BdHMSzY      5 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 48 abbrev   BdHMSzY      6 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 49 abbrev   BdHMSzY      7 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 50 abbrev   BdHMSzY      8 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 51 abbrev   bdHMSzY      1 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 52 abbrev   bdHMSzY      2 %a %b %d %H:%M:%S GMT%Oz %Y        
#> 53 abbrev   bdHMSzY      3 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 54 abbrev   bdHMSzY      4 Wed %b %d %H:%M:%S GMT%Oz %Y       
#> 55 abbrev   bdHMSzY      5 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 56 abbrev   bdHMSzY      6 %a %b %d %H:%M:%S GMT%Oz %Y        
#> 57 abbrev   bdHMSzY      7 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 58 abbrev   bdHMSzY      8 Wed %b %d %H:%M:%S GMT%Oz %Y       
#> 59 abbrev   bdHMSzY      9 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 60 abbrev   bdHMSzY     10 %a %b %d %H:%M:%S GMT%Oz %Y        
#> 61 abbrev   bdHMSzY     11 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 62 abbrev   bdHMSzY     12 Wed %b %d %H:%M:%S GMT%Oz %Y       
#> 63 abbrev   bdHMSzY     13 %a %Ob %d %H:%M:%S GMT%Oz %Y       
#> 64 abbrev   bdHMSzY     14 %a %b %d %H:%M:%S GMT%Oz %Y        
#> 65 abbrev   bdHMSzY     15 Wed %Ob %d %H:%M:%S GMT%Oz %Y      
#> 66 abbrev   bdHMSzY     16 Wed %b %d %H:%M:%S GMT%Oz %Y

<sup>Created on 2022-12-15 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
@vspinu vspinu closed this as completed in a922ddf Jan 21, 2023

vspinu commented Jan 21, 2023

Thanks for the comprehensive report. Fixed!

vspinu added a commit that referenced this issue Jan 25, 2023
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Jun 5, 2023
Version 1.9.2
=============

### BUG FIXES

* [#1104](tidyverse/lubridate#1104) Fix
  incorrect parsing of months when %a format is present.
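
A quick way to check this entry against an installed lubridate is sketched below. It reuses the abbreviated-weekday string from the report and tries each order in turn. `check_orders` is a hypothetical helper name introduced only for illustration, and no particular result is claimed because the outcome depends on the installed version.

```r
library(lubridate)

# Date string taken from the report (abbreviated weekday name).
x <- "Wed Apr 24 13:45:07 GMT-0500 2019"

# Orders with and without an explicit weekday (A/a), using B or b for the month.
orders <- c("aBdHMSzY", "ABdHMSzY", "abdHMSzY", "AbdHMSzY", "BdHMSzY", "bdHMSzY")

# Hypothetical helper: parse `x` once per order and record whether it succeeded.
check_orders <- function(x, orders) {
  parsed <- lapply(orders, function(o) {
    # A failed parse yields NA plus a warning, which is silenced here.
    suppressWarnings(parse_date_time(x, orders = o, tz = "UTC"))
  })
  data.frame(
    order  = orders,
    parsed = vapply(parsed, function(p) format(p, "%Y-%m-%d %H:%M:%S"), character(1)),
    ok     = !vapply(parsed, is.na, logical(1))
  )
}

# Show the installed version next to the results.
print(packageVersion("lubridate"))
res <- check_orders(x, orders)
print(res)
```

Rows with `ok == FALSE` are orders that still fail to parse on the installed version.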

### OTHER

* Adapt to internal name changes in R-devel

Version 1.9.1
=============

### NEW FEATURES

* `as_datetime()` accepts multiple formats in format argument, just like `as_date()` does.
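
As a rough illustration of the entry above, the sketch below passes two strptime-style formats to `as_datetime()` for inputs written in two different layouts. The input strings are made up for the example, and it assumes lubridate 1.9.1 or later.

```r
library(lubridate)

# Two made-up inputs written in different layouts.
x <- c("2019-04-24 13:45:07", "24/04/2019 13:45")

# Several candidate formats are supplied at once; each input is expected
# to be parsed by whichever format fits it.
res <- as_datetime(
  x,
  tz = "UTC",
  format = c("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M")
)
print(res)
```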

### BUG FIXES

* [#1091](tidyverse/lubridate#1091) Fix
  formatting of numeric inputs to parse_date_time.

* [#1092](tidyverse/lubridate#1092) Fix
  regression in `ymd_hm` on locales where `p` format is not defined.

* [#1097](tidyverse/lubridate#1097) Fix
  `as_date("character")` to work correctly with formats that include
  extra characters.

* [#1098](tidyverse/lubridate#1098) Roll
  over the month boundary in `make_datetime()` when units exceed their
  maximal values.

* [#1090](tidyverse/lubridate#1090)
  timechange has been moved from Depends to Imports.

Version 1.9.0
=============

### NEW FEATURES

* `roll` argument to updating and time-zone manipulation functions is
  deprecated in favor of a new `roll_dst` parameter.

* [#1042](tidyverse/lubridate#1042)
  `as_date` with character inputs accepts multiple formats in `format`
  argument. When `format` is supplied, the input string is parsed with
  `parse_date_time` instead of the old `strptime`.

* [#1055](tidyverse/lubridate#1055)
  Implement `as.integer` method for Duration, Period and Interval
  classes.

* [#1061](tidyverse/lubridate#1061) Make
  `year<-`, `month<-` etc. accessors truly generic. In order to make
  them work with arbitrary class XYZ, it's enough to define a
  `reclass_date.XYZ` method.

* [#1061](tidyverse/lubridate#1061) Add
  support for `year<-`, `month<-` etc. accessors for `data.table`'s
  IDate and ITime objects.

* [#1017](tidyverse/lubridate#1017)
  `week_start` argument in all lubridate functions now accepts full
  and abbreviated names of the days of the week.

* The assignment value `wday<-` can be a string either in English or
  as provided by the current locale.

* Date rounding functions accept a date-time `unit` argument for
  rounding to a vector of date-times.

* [#1005](tidyverse/lubridate#1005)
  `as.duration` now allows for full roundtrip `duration ->
  as.character -> as.duration`

* [#911](tidyverse/lubridate#911) C parsers
  treat multiple spaces as one (just like strptime does)

* `stamp` gained a new argument `exact=FALSE` to indicate whether the
  `orders` argument is an exact strptime format string or not.

* [#1001](tidyverse/lubridate#1001) Add
  `%within%` method with signature (Interval, list), which was
  documented but not implemented.

* [#941](tidyverse/lubridate#941)
  `format_ISO8601()` gained a new option `usetz="Z"` to format time
  zones with a "Z" and convert the time to the UTC time zone.

* [#931](tidyverse/lubridate#931) Usage of
  `Period` objects in rounding functions is explicitly documented.
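
Two of the features listed above can be tried with a short sketch, assuming lubridate 1.9.0 or later: a day name passed as `week_start`, and `usetz = "Z"` in `format_ISO8601()`. The date is the one from the report (a Wednesday); the `America/Chicago` time zone is an arbitrary choice for the example.

```r
library(lubridate)

d <- ymd("2019-04-24")  # a Wednesday

# week_start given as a day name instead of a number.
wd_mon <- wday(d, week_start = "Monday")
wd_sun <- wday(d, week_start = "Sunday")
print(c(monday_start = wd_mon, sunday_start = wd_sun))

# usetz = "Z": the time is converted to UTC and formatted with a trailing "Z".
dt <- ymd_hms("2019-04-24 13:45:07", tz = "America/Chicago")
iso <- format_ISO8601(dt, usetz = "Z")
print(iso)
```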

### BUG FIXES

* [#1036](tidyverse/lubridate#1036)
  `%within%` now correctly works with flipped intervals

* [#1085](tidyverse/lubridate#1085)
  `as_datetime()` now preserves the time zone of the POSIXt input.

* [#1072](tidyverse/lubridate#1072) Names
  are now handled correctly when combining multiple Period or Interval
  objects.

* [#1003](tidyverse/lubridate#1003)
  Correctly handle r and R formats in locales which have no p format

* [#1074](tidyverse/lubridate#1074) Fix
  concatenation of named Period, Interval and Duration vectors.

* [#1044](tidyverse/lubridate#1044) POSIXlt
  results returned by `fast_strptime()` and `parse_date_time2()` now
  have a recycled `isdst` field.

* [#1069](tidyverse/lubridate#1069) Internal
  code handling the addition of period months and years no longer
  generates partially recycled POSIXlt objects.

* Fix rounding of POSIXlt objects

* [#1007](tidyverse/lubridate#1007) Internal
  lubridate formats are no longer propagated to stamp formatter.

* `train` argument in `parse_date_time` now takes effect. It was
  previously ignored.

* [#1004](tidyverse/lubridate#1004) Fix
  `c.POSIXct` and `c.Date` on empty single POSIXct and Date vectors.

* [#1013](tidyverse/lubridate#1013) Fix
  c(`POSIXct`,`POSIXlt`) heterogeneous concatenation.

* [#1002](tidyverse/lubridate#1002) Parsing
  only with format `j` now works on numeric inputs.

* `stamp()` now correctly errors when no formats could be guessed.

* Updating a date with timezone (e.g. `tzs = "UTC"`) now returns a POSIXct.

### INTERNALS

* `lubridate` is now relying on `timechange` package for update and
  time-zone computation. Google's CCTZ code is no longer part of the
  package.

* `lubridate`'s updating logic is now built on top of `timechange`
  package.

* Change implementation of `c.Period`, `c.Duration` and `c.Interval`
  from S4 to S3.

Version 1.8.0
=============

### NEW FEATURES

* [#960](tidyverse/lubridate#960)
  `c.POSIXct` and `c.Date` can deal with heterogeneous object types
  (e.g. `c(date, datetime)` works as expected)

### BUG FIXES

* [#994](tidyverse/lubridate#994)
  Subtracting two duration or two period objects no longer results in
  an ambiguous dispatch note.

* `c.Date` and `c.POSIXct` correctly deal with empty vectors.

* `as_datetime(date, tz=XYZ)` returns the date-time object with HMS
  set to 00:00:00 in the corresponding `tz`

### CHANGES

* [#966](tidyverse/lubridate#966) Lubridate is
  now built with cpp11 (contribution of @DavisVaughan)