Description
I wanted to switch some code at my company from base::strptime() to lubridate::fast_strptime(), because the latter's %z understands ISO 8601 offsets both with and without the colon (±[hh]:[mm] and ±[hh][mm]), whereas base::strptime()'s %z only understands the colon-free ±[hh][mm] form.
However, this ended up causing problems: base::strptime() tolerates varying amounts of whitespace between the date and time portions of the input, whereas lubridate::fast_strptime() does not.
lubridate::fast_strptime("8/1/2012  8:02:51.397000 AM", "%m/%d/%Y %I:%M:%OS %p")
#> [1] NA
lubridate::fast_strptime( "8/1/2012 8:02:51.397000 AM", "%m/%d/%Y %I:%M:%OS %p")
#> [1] "2012-08-01 08:02:51 UTC"
strptime("8/1/2012  8:02:51.397000 AM", "%m/%d/%Y %I:%M:%OS %p")
#> [1] "2012-08-01 08:02:51 UTC"
strptime( "8/1/2012 8:02:51.397000 AM", "%m/%d/%Y %I:%M:%OS %p")
#> [1] "2012-08-01 08:02:51 UTC"
Created on 2020-07-27 by the reprex package (v0.3.0)
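One stopgap I can imagine is to collapse runs of whitespace before handing the string to fast_strptime(). This is only a sketch: squish_ws() and fast_strptime_lenient() are hypothetical helper names, not part of lubridate, and it assumes that collapsing whitespace is harmless for the inputs in question.

```r
# Hypothetical helper (not part of lubridate): collapse every run of
# whitespace into a single space so the input matches a single-space format.
squish_ws <- function(x) gsub("[[:space:]]+", " ", x)

# Hypothetical wrapper: normalize whitespace, then defer to fast_strptime().
# Assumes the format string itself uses single spaces between fields.
fast_strptime_lenient <- function(x, format, ...) {
  lubridate::fast_strptime(squish_ws(x), format, ...)
}

# Only run the lubridate call when the package is actually installed.
if (requireNamespace("lubridate", quietly = TRUE)) {
  print(fast_strptime_lenient("8/1/2012  8:02:51.397000 AM",
                              "%m/%d/%Y %I:%M:%OS %p"))
}
```

This keeps the strictness of fast_strptime() itself untouched and moves the leniency into a preprocessing step, at the cost of an extra pass over the input.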
The colon-including format is the default (only?) format emitted by Python's pandas.Timestamp.isoformat(), and I wanted to consume some data emitted by pandas, which is what led me to fast_strptime(). The stricter whitespace handling would likely break other things elsewhere in our code, though, so I'm not sure I can switch.
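The alternative would be to stay on base::strptime() and strip the colon from the offset first. A rough sketch, where strip_offset_colon() is a hypothetical helper and the input is assumed to end in a ±hh:mm offset (the sample string is made up for illustration, not real pandas output):

```r
# Hypothetical helper: turn a trailing "+hh:mm" / "-hh:mm" offset into
# "+hhmm" / "-hhmm", the only form base::strptime()'s %z accepts.
strip_offset_colon <- function(x) {
  sub("([+-][0-9]{2}):([0-9]{2})$", "\\1\\2", x)
}

# Illustrative input in the colon-including ISO 8601 style.
x <- "2020-07-27T12:34:56+02:00"

# After stripping the colon, base::strptime() can apply %z.
parsed <- strptime(strip_offset_colon(x), "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC")
print(parsed)
```

Strings without a trailing colon offset pass through unchanged, so the helper should be safe to apply to mixed input, but it does nothing for other ISO 8601 variants such as a trailing "Z".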
It would be great if this small difference in behavior between the two functions could be reconciled or, failing that, at least noted in the docs as intentional, for the wary.