Adding parsing support for weeks #506
In simple words, do you want to be able to parse weeks (US, ISO, EU)? That is, to be able to parse the %V, %U and %W formats? As in:
Note that R has no partial date-time types other than the Date object, so the result of this should be either POSIXct or Date.
Yes, if there's no "direct approach", I'd like to parse weeks to a POSIX date (including whatever workarounds are necessary to get there; e.g. defining a day of the week) to then express the date as YYYY-mm (or
Note, though, that while this seems to work on macOS and Ubuntu/Linux at least for
It should be a straightforward addition to the internal parser, and should work the same on all platforms. The system's strptime is known for buggy and inconsistent parsing of partial date-times.
It seems strange to me that the documentation for
lubridate::parse_date_time("2015 03", "Y W")
#> [1] "2015-10-29 UTC"
Created on 2020-10-29 by the reprex package (v0.3.0)
Just giving this a In particular, it is annoying that it
lubridate::as_date("2019-W02-1")
#> [1] "2019-02-01"
Created on 2020-12-01 by the reprex package (v0.3.0)
If we force strptime evaluation I get a different silent error in both ISO week,
lubridate::as_date("2019-W02-1", format = "%Y-W%V-%d")
#> [1] "2019-12-01"
lubridate::as_date("2019-W02-1", format = "%Y-W%W-%d")
#> [1] "2019-12-01"
Created on 2020-12-01 by the reprex package (v0.3.0)
Related, there is "%G" (week-based year, or more precisely ISO-week-based year, relevant for year-ends). This is also not supported by lubridate nor in
I suspect this has become more relevant as more week-based data using the ISO week standard has been released due to Covid-19. It would be very helpful to have lubridate as the one go-to package to handle even these seemingly odd situations. A brief investigation of what happens in different scenarios and parsers:
lubridate::parse_date_time("2020-W53-1", "%G-%U-%u") # should return a date in 2020
#> Error in FUN(X[[i]], ...): Unknown formats supplied: G
lubridate::parse_date_time("2020-W53-7", "%G-%U-%u") # should return a date in 2021
#> Error in FUN(X[[i]], ...): Unknown formats supplied: G
lubridate::parse_date_time("2020-W53-1", "%G-%V-%u") # should return a date in 2020
#> Error in FUN(X[[i]], ...): Unknown formats supplied: GV
lubridate::parse_date_time("2020-W53-7", "%G-%V-%u") # should return a date in 2021
#> Error in FUN(X[[i]], ...): Unknown formats supplied: GV
strptime("2020-W53-1", "%G-%U-%u") # should return a date in 2020
#> [1] NA
strptime("2020-W53-7", "%G-%U-%u") # should return a date in 2021
#> [1] NA
# the wrong way to do it, correctly fails (though with odd message) on wk 53 when using %Y and %U
lubridate::as_date("2020-W53-1", format = "%Y-W%U-%u")
#> Warning in strptime(x, format, tz = "UTC"): (0-based) yday 369 in year 2020 is
#> invalid
#> [1] NA
lubridate::as_date("2021-W01-1", format = "%Y-W%U-%u")
#> [1] "2021-01-04"
lubridate::as_date("2020-W53-1", format = "%G-W%U-%u")
#> Warning in strptime(x, format, tz = "UTC"): (0-based) yday 371 in year
#> -2147481748 is invalid
#> [1] NA
lubridate::as_date("2021-W01-1", format = "%Y-W%U-%u")
#> [1] "2021-01-04"
# ISOweek handles this correctly
ISOweek::ISOweek2date("2020-W53-1")
#> [1] "2020-12-28"
ISOweek::ISOweek2date("2020-W53-7")
#> [1] "2021-01-03"
ISOweek::date2ISOweek("2020-12-28") # should be in ISO wk 53 of 2020
#> [1] "2020-W53-1"
ISOweek::date2ISOweek("2021-01-03") # ditto
#> [1] "2020-W53-7"
ISOweek::date2ISOweek("2021-01-04") # should be ISO week 1 of 2021
#> [1] "2021-W01-1"
# somehow format does too
format(as.Date("2020-12-28"), "%G-W%V") # should be in ISO wk 53 of 2020
#> [1] "2020-W53"
format(as.Date("2021-01-03"), "%G-W%V") # ditto
#> [1] "2020-W53"
format(as.Date("2021-01-04"), "%G-W%V") # should be ISO week 1 of 2021
#> [1] "2021-W01"
Created on 2021-01-31 by the reprex package (v0.3.0)
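For reference, the ISO 8601 rule that ISOweek and format() apply in the reprex above can be sketched in base R without strptime's week specifiers. This is only a minimal sketch, not the implementation used by lubridate, ISOweek, or clock; the helper name iso_week_to_date is hypothetical. It assumes input of the form "YYYY-Www-d" and relies on the fact that January 4 always falls in ISO week 1.

```r
# Hypothetical helper: convert ISO week date strings ("YYYY-Www-d") to Date.
# Strings that do not match the pattern yield NA.
iso_week_to_date <- function(x) {
  parts <- regmatches(x, regexec("^([0-9]{4})-W([0-9]{2})-([1-7])$", x))
  year <- as.integer(vapply(parts, `[`, character(1), 2))
  week <- as.integer(vapply(parts, `[`, character(1), 3))
  day  <- as.integer(vapply(parts, `[`, character(1), 4))

  # January 4 is always in ISO week 1; step back to that week's Monday.
  jan4 <- as.Date(sprintf("%04d-01-04", year), format = "%Y-%m-%d")
  wday_jan4 <- as.integer(format(jan4, "%u"))  # 1 = Monday, 7 = Sunday
  week1_monday <- jan4 - (wday_jan4 - 1L)

  # Add whole weeks, then the day offset within the week.
  week1_monday + (week - 1L) * 7L + (day - 1L)
}

dates <- iso_week_to_date(c("2020-W53-1", "2020-W53-7", "2021-W01-1"))
dates
#> [1] "2020-12-28" "2021-01-03" "2021-01-04"

# Express the result as YYYY-mm, based on the calendar date of the chosen day.
format(dates, "%Y-%m")
#> [1] "2020-12" "2021-01" "2021-01"
```

Note that the month depends on which day of the week is chosen, since a single ISO week can span two months (or two calendar years, as week 53 of 2020 does).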
As many people here have seen, This is in the docs for
The key bit is the last line in parentheses. This
The clock package fully supports the ISO 8601 week-based format. The correct format string to use is
library(clock)
date_parse(c("2019-W02-1", "2020-W01-1", "2020-W02-1"), format = "%G-W%V-%u")
#> [1] "2019-01-07" "2019-12-30" "2020-01-06"
https://clock.r-lib.org/reference/date_parse.html
It also supports partial ISO 8601 week-based dates that don't have a day. There currently isn't a parser straight into a week precision type, but if you add a dummy
library(clock)
library(magrittr)
week_strings <- c("2019-W02", "2020-W01", "2020-W02")
week_strings <- paste0(week_strings, "-1")
week_strings %>%
date_parse(format = "%G-W%V-%u") %>%
as_iso_year_week_day() %>%
calendar_narrow("week")
#> <iso_year_week_day<week>[3]>
#> [1] "2019-W02" "2020-W01" "2020-W02"
Any ideas on why it's not reliably possible to map ISO 8601 week numbers to month-of-the-year numbers? It's also not possible when using the US or UK convention. I tried to outline the problem in this Stack Overflow post. As so often with this sort of thing, I suspect Windows is the problem here (possibly combined with a German locale).
Is there some hidden kung-fu in lubridate to hack around this nasty problem? If not, would you kindly consider addressing this issue in one of your next releases?
Best regards,
Janko