Inconsistent date parsing when year is 2- or 4-digits and order "mdy" #556

Closed
malwinare opened this Issue Jun 22, 2017 · 2 comments

malwinare commented Jun 22, 2017

Hi all,

For dates with mixed 2- and 4-digit years, the parsing behavior of parse_date_time() is inconsistent when orders = "mdy". It works fine with, for example, orders = "ymd".

Examples

parse_date_time("apr.12.50", orders = "mdy")   # "2050-04-12" 

# inconsistent
parse_date_time(c("apr.12.50","apr.2.2016"), orders = "mdy")  #  "0050-04-12"  "2016-04-02"

# works fine:
parse_date_time(c("50.apr.12","2016.apr.2"), orders = "ymd")  # "2050-04-12" "2016-04-02"

R version: 3.4.0
lubridate: lubridate_1.6.0.9009

Thanks in advance.
Best,
Malwina

@vspinu vspinu added the bug label Jun 23, 2017

Contributor

cderv commented Jun 27, 2017

I tried to find where this comes from, but it is not obvious.

lubridate's format training found two candidate formats, %b.%d.%Y and %b.%d.%y, and lubridate:::.select_formats prioritizes the first one. lubridate::.striptime then passes the parsing on to base::strptime, which does not return NA for "apr.12.50" when used with the format %b.%d.%Y. So %b.%d.%y is never tried, and %b.%d.%Y is applied to both elements.

This is not the case with orders = "ymd". The difference is that it uses c_parser with the formats %y.%Om.%d and %Y.%Om.%d. With orders = "mdy", guess_format does not replace m with Om because of grepl("[^O][mbB]", orders).
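The base::strptime behavior described above can be checked without lubridate. This is a minimal sketch in base R, not lubridate's own code; it assumes a numeric month (%m) in place of %b so the result does not depend on the locale's month names, and the variable names are illustrative only.

```r
# Base R only. "%m" stands in for "%b" to avoid locale-dependent month names
# (an assumption made for this sketch, not what lubridate does).
as_full_year  <- strptime("04.12.50", format = "%m.%d.%Y", tz = "UTC")
as_short_year <- strptime("04.12.50", format = "%m.%d.%y", tz = "UTC")

# POSIXlt stores the year as an offset from 1900.
is.na(as_full_year)         # FALSE: %Y accepts "50" and reads it as the year 50
as_full_year$year + 1900    # 50
as_short_year$year + 1900   # 2050
```

Because %Y succeeds on the 2-digit input instead of returning NA, nothing signals that the second format, %b.%d.%y, should be tried for that element.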
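The regular expression quoted above can be evaluated on both orders directly. The sketch below only runs the pattern from this comment in base R; how lubridate uses the result internally is taken from the comment, not verified here.

```r
# "[^O][mbB]" needs some non-"O" character *before* m, b or B,
# so it cannot match when m is the first character of the orders string.
pattern <- "[^O][mbB]"
orders  <- c("mdy", "ymd")

matches <- grepl(pattern, orders)
names(matches) <- orders
matches   # mdy: FALSE, ymd: TRUE
```

With "mdy" the m has no preceding character, so the pattern does not match, which is consistent with m not being replaced by Om for that order.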

Hope this little investigation could help fix the bug.

Member

vspinu commented Oct 2, 2017

Thanks @cderv for spotting this. This was a corner case bug due to the scoring in .select_formats indeed. When y occurred at the end of the format, it wasn't detected by that regexp. Now fixed!
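The exact regexp in .select_formats is not quoted in this thread, so the following is only an illustration of the kind of failure described: a pattern that requires a character after %y cannot match when %y ends the format. Both patterns below are hypothetical and are not claimed to be lubridate's actual code.

```r
fmts <- c("%b.%d.%y", "%y.%m.%d")

# Hypothetical pattern: "%y" must be followed by some non-"%" character.
# It misses "%y" at the very end of a format string.
needs_next_char <- grepl("%y[^%]", fmts)

# Hypothetical alternative: a negative lookahead only forbids a following "%",
# so "%y" at the end of the string is still detected.
allows_end <- grepl("%y(?!%)", fmts, perl = TRUE)

needs_next_char   # FALSE TRUE
allows_end        # TRUE TRUE
```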

@vspinu vspinu closed this in 6468bd7 Oct 2, 2017
