Inconsistent recognition of incorrect input, given locale, in ymd and mdy #255

Closed
HywelMJ opened this Issue Aug 24, 2014 · 5 comments


HywelMJ commented Aug 24, 2014

R version 3.1.0, lubridate 1.3.3.
Sys.getlocale(category = "LC_ALL")
[1] "LC_COLLATE=Welsh_United Kingdom.1252;LC_CTYPE=Welsh_United Kingdom.1252;LC_MONETARY=Welsh_United Kingdom.1252;LC_NUMERIC=C;LC_TIME=Welsh_United Kingdom.1252"

ymd("1989, Hydref 17")
[1] "1989-10-17 UTC"

but
ymd("1989, October 17")
[1] NA (which is good, given the locale)

while
mdy("Hydref 12, 1975")
[1] "1975-10-12 UTC"

and
mdy("October 12, 1975")
[1] "2075-12-19 UTC" (which is wrong).


Member

vspinu commented Aug 25, 2014

This is why you have the locale argument.

mdy("October 12, 1975", locale = "en_US.UTF-8")
[1] "1975-10-12 UTC"

In the future the parser might recognize English months independently of the locale; until then, this is the only solution.
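For anyone who wants a fallback that does not depend on platform-specific locale names, here is a minimal base R sketch. It is not lubridate's mechanism, and `parse_english_mdy` is a hypothetical helper name. It assumes the input always has the shape "October 12, 1975", temporarily switches LC_TIME to the portable "C" locale (whose month names are English), and parses with an explicit format instead of guessing.

```r
# Hypothetical helper: parse "Month day, year" strings with English month
# names, regardless of the session's LC_TIME setting.
parse_english_mdy <- function(x) {
  old <- Sys.getlocale("LC_TIME")
  # Restore the caller's locale even if parsing fails.
  on.exit(Sys.setlocale("LC_TIME", old), add = TRUE)
  Sys.setlocale("LC_TIME", "C")
  # An explicit format means non-matching input yields NA, not a guess.
  as.Date(x, format = "%B %d, %Y")
}

parse_english_mdy("October 12, 1975")
parse_english_mdy("Hydref 12, 1975")  # NA: not an English month name
```

Because the format is explicit, a month name the "C" locale does not know produces NA rather than a date assembled from the remaining digits.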


HywelMJ commented Aug 25, 2014

I appreciate that. What I was trying to point out was that ymd, by returning NA, does a good job of it, while mdy (returning 2075...) doesn't.


Member

vspinu commented Aug 25, 2014

Well, lubridate tries to be smart about a huge number of formats. In this case it does a good job for your locale because it interprets "121975" as "1975-10-12 UTC". That is not what you need, but it's the only logical answer for mdy given those numbers and your locale:

> mdy("12, 1975")
[1] "2075-12-19 UTC"
> mdy("121975")
[1] "2075-12-19 UTC"
> mdy("12, 19, 75")
[1] "2075-12-19 UTC"
> mdy("12, bla bla 19, 75")
[1] "2075-12-19 UTC"

A good way out is to recognize the English months and parse the date correctly irrespective of the current locale. I hope to be able to implement this in the relatively near future.
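To make the four examples above concrete, here is a rough base R illustration of why they all produce the same month, day, and two-digit year. It is not lubridate's actual algorithm. `lenient_mdy_fields` is a hypothetical helper that assumes exactly six digits survive once everything else is discarded.

```r
# Hypothetical illustration only: drop every non-digit character, then read
# the remaining six digits as two-digit month, day, and year fields.
lenient_mdy_fields <- function(x) {
  digits <- gsub("[^0-9]", "", x)
  c(month = substr(digits, 1, 2),
    day   = substr(digits, 3, 4),
    year  = substr(digits, 5, 6))
}

lenient_mdy_fields("12, bla bla 19, 75")
lenient_mdy_fields("October 12, 1975")  # same fields once "October" is ignored
```

Under that reading, an unrecognized month name is simply noise, so "October 12, 1975" collapses to the digits 121975 and is split as month 12, day 19, year 75.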


HywelMJ commented Aug 26, 2014

No criticism intended. Actually I suppose it was the 2075 that caught my attention. Interestingly, your comment above is wrong: "121975" [isn't interpreted] as "1975-10-12 UTC" but, as you show, "2075-12-19 UTC". I wonder whether guessing "2075" when a "1975" is present really is the best choice at this time in the current century (irrespective of locale)?


Member

vspinu commented Aug 26, 2014

> Interestingly, your comment above is wrong: "121975" [isn't interpreted] as "1975-10-12 UTC" but, as you show, "2075-12-19 UTC".

The code and the comments are correct. This is how it is intended to work. If you ask for mdy, you get mdy, not my.

There is a small issue in that code though. The 75 is interpreted as 2075 and it should probably be 1975 in order to be consistent with how strptime works. This is discussed in #253.
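For reference, the strptime behaviour mentioned here can be checked in base R. According to R's documentation for `%y`, input values 00 to 68 are prefixed with 20 and values 69 to 99 with 19. The dates below are illustrative inputs, not taken from #253.

```r
# Two-digit years with %y in base R: 69-99 map to 19xx, 00-68 map to 20xx.
as.Date("12/19/75", format = "%m/%d/%y")  # 1975-12-19
as.Date("12/19/69", format = "%m/%d/%y")  # 1969-12-19
as.Date("12/19/68", format = "%m/%d/%y")  # 2068-12-19
```

Under that rule a bare 75 would be read as 1975, which is the consistency argument made above.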
