New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible bug in `parse_date_time` or inconsistency #326

Closed
JasonAizkalns opened this Issue May 20, 2015 · 8 comments

Comments

Projects
None yet
3 participants
@JasonAizkalns

JasonAizkalns commented May 20, 2015

Possible bug ? in parse_date_time -- see this SO post for more detail.

Long and short = why does the first method fail in light of the other methods working?

#### This fails ####
parse_date_time('4/18/1950 0130', '%m%d%Y %H%M')
# [1] NA
# Warning message:
# All formats failed to parse. No formats found.

#### But all of these work ####
parse_date_time('4/18/1950', '%m%d%Y')
# [1] "1950-04-18 UTC"

parse_date_time('4/18/1950 01', '%m%d%Y %H')
# [1] "1950-04-18 01:00:00 UTC"

parse_date_time('04/18/1950 0130', '%m%d%Y %H%M')
# [1] "1950-04-18 01:30:00 UTC"

parse_date_time('4/18/1950 01:30', '%m%d%Y %H%M')
# [1] "1950-04-18 01:30:00 UTC"
@vspinu

This comment has been minimized.

Member

vspinu commented May 20, 2015

This is a bug.

For completeness, from your other post:

> parse_date_time(c('12/17/1996 04:00:00','04/18/1950 0130'), c("mdYHMS", "mdYHM"))
[1] "1996-12-17 04:00:00 UTC" "1950-04-18 01:30:00 UTC"
> parse_date_time(c('12/17/1996 04:00:00','4/18/1950 0130'), c("mdYHMS", "mdYHM"))
[1] "1996-12-17 04:00:00 UTC" NA                       
Warning message:
 1 failed to parse. 


> mdy_hms(c('12/17/1996 04:00:00','04/18/1950 0130'), truncated = 1)
[1] "1996-12-17 04:00:00 UTC" "1950-04-18 01:30:00 UTC"
> mdy_hms(c('12/17/1996 04:00:00','4/18/1950 0130'), truncated = 1)
[1] "1996-12-17 04:00:00 UTC" NA                       
Warning message:
 1 failed to parse. 

BTW, if you want to match PM/AM you need to use I format instead of H.

@vspinu vspinu added the bug label May 20, 2015

@dracodoc

This comment has been minimized.

dracodoc commented Oct 31, 2015

I tried I but had error:

fast_strptime("10/26/2015 02:19:09 PM", "%m/%d/%Y %I:%M:%S %p")
Error in structure(xx, class = c("POSIXct", "POSIXt"), tzone = tz) : 
  Unrecognized format I supplied

The problem may came from the system locale have different string for AM/PM indicator. I was having a Chinese locale, US time zone in Windows. Once I set the locale right, the problem solved.

Sys.setlocale("LC_TIME", "C")

I did have to use C instead of English or en_US.

See more details on this.

So this could be not really a bug, but it will help a lot if the error messages and function documents include information and examples like this. I believe this problem is actually quite common: there are many people using system locale other than English but need to read time format in English.

EDIT: Yes I messed up a little in finding examples in all previous attempts. The I format problem is not related to locale.

@vspinu

This comment has been minimized.

Member

vspinu commented Oct 31, 2015

Your post is a unrelated to this issue and a bit misleading. You cannot have solved it with locales because fast_strptime doesn't accept I format. You fixed the %p problem as suggested in your SO post but but that's not the bug reported in this issue.

You are right about AM/PM though. See #327 for AM/PM problem and the proposed solution.

@vspinu vspinu closed this in cb35b1e Oct 31, 2015

@vspinu

This comment has been minimized.

Member

vspinu commented Oct 31, 2015

Tthere are two ways lubridate infers formats, flex and exact. With flex matching all numeric elements can have flexible length (for example 4 and 04 for day will work), but then, there must be non-numeric separators between the elements. For the exact matcher there need not be non-numeric separators but elements must have exact posix number of digits (like 04).

In your example

parse_date_time('4/18/1950 0130', 'mdY HM')
[1] NA
Warning message:
All formats failed to parse. No formats found. 

you want to perform flex matching on the date part 4/18/1950 and exact matching on time part 0130. This is not possible and would be very difficult to implement giving the limitation of the R's regexp matcher.

Instead, I am fixing it by adding a new argument to parse_date_time, exact=FALSE. When TRUE the orders argument is interpreted as exact strptime formats and no guessing or training is performed. This way you can add as many exact formats as you want and you will also gain in speed because no guessing is performed at all.

@dracodoc

This comment has been minimized.

dracodoc commented Oct 31, 2015

Thanks for clarifying. I realized part of the formats convention in reading document but not fully.

I think the document for parse_date_time can be improved. The format section said

Here are all the formats recognized by lubridate. ... Formats accepted by parse_date_time2 and fast_strptime are marked with "!".

Format strings without "!" are not supported by parse_date_time2 and fast_strptime, this was implied but not stated clear enough, especially when you are reading the long list below that sentence.

I'd suggest to mark each format string's availability with some thing like this:

(Use parse for parse_date_time, parse_2 for parse_date_time2, fast for fast_strptime)
I parse
d parse parse_2 fast

@vspinu

This comment has been minimized.

Member

vspinu commented Oct 31, 2015

Yeh. I am trying to improve that doc without increasing its size. Suggestions are welcome.

Formats accepted by parse_date_time2 and fast_strptime are marked with "!".

I am changing it into Fast perasers, \code{parse_date_time2} and \code{fast_strptime}, currently accept only formats marked with "!".

Would that work?

@dracodoc

This comment has been minimized.

dracodoc commented Oct 31, 2015

This is much better! Thanks!

Do you think add explicit marks like fast only after each accepted string will take too much space?

I think the file size is not really relevant since it's just text. I don't want the page too cluttered too, but I do believe the format like

I parse
Hours as decimal number (01–12 or 0–12).
d parse parse_2 fast
Day of the month as decimal number (01–31 or 0–31)

is much easier to read and didn't need any extra line.

The main problem I have with ! is that this usage is not general enough so user don't have the intuitive when he is reading.

@vspinu

This comment has been minimized.

Member

vspinu commented Oct 31, 2015

Do you think add explicit marks like fast only after each accepted string will take too much space?

I understand your problem, but I don't like to have long repetitions in the doc. I cannot control how Rd outputstables so the paragraphs will be shifted to the right.

I think once you know what ! means, it should be straightforwards to read the docs. It's really a minor issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment