Bizarre day parsing issue #307

Closed
robertzk opened this issue Mar 17, 2015 · 7 comments

@robertzk robertzk commented Mar 17, 2015

I have a very spooky dataset. It fails to parse in full on this exact set in this exact order, but parses fine in every subset / subsample I have tried.

dates <- c("{ND}", "{ND}", "{ND}", "2006-11-26", "{ND}", "{ND}", "{ND}",
"2010-06-05", "2014-06-01", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", "2006-10-31", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "2010-10-31", "2009-05-01",
"{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", NA, "{ND}", "{ND}", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "2008-09-09",
"{ND}", "2003-01-01", "{ND}", "{ND}", "2013-02-28", "2011-10-31",
"{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "{ND}", "{ND}", "2010-08-31", "{ND}", "{ND}",
"2011-02-01", "2012-03-31", "2013-06-04", "{ND}", "{ND}", "{ND}",
"{ND}", "{ND}", "2005-12-12", "2006-09-30", NA, "{ND}")

First, notice

> all(is.na(lubridate::ymd(dates)))
[1] TRUE

However, every other variation of samples I have tried parses fine...

variations <- list(
  dates[-1], dates[-102], sample(dates), sample(dates, 50), dates[-50],
  dates[c(FALSE, TRUE)], dates[c(TRUE, TRUE, TRUE, TRUE, FALSE)], dates[102:1]
)
all(sapply(variations, function(v) !all(is.na(lubridate::ymd(v)))))
# [1] TRUE

Note that shuffling the vector or reversing it is sufficient to get ymd to parse the vector correctly.

I understand if you don't want to look into / fix this, but it did cause a production issue. Baffling.

@peterhurford peterhurford commented Mar 17, 2015

👍

I have replicated this on a separate machine. I have also replicated it again using R --no-init-file.

@robertzk robertzk commented Mar 17, 2015

Looks like it's your "irregular guesser" in .get_train_set: the first prime indices under 100 in the vector I provided all have an "{ND}", and any shuffle or re-ordering will almost always disrupt this.

Oh well, space-time trade-off and all, I guess. No way to fix this without impacting performance; someone was bound to hit this in the wild eventually.

@vspinu vspinu commented Mar 17, 2015

You were very unlucky with your very sparse date vector. Lubridate has a training engine that is used to recognize formats automatically. The training is done on a small subset of the original vector, which is generated based on the first 501 primes. See the internal .get_train_set function for how it is done. If you have a better idea, I am all ears.
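
To make the failure mode concrete, here is a minimal sketch in base R of how a subsample taken at prime positions can miss every parseable entry in a sparse vector. It is an illustration under assumptions, not lubridate's actual .get_train_set: the helpers primes_upto and train_subset and the synthetic vector x are hypothetical, and the real function may map primes to positions differently.

```r
# Hypothetical illustration only; NOT lubridate's internal .get_train_set.

# Sieve of Eratosthenes: all primes <= n.
primes_upto <- function(n) {
  if (n < 2) return(integer(0))
  is_prime <- rep(TRUE, n)
  is_prime[1] <- FALSE
  for (i in seq_len(floor(sqrt(n)))) {
    if (is_prime[i]) {
      is_prime[seq(i * i, n, by = i)] <- FALSE
    }
  }
  which(is_prime)
}

# Assumed sampling rule: take the elements at prime positions.
train_subset <- function(x) x[primes_upto(length(x))]

# Synthetic sparse vector: real dates only at non-prime positions 6, 14, 20.
x <- rep("{ND}", 102)
x[c(6, 14, 20)] <- "2010-01-01"

sampled <- train_subset(x)
missed_all_dates <- all(sampled == "{ND}")

# Reversing moves those dates to positions 97, 89 and 83, which are prime.
reversed_sees_date <- any(train_subset(rev(x)) != "{ND}")
```

Under this assumed rule the training sample for x contains only "{ND}", so no format could be guessed from it, while the reversed vector puts dates back into the sample.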

@robertzk robertzk commented Mar 17, 2015

No, that's alright, I have the same issues in my packages. Obviously, any approach that doesn't inspect the full vector will run into edge cases.

@robertzk robertzk closed this Mar 17, 2015

@vspinu vspinu commented Mar 17, 2015

You can use parse_date_time2 which doesn't use any training and is very fast:

parse_date_time2(dates, "Ymd")

Training should also be removed from parse_date_time, at least partially, for example when only one order is supplied, as in your case.
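
If the format is known up front, a similar training-free parse can also be sketched in base R. This is an assumption-laden alternative, not something proposed in the thread: it relies on as.Date with an explicit format returning NA for strings that do not match, and dates_small is a hypothetical stand-in for the full vector.

```r
# Sketch: parse every element against one explicit format, no sampling.
# dates_small is a made-up sample, not the vector from the report.
dates_small <- c("{ND}", "2006-11-26", NA, "{ND}", "2010-06-05")

# With format supplied, non-matching strings become NA instead of erroring.
parsed <- as.Date(dates_small, format = "%Y-%m-%d")

# Which entries failed to parse (placeholders and missing values).
unparsed <- is.na(parsed)
```

Because every element is checked, the result does not depend on element order, at the cost of supporting only the one format given.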

@vspinu vspinu commented Mar 17, 2015

Ok. Closing in favor of #308.

@vspinu vspinu closed this Mar 17, 2015

@robertzk robertzk commented Mar 17, 2015

Thanks!
