vroom converts unusual datetime stamps to NAs #240

Closed
cboettig opened this issue Jun 2, 2020 · 3 comments
Labels
bug an unexpected problem or unintended behavior

Comments

cboettig commented Jun 2, 2020

Consider this minimal example:

vroom::vroom("date\n2015-06-14T09Z\n2015-06-14T09Z", delim=",")

gives:

> vroom::vroom("date\n2015-06-14T09Z\n2015-06-14T09Z", delim=",")
Rows: 2                                                                                                        
Columns: 1
Delimiter: ","
dttm [1]: date

Use `spec()` to retrieve the guessed column specification
Pass a specification to the `col_types` argument to quiet this message
# A tibble: 2 x 1
  date               
  <dttm>             
1 NA                 
2 NA 

I believe this data specifies an hour but not a minute in the time portion, which is arguably poor form, but it's government data. vroom seems to guess that this is a datetime, though, and attempts to coerce it, producing NAs. Would it be possible to make the datetime checking stricter, so that patterns like the above that cannot be coerced are just treated as character data? That would obviously be preferable to getting all NAs unexpectedly.

j-sirgo commented Jun 2, 2020

Perhaps use the code:

library(vroom)
vroom("date\n2015-06-14T09Z\n2015-06-14T09Z", delim=",", col_types=list(date=col_datetime(format="%Y-%m-%dT%Z")))
#> # A tibble: 2 x 1
#>   date               
#>   <dttm>             
#> 1 2015-06-14 08:00:00
#> 2 2015-06-14 08:00:00
# Eastern Time
vroom("date\n2015-06-14T09Z\n2015-06-14T09Z", delim=",", col_types=list(date=col_datetime(format="%Y-%m-%dT%Z")), locale=locale(tz="US/Eastern"))
#> # A tibble: 2 x 1
#>   date               
#>   <dttm>             
#> 1 2015-06-14 04:00:00
#> 2 2015-06-14 04:00:00

The default time zone is UTC; set the locale to use a time zone of your choosing.
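Note that the printed times above (08:00 UTC, 04:00 Eastern) do not match the hour `09` in the input, so the `%Z` format does not appear to recover the stated hour. A minimal sketch of an alternative, assuming vroom follows readr-style format strings in which `%H` reads a two-digit hour and other characters such as `T` and `Z` are matched literally:

```r
library(vroom)

# Assumption: %H reads the two-digit hour and the trailing "Z" is matched
# literally, as in readr-style format strings.
x <- vroom(
  "date\n2015-06-14T09Z\n2015-06-14T09Z",
  delim = ",",
  col_types = list(date = col_datetime(format = "%Y-%m-%dT%HZ"))
)

# Compare against the intended instant rather than relying on printed output.
expected <- as.POSIXct("2015-06-14 09:00:00", tz = "UTC")
all(x$date == expected)
```

If the comparison is not all `TRUE`, the format string does not fit the data and should be adjusted before trusting the column.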

Author

cboettig commented Jun 2, 2020

@j-sirgo Yes, thanks. I'm aware that I can manually define the type format. My suggestion is that it is a bug for vroom to decide this string is a datetime when it is simultaneously not a datetime format that it can parse successfully.

Also note that vroom parses this date-time format just fine if the UTC time zone suffix letter Z is omitted:

vroom("date\n2015-06-14T09\n2015-06-14T09", delim=",")

Again, I know the usual answer to "vroom guesses incorrectly" is "don't let vroom guess, you should always declare your types". But this particular wrong guess is pernicious in that it's not obvious from the return result that vroom is to blame without eyeballing the csv directly, and seems like it could be remedied to be consistent with the excellent guessing mechanisms for so many other date time formats, including the one above.
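Until the guessing is fixed, one defensive pattern in the spirit of "declare your types" is to read the column as character, parse it explicitly, and fail loudly if parsing introduces NAs. This is only a sketch: the format string is an assumption about this particular data, and base R's `as.POSIXct()` stands in for whatever parser you prefer.

```r
library(vroom)

# Read the column as plain text so no datetime guessing takes place.
raw <- vroom(
  "date\n2015-06-14T09Z\n2015-06-14T09Z",
  delim = ",",
  col_types = list(date = col_character())
)

# Parse explicitly with base R; the format is an assumption about this data
# (two-digit hour followed by a literal "Z", interpreted as UTC).
parsed <- as.POSIXct(raw$date, format = "%Y-%m-%dT%HZ", tz = "UTC")

# Stop if any non-missing input turned into NA during parsing.
stopifnot(!any(is.na(parsed) & !is.na(raw$date)))
```

The check makes a bad parse an error at read time instead of a column of NAs discovered later.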

jimhester added the bug (an unexpected problem or unintended behavior) label Jun 5, 2020
@jimhester
Collaborator

Thanks for reporting the issue and for the reproducible example! This data should now guess as character, and match what we parse.
