Fractional seconds of col_time objects #1394

Closed
muschellij2 opened this issue Mar 25, 2022 · 1 comment
@muschellij2

This is essentially a duplicate of tidyverse/vroom#422, but I wanted to show that the same issue exists in readr edition 1 (readr.edition = 1) as well. Overall, fractional seconds in a col_time column need to be handled with care by the user, but in a col_datetime column they are likely fine.

These are the options to set: digits.secs so that fractional seconds are shown when printing, and readr.edition to use readr edition 1.

options(digits.secs = 3)
options(readr.edition = 1L)
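
As a small base R illustration of what digits.secs does (the timestamp below is made up for the example and is not from the data): the option only changes how fractional seconds are printed, not what is stored.

```r
# A made-up timestamp with a half-second fraction (exactly representable as a double).
x <- as.POSIXct("2000-01-08 17:30:00.5", tz = "UTC")

old <- options(digits.secs = NULL)   # option unset: fractions are not printed
printed_default <- format(x)

options(digits.secs = 3)             # allow up to 3 fractional digits when printing
printed_frac <- format(x)

options(old)                         # restore the previous setting

# The stored value is the same in both cases.
stored_frac <- as.numeric(x) %% 1
```

So a column that prints without fractions is not necessarily truncated, which is why the checks further down look at the numeric values directly.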

Load hms so we can see things as times versus difftime objects

library(hms)

Example File

url = "https://github.com/r-lib/vroom/files/8353807/62163.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file, which are needed for our analysis.

readLines(file, 5)
#> [1] "DAY_OF_DATA,START_TIME,END_TIME,DATA_QUALITY_FLAG_CODE,DATA_QUALITY_FLAG_VALUE"
#> [2] "2,14:11:35.463000,14:11:35.463000,COUNT_SPIKES_Z,1"                            
#> [3] "2,14:10:00.000000,14:10:59.988000,ADJACENT_INVALID,1"                          
#> [4] "2,14:12:00.000000,14:12:59.988000,ADJACENT_INVALID,1"                          
#> [5] "4,11:17:25.938000,11:17:25.938000,COUNT_SPIKES_Z,1"

Reading data with readr

Here we read the data and see that the time columns are guessed as col_time:

data = readr::read_csv(file, progress = FALSE)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   DAY_OF_DATA = col_double(),
#>   START_TIME = col_time(format = ""),
#>   END_TIME = col_time(format = ""),
#>   DATA_QUALITY_FLAG_CODE = col_character(),
#>   DATA_QUALITY_FLAG_VALUE = col_double()
#> )
readr::spec(data)
#> cols(
#>   DAY_OF_DATA = col_double(),
#>   START_TIME = col_time(format = ""),
#>   END_TIME = col_time(format = ""),
#>   DATA_QUALITY_FLAG_CODE = col_character(),
#>   DATA_QUALITY_FLAG_VALUE = col_double()
#> )

No fractional seconds are printed:

head(data)
#> # A tibble: 6 × 5
#>   DAY_OF_DATA START_TIME END_TIME DATA_QUALITY_FLAG_CODE DATA_QUALITY_FLAG_VALUE
#>         <dbl> <time>     <time>   <chr>                                    <dbl>
#> 1           2 14:11:35   14:11:35 COUNT_SPIKES_Z                               1
#> 2           2 14:10:00   14:10:59 ADJACENT_INVALID                             1
#> 3           2 14:12:00   14:12:59 ADJACENT_INVALID                             1
#> 4           4 11:17:25   11:17:25 COUNT_SPIKES_Z                               1
#> 5           4 11:35:21   11:35:21 COUNT_SPIKES_X                               1
#> 6           4 11:16:00   11:16:59 ADJACENT_INVALID                             1

We can confirm that they are truncated.

as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0 0 0 0 0

Different col_time format

I don’t think the default %AT takes fractional seconds into account, so
we need to pass our own format in.
Here we specify the col_time so that it uses %OS, which I think is
R-specific as per ?strptime:

col_time_with_frac_secs = function(...) {
  readr::col_time(format = "%H:%M:%OS", ...)
}
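
For reference, a minimal base R sketch of the %S versus %OS difference described in ?strptime. This only illustrates base R's strptime; readr has its own parser, so treat the parallel as an assumption rather than a description of readr internals.

```r
# One of the time strings from the example file.
x <- "14:11:35.463000"

# %S reads whole seconds only; trailing characters are ignored on input.
sec_S <- strptime(x, format = "%H:%M:%S", tz = "UTC")$sec

# %OS reads seconds including the fractional part.
sec_OS <- strptime(x, format = "%H:%M:%OS", tz = "UTC")$sec

sec_S
sec_OS
```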

Read in the data:

data = readr::read_csv(file,
                    col_types =
                      readr::cols(
                        START_TIME = col_time_with_frac_secs(),
                        END_TIME = col_time_with_frac_secs()
                      ))

Fractional seconds are preserved

as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
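
The same format can be tried on a few strings without downloading the file. This is a sketch that assumes readr::parse_time() applies the format the same way col_time() does; the strings are copied from the data header above.

```r
# Time strings taken from the first rows of the example file.
x <- c("14:11:35.463000", "14:10:00.000000", "11:17:25.938000")

# Parse with an explicit format that uses %OS for the seconds.
times <- readr::parse_time(x, format = "%H:%M:%OS")

# Fractional part of the seconds since midnight.
frac <- as.numeric(times) %% 1
frac
```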

Overall, that may be the end of it, and this issue can serve as something
to point people to in the future. I’m not sure whether there should be a
warning that values may be truncated, or whether the parser should/could
guess fractional seconds. Below I just show that the default for a datetime
does preserve fractional seconds, so how you store dates and times can lead
to differences in fractional seconds.

Datetime versus time object

If we have a file with a datetime, then this doesn’t seem to be an issue:

Example File

url = "https://github.com/r-lib/vroom/files/8353874/62163_data.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file, which are needed for our analysis.

readLines(file, 5)
#> [1] "HEADER_TIMESTAMP,X,Y,Z"                    
#> [2] "2000-01-08 17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08 17:30:00.013,0.17,0.094,-0.751" 
#> [4] "2000-01-08 17:30:00.025,0.22,0.109,-0.727" 
#> [5] "2000-01-08 17:30:00.038,0.258,0.047,-0.78"

Reading data with readr

Here we read the data and see that the timestamp column is guessed as col_datetime:

data = readr::read_csv(file, progress = FALSE)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   HEADER_TIMESTAMP = col_datetime(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )
readr::spec(data)
#> cols(
#>   HEADER_TIMESTAMP = col_datetime(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )

We see fractional seconds are preserved:

head(data)
#> # A tibble: 6 × 4
#>   HEADER_TIMESTAMP            X     Y      Z
#>   <dttm>                  <dbl> <dbl>  <dbl>
#> 1 2000-01-08 17:30:00.000 0.208 0.079 -0.751
#> 2 2000-01-08 17:30:00.013 0.17  0.094 -0.751
#> 3 2000-01-08 17:30:00.024 0.22  0.109 -0.727
#> 4 2000-01-08 17:30:00.037 0.258 0.047 -0.78 
#> 5 2000-01-08 17:30:00.049 0.276 0.029 -0.762
#> 6 2000-01-08 17:30:00.062 0.258 0.032 -0.777

We can confirm that they are preserved

as.numeric(lubridate::seconds(data$HEADER_TIMESTAMP[1:5])) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
as.numeric(data$HEADER_TIMESTAMP[1:5]) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
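
The values 0.01300001 and 0.02499998 above, and the printed .024 where the file has .025, look like ordinary double-precision effects rather than a parsing problem. A small base R sketch of that assumption, using plain numbers instead of the parsed column:

```r
# 2000-01-08 17:30:00 UTC as seconds since 1970-01-01.
t0 <- 947352600

# Add the fractions from rows 2 and 3 of the file.
x <- t0 + c(0.013, 0.025)

# At this magnitude doubles are spaced about 1.2e-7 apart,
# so the fractions cannot be stored exactly.
frac <- x %% 1

# Truncating (rather than rounding) to milliseconds turns 0.02499998 into 24 ms.
ms_truncated <- trunc(frac * 1000)
ms_rounded <- round(frac * 1000)

frac
ms_truncated
ms_rounded
```

If exact milliseconds matter downstream, rounding the fractional part to the known resolution is probably safer than relying on the printed value.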

Separated Date and Time

If we have a file with two columns, with the date and the time separated,
and the time has fractional seconds, then we again see the same problem
as in the original issue.

Example File

url = "https://github.com/r-lib/vroom/files/8353880/62163_data_separated.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file.

readLines(file, 5)
#> [1] "DATE,TIMESTAMP,X,Y,Z"                      
#> [2] "2000-01-08,17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08,17:30:00.013,0.17,0.094,-0.751" 
#> [4] "2000-01-08,17:30:00.025,0.22,0.109,-0.727" 
#> [5] "2000-01-08,17:30:00.038,0.258,0.047,-0.78"

Reading data with readr

Here we read the data and see that the TIMESTAMP column is guessed as col_time:

data = readr::read_csv(file, progress = FALSE)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   DATE = col_date(format = ""),
#>   TIMESTAMP = col_time(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )
readr::spec(data)
#> cols(
#>   DATE = col_date(format = ""),
#>   TIMESTAMP = col_time(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )

No fractional seconds are printed:

head(data)
#> # A tibble: 6 × 5
#>   DATE       TIMESTAMP     X     Y      Z
#>   <date>     <time>    <dbl> <dbl>  <dbl>
#> 1 2000-01-08 17:30     0.208 0.079 -0.751
#> 2 2000-01-08 17:30     0.17  0.094 -0.751
#> 3 2000-01-08 17:30     0.22  0.109 -0.727
#> 4 2000-01-08 17:30     0.258 0.047 -0.78 
#> 5 2000-01-08 17:30     0.276 0.029 -0.762
#> 6 2000-01-08 17:30     0.258 0.032 -0.777

We can confirm that they are truncated.

as.numeric(lubridate::seconds(data$TIMESTAMP[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$TIMESTAMP[1:5]) %% 1
#> [1] 0 0 0 0 0
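
A possible workaround for the separated layout, sketched under a few assumptions: the rows are the ones from the header above written to a temporary file, the explicit %OS format behaves as it did for the first file, and combining date and time by adding seconds in UTC is acceptable for the analysis.

```r
# Write the rows shown in the data header to a temporary file.
csv <- tempfile(fileext = ".csv")
writeLines(c(
  "DATE,TIMESTAMP,X,Y,Z",
  "2000-01-08,17:30:00.000,0.208,0.079,-0.751",
  "2000-01-08,17:30:00.013,0.17,0.094,-0.751",
  "2000-01-08,17:30:00.025,0.22,0.109,-0.727",
  "2000-01-08,17:30:00.038,0.258,0.047,-0.78"
), csv)

# Read with an explicit time format, using edition 1 as in this issue.
data <- readr::with_edition(1, readr::read_csv(
  csv,
  col_types = readr::cols(
    DATE = readr::col_date(),
    TIMESTAMP = readr::col_time(format = "%H:%M:%OS"),
    .default = readr::col_double()
  ),
  progress = FALSE
))

# Combine: days since epoch * 86400 + seconds since midnight.
datetime <- as.POSIXct(
  as.numeric(data$DATE) * 86400 + as.numeric(data$TIMESTAMP),
  origin = "1970-01-01", tz = "UTC"
)

frac <- as.numeric(datetime) %% 1
unlink(csv)
frac
```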

Created on 2022-03-25 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Debian GNU/Linux 10 (buster)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       Etc/UTC
#>  date     2022-03-25
#>  pandoc   2.14.0.3 @ /usr/lib/rstudio-server/bin/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.2.0.9000 2022-03-16 [1] Github (r-lib/cli@51463d2)
#>  crayon        1.5.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  curl          4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  generics      0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  hms         * 1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
#>  readr         2.1.2      2022-01-30 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2.9000 2022-03-16 [1] Github (r-lib/sessioninfo@27965c2)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0.9000 2021-12-14 [1] xgit (git@github.com:tidyverse/stringr.git@dd909b7)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.0)
#>  tzdb          0.2.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.2)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.30       2022-03-02 [1] CRAN (R 4.1.2)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/jupyter/.R/library
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@muschellij2
Author

Closing this for posterity. Upcoming releases will use vroom, so no action is needed.
