Fractional seconds of `col_time` objects #1394

muschellij2 · 2022-03-25T21:59:35Z

This is essentially a duplicate of tidyverse/vroom#422, but I wanted to show that the same issue exists in readr edition 1 (`readr.edition = 1`) as well. Overall, fractional seconds in a col_time column need to be handled with care by the user, whereas a col_datetime column is likely fine.

Set `digits.secs` so that fractional seconds are shown when printing, and select readr edition 1:

options(digits.secs = 3)
options(readr.edition = 1L)

Load hms so that times print as times rather than as difftime objects:

library(hms)

Example File

url = "https://github.com/r-lib/vroom/files/8353807/62163.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file, which are needed for our analysis.

readLines(file, 5)
#> [1] "DAY_OF_DATA,START_TIME,END_TIME,DATA_QUALITY_FLAG_CODE,DATA_QUALITY_FLAG_VALUE"
#> [2] "2,14:11:35.463000,14:11:35.463000,COUNT_SPIKES_Z,1"                            
#> [3] "2,14:10:00.000000,14:10:59.988000,ADJACENT_INVALID,1"                          
#> [4] "2,14:12:00.000000,14:12:59.988000,ADJACENT_INVALID,1"                          
#> [5] "4,11:17:25.938000,11:17:25.938000,COUNT_SPIKES_Z,1"

Reading data with `readr`

Here we read the data and see that START_TIME and END_TIME are guessed as col_time columns

data = readr::read_csv(file, progress = FALSE)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   DAY_OF_DATA = col_double(),
#>   START_TIME = col_time(format = ""),
#>   END_TIME = col_time(format = ""),
#>   DATA_QUALITY_FLAG_CODE = col_character(),
#>   DATA_QUALITY_FLAG_VALUE = col_double()
#> )
readr::spec(data)
#> cols(
#>   DAY_OF_DATA = col_double(),
#>   START_TIME = col_time(format = ""),
#>   END_TIME = col_time(format = ""),
#>   DATA_QUALITY_FLAG_CODE = col_character(),
#>   DATA_QUALITY_FLAG_VALUE = col_double()
#> )

No fractional seconds are printed:

head(data)
#> # A tibble: 6 × 5
#>   DAY_OF_DATA START_TIME END_TIME DATA_QUALITY_FLAG_CODE DATA_QUALITY_FLAG_VALUE
#>         <dbl> <time>     <time>   <chr>                                    <dbl>
#> 1           2 14:11:35   14:11:35 COUNT_SPIKES_Z                               1
#> 2           2 14:10:00   14:10:59 ADJACENT_INVALID                             1
#> 3           2 14:12:00   14:12:59 ADJACENT_INVALID                             1
#> 4           4 11:17:25   11:17:25 COUNT_SPIKES_Z                               1
#> 5           4 11:35:21   11:35:21 COUNT_SPIKES_X                               1
#> 6           4 11:16:00   11:16:59 ADJACENT_INVALID                             1

We can confirm that they are truncated.

as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0 0 0 0 0

Different `col_time` format

I don’t think the default format (%AT) takes fractional seconds into
account, so we need to pass our own format in.
Here we specify the col_time so that it uses %OS, which I think is
R-specific, as per ?strptime

col_time_with_frac_secs = function(...) {
  readr::col_time(format = "%H:%M:%OS", ...)
}

Read in the data with that column specification:

data = readr::read_csv(file,
                    col_types =
                      readr::cols(
                        START_TIME = col_time_with_frac_secs(),
                        END_TIME = col_time_with_frac_secs()
                      ))

Fractional seconds are preserved

as.numeric(lubridate::seconds(data$START_TIME[1:5])) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
as.numeric(data$START_TIME[1:5]) %% 1
#> [1] 0.463 0.000 0.000 0.938 0.800
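
The same format string can be checked on literal values without downloading the file. This is only a minimal sketch: it assumes `readr::parse_time()` handles `%OS` the same way `col_time()` does, and the two input strings are copied from the data header shown earlier.

```r
library(readr)

# Two START_TIME values copied from the header of the example file
x <- c("14:11:35.463000", "14:10:00.000000")

# Parse with an explicit format; %OS reads seconds with a fractional part
t_frac <- parse_time(x, format = "%H:%M:%OS")

# Fractional part of each parsed time, in seconds
frac <- as.numeric(t_frac) %% 1
print(frac)
```

If the first fractional part is non-zero here, the format string is doing its job and any truncation comes from the default `col_time(format = "")` guess.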

Overall, this may simply be an issue to point people to in the future.
I’m not sure whether there should be a warning that values may be
truncated, or whether the default parser should/could guess fractional
seconds. Below I just show that the default for a datetime does preserve
fractional seconds, so the format in which you store dates and times can
lead to differences in fractional seconds.

Datetime versus time object

If we have a file with a datetime, then this doesn’t seem to be an issue:

Example File

url = "https://github.com/r-lib/vroom/files/8353874/62163_data.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file, which are needed for our analysis.

readLines(file, 5)
#> [1] "HEADER_TIMESTAMP,X,Y,Z"                    
#> [2] "2000-01-08 17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08 17:30:00.013,0.17,0.094,-0.751" 
#> [4] "2000-01-08 17:30:00.025,0.22,0.109,-0.727" 
#> [5] "2000-01-08 17:30:00.038,0.258,0.047,-0.78"

Reading data with `readr`

Here we read the data and see that HEADER_TIMESTAMP is guessed as a col_datetime column

data = readr::read_csv(file, progress = FALSE)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   HEADER_TIMESTAMP = col_datetime(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )
readr::spec(data)
#> cols(
#>   HEADER_TIMESTAMP = col_datetime(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )

We see fractional seconds are preserved:

head(data)
#> # A tibble: 6 × 4
#>   HEADER_TIMESTAMP            X     Y      Z
#>   <dttm>                  <dbl> <dbl>  <dbl>
#> 1 2000-01-08 17:30:00.000 0.208 0.079 -0.751
#> 2 2000-01-08 17:30:00.013 0.17  0.094 -0.751
#> 3 2000-01-08 17:30:00.024 0.22  0.109 -0.727
#> 4 2000-01-08 17:30:00.037 0.258 0.047 -0.78 
#> 5 2000-01-08 17:30:00.049 0.276 0.029 -0.762
#> 6 2000-01-08 17:30:00.062 0.258 0.032 -0.777

We can confirm that they are preserved

as.numeric(lubridate::seconds(data$HEADER_TIMESTAMP[1:5])) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995
as.numeric(data$HEADER_TIMESTAMP[1:5]) %% 1
#> [1] 0.00000000 0.01300001 0.02499998 0.03799999 0.04999995

Separated Date and Time

If we have a file where date and time are stored in 2 separate columns
and the time column has fractional seconds, then we run into the same
problem as in the first example.

Example File

url = "https://github.com/r-lib/vroom/files/8353880/62163_data_separated.csv.gz"
file = file.path(tempdir(), basename(url))
if (!file.exists(file)) {
  curl::curl_download(url, file)
}

Data Header

We see the fractional seconds in this file.

readLines(file, 5)
#> [1] "DATE,TIMESTAMP,X,Y,Z"                      
#> [2] "2000-01-08,17:30:00.000,0.208,0.079,-0.751"
#> [3] "2000-01-08,17:30:00.013,0.17,0.094,-0.751" 
#> [4] "2000-01-08,17:30:00.025,0.22,0.109,-0.727" 
#> [5] "2000-01-08,17:30:00.038,0.258,0.047,-0.78"

Reading data with `readr`

Here we read the data and see that the output is a col_time object

data = readr::read_csv(file, progress = FALSE)
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   DATE = col_date(format = ""),
#>   TIMESTAMP = col_time(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )
readr::spec(data)
#> cols(
#>   DATE = col_date(format = ""),
#>   TIMESTAMP = col_time(format = ""),
#>   X = col_double(),
#>   Y = col_double(),
#>   Z = col_double()
#> )

No fractional seconds are printed:

head(data)
#> # A tibble: 6 × 5
#>   DATE       TIMESTAMP     X     Y      Z
#>   <date>     <time>    <dbl> <dbl>  <dbl>
#> 1 2000-01-08 17:30     0.208 0.079 -0.751
#> 2 2000-01-08 17:30     0.17  0.094 -0.751
#> 3 2000-01-08 17:30     0.22  0.109 -0.727
#> 4 2000-01-08 17:30     0.258 0.047 -0.78 
#> 5 2000-01-08 17:30     0.276 0.029 -0.762
#> 6 2000-01-08 17:30     0.258 0.032 -0.777

We can confirm that they are truncated.

as.numeric(lubridate::seconds(data$TIMESTAMP[1:5])) %% 1
#> [1] 0 0 0 0 0
as.numeric(data$TIMESTAMP[1:5]) %% 1
#> [1] 0 0 0 0 0
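
One possible workaround for the separated case is to keep both columns as text and combine them into a single datetime. The sketch below uses base R only, so it is independent of how readr guesses column types; the two rows are copied from the data header above, and the UTC time zone is an assumption made for illustration.

```r
# Date and time strings copied from the first rows of the separated file
date_chr <- c("2000-01-08", "2000-01-08")
time_chr <- c("17:30:00.013", "17:30:00.025")

# Paste into one string and parse; %OS reads fractional seconds on input.
# tz = "UTC" is an assumption, the file itself does not state a time zone.
stamp <- as.POSIXct(
  paste(date_chr, time_chr),
  format = "%Y-%m-%d %H:%M:%OS",
  tz = "UTC"
)

# Fractional part of each timestamp (subject to floating point error)
frac <- as.numeric(stamp) %% 1
print(frac)
```

With readr, the equivalent would be to read both columns as `col_character()` first and then apply this conversion.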

Created on 2022-03-25 by the reprex package (v2.0.1)

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       Debian GNU/Linux 10 (buster)
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C.UTF-8
#>  ctype    C.UTF-8
#>  tz       Etc/UTC
#>  date     2022-03-25
#>  pandoc   2.14.0.3 @ /usr/lib/rstudio-server/bin/pandoc/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  cli           3.2.0.9000 2022-03-16 [1] Github (r-lib/cli@51463d2)
#>  crayon        1.5.0      2022-02-14 [1] CRAN (R 4.1.2)
#>  curl          4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.1.2)
#>  fansi         1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.2)
#>  generics      0.1.2      2022-01-31 [1] CRAN (R 4.1.2)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.1.2)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  hms         * 1.1.1      2021-09-26 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.2)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  lubridate     1.8.0      2021-10-07 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.2      2022-01-26 [1] CRAN (R 4.1.2)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.1.2)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
#>  readr         2.1.2      2022-01-30 [1] CRAN (R 4.1.2)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  rlang         1.0.2      2022-03-04 [1] CRAN (R 4.1.2)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  sessioninfo   1.2.2.9000 2022-03-16 [1] Github (r-lib/sessioninfo@27965c2)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.0)
#>  stringr       1.4.0.9000 2021-12-14 [1] xgit (git@github.com:tidyverse/stringr.git@dd909b7)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.0)
#>  tzdb          0.2.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.2)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.1.2)
#>  xfun          0.30       2022-03-02 [1] CRAN (R 4.1.2)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.1.2)
#> 
#>  [1] /home/jupyter/.R/library
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────


muschellij2 · 2022-03-25T22:00:05Z

Closing this for posterity; upcoming releases will use vroom, so no action is needed.

muschellij2 closed this as completed Mar 25, 2022

muschellij2 mentioned this issue Apr 24, 2024

Fractional Seconds with conversion and rounding/truncation? tidyverse/lubridate#1163
