Description
I have an issue with season() in a model context, where the function creates an all-NA factor. This is apparently because my index is not exactly integer-valued. I imagine this is a general problem whenever one has sub-second data or data that has been through an imperfect aggregation (in this case, to hourly data).
The issue seems to be present both in the CRAN release and in the latest commit on GitHub.
Please see the reproducible example below, where I also propose some solutions. If you wish, I can try to make a pull request with a solution.
# remotes::install_github('tidyverts/fable')
# install.packages('fable')
library("fable")
# A data subset
x <- structure(list(data = c(0.654340987099764, 0.306863543295109, 0.472474420817171, -1.09341948531794,
1.20966833172894, 0.265420322089116, -1.91999831324977, -0.276682839817029, 0.159697643465573, 0.611967188546101),
timestamp = structure(c(1560254400.0001, 1560258000.0001, 1560261600.0001, 1560265200.0001, 1560268800.0001,
1560272400.0001, 1560276000.0001, 1560279600.0001, 1560283200.0001, 1560286800.0001), class = c("POSIXct",
"POSIXt"), tzone = "UTC")), row.names = c(NA, 10L), envir = "prod", key = structure(list(.rows = list(1:10)),
row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame")), index = structure("timestamp",
ordered = TRUE), index2 = "timestamp", interval = structure(list(year = 0, quarter = 0, month = 0,
week = 0, day = 0, hour = 1, minute = 0, second = 0, millisecond = 0, microsecond = 0, nanosecond = 0,
unit = 0), class = "interval"), class = c("tbl_ts", "tbl_df", "tbl", "data.frame"))
print(x)
## # A tsibble: 10 x 2 [1h] <UTC>
## data timestamp
## * <dbl> <dttm>
## 1 0.654 2019-06-11 12:00:00
## 2 0.307 2019-06-11 13:00:00
## 3 0.472 2019-06-11 14:00:00
## 4 -1.09 2019-06-11 15:00:00
## 5 1.21 2019-06-11 16:00:00
## 6 0.265 2019-06-11 17:00:00
## 7 -1.92 2019-06-11 18:00:00
## 8 -0.277 2019-06-11 19:00:00
## 9 0.160 2019-06-11 20:00:00
## 10 0.612 2019-06-11 21:00:00
The high-level issue and error message:
x %>% model(TSLM(data ~ season()))
## # A mable: 1 x 1
##   `TSLM(data ~ season())`
##   <model>
## 1 <NULL model>
## Warning message:
## 1 error encountered for TSLM(data ~ season())
## [1] 0 (non-NA) cases
The error message when calling model(TSLM(data ~ trend() + season())) is much harder to understand.
Anyway, calling fable:::season(x) in the model context (I cannot readily see why it does not work when called directly) ultimately results in the equivalent of:
#fable:::season(x) ->
fable:::season.tbl_ts(x, NULL)
## # A tibble: 10 x 1
## day
## <fct>
## 1 <NA>
## 2 <NA>
## 3 <NA>
## 4 <NA>
## 5 <NA>
## 6 <NA>
## 7 <NA>
## 8 <NA>
## 9 <NA>
## 10 <NA>
As far as I can tell, that ultimately calls fable:::season.numeric and creates what is equivalent to:
idx_num <- c(433404.000000028, 433405.000000028, 433406.000000028, 433407.000000028, 433408.000000028,
433409.000000028, 433410.000000028, 433411.000000028, 433412.000000028, 433413.000000028)
factor((idx_num%%24) + 1, levels = seq_len(24))
## [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
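Why an all-NA factor rather than an error? As far as I can tell from base R's factor(), a non-character x is converted with as.character() before being matched against the levels, so a value that is only approximately 13 gets a label such as "13.000000028" that matches none of "1" to "24". A minimal base-R sketch of that mechanism (the value 13 + 2.8e-8 is made up to mimic (idx_num %% 24) + 1 above):

```r
# A value that is 13 up to a tiny floating-point offset, like (idx_num %% 24) + 1
almost_13 <- 13 + 2.8e-8

# factor() converts non-character input with as.character() before matching levels,
# so the label keeps the extra digits and matches no level
as.character(almost_13)
f_bad <- factor(almost_13, levels = seq_len(24))

# An exactly integer-valued input matches the level "13" as expected
f_good <- factor(13, levels = seq_len(24))

is.na(f_bad)          # the inexact value becomes NA
as.character(f_good)  # the exact value gets level "13"
```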
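It is easy to see here why it fails: the values are not whole numbers, so none of them matches a level. It was not so obvious when printing idx_num, however, because R's default printing (7 significant digits) hides the fractional part.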
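A quick way to expose the hidden fraction is to print with more digits or to measure the distance to the nearest whole number. A small base-R sketch using the first three values of idx_num from above:

```r
# First three values of the index from the example above
idx_num <- c(433404.000000028, 433405.000000028, 433406.000000028)

print(idx_num)               # default 7 significant digits hide the fraction
print(idx_num, digits = 15)  # more digits reveal the small offset
sprintf("%.9f", idx_num)

# A direct check: how far is each value from the nearest whole number?
offset <- idx_num - round(idx_num)
offset
```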
Some possible solutions. One option is to make the code evaluate to:
# Solution 1 (unsafe?)
factor(as.integer((idx_num%%24) + 1), levels = seq_len(24))
## [1] 13 14 15 16 17 18 19 20 21 22
## Levels: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
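One likely reason Solution 1 is unsafe: as.integer() truncates toward zero, so an index that lands just below a whole number (an error in the opposite direction) would silently be mapped to the previous hour instead of NA. Rounding the index first avoids this. A small base-R illustration with made-up values:

```r
# Hypothetical indices: one slightly above and one slightly below the same whole hour
idx <- c(433404 + 2.8e-8, 433404 - 2.8e-8)
period <- 24

# Solution 1 style: truncation sends the second value to the wrong season (12, not 13)
trunc_season <- as.integer((idx %% period) + 1)

# Rounding the index before taking the modulus keeps both values in season 13
round_season <- (round(idx) %% period) + 1

trunc_season
round_season
```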
# Solution 2 - something _equivalent_ to
l <- (idx_num%%24) + 1
if (isTRUE(all.equal(ll <- as.integer(l), l))) {
  result <- factor(ll, levels = seq_len(24))
} else {
  result <- factor(l, levels = seq_len(24))
}
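A possible variant of Solution 2, sketched as a standalone function. season_factor() is a hypothetical name, not part of fable, and the default tolerance of 1e-6 is an assumption. It rounds the index before taking the modulus, so a value such as 23.9999999 maps to level 1 rather than to an out-of-range 25, and it raises an error instead of silently returning an all-NA factor when the index is genuinely not integer-valued:

```r
# Hypothetical helper (not fable's API): build a seasonal factor from a numeric index
season_factor <- function(idx, period, tol = 1e-6) {
  rounded <- round(idx)
  # Refuse to guess when the index is clearly non-integer, instead of returning NA
  if (any(abs(idx - rounded) > tol, na.rm = TRUE)) {
    stop("index is not integer-valued within tolerance ", tol)
  }
  # Round first, then take the modulus, so values just below a multiple of period wrap correctly
  factor((rounded %% period) + 1, levels = seq_len(period))
}

idx_num <- c(433404.000000028, 433405.000000028, 433406.000000028)
season_factor(idx_num, period = 24)
season_factor(c(23.9999999, 24.0000001), period = 24)
```

Erroring rather than falling back to factor(l, ...) is a design choice: the fallback branch in Solution 2 would still produce the all-NA factor that led to the "0 (non-NA) cases" message.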
Alternatively, users could fix it themselves. In this case it is simple, but I wonder what would also work for sub-second data.
# Solution 3 - convert the timestamps to whole seconds before modelling
y <- x
y$timestamp <- as.POSIXct(as.integer(y$timestamp), origin = "1970-01-01 00:00", tz = "UTC")
fable:::season.tbl_ts(y, NULL)
## # A tibble: 10 x 1
## day
## <fct>
## 1 13
## 2 14
## 3 15
## 4 16
## 5 17
## 6 18
## 7 19
## 8 20
## 9 21
## 10 22
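For the user-side fix, note that as.integer() in Solution 3 truncates, so a timestamp stored as 12:59:59.9999 would become 12:59:59 and still not sit on the hourly grid. A sketch that instead rounds to the nearest multiple of the interval, which also covers sub-second intervals; round_to_interval() is a hypothetical helper and UTC is assumed, as in the example. (lubridate::round_date() should do something similar for standard units such as "hour".)

```r
# Hypothetical helper: snap POSIXct timestamps to the nearest multiple of `secs` seconds
round_to_interval <- function(t, secs, tz = "UTC") {
  snapped <- round(as.numeric(t) / secs) * secs
  as.POSIXct(snapped, origin = "1970-01-01", tz = tz)
}

# Two timestamps from the example, each 0.0001 s past the hour
ts <- as.POSIXct(c(1560254400.0001, 1560258000.0001), origin = "1970-01-01", tz = "UTC")
hourly <- round_to_interval(ts, secs = 3600)
format(hourly, "%Y-%m-%d %H:%M:%OS3", tz = "UTC")

# Sub-second example: a 0.5 s grid with small timing errors
fast <- as.POSIXct(1560254400 + c(0.50001, 0.99998), origin = "1970-01-01", tz = "UTC")
half_sec <- round_to_interval(fast, secs = 0.5)
```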
Some session info for completeness:
sessionInfo()
## R version 3.6.2 (2019-12-12)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.3 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8
## [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] fable_0.1.1.9000 fabletools_0.1.1
##
## loaded via a namespace (and not attached):
## [1] Rcpp_1.0.3 formatR_1.7 pillar_1.4.3 compiler_3.6.2 remotes_2.1.0
## [6] tools_3.6.2 zeallot_0.1.0 packrat_0.5.0-24 lubridate_1.7.4 tsibble_0.8.5
## [11] lifecycle_0.1.0 tibble_2.1.3 gtable_0.3.0 anytime_0.3.7 pkgconfig_2.0.3
## [16] rlang_0.4.2 cli_2.0.1 rstudioapi_0.10 curl_4.2 dplyr_0.8.3
## [21] stringr_1.4.0 generics_0.0.2 vctrs_0.2.1 grid_3.6.2 tidyselect_0.2.5
## [26] glue_1.3.1 R6_2.4.1 fansi_0.4.1 purrr_0.3.3 ggplot2_3.2.1
## [31] tidyr_1.0.0 magrittr_1.5 backports_1.1.5 scales_1.1.0 assertthat_0.2.1
## [36] colorspace_1.4-1 utf8_1.1.4 stringi_1.4.5 lazyeval_0.2.2 munsell_0.5.0
## [41] crayon_1.3.4