forecast.TSLM does not use t distribution for prediction intervals #343

Closed

mbg-unsw opened this issue Aug 17, 2021 · 6 comments
Assignees: mitchelloharawild
Labels: enhancement (New feature or request)

Comments

@mbg-unsw

When I compared the output of predict.lm with forecast.TSLM, the prediction intervals from predict.lm were wider. Looking at the code, it appears predict.lm uses the t distribution (as I'd expect) while forecast.TSLM uses the normal distribution.

Perhaps this is a convention in the time series world, but if so, could it be documented?

Happy to provide a MWE if that helps.
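
For reference, the kind of comparison I have in mind looks roughly like this (an illustrative sketch, not a polished MWE):

library(fpp3)

# Base R: predict.lm() builds its prediction intervals from the t distribution
fit_lm <- lm(log(USAccDeaths) ~ time(USAccDeaths) + factor(cycle(USAccDeaths)))
exp(head(predict(fit_lm, interval = "prediction", level = 0.95)))

# fable: the equivalent TSLM, where forecast() intervals use a normal approximation
as_tsibble(USAccDeaths) %>%
  model(lm = TSLM(log(value) ~ trend() + season())) %>%
  forecast(h = "1 year") %>%
  hilo(level = 95)

# (in-sample predict.lm rows are shown for brevity; a full comparison would
#  predict the same future periods from both fits)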

@robjhyndman
Member

Because we assume normality when creating combination forecasts and in reconciliation, it would be problematic to use t distributions in fables by default. So it is probably best to add some comment in the documentation for forecast() to say we use a normal approximation.

@mbg-unsw
Author

Thanks Rob, that sounds good to me.

mitchelloharawild self-assigned this Aug 26, 2021
mitchelloharawild added the enhancement (New feature or request) label Aug 26, 2021
@mitchelloharawild
Member

I've updated the docs and added an `approx_normal` argument, so that `forecast(<TSLM>, approx_normal = FALSE)` returns Student's t forecasts.

library(fpp3)
#> ── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
#> ✓ tibble      3.1.3          ✓ tsibble     1.0.1     
#> ✓ dplyr       1.0.7          ✓ tsibbledata 0.3.0     
#> ✓ tidyr       1.1.3          ✓ feasts      0.2.2     
#> ✓ lubridate   1.7.10         ✓ fable       0.3.1.9000
#> ✓ ggplot2     3.3.5.9000
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> x lubridate::date()    masks base::date()
#> x dplyr::filter()      masks stats::filter()
#> x tsibble::intersect() masks base::intersect()
#> x tsibble::interval()  masks lubridate::interval()
#> x dplyr::lag()         masks stats::lag()
#> x tsibble::setdiff()   masks base::setdiff()
#> x tsibble::union()     masks base::union()
as_tsibble(USAccDeaths) %>%
  model(lm = TSLM(log(value) ~ trend() + season())) %>%
  forecast(approx_normal = FALSE) %>% 
  hilo()
#> # A tsibble: 24 x 6 [1M]
#> # Key:       .model [1]
#>    .model    index                value .mean                   `80%`
#>    <chr>     <mth>               <dist> <dbl>                  <hilo>
#>  1 lm     1979 Jan t(t(59, 8.9, 0.055)) 7620. [7083.734,  8171.471]80
#>  2 lm     1979 Feb t(t(59, 8.8, 0.055)) 6899. [6413.748,  7398.606]80
#>  3 lm     1979 Mar t(t(59, 8.9, 0.055)) 7639. [7101.707,  8192.203]80
#>  4 lm     1979 Apr   t(t(59, 9, 0.055)) 7841. [7289.238,  8408.530]80
#>  5 lm     1979 May t(t(59, 9.1, 0.055)) 8645. [8036.810,  9270.895]80
#>  6 lm     1979 Jun t(t(59, 9.1, 0.055)) 9087. [8447.703,  9744.882]80
#>  7 lm     1979 Jul t(t(59, 9.2, 0.055)) 9909. [9211.088, 10625.489]80
#>  8 lm     1979 Aug t(t(59, 9.1, 0.055)) 9237. [8587.107,  9905.692]80
#>  9 lm     1979 Sep   t(t(59, 9, 0.055)) 8237. [7657.235,  8833.035]80
#> 10 lm     1979 Oct   t(t(59, 9, 0.055)) 8517. [7917.291,  9133.023]80
#> # … with 14 more rows, and 1 more variable: 95% <hilo>
as_tsibble(USAccDeaths) %>%
  model(lm = TSLM(log(value) ~ trend() + season())) %>%
  forecast(approx_normal = TRUE) %>% 
  hilo()
#> # A tsibble: 24 x 6 [1M]
#> # Key:       .model [1]
#>    .model    index            value .mean                   `80%`
#>    <chr>     <mth>           <dist> <dbl>                  <hilo>
#>  1 lm     1979 Jan t(N(8.9, 0.003)) 7620. [7089.402,  8164.937]80
#>  2 lm     1979 Feb t(N(8.8, 0.003)) 6899. [6418.880,  7392.690]80
#>  3 lm     1979 Mar t(N(8.9, 0.003)) 7639. [7107.390,  8185.653]80
#>  4 lm     1979 Apr   t(N(9, 0.003)) 7841. [7295.070,  8401.807]80
#>  5 lm     1979 May t(N(9.1, 0.003)) 8645. [8043.240,  9263.482]80
#>  6 lm     1979 Jun t(N(9.1, 0.003)) 9087. [8454.462,  9737.091]80
#>  7 lm     1979 Jul t(N(9.2, 0.003)) 9908. [9218.459, 10616.993]80
#>  8 lm     1979 Aug t(N(9.1, 0.003)) 9237. [8593.978,  9897.772]80
#>  9 lm     1979 Sep   t(N(9, 0.003)) 8237. [7663.362,  8825.973]80
#> 10 lm     1979 Oct   t(N(9, 0.003)) 8516. [7923.626,  9125.721]80
#> # … with 14 more rows, and 1 more variable: 95% <hilo>

Created on 2021-08-26 by the reprex package (v2.0.0)

@mitchelloharawild
Member

On a related issue, @robjhyndman: when we generate() future samples, shouldn't these come from a Student's t distribution as well?

fable/R/lm.R, line 347 in 9b46cd8:

new_data$.innov <- stats::rnorm(length(pred), sd = sqrt(vars))
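
For illustration, a hypothetical alternative (not something fable currently does) would be to draw location-scale t innovations there instead, using the model's residual degrees of freedom (the df variable below is assumed to be available):

# hypothetical: t innovations with scale sqrt(vars) and the lm's residual df
new_data$.innov <- sqrt(vars) * stats::rt(length(pred), df = df)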

@robjhyndman
Member

  • Using t for the t-distribution as well as for transformations is potentially confusing. Perhaps transformations should be f?
  • The model assumes Gaussian errors. The reason t comes up for the forecast distribution is that the parameters of the normal distribution are estimated, so if we use simulation, we should really draw from the distribution of the parameter estimates and then use normal variates. I think that's too complicated as it would require estimating the parameter covariance matrix which is not stored (or easily calculated) for most models. So I think just using Gaussian errors in simulations is fine, conditional on the parameter estimates. If someone really wants to include parameter uncertainty in the simulations, they can write their own code.
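
For anyone who does want to include parameter uncertainty, a minimal sketch outside of fable might look like the following. It refits the same USAccDeaths model with plain lm() and uses MASS::mvrnorm() to draw coefficients; everything here is illustrative rather than fable's implementation.

library(MASS)  # for mvrnorm()

fit <- lm(log(USAccDeaths) ~ time(USAccDeaths) + factor(cycle(USAccDeaths)))
X_new <- model.matrix(fit)[1:12, ]  # design rows to simulate for (illustrative)
sigma <- summary(fit)$sigma         # residual standard error

simulate_path <- function() {
  beta_draw <- mvrnorm(1, mu = coef(fit), Sigma = vcov(fit))   # draw coefficients
  drop(X_new %*% beta_draw) + rnorm(nrow(X_new), sd = sigma)   # then Gaussian errors
}
sims <- replicate(100, simulate_path())  # 12 x 100 matrix of simulated log-deaths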

@mitchelloharawild
Member

* Using `t` for the t-distribution as well as for transformations is potentially confusing. Perhaps transformations should be `f`?

I also noticed this. I was thinking of using `*`, but perhaps that isn't evident enough, e.g. `t(3,4,5)*`.

* The model assumes Gaussian errors. The reason t comes up for the forecast distribution is that the parameters of the normal distribution are estimated, so if we use simulation, we should really draw from the distribution of the parameter estimates and then use normal variates. I think that's too complicated as it would require estimating the parameter covariance matrix which is not stored (or easily calculated) for most models. So I think just using Gaussian errors in simulations is fine, conditional on the parameter estimates. If someone really wants to include parameter uncertainty in the simulations, they can write their own code.

👍
