How to make `predict_in_sample()` derived from the first `d` term of original data when differencing order (`d`) isn't zero #533

theabc50111 · 2022-12-18T09:27:25Z

Describe the question you have

I think predict_in_sample() in pmdarima (Python package) is used the same way as fitted() in forecast (R package).

However, I found that the output of predict_in_sample() in pmdarima differs from the output of fitted() in forecast when the differencing order is not zero.

I used the following Python code to generate the output of predict_in_sample() for three different differencing orders (d):

import numpy as np
from pmdarima.arima import ARIMA, auto_arima

for diff_ord in range(1,4):
    model = ARIMA(order=(2,diff_ord,1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)

    ori_time_series = np.array([0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
                                0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
                                -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154])

    model = model.fit(ori_time_series)
    pred_in_sample = np.array(model.predict_in_sample())
    print(f"pred_in_sample: {list(pred_in_sample)}")

I copied the output of the Python code above and pasted it into an R script to compare predict_in_sample() in pmdarima with fitted() in forecast (R package). The R script:

par(mfrow=c(3,2))

resid_py_arr = rbind(c(0.5074603452679165, -0.34007254404457005, 0.5791635465131197, -0.2603422455105438, -0.11715725264909946, -0.9906800274596668, 0.21001659120388472, 0.3475615524863871, 0.7611416545018854, -0.8381194770281715, 0.2623376398458589, -0.23206647709890738, 0.4220053862092831, 0.06219916252952257, -0.4222181747488597, 0.038538708232362495, -0.08313995790478867, -0.26009526647620496, -0.48253573807639416, 0.5137650759426555, 0.34360142378578173),
                     c(0.49261470998182477, -0.6081910499091976, 1.0569346650372444, -0.6076380990855064, -0.2378976168316683, -0.8352597431282236, 1.0357942689078028, 0.631678563908118, 0.2767179073401864, -1.6808983058879592, 0.5653496038522947, 0.09552482041758414, 0.3659123662710821, -0.3491598976266268, -0.6745863302200333, 0.27064401578860775, 0.05909012652824919, -0.3734973292295873, -0.4340483262207756, 0.9099684919779161, 0.09388976669825833),
                     c(0.4940135127419584, -0.8586664744669461, 1.3118624972776918, -2.3436128660405227, -0.11312237780455078, -0.2567555216685367, 1.8351322442185336, 0.16659999993838948, -0.7634839186197411, -2.006759899752263, 1.8390440448281202, 0.3681746393692201, -0.30073609275686164, -0.6107051995729558, -0.5143875708651311, 0.8646807292504737, 0.03706843182642394, -0.7386026479980223, -0.1930232885395262, 1.3558665622526413, -0.579683995736004))

fitted_val_py_arr = rbind(c(-0.007880175267916512, 0.49169989404457004, 0.28841210348688023, 0.5696976455105438, 0.32260810264909945, 0.5077959474596667, 0.47801250879611523, 0.4985613475136129, 0.07088064549811457, 0.1008287770281715, 0.3425456601541411, 0.6408112270989074, 0.15507516379071695, 0.21370781747047746, 0.21008435474885973, 0.3850643917676375, 0.4155697579047887, 0.18344750047620495, 0.27881111807639414, 0.4178551640573445, 0.23041397621421822),
                          c(0.006965460018175224, 0.7598183999091976, -0.18935901503724428, 0.9169934990855064, 0.4433484668316683, 0.35237566312822355, -0.34776516890780296, 0.21444433609188196, 0.5553043926598136, 0.9436076058879592, 0.0395336961477053, 0.31321992958241585, 0.21116818372891794, 0.6250668776266268, 0.4624525102200333, 0.15295908421139226, 0.2733396734717508, 0.2968495632295873, 0.23032370622077558, 0.02165174802208386, 0.48012563330174163),
                          c(0.005566657258041537, 1.0102938244669462, -0.44428684727769174, 2.6529682660405225, 0.3185732278045508, -0.22612855833146328, -1.1471031442185335, 0.6795229000616105, 1.595506218619741, 1.2694691997522631, -1.2341607448281202, 0.040570110630779865, 0.8778166427568617, 0.8866121795729558, 0.30225375086513107, -0.44107762925047367, 0.29536136817357606, 0.6619548819980223, -0.010701331460473806, -0.4242463222526414, 1.153699395736004))

# The statistical part of the question is understanding that the in-sample
# one-step-ahead forecasts of an ARIMA model are actually the fitted values of
# that model. In R, the method fitted applied on model output object normally
# returns the fitted values of the model. However, the method is not applicable
# to the output of function arima. There is a workaround: fitted values equal
# original values minus residuals. Residuals can be extracted from a fitted
# object using the method residuals (and that applies to the output of
# function arima).
y <- c(0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
       0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
       -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154)

for (dif_ord in seq(1:3)) {
  #  Better still, use the forecast package which does have a fitted method for outputs from Arima and auto.arima. – Rob Hyndman Feb 26, 2016 at 9:49
  #install.packages('forecast')
  library(forecast)
  fit.model.2 <- Arima(y, order = c(2, dif_ord, 1))
  
  resid_r_forecast_arima <- residuals(fit.model.2)
  resid_py <- resid_py_arr[dif_ord,]

  plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
  axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA)  # Add horizontal grid 
  axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA)  # Add vertical grid
  lines(resid_r_forecast_arima, col=2, lty=2)
  lines(resid_py, col=3, lty=3)
  legend(13, -1, c("origin", "resid_r", "resid_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=0.9)
  mtext(paste("Check residual series trend for diff order:", dif_ord))
  
  
  
  fitted_val_r_forecast_arima <- fitted(fit.model.2)
  fitted_val_py <- fitted_val_py_arr[dif_ord,]
  plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
  axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA)  # Add horizontal grid 
  axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA)  # Add vertical grid
  lines(fitted_val_r_forecast_arima, col=2, lty=2)
  lines(fitted_val_py, col=3, lty=3)
  legend(13, -1, c("origin", "fitted_r", "fitted_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=1)
  mtext(paste("Check fitted series trend for diff order:", dif_ord))
}
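
The comment in the R script says that fitted values equal the original values minus the residuals. A minimal NumPy check of that identity against the d = 1 rows pasted into the script could look like the sketch below. It assumes that the first rows of `resid_py_arr` and `fitted_val_py_arr` are the residuals and the `predict_in_sample()` output of the same `order=(2, 1, 1)` model; the names `resid_py`, `fitted_py` and `identity_holds` are only illustrative.

```python
import numpy as np

# Original series, copied from the issue.
y = np.array([
    0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408,
    0.6880291, 0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475,
    0.57708055, 0.27590698, -0.21213382, 0.4236031, 0.3324298, -0.076647766,
    -0.20372462, 0.93162024, 0.5740154,
])

# First row of resid_py_arr (d = 1), copied from the R script.
resid_py = np.array([
    0.5074603452679165, -0.34007254404457005, 0.5791635465131197,
    -0.2603422455105438, -0.11715725264909946, -0.9906800274596668,
    0.21001659120388472, 0.3475615524863871, 0.7611416545018854,
    -0.8381194770281715, 0.2623376398458589, -0.23206647709890738,
    0.4220053862092831, 0.06219916252952257, -0.4222181747488597,
    0.038538708232362495, -0.08313995790478867, -0.26009526647620496,
    -0.48253573807639416, 0.5137650759426555, 0.34360142378578173,
])

# First row of fitted_val_py_arr (d = 1), copied from the R script.
fitted_py = np.array([
    -0.007880175267916512, 0.49169989404457004, 0.28841210348688023,
    0.5696976455105438, 0.32260810264909945, 0.5077959474596667,
    0.47801250879611523, 0.4985613475136129, 0.07088064549811457,
    0.1008287770281715, 0.3425456601541411, 0.6408112270989074,
    0.15507516379071695, 0.21370781747047746, 0.21008435474885973,
    0.3850643917676375, 0.4155697579047887, 0.18344750047620495,
    0.27881111807639414, 0.4178551640573445, 0.23041397621421822,
])

# Fitted values should equal the observations minus the residuals.
identity_holds = np.allclose(y - resid_py, fitted_py)
print(identity_holds)
```

If the identity holds, comparing residuals and comparing fitted values are two views of the same discrepancy, so either pair of plots is enough to see where the two packages disagree.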

output image of R script:

I observe that the two outputs differ greatly over the first d (differencing order) terms and then become close after the d-th term.
I think the output of forecast (R package) is the correct one, based on:
- the response in predict_in_sample of auto_arima produces fitted-values fluctuating around zero #140 (comment):
  
  This produces some junky values for the first d indices when predicting in-sample.
- the first d terms of the original time series (black line) compared with the output of fitted() (red line) in the plots.

My questions are:

Have I misunderstood anything in the description above? If not,
How should I adjust the parameters of predict_in_sample() in pmdarima to get the same output as fitted() in forecast?
- I have tried the start parameter, but it only shortens the output of predict_in_sample().
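
One possible post-processing workaround, sketched here only to make the question concrete, is to overwrite the first d in-sample predictions with the observed values. `overwrite_first_d` is a hypothetical helper, not part of pmdarima, and nothing here establishes that the result matches fitted() in forecast exactly. The sample values are the first five observations and the first five d = 2 predictions from the arrays above.

```python
import numpy as np


def overwrite_first_d(pred, y, d):
    """Return a copy of `pred` whose first `d` entries are replaced by `y[:d]`.

    Hypothetical helper for illustration; it is not a pmdarima function.
    """
    pred = np.asarray(pred, dtype=float)
    y = np.asarray(y, dtype=float)
    if pred.shape != y.shape:
        raise ValueError("pred and y must have the same shape")
    if not 0 <= d <= len(y):
        raise ValueError("d must be between 0 and len(y)")
    patched = pred.copy()  # leave the caller's array untouched
    patched[:d] = y[:d]
    return patched


# First five observations from the issue.
y_head = np.array([0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085])

# First five in-sample predictions for d = 2, copied from the issue.
pred_head = np.array([
    0.006965460018175224, 0.7598183999091976, -0.18935901503724428,
    0.9169934990855064, 0.4433484668316683,
])

patched = overwrite_first_d(pred_head, y_head, d=2)
print(patched)
```

Unlike passing `start`, this keeps the output the same length as the original series, but it only masks the first d values rather than changing how the model initializes them.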

Versions (if necessary)

No response

theabc50111 added the ❔ : question label Dec 18, 2022
