Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to make predict_in_sample() derived from the first d term of original data when differencing order (d) isn't zero #533

Open
theabc50111 opened this issue Dec 18, 2022 · 0 comments

Comments

@theabc50111
Copy link

Describe the question you have

I think the the usage of predict_in_sample() in pmdarima(python pkg) is same as fitted() in forecast(R pkg).

But I find out that the output of predict_in_sample() in pmdarima(python pkg) is different with output of fitted() in forecast(R pkg) when difference order isn't zero.

I use the following python codes to generate the output of predict_in_sample() in three different differencing order (d) :

import numpy as np
from pmdarima.arima import ARIMA, auto_arima

for diff_ord in range(1,4):
    model = ARIMA(order=(2,diff_ord,1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)

    ori_time_series = np.array([0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
                                0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
                                -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154])

    model = model.fit(ori_time_series)
    pred_in_sample = np.array(model.predict_in_sample())
    print(f"pred_in_sample: {list(pred_in_sample)}")

I copy the output of above python codes, then paste to a R script to compare the difference between predict_in_sample() in pmdarima and fitted() in forecast(R pkg), R script:

par(mfrow=c(3,2))

resid_py_arr = rbind(c(0.5074603452679165, -0.34007254404457005, 0.5791635465131197, -0.2603422455105438, -0.11715725264909946, -0.9906800274596668, 0.21001659120388472, 0.3475615524863871, 0.7611416545018854, -0.8381194770281715, 0.2623376398458589, -0.23206647709890738, 0.4220053862092831, 0.06219916252952257, -0.4222181747488597, 0.038538708232362495, -0.08313995790478867, -0.26009526647620496, -0.48253573807639416, 0.5137650759426555, 0.34360142378578173),
                     c(0.49261470998182477, -0.6081910499091976, 1.0569346650372444, -0.6076380990855064, -0.2378976168316683, -0.8352597431282236, 1.0357942689078028, 0.631678563908118, 0.2767179073401864, -1.6808983058879592, 0.5653496038522947, 0.09552482041758414, 0.3659123662710821, -0.3491598976266268, -0.6745863302200333, 0.27064401578860775, 0.05909012652824919, -0.3734973292295873, -0.4340483262207756, 0.9099684919779161, 0.09388976669825833),
                     c(0.4940135127419584, -0.8586664744669461, 1.3118624972776918, -2.3436128660405227, -0.11312237780455078, -0.2567555216685367, 1.8351322442185336, 0.16659999993838948, -0.7634839186197411, -2.006759899752263, 1.8390440448281202, 0.3681746393692201, -0.30073609275686164, -0.6107051995729558, -0.5143875708651311, 0.8646807292504737, 0.03706843182642394, -0.7386026479980223, -0.1930232885395262, 1.3558665622526413, -0.579683995736004))

fitted_val_py_arr = rbind(c(-0.007880175267916512, 0.49169989404457004, 0.28841210348688023, 0.5696976455105438, 0.32260810264909945, 0.5077959474596667, 0.47801250879611523, 0.4985613475136129, 0.07088064549811457, 0.1008287770281715, 0.3425456601541411, 0.6408112270989074, 0.15507516379071695, 0.21370781747047746, 0.21008435474885973, 0.3850643917676375, 0.4155697579047887, 0.18344750047620495, 0.27881111807639414, 0.4178551640573445, 0.23041397621421822),
                          c(0.006965460018175224, 0.7598183999091976, -0.18935901503724428, 0.9169934990855064, 0.4433484668316683, 0.35237566312822355, -0.34776516890780296, 0.21444433609188196, 0.5553043926598136, 0.9436076058879592, 0.0395336961477053, 0.31321992958241585, 0.21116818372891794, 0.6250668776266268, 0.4624525102200333, 0.15295908421139226, 0.2733396734717508, 0.2968495632295873, 0.23032370622077558, 0.02165174802208386, 0.48012563330174163),
                          c(0.005566657258041537, 1.0102938244669462, -0.44428684727769174, 2.6529682660405225, 0.3185732278045508, -0.22612855833146328, -1.1471031442185335, 0.6795229000616105, 1.595506218619741, 1.2694691997522631, -1.2341607448281202, 0.040570110630779865, 0.8778166427568617, 0.8866121795729558, 0.30225375086513107, -0.44107762925047367, 0.29536136817357606, 0.6619548819980223, -0.010701331460473806, -0.4242463222526414, 1.153699395736004))

y<-c(0.49958017, 0.15162735, 0.86757565, 0.3093554, 0.20545085, -0.48288408, 0.6880291,
     0.8461229, 0.8320223, -0.7372907, 0.6048833, 0.40874475, 0.57708055, 0.27590698,
     -0.21213382, 0.4236031, 0.3324298, -0.076647766, -0.20372462, 0.93162024, 0.5740154)  # The statistical part of the question is understanding that the in-sample one-step-ahead forecasts of an ARIMA model are actually the fitted values of that model. In R, the method fitted applied on model output object normally returns the fitted values of the model. However, the method is not applicable to the output of function arima. There is a workaround: fitted values equal original values minus residuals. Residuals can be extracted from a fitted object using the method residuals (and that applies to the output of function arima).

for (dif_ord in seq(1:3)) {
  #  Better still, use the forecast package which does have a fitted method for outputs from Arima and auto.arima. – Rob Hyndman Feb 26, 2016 at 9:49
  #install.packages('forecast')
  library(forecast)
  fit.model.2 <- Arima(y, order = c(2, dif_ord, 1))
  
  resid_r_forecast_arima <- residuals(fit.model.2)
  resid_py <- resid_py_arr[dif_ord,]

  plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
  axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA)  # Add horizontal grid 
  axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA)  # Add vertical grid
  lines(resid_r_forecast_arima, col=2, lty=2)
  lines(resid_py, col=3, lty=3)
  legend(13, -1, c("origin", "resid_r", "resid_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=0.9)
  mtext(paste("Check residual series trend for diff order:", dif_ord))
  
  
  
  fitted_val_r_forecast_arima <- fitted(fit.model.2)
  fitted_val_py <- fitted_val_py_arr[dif_ord,]
  plot.ts(y, xaxp = c(0, 21, 21), ylim = c(-2,2))
  axis(2, at = seq(-1.5, 1.5, 0.5), tck = 1, lty = 2, col = "grey", labels = NA)  # Add horizontal grid 
  axis(1, at = 1:21, tck = 1, lty = 2, col = "grey", labels = NA)  # Add vertical grid
  lines(fitted_val_r_forecast_arima, col=2, lty=2)
  lines(fitted_val_py, col=3, lty=3)
  legend(13, -1, c("origin", "fitted_r", "fitted_py"), col=1:3, lty=1:3, cex=1, ncol=3, y.intersp=0, x.intersp=0, text.width=1)
  mtext(paste("Check fitted series trend for diff order:", dif_ord))
}

output image of R script:
image

  • I observe that: for the first d(difference order) term has huge difference, then they getting close after d(difference order) term.
  • And I think the output of forecast(R package) is correct., based on:

My questions are:

  • Do I have any misunderstanding on above description? If no,
  • How should I adjust the parameters of predict_in_sample() in pmdarima to get the same output with fitted() in forecast?
    • I have tried the the parameter:start, but it only shorten the length of output of predict_in_sample().

Versions (if necessary)

No response

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant