number of residuals from auto_arima is different from what I get from R #181

fissehab · 2019-08-14T17:27:36Z

I am comparing results of auto ARIMA with R (forecast package) and Python (pmdarima package). One of the issues I am getting is the length of the residuals in R and Python are different when d is not zero. For example, in the code shown below, in R, the length of the residuals is the same as the time series data but in python it is the length of the time series data minus d. Further, I also see significant changes in the order I get from R vs Python. What are the causes of the differences?
R

set.seed(250)
ts1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 200)
set.seed(270)
ts2 = arima.sim(list(order = c(1,2,0), ar = 0.7), n = 200)
my_fit1 <- auto.arima(ts1, stepwise = FALSE)
resid1 = as.numeric(residuals(my_fit1))
my_fit2 <- auto.arima(ts2, stepwise = FALSE)
resid2 = as.numeric(residuals(my_fit2))
#to use with Python
 ts1 = data.frame(ts1 = ts1)
 write.csv(ts1, 'ts1.csv', row.names = F)
 ts2 = data.frame(ts2 = ts2)
 write.csv(ts2, 'ts2.csv', row.names = F)

Python

from pmdarima.arima import auto_arima
ts1 = pd.read_csv('ts1.csv')
ts2 = pd.read_csv('ts2.csv')
my_fit1 = auto_arima(y = ts1.ts1, 
                  start_p=1, 
                  start_q=1,
                  max_p=3, 
                  max_q=3,
                  seasonal=False,
                  d = None,  
                  error_action='ignore',  
                  suppress_warnings=True, 
                  stepwise=False)

 my_fit2 = auto_arima(y = ts2.ts2, 
                  start_p=1, 
                  start_q=1,
                  max_p=3, 
                  max_q=3,
                  seasonal=False,
                  d = None,  
                  error_action='ignore',  
                  suppress_warnings=True, 
                  stepwise=False)```

The text was updated successfully, but these errors were encountered:

tgsmith61591 · 2019-08-15T12:26:17Z

So this seems like a duplicate of #175. We currently don't have a way of backing into residuals for differenced arrays based on the way statsmodels computes its time series models. It actually stores the differenced endogenous array internally, so you're probably getting d fewer residuals. I kind of wonder whether we could back into this if we used something like mentioned in #180... in any case, there are some improvements coming in 1.3.0 that may help but this is still an open problem.

Regarding

Further, I also see significant changes in the order I get from R vs Python. What are the causes of the differences?

Can you please provide a minimal reproducible dataset? Otherwise I can't help too much...

fissehab · 2019-08-15T14:51:11Z

Thank you. Is there a way to get the fitted values without the residuals. I want the residuals to get the fitted values (fitted values = actual values- residuals)? If there is a way to get the fitted values directly without using the residuals, it will solve my problem. Final question, are the coefficients for the original data or for the differenced data when d>=1?

fissehab · 2019-08-15T14:58:48Z

Regarding providing a minimal reproducible dataset, if you run the code above and see the orders they are different in R and Python. I generated data with R and saved it to use with Python.

aaronreidsmith · 2019-08-15T17:57:32Z

@tgsmith61591 I will let you answer from an algorithm standpoint, but I have expanded the example, and attached the CSVs from @fissehab's R code to help you out.

Files

ts1.csv
ts2.csv

R

> library('forecast')
> set.seed(250)
> ts1 = arima.sim(list(order = c(1,1,0), ar = 0.7), n = 200)
> set.seed(270)
> ts2 = arima.sim(list(order = c(1,2,0), ar = 0.7), n = 200)
> my_fit1 <- auto.arima(ts1, stepwise = FALSE)
> arimaorder(my_fit1)
p d q 
1 1 0 
> resid1 = as.numeric(residuals(my_fit1))
> resid1
  [1]  0.00000000 -0.60364095  0.23826623  0.55722769 -0.23020760 -0.41220075
  [7]  0.78253425 -1.87075189  1.19027884 -1.11793094  1.44801661  0.58545044
 [13] -0.80168133  0.14150800 -1.26133487  0.69509677  0.37075834 -0.15454742
 [19] -0.07506156 -0.65475479 -0.39363677  0.09495343 -2.11172543 -1.39594151
 [25]  2.26873247  0.76211905  2.62624271  0.20756791 -0.17519514  0.15366845
 [31] -0.75394396  0.40269526  1.30275568  0.03974311 -0.86914667  0.74654955
 [37] -1.42677596 -0.11764568 -0.36827530  0.63085923 -0.35568819  0.53503752
 [43] -1.11812136 -1.20394944  0.03298737 -0.46537277  1.46181665  0.55329102
 [49]  0.65954289  1.07228447 -0.32819218 -0.60029496  0.95860825  0.26066664
 [55] -2.09685486  0.27539522  0.18676738  1.31785804  0.74285905  0.85642125
 [61]  1.94737305  1.38570623 -1.26571979 -0.25598437  0.83014839 -0.13828312
 [67] -1.25833457  0.52988298 -0.18840990 -1.62258951 -1.69961377  1.26925885
 [73] -1.23917939 -0.99038214  0.56235865  0.31754259 -1.62379398  0.70405516
 [79]  0.33092129 -0.03768896  1.52509539  0.63282506  1.15508415  0.15291803
 [85]  0.22988662 -0.29258460 -1.27369507  0.45285668 -0.11829754 -0.05201585
 [91]  2.11289020  0.04490440 -2.17809348 -0.57417372  0.51716956 -0.79458069
 [97]  0.89607635  0.09921942 -0.80588473 -0.87131186  1.74297724  0.09051064
[103]  0.61534240 -0.33864617  1.52146018 -0.83787263 -0.12031748 -0.18325604
[109] -1.27093387 -0.57869311 -1.43018033 -0.59291123  0.03099888  0.45575181
[115] -0.11544744 -0.41749196  0.51077514 -0.98883678  1.16138394  0.54893279
[121]  1.42583180 -0.61288395  0.76596722  0.83529175  0.10463794  0.69803678
[127]  0.81147072  0.16419393  0.18308735  0.63020989  0.08309077 -0.59091896
[133] -1.17471453 -2.00908007 -1.37471413  2.24625050 -2.66882416 -0.12291356
[139]  1.56299532  0.79906197  1.03739653  0.67977696  0.73662729 -0.73744140
[145]  2.11993889 -0.66426538 -0.46323626 -0.09076647 -0.87879961  0.55516104
[151]  0.20391274 -0.45673926  2.68615995  0.16879682  1.53363369  0.02861250
[157]  0.20392584 -0.23056804  1.41893053  2.46470838  0.69759568  0.32439179
[163] -1.44719100 -0.22394274  0.46879718 -0.54604309 -0.02741070 -0.77379035
[169]  1.00404773 -2.29133983 -0.14150747  0.81702877 -1.06178458  0.84844974
[175] -1.91793674 -0.99835482 -1.95073835 -0.64555812 -0.22738434  0.61966177
[181]  1.37577810 -1.83137609 -0.41625198 -0.68598856  0.56554510  0.21220398
[187]  0.65734704  0.24893157 -0.89582415 -0.24703034  0.41341682  1.05832060
[193]  0.31943008 -1.42929071 -0.55284699 -0.29899667  0.41514233  0.80769373
[199] -0.77864033 -0.63165443  1.26494252
> my_fit2 <- auto.arima(ts2, stepwise = FALSE)
> arimaorder(my_fit2)
p d q 
1 2 1 
> resid2 = as.numeric(residuals(my_fit2))
> resid2
  [1]  0.000000000  0.000000000  0.553135704  0.078884889 -0.772116810
  [6]  1.191752479 -0.896232626 -2.062970105 -0.726017862  0.063279693
 [11]  0.534729849 -0.020860066 -0.518894344 -1.407545106  1.629950431
 [16]  0.237475847  0.929761428  0.285019547  0.002373432 -1.066614459
 [21]  0.170374898  0.826748954  0.453558464  0.806905185 -0.201452123
 [26]  0.927365195 -0.315056029 -1.605456579  1.001293851 -0.583167870
 [31] -0.065537958  0.083631143 -0.943534835 -0.897012857  0.711878920
 [36] -1.667809795 -1.275387241 -0.776376336  0.156093004  0.385574839
 [41] -1.054707179  0.892523106 -0.118777249  0.002475682  0.321135527
 [46]  1.749487912  1.020931367  1.302966504 -0.334985031  0.948290240
 [51] -0.073429529  0.787138932 -0.003783066  1.471352449  0.022863341
 [56]  0.952155901 -0.656101628  0.354626629 -0.283058924  0.389940980
 [61] -1.516544803 -3.240804108 -0.534250486 -1.476116438 -1.722021722
 [66]  0.370818601  0.364762301  1.626126667 -0.713186461  0.653185814
 [71] -1.262779853 -0.386501740  1.658904794 -1.083788893  0.750618921
 [76]  1.394887238  0.807662624 -0.323083799  0.133315768  1.094482551
 [81]  0.794963951 -1.044287942 -1.467871008 -0.520638586 -0.292278329
 [86] -0.805503298 -0.682879043  1.010176905  1.536099988 -0.011595607
 [91] -1.199649546  0.281641646  1.029089762  1.361362113 -1.048213240
 [96]  0.546301033  0.561122104 -0.352473797 -0.932481643 -0.748541698
[101]  1.120561884 -0.736398291 -0.792149625  1.265344789 -0.519407130
[106]  0.452189280 -0.874618620  0.702206300  1.505288803  1.559516107
[111] -1.179830908  0.338015212  0.597624322  0.580372875 -1.107022221
[116]  0.555363872 -1.225402072 -0.783824198  0.689679399  1.590894096
[121] -1.133090782  1.412550135 -0.900777500 -0.082785372  0.859001741
[126]  0.928419242  1.134220525  0.754978551 -0.973236489  0.599800489
[131] -0.132953065 -0.276302693  0.894958563  0.525032841  1.342282351
[136] -0.462901967 -0.085348825  0.454797338 -0.306347744 -0.285569856
[141]  0.235856099  0.093424147 -1.031979489 -1.663810205  0.365205998
[146]  0.953019554  0.593837091 -0.298025980  2.177322013 -1.598080966
[151] -1.597428462  0.717544322 -0.338993484 -1.339025519 -0.685849756
[156] -0.962702048  1.418138903  0.276621679 -1.961510365  1.236341138
[161]  0.587880092  0.753530428 -0.481849596  0.904094056 -1.013752658
[166] -1.771149402  1.229417875 -0.672951165 -1.554274939 -1.143091548
[171]  0.553663878  1.577028296 -0.582703829  1.220492864 -0.769572292
[176]  0.703472326 -1.164729050  1.687470894 -0.531875777  0.096395906
[181] -1.808910934 -0.582637245 -0.523091826  0.300103475 -0.960595596
[186]  2.297750029  0.444818093 -1.610805595 -1.501848340 -0.551919250
[191]  1.145908843  1.241342836 -1.824136040  0.059812317 -1.303889251
[196]  0.387169543 -0.430681201 -0.322139411 -2.135068754  0.412077770
[201] -1.522639052 -2.161560121
> ts1 = data.frame(ts1 = ts1)
> write.csv(ts1, 'ts1.csv', row.names = F)
> ts2 = data.frame(ts2 = ts2)
> write.csv(ts2, 'ts2.csv', row.names = F)

Python

>>> import pandas as pd
>>> from pmdarima.arima import auto_arima
>>> ts1 = pd.read_csv('ts1.csv')
>>> ts2 = pd.read_csv('ts2.csv')
>>> my_fit1 = auto_arima(y=ts1.ts1, 
...                      start_p=1, 
...                      start_q=1,
...                      max_p=3, 
...                      max_q=3,
...                      seasonal=False,
...                      d=None,  
...                      error_action='ignore',  
...                      suppress_warnings=True, 
...                      stepwise=False)
>>> my_fit1.order
(2, 1, 2)
>>> my_fit1.resid()
array([-9.14538020e-01,  1.93464169e-01,  5.24371064e-01, -2.92925696e-01,
       -5.18102136e-01,  6.89138707e-01, -1.92777588e+00,  1.04641618e+00,
       -1.13247250e+00,  1.31099867e+00,  5.36093529e-01, -9.33193359e-01,
       -2.61315078e-02, -1.35341228e+00,  5.62208818e-01,  3.23055935e-01,
       -2.56341826e-01, -2.10124293e-01, -7.78631506e-01, -5.34063831e-01,
       -2.81301011e-02, -2.22997218e+00, -1.62061311e+00,  2.08168716e+00,
        6.67401742e-01,  2.36174743e+00, -2.95062490e-04, -4.78868268e-01,
       -7.95617995e-02, -8.96236584e-01,  2.68204549e-01,  1.25381087e+00,
        2.59122000e-02, -9.40936265e-01,  6.72041857e-01, -1.39607346e+00,
       -1.88237903e-01, -3.62619233e-01,  6.00114482e-01, -3.68871221e-01,
        4.44813850e-01, -1.17361653e+00, -1.34713923e+00, -1.05476073e-01,
       -5.65589374e-01,  1.28761411e+00,  4.29824178e-01,  4.51267661e-01,
        8.75801447e-01, -4.80762914e-01, -7.85236776e-01,  8.14348124e-01,
        2.22830032e-01, -2.16703941e+00,  1.12453174e-01,  1.64543106e-01,
        1.25540831e+00,  7.06172258e-01,  7.75858444e-01,  1.89455659e+00,
        1.42475300e+00, -1.20592661e+00, -2.44709101e-01,  9.87742863e-01,
        1.10515575e-01, -1.04865152e+00,  7.05049200e-01,  8.80124260e-02,
       -1.40753141e+00, -1.56989975e+00,  1.38072022e+00, -1.03004202e+00,
       -9.92185645e-01,  5.35830220e-01,  3.20254047e-01, -1.69565097e+00,
        4.99317668e-01,  2.30276822e-01, -2.02685833e-01,  1.31931584e+00,
        5.03635370e-01,  9.80913068e-01,  4.21565533e-02,  1.15157303e-01,
       -3.42533852e-01, -1.30229213e+00,  4.08596706e-01, -5.96888514e-02,
       -4.58378958e-02,  2.10906641e+00,  1.43844338e-01, -2.19172583e+00,
       -6.62125322e-01,  5.45546633e-01, -7.38654656e-01,  8.38459924e-01,
        1.05828275e-01, -8.71155353e-01, -9.89231359e-01,  1.62111435e+00,
        8.67467996e-02,  4.91681918e-01, -4.22675070e-01,  1.40624041e+00,
       -8.28313058e-01, -2.20171821e-01, -2.07097737e-01, -1.28579546e+00,
       -6.53110878e-01, -1.48003301e+00, -7.29491219e-01, -1.13993685e-01,
        2.71657248e-01, -3.45193780e-01, -7.24522597e-01,  1.65722356e-01,
       -1.29986193e+00,  7.50757296e-01,  2.46864814e-01,  1.07197921e+00,
       -9.09203788e-01,  3.88663220e-01,  5.82687708e-01, -1.13640149e-01,
        4.76265114e-01,  6.66810232e-01,  6.44259755e-02,  9.14453151e-02,
        5.88158977e-01,  1.04871011e-01, -5.66836931e-01, -1.15393751e+00,
       -1.99852351e+00, -1.41606643e+00,  2.19538125e+00, -2.60561076e+00,
       -4.15702414e-01,  1.37727224e+00,  6.40497831e-01,  7.86005328e-01,
        4.43024921e-01,  5.11265280e-01, -9.09830321e-01,  1.92113663e+00,
       -6.54437362e-01, -5.75093869e-01, -1.36377939e-01, -8.74536898e-01,
        5.20006723e-01,  2.39965496e-01, -4.62443126e-01,  2.63519378e+00,
        2.90901799e-01,  1.52137984e+00,  1.48199054e-01,  3.01543427e-01,
       -5.14341118e-02,  1.62868465e+00,  2.80421117e+00,  1.12419973e+00,
        7.11614556e-01, -9.83000574e-01,  2.31689031e-01,  1.05120074e+00,
        8.21582203e-02,  5.36102251e-01, -1.85103651e-01,  1.53671091e+00,
       -1.67855488e+00,  2.60659770e-01,  1.32153145e+00, -5.58644381e-01,
        1.19774854e+00, -1.48691621e+00, -7.53739567e-01, -1.67825080e+00,
       -4.85561825e-01, -7.72185571e-02,  6.88235934e-01,  1.39716245e+00,
       -1.83435110e+00, -6.25850082e-01, -8.15777532e-01,  3.87956568e-01,
        6.09959713e-02,  4.43642400e-01,  4.06122799e-02, -1.12948633e+00,
       -5.32393800e-01,  1.74436469e-01,  8.44546164e-01,  1.26966200e-01,
       -1.66073353e+00, -8.58371590e-01, -5.37814226e-01,  1.70941117e-01,
        5.69655669e-01, -1.02394265e+00, -9.75773478e-01,  9.34447674e-01])
>>> my_fit2 = auto_arima(y=ts2.ts2, 
...                      start_p=1, 
...                      start_q=1,
...                      max_p=3, 
...                      max_q=3,
...                      seasonal=False,
...                      d=None,  
...                      error_action='ignore',  
...                      suppress_warnings=True, 
...                      stepwise=False)
>>> my_fit2.order
(3, 2, 2)
>>> my_fit2.resid()
array([ 0.93498424,  0.11865935, -0.74539659,  1.18472186, -0.84989407,
       -2.07898243, -0.65103113,  0.01591816,  0.53588411,  0.309986  ,
       -0.35468874, -1.5608997 ,  1.61819017,  0.54450884,  0.91216129,
        0.34138055,  0.04778707, -1.06332563,  0.08631281,  0.87614061,
        0.52674929,  0.88340722, -0.14389948,  0.85723075, -0.27193337,
       -1.62392745,  1.00179547, -0.53456754, -0.1381289 ,  0.32301899,
       -0.90361638, -1.03319164,  0.8584496 , -1.47448276, -1.39528868,
       -0.63715174,  0.29869234,  0.44270045, -0.92199649,  1.05853092,
       -0.02445317, -0.12868128,  0.52885401,  1.8909318 ,  0.90287131,
        1.284364  , -0.08951854,  0.77761604, -0.27268465,  0.82464474,
        0.19805207,  1.25957484, -0.03232757,  1.08332586, -0.56617578,
        0.05424107, -0.21702669,  0.54297404, -1.55229714, -3.40037476,
       -0.36412127, -1.27776819, -1.88508363,  0.69261049,  0.79503463,
        1.49262823, -0.59685356,  0.95254585, -1.20456255, -0.7753768 ,
        1.9091864 , -0.75539544,  0.45743979,  1.48527013,  1.01771099,
       -0.36607227,  0.00704966,  1.16703819,  0.8219478 , -1.12871576,
       -1.44713619, -0.4547927 , -0.39195356, -0.7827503 , -0.3986085 ,
        1.11149303,  1.44836935,  0.12766762, -0.94883314,  0.1802624 ,
        0.83712718,  1.5077454 , -0.76929138,  0.36274145,  0.47086796,
       -0.27351047, -0.81009948, -0.82348303,  1.04339551, -0.54340087,
       -0.71608811,  1.30810891, -0.51056589,  0.45613281, -0.61290227,
        0.58790022,  1.47241659,  1.7185846 , -1.05461664,  0.16908157,
        0.59288033,  0.55928973, -1.04305812,  0.56017679, -1.2062302 ,
       -0.90193199,  0.89157858,  1.69748227, -1.20174913,  1.49020802,
       -0.61467375, -0.42063414,  0.90665987,  1.20533816,  1.07132981,
        0.6365093 , -0.84098777,  0.6182845 , -0.27519097, -0.39420656,
        1.16169109,  0.58940555,  1.0872276 , -0.29210526,  0.06531636,
        0.29699429, -0.48434988, -0.07645545,  0.40779051, -0.13670244,
       -1.05051871, -1.34366978,  0.35614509,  0.76706391,  0.79550408,
        0.09043893,  2.0313036 , -1.76704674, -1.50855526,  1.03146735,
       -0.62845911, -1.52246402, -0.08430522, -0.85495369,  0.9015557 ,
        0.65746577, -1.43956086,  0.95173401,  0.39331314,  1.05351368,
       -0.11704104,  0.54070703, -1.13058082, -1.46892611,  1.33434914,
       -0.94006593, -1.56265535, -0.57374688,  0.46179756,  1.22270341,
       -0.11840036,  1.64146156, -1.0830073 ,  0.33026865, -0.54255066,
        1.71584151, -0.93671821,  0.12367594, -1.18606485, -0.92092432,
       -0.86176041,  0.76895933, -0.55705392,  1.88423762,  0.54522791,
       -1.21461875, -1.49894922, -0.92082694,  1.24368024,  1.71859071,
       -1.86053481, -0.18252227, -0.96489466,  0.37905162, -0.50170363,
       -0.21354372, -1.83719068,  0.30092296, -1.51112236, -2.01598809])

fissehab · 2019-08-17T12:36:19Z

From the orders @aaronreidsmith has shown above, the R results are more close to the correct values than the Python orders.

tgsmith61591 · 2019-08-18T00:40:23Z

It's because you have stepwise=False in the python code. When stepwise is False, the model fits are akin to RandomizedSearchCV in scikit-learn. Without seeding the random state, you can't guarantee deterministic results. When stepwise=True you get the exact same results (and it's much faster).

my_fit1 = pm.auto_arima(y=ts1, 
                        start_p=1, 
                        start_q=1,
                        max_p=3, 
                        max_q=3,
                        seasonal=False,
                        d=None,  
                        trace=True,
                        error_action='ignore',  
                        suppress_warnings=True, 
                        stepwise=True)

my_fit2 = pm.auto_arima(y=ts2, 
                        start_p=1, 
                        start_q=1,
                        max_p=3, 
                        max_q=3,
                        seasonal=False,
                        d=None,
                        trace=True,
                        error_action='ignore',  
                        suppress_warnings=True, 
                        stepwise=True)

Run the above and you'll get:

>>> my_fit1.order
(1, 1, 0)
>>> my_fit2.order
(1, 2, 1)

talaowis · 2020-05-09T11:45:28Z

@tgsmith61591
how can I get accurate residuals of the same size as the dataset from python?

ARIMA Python Residuals are not accurate and the problem I am also facing is that I am fitting a set of 57 row but the prediction size is 55 ?

Noting That the dataset is a daily time series

"""Here is the code""""
def evaluate_arima_model(X, arima_order,date):
# prepare training dataset
train_size = int(len(X) * 0.75);

train, test = X[0:train_size], X[train_size:];
history = [x for x in train]
predictions = list()
for t in range(len(test)):
    difference(X, interval=arima_order[1])
    model = ARIMA(history, order=arima_order,dates=date.values,freq='D')
    model_fit = model.fit(method="css");
    yhat = model_fit.forecast()[0]
    predictions.append(yhat)
    history.append(test[t])
# calculate out of sample error
predictions=[[a[0]] for a in predictions]

RMSE = sqrt(mean_squared_error(test, predictions));
MAPE=mean_absolute_percentage_error(test, predictions)

MSLE=msle(test, predictions)
RMSLE=rmsle(test, predictions)
R2=r2_score(test, predictions)
return RMSE,MAPE,MSLE,RMSLE,R2,model_fit,train, test,history

def evaluate_models(resultsdf,index,dataset,datasetAS_DF):
f= open(path+"country_order_RMSE.txt","a+")
p = d = q = range(0, 3)
bhistory=None
bestMAPE,bestMSLE,bestRMSLE,bestR2=float("inf"),float("inf"),float("inf"),float("inf")
# Generate all different combinations of p, q and q triplets
pdq = list(itertools.product(p, d, q))
dataset = dataset.astype('float32')
best_score, best_cfg,best_model = float("inf"), None, None
for order in pdq:
try:

        print("hello trying to evaluate");rmse,MAPE,MSLE,RMSLE,R2,model_fit,train, test,history = evaluate_arima_model(dataset, order,datasetAS_DF['Date']);print("rmse",rmse)
        if rmse < best_score:
            best_score, best_cfg = rmse, order
            best_model=model_fit
            bestMAPE,bestMSLE,bestRMSLE,bestR2=MAPE,MSLE,RMSLE,R2
            bhistory=history
        print('ARIMA%s MSE=%.3f' % (order,rmse))
    except:
        print("ARIMA IS NOT APPROPRIATE")
best_model.save(path+'models/'+index+'/ARIMAmodel.pkl')
print("HISTOOOOOOOORY")
print(len(bhistory))
print("RESIDUALLLLS",best_model.resid)
resid_arr=best_model.resid
fittedvalues=best_model.fittedvalues
residuals = pd.DataFrame(best_model.resid,columns=["Residuals"])
residuals.plot()
print("FITTTTTTTTTTED")
print(len(datasetAS_DF))               **############dataset size is 57**
print(len(resid_arr))                            **############residuals size is 55**

tgsmith61591 · 2020-05-09T12:06:03Z

@talaowis I don't even think you're using our package. It looks like you're fitting a stats models ARIMA directly.

talaowis · 2020-05-09T13:01:43Z

Hello @tgsmith61591 ,now I have switched to use your package put it is the same !!
the dataset size is not the same as the residuals result size
and the residuals are not accurate

def evaluate_arima_model(X, arima_order,date):
# prepare training dataset
train_size = int(len(X) * 0.75);

train, test = X[0:train_size], X[train_size:];
history = [x for x in train]
# make predictions
predictions = list()
for t in range(len(test)):
    model = ARIMA(order=arima_order)
    model_fit = model.fit(history);print("code breaks!!!")
    yhat = model_fit.predict(1)
    print("yhaaaaaaaaaaaaaaaaaaaaat",yhat)
    predictions.append(yhat)
    history.append(test[t])
# calculate out of sample error
predictions=[[a[0]] for a in predictions]

print("predictions",predictions);
RMSE = sqrt(mean_squared_error(test, predictions));
MAPE=mean_absolute_percentage_error(test, predictions)
MSLE=msle(test, predictions)
RMSLE=rmsle(test, predictions)
R2=r2_score(test, predictions)
return RMSE,MAPE,MSLE,RMSLE,R2,model_fit,train, test,history

def evaluate_models(resultsdf,index,dataset,datasetAS_DF):
f= open(path+"country_order_RMSE.txt","a+")
p = d = q = range(0, 3)
bhistory=None
bestMAPE,bestMSLE,bestRMSLE,bestR2=float("inf"),float("inf"),float("inf"),float("inf")
# Generate all different combinations of p, q and q triplets
pdq = list(itertools.product(p, d, q))
dataset = dataset.astype('float32')
best_score, best_cfg,best_model = float("inf"), None, None
for order in pdq:
try:

        rmse,MAPE,MSLE,RMSLE,R2,model_fit,train, test,history = evaluate_arima_model(dataset, order,datasetAS_DF['Date']);
        if rmse < best_score:
            best_score, best_cfg = rmse, order
            best_model=model_fit
            bestMAPE,bestMSLE,bestRMSLE,bestR2=MAPE,MSLE,RMSLE,R2
            bhistory=history
        print('ARIMA%s MSE=%.3f' % (order,rmse))
    except:
        print("ARIMA IS NOT APPROPRIATE")
# Serialize with Pickle
with open(path+'models/'+index+'/ARIMAmodel.pkl', 'wb') as pkl:
    pickle.dump(best_model, pkl)
print("HISTOOOOOOOORY")
print(len(bhistory))
print("RESIDUALLLLS",best_model.resid)
resid_arr=best_model.resid()
residuals = pd.DataFrame(resid_arr,columns=["Residuals"])
residuals.plot()
print("FITTTTTTTTTTED")
**print(len(datasetAS_DF)) #dataset size is 57
print(len(resid_arr)) residuals size is 56 **
**predictions = list(best_model.predict_in_sample()) #that is how i get the predictions**
print(len(predictions))
**#fill first resduals with 0-that's because ARIMA does not return 0 residuals
while len(datasetAS_DF)!=len(resid_arr) and len(resid_arr)<len(datasetAS_DF):
    resid_arr=np.concatenate([[0],resid_arr])
    predictions.insert(0,0)**

residuals = pd.DataFrame(resid_arr,columns=["Residuals"])
predictions = pd.DataFrame(predictions,columns=["predictions"])
datasetAS_DF=datasetAS_DF.reset_index(drop=True)
datasetAS_DF=pd.concat([datasetAS_DF,residuals],axis=1)
datasetAS_DF=pd.concat([datasetAS_DF,predictions],axis=1)
datasetAS_DF.to_csv(path+'models/'+index+'/AfterARIMA.csv')
plt.savefig(path+'models/'+index+"/residuals.png")
plt.clf()
residuals.plot(kind='kde')
plt.savefig(path+'models/'+index+"/KDE_residuals.png")
plt.clf()
print('Best ARIMA%s RMSE=%.3f' % (best_cfg, best_score))
#resultsdf=pd.DataFrame([{"RMSE":"","MAPE":"","MSLE":"","RMSLE":"","R2":""}
resultsdf.loc[index,"RMSE"]=rmse
resultsdf.loc[index,"MAPE"]=MAPE
resultsdf.loc[index,"MSLE"]=MSLE
resultsdf.loc[index,"RMSLE"]=RMSLE
resultsdf.loc[index,"R2"]=R2
f.write("Country   "+index+"   should have this order  "+str(best_cfg)+"   ,(RMSE)Loss Function Value= "+str(best_score)+"\n \n")
f.close()

talaowis · 2020-05-09T13:04:01Z

Here is a sample of the produced results :

Date	Actual	Residuals	predictions
3/12/2020	0	0	0
3/13/2020	0	-0.14083	0.140829
3/14/2020	0	-0.14083	0.140829
3/15/2020	0	-0.14842	0.148417
3/16/2020	0	-0.14885	0.148849
3/17/2020	0	-0.14887	0.148873
3/18/2020	0	-0.14887	0.148874
3/19/2020	0	-0.14887	0.148874
4/30/2020	1	-0.14887	0.148874
5/1/2020	1	0.851126	0.148874
5/2/2020	3	-0.09483	1.094834
5/3/2020	3	1.854046	1.145954
5/4/2020	3	-0.04064	3.040636
5/5/2020	3	-0.14302	3.143025
5/6/2020	8	-0.14856	3.148558
5/7/2020	8	4.851143	3.148857

talaowis · 2020-05-09T13:08:39Z

the data set I have posted as an example above is not the whole one , I have just taken a part to show you the results
note that arima order is (0,1,1) the first row I have inserted zeros because the residuals size does not match the dataset size
the other bold rows show you that the residuals results are not accurate

Date	Actual	Residuals	predictions
**3/12/2020	0	0	0**
3/13/2020	0	-0.14083	0.140829
3/14/2020	0	-0.14083	0.140829
3/15/2020	0	-0.14842	0.148417
3/16/2020	0	-0.14885	0.148849
3/17/2020	0	-0.14887	0.148873
3/18/2020	0	-0.14887	0.148874
3/19/2020	0	-0.14887	0.148874
**4/30/2020	1	-0.14887	0.148874**
5/1/2020	1	0.851126	0.148874
**5/2/2020	3	-0.09483	1.094834**
5/3/2020	3	1.854046	1.145954
5/4/2020	3	-0.04064	3.040636
5/5/2020	3	-0.14302	3.143025
**5/6/2020	8	-0.14856	3.148558**
5/7/2020	8	4.851143	3.148857

tgsmith61591 · 2020-05-09T13:26:52Z

You have differencing in your order (0, 1, 1). When you fit an ARIMA with differencing, you fit n - d samples, so you'll get n - d residuals back. That appears to be exactly what's happening, and that's perfectly expected.

talaowis · 2020-05-09T13:39:28Z

then the predicted value assumes no inverse difference ?

tgsmith61591 · 2020-05-09T14:11:48Z

When you produce in-sample predictions, the model predicts as many values as it was trained on. In this case, n - d, and the predictions start at d, in this case 1. Any residuals before that would be nan anyways, so they are not meaningful.

talaowis · 2020-05-09T22:46:11Z

ok thank you

fissehab added the ❔ : question label Aug 14, 2019

tgsmith61591 closed this as completed Aug 18, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

number of residuals from auto_arima is different from what I get from R #181

number of residuals from auto_arima is different from what I get from R #181

fissehab commented Aug 14, 2019 •

edited

tgsmith61591 commented Aug 15, 2019

fissehab commented Aug 15, 2019

fissehab commented Aug 15, 2019

aaronreidsmith commented Aug 15, 2019

fissehab commented Aug 17, 2019

tgsmith61591 commented Aug 18, 2019

talaowis commented May 9, 2020

tgsmith61591 commented May 9, 2020

talaowis commented May 9, 2020

talaowis commented May 9, 2020

talaowis commented May 9, 2020

tgsmith61591 commented May 9, 2020

talaowis commented May 9, 2020

tgsmith61591 commented May 9, 2020

talaowis commented May 9, 2020

number of residuals from auto_arima is different from what I get from R #181

number of residuals from auto_arima is different from what I get from R #181

Comments

fissehab commented Aug 14, 2019 • edited

tgsmith61591 commented Aug 15, 2019

fissehab commented Aug 15, 2019

fissehab commented Aug 15, 2019

aaronreidsmith commented Aug 15, 2019

Files

R

Python

fissehab commented Aug 17, 2019

tgsmith61591 commented Aug 18, 2019

talaowis commented May 9, 2020

tgsmith61591 commented May 9, 2020

talaowis commented May 9, 2020

talaowis commented May 9, 2020

talaowis commented May 9, 2020

tgsmith61591 commented May 9, 2020

talaowis commented May 9, 2020

tgsmith61591 commented May 9, 2020

talaowis commented May 9, 2020

fissehab commented Aug 14, 2019 •

edited