## [Autoregressive Prediction with Rolling Mechanism for Time Series Forecasting with Small Sample Size](https://www.hindawi.com/journals/mpe/2014/572173/)


Reasonable prediction is of significant practical value for analyzing stochastic and unstable time series with small or limited
sample sizes. Motivated by the rolling idea in grey theory and the practical relevance of very short-term (1-step-ahead)
forecasting, a novel autoregressive (AR) prediction approach with a rolling mechanism is proposed. In the modeling procedure, a newly
developed AR equation, which can model nonstationary time series, is constructed at each prediction step. Meanwhile,
the data window for the next-step-ahead forecast rolls on by adding the most recent prediction result and deleting
the first value of the previously used sample. This rolling mechanism is efficient: it improves forecasting accuracy,
applies to limited and unstable data, and requires little computational effort.
The general performance, the influence of sample size, the nonlinear dynamic mechanism, the significance of the observed trends,
and the innovation variance are illustrated and verified with Monte Carlo simulations. The proposed methodology is then applied
to several practical data sets, including multiple building settlement sequences and two economic series.

### Intro

Most analyses are
based on the assumption that the probabilistic properties
of the underlying system are time-invariant; that is, the
process of interest is stationary. Although this assumption is very
useful for constructing simple models, it is often not
the best strategy in practice, because systems
with time-varying probabilistic properties are common in
practical engineering.

Although a regression model can be constructed
from only a few data points, the simplicity of a linear model
limits the accuracy it can achieve.
Linear methodology is therefore sometimes inadequate
when the relationship between the samples and time is not linear, which has motivated artificial intelligence techniques such as
expert systems and neural networks.

Meanwhile, [grey theory](http://help.prognoz.com/en/mergedProjects/Lib/02_time_series_analysis/grey.htm) constructs a grey differential equation, via the accumulated generating operation (AGO)
technique, that can forecast from as few as four data points. Although the grey prediction model has been
successfully applied in various fields with satisfactory results, its prediction performance could still be
improved: because the grey forecasting model is built on
an exponential function, its precision may degrade
when the data are more random.
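For reference, the classic GM(1,1) steps (AGO, background values, a least-squares fit of the grey differential equation, then inverse AGO) can be sketched with NumPy. This is a generic textbook formulation, assumed rather than taken from this paper, and the function name `gm11_forecast` is hypothetical.

```python
import numpy as np

def gm11_forecast(x0, steps):
    """Textbook GM(1,1): fit on the series x0 and return `steps` out-of-sample forecasts."""
    x0 = np.asarray(x0, dtype=float)
    n = len(x0)
    # Accumulated generating operation (AGO): x1[k] = x0[0] + ... + x0[k]
    x1 = np.cumsum(x0)
    # Background values: mean of consecutive AGO values
    z1 = 0.5 * (x1[1:] + x1[:-1])
    # Grey differential equation x0[k] + a * z1[k] = b, solved by least squares
    B = np.column_stack([-z1, np.ones(n - 1)])
    (a, b), *_ = np.linalg.lstsq(B, x0[1:], rcond=None)
    # Time-response function of dx1/dt + a * x1 = b (requires a != 0)
    k = np.arange(n + steps)
    x1_hat = (x0[0] - b / a) * np.exp(-a * k) + b / a
    # Inverse AGO (first differences) restores the original scale;
    # x0_hat[j] estimates x0[j + 1], so the forecasts start at index n - 1
    x0_hat = np.diff(x1_hat)
    return x0_hat[n - 1:]
```

The exponential time-response function is why GM(1,1) tracks near-exponential data well (a geometric series with ratio 1.1 is forecast to within about 0.2%) and degrades on more random series.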

__The K-nearest neighbor approach__ is a conceptually simple approach to pattern recognition problems: an
unknown pattern is classified according to the majority
class membership of its K nearest neighbors in the training
set. Moreover, local prediction, proposed by Farmer
and Sidorowich [[Predicting chaotic time series](https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.59.845)], derives a forecast from a suitable
statistic of the values that followed past segments resembling the L most recent samples. 
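A minimal nearest-neighbour ("analog") forecast in the spirit of these methods: it finds the K past windows closest to the last L observations and averages the values that followed them. This is an illustrative simplification (Farmer and Sidorowich also fit local linear maps); the name `knn_forecast`, the plain mean as the statistic, and the default values of L and K are assumptions.

```python
import numpy as np

def knn_forecast(x, L=3, K=2):
    """One-step analog forecast: mean successor of the K windows nearest the last L values."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if K > n - L:
        raise ValueError("not enough past windows for K neighbours")
    query = x[n - L:]                          # the L most recent samples
    # Candidate windows x[s:s+L] that have a known successor x[s+L]
    starts = np.arange(0, n - L)
    windows = np.array([x[s:s + L] for s in starts])
    dists = np.linalg.norm(windows - query, axis=1)   # Euclidean distance to the query
    nearest = starts[np.argsort(dists)[:K]]
    # The "suitable statistic" here is simply the mean of the successors
    return float(x[nearest + L].mean())
```

For the periodic series 1, 2, 3, 1, 2, 3, 1, 2, 3 with L = 2, both earlier occurrences of the pattern (2, 3) were followed by 1, so the forecast is 1.0.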


We place ourselves in a parametric probabilistic forecasting
framework under small sample sizes, for which simple
linear models such as the AR model and the grey
prediction model are recommended: thanks to their parsimonious form,
such simple models are frequently found to produce smaller prediction errors than
techniques with more complicated model forms.


However, two issues deserve attention.
- On one hand, linear approaches may give unsatisfactory forecasting accuracy when the system of interest exhibits a nonlinear trend. Much work has been done on model structural change. To address this problem, in the same spirit as the K-nearest neighbor and local prediction approaches, many scholars have recommended using only recent data to improve forecasting accuracy when chaotic data exist. Based on this view, Wen proposed the GM(1,1) rolling model, called rolling check. The same technique, called grey prediction with rolling mechanism (GPRM), can only be used for one-step prediction.
- On the other hand, an AR model can only be established for time series that satisfy the stationarity condition; that is, a stationary solution to the corresponding AR characteristic equation exists if and only if all of its roots exceed unity in absolute value (modulus). Consequently, AR models cannot be used to model nonstationary time series.

Motivated by the GPRM approach and the practical
relevance of very short-term (1-step-ahead) forecasting
discussed above, the first objective of this study
is to construct a novel prediction model with a rolling
mechanism that improves forecasting precision. Accordingly,
both the sample data set and the model parameters are updated at
each prediction step. 

The second objective of this study is to
develop an autoregressive model that can handle
nonstationary time series, which makes this autoregression
different from the AR model in the time series analysis
literature. We still call it autoregression because the current
value of the series is a linear combination of its several most
recent past values plus an “innovation” term that
captures everything new in the series not explained
by the past values; unlike the classical AR model, however, it is not
restricted to stationary series.

### AR Model Introduction

AR(p) model: $$\Phi(L) x_t = \phi_0 + e_t, \quad e_t \sim \mathrm{NID}(0,\sigma_e^2)$$

where $L$ is the lag operator ($L x_t = x_{t-1}$), $\Phi(L) = 1 - \phi_1 L - \dots - \phi_p L^p$, and $\phi_0$ is a constant related to the series mean. 

It is well known in the literature that the AR(p) process must satisfy
a stationarity condition: provided that $e_t$ is independent of $x_{t-1}, x_{t-2}, \dots$ and
that $\sigma_e^2 > 0$, a stationary solution exists if and only if all
roots of the AR characteristic equation $1 - \phi_1 z - \dots - \phi_p z^p = 0$ exceed 1 in absolute
value (modulus). 
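The root condition can be checked numerically. This sketch (hypothetical helper `is_stationary`, built on NumPy's `np.roots`) forms the characteristic polynomial $1 - \phi_1 z - \dots - \phi_p z^p$ and tests whether every root lies strictly outside the unit circle.

```python
import numpy as np

def is_stationary(phi):
    """Return (stationary?, roots) for AR coefficients phi = [phi_1, ..., phi_p]."""
    phi = np.asarray(phi, dtype=float)
    # np.roots wants coefficients from the highest power down:
    # -phi_p z^p - ... - phi_1 z + 1
    coeffs = np.concatenate([-phi[::-1], [1.0]])
    roots = np.roots(coeffs)
    # Stationary iff every root has modulus strictly greater than 1
    return bool(np.all(np.abs(roots) > 1.0)), roots
```

For example, $\phi = (0.5, 0.3)$ passes the check (roots near 1.17 and -2.84), whereas a random walk ($\phi_1 = 1$) has a unit root and fails it.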

Using the least-squares method, the model parameters
can be calculated as

$$\hat{\phi}_0 = \bar{x}_0 - \sum_{i=1}^{p} \hat{\phi}_i \bar{x}_i, \qquad (\hat{\phi}_1,\dots,\hat{\phi}_p)^T = (L_{\eta\eta})^{-1}_{p\times p} (L_\eta)_{p\times 1}$$

where $\bar{x}_i = \frac{1}{n-p} \sum_{t=p+1}^{n} x_{t-i}$ for $i = 0, 1, \dots, p$, $L_{\eta\eta} = (S_{ij})_{p\times p}$, $L_\eta = (S_1, \dots, S_p)^T$, and

$$S_{ij} = \sum_{t=p+1}^{n} (x_{t-i} - \bar{x}_i)(x_{t-j} - \bar{x}_j), \quad i,j = 1,\dots,p$$
$$S_{i} = \sum_{t=p+1}^{n} (x_{t} - \bar{x}_0)(x_{t-i} - \bar{x}_i), \quad i = 1,\dots,p$$
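The centred normal equations translate directly into NumPy. This sketch assumes the sums run over the $n-p$ usable time points $t = p+1, \dots, n$; the function name `fit_ar_ls` is hypothetical. The result coincides with an ordinary least-squares regression of $x_t$ on an intercept and $p$ lags.

```python
import numpy as np

def fit_ar_ls(x, p):
    """Least-squares AR(p) fit via centred normal equations; returns (phi0, [phi_1..phi_p])."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = np.arange(p, n)                        # 0-based version of t = p+1, ..., n
    # Column i holds x_{t-i} for i = 0..p (column 0 is x_t itself)
    lags = np.column_stack([x[t - i] for i in range(p + 1)])
    xbar = lags.mean(axis=0)                   # xbar_0, ..., xbar_p over n - p terms
    dev = lags - xbar
    S = dev[:, 1:].T @ dev[:, 1:]              # L_etaeta = (S_ij), i, j = 1..p
    s = dev[:, 1:].T @ dev[:, 0]               # L_eta = (S_1, ..., S_p)^T
    phi = np.linalg.solve(S, s)                # (phi_1..phi_p) = L_etaeta^{-1} L_eta
    phi0 = xbar[0] - phi @ xbar[1:]            # phi0 = xbar_0 - sum_i phi_i xbar_i
    return phi0, phi
```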


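Based on the estimated coefficients $\hat{\phi}_0, \dots, \hat{\phi}_p$, the AR(p) $l$-step-ahead prediction equation is:

$$\hat{x}_{n+l|n} = \hat{\phi}_0 + \sum_{i=1}^{p} \hat{\phi}_i \hat{x}_{n+l-i|n}$$

where $\hat{x}_{n+l-i|n} = x_{n+l-i}$ whenever $n+l-i \le n$; that is, observed values are used where available and earlier forecasts otherwise.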
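Because forecasts beyond the sample feed back into the lags, multi-step AR prediction is recursive. A minimal sketch with fixed coefficients supplied by the caller (the function name `ar_forecast` is hypothetical):

```python
import numpy as np

def ar_forecast(x, phi0, phi, steps):
    """Recursive 1..steps-ahead AR forecasts from fixed coefficients phi0, phi = [phi_1..phi_p]."""
    history = list(np.asarray(x, dtype=float))
    p = len(phi)
    out = []
    for _ in range(steps):
        recent = history[-p:][::-1]            # x_{n+l-1}, ..., x_{n+l-p} (observed or forecast)
        xhat = phi0 + float(np.dot(phi, recent))
        out.append(xhat)
        history.append(xhat)                   # the forecast becomes a lag for the next step
    return out
```

For instance, with $\hat\phi_0 = 1$, $\hat\phi_1 = 0.5$ and last observation 1, the forecasts are 1.5, 1.75, 1.875.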

### ARPRM Model Construction (Autoregressive Prediction with Rolling Mechanism)

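Based on the initial observational sample $x_1, \dots, x_n$, the 1-step-ahead ARPRM($p_1$)
model can be established as
$$x_t = \eta_{10} + \sum_{i=1}^{p_1} \eta_{1i} x_{t-i} + e_{1t}, \quad t = p_1+1, \dots, n$$
The 2-step-ahead ARPRM($p_2$) model is fitted to the rolled sample $x_2, \dots, x_n, \hat{x}_{n+1|n}$; at $t = n+2$ it reads
$$x_{n+2} = \eta_{20} + \eta_{21} \hat{x}_{n+1|n} + \sum_{i=2}^{p_2} \eta_{2i} x_{n+2-i} + e_{2,n+2}$$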
Analogously, for the $l$-step-ahead prediction,
one first forms a new sample, denoted
$x_l^*,\dots,x^*_{n+l-1}$, according to the rolling mechanism described above, where
$$x_k^* = \begin{cases} x_k, & k \leq n \\ \hat{x}_{k|n}, & k > n \end{cases}$$
That is, $x_k^*$ is an original observation if $k \leq n$ and the prediction result of a previous step if $k > n$.
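The window bookkeeping can be made concrete with plain Python lists. The helper `rolled_sample` is hypothetical and follows the document's 1-based indices: for step $l$ it returns the $n$ values $x^*_l, \dots, x^*_{n+l-1}$.

```python
def rolled_sample(x, forecasts, l):
    """Return the rolled window x*_l, ..., x*_{n+l-1} for step l (1-based, as in the text).

    x         : original observations x_1, ..., x_n
    forecasts : earlier forecasts [xhat_{n+1|n}, ..., xhat_{n+l-1|n}] (at least l - 1 of them)
    """
    if len(forecasts) < l - 1:
        raise ValueError("step l needs the l - 1 previous forecasts")
    # x*_k for k = 1, ..., n+l-1: observations first, then the forecasts
    extended = list(x) + list(forecasts[:l - 1])
    # Drop the first l - 1 values so the window length stays n
    return extended[l - 1:]
```

With observations 1, 2, 3, 4 and forecasts 5, 6, the window for $l = 3$ is 3, 4, 5, 6.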

Although the procedure above adds
one forecast and deletes one sample value at each
prediction step, the numbers added and deleted need not be
equal; for example, one can add one value and delete several sample
values at each prediction step without departing from the
spirit of the proposed method. Re-estimating the model at each step
updates the ARPRM model coefficients
according to the continuously renewed ("metabolic") sample,
and the prediction accuracy can consequently be
improved.



##### Parameter Estimation

First, the model order $p_l$ of the $l$-step-ahead ARPRM($p_l$) model is determined by the AIC rule: increasing $p_l$ from one,
choose the value that minimizes
$$\mathrm{AIC}(p_l) = \ln{\hat{\sigma}^2_{le}} + \frac{2p_l}{n+l-1}$$
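A sketch of the AIC search for the $l = 1$ case, where the rolled sample is the original one and the formulas reduce to $\ln\hat\sigma^2 + 2p/n$ with a $1/(n-p)$ variance scaling. It uses NumPy's `np.linalg.lstsq` for the fit (equivalent to the centred normal equations); `select_order_aic` and `max_p` are hypothetical names. With very small samples, keep `max_p` well below $n/2$ so each fit has enough points.

```python
import numpy as np

def select_order_aic(x, max_p):
    """Pick p in 1..max_p minimising AIC(p) = ln(sigma2_hat) + 2p/N (the l = 1 case)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    best_p, best_aic = None, np.inf
    for p in range(1, max_p + 1):
        t = np.arange(p, N)
        # Design matrix [1, x_{t-1}, ..., x_{t-p}] for ordinary least squares
        X = np.column_stack([np.ones(len(t))] + [x[t - i] for i in range(1, p + 1)])
        coef, *_ = np.linalg.lstsq(X, x[t], rcond=None)
        resid = x[t] - X @ coef
        sigma2 = resid @ resid / (N - p)       # residual variance with 1/(n - p) scaling
        aic = np.log(sigma2) + 2 * p / N
        if aic < best_aic:
            best_p, best_aic = p, aic
    return best_p, best_aic
```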
Then, by the least-squares method, the constant term $\eta_{l0}$ is
$$\eta_{l0} = \bar{x}_{l0} - \sum_{i=1}^{p_l} \eta_{li} \bar{x}_{li}$$
and the autoregressive coefficients $\eta_{li}$ are
$$(\eta_{l1}, \dots, \eta_{lp_l})^T = (L_{l\eta\eta})^{-1}_{p_l\times p_l} (L_{l\eta})_{p_l\times 1}$$
where $\bar{x}_{li}$ is the mean of $x^*_{t-i}$ over the rolled sample (defined analogously to $\bar{x}_i$ in the AR model), $L_{l\eta\eta} = (S_{lij})_{p_l\times p_l}$, $L_{l\eta} = (S_{l1}, \dots, S_{lp_l})^T$, and
$$S_{lij} = \sum_t (x^*_{t-i} - \bar{x}_{li})(x^*_{t-j} - \bar{x}_{lj}), \quad i,j = 1,\dots,p_l$$
$$S_{li} = \sum_t (x^*_{t} - \bar{x}_{l0})(x^*_{t-i} - \bar{x}_{li}), \quad i = 1,\dots,p_l$$
In addition, the residual variance is
$$\hat{\sigma}^2_{le} = \frac{1}{n+l-p_l-1} \sum_t \Big(x_t^* - \eta_{l0} - \sum_{i=1}^{p_l} \eta_{li} x^*_{t-i}\Big)^2$$
Then, the $l$-step-ahead prediction value is
$$\hat{x}_{n+l|n} = \eta_{l0} + \sum_{i=1}^{p_l} \eta_{li} x^*_{n+l-i}$$
and its mean square error is
$$\hat{\sigma}^2_{n+l|n} = E\big(x_{n+l} - \hat{x}_{n+l|n}\big)^2 = \sum_{i=1}^{l} \alpha_{li}^2 \sigma^2_{(l+1-i)e}$$
where $\eta_{li} = 0$ when $i > p_l$, and $\alpha_{ki} = \sum_{j=1}^{i-1} \eta_{kj}\alpha_{(k-j)(i-j)}$ with $\alpha_{k1} = 1$.
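Putting the pieces together, the rolling loop can be sketched as follows. For brevity this sketch assumes a fixed order `p` at every step (the document selects $p_l$ by AIC) and uses an ordinary least-squares fit with intercept; `fit_ar` and `arprm_forecast` are hypothetical names, and the mean-square-error recursion is not implemented.

```python
import numpy as np

def fit_ar(x, p):
    """OLS fit of x_t = eta_0 + sum_i eta_i x_{t-i} + e_t; returns (eta_0, [eta_1..eta_p])."""
    t = np.arange(p, len(x))
    X = np.column_stack([np.ones(len(t))] + [x[t - i] for i in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(X, x[t], rcond=None)
    return coef[0], coef[1:]

def arprm_forecast(x, steps, p):
    """Rolling-mechanism sketch: refit on the current window, predict one step, roll.

    At step l the window is x*_l, ..., x*_{n+l-1}: the oldest value is dropped and
    the newest forecast appended, so the window length stays n.
    """
    window = np.asarray(x, dtype=float)
    preds = []
    for _ in range(steps):
        eta0, eta = fit_ar(window, p)          # re-estimate on the current window
        recent = window[-1:-p - 1:-1]          # x*_{n+l-1}, ..., x*_{n+l-p}
        xhat = eta0 + eta @ recent
        preds.append(xhat)
        window = np.append(window[1:], xhat)   # roll: delete first value, add forecast
    return np.array(preds)
```

On a linear trend such as 1, 2, ..., 10, which no stationary AR model can describe, the refitted AR(1) recovers $x_t = 1 + x_{t-1}$ and returns forecasts close to 11, 12, 13.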

__Best unbiased property:__ The prediction obtained by
the proposed ARPRM method is best unbiased,
a property that follows from the least-squares error forecasting
method.

### Simulation Study

The results show that trends do affect the performance of the proposed method and, further, that trend nonlinearity plays a much
more important role in forecasting performance than trend
significance. 

Results: 
- For a given experimental model and sample size, the percent relative error increases as the forecasting step l increases from 1 to 5. 
- Forecasting performance degrades as the sample size shrinks. The change is pronounced because this is a small-sample problem in which __10 or fewer data points are used to make 5-step-ahead predictions__. 
- The performance of the ARPRM method decreases as the innovation variance increases. This is consistent with general understanding: a larger variance means greater uncertainty, which leads to lower forecasting performance. One further issue is worth noting.


### Empirical Applications

Before the AR model is constructed, the sample size of each data set is increased from 18 to 35 through __linear interpolation__. This approach is referred to as the __AR model with linear interpolation.__
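The notes do not spell out the interpolation scheme; one reading consistent with the numbers is inserting the midpoint between each pair of neighbouring observations, since 18 + 17 = 35. A sketch under that assumption using `np.interp` (`densify_linear` is a hypothetical name):

```python
import numpy as np

def densify_linear(x):
    """Insert the linear midpoint between every pair of neighbours: N points -> 2N - 1."""
    x = np.asarray(x, dtype=float)
    k = np.arange(len(x))                      # original sample positions 0, 1, ..., N-1
    k_fine = np.arange(0, len(x) - 0.5, 0.5)   # 0, 0.5, 1, ..., N-1
    return np.interp(k_fine, k, x)
```

An 18-point series therefore becomes a 35-point series, and, for example, 0, 2, 4 becomes 0, 1, 2, 3, 4.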

To demonstrate the effectiveness and reasonableness of the method, the
prediction results are evaluated with the average relative
percentage error
$$\mathrm{ARPE} = \frac{1}{10} \sum_{k=1}^{10} \frac{|x_t^{(k)} - \hat{x}^{(k)}_t|}{x_t^{(k)}} \times 100\%$$
where $x_t^{(k)}$ is the observation and $\hat{x}_t^{(k)}$ the corresponding prediction at observation point $k = 1, 2, \dots, 10$. 
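ARPE is straightforward to compute. This sketch (hypothetical function `arpe`) averages over however many points it receives, so with ten observation points it matches the $1/10$ factor above; like the formula, it assumes positive observations.

```python
import numpy as np

def arpe(actual, predicted):
    """Average relative percentage error: mean of |x - xhat| / x, times 100."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted) / actual) * 100)
```

For observations 100 and 200 predicted as 110 and 190, the relative errors are 10% and 5%, so ARPE is 7.5%.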

Results:
- ARPRM gives the most precise forecasts, followed by the AR model with linear interpolation; the GM(1,1) model and the GM(1,1) rolling model give considerably poorer accuracy. 
- The comparative results for the economic data show that ARPRM achieves the best prediction accuracy, and the GM(1,1) model and GM(1,1) rolling model also give good results for the Chinese annual power load forecast. Although the AR model with linear interpolation provides the worst prediction, its accuracy is still acceptable. 
- The prediction accuracy of ARPRM and of the AR model with linear interpolation is relatively stable, whereas the accuracy of the GM(1,1) model and GM(1,1) rolling model is unsatisfactory when the data do not follow an exponential trend.
