### The Moving Block Bootstrap

Application of the residual-based bootstrap methods is straightforward if the error distribution is specified to be an ARMA(p,q) process with known p and q.

However, if the structure of serial correlation is not tractable or is misspecified, the residual-based
methods will give inconsistent estimates.

Block methods instead divide the $n$ observations into blocks of length $l$ and draw $b$ of these blocks with replacement.


**NBB - nonoverlapping blocks bootstrap**

> Carlstein (1986) – first discussed the idea of bootstrapping blocks of observations rather 
than the individual observations.

Number of blocks: $\frac{n}{l} = b$  

In the Carlstein scheme (non-overlapping blocks) there is a high probability that a resample misses entire blocks $\rightarrow$ not often used.
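The NBB scheme can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `nbb_resample`, the `rng` argument, and the choice to drop a trailing partial block when $n$ is not a multiple of $l$ are all assumptions made here. The helper `prob_block_missed` evaluates $(1 - 1/b)^b$, the chance that one given block never appears in a resample of $b$ draws, which is roughly 0.37 for moderate $b$.

```python
import numpy as np

def nbb_resample(x, l, rng=None):
    """One NBB pseudo-series built from b = n // l nonoverlapping blocks."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    b = len(x) // l                        # number of complete blocks
    blocks = x[: b * l].reshape(b, l)      # row j holds x[j*l : (j+1)*l]
    picks = rng.integers(0, b, size=b)     # b block indices, with replacement
    return blocks[picks].ravel()

def prob_block_missed(b):
    """Probability that one given block is never drawn in b draws."""
    return (1 - 1 / b) ** b
```

For example, `nbb_resample(np.arange(20), 5, rng=0)` returns 20 values made of four intact blocks of length 5.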

**MBB - moving blocks bootstrap**


> Künsch (1989) and Singh (1992) independently introduced a more general bootstrap
procedure, the moving block bootstrap (MBB), which is applicable to stationary time series data. In this method the blocks of observations overlap.

Number of blocks: $n - l + 1$  
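A minimal sketch of the MBB resampling step, assuming NumPy. The name `mbb_resample` is hypothetical, and drawing $\lceil n/l \rceil$ blocks and then trimming the result to length $n$ is one common convention assumed here, not something fixed by the notes above.

```python
import numpy as np

def mbb_resample(x, l, rng=None):
    """One MBB pseudo-series drawn from the n - l + 1 overlapping blocks."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    n_blocks = n - l + 1                   # admissible block start positions
    b = int(np.ceil(n / l))                # blocks needed to cover n points
    starts = rng.integers(0, n_blocks, size=b)
    # Expand each start s into the indices s, s+1, ..., s+l-1.
    idx = (starts[:, None] + np.arange(l)[None, :]).ravel()
    return x[idx[:n]]                      # trim to the original length
```

Within each block the original time ordering is preserved; dependence is broken only at the joins between blocks.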

>> IDEA: MBB for short clusterized time series

#### Problems with MBB

1. The pseudo time series generated by the moving block method is not stationary, even if the original series $\{x_t\}$ is stationary.


> Politis and Romano (1994):  **A stationary bootstrap method**

The method samples blocks of random length, where the length of each block has a geometric distribution with parameter $p$. They show that the pseudo time series generated by the stationary bootstrap method is indeed stationary.

The stationary bootstrap is less sensitive to the choice of $p$ than the moving block bootstrap is to the choice of $l$.
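The stationary bootstrap can be sketched as follows, assuming NumPy. The function name `stationary_bootstrap` is hypothetical. The sketch uses the usual equivalent formulation: at each step a new block is started at a uniformly chosen position with probability $p$, otherwise the current block continues, wrapping around the end of the series. Block lengths are then geometric with mean $1/p$.

```python
import numpy as np

def stationary_bootstrap(x, p, rng=None):
    """One pseudo-series with geometric block lengths (mean 1/p)."""
    rng = np.random.default_rng(rng)
    x = np.asarray(x)
    n = len(x)
    idx = np.empty(n, dtype=int)
    idx[0] = rng.integers(0, n)            # random start of the first block
    for t in range(1, n):
        if rng.random() < p:
            idx[t] = rng.integers(0, n)    # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % n  # continue the block, wrapping around
    return x[idx]
```

With `p` close to 1 this approaches the ordinary iid bootstrap, while a small `p` produces long blocks.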

2. The mean $\bar{x}^*_n$ of the moving block bootstrap is biased in the sense that: $$E(\bar{x}^*_n \mid x_1, \ldots , x_n) \neq \bar{x}_n $$


3. The MBB estimator of the variance of $\sqrt{n} \bar{x}_n$ is also biased.

> Davison and Hall (1993): **modification**

 Usual estimator: $\hat{\sigma}^2 = n^{-1}\sum^n_{i=1}(x_i - \bar{x}_n )^2$

 Modification:  $\tilde{\sigma}^2 = n^{-1}\left(\sum^n_{i=1}(x_i - \bar{x}_n )^2  + \sum^{l-1}_{k=1} \sum^{n-k}_{i=1} (x_i - \bar{x}_n ) (x_{i+k} - \bar{x}_n ) \right)$, where $l$ is the block length
 
 With this modification the bootstrap can improve substantially on the normal approximation.
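The bias of the MBB mean (problem 2 above) can be checked exactly without simulation. The sketch below assumes NumPy (version 1.20 or later for `sliding_window_view`) and a pseudo-series made of whole blocks, in which case the conditional expectation of $\bar{x}^*_n$ is the average of all block means. The name `mbb_expected_mean` is hypothetical. Observations near the two ends of the series appear in fewer blocks, so they are underweighted.

```python
import numpy as np

def mbb_expected_mean(x, l):
    """E(mean of MBB pseudo-series | data) when whole blocks are concatenated."""
    x = np.asarray(x, dtype=float)
    # Rows are the n - l + 1 overlapping blocks of length l.
    windows = np.lib.stride_tricks.sliding_window_view(x, l)
    return windows.mean()                  # average of the block means

x = np.array([10.0, 0.0, 0.0, 0.0, 0.0, 0.0])
sample_mean = x.mean()                     # 10 / 6
mbb_mean = mbb_expected_mean(x, 3)         # 10 / 12: x[0] sits in one block only
```

In this toy series the single large value is the first observation, which belongs to only one of the four blocks, so the bootstrap expectation is half the sample mean.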

#### Optimal Length of Blocks

The goal is to minimize the MSE of the block bootstrap estimate
of the variance of a general statistic.

Carlstein’s rules for non-overlapping blocks: 

- As the block size increases: variance $\uparrow$,  bias  $\downarrow$

- As the dependency among the $x_i$ gets stronger  a longer block size is needed

- Optimal block size for AR(1) model $x_t = \phi x_{t-1} + e_t$ is $l^* = \left( \dfrac{2\phi}{1-\phi^2}  \right)^{2/3} n^{1/3}$

- Carlstein optimal block size: $l = n^{1/3} \rho^{-2/3}$

- Künsch optimal block size: $l = (3n/2)^{1/3} \rho^{-2/3}$, where $\gamma(j)$ is the covariance of $x_t$ at lag $j$ and
 
$$ \rho  = \dfrac{\gamma(0) + 2 \sum^{\infty}_{j=1} \gamma(j) }{ \sum^{\infty}_{j=1} j \gamma(j)}$$ 

- Hall and Horowitz’s rules for AR(1): $\rho = (1-\phi^2)/\phi$
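The rules above are easy to evaluate numerically. The sketch below assumes NumPy and an AR(1) process with $0 < \phi < 1$ (so that $\rho > 0$); all function names are hypothetical. It computes $\rho$ from the closed form, checks it against the general autocovariance formula truncated at a finite lag, and returns the Carlstein and Künsch block lengths.

```python
import numpy as np

def rho_ar1(phi):
    """Closed form of rho for AR(1), assuming 0 < phi < 1."""
    return (1 - phi**2) / phi

def rho_from_autocov(gamma):
    """rho from autocovariances gamma[0], gamma[1], ... (finite truncation)."""
    gamma = np.asarray(gamma, dtype=float)
    j = np.arange(1, len(gamma))
    return (gamma[0] + 2 * gamma[1:].sum()) / (j * gamma[1:]).sum()

def block_lengths(n, rho):
    """Carlstein and Künsch rule block lengths (not rounded)."""
    carlstein = n ** (1 / 3) * rho ** (-2 / 3)
    kunsch = (1.5 * n) ** (1 / 3) * rho ** (-2 / 3)
    return carlstein, kunsch

phi = 0.5
# AR(1) autocovariances with unit innovation variance, lags 0..200.
gamma = phi ** np.arange(201) / (1 - phi**2)
rho_closed = rho_ar1(phi)                  # 1.5
rho_numeric = rho_from_autocov(gamma)      # agrees up to truncation error
l_carlstein, l_kunsch = block_lengths(1000, rho_closed)
```

In practice the resulting lengths would be rounded to integers, and $\phi$ would have to be estimated from the data.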


### [Resampling Methods for Time Series](http://www-stat.wharton.upenn.edu/~stine/stat910/lectures/13_bootstrap.pdf)

####  Subsampling

Subsampling relies on the same ideas that we used in proving the CLT for a stationary process: arrange the data into blocks, and rely on the blocks becoming less dependent as they get farther apart. The key assumption is that the distribution of the statistic has the form 
$$\sqrt{n}(\hat{\theta} - \theta) \sim G$$

Procedure:

1. Arrange the time series $X_1, \ldots, X_n$ into $N$ overlapping blocks, each of length $b$.
2. Treat each of the $N$ blocks as if it were the time series of interest and compute the relevant statistic on it, obtaining $\tilde{\theta}_1, \ldots, \tilde{\theta}_N$.
3. Estimate the sampling properties of the estimator computed from the original time series from the rescaled empirical distribution of the $\tilde{\theta}_i$,
$$\tilde{F}_b(x) = \frac{1}{N} \sum_{i=1}^{N} H_{\sqrt{b}(\tilde{\theta}_i - \hat{\theta})}(x),$$
where $H_c(x)$ is the step function that jumps from 0 to 1 at $c$.

For iid samples, the theory resembles that used when studying leave-out-several versions of the jackknife and cross-validation. 
Subsampling provably works in many applications with the type of “weak” dependence associated with ARMA processes. 
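The subsampling procedure can be sketched as follows, assuming NumPy (version 1.20 or later for `sliding_window_view`). The names `subsample_roots` and `ecdf` are hypothetical, and the sketch assumes maximal overlap, i.e. consecutive blocks shifted by one observation so that $N = n - b + 1$, together with the $\sqrt{b}$ rescaling from the procedure.

```python
import numpy as np

def subsample_roots(x, b, stat=np.mean):
    """Rescaled subsample statistics sqrt(b) * (theta_tilde_i - theta_hat)."""
    x = np.asarray(x, dtype=float)
    theta_hat = stat(x)                    # statistic on the full series
    # N = n - b + 1 overlapping blocks, each shifted by one observation.
    windows = np.lib.stride_tricks.sliding_window_view(x, b)
    theta_tilde = np.array([stat(w) for w in windows])
    return np.sqrt(b) * (theta_tilde - theta_hat)

def ecdf(values, grid):
    """Empirical distribution function: share of values <= each grid point."""
    values = np.sort(np.asarray(values, dtype=float))
    return np.searchsorted(values, grid, side="right") / len(values)

roots = subsample_roots(np.arange(10), 4)  # toy series, block length 4
f_at_zero = ecdf(roots, 0.0)               # estimate of F_b(0)
```

Quantiles of `roots` (for instance via `np.quantile`) can then stand in for the quantiles of $G$ when building a confidence interval for $\theta$.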