# Small Sample

##### What is the minimum number of observations required to perform a statistically significant panel analysis?

For instance, Ferron et al. (2009) showed that a Kenward-Roger correction can provide appropriate standard errors well into the single digits for a fairly simple model, and Browne and Draper (2006) were able to obtain unbiased variance components with REML estimation with only 6 units at the highest level (again for a simple model). Gelman (2006) showed that Bayesian methods can produce unbiased estimates of variance components with as few as 3 units at the highest level using a carefully considered, weakly informative prior. If you are worried that 11 units are not enough at the highest level, keeping the model as simple as possible is the best way to guard against possible small sample bias (besides collecting more data, which is usually not a reasonable suggestion).

> The restricted (or residual, or reduced) maximum likelihood (REML) approach is a particular form of maximum likelihood estimation that does not base estimates on a maximum likelihood fit of all the information, but instead uses a likelihood function calculated from a transformed set of data, so that nuisance parameters have no effect.
> In the case of variance component estimation, the original data set is replaced by a set of contrasts calculated from the data, and the likelihood function is calculated from the probability distribution of these contrasts, according to the model for the complete data set. In particular, REML is used as a method for fitting linear mixed models. In contrast to the earlier maximum likelihood estimation, REML can produce unbiased estimates of variance and covariance parameters.

[Bias Reduction of Autoregressive Estimates in Time Series Regression Model through Restricted Maximum Likelihood](https://www.jstor.org/stable/2669758?seq=1#page_scan_tab_contents)
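
The bias correction described in the quote can be illustrated in the simplest possible setting. The sketch below assumes an intercept-only model with independent normal errors, where the ML variance estimate divides the residual sum of squares by n and the REML estimate divides by n - 1 (one degree of freedom is spent on the mean). The function names and simulation settings are illustrative choices, not taken from any of the cited papers, and a real mixed model would need a dedicated fitting routine.

```python
import random

def ml_variance(sample):
    """ML estimate of the variance: residual sum of squares divided by n."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n

def reml_variance(sample):
    """REML estimate for an intercept-only model: divide by n - 1,
    accounting for the single fixed effect (the mean) that was estimated."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def average_estimates(n=6, true_sd=2.0, replications=20000, seed=1):
    """Average both estimators over many simulated samples of size n."""
    rng = random.Random(seed)
    ml_total = 0.0
    reml_total = 0.0
    for _ in range(replications):
        sample = [rng.gauss(0.0, true_sd) for _ in range(n)]
        ml_total += ml_variance(sample)
        reml_total += reml_variance(sample)
    return ml_total / replications, reml_total / replications

if __name__ == "__main__":
    ml_avg, reml_avg = average_estimates()
    print(f"true variance 4.0, mean ML {ml_avg:.3f}, mean REML {reml_avg:.3f}")
```

With only 6 observations per sample, the ML average should sit visibly below the true variance, while the REML average should be close to it.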

##### What is the sufficient sample size in multilevel analysis?

I don't know of any small sample studies that have gone beyond two levels. In two-level models with continuous outcomes and no small sample correction (e.g., Kenward-Roger), about 20 units are needed at the highest level to obtain unbiased estimates (power will be quite low, though). With discrete outcomes, about 50 units are needed at the highest level, with at least 5 observations per cluster.

A relevant paper might be McNeish & Stapleton (2014, Educational Psychology Review) which is a review paper and contains several references that may be helpful for the specifics of your situation. 


##### Best method for short time-series

Compare the robustness of different methods to these simple ones, e.g., by not only assessing average accuracy out-of-sample, but also the error variance, using your favorite error measure.

The first approach is to use standard/linear time series models (AR, MA, ARMA, etc.), but to pay attention to certain parameters, as described in this post [1] by Rob Hyndman, who does not need an introduction in the time series and forecasting world. The second approach, referred to by most of the related literature that I have seen, suggests using non-linear time series models, in particular the threshold models [2], which include the threshold autoregressive model (TAR), self-exciting TAR (SETAR), the threshold autoregressive moving average model (TARMA), and the TARMAX model, which extends the TAR model to exogenous time series. Excellent overviews of non-linear time series models, including threshold models, can be found in this paper [3] and this paper [4].

Stationarity can be a bit tricky when dealing with Bayesian time series models. One choice is to enforce constraints on the parameters. Alternatively, you can leave them unconstrained, which is fine if you just want to look at the distribution of the parameters. However, if you want to generate the posterior predictive distribution, you might get a lot of forecasts that explode.

The Stan documentation provides a few examples where they put constraints on the parameters of time series models to ensure stationarity. This is possible for the relatively simple models they use, but it can be pretty much impossible in more complicated time series models. If you really wanted to enforce stationarity, you could use a Metropolis-Hastings algorithm and throw out any coefficients that are improper. However, this requires a lot of eigenvalues to be calculated, which will slow things down.
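
The eigenvalue check mentioned above can be sketched for a plain AR(p) model. This is a generic companion-matrix check written with NumPy, not code from the Stan documentation, and the function names are hypothetical. An AR(p) process is stationary when every eigenvalue of its companion matrix lies strictly inside the unit circle, so a rejection step inside a sampler could discard any draw for which `is_stationary` returns `False`.

```python
import numpy as np

def companion_matrix(phi):
    """Companion matrix of y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t."""
    phi = np.asarray(phi, dtype=float)
    p = phi.size
    mat = np.zeros((p, p))
    mat[0, :] = phi                    # first row holds the AR coefficients
    if p > 1:
        mat[1:, :-1] = np.eye(p - 1)   # sub-diagonal shifts the lagged values
    return mat

def is_stationary(phi):
    """True when all companion-matrix eigenvalues have modulus below 1."""
    eigenvalues = np.linalg.eigvals(companion_matrix(phi))
    return bool(np.all(np.abs(eigenvalues) < 1.0))

if __name__ == "__main__":
    print(is_stationary([0.5, 0.3]))   # coefficients inside the stationary region
    print(is_stationary([0.5, 0.6]))   # coefficients outside it
```

One eigenvalue decomposition is needed per proposed draw, which is where the slowdown in larger models comes from.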

##### Anderson-Darling test

The Anderson-Darling test is used to check whether a sample is consistent with a given distribution. The minimum sample size usually quoted for it is n > 5, that is, at least 6 observations. When testing for normality with the population mean and variance both unknown, the critical value is 0.787 at the 5% significance level (95% confidence).
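
A minimal sketch of the test for the normal case, using only the Python standard library. It computes the A² statistic with the mean and standard deviation estimated from the sample and compares it with the 0.787 value quoted above. The small-sample adjustment factor `1 + 4/n - 25/n**2` is an assumption here (it is the one commonly attributed to Stephens, 1974, for this critical value) and is worth checking against a statistics reference. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def anderson_darling_normal(sample):
    """A^2 statistic for normality, mean and variance estimated from the sample."""
    n = len(sample)
    if n < 6:
        raise ValueError("at least 6 observations are required")
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    z = sorted((x - mean) / sd for x in sample)
    total = 0.0
    for i in range(1, n + 1):
        lower = math.log(normal_cdf(z[i - 1]))
        # 1 - F(z) equals F(-z) for the normal distribution.
        upper = math.log(normal_cdf(-z[n - i]))
        total += (2 * i - 1) * (lower + upper)
    return -n - total / n

def rejects_normality(sample, critical_value=0.787):
    """Compare the adjusted statistic with the 5% critical value.
    The adjustment factor is an assumption, see the note above."""
    n = len(sample)
    adjusted = anderson_darling_normal(sample) * (1.0 + 4.0 / n - 25.0 / n ** 2)
    return adjusted > critical_value

if __name__ == "__main__":
    roughly_symmetric = [-1.5, -0.8, -0.3, 0.0, 0.3, 0.8, 1.5]
    one_extreme_value = [1.0] * 9 + [100.0]
    print(rejects_normality(roughly_symmetric), rejects_normality(one_extreme_value))
```

The sketch does not handle a constant sample (zero standard deviation), which would need a separate guard.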



### [Fitting models to short time series](https://robjhyndman.com/hyndsight/short-time-series/)

Using least squares estimation, or some other non-regularized estimation method, it is possible to estimate a model only if you have more observations than parameters.  (If you use the LASSO, or some other regularization technique, it is possible to estimate a model with fewer observations than parameters.) However, there is no guarantee that a fitted model will be any good for forecasting, especially when the data are noisy.



The only reasonable approach is to first check that there are enough observations to estimate the model, and then to test if the model performs well out-of-sample. With short series, there is not enough data to allow some observations to be withheld for testing purposes. However, the AIC can be used as a [proxy for the one-step forecast out-of-sample MSE](https://robjhyndman.com/hyndsight/aic/). The AIC allows both the number of parameters and the amount of noise to be taken into account.
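
As a rough illustration of AIC-based selection on a short series, the sketch below fits AR(p) models by conditional least squares with NumPy and picks the order with the smallest AIC. This is not the procedure used by `auto.arima()`; it is a simplified stand-in under stated assumptions: Gaussian errors, an AIC of the form `n * log(RSS / n) + 2k` (constants dropped), and every order fitted on the same trimmed sample so the values are comparable. The function names are hypothetical.

```python
import numpy as np

def ar_aic(series, order, max_order):
    """AIC of an AR(order) model with intercept, fitted by least squares.
    The first max_order points are dropped for every order so that all
    candidate models are compared on the same observations."""
    y = np.asarray(series, dtype=float)
    target = y[max_order:]
    n = target.size
    if n <= order + 1:
        raise ValueError("not enough observations for this order")
    columns = [np.ones(n)]
    for lag in range(1, order + 1):
        columns.append(y[max_order - lag : y.size - lag])
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = float(np.sum((target - design @ coef) ** 2))
    if rss <= 0.0:
        raise ValueError("perfect fit, AIC is undefined")
    k = order + 2  # AR coefficients + intercept + error variance
    return n * np.log(rss / n) + 2 * k

def select_order(series, max_order=3):
    """Return the AIC-minimising order and the AIC of every candidate."""
    aics = {p: ar_aic(series, p, max_order) for p in range(max_order + 1)}
    best = min(aics, key=aics.get)
    return best, aics
```

On short, noisy series the penalty term `2k` tends to dominate, which is consistent with very simple models being selected.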


What tends to happen with short series is that the AIC suggests very simple models, because anything with more than one or two parameters will produce poor forecasts due to the estimation error. After applying the auto.arima() function from the forecast package in R to all the series from the M-competition, 32 of 144 series had models with zero parameters (random walks), and 95 had models with one parameter.

Seasonal models bring their own difficulties because the seasonality usually takes up $m-1$ degrees of freedom, where $m$ is the seasonal period. Fourier terms are one way to reduce the problem; they are useful whenever the ratio of $m$ to the sample size is large.
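
A small sketch of how Fourier terms can stand in for seasonal dummies. Each sine/cosine pair costs two degrees of freedom, so K pairs use 2K columns instead of the m - 1 needed for a full set of seasonal dummies. The function name and argument names are hypothetical, and the resulting columns would be passed as regressors to whatever model is being fitted.

```python
import math

def fourier_terms(n_obs, period, n_pairs):
    """Return n_obs rows of [sin_1, cos_1, ..., sin_K, cos_K] for t = 1..n_obs.

    n_pairs (K) must satisfy 2K < period, so the terms never use more
    columns than the period - 1 seasonal dummies they replace."""
    if n_pairs < 1 or 2 * n_pairs >= period:
        raise ValueError("need 1 <= n_pairs and 2 * n_pairs < period")
    rows = []
    for t in range(1, n_obs + 1):
        row = []
        for k in range(1, n_pairs + 1):
            angle = 2.0 * math.pi * k * t / period
            row.extend([math.sin(angle), math.cos(angle)])
        rows.append(row)
    return rows

if __name__ == "__main__":
    # Monthly data (m = 12), two years of observations, K = 2 pairs:
    # 4 seasonal columns instead of 11 dummies.
    terms = fourier_terms(n_obs=24, period=12, n_pairs=2)
    print(len(terms), len(terms[0]))
```

The choice of K is a tuning decision; it could be made with the AIC in the same way as the model order.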
 
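A seasonal ARIMA$(p,d,q)(P,D,Q)_m$ model has $p+q+P+Q+1$ parameters to estimate (counting the error variance), and differencing discards a further $d+mD$ observations. Consequently, at least $p+q+P+Q+d+mD+1$ observations are required to estimate a seasonal ARIMA model.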
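
The bound is easy to turn into a quick check before fitting. The helper below is a hypothetical convenience function that simply evaluates the expression $p+q+P+Q+d+mD+1$; it says nothing about whether a model with that many observations will forecast well.

```python
def min_observations(p, d, q, P=0, D=0, Q=0, m=1):
    """Lower bound on the series length for a seasonal ARIMA(p,d,q)(P,D,Q)_m model."""
    return p + q + P + Q + d + m * D + 1

if __name__ == "__main__":
    # Random walk, ARIMA(0,1,0): 0 + 0 + 1 + 1 = 2 observations.
    print(min_observations(0, 1, 0))
    # Monthly ARIMA(1,1,1)(1,1,1)_12: 1 + 1 + 1 + 1 + 1 + 12 + 1 = 18 observations.
    print(min_observations(1, 1, 1, P=1, D=1, Q=1, m=12))
```

For monthly data, a single seasonal difference already accounts for 12 of the required observations, which is why seasonal models are hard to fit to short series.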